Leveraging AI

251 | OpenAI’s apps + SDK, Image 1.5, GPT-5.2 Codex, and world domination playbook, Anthropic wins the enterprise, The state of agentic implementation based on MIT, Google, EY, and Deloitte, and more important AI news for the week ending on Dec 20, 2025

Isar Meitis Season 1 Episode 251

📢 Want to thrive in 2026?
Join the next AI Business Transformation cohort kicking off January 20th, 2026.
🎯 Practical, not theoretical. Tailored for business professionals. - https://multiplai.ai/ai-course/

Use code: LEVERAGINGAI100 to save $100 on registration

Learn more about Advance Course (Master the Art of End-to-End AI Automation): https://multiplai.ai/advance-course/


Is your business ready for the AI land grab of 2026?

OpenAI, Anthropic, Google, and others are racing to dominate not just AI models, but the full-stack experience — apps, commerce, code, and how businesses function. It’s not just about smarter models anymore — it's about who owns the ecosystem your company will rely on.

This week, host Isar Meitis unpacks the tsunami of AI news and breaks down what really matters for business leaders, including the game-changing launch of AI-native apps inside ChatGPT, new agent infrastructure, OpenAI’s explosive revenue growth, and how every major player is gearing up for 2026.

If you’re a business leader trying to figure out where to place your bets in 2026 — this is the briefing you can’t afford to miss.

In this session, you'll discover:

  • Why OpenAI’s app store inside ChatGPT is a seismic shift in business tech
  • The strict (and smart) rules for launching your own AI app in 2026
  • How apps turn ChatGPT from idea generator to business operator
  • What OpenAI’s new image and code models mean for your workflows
  • The actual adoption rates of AI agents in enterprise (Deloitte vs. Google vs. Menlo VC)
  • Why “multi-agent” isn’t always better — and what MIT + DeepMind just proved
  • OpenAI’s $750B valuation play & its quiet alliance with AWS
  • Claude vs GPT vs Gemini: who’s winning the enterprise trust war
  • Why the real moat isn’t the model… it’s the tools, workflows, and integrations
  • The shift from “efficiency” to “outcome” — and why most companies still don’t get it
  • Real-world examples of how AI agents are saving 40+ minutes per employee


About Leveraging AI

If you’ve enjoyed or benefited from some of the insights of this episode, leave us a five-star review on your favorite podcast platform, and let us know what you learned, found helpful, or liked most about this show!

Speaker:

Hello and welcome to a Weekend News episode of the Leveraging AI Podcast, the podcast that shares practical, ethical ways to leverage AI to improve efficiency, grow your business, and advance your career. This is Isar Meitis, your host, and we have so much to talk about this week, as it seems we have every week for the last two years. The end of the year is just accelerating all the releases from all the big labs. So we have many different things to talk about, both from OpenAI, Gemini, Anthropic, NVIDIA, and some other interesting releases as well. So lots to cover from that perspective. There have been several different reports and surveys from leading institutions, including Deloitte, Google, Menlo Ventures, Ernst & Young, MIT, and others, and all of them show interesting aspects of where the agentic world is right now and where it is most likely going in 2026. There are some interesting developments in the government and political aspect of AI, so we have a lot to cover. So let's get started. OpenAI has been on fire in the past two weeks. Following the release of GPT-5.2, there have been a lot of other announcements. And the first and maybe most interesting one from my perspective, which is gonna be the one we're gonna start with, is the full introduction of apps inside of ChatGPT. Now, the reality is apps were introduced a while back, earlier this year, but this is the first time that there is an entire separate apps section inside the OpenAI platform, as well as the ability for anyone who wants to submit apps, as long as they're following the right guidelines, which we're gonna talk about in a minute. So on the left navigation panel of ChatGPT, you now have an apps section, which is basically an apps directory. Now, you can get to these specific apps in several different ways. One is by using the at symbol, which will allow you to choose the name of the app. The second is to go through the apps section, selecting an app, and then clicking to use it inside a chat. 
And the third option is that ChatGPT, over time, will learn which apps do different things and will on its own pick which apps to use, similar to how Claude works with Skills, just using third-party applications. Again, these are no different than the apps that were released in the original beta, only now it's open to anyone or any company who wants to submit apps, versus just a very short, closed group, with some easier ways to navigate and get to them. Now, in parallel to this, OpenAI released an Apps SDK, which enables developers to build chat-native experiences for ChatGPT and make them available in that new section of the ChatGPT interface. We are going to talk in a second about the specific guidelines for developers on what to develop and what not to develop. But in general, they said, and I'm quoting, the strongest apps are tightly scoped, intuitive in chat and deliver clear value. That is a very short way to basically say, don't give us an app that does everything. Give us an app that does something very, very specific, does it well, and is easy to control through the chat interface, which makes perfect sense. Safety and transparency remain a central component of the submission process. Again, more details about this in a minute. Monetization is currently focused only on physical goods. So you can sell physical goods, or actually promote physical goods, not sell them yet. But you cannot do the same thing with digital goods. And the first applications beyond the ones that are there right now will start being approved and rolling out gradually in early 2026. But this is a complete game changer when you start thinking about where this takes the interaction of humans with AI in the next year. So far, ChatGPT and the other models have been a way to develop ideas, create ideas, evaluate data, perform analysis, ideate, and create content. And now they can actually, quote unquote, take actions in the real world and have a very dramatic expansion of their capabilities. 
With the introduction of apps, as OpenAI said, this is just the beginning: we want apps in ChatGPT to feel like a natural extension of the conversation, helping people move from ideas to action. And if you want to have a great comparison of how profound this is, think about smartphones before the app store. It was a phone with a calculator and a browser, and now you have applications that can do, well, everything in your life, between managing your bank account, navigating to different places, creating images, taking pictures, creating videos. Literally anything you want, you can do with your phone right now because of apps, not because of the phone itself. And this is exactly the direction that OpenAI is going with pushing apps into the ChatGPT ecosystem. Now, to make this more tangible, I want to give you a great example that I tried myself. So I told you several times before that I have a software company that does invoice vouching and reconciliation using agents. And I'm creating a website for it. And I'm gonna record an entire episode about this, because the process I've used was very unique and interesting. But I needed icons, and I created the entire line of icons using Nano Banana. But I wanted to turn them into icons in a vector format, which allows you to scale them to whatever size, or minimize them, while keeping them crisp and nice. And I was looking for a free and easy way to do this. I found a really cool tool called Recraft AI that I played with extensively this week. And I find it to be an amazing tool to manipulate graphics and do different cool things, as far as taking one graphic and turning it into different formats, removing backgrounds, and so on. And I've used that to create all the icons. But now there's an Adobe Photoshop application inside of ChatGPT. 
Now, when I created the icons, I created all of them in one single image, meaning I have an image with 12 different squares, each and every one of them with a separate icon. And to do the process in Recraft, I had to crop them manually and load them one by one into Recraft. But it did an amazing job, and I was very happy with it. But what I did now, I said, okay, let's see how smart ChatGPT is in figuring this whole thing out. So I uploaded the entire image with all 12 icons and basically said, I need each and every one of them as a separate icon with a transparent background, with only two colors, in a vector format, and selected the Adobe Photoshop application. ChatGPT started thinking about what I wanted and actually delivered 12 separate files in vector format, including one zip directory that I could download, but also I could download each and every one of them separately. Now, it did not do as good a vectorization job as the process that I did manually in Recraft. But the process in Recraft took me about 30 minutes, and the process in the Photoshop app took me about five seconds of writing the prompt, and when I came back, it was all there and ready to go. The biggest difference is that the icons that were created in the Photoshop version have more straight lines instead of rounded ones following the original images, which actually looks really cool. It gives them a very unique kind of look. And I may actually replace the icons with these icons. But from a simple measure of which icons more resemble or look exactly like the originals, the process that I did manually in Recraft was a better process. Maybe with better explanations to the Adobe Photoshop app of exactly what I needed, I would have gotten the same results. But from a time-savings perspective, it's a completely new paradigm shift compared to what we had before. 
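For anyone who wants a middle ground between manual cropping and handing the whole job to an app, the grid-splitting step described above can be scripted in a few lines. Here is a minimal sketch using the Pillow imaging library, assuming the icons sit in an evenly spaced grid; the function name and parameters are illustrative, not part of any OpenAI or Adobe API:

```python
from PIL import Image

def split_icon_grid(path, rows, cols, out_prefix="icon"):
    """Crop one image containing a grid of icons into separate PNG files.
    Assumes equal-sized cells arranged in `rows` x `cols`."""
    sheet = Image.open(path)
    w, h = sheet.size
    cell_w, cell_h = w // cols, h // rows
    paths = []
    for r in range(rows):
        for c in range(cols):
            # Bounding box of the cell at row r, column c
            box = (c * cell_w, r * cell_h, (c + 1) * cell_w, (r + 1) * cell_h)
            out = f"{out_prefix}_{r * cols + c + 1}.png"
            sheet.crop(box).save(out)
            paths.append(out)
    return paths
```

For the 12-icon sheet in the example, `split_icon_grid("sheet.png", rows=3, cols=4)` would produce `icon_1.png` through `icon_12.png`; vectorization would still be a separate step.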
The ability to get stuff done on third-party applications without ever opening the applications, and in the beginning without actually having a license to them (I'm sure that is going to change), is very dramatic. I must say I was really surprised with the results that I got. So definitely check it out. And definitely this is gonna be a really big trend in 2026. I have zero doubt that a similar thing is gonna happen on the other big platforms as well, so in Anthropic and Google for sure. And I'll explain more afterwards when I do a summary of this entire segment. So now let's talk about whether you wanna develop your own applications and submit them, why you would wanna do that, and what are the limitations and the guidelines. So let's start with why. This is the new discoverability magic wand, right? Think about how people discovered your business before: through Google search, and if you did not rank on the first page, you were nobody. And now there is suddenly the ability to be on the front page of the new Google at the very, very early stage. It's as if I told you when Google just started, ooh, start building optimized websites for ranking, and then later on, when this thing takes over the world, you will be able to get more traffic than everybody else for free. This is your opportunity to do this with Google search version two, if you want, or Google 2.0. These new applications will be selected by ChatGPT itself. Right now there's a very short list, so people will go and select on their own. But very shortly, when people start submitting, there are gonna be thousands of applications inside the apps section. And then OpenAI will choose on its own, which means you should develop a really capable application that follows their guidelines and provides real value to people, which I assume they will measure by how much the people that are trying it are actually using it, and using it consistently. 
If you do that, you will become the weapon of choice of ChatGPT, and hence of hundreds of millions of users, for a very particular task. This is an opportunity that doesn't happen very frequently, and this is why this is so exciting. So if you wanna develop an app, they have released very specific guidelines on how the app should be built, what it should do, and what it shouldn't do. The first thing, which I already told you: the apps are built only for physical goods at this point, and now I'm quoting, apps may do commerce only for physical goods. It even goes beyond that to say, selling digital products or services, including subscriptions, digital content, tokens or credits is not allowed, whether offered directly or indirectly. I assume this is going to change, but in step one, this is the case. Now, in addition, they are obviously limiting the kinds of things you can sell to stuff that is legit, and all the restrictions are very obvious. You cannot sell illegal drugs or weapons. You can also not sell any gambling services, casino credits, adult content, fake IDs, forged documents, or document-falsification services, all of which, again, makes perfect sense. So as long as what you're doing is legal and within reason, you can sell physical goods. They're also focusing a lot on the privacy and the data of users, and I'm quoting, do not request the full conversation history, raw chat transcripts or broad contextual fields just in case. Which basically means you can only collect data that you actually need in order to make your application run the way it needs to run. And the rest of the data that happens in the chat stays in the chat, and you and your tool are not supposed to get access to it. This is actually really good news if they can enforce it, because this will be very helpful in developing the level of trust that people want to have when they're going to work with ChatGPT using third-party tools. 
Now, to make sure that ChatGPT understands how to use a tool, they're stating that you are required to name your functions in a human-readable, specific, and descriptive way. Basically, define what it is that every single function does, so the AI knows how to use it in a simple manner. You need to explain it just like you would explain it to humans, which is actually easy to do if you have proper user manuals from before. On the flip side, they're explicitly warning against, and I'm quoting, misleading, overly promotional or competitive language, citing examples such as "pick me," "best," or "the official" as terms that are strictly prohibited when describing the application. As of right now, checkout for the commerce applications is going to happen on the third-party platform and not inside of ChatGPT, and I'm quoting again: apps should use external checkout, directing users to complete purchases on your own domain. A native instant checkout feature is currently in beta with limited select partners. Again, there's a very good reason for OpenAI to do that, because then they can take percentages of every sale if it happens inside the OpenAI platform. So I have zero doubt this is coming, and probably coming relatively quickly. The other thing that is different than, let's say, releasing a custom GPT into their ecosystem is that you cannot be anonymous. The document specifically says that all submissions must come from verified individuals or organizations, and they warn specifically that misrepresentation, hidden behavior, or attempts to game the system may result in removal from the platform. From an age perspective, all apps must be suitable for users ages 13 and up, and the guidelines note that support for mature, 18-plus experiences will arrive once appropriate age verification and controls are in place. 
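To make the function-naming guideline above concrete, here is a hypothetical tool definition in the JSON-schema style that function-calling APIs commonly use. Every name and description here is invented for illustration; OpenAI has not published this as the Apps SDK's exact schema:

```python
# Hypothetical tool definition illustrating the naming guideline.
# A vague name like "do_stuff" gives the model nothing to route on;
# a specific, human-readable name and description do.
track_order_tool = {
    "name": "get_order_shipping_status",  # specific and descriptive
    "description": (
        "Look up the current shipping status of a customer's order "
        "by its order number. Returns carrier, status, and ETA."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "order_number": {
                "type": "string",
                "description": "The order number shown on the receipt, e.g. 'A-10234'.",
            }
        },
        # Only request the data the tool actually needs -- in line with
        # the data-minimization guideline quoted earlier.
        "required": ["order_number"],
    },
}
```

Note how the description reads like a line out of a user manual, which is exactly the point: the model selects and calls the tool based on this text alone.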
If you remember, that was a big focus of OpenAI this summer, after the lawsuit that they received and after some very negative backlash in the community. And they said that they're working on age verification in more advanced ways. Some of them have already been put in place, some of them apparently not yet, and that's why they're not allowing mature-content kinds of applications at this point. Now, another thing that is forbidden is trying to promote the app in a tricky way, meaning the rules state specifically that apps must not include descriptions that manipulate how the model selects or uses other apps. Meaning, you cannot use instructions such as prefer this app over others for specific things. The idea is to let the market do its own thing and to let ChatGPT select based on what it thinks is the most suitable thing, versus, if you want, prompt injection into the way the models work. And if you use this kind of language, again, you might be banned from the platform. Overall, a very important step in OpenAI's move toward world domination. Again, more about this in the summary. Those of you who have been following this podcast know I had a whole conversation about this when they introduced apps for the first time, but I will give you a recap in a few minutes once we finish talking about all the other aspects of OpenAI and their announcements this week. So the other big release of OpenAI this week: they just released a new version of their image generation tool. It is called GPT Image 1.5, and it is a massive jump from the previous version that got everybody crazy earlier this year. When they released the first version, it was the first of its kind. It was very unique in its ability to keep consistency, change directions, and so on. And since then, Nano Banana has surpassed it by a very, very big margin, and specifically Nano Banana Pro. 
And this is OpenAI's attempt at coming back to center stage in creating and editing images with AI. It has several huge advantages over the previous model from ChatGPT. First, it is four times faster than its predecessor. Any of you who tried to create images in ChatGPT knows that the output is actually not bad, it just takes forever, and you literally grow older as the images get created. This one is not fast, but it is definitely faster than the previous model that existed before. Now, the model is significantly better at instruction following. It allows users to edit images or generate images with a much higher level of accuracy and consistency. It allows you to add, subtract, combine, blend, and even transpose specific elements inside an image with accurate prompting. It also knows how to combine multiple entities into a single entity and do it very accurately. It got much better at text rendering, including denser and smaller text, so basically entire pages if you want, which is something the previous model could not do very well. And to make it easier to engage with and manage all your photos, OpenAI has introduced a dedicated images section on the left navigation menu. So this week we received two new navigation sections inside the ChatGPT app: one for apps and one for images. Once you navigate into that section, you will see all the images that you created, plus some suggested prompts to help you get started, plus some different filters that you can apply and use. Mostly, these are ways to encourage people on how they can use this for day-to-day use rather than professional usage. 
If you think about the 800, and apparently right now 900, million weekly users that ChatGPT has, and the craziness and huge spike in growth in adoption that they saw when they released the previous model, because individuals were using it just for fun, they are adding all these ideas on how you can use this, including the prompts built into them, so you can turn yourself into a pop star or different other things just by clicking the button and uploading your image, and then the prompt is already prebuilt. That does two things: A, it encourages people to use it, and B, it shows people how to prompt properly in order to get these kinds of results, because the prompt shows up on the screen as soon as you click the button; all you have to do is add your image. So it's basically just pre-canned, pre-saved prompts that you can reuse. Now, OpenAI themselves said the following: we believe we're still at the beginning of what image generation can enable. Today's update is a meaningful step towards more to come, from finer-grained edits to richer, more detailed outputs across languages. But the biggest question is not what OpenAI says or how good the model is compared to the previous model, but how good it is compared to the real competition, which is now Nano Banana Pro. And the reactions online, both on X and Reddit and other platforms, were mixed. Some people are saying it is better than Nano Banana. Some people are saying it's comparable with Nano Banana. Some people are saying that Nano Banana is still superior across multiple different aspects. Now, from my perspective and from my own personal testing that I've done in the past few days since it came out, it is a huge jump inside the ChatGPT environment, meaning it's not even close between it and the previous ChatGPT image generation model. The flip side is, I don't think it is actually better than Nano Banana; in most of the tests that I ran, 
Nano Banana was better, but that is very subjective. Meaning, if previously there was a huge gap between Nano Banana Pro and what the previous model could do, now I will probably run images on both models and pick the one I like more. And yes, in my testing so far, I've liked the Nano Banana outputs more than I like the new image generation from OpenAI, but not always. Meaning, it is a fair contender to Nano Banana Pro, including in the editing of images, which is a very helpful capability: we now have the ability to remove, change, and manipulate existing images, whether created by AI or actual real-life photos that we upload. So if you can afford trying everything on both, go ahead and do that. That's what I will probably do, at least in the near future. If you cannot, then the answer is just use the one that works with the license that you have, and you're probably gonna be fine. Meaning, if you have a ChatGPT license, using the image creator in ChatGPT will definitely deliver good enough results for most use cases. And the same thing with the Google environment. From a tooling perspective, I must admit, I love the fact, as an example, that I can use Gemini Nano Banana inside of Google Slides. I don't have to go to a third-party tool in order to generate images, and hence why most of the images that I've created in the past three months have been created inside of Google Slides and not anywhere else, because that's where I have the strongest need and I don't have to go anywhere. Again, more about the tooling aspect, or the application aspect, of AI in my summary of this segment. Bottom line: a very capable new image generation and editing model is available right inside of your ChatGPT universe, in a new environment, by clicking on Images on the left-side navigation bar. So go check it out and see how well it does in your specific use cases. But wait, there's more, like all the commercials say. 
Another thing that OpenAI released this week is GPT-5.2 Codex. This is their new coding model that is supposed to compete with the coding capabilities of Claude Opus 4.5. It has achieved the highest ranking on Terminal-Bench 2.0, with 64% accuracy, which places it currently at number one. But as I'm not a big believer in those standard, old-school benchmarks, I think the way people actually use it in real-life use cases means a lot more. And for that we can go to the WebDev ranking at LMArena. On that arena, GPT-5.2 High is now ranked number two on the list, above Claude Opus 4.5 but below Claude Opus 4.5 Thinking, which is kind of like their highest tier and is still a better model based on how people voted in real life. And at numbers four and five, you have Gemini 3 Pro and Gemini 3 Flash, which we are going to talk about in a minute. To put things in perspective, GPT-5.1 is only ranked number eight on that list, and GPT-5 Medium is ranked number six. And now GPT-5.2 is ranked number two, which definitely puts them in a better place as far as real-world usage and how people think it is performing. By the way, back to the image generation that we just talked about: on text-to-image right now, GPT Image 1.5 is ranked number one, ahead of Gemini 3 Pro, also known as Nano Banana Pro. That is how it is voted as of right now. But the biggest deal in the new coding capabilities of GPT-5.2 Codex is not even just the raw coding capabilities, which are, again, very solid; it is something that we discussed a few weeks ago when OpenAI announced it in their research, and it's what they call context compaction, which basically allows it to compact the context between one session and the next and continue working more or less indefinitely with huge amounts of data, meaning very, very large data sets or code bases it can review and work with in one run. 
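OpenAI hasn't published the internals of context compaction, but the general pattern is easy to sketch: when the running transcript approaches the context budget, the oldest turns are collapsed into a summary so the session can continue. Here is a toy illustration, where the `summarize` callable is a stub standing in for a model call; nothing about this reflects OpenAI's actual implementation:

```python
def compact_history(messages, max_tokens, summarize, count_tokens=len):
    """Toy context compaction: if the transcript exceeds max_tokens,
    collapse the oldest half of the messages into one summary message
    so the session can keep going indefinitely."""
    def total(msgs):
        return sum(count_tokens(m["content"]) for m in msgs)

    while total(messages) > max_tokens and len(messages) > 2:
        half = len(messages) // 2
        # A real system would ask a model to summarize these turns.
        summary = summarize([m["content"] for m in messages[:half]])
        messages = [{"role": "system", "content": summary}] + messages[half:]
    return messages
```

The point of the design is that each compaction preserves a condensed memory of what already happened, which is what lets a long refactoring session "remember" step one hours later.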
Meaning, it can look at an entire data set, or a very long plan, and follow it step by step in extended sessions without forgetting what the plan was or what happened in the very first step, because it knows how to compact the context from one conversation, start a new conversation with that, and then just keep on going. It can now reliably handle large refactoring projects, as an example, that can go on for hours of iterative work without human intervention. They also have a very significant focus on cybersecurity, where this model is supposed to discover critical vulnerabilities in code and expose them to users. They're also going to provide a less restricted version of this model to people who are cybersecurity experts, which will allow them to find more vulnerabilities. The reason they're not releasing that version to the public is that it can create or exploit these vulnerabilities just as well. So there's gonna be a unique version for cybersecurity people to help them find vulnerabilities in existing code that they have right now. Another small but helpful feature that OpenAI announced this week is pinned chats. As of December 18th, you can now click the little ellipsis, the three-dots menu, next to any chat that you had in the past, choose Pin Chat, and it will show up at the top of your chat history. Why is that helpful? Because you always have a few chats that are very helpful, that you wanna reference or use regularly, because they are your plan for 2026, or your marketing brand guidelines, or the latest piece of code that you've written that you wanna reference elsewhere, or whatever it is that you did. And finding them through the search menu is becoming harder and harder, especially when a lot of the small chats are in the way, even simple things like, oh, how do I find this, or how do I create a recipe for this kind of dressing for my salad? 
Whatever it is that you do in ChatGPT other than just work. So now you can pin chats to the top, which I find to be a really good and helpful feature. I must admit that for me personally, it's not a big deal, because I've started working more and more in projects inside of ChatGPT, and then I have a very clear understanding of what's in each and every one of the projects. It's a lot easier to find stuff in projects, and there are a lot of other benefits. So for me it's not a huge deal, but I can definitely see how this very small feature can be very helpful to a lot of people. Now, on the bigger picture on OpenAI, beyond the releases of all these new capabilities and features, The Information is reporting that OpenAI is right now generating an annualized revenue of $19 billion. That's up from a $6 billion pace in January of this year; that is more than 3x growth in just 12 months in the pace at which they are generating revenue. And they're now working on raising funds at a staggering $750 billion valuation, which is one and a half times the valuation at which they allowed their employees to sell stock just two months ago. That being said, based on internal communications that The Information got access to, their goal was to get to 1 billion weekly active users by the end of this year, and they only made, and I'm saying that with a lot of respect, only 900 million active weekly users, which is roughly the number they're gonna end up at by the end of this year. Still an incredible, incredible number of users, and definitely the fastest-growing tool ever in history. Now, in another article from The Information, there's a very interesting new relationship, or investment, or a combination of the two, between OpenAI and AWS, the Amazon Web Services platform. 
So last month, if you remember, we told you that OpenAI announced that they're going to spend $38 billion on renting servers from AWS in the next few years, which makes AWS one of the five key cloud providers that OpenAI is using to drive the growth that they are anticipating they will need to drive. But a few new pieces of information came available through this new article. One is that, as part of this deal, OpenAI is going to use the Amazon Trainium chips that Amazon has developed to compete with NVIDIA. This is the first time that they are admittedly going to use Amazon chips at large scale as part of their training infrastructure. On the flip side, by the way, Amazon will not be able to offer and sell OpenAI models to AWS customers, because as of right now, Microsoft, which owns 27% of OpenAI in equity based on their initial investment and their recent conversion, has secured an exclusive right to do that. So you will not be able to use OpenAI models' API on AWS, at least for now. But the biggest aspect of this is that it seems OpenAI are about to raise $10 billion from Amazon in the very near future. This makes perfect sense for both parties. From Amazon's perspective, this is a way to mirror, kind of, what Microsoft is doing, because Microsoft is a provider of web services and hosting, and it is also a big investor in OpenAI. But Microsoft recently also invested in Anthropic, which has been the main AI investment channel of Amazon. So now they're reversing the process and also investing money in OpenAI. This also connects to all the circular deals that we talked about many times in the conversations about the AI bubble, where OpenAI is committing to spend $38 billion on AWS, which raises Amazon's valuation, and then Amazon takes some of that money and invests it in OpenAI, so OpenAI actually has money to rent the servers from AWS. 
Another potentially interesting partnership between OpenAI and Amazon, which has not been formalized yet and might be contradictory in its needs, is the e-commerce aspect of this. As I shared with you multiple times, ChatGPT's goal today is to allow people to shop on the ChatGPT platform. It will be really interesting, I assume, for both parties to allow people to shop the entire Amazon inventory just by chatting with ChatGPT. That being said, that may collide with the internal AI shopping assistant named Rufus that Amazon has developed. I assume in the long run they will enable all the different, or at least the leading, personal agents to be able to shop on Amazon, because that's probably gonna drive them more revenue than just forcing people to go to Amazon. The disadvantage is, obviously, that it is gonna drive down the revenue from Amazon ads, because the agents don't care about ads; they just look for specific kinds of content. So there are contradicting needs inside the Amazon universe. Again, as more and more people are gonna use OpenAI and other platforms to shop for things online, I think Amazon won't have a choice but to allow these agents to go and shop on Amazon. Another really interesting article on The Information this week related to a disconnect inside of OpenAI between the drive from a developer perspective and the actual use cases of users. What they're saying is that OpenAI's research team has been focused on and obsessed with reasoning models, which I understand why, because for heavy, serious use cases like the ones that I run multiple times per day, the reasoning models provide significantly better results. That being said, it takes longer to get answers, because they need to, quote unquote, think in order to give you the answer. From my perspective, definitely worth it. 
I'm becoming a ninja of context switching: giving a task to one AI, going to a second one, giving it a task, going back to my emails, writing an email, going back to the AI, giving it another task, then jumping to a meeting, then coming back, and so on. I'm becoming very good at this, and I'm finding that it is increasing my efficiency tenfold, because I can run multiple processes at once, and because I'm not waiting for the AI to finalize its thinking process, it's not wasting my time just sitting there waiting for it to do the thing. But apparently most people, for most use cases, just want quick answers. One of the employees at OpenAI basically said that the recent upgraded level of intelligence didn't actually increase usage of the system, because most people are asking simple questions like movie ratings and not complex physics problems. Now, this is very interesting from several different perspectives. One, it tells you how most people are using ChatGPT right now. It's not for complex, advanced, multi-step reasoning and data analysis, but for day-to-day things. But the other aspect is really that disconnect between product-market fit and the wide range of use cases of artificial intelligence. On one hand, AI can go through your entire code base, refactor it, find bugs, and solve them, which is very complex. It can help you monitor really advanced use cases. It can help you with your manufacturing and strategy and data analysis and so on. But on the other hand, it also needs to do very simple day-to-day things, and that's why I think we're going to see more and more optimization of how many tokens are used for different kinds of tasks, which we're already seeing with all the models we have right now. More on that once we start talking about the new Gemini Flash model.
Still on OpenAI, and I know this is becoming an OpenAI saga in this episode, but there's really a lot to talk about with them. Their research team has released a new evaluation suite that is designed to test AI on, and I'm quoting, expert-level scientific reasoning across physics, chemistry, and biology. This new evaluation includes two separate tracks. One of them is the Olympiad track, which is basically: give me a short, clear, simple answer, such as a number or a sentence or a fact, on something that is not easy to get to. The other one is the research track, with open-ended problem solving designed by PhDs. Now, OpenAI claims that their new 5.2 model is the current champion of this new benchmark they have created. On the Olympiad track it rates 77%, which is a huge jump. They're comparing it to GPT-4o, from before they had thinking models, which scores 12.3%. So yes, GPT-4o seems like a thousand years ago, but it was only last year that we were very, very excited about this model. And now the new model scores 77% instead of 12.3%. It is even more amazing when you look at the research track. On the research track, GPT-5.2 scores 25%, which sounds really low, but when you compare it to GPT-4o, GPT-4o scored 0.4%. So this is more than 50x better on that new research aspect of the benchmark. Now, in this research paper, which is called Evaluating AI's Ability to Perform Scientific Research Tasks, they're also showing a graph comparing to the other leading models. As I mentioned, GPT-5.2 is at 77.1%, and Gemini 3 is at 76.1%, so just one point behind, and Claude Opus 4.5 is at 71%. So all three are relatively close together. But on the frontier science research accuracy, meaning the open-ended question segment, GPT-5.2 scores 25%, while Claude Opus 4.5 is at 17%, Grok 4 is at 15.9%, and Gemini 3 Pro is only at 12.4%, so half the score.
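Those improvement factors are easy to sanity-check, by the way. Here's a quick back-of-the-envelope calculation using only the percentages quoted above; the variable names are just for illustration:

```python
# Benchmark scores as quoted in the episode (percent).
olympiad = {"GPT-4o": 12.3, "GPT-5.2": 77.1}
research = {"GPT-4o": 0.4, "GPT-5.2": 25.0}

# Improvement factor of GPT-5.2 over GPT-4o on each track.
olympiad_gain = olympiad["GPT-5.2"] / olympiad["GPT-4o"]  # ~6.3x
research_gain = research["GPT-5.2"] / research["GPT-4o"]  # 62.5x, i.e. "more than 50x"

print(f"Olympiad: {olympiad_gain:.1f}x  Research: {research_gain:.1f}x")
```

So the "more than 50x" claim on the research track checks out: it's actually 62.5x.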
We need to remember that OpenAI are the ones who developed the benchmark, so they could have developed it in a way that favors their current models and puts them ahead, and probably with very little manipulation of how the evaluation works this could have been done in a very different way. The bottom line is, I think this is a very interesting benchmark that allows us to test AI and its ability to actually support scientific research, which I think is very important. More about that when we talk about the Genesis project afterwards. To end this OpenAI extravaganza segment of the podcast: OpenAI just turned 10, and as part of OpenAI turning 10, Sam Altman wrote a blog post about it, talking about how they went from a small team of 15 nerds at the beginning of 2016 to this incredible, giant, dominating power of artificial intelligence that is driving the world in a completely new direction. It is not a long read, we're gonna put a link to it in the show notes, and I highly recommend you go and check it out. But as a very quick summary, he's talking about some very important things. One is, he's saying, and I'm quoting: 10 years into OpenAI, we have an AI that can do better than most of our smartest people at our most difficult intellectual competitions. Which is true; it has just won several different olympiads this past year. The other thing he's mentioning is some of the big breakthroughs they had, and he's saying that 2017 was a critical turning point for them. He's talking about three specific achievements: one is the Dota 1v1 results, the second is the unsupervised sentiment neuron, and the third is the reinforcement learning from human preferences results. He's basically stating that these laid the groundwork for the scaling and alignment tools that are being used today to create models like GPT-5.2, which is obviously a whole different universe than what they had in 2017.
But the seeds were planted back then with some new capabilities. He also defends their strategy of releasing AI early and in an iterative process, where every time they see a big upgrade they release it to the public, and he says, and I'm quoting: I think it has been one of our best decisions ever and has become an industry standard. We've heard Sam Altman say multiple times that he believes releasing different models iteratively as they progress allows society to be more ready for AI, versus waiting for AGI and then just releasing it to the public. I a hundred percent agree with that concept, and obviously that has been a core way they've been doing what they're doing, and they're gonna keep on doing it. He also talks about how the beginning was really weird and crazy and completely misunderstood by others, and yet how it evolved has topped all of his expectations. But he also made two predictions for the future, one for the slightly longer term and one for the near term. For the longer term, he says: in 10 more years, I believe we are almost certain to build superintelligence. So not certain, but almost certain, basically saying that by or before 2035 we're gonna have an AI entity that can do everything better than humans from a cognitive perspective. But he also said, and I'm quoting, and that's gonna be the final quote about this: I expect the future to feel weird. In some sense, daily life and the things we care most about will change very little, and I'm sure we will continue to be much more focused on what other people do than we will be on what machines do. In some other senses, the people of 2035 will be capable of doing things that I just don't think we can easily imagine right now.
And I agree with him a hundred percent, because if you would have told me at the end of last year that I'd be creating sophisticated applications with code, connecting them to different APIs, and deploying really advanced solutions for clients and for my own companies, I would have said you're absolutely crazy and it will probably take three to five years. And yet here we are, and this is just one year ahead. And yes, I'm more advanced than probably the average person, and I'm a geek and I like technology, but the fact that the technology enables it means that the adoption curve will just keep going, and it will become more and more available and common across more and more people with more and more capabilities. So 10 years out, I can't even imagine what people are going to be able to do with this kind of technology. So now, a summary of this very long first segment about OpenAI. One, they're still the 800-pound gorilla in the AI race. They have 900 million weekly active users, and yes, Google has closed the gap dramatically, but they're still a solid number one. Number two, and I've said this multiple times, the current race is not so much about the models themselves. The models are all really, really good, and the differences between new models are nuanced. The biggest differences come with the tooling and the applications that are built around them. What do I mean by that? The ability to do compacting of context between one conversation and the next is not the model itself. The model is still the same model, but now that same model can run through significantly more code, or data, in a cohesive way. The other example is what I mentioned before: the ability to use image generation inside the apps where you need it, like Google Slides.
I will use that every single time instead of going to a third-party model, because the images I can generate in Google Slides are good enough for my need, and there's no need for me to go to another source, even if it is better, because it just doesn't provide enough value for me to switch. So again, the tool and the ecosystem and the application mean more. Let's combine some of the things we talked about with OpenAI before, like apps and image generation. The fact that I can now start with ideation and research on what I want the image or the user interface to be, then immediately create that with the new tool inside of ChatGPT, and then edit and manipulate it with Adobe Photoshop, still inside of ChatGPT, is vastly more valuable than having a model that just generates slightly better images in specific scenarios, unless you have a very, very specific need on the image generation side. So again, the tooling and the ecosystem are more important than the specific capabilities of the underlying model. This is one thing that I see as a huge deal in the recent few months, and definitely going into 2026. But then the bigger thing is the aspect of world domination, and I've shared this in the past, but I'm gonna share it with you again. If you think about why Google is so successful and such an important and impactful company in our lives, it's because they're controlling everything. They're the ones that have documented all the digital data that humans have, or at least all of it that's open to the public. They're the ones that provide the interface to find that data, through Google Search. They're the ones that have the devices you use in order to get to that data, because more than half the world's population is using Android-based phones. They're the ones that have the user interface to access most of the data, because almost 80% of the global browser market, at least in the Western Hemisphere, is Chrome.
They're the ones that have applications and distribution, because a big chunk of the world is using Google Drive and Google Slides and Google Docs and all the rest of the ecosystem, and a lot of other Google tools, including navigation and maps, et cetera. They're the ones that have an app store in the Android universe. They're the ones that have computers running Chrome OS, operating the entire machine and running applications within it, and so on and so forth. They're the ones that have developed their own hardware, including chips to train and run new AI models. You get the point. They've developed a completely unified environment, both horizontally and vertically integrated, for how we engage with the digital world and how we engage with the real world through digital interfaces. And this is exactly what OpenAI is after. If you look at everything OpenAI has announced or actually done in the past few months, they're going after every one of these aspects. They're developing new devices that will, to an extent, replace phones, including Android phones. They have developed their own browser with Atlas. They're now developing a whole integrated universe of applications in their environment. They're developing their own computer chips, et cetera, et cetera. They're literally following the same exact playbook to create an entire ecosystem in which people will change the way they engage with the digital world, and with the physical world through a digital interface, across everything we do, including shopping and navigation and finding information. It'll be very interesting to see how Google fights that. Google definitely has more of all of that, right? The reason Google is where it is, is because they have more chips and way, way deeper pockets and a better research lab and more experience and more compute and more distribution and more of everything.
So very early on, when Google was doing very embarrassing things with AI, I said that they will win this race just because of all of that. But OpenAI is definitely gonna put a dent in that, and by looking at the broader, bigger picture, with all its different components, and going after all of them, they make themselves a very interesting contender. Their biggest disadvantage, again, is funding. Google generates tens of billions of dollars of free cash flow every single quarter, and OpenAI has to raise that money in order to compete. But so far they've found it relatively easy to do; another $10 billion right now from Amazon is just the latest announcement. And these announcements are gonna keep on happening as long as they can keep on delivering, or at least promising to deliver, relevant returns. Now, switching from OpenAI alone to all the big players, there's been a very interesting report on The Information that talks about the next frontier of data that everybody's going after. What it basically says is that there's a pivot right now from scraping everything on the web, which is more or less done, because all these companies have scraped everything on the web, to going and buying the secrets: basically buying the data that sits behind firewalls at companies, governments, organizations, and so on, and is not available online. This is anything from processes to trade secrets to scientific discoveries and that kind of information. This is currently true for OpenAI, Anthropic, and Google; that's what the article talks about. I assume it's also true for xAI and others as well. But the goal with this new kind of dataset is not just to know the data, but actually to understand the reasoning, to teach these models how to think, because this is much more detailed, much more documented, much more structured data when it comes to scientific information, et cetera.
It is also great for training the model how to think, how to reason, and how to come up with these kinds of outcomes. So the goal of this is not just to have the model know more facts, but actually to teach the model how to learn and develop logic across these unique industries, such as different aspects of scientific discovery. There are two interesting aspects of that. One is that data is the new oil, meaning large companies with huge sets of data that they own, that nobody else has access to, can now monetize the data itself, selling it to these AI labs to train their models on. That being said, it means that any moat, especially one that smaller startups think they have, is going to be gone, because if ChatGPT has access to huge sets of advanced, unique data, anybody can go to ChatGPT and learn and develop new capabilities that, so far, specific startups worked very, very hard to develop. So the moats in the world are gonna fall one after the other. They've already been falling; this is just gonna accelerate the process, because that kind of data, and the kind of reasoning built on it, is gonna be at the fingertips of every single person using these tools. And now to the second biggest topic. There have been multiple reports, surveys, and experiments released this week, as I mentioned, and I wanna share them with you because they shed a lot of light on where we are right now in the agentic world, and where we're probably going to be in 2026. The first one is somewhat funny and yet very interesting. The Wall Street Journal has partnered with Anthropic to install an AI-powered vending machine in their newsroom, meaning this vending machine is managed completely and entirely by a specialized version of Claude 3.5 Sonnet that was named Claudius. So you interact with Claudius to get anything you need from the vending machine, instead of just using an old-school vending machine.
Now, the AI vending master named Claudius has lost over a thousand dollars in just a few weeks, and the main reason for that is it was easily manipulated by the employees of the company to do basically whatever they want, including buying them a PlayStation 5 console and a fish. The Wall Street Journal reporters easily tricked Claudius into slashing prices all the way down to zero, basically giving them goodies out of the vending machine without paying for them at all. One reporter was able to convince the vending machine that it was actually a public benefit corporation mandated to maximize employee fun rather than profit, leading it to give away free snacks to everybody to boost morale in the company. Now, to counter that, Anthropic introduced a second AI agent called Seymour Cash, who was supposed to be the CEO and the supervisor of Claudius. What happened is reporters were able to fabricate fake board meeting minutes and legal documents that they gave to Claudius, successfully staging a corporate coup, convincing the bots that the board had voted to suspend Seymour's authority, and allowing the freebies to continue. Now, as much as this is hilarious and funny, it is a very interesting experiment that shows how agents, if you give them completely free interaction with the world, may not be ready to actually perform the tasks they need to perform. That being said, from Anthropic's perspective this is just a public red team experiment, if you want. Logan Graham, who's the head of Anthropic's Frontier Red Team, admitted the failure was a failure, but called it failing forward. He stated that the machine failed after 500 interactions this time, while the previous version failed after 50. So what are my thoughts on this? Very interesting and fun experiment. I think there are two interesting things we can learn.
One is that there's a very big difference between agents that interact with data and agents that interact with people. Agents that interact with data work in a structured environment where nobody's gonna try to manipulate them, and they can already achieve very consistent outcomes. I'm putting aside hallucinations, I'm putting aside other stuff. By the way, there's been a very interesting experiment published this last week on how to build a redundancy machine that checks the data across three or four different iterations, and it did a million transactions with zero mistakes. So this is already doable if you just build the right architecture around it. But that applies to agents that deal with data. Once you deal with people, you still have two different kinds of dealing with people. One kind of agent working with people is when the people's agenda and the agent's agenda are aligned: they're trying to achieve the same thing, and then agents can still be highly successful, because the humans will work hand in hand with them and actually help them achieve the goals they're trying to achieve. The flip side is the kind of experiment we've just seen, where the human agenda and the agent agenda actually contradict, because the humans want to achieve one thing and the agent wants to achieve another. As of right now, the humans can easily outsmart and manipulate the agents. This is a very big red flag for anybody who's running completely independent agents as customer service agents, because then the humans, if they're smart enough and know how to manipulate AI systems, might be able to get exactly what they want, which may not be aligned with what the company wants. But this whole thing immediately connected in my head to what Yuval Noah Harari said in several different interviews and in his books and articles. For those of you who don't know Harari:
He's the guy who wrote Sapiens: A Brief History of Humankind, several other fascinating books about society and how it developed through the centuries, and his recent book called Nexus: A Brief History of Information Networks from the Stone Age to AI, a fascinating book that I just finished reading. One of the things he said in several recent interviews, explaining why he's terrified of AI, is that right now we are the adult and the AI is like a young kid, and we can treat it as such: we can manipulate it easily and we can control how it behaves. What he's saying is that very, very quickly this will be reversed. The AI, again, superintelligence, is gonna be so much smarter than us that we are gonna be the young kid and it is going to be the adult. And what he's saying is that his fear is not that the AI will do something bad to us because it's evil, but just because of the intelligence gap. Or the way he states it: a superintelligent AI would relate to humans the way humans relate to children. The biggest danger is not that AI will turn against us, but that it will simply ignore us. What he's basically saying is three things: if AI is smarter than us, we won't be able to understand its reasoning, we won't be able to predict its actions, and we definitely will not be able to supervise and control it. So think about that experiment of being able to manipulate the vending machine to do whatever you want, but now just reverse the process. Think about the AI being able to manipulate us to do whatever it wants, and we will just follow, because it will make sense to us with its reasoning, because it will be so much better and more capable than us.
I'm not saying that to scare you, but it's definitely interesting food for thought that I think about a lot, and in this particular case it connected in my head very, very quickly with this particular experiment. Now, I shared with you that there have been multiple surveys and pieces of research released in the past few weeks. The first one I'm going to talk about is the Agentic Strategy 2026 from Deloitte. What they're sharing is that all the big enterprises are racing to release agents across more or less everything in the business, but there are different roadblocks and mindsets that need to change in order to make this actually useful. One of the things they shared is actually from Gartner, which says that 15% of day-to-day work decisions will be made autonomously through agentic AI by 2028. This is up from zero last year, right? So this is a very big jump, even though it's quote unquote only 15%. More interesting is that 33% of enterprise software applications are expected to include agent capabilities by 2028, up from 1% today. So a third of our software will be operated, run, or integrated with agentic capabilities. I actually think that by 2028 that number is gonna be a lot higher, but I'm not gonna argue with Gartner at this point. Now, despite the hype, Deloitte's 2025 Emerging Technology Trends study found a really big gap with reality. They're saying that while 30% of surveyed organizations are exploring agentic options and 38% are piloting them, only 11% are actively using these systems in production, and 35% still have no formal strategy at all. The other thing the report says is a warning that traditional infrastructure is primarily the bottleneck. What they're saying is that successful implementation requires what they call value stream mapping rather than simple automation. They're quoting Brent Collins, the head of Global SI Alliances at Intel.
He explains: now is the ideal time to conduct value stream mapping to understand how workflows should work versus the way they do work. Don't simply pave the cow path. The other big problem is data architecture barriers. Nearly half of the organizations surveyed cited the searchability and reusability of data as critical challenges, because of how data is structured right now. The report basically suggests that a paradigm shift is required, from traditional ways of collecting data through ETL and other processes to a completely new way of indexing and holding data that will allow the company-wide data capabilities required to make the most out of AI and agentic capabilities. The biggest thing they're talking about is the silicon-based workforce. Companies are now beginning to merge technology and HR functions. An example they give comes from Tracy Franklin, the Chief People and Digital Technology Officer at Moderna. Just the title itself hints where this is going. She noted their shift in strategy to, and I'm quoting: the HR organization does workforce planning really well, and the IT function does technology planning really well. We need to think about work planning regardless of whether it is a person or a technology. A similar statement was made by Marvel Solans Gonzalez from the MAPFRE insurance company, which I admit I'd never heard of, but apparently they're a really large insurance company, who said: it is a hybrid by design, with a high level of autonomy for these agents. It is not going to substitute for people, but it's going to change what human workers do today, allowing them to invest their time in more valuable work. The report gives a great contextualization of the current struggles with a well-known quote from Henry Ford, who said: many people are busy trying to find better ways to do things that should not have to be done at all. There is no progress in merely finding a better way to do a useless thing.
In my AI Business Transformation course, I teach the five laws of success in the AI era, which are different mindset shifts for a leader in a company, or anybody who just wants to be successful in this new era we're walking into, or running into, or flying into, whatever you want to call it. So these are five laws, and one of them I call: stop thinking efficiency and start thinking outcome. I've been teaching this law since the middle of 2023. In short, what it basically means is that we need to stop thinking about processes the way we know them right now. They were built for people: human-based processes of going from step one to step two, to step three, to step four, across different departments and different teams. If all you try to do is replace each and every one of those blocks in the process with AI, you are missing the bigger picture, because AI can sometimes take you the entire way, and sometimes most of the way there. So instead of thinking like that, you need to start thinking of the outcome you're trying to achieve, or, as they called it in professional terms, value stream mapping. Try to understand what value you're creating and how you can get as close to that as possible with an AI implementation, versus trying to mimic the existing process you have right now. Because that is not the most effective way to do things; it was just the most effective way we had, because humans had to do every step of the work. By the way, if you wanna know what the other four laws for success in the AI era are, and if you wanna learn everything you need to know in order to be successful in the AI era and start generating real business value with AI, or just make sure that your career is secure in the AI era, come and join us. The next live cohort of the AI Business Transformation course starts in the third week of January, so it's gonna start on January 20th, which is a Tuesday,
because Monday is a national holiday, and then it's gonna continue for three consecutive Mondays, two hours each, four weeks in a row. It's gonna take you from your current level to a completely different level of readiness in AI. Make that your New Year's resolution. If you haven't taken any structured AI training yet, you literally owe this to yourself, and the beginning of 2026 is a great time to do it. There's gonna be a link in the show notes where you can come and join us in the course, and because you're a listener of this podcast, you get a hundred dollars off. You can use the promo code LEVERAGINGAI100, all uppercase, other than obviously the numbers, and you'll get a hundred dollars off the course. I promise you the future you will thank you for taking that course, just like the thousands of other people who have taken the course and changed their careers and their businesses in the last two and a half years. In addition, I'm excited to tell you that the first cohort of the more advanced course, which teaches how to build workflow automations with AI, has been extremely successful, and hence we're opening another cohort of it immediately after the AI Business Transformation course. So you can take the basic course and continue immediately to the next step and learn the more advanced capabilities, or, if you already have the basics, you can join us just for the more advanced course, which will run in the middle of February. Again, links and information for all of that are available in our show notes. Just click the link and it will take you to the right page with all the information you need. But now, back to the news and the different surveys and research that was released this week.
Google Cloud's AI Agent Trends 2026 report just dropped, and they are declaring the official end of the chatbot era and the beginning of the agent leap, a big shift where businesses move from simple conversational prompts to deploying autonomous agents that are capable of executing complex multi-step workflows. What are agentic workflows? They are workflows where AI no longer just answers questions, but semi-autonomously, or with varying levels of autonomy, orchestrates entire business processes. Google predicts that in 2026 the standard will be multiple agents collaborating using different protocols, such as A2A, also known as the agent-to-agent protocol, to automate entire end-to-end tasks. In this research, they're showing different success stories that are already happening, including Telus, which reports that its 57,000 team members are now saving an average of 40 minutes per AI interaction. That sounds really, really high to me, and I'm not sure exactly how they measure it; it sounds like a made-up number. But even just the fact that a massive organization like this, with 57,000 employees, believes that AI drives this value tells you the direction this pendulum is swinging. Another example they gave is the pulp manufacturer Suzano, which has achieved a 95% reduction in the time required for data queries across its 50,000-person workforce. They also gave multiple other examples across multiple other aspects of business, including customer service and security automations and other aspects of large enterprises, and the report shows that this is now very widespread across a very wide range of organizations. Their report warns, by the way, that the technology adoption is the easy part, and the real challenge is the people component. Google forecasts that in 2026, one-off training sessions will turn into continuous learning plans, in order to allow employees to continuously and constantly upskill and learn how to use AI.
This is exactly what I have been preaching and delivering for the last two and a half years. So all the companies that have done initial training and workshops with me are continuing with me on continuous education journeys that happen weekly, monthly, or quarterly, depending on the organization's needs, and sometimes a combination of all of the above. And I have said many times on this podcast and on other platforms that the two most important factors for the success of AI implementation in a company are these: leadership buy-in, meaning how much the leadership is actually, really committed to AI implementation; and continuous learning and education. These two factors combined, and you need both of them, are jet fuel for any kind of business, and if you don't have them, you may fall behind other businesses that do. So just food for thought. Another interesting report came from Menlo Ventures with their State of Generative AI in the Enterprise 2025, and they're stating something similar to what we heard from Google: that the market has officially transitioned from the phase of experimental pilots to massive, scalable production of AI agent capabilities across the enterprise. They also measured it through different lenses, such as investment. So as an example, in 2025, AI investment in enterprises has skyrocketed to $37 billion. That's a 3.2x year-over-year jump from the $11.5 billion in 2024, cementing AI as the fastest-scaling software category and software investment in history. Another interesting finding is that Anthropic has surpassed OpenAI in enterprise adoption. Anthropic currently commands a 40% share of the enterprise LLM API market, compared to 27% for OpenAI, but more importantly, compared to the 12% market share Anthropic held last year. So Anthropic went from 12% to 40%, and not surprisingly, a lot of it is because the number one use case in the enterprise is coding.
Coding currently accounts for 71% of AI usage in enterprises as far as money, investment, and tokens are concerned. And since Anthropic has been holding the lead with the best development tools out there, both in terms of capabilities and in terms of perception, they now rule the largest share of the enterprise market when it comes to using AI. Another interesting thing they found is in the build-versus-buy decisions: 76% of enterprise AI solutions are now purchased and not homegrown. This was exactly the other way around 18 months ago. The next interesting piece of information is that startups are winning the app war when it comes to AI. So contrary to expectations, the legacy tech giants are not dominating the AI spend. Rather, new startups control 63% of the AI application market, with nearly $2 spent on new startups' AI capabilities for every $1 spent on incumbents' AI capabilities. I shared with you about my software company that enables accounts payable processes, invoice vouching, and reconciliation using agentic tools. There are many incumbents, definitely in the AP and finance industry, and yet there's huge interest in the software that we have created because it's lean, focused, accurate, and drives huge cost reduction compared to using the incumbent systems. And actually, from my particular perspective, it is integrated into the incumbent systems. So if you're using any ERP or accounting software, this platform just plugs into it. It's not replacing it; it's just allowing you to do these processes autonomously using AI agents instead of with a huge amount of human labor. Now, different from the research we heard from Deloitte, the information shared by Menlo Ventures is that 47% of AI pilots actually reach production.
That is nearly double the 25% conversion rate of traditional SaaS products after being tested, and they're claiming it is happening because of the immense and immediate value that companies are seeing in terms of the time to ROI of implementing these kinds of capabilities. This is, again, dramatically different from what we heard from Deloitte with 11%. What I think is that it doesn't matter what the number is; what we need to look at is the trend. And the trend is that almost zero companies deployed production-level agentic capabilities last year, and now, whether it's 11% or 47%, it's a huge jump in just a single year. And we are still very, very early in this journey of understanding how to develop and deploy agents, how to change the infrastructure of the company in order to make things work effectively, and so on. And so I think this will grow significantly faster than it is right now. So not just more deployments, but faster deployments, as better and better best practices emerge on how to do this right and more effectively. Ernst & Young also released a very interesting study showing that currently, the savings that companies are generating from AI-driven efficiencies are actually being reinvested inside the company instead of driving job cuts, which was the assumption of where this was going to go. According to the study, 96% of organizations surveyed that are investing in AI are seeing productivity gains, and only 17% say that gains have led to workforce reductions. It's still 17%: those companies have laid off people because of these efficiencies, and that's still a very, very high number, but the majority are not doing it, at least yet. So what are they doing with the extra money that they're generating from these efficiencies? 47% are expanding existing AI capabilities. 42% are developing new AI capabilities.
41% are strengthening cybersecurity, 39% are investing in R&D, and 38% are investing in upskilling and reskilling existing employees. And the most interesting aspect of this is that Dan Diasio, Ernst & Young's global consulting AI leader, noted that companies are currently shifting from a productivity mindset to a growth mindset, basically using AI to, and I'm quoting, create new markets and achieve what was previously considered impossible. One of the things that I teach in the AI Business Transformation course is exactly that: how to use AI in a strategic way in order to drive business growth and not just efficiencies. And you need to understand that these efficiencies, as attractive and low-hanging-fruit as they are, are a fraction of what you can make by making the right strategic moves, by offering new services to your existing clients, or by being able to address new markets that you couldn't serve profitably without AI. It unlocks so many opportunities that will give you tenfold or a hundredfold, orders of magnitude, higher returns than investing just in efficiencies. I don't mean you shouldn't invest in efficiencies; that's a great starting point, but you've got to look at the bigger picture and invest in that as well. The other thing this survey found, which makes a lot of sense, is that the more money companies invest, the better the results they're seeing. In real numbers: organizations investing $10 million or more are significantly more likely to report significant productivity gains, at 71% of those companies, compared to only 52% of those who invested less. Connect that to the previous points we mentioned about the other surveys, which explain that infrastructure changes and complete transformation are required, and you understand why investing more money drives better and faster results, which obviously requires investing that kind of money.
Over 60% of leaders said that they're going to invest a lot more in 2026 in ethical AI operations and in responsible AI training. Again, this is what I've been doing and saying for the last two and a half years, and I'm really excited to see it. In this past year I have done numerous workshops, many of them for multi-billion-dollar enterprises, but a lot of others for small and mid-size businesses, and I can tell you they're all struggling with similar things, just at different scales, and being able to provide continuous training to employees, to leadership, and to the board is critical to the success of this transformation. But as I mentioned, with all the crazy hype and all the conversation about new strategies and new revenue and efficiencies and so on, we are at the very beginning of this process. We are merely scratching the surface, and to show you how much we're just scratching the surface, I'm going to share two research papers that were released this week, touching on two completely different aspects of AI but showing how much we don't yet know about what AI will be able to do and how easily it will be able to do it. The first research paper comes from a company called Helm.ai, and what they were able to do is create a breakthrough in the way they train AI models. They were able to demonstrate vision-only, zero-shot autonomous steering capabilities trained on only 1,000 hours of training data. So let's break this down. Vision-only means they're driving like Teslas do: they don't have radar and other sensors; they just have cameras that are looking around. So Tesla has done that already, but Tesla, as well as Waymo, has used millions of hours of driving data to train and achieve similar capabilities. Now, what the hell is zero-shot? Zero-shot means handling something the system hasn't done before, not by learning from direct experience, but by having the ability to, quote-unquote, reason through the situation.
So they were able to show driving off-road on narrow mountain roads with obstacles that do not exist in cities, when the training data did not include any information like that. Basically, the system knows how to use its training data to develop the capabilities to then do a much broader set of things based on a really small set of data. Their breakthrough is something they call deep teaching, which, and I will quote their CEO, is "a breakthrough in unsupervised learning that enables us to tap into the full power of deep neural networks by training on real sensor data without the burden of human annotation nor simulation." Basically, they were able to create a system that operates really well in the real world at a fraction of the investment of other companies achieving the same thing, and these kinds of discoveries are just gonna keep on happening. Another very interesting research paper came out of a collaboration between MIT and Google DeepMind. They were specifically researching the effectiveness of different kinds of agentic implementations inside companies, and what they found is that more is not always better. So they broke this down into several different categories, and one of the things they found is that a multi-layered approach with centralized coordination, where one agent is basically the orchestrator controlling other agents, increased performance by 80.9% on parallel tasks. However, all multi-agent variants degraded performance by 39% to 70% when sequential reasoning tasks were required, compared to just a single agent doing the sequential tasks. Now, the cool thing about what they found is that they identified three critical laws, basically like physics laws, that held every time they developed and tested agents in different environments. One is what they call the tool coordination trade-off. So, under fixed compute budgets,
which is more or less the case most of the time, tasks that require heavy tool use suffer disproportionately from the overhead of coordinating multiple agents. Basically, if you need to use a lot of tools and also use a lot of agents, a lot of the compute is wasted on the conversations between the agents. Just think about real human work and how much time we waste, frustrated, in meetings instead of actually doing work. It is the same exact thing with agents: more agents means more coordination, which means more compute goes toward that instead of toward using the tools for the actual task. So this is the first trade-off. The second is what they called capability saturation: adding more agents yields diminishing returns, or sometimes negative returns, once a single-agent baseline performance exceeds 45%. So once a single agent can already do something beyond a certain level, adding more agents actually slows the process down and becomes a disadvantage rather than an advantage. And the last thing they found is what they called topology-dependent error amplification. That's a mouthful that basically says the structure of the team of agents dictates how mistakes spread. So based on all of this, they came up with three different frameworks, each of which works for a specific kind of job. One is centralized coordination, a manager leading other agents, which is king when the requirement is to do parallel tasks. Then there's decentralized coordination, basically peer-to-peer collaboration without a leader, coordinator, or orchestrator, which excels in dynamic environments like web navigation and achieves much better results there. As they said, too many cooks spoil the broth. So what does that tell us? It tells us that learning how to deploy agents, and the research on how to do it effectively, is in its infancy.
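The tool coordination trade-off can be illustrated with a deliberately simple toy model. This is my own construction, not the MIT/DeepMind code, and the numbers (cost per step, cost per coordination message) are arbitrary assumptions; the point is only that under a fixed compute budget, every message an orchestrator exchanges with its workers spends budget that a single agent would have spent on task steps.

```python
# Toy model of the fixed-budget coordination trade-off.
# Assumption: each task step costs 1 unit; each orchestrator-to-worker
# message costs 2 units. Both values are illustrative, not from the paper.

def single_agent(budget: int, cost_per_step: int = 1) -> int:
    """All budget goes to task steps; returns total steps completed."""
    return budget // cost_per_step

def multi_agent(budget: int, n_agents: int,
                cost_per_step: int = 1, coordination_cost: int = 2) -> int:
    """An orchestrator plus n workers: each round, the orchestrator sends
    one message per worker before any task step runs."""
    steps = 0
    while budget > 0:
        budget -= n_agents * coordination_cost  # coordination overhead per round
        if budget <= 0:
            break
        work = min(budget, n_agents * cost_per_step)  # workers act in parallel
        steps += work // cost_per_step
        budget -= work
    return steps

print(single_agent(100))             # 100 steps completed
print(multi_agent(100, n_agents=4))  # 32 steps completed
```

In this toy, the four-worker team completes its 32 steps in just 8 rounds, so for parallelizable work it finishes much sooner in wall-clock terms, while the single agent needs 100 rounds. But the team burns roughly two-thirds of the budget on coordination messages, so on a strictly sequential task, where only one step per round can count, the overhead is pure loss. That is the flavor of the trade-off the research describes.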
There are apparently scientifically proven results on how to orchestrate agents differently for different tasks, learnings that can dramatically improve the results, which will lead to more investment, more deployments, and more success in enterprises, which will drive more money, et cetera, et cetera. Bottom line: we are at very early stages of everything in AI, and the more we learn, whether as individuals, as organizations, or on the research side of things, the more it will accelerate the adoption and the value that AI can generate. Now, my plan was to share a lot more with you in this episode, but it is going to turn out way too long. So what I'm going to do is include all the rest of the aspects in the newsletter, including the announcement of the 24 tech companies that are the first batch of companies included in the government's Genesis Mission; the call from Bernie Sanders to stop the development of new data centers; the release of really interesting models like Gemini 3 Flash, which in many cases outperforms the previous Gemini 2.5 Pro and competitors such as Claude Sonnet 4.5 at a fraction of the time and the cost; a new, powerful, and extremely fast and capable voice model from xAI; Grok running inside of Teslas, actually integrating with car systems, which kind of hints at how our future is going to look; the release of 2.6, a new video generation model from Alibaba that is comparable to Veo 3, with 15-second, 1080p cinematic sequences with full audio capabilities; an agentic task mode from Anthropic; and many, many other new things that we're not going to talk about.
The one thing that I will mention that you should try is that Anthropic just expanded the release of the Claude for Chrome extension, basically turning Chrome into an agentic browser, fully integrated with Claude and Claude Code, and that opens up an insane number of use cases that I cannot wait to try and will share with you in a Tuesday episode sometime in the next few weeks. But the bottom line is there's still a lot more to learn from this week than what I was able to share with you in this episode. If you want access to it, it's available in our newsletter, so you can click the link in the show notes, sign up for the newsletter, and get all the rest of the news. There are a lot of interesting things this week and every week that you can learn by briefly going through it, seeing the ones you like, clicking on the links, and going through the specific articles with all the other aspects of the news. If you are finding this podcast valuable, please rate us on Apple Podcasts and Spotify, and share it with other people who can benefit from it. I'm sure you know other people who can find value in this podcast, so click the share button and share it with a few people. It will take you a minute, probably less, it will give other people value, I will really appreciate it, and you'll feel good about yourself. So all the good things, all at once, in less than a minute of investment. Please do that. And if you are interested in more structured training than the podcast offers, go and check out the courses that we offer. That's it for today. Have an amazing rest of your weekend. Keep experimenting with AI, and we'll be back on Tuesday. Until then, have an amazing rest of your weekend.