Leveraging AI

225 | Agents are trained in “AI gyms” so they can take your job, How people are using ChatGPT and Claude, Why LLMs hallucinate, and more important AI news for the week ending September 19, 2025

Isar Meitis Season 1 Episode 225

Is AI really ready to revolutionize the workplace—or are we all just beta testers with fancy job titles?

In this episode of The Leveraging AI Podcast, Isar Meitis dives into the latest reports, product launches, and behind-the-scenes drama shaping the future of artificial intelligence. From jaw-dropping usage stats from OpenAI and Anthropic, to Microsoft’s digital agents with KPIs, to the billion-dollar race to build gym-trained AI employees—this is the stuff the headlines aren’t telling you.

In this session, you’ll discover:
- How 700M people are using ChatGPT every week (and why 73% of it isn't even for work)
- Why Claude is powering a silent automation revolution (77% of its tasks are full-blown process automation)
- Why most employees are *still* barely scratching the surface of AI at work
- The alarming global divide in AI adoption (Spoiler: Israel and Singapore are leading by miles)
- OpenAI’s shocking take on hallucinations—and why your AI might be confidently wrong, often
- Why Salesforce’s AI agent rollout is a cautionary tale in overhype and underdelivery
- Microsoft + Workday’s plan to treat AI agents like actual employees (KPIs and all)
- Meta’s AR glasses and the “Zoom avatar” future that’s either cool or creepy
- Why OpenAI wants ChatGPT to start shopping for you (and why that’s a data privacy nightmare)
- The billion-dollar training gyms building AI agents to take over real-world business tasks
- The inevitable tension between faster, cheaper, and... accurate AI
- What business leaders need to *urgently* understand about reinforcement learning and economic displacement

📘 OpenAI: Real-World Usage of ChatGPT
🔗 Economic Research: ChatGPT Usage - https://cdn.openai.com/pdf/a253471f-8260-40c6-a2cc-aa93fe9f142e/economic-research-chatgpt-usage-paper.pdf 

📘 Anthropic: Claude Usage & AI in the Economy
🔗 Anthropic Economic Index – September 2025 - https://www.anthropic.com/research/anthropic-economic-index-september-2025-report 

About Leveraging AI

If you’ve enjoyed or benefited from some of the insights of this episode, leave us a five-star review on your favorite podcast platform, and let us know what you learned, found helpful, or liked most about this show!

Speaker:

Hello and welcome to a Weekend News episode of the Leveraging AI Podcast, the podcast that shares practical, ethical ways to leverage AI to improve efficiency, grow your business, and advance your career. This is Isar Meitis, your host. I've got really bad allergies today, so I have to apologize in advance for my stuffy nose, but we have a lot of interesting things to cover. We're going to deep dive into two papers that were released in parallel from OpenAI and from Anthropic revealing how people are actually using AI in the real world, which is fascinating to learn. We are also going to talk about the impact, or in some cases lack of impact, of agents around the world, and where agents are going to take us in the very near future, with updates from multiple companies on this topic. And we're also going to deep dive into the release of GPT-5 Codex, which is OpenAI's coding agent that was just released this week. And then we have a lot of other rapid-fire news, starting with a lot of cool, interesting new releases from OpenAI and from other players as well. There's a lot of interesting investment news and a lot of other stuff that happened this week, so we have a lot to cover. So let's get started. As I mentioned, both OpenAI and Anthropic released papers that share how people are actually using ChatGPT and Claude in the real world. This is really interesting for several different reasons, but the main one from my perspective was perfectly captured by Peter McCrory, Anthropic's economist, who said: what we really hope with this data is to help people understand and anticipate how AI at the frontier is changing the nature of work. Now, the reality is that's more the Anthropic side; on the OpenAI side, as we will see, it's not just work, it's actually creeping more and more into our personal lives. So a few numbers from OpenAI. They have over 700 million weekly active users as of July of 2025, sending over 18 billion messages every week. To do the math very quickly, that's about 29,000 messages every second. This is an insane number. Now, the most interesting thing about the OpenAI paper is that non-work-related ChatGPT messages surged from 53% in June of 2024 to 73% in June of 2025. So right now, about three quarters of the messages sent through ChatGPT are not for work-related stuff. On the Anthropic side it is definitely not the same, but over there the most interesting aspect is that 77% of API tasks that use Claude in the backend are built toward automation of processes rather than augmentation, meaning building work-related things that replace humans rather than help humans do the work more effectively. While the trend is not a surprise, the number is: again, 77%, three quarters of the usage of the API, is driven by automation. The reason the trend is not a surprise is that Claude's success has been riding on it becoming the de facto solution for coding, and most coding tasks around the world, when they're built into automated processes, are now done autonomously, or somewhat autonomously, which drives these numbers. But if you take that beyond what's happening right now and try to project what it means, it means that as these models get better at other things beyond just coding, the same exact trend is expected over there.
Meaning, we're expecting models that are ahead of the others in building and understanding specific automations to do a lot more automation than augmentation, which means significant potential impact on future jobs. So back to the OpenAI side of things. What are the top things people use ChatGPT for? 29% of users on the personal side use it for practical guidance, 24% use it when looking for information, and 24% use it for different writing tasks. And on the work side of ChatGPT, most of the work that is done is editing and translating user-generated text. Diving into Claude's breakdown: coding is at 36% of overall tasks. Education tasks rose from 9.3% to 12.4%, and science-related tasks, research and so on, rose from 6.3% to 7.2%. As you can see, the largest share of Claude's usage, 36%, is for coding. And going a layer deeper, the amount of full task delegation on these coding tasks rose from 27% in December of 2024 to 39% in August of 2025. So the direction is very, very clear: more automation, less augmentation as these systems get better. Another interesting thing that came out of these two papers is that global AI adoption is uneven, with higher per capita usage in wealthy nations and more tech-concentrated places like Israel and Singapore, with Israel at 7x the average, Singapore at 4.6x, and Canada at 2.9x, while emerging economies like India and Nigeria are very far behind at 0.27x and 0.2x the average of the entire survey. This means that the inequalities that exist today may actually be increased rather than decreased by the introduction of AI, which is really sad to hear, because AI does provide the opportunity for regions with less technology and less access to knowledge to close the gap, which is what I really hope will happen eventually. Diving more into demographics on the ChatGPT side: it is very much skewing toward the younger generation. 46% of all messages come from people under 26, who make up significantly less than 46% of the population, so there's a big concentration in the younger generation, with work-related usage growing with age and education, meaning people who are older and more educated use ChatGPT more for work and less for personal stuff, and the younger generation uses it more for personal stuff. Diving even deeper into the OpenAI classifications, they classified 49% of the messages as asking ChatGPT something, 40% as doing something, and 11% as expressing something. So what is the bottom line from both of these papers? And again, there are going to be links to both of them in the show notes. These are actual research papers, so you can drop them into a tool like NotebookLM and get the highlights, or read the whole thing if you're interested. But the bottom line is that AI adoption is growing fast, it is growing very unevenly, and it is growing faster in established countries and established regions versus everybody else. The other thing is that OpenAI, or ChatGPT, is the household name for AI, especially for the younger generation; it is the place to go to get any kind of information or any kind of consultation that is not work-related. And the thing that I take out of all of this is that most people don't really have a clue how to use AI at work.
Because if most people are using AI at work, and now I'm talking specifically about ChatGPT, just to ask it for knowledge and to rewrite and translate things, they're missing about 95% of what AI can do, things it's actually really good at that provide significantly more business value. So what this means is that even people who say in surveys that they are using AI at work are really not using it to its full potential, and that means there's still a huge opportunity in the workforce to benefit from AI way beyond what we are benefiting from it today. Now, OpenAI released another interesting paper this past week, which talks about the causes and potential resolutions of AI hallucinations. The paper, literally called Why Language Models Hallucinate, shares the results of research performed by OpenAI that sheds a very interesting light on why these models hallucinate. For those of you who don't know, which I assume is very few of you, AI models make stuff up, and they make stuff up in surprising ways and in surprising places. And they make stuff up in the same convincing way in which they share true and accurate information with us, which makes it very, very hard to catch. What OpenAI found in the research is that the reason large language models hallucinate is the way they are trained and evaluated post-training. Basically, what they're saying is that the models get incentivized to guess over admitting that they don't know the answer. Think about when you are taking a multiple-choice test: if you don't provide any answer, you by definition get a zero on that particular question. However, if you guess, you have a 25% chance if there are four potential answers. And since AI gets scored on similar kinds of tests, it prefers to guess and make stuff up rather than not provide any answer. Since OpenAI found this out, they have trained GPT-5 differently, and this results in a very interesting output. GPT-5 Thinking Mini shows a 52% abstention rate, meaning it does not answer a question because it is not sure about the answer, with 22% accuracy and a 26% error rate. So 22% of the time it answered correctly, 26% of the time it answered incorrectly, and 52% of the time it decided not to answer at all. This compares to o4-mini, which has a 1% abstention rate, meaning it chose not to provide an answer only 1% of the time compared to 52% for GPT-5 Thinking Mini, but it got 24% accuracy versus GPT-5 Thinking Mini's 22%. Why? Because it was guessing, and sometimes those guesses were actually correct. The problem was that its error rate was 75%, so it got it wrong 75% of the time, and it only slightly increased the accuracy, from 22% to 24%, by guessing in all the cases where GPT-5 did not provide an answer. So what does that mean? It means that if we learn how to train models by encouraging them not to guess, giving a higher score in the training process when they express uncertainty, saying "I don't know this," or "I'm not certain," or "this is my level of confidence about this particular thing," rather than rewarding a guess that is sometimes right and would score higher in the evaluation, it will lead to significantly fewer hallucinations. In one of the examples they gave, in doing a summary about a specific individual whose birthday was not available, the model chose to guess the birthday, even though the chance of getting that right is one in 365.
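To make the incentive concrete, here's a toy expected-value calculation. This is my own sketch rather than anything from the paper, and the penalty and credit values are made up for illustration:

```python
# Toy illustration (my numbers, not OpenAI's): why accuracy-only grading rewards guessing.
def expected_score(p_correct: float, wrong_penalty: float, abstain_credit: float) -> dict:
    """Expected score per question for a 'guess' strategy vs an 'abstain' strategy."""
    guess = p_correct * 1.0 - (1 - p_correct) * wrong_penalty
    return {"guess": guess, "abstain": abstain_credit}

# Standard benchmark grading: wrong answers cost nothing, abstaining earns nothing.
print(expected_score(p_correct=0.25, wrong_penalty=0.0, abstain_credit=0.0))
# {'guess': 0.25, 'abstain': 0.0} -> guessing always wins, so models learn to guess.

# Grading that penalizes confident errors and gives partial credit for "I don't know":
print(expected_score(p_correct=0.25, wrong_penalty=1.0, abstain_credit=0.1))
# {'guess': -0.5, 'abstain': 0.1} -> abstaining becomes the rational choice.
```

The same logic applies to the birthday example: a 1-in-365 guess only pays off under a scoring scheme that never punishes being wrong.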
Guessing in a situation like that obviously does not make any sense, and with better training, in these kinds of situations the model would just say "I don't know that information" and move on. Now, part of the problem in fixing this is that the business incentives driving consumer AI development remain misaligned with reducing hallucinations in the way I explained. Yes, everybody wants fewer hallucinations, but fewer hallucinations means a lot more compute spent thinking about the problems and figuring out when not to answer, and that leads to three different results. Result number one: it's gonna be more expensive to get answers. Result number two: it's gonna take longer to get answers. And result number three: we're not always gonna get answers. All three of these things are things consumers do not want. So there will have to be some kind of balance between these forces. Everybody says they want faster, cheaper, and accurate results, when the reality is that faster, cheaper, and better is not something you can have all at once. So the choice is between getting faster, cheaper, less accurate results and getting more accurate but slower, more expensive results, and in many cases not getting results at all, because the AI will just tell you that it doesn't know, which from my perspective is actually better. But when people are chasing cheaper solutions all the time, this may not be the incentive that drives how future models actually operate. The bottom line is, it's a very interesting path to dramatically reducing hallucinations and driving a situation where the AI will say it doesn't know, which, as I mentioned, is something I've been doing in my own prompting for a very long time: incentivizing the AI through prompts to tell me that it doesn't know, or that the information doesn't exist where we were looking for it, rather than making up an answer. It works very, very well. It actually works extremely well on ChatGPT compared to, let's say, Gemini, where it doesn't work as well. So from research papers to the impact of agents on the workforce: where we are right now and where we are going. Where we are right now is actually very interesting. There were two independent, unrelated articles about how slow AI agent adoption actually is compared to the promise and the hype earlier this year. Now, to put things in perspective, AI agent usage has quadrupled since last year, so it's still growing very, very fast. However, one article talks about the early predictions of Marc Benioff, the CEO of Salesforce, related to the future of Agentforce, which is what Salesforce is going all in on: agents within the Salesforce environment. If you remember, Benioff called 2025 "the absolute year of Agentforce" when he was making predictions for the year, and that prediction basically crumbled under the weight of customer skepticism, complex implementation, cost, and many other things that slowed the progress, as well as really fierce competition from multiple different angles. So the reality as of right now is that fewer than 5% of the 150,000 companies that use Salesforce are actually paying for Agentforce nine months after its launch.
If you remember, earlier this year Marc Benioff himself was quoted calling their architect technical team crazy and pushing to fire all of them after they warned big clients about Agentforce setup complexities. The reason he said that is he was claiming it takes minutes to deploy, and yet his actual implementers were saying it's a very complex and tedious process. That team was dismantled at the end of 2024, and it was put back together in August of 2025 after many customers said they needed this kind of assistance that suddenly was not there and not available. They were also saying that this team was right: it is a very complex process, and it does require a team like that to help put Agentforce in place. Now, it's not all bad. Agentforce hit $100 million in annual order value by May of 2025, with 6,000 paying customers. But even that number is not completely accurate, because it is bundled with new database structures and database deployments, and we don't exactly know how much of that revenue comes from the new databases versus how much actually comes from the agents being deployed and used. The other big problem, as I mentioned, is competition. The initial pricing was $2 per conversation, which is more or less double what some of the rivals are offering right now, so there's very big pressure on Salesforce to reduce prices, which may or may not be profitable for them. So there are a lot of other issues slowing the adoption of agents, at least in Agentforce, right now. Now, in general, what we see when it comes to adoption, and the same thing is true for Salesforce, is that smaller tech startups embed AI significantly faster than bigger legacy firms, which take a very long time to deal with their broken legacy data and with processes that are not aligned, and sometimes not mapped properly, and so on and so forth. So smaller, faster, tech-oriented companies do faster, better implementations; slower, old-school companies take significantly longer to implement these agentic AI solutions. Now, if you consider that right now SAP, IBM, Oracle, AWS, Microsoft, and startups like Sierra are all offering similar solutions, you understand that the competition is dramatically intensifying. So while the pie is growing, there are going to be a lot of people taking pieces of the pie, and the overall success of each and every one of these providers is limited. And we shared with you last week the irony that Agentforce was supposed to be an enhancer of employees in the company, and yet inside of Salesforce they just cut 4,000 customer service jobs because Agentforce resolves 83% of their own support queries. Another company that is facing the same kind of hurdles is Microsoft. Microsoft is pushing its AI office apps very aggressively, and yet there's very serious pushback against the $30-per-month-per-user investment that is required in order to have the Copilot upgrade across the board. Now, combine that with the fact that we mentioned last week that they're planning to integrate Anthropic's models as part of the solution, especially now that Claude can generate PowerPoint decks and Excel analyses very effectively, probably the best in the market right now, but with Anthropic's models being a lot more expensive, and you understand that puts even more pressure on Microsoft in how they're going to grow that aspect of their business.
Now, the good news for Microsoft is that other than Copilot, their Azure cloud is booming because of AI server rentals by multiple companies, mostly OpenAI, which provides a huge boost of capital into AI at Microsoft and could let them sponsor at least the initial investment in getting people to use Copilot. And once they're hooked, you can do whatever you want. But that's about it as far as the headwinds when it comes to implementing agents in the workforce; all the rest of the agent news in this episode, and in general, is very positive and moving aggressively forward. A very interesting piece of news from this week is that Microsoft is teaming up with Workday to provide a unified solution that will allow you to manage agents just like you manage employees. Microsoft Entra Agent ID pairs with Workday's Agent System of Record, also known as ASOR, to give AI agents built on Azure AI Foundry and Copilot Studio verified identities, basically making them employees of the company, with a clear set of permissions, business context, secure levels of access to specific pieces of data, and so on and so forth. And if you take this a little bit into the future, these agents will have career goals and KPIs they need to hit, just like any other employee, and all of that will be manageable on the Workday platform, just like any other employee. So, as an example, agents built on top of the ASOR will log usage, users, and impact in productivity reports, while Entra ID from Microsoft will let admins audit, grant, or revoke access for different people and agents across different tools and data within the organization. And Workday's Gerrit Kazmaier provides a great quote that summarizes it: AI is not a single-vendor solution. It is an ecosystem that emerges on the shared data, shared governance, and shared intelligence across a network of systems. Basically, what he means is you can't do this on your own; it has to be integrated into the actual company systems and processes, and they are now jointly providing that kind of solution.
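To make the agents-as-employees idea concrete, here's a minimal sketch of what an agent record with a verified identity, scoped permissions, and an audit trail might look like. To be clear, this is my own illustration; every field name here is hypothetical, and this is not Workday's ASOR schema or Microsoft's Entra Agent ID API:

```python
# Hypothetical sketch of an "agent system of record" entry -- illustrative only,
# not Workday's ASOR schema or Microsoft's Entra Agent ID API.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AgentRecord:
    agent_id: str                      # verified identity, like an employee ID
    owner: str                         # the human accountable for the agent
    scopes: set[str]                   # data/tools the agent may touch
    kpis: dict[str, float]             # targets it is measured against
    audit_log: list[str] = field(default_factory=list)

    def authorize(self, scope: str) -> bool:
        """Allow an action only if it falls within granted scopes, and log it."""
        allowed = scope in self.scopes
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append(f"{stamp} {scope} {'ALLOWED' if allowed else 'DENIED'}")
        return allowed

agent = AgentRecord(
    agent_id="agent-0042",
    owner="jane.doe@example.com",
    scopes={"crm:read", "calendar:write"},
    kpis={"tickets_resolved_per_week": 120.0},
)
print(agent.authorize("crm:read"))    # True -- within scope
print(agent.authorize("hr:read"))     # False -- denied, and the denial is audited
```

The design point is the same one Kazmaier is making: identity, permissions, and auditability have to live in shared systems across the organization, not inside any single agent.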
If you think that's the end of the story when it comes to Microsoft and agents, well, Microsoft just announced that they're going to provide a plethora of different Copilot agents that will be running within Teams. This family of agents will join your Teams meetings, take notes, suggest time slots for different topics, even give overrun alerts for those topics, answer questions based on data in your entire Microsoft ecosystem, create documents, and create tasks, and you'd even be able to activate this with one-tap mobile activation for hallway conversations from the mobile app. Some of these agents are built to face the conversation, basically participating in it, taking notes, and being active in the discussion; some are more backend-oriented, gathering data you need from SharePoint and other sources, summarizing information, and providing immediate answers for information you need, versus you saying, okay, let's go find this information and meet again in a few days. The information will automatically pop up and be shared with the relevant people, in the relevant format, right there in the Teams conversation. I think this is an amazing promise and a step in the right direction as far as creating enterprise-level efficiencies. Will it actually work, and will people actually use it? That's a whole different question. But combine that with the previous announcement from Microsoft and Workday, and you understand where this is going. The future of the workforce is a blended workforce, with agents and humans working in tandem across everything in the company, with agents taking more and more of the tedious tasks so humans can focus on the higher-value tasks. The question is how many humans we actually need in that situation, when most of the tedious work, which is a large percentage of overall work, is just done effectively and efficiently by agents. Another company that is moving in the same exact direction, and that competes with Teams, is Zoom. Zoom just announced an upgraded AI Companion that is not just available in Zoom: you'll be able to run it in Zoom, Google Meet, and Microsoft Teams for transcription, note-taking, and data finding across all these platforms, and that is obviously to compete with the platforms that have already been out there. I've been using one of them for a very long time. You have platforms like Read AI, Otter, Fireflies, Granola, Fathom, and Circleback; all of these work across all the different platforms, and what Zoom is doing is basically saying you'll be able to use the Zoom AI Companion to do all these things across the different platforms instead of being locked into the Zoom solution. The other thing they're adding is live humanoid avatars that can join meetings on your behalf. While the CEO of Zoom, Eric Yuan, says this is great because when you're not camera-ready you'll be able to have your avatar join instead of you, I see this as a very problematic step in the wrong direction. First, because of deepfakes: people can use this technology to make it seem like you are talking to a specific person when you're not. This already happened a year ago in Hong Kong, driving tens of millions of dollars in funds transferred from a company to a supposed new supplier after a finance controller had a video conversation with someone he thought was the CFO of the company, whom he knew well. Now, that wasn't done with a Zoom avatar, it was done with a deepfake platform, but it doesn't matter: the fact that this is going to be built into the platform is just going to make it easier to manipulate. I'm also not very happy, on a very personal level, to have other people's avatars join calls instead of them. Like, if I can't have a call with you, I don't need the call; I definitely don't need your avatar to show up instead of you. So while I really resent that concept, I think this is part of the future. Some individuals and companies will love it, and some will hate it, and it's going to take a while for us to get used to what's allowed and not allowed and what's acceptable or not acceptable. We might be able to block them in specific conversations or on specific platforms, but this is where it's going, whether I like it or not. Now, in addition to just joining calls and taking notes, these Zoom agents will be able to do things like create emails and documents related to different aspects of the meeting, prep for meetings, and do deep research in preparation for or after the meetings, and there's even a custom agent builder with MCP support. So it's going to be a very powerful AI-based solution for the Zoom suite and, as I mentioned, beyond Zoom as well.
But maybe the craziest and scariest article this week when it comes to agents talks about how both Anthropic and OpenAI are investing over a billion dollars each in creating reinforcement learning environments based on real-life scenarios. They're calling them learning gyms, and they basically mimic real-life operations in Salesforce, Excel, Zendesk, et cetera, all the tools we use regularly. So how does this work? You create a cloned enterprise environment with all the apps a real enterprise environment would have, and then you have AI agents play in that environment and get feedback from humans, just like any other reinforcement learning, which teaches them how to actually work in an enterprise environment across a very wide range of applications. As I mentioned, OpenAI spent about $1 billion on this kind of process this year, and they're projecting to spend $8 billion on it by 2030. Combine that with the ability to do computer use, basically taking over your computer and running different applications, and you understand that these tools, after being well trained on how to actually work in an enterprise environment, will be able to do basically anything any employee in the company does, across all the different apps in the tech stack of any company. One of the OpenAI senior executives said privately: the entire economy becomes an RL machine. RL stands for reinforcement learning. Basically, what they're saying is that the world becomes their training ground for the next platform, which will then replace everything that happens in the real world. So what does this mean? It means that the risk of job displacement and higher unemployment is growing dramatically, because the labs now have the budgets and the infrastructure to train models beyond just the stuff they're doing right now, and to actually train them on the work we do day to day in most companies around the world. And once agents know how to do that well, most of these tasks are going to be taken over by agents, because they'll be able to do them for a fraction of the cost and in a fraction of the time. Now, staying in the agent universe but diving deeper into the coding world: OpenAI just released GPT-5 Codex, a fine-tuned version of GPT-5 designed specifically to be an AI coding assistant. The new model is already available in the Codex Visual Studio Code extension, the Codex CLI, and Codex Cloud, so there are different ways to access it, and it was built to be a real competitor to Claude Code and its huge success in the AI coding universe, as we mentioned earlier in this episode. Now, some historical context. Those of you who watched the release of GPT-5 saw it was very, very obvious that OpenAI is going all in on taking back the lead in the coding wars that have driven Claude's revenue from $1 billion last year to $5 billion right now, most of it, as we saw earlier in this episode, with 77% of API usage geared toward automation and coding. So a little bit of history: in March of 2024, Anthropic released Claude 3, the first model that was actually somewhat good at programming, and it started getting on the map in that particular field. In June of 2024 they released Claude 3.5 Sonnet, which was Claude's first big boom as a platform that is really good at writing code. In mid-2024, OpenAI released GPT-4o, which was supposed to be the contender that would take back the lead, but it just wasn't good enough.
That was followed in February of 2025 by Claude 3.7 Sonnet, which was really good at coding, completely took over the field, and became the default tool behind the scenes for most coding platforms out there, and that started the craze that pushed Claude's success very, very aggressively forward. In May of 2025, Claude 4 was introduced, both Opus and Sonnet, and that drove them even further ahead in the coding world. And that led us to the release of GPT-5 this summer, which was supposed to take the lead back, followed shortly after by Claude Opus 4.1, which is supposed to be an improvement over that. So that kind of brings us to today: OpenAI is still a little bit behind, and they needed to fine-tune GPT-5 specifically for coding, and that's what Codex is. GPT-5 Codex outperforms GPT-5, the regular model, on SWE-bench Verified, which is the top benchmark for verifying coding models right now, though there's still no formal score on the actual leaderboard. Right now, Claude 4 Opus is the leader with 67.6% success versus the regular GPT-5 with 65%, so a very small spread between the two. And now, based on OpenAI themselves, GPT-5 Codex is better than GPT-5. They did not say whether it's actually better than Claude 4 Opus, but I'm sure we'll know within a few days or a couple of weeks once people start using it. It has a few interesting features, two of them especially. One is that the model dynamically adjusts the time it needs to think based on the complexity of the task at hand. That's very different from what GPT-5 does: GPT-5 decides whether to use a thinking model or not with a router, while in this case the model actually decides in real time how much time it needs to invest in specific aspects of the task, a much more granular way to approach the problem. Simple tasks run very quickly, and they're claiming they have observed it work for up to seven hours straight on its own, on a single prompt. The other interesting feature is what they call the Codex Cloud extension, which can be attached to most IDE platforms and will delegate specific tasks to the cloud that can run in parallel to what is happening in the IDE. The IDE, for those of you who don't know, is the platform in which you develop code, tools like Visual Studio Code or Cursor. What this allows you to do, in addition to the model running within Cursor or within Visual Studio Code, is open a parallel path where the model in the IDE can delegate some of the tasks to the cloud platform to run in parallel while keeping the context of what is happening within the IDE. That's a very interesting approach, and I've said multiple times in the past few months on this podcast that the tooling is going to make as big of a difference as the quality of the models moving forward: as the gaps between the quality of the underlying models shrink, the way models are delivered and the way people can benefit from them is going to make a very big difference. Now, I want to dive in for a second on the seven-hour work capability. We've shared with you several times the research by an organization called METR, which has been running a long-term study over the past few years that looks at the growth of AI model capabilities by evaluating how long it would take a human to do the task that the AI is doing, with the AI completing that task at a 50% success rate.
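METR's headline finding, which I'll get to in a second, is that this task length doubles roughly every seven months. Here's a quick back-of-the-envelope sketch of what that compounding looks like; the arithmetic is my own, and the 10-minute starting point is just an illustration:

```python
# Back-of-the-envelope: the human-time length of tasks AI can complete (at a 50%
# success rate) doubling every seven months, from an illustrative 10-minute start.
task_minutes = 10.0
months_per_doubling = 7

for months in range(0, 43, 7):
    horizon = task_minutes * 2 ** (months / months_per_doubling)
    print(f"after {months:2d} months: ~{horizon:.0f} human-minutes")
# 10 -> 20 -> 40 -> 80 -> 160 -> 320 -> 640 minutes over three and a half years.
# A 10x jump within a single year, like the one discussed below, sits far above
# this curve.
```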
Now, that 50% number sounds really poor. Who the hell wants an AI that does things at a 50% success rate? But the actual rate doesn't matter; the improvement over time is what matters. What they found is that the length of the human task the AI can replace doubles every seven months. So if the AI can do a task that a human does in 10 minutes, after seven months it can do a task that a human does in 20 minutes, then 40 minutes, then an hour and 20 minutes, and so on, every seven months. I shared with you last week that Replit released Agent 3, which I now use every single day, and it blows my mind. Agent 3 can think for about two and a half hours. That completely breaks the doubling-every-seven-months curve, because it can think roughly 10x longer than the previous model could, and that previous model came out earlier this year. So it's not doubling every seven months anymore; it's 10x in well under seven months, which is a very different scale. GPT-5 Codex claims it can work for seven hours. That potentially also breaks the METR scale, maybe. And the reason I'm saying maybe is that what the METR scale evaluates is not how long the model works, but how long the human task is. So for a task that took GPT-5 Codex seven hours to do, how long would it take a human? I can tell you from personal experience that using Replit Agent 3 is not necessarily the fastest way to do tasks. There are many cases in which I could actually stop the agent, give it comments myself, and dramatically accelerate the process. So you're asking, what's the point? Why would you wait for Replit Agent 3 to do the work? Well, the reality is I don't need to do anything. Last weekend I used Replit to develop a whole new feature in a new application I'm working on, and it worked for 59 minutes on its own. In those 59 minutes I had dinner with my kids, and when I came back it was still working, and then I sat down and played bass until it was done. And if this had been during work hours, I could have done my emails, worked on other things, and so on while it was running in the background. So whether one hour of work or seven hours of work is more or less efficient than a human doing the same work matters less, as long as the agent is doing the work effectively and properly, because while the agent is running, you can do something else. That's it for the deep dives for today, but now we have a lot of stuff to talk about in the rapid-fire section. And since we just talked about OpenAI, we're going to share some new features and cool capabilities from OpenAI. The first is that OpenAI released what they call developer mode beta, and it's already available; I already have access to it and have already turned it on. What it enables you to do is connect any MCP server to your chat. For those of you who don't know what MCP is, it stands for Model Context Protocol, and what it basically means, in simple English, is that you can connect any third-party tool, application, or piece of software to your AI with five minutes of effort. Meaning you can connect Salesforce, Jira, Asana, Microsoft platforms, anything that has an MCP server somebody has already created, whether the company itself or a third party, straight into your chat, and ask questions. That makes it absolutely magical, because it is a hundred-x multiplier on the capabilities of the chat tool compared to not having those MCP capabilities. Claude has already had this for a few months now, and I've been using it in Claude, and I think it's absolutely amazing. It's magic what it can do.
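For context on what's actually behind one of these connectors, here's a minimal MCP server sketch using the official Python SDK (`pip install mcp`). The order-lookup tool itself is a made-up example, and the SDK's API may evolve, so treat this as illustrative:

```python
# Minimal MCP server sketch (official Python SDK); the order-lookup tool is made up.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("order-lookup-demo")

@mcp.tool()
def get_order_status(order_id: str) -> str:
    """Return the status of an order (stubbed with fake data for illustration)."""
    fake_orders = {"A100": "shipped", "A101": "processing"}
    return fake_orders.get(order_id, "unknown order")

if __name__ == "__main__":
    mcp.run()  # serves the Model Context Protocol over stdio by default
```

Once a server like this is running, a client such as ChatGPT's developer mode or Claude can discover get_order_status and call it like any other connector, which is exactly why both the power and the risk discussed next are real.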
The problem with these MCP connectors is that they introduce a lot of backdoors and data security risks while you're using them. Part of that comes from the simple fact that there's another backdoor to your data, and part of it comes from the fact that these tools can now read and write data on your actual platforms. Take Salesforce as an example: your ChatGPT conversation can now change the data in your Salesforce database, which you may or may not be ready for, and you may or may not like what it's actually doing, because it may have misunderstood what you asked it to do. Combine that with potential hallucinations of the model, plus prompt injections, or just malicious MCP servers from third parties that are actually built to steal your data, and you understand it is the perfect storm of risk versus reward. So companies and individuals will have to define what is acceptable and how to reduce the risk while enjoying the benefit. The specific note you see if you're going to activate this mode on your ChatGPT says it allows you to add unverified connectors that could modify or erase data permanently; use at your own risk. So if you still want to do this, it's actually very easy. Click on your user at the bottom left, go to Settings, then Connectors, and scroll all the way down. There's an Advanced section; click on Advanced, and there's a toggle to turn on developer mode. You'll then have a red circle around your prompt line instead of the regular black circle, kind of like highlighting that you are dealing in uncharted territory, which I think is a good idea on OpenAI's part: it highlights the risk in day-to-day usage versus just the one-time click of the toggle inside the settings. But the capability is extremely powerful, and I definitely see a future where we'll figure out a way to reduce the risk while enjoying these amazing benefits that MCP provides. Another half-new feature in ChatGPT is branching. You can now start a conversation in a regular chat in ChatGPT and then branch out new conversations from several different places while keeping the history of what happened before. So if you want to explore different aspects of a business plan, you want to dive into finance, and in another conversation into operations, and in a third one into marketing, after you've done all the initial setup and explained what the plan is, that's now an option. If you're using it as a study guide and want to deep dive into different aspects of learning, you can do that as well, and so on and so forth. To access this feature, you just scroll down to the end of an answer from ChatGPT, click on the ellipsis, the three little dots, and there's an option called branch in a new chat, which opens a new tab in your browser containing the same conversation, which you can then continue in a different direction. The reason I'm saying this is just half a feature is that the capability kind of existed before. If you go to any prompt in the chat, there's a little pencil button you can click on. If you click on it, it actually creates a new prompt that is a new branch in the conversation, and you can keep moving back and forth between the branches, because there's a button in the prompt that shows whether you're in version one, version two, version three, however many versions you want, and in each and every one of them you can continue a separate conversation. But the user interface for that is not very user friendly.
The new feature, by contrast, literally creates a new chat in your left panel that shows you this is a new conversation, but one based on the old conversation. So again, going back to tooling: it's a better way to access the same capability, which makes it more accessible to people. Go play with it if you find it valuable. I think it's great; I've been using the old version of this feature for a very long time, and it has provided a lot of value to me. Another thing that OpenAI released this week is something we've been talking about for a long time. In the desktop application, there is now a new "Build for orders" section in the settings, basically paving the way for native checkout in the ChatGPT desktop app. What does that mean? It means you'll be able to give it your credit card and ask it to go buy stuff for you, and it will be able to do the research, find the right product, and actually check out right there in the ChatGPT app. This is another step into an agentic future in which AI can do stuff for us. Going back to OpenAI pushing very hard on the non-business side of ChatGPT usage, you can see why this is a very aggressive step in that direction. Combine that with another thing OpenAI just released, or is about to release, which is parental controls. We talked a lot about this in the previous episodes, last week and two weeks ago, regarding the lawsuits and the risks to younger individuals. There are two things OpenAI is rolling out. One is parental controls for minors, which will block younger users from accessing different aspects and different types of chats, and will allow parents to set limits, tweak response behavior, and get distress alerts if specific things happen in the conversations. This is supposed to start rolling out later in September. They're also building a solution that will automatically detect users who are under 18 based on how they converse, and will change the model's behavior to fit an age-sensitive group of users. So again, all of this connects directly to what we started with: more and more ChatGPT users are using it for personal tasks, and the younger generation is a very big part of that. OpenAI is both capitalizing on that field, since they're going to make millions and potentially billions of dollars from commissions through those checkouts, which is part of the vision for the future they've shared several times, and also addressing the risks to younger individuals and how the use of AI may impact their lives. So overall, I'm happy with the direction OpenAI is taking on both of these aspects. In other interesting launches this past week, Gamma, the AI presentation platform, unveiled version 3 of their platform, which is a major jump ahead from version 2. Like everything else, it's a lot more agentic and a lot more capable of doing multiple things. Gamma has been a very successful platform when it comes to generating presentations, landing pages, and webpages from single prompts, and now they're taking this to a completely different level: the agentic side of it is supposed to really understand your intent and act accordingly, so even very short prompts can yield very significant results. One of the examples they gave is "make it more visual," and that will prompt an entire set of changes to your presentation from just this really small piece of feedback.
Or it can take rough notes and a screenshot of a whiteboard and turn them into a complete presentation, and so on and so forth. So again, a very big jump forward for an already very capable system. Another huge announcement this week comes from HeyGen. HeyGen is a platform that lets you generate AI avatars, either of yourself or synthetic avatars, of which they have many in their database. I've been using HeyGen for over two years now; it's a great platform. They just announced that they are merging with, or acquired, Elisa, a company specializing in AI video editing and manipulation, and the combination of these two companies is going to be extremely powerful. Think about video agents that act as directors, paired with hyperrealistic avatars. What does that mean? It means you have a bunch of agents that can create scripts, edits, translations, captions, and feedback loops, all from a single prompt, which will then drive the output of the avatar. This is going to completely change what we know of avatars right now, because it will make the whole generation process significantly more efficient. Or, as the announcement says, they're focusing on a same-day workflow: it starts with a prompt, and then it uses on-brand avatars and voices, specific company assets, pacing and captions, channel-ready formats or whatever formats you need, brand kits, compliance checks, multi-language translations, and one-click pushes to LinkedIn, YouTube, your CMS, et cetera. Something that used to take entire teams weeks will now take a single individual hours. Another really interesting release this week comes from Genspark. Genspark is a company out of California that has built a generalized agent that can do more or less anything you want, from writing code to browsing to creating presentations to video generation, and so on and so forth. They just released their own AI browser for Windows and for Mac. What makes it unique compared to all the other AI browsers right now is that it includes 169 free open-weight, open-source models that run on the device itself, which means these models run on your computer and none of your data gets sent to a third-party cloud provider. That does two things: A, it makes it lightning fast, and B, it makes it private, because no data is being sent to the cloud. It also means it will probably be a lot cheaper to use, because you don't need API subscriptions for any third-party tools; the models are running locally on your computer. Other than that, it works very similarly to other AI browsers, and this is definitely going to be the way of the future. We shared some of them in previous episodes: they can browse the web on their own, they can keep context across multiple tabs, and they already include built-in super agents that can scan pages for contextual data, auto-run price comparisons across multiple websites, find data across multiple YouTube videos and create summaries, and so on and so forth. Extremely powerful capabilities. It also has an autopilot mode, similar to all the other AI browsers, where it actually takes control of the web browser and can navigate, click, change things, and so on. So while this is not the first AI browser, it is the first I know of that actually runs models on device, and that is a very interesting angle; we'll see if it catches on with other browsers as well.
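Genspark hasn't published how its on-device models are exposed, but the general on-device pattern looks like this: query a locally hosted open-weight model through an OpenAI-compatible endpoint, such as the one Ollama serves by default. A hedged sketch, assuming Ollama is running locally with a model already pulled; this illustrates the concept, not Genspark's internals:

```python
# General pattern for on-device inference: query a locally hosted open-weight model
# through an OpenAI-compatible endpoint (Ollama's default shown; illustrative only,
# not Genspark's actual API).
import json
import urllib.request

payload = {
    "model": "llama3.2",  # any open-weight model pulled locally
    "messages": [{"role": "user", "content": "Summarize this page in one sentence."}],
}
req = urllib.request.Request(
    "http://localhost:11434/v1/chat/completions",  # localhost: no data leaves the machine
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.load(resp)
print(reply["choices"][0]["message"]["content"])
```

Because the endpoint is localhost, the prompt and the page content never leave the machine, which is the privacy and latency argument in a nutshell.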
Now, in addition to the technical news from OpenAI, there's a bunch of more strategic news from OpenAI. The first item relates to their new agreement with Microsoft that we shared last week: they finally signed an LOI, or an MOU, for their new partnership. OpenAI is projected to take the 20% revenue share it currently has with Microsoft down to 8%, which is supposed to give them a net of over $50 billion between now and 2030 that they desperately need in order to pay for compute. As you remember, they just signed a deal with Oracle for $300 billion worth of compute, so they need to find ways to pay for that, and this $50 billion is one way to pay for that stupid amount of money they're supposed to come up with. Another interesting, related piece of news is that OpenAI just recruited Mike Liberatore, the former CFO of xAI. This is very interesting for two different reasons. One, xAI has done some things that are absolutely incredible when it comes to startup setup. They were able to raise about $10 billion in just a few months, which is almost impossible for any company in history unless Elon Musk is behind it, but you also need somebody on the finance side who is very, very good at putting these kinds of deals together. They also built the most capable AI data center in the world, still today, in about 90 days, which again is unheard of; it usually takes nine to 18 months to do something like this. And the person who was behind that acceleration process on the financial side is now working for OpenAI. The other obvious reason is that this adds more gasoline to the fire of the very interesting relationship between Sam Altman and Elon Musk. A very quick recap: Elon Musk was one of the co-founders of OpenAI and one of its most senior backers from a financial perspective, and he left after fighting with Sam Altman because he wanted to run the company; he wanted it to become part of Tesla. That didn't work out, he left, and he then started suing them, and since then this relationship has been escalating in very negative directions. This is just another way for Sam Altman to stick it to the man, if you will, by hiring his former CFO. Liberatore actually resigned from xAI in July of this year, but now he's been hired by OpenAI, and he will probably drive the speed at which OpenAI can do its financial operations. The Information had a very interesting article analyzing the new structure of OpenAI and trying to understand which shares will belong to whom in the new restructuring. This is not formal data, but The Information usually gets the relevant information, pun intended. Based on this article, in the new formation the nonprofit arm of OpenAI will own 27% of the new company, which currently values the nonprofit arm at $135 billion, making it potentially the highest-valued nonprofit in the world. I don't know that for sure, but it's definitely up there. Microsoft is going to own 28% and will be the largest shareholder, which values that investment at $140 billion right now, more than 10x what Microsoft actually invested in OpenAI. 25% is going to be owned by OpenAI's current and past employees, valuing that at $125 billion, which is going to make a lot of new multimillionaires in the Silicon Valley area. 13% will go to investors who invested in the company in 2025, valuing that at $65 billion, and the 2024 investors get 4%, valued at $20 billion.
2% goes to the original shareholders, which is $10 billion, and 1% goes to OpenAI's very first investors, which is $5 billion. Now, that sounds like a very small amount of money compared to the other amounts, but these people invested less than $200 million to get a return of $5 billion about seven years later. That's not a bad investment. From OpenAI to Anthropic: Anthropic's CEO, Dario Amodei, shared at the Axios AI+ DC Summit that Claude is currently writing most of the code for the next version of Claude. The exact quote is: the vast majority of future Claude code is being written by the large language model itself. Now, Jack Clark, one of the co-founders, clarified that Claude cannot yet manage all of its own development, but the portion it can manage is growing rapidly. Combine that with what we shared at the beginning of this episode, that most of Claude's API usage is geared toward replacing workers rather than enhancing workers even right now, and you understand that the next version will enable a lot more of that, and it will come faster because the AI is writing the code. So we're getting into a very dangerous phase of the impact of AI, where it accelerates even faster and enables even more automation, which allows it to accelerate even faster, and so on and so forth. As part of this process, Anthropic is dramatically growing its offices in Washington, DC in an attempt to influence the government and the decision-making process, trying to push for higher security and safety. Their head of policy, Jack Clark, and their CEO, Dario Amodei, just held a marathon event, starting September 16, pitching House and Senate leaders and committee chairs on AI's "exponentially crazier" surge, that's a quote, basically warning that the deployment of AI as it stands today is putting at risk a lot of the things we know, and that legislators need to be more involved in order to reduce those risks. It is very obvious to me, and to more or less anybody in this field, that the 2026 midterm elections and the 2028 presidential race will be highly affected by the impact of AI on jobs. It will become a major topic and a major concern, and so the parties will definitely be deeply involved in trying to understand what actions they will take, or can take. That's the bigger question, because I don't think either party has the tools or the vision to actually fight what's happening right now. I don't think they grasp how fast it is coming. I don't think they grasp the impact it can have. And I think by the time they figure it out, it will be too late to respond. So my personal feeling is that we're going into some very murky waters in the next three to five years before we figure this out, and what happens in those three to five years might not be very happy times. In parallel, Anthropic is deepening its relationship with the US and UK AI safety institutes, granting them more and more access to its models as they're being developed, not as a one-time event but as an ongoing partnership that helps them actually find different risks in the platforms and address them before the models are released. I salute Anthropic for doing that, and I really, really hope that all the different labs will do the same. And I hope even more that governments will make it mandatory, and that maybe an international body of experts can be established to evaluate models before they're released, to address not just US models but models from all over the world.
With all of that happening, there was another big event this week: Meta's Connect 2025, where they did a lot of unveiling. The focus was obviously their smart glasses, and there are three new sets of glasses. The first is their consumer-ready smart glasses with a built-in high-resolution see-through digital display. It's basically a heads-up display that can project high-resolution video onto the glasses themselves without blocking what's behind it. The price tag of these glasses is $799. They're a little bulkier than regular glasses, but the capabilities they wrap inside the glasses are absolutely incredible. The glasses are controlled via hand gestures using what they call the Meta Neural Band, a wristband powered by neural-interface technology, so very small hand gestures can control the glasses, what's on the display, and so on. It can be used for things like watching videos, reading and responding to messages, receiving video calls, and following map directions, all projected straight onto your glasses. It also includes a 12-megapixel camera that lets you take videos and photos and obviously upload them or send them directly to WhatsApp, Facebook, et cetera. Now, at a price point of $800, that's not a cheap toy. But think about the vision that this will replace cell phones: we are very much used to paying over a thousand dollars, or $1,200, for top cell phones, and if I can do everything I can do on my cell phone with glasses while my hands stay free and I can still see the world around me, that makes it very, very attractive, and that's definitely the direction Meta is pushing. The second set of glasses they released is the highly anticipated Oakley Meta Vanguard smart glasses, built specifically for athletes. These are more of a wraparound, sporty look for cyclists, runners, skiers, and so on. The price tag is $499, and they're supposed to be released in about a month, on October 21st. They can capture 3K-resolution videos with a 12-megapixel wide-angle camera, and they're really built for sports: a much longer battery life, so you can be outdoors without charging for about nine hours; buttons on the bottom of the frame, so you can reach them easily while wearing a helmet; seamless connection to Garmin smartwatches to track fitness stats, plus Strava integration for those of you who cycle; and an IP67 dust- and water-resistance rating, which means you can wear them outdoors with no problem. And then there's the second version of the Ray-Ban Meta, which is going to be priced at $379, an increase from the $299 of the original model; those are already available for sale. They've doubled the battery life to eight hours, and it also has the 3K Ultra HD video capability. Now, the demonstrations didn't actually go very well at the event. Mark Zuckerberg was trying to connect a video call to their CTO, Andrew Bosworth, and that didn't work after several attempts, and they also tried to demo a live AI feature on the Ray-Ban Meta glasses, and that failed as well. So while this is a consumer product that is supposed to be ready for prime time, it is probably not fully baked yet. That being said, and I've said this multiple times on the show, I think this is the future. It's a very scary and weird future, where everybody will have wearables that can record and analyze everything around them.
And from a privacy perspective, that breaks almost every law and social agreement we have right now. But 20 years ago we also didn't think that everybody would share everything they do on social media, and now everybody shares everything they do on social media, so I think it's just a process we will adapt to. Definitely the younger generation, which is already used to sharing everything on social, will have no problem with it whatsoever. It does raise a lot of concerns when it comes to places with sensitive information or private spaces, and I do see that becoming a thing, where there will be private spaces in which this technology is not going to be allowed. They also shared new things like the Horizon TV entertainment hub for their Quest headsets, and Hyperscape, which enables Quest to scan the actual physical environment and create a digital twin of it, so you can play games or do things in a virtual environment that mimics a live environment. And they announced substantial improvements to the Horizon Engine and to the Horizon Studio platform, which allows people to develop AI solutions integrated into their 3D worlds. That combines everything Meta has worked on in the last few years, the metaverse and AI and AR and VR capabilities, all into one environment, and as of right now, Meta is definitely the leader in that field. And going from Meta to Gemini: Gemini just took the number one spot on the US App Store for the first time, kicking ChatGPT down to number two. It also climbed from number 26 on the Google Play Store to number two in the US. All of that is driven by Nano Banana, the new image generation model that everybody wants to use, which has driven a huge spike in demand for the Gemini app. It has also driven interesting revenue to Google: Gemini's iOS earnings have hit $6.3 million year to date, with August alone delivering $1.6 million, which is over 1,200% growth from that channel's revenue in January of this year. So what does this tell us? It tells us not that people like Gemini more, but that people really like creating images, and that's their platform to do it. But it's a very smart move from Google, because it drives people into the Gemini ecosystem. I must admit, I use Gemini a lot. I use both the Gemini chat and Gemini within the different platforms, the G Suite tools I'm using for work, and I love the results. Just yesterday I did an incredible analysis with multiple PDFs I had as attachments in Gmail, which Gemini helped me identify and which I dropped into Google Drive. I then created an initial analysis that grabbed information from tables in those actual PDFs, turned it into an Excel file, and continued the conversation with Gemini in the spreadsheet to do multiple analyses. Something that would have taken me hours took me about 25 minutes. So those of you who are not using Gemini within the G Suite environment are missing out if you're in the Google universe. That's it for today. There are many more news items that did not make it into this episode, including a very interesting new release from Alibaba: their new deep research model actually beats ChatGPT's deep research with a model that is 25 times smaller and more efficient.
There are also many new updates and interesting news items about humanoid robots, and even the $5 billion investment by Nvidia in Intel stock and their future collaboration on chip development, which will bring Nvidia's capabilities into chips that sit together with CPUs developed by Intel. So there's lots of interesting news in the newsletter; if you want it, just go and sign up. There's a link in the show notes, and then you can get all those other news items as well. We'll be back on Tuesday with a very interesting and different how-to episode, built on a presentation I did for teachers this past week, showing how to build really advanced applications inside ChatGPT, Claude, and Gemini. You can use it for business as well, so look out for that on Tuesday morning. If you are enjoying this podcast and finding it valuable, please give us a five-star rating on your favorite podcasting app, and please share the podcast with other people. Pull your phone out of your pocket right now, unless you're driving, click on the share button, and send it to a few people you know who can benefit from it. I know you know people who can benefit from it, and I know you understand how important this is; I would really appreciate it if you did that. And until next time, keep on exploring AI, keep sharing what you're learning with the world, and have an amazing rest of your weekend.
