Leveraging AI

79 | When will we achieve AGI, 3 new models released that are better than GPT-4, all giants are developing their own AI chips, and a lot more exciting AI news from the week ending on April 13th

April 13, 2024 Isar Meitis Season 1 Episode 79

About Leveraging AI

If you’ve enjoyed or benefited from some of the insights of this episode, leave us a five-star review on your favorite podcast platform, and let us know what you learned, found helpful, or liked most about this show!

Hello and welcome to a weekend news episode of Leveraging AI, the show that shares practical, ethical ways to leverage AI to improve efficiency, grow your business, and advance your career. This is Isar Meitis, your host. Every week has a lot of AI news, but I somehow feel this week has been maybe the craziest week ever, and definitely the craziest week for AI in a while, with news across multiple aspects of this all-out war between the giant companies for AI dominance. There's been news about chip dominance, news about new models being released, news about poaching talent from other companies, video generation, and a lot more. So really a lot to cover today. Buckle up, let's go. The first thing I want to mention this week is that xAI released Grok-1.5, and more and more news about its capabilities is coming out. The main thing is that it's beating some of the leading models on multiple benchmarks, and it even has its first multimodal capabilities. I'm quoting from their blog post: Grok-1.5V is competitive with existing frontier multimodal models in a number of domains, ranging from multidisciplinary reasoning to understanding documents, science diagrams, charts, screenshots, and photographs. So it's not only doing better on text; it's doing very well on a lot of other capabilities that previously belonged only to models like GPT-4 and Claude 3. The crazy thing is how quickly they have developed these models to catch up with the leading models out there. That obviously comes at a cost, which we'll talk about in a few minutes when we get to safety. Now, in parallel, in a live interview on X this week, Elon Musk shared that he believes AI will surpass human intelligence by the end of 2025, meaning that within the next two years he believes we'll be able to achieve AGI. Now, if we look at recent weeks, we've seen AGI predictions made by Jensen Huang, who believes it will take about five years.
We have Ben Goertzel, who believes it will be about three years. And then we have Yann LeCun, the chief AI scientist at Meta, who believes it's not even doable with the current technology. Yann LeCun does not think large language models are the path to AGI. He mentions four different cognitive challenges that he does not believe large language models can solve: reasoning, planning, persistent memory, and understanding of the physical world. That's why he's been working on and promoting a new approach within Meta to have these models learn like babies, basically by looking at the world and understanding how it behaves, in order to go beyond these challenges. That being said, other really smart people, as I mentioned, do not agree with him. So there's disagreement all the way around on when and how AGI can be achieved. You can make up your own mind; go listen to these people. They all have really good arguments for why they think they are right. When I say all of them, that's other than Elon, who has a history of making very aggressive announcements that did not happen. That being said, if you put Yann LeCun aside for a second, most of the other people believe we are in a two-to-five-year time range for achieving AGI, which nobody is ready for, and I don't think anybody is preparing fast enough for that. Now, going back to what Yann LeCun is saying about those four things: in an article in the Financial Times and a follow-up article in Business Insider, OpenAI and Meta said that they are close to unveiling new AI models that can reason and plan, which are two of the things Yann LeCun says are impossible. OpenAI COO Brad Lightcap told the Financial Times that the next version of GPT would show progress on solving hard problems like reasoning.
Meta's VP of AI research, Joelle Pineau, said that Llama 3, their next model that is supposed to be released in the coming month, will have the ability to talk, reason, plan, and have memory. So some of the other players not only think it's solvable, but are claiming they're solving at least some of these problems in their next models. I obviously can't tell you which one of them is right. All of them are way smarter than me and know a lot more than me about how AI works. I think we will learn a lot more once these models are released in the very near future. Now, I told you we were going to talk about safety. Security researchers have conducted experiments to test the security and safety controls of different AI chatbots, including OpenAI's ChatGPT, Meta's Llama, and Grok. They've used different techniques to try to jailbreak them, basically bypassing the chatbots' safety restrictions in order to get information they're not supposed to give, like creating dangerous materials, building bombs, and other fun stuff. And they found major differences in how easily these bots can be hacked. The safest one was Meta's Llama. Next was Anthropic's Claude 3, then Google Gemini, only then OpenAI, in fifth place Grok, and in the very last place Mistral Large. There are two interesting facts about the last two, number five, Grok, and number six, Mistral: both of them were released as open source, which makes it even worse, right? Because beyond the fact that it's easy to hack them, they're open source, meaning you can take the actual code, branch out, and do something else with it, circumventing the limitations completely. The other is, as I mentioned before, Grok was developed extremely fast compared to the other models mentioned. And obviously, when you do things very fast, you have to skip a few steps; one of them is probably making them safer.
And since we already mentioned OpenAI, let's continue with them for a second. OpenAI just dramatically expanded their custom model program. They launched that program last year, working jointly with some large corporations to train OpenAI's models on specific companies' use cases, data domains, and applications. They have just announced that the program now includes two new components. One is assisted fine-tuning, which helps organizations set up data training pipelines and evaluation systems to boost model performance. The other is custom-trained models, which are models built from OpenAI's base models and tools for customers that need deeper fine-tuning on detailed, domain-specific knowledge. They already have several companies working in this new program, and I assume they're going to expand it further. So OpenAI is making big steps toward having its models run in the enterprise environment, which is maybe the deepest-pocketed and biggest frontier, from a revenue perspective, that these companies can chase. We're going to keep talking about OpenAI, but shortly after we'll jump to Cohere, who are also a big competitor in the enterprise AI space. OpenAI just released a new, cool feature that allows you to edit segments of images created by DALL-E. One of the biggest problems all these image generators have is that if you wanted to change an image, the only way to do it was to update your prompt, which recreates the image from scratch, meaning it will not look like the previous image. So if you just wanted to change one small aspect of an image, remove something, change the clothing on the people, or turn the background from day to night, all of these things were not possible, because once you reprompted, it recreated the entire image. It might look similar to the previous one, but it would never be exactly the same.
Now, previously you could do things like that, meaning select a section of an image and prompt just that section, in tools like Firefly by Adobe. But now you can do this within ChatGPT using DALL-E. If you have ChatGPT Plus, the paid version, here is how it works: once you create an image with DALL-E, you can click on the image, and in the top right corner there is a little edit button. A brush tool then shows up that you can make bigger or smaller. You paint a section of the image, prompt what you want to change in that section, and it will change just that section. A very cool and powerful capability, because now you can impact just one segment of the image. I've tested it intensively this week, and I must say I have mixed feelings about it. Some of the results were really good, and some of the time it didn't change anything at all. So it's not perfect yet, but it's definitely a useful tool worth trying if you have this kind of need. The last piece of news from OpenAI this week is that GPT-4 Turbo with Vision is now available through the API. It was available in the chat for any paid customer, but it was not available to API users, and now it is. A quick recap of why GPT-4 Turbo is important: first of all, it has a much longer context window of 128,000 tokens in a single chat; that's about 300 pages of text. It costs about half of what regular GPT-4 costs. And, as I mentioned, it now supports JSON mode and function calling for vision requests, meaning you can upload images through the API and have it analyze what's in them, which is an extremely powerful capability that I'm sure a lot of people will be very happy to have through the API. In addition, they've announced their Voice Engine technology, which is the ability to generate natural-sounding voice, basically mimicking the voice of any person based on only a 15-second audio sample of that person.
I mentioned that last week: this is going to be an amazing capability if used right, and a very scary capability if used for bad things like deepfakes, but this is where we are going. Voice Engine has not been released yet, but it's probably going to be released soon. And from OpenAI, as I mentioned, let's jump to Cohere. Cohere is one of the smaller players in the space, and they have been focused since day one on serving enterprises with closed models that can be used and trained safely on enterprise data. They've just announced the release of Command R+, their latest model. It supports more languages than before and has more advanced retrieval-augmented generation, or RAG, capabilities, which means the ability to retrieve information from documents and other company proprietary data. On RAG it outperforms models like GPT-4 Turbo, Claude 3, and Mistral Large, and it beats them on several enterprise AI benchmarks. The new model is currently available through Microsoft Azure, making Cohere the third model provider on Azure, alongside OpenAI and Mistral. So Microsoft is definitely diversifying its Azure offering, which until not too long ago was just OpenAI. On Amazon, it's currently available only on SageMaker, not on Bedrock yet; SageMaker is their middle-tier layer for AI development, and I assume it's going to make its way to Bedrock as well sometime in the near future. No word about Google Cloud yet, but again, if it turns out to be a good model, I'll be surprised if Google doesn't offer it as well. Going back to the safety research from before, the core focus of Cohere is to provide data privacy and security to enterprises that use their data. Now, is that going to be enough of a play to keep this company competitive? I'm not 100 percent sure. To give you some more background, Cohere has raised about 445 million dollars to date.
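To make the RAG idea concrete: the pattern is to retrieve the most relevant company documents first and then hand only those to the model as context. Here is a minimal, self-contained sketch that uses bag-of-words cosine similarity for the retrieval step; a production system like the ones discussed here would use learned embeddings and a vector database instead, and the documents below are made up purely for illustration.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = Counter(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: cosine(q, Counter(d.lower().split())),
                    reverse=True)
    return scored[:k]

docs = [
    "Q1 revenue grew 12 percent driven by enterprise contracts.",
    "The office cafeteria menu changes every Monday.",
]
context = retrieve("what was enterprise revenue growth", docs)[0]
# The retrieved context is prepended to the prompt so the model answers
# from company data instead of its training data:
prompt = (f"Answer using only this context:\n{context}\n\n"
          f"Question: what was revenue growth?")
```

The key design point is that the model never sees the whole document store, only the retrieved slice, which is what makes this approach workable on large proprietary datasets.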
They are claiming 60 percent revenue growth in the first 10 weeks of 2024, which sounds very promising, but an article in The Information claimed that as of December 2023 their revenue was just over 1 million dollars a month, putting them at a pace of about 13 million dollars annually, which is definitely not enough, even with 60 percent growth, to justify the 445 million dollars they have raised. That being said, with this new model and the new partnerships with Azure and Amazon, they're expecting to hit hundreds of millions of dollars in revenue this year, which I seriously doubt if 1 million dollars per month is their current revenue rate. Whether they will survive or not, I don't know, but they're definitely taking an interesting, unique approach. That being said, the approach is not that unique, because, as we mentioned earlier, OpenAI and everybody else is going down the same path, and they offer it alongside other capabilities. So, in summary, Cohere is focusing on safety and providing tailored capabilities to the enterprise world. Is that going to be enough to fight off the big boys, who have much deeper pockets and are also going very hard after the enterprise space? I don't know. And since we've mentioned several different new models being released, or at least made available through an API, let's add another one. Mistral, the French company that has been releasing models and pushing the frontier of open source AI, just released their latest model. It's called Mixtral 8x22B, and it's expected to outperform Mixtral 8x7B, their previous and very capable model, and it's supposed to be better than the latest models from OpenAI, Meta, and Google. It's trained on a much larger number of parameters, 176 billion, and it has a 65,000-token context window, which is about half of OpenAI's, but bigger than their previous model's. As I mentioned, this war keeps intensifying, with Google Gemini 1.5 Pro, which we're going to talk about later, OpenAI most likely releasing GPT-5 or GPT-4.5, depending on what they decide to do, sometime in the near future, and Meta's Llama 3, which is expected later this month or early next month. So definitely the war is on. Now, if you ask yourself how these models are developed, and how they're developed so fast: they're developed based on a lot of data. A recent article by The New York Times reported that leading AI companies like OpenAI, Google, and Meta resorted to legally questionable methods to acquire more training data for their models. As an example, they say OpenAI has transcribed over 1 million hours of YouTube videos to train GPT-4, despite knowing that it probably violates the copyright of those videos. Google and Meta did similar things, transcribing YouTube videos and using them to train their models. Now, you may think that because Google owns YouTube, they can do that, but the reality is they cannot, because when you upload a video to YouTube, you agree to terms and conditions that say they will not do that. So Google actually went ahead and tweaked its privacy policy to expand data usage rights for itself, which was done after the fact, meaning people who uploaded videos two years ago never agreed to their videos being used to train Google's models. But that's not even the end of the story. Based on current projections in this article, these companies are outpacing the pace of content creation, meaning they're training models on more and more data faster than humans are generating it, and they may consume all available content by 2028. So the question is, how do you keep training the models to be better if you're running out of data, and definitely running out of access to data, because that's now in the spotlight and you may not have access to everything you want?
Part of the solution is training these models on synthetic data, data generated by the models themselves based on previous data, in order to train newer models. There are multiple companies pushing that frontier as well. Is that going to yield the quality of results these companies are expecting? I don't know, but that's where we're going. And now let's talk about Meta. In the past two weeks we talked a lot about the very aggressive poaching of employees between the big players for AI development talent, and the bigger loser this past week was Meta. Meta lost three of their top AI employees to X, which is Elon Musk's company. I hope I'm not butchering their names: Devi Parikh, who was Meta's senior director of generative AI; Eric Major, the company's senior director of engineering, who led its machine learning research team; and Das, a research scientist on Meta's fundamental AI research team. So three major players at Meta have left for X. That's in parallel to the news that Mark Zuckerberg himself is attempting to recruit researchers from Google DeepMind to fill the gaps. And on the same topic, Elon Musk this week accused OpenAI of aggressively recruiting Tesla engineers with high-paying offers, leading Musk to increase salaries for top Tesla engineers. So on the talent front, the war is also very much on. Now, since we mentioned Meta, let's talk a little bit more about them. Meta has changed their deepfake playbook, basically how they approach deepfakes. The main change is that they went from deleting or disqualifying AI content to applying a made-with-AI label to more types of AI-generated content, including deepfakes, and adding additional contextual information labels to content that has been created or manipulated with AI and that poses a high risk of deceiving the public on important issues. So they're shifting away from deleting content to sharing that content while letting people know it's AI-created. They believe, and I'm quoting, it's a better way to address this content. Whether that's true or not, I don't know. I think they just understand that it's going to be impossible to fight the creation of AI-generated content, and if they want to stay in the game of content sharing, which is their only game, they have to allow this content, and they will just label it. Can they really catch all the content that is AI-generated? I don't know; I think it's very questionable. But I think that's the world we're walking into, where, as I've said many times before, we will not be able to tell what's real and what was created with AI, which is extremely problematic from a social perspective. If Meta at least is making efforts to identify and detect this kind of content and let us know that's the case, and hopefully the other players do the same thing, we at least have a fighting chance to benefit from this technology while avoiding some of the very serious risks it represents. And from that, to a lot of news from Google. Google just held its Cloud Next conference, where it revealed many new things it's doing in general and specifically in AI. Let's start with Code Assist. Google just unveiled its new code assistant capability, a rebrand of its previous Duet AI code assistance service, which adds to an already very fierce competition in code generation capabilities. We mentioned two companies just in the last two weeks, one open source and one closed, that are at the frontier of that, together with, obviously, GitHub Copilot and others. So what are the key features of their code generation assistant? First of all, it includes an incredible token window of 1 million tokens, way larger than anything else in the industry, which allows you to upload and work with significantly larger blocks of code.
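To get a feel for what a 1-million-token window means in practice, here is a rough back-of-the-envelope sketch. The 4-characters-per-token figure is a common rule of thumb for English text and code, not an exact tokenizer measurement, so treat the numbers as an estimate:

```python
# Estimate whether a set of source files fits in a model's context window,
# using the rough heuristic of ~4 characters per token.

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Approximate token count of a piece of text."""
    return int(len(text) / chars_per_token)

def fits_in_window(files: dict[str, str], window: int = 1_000_000) -> bool:
    """Check whether all files together fit in the context window."""
    total = sum(estimate_tokens(src) for src in files.values())
    return total <= window

# Under this heuristic, a 1M-token window holds roughly 4 MB of source text,
# so a small repository fits with plenty of room to spare:
repo = {"main.py": "x = 1\n" * 10_000}  # ~60,000 characters, ~15,000 tokens
print(fits_in_window(repo))
```

That is why a 1M-token window changes the workflow: instead of feeding the assistant one file at a time, you can hand it a large slice of the codebase at once.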
It can also be fine-tuned on a company's internal codebase, similar to Copilot Enterprise, which was released a few months ago. It can be used with codebases across on-premise GitLab, GitHub, and Bitbucket, which are the main repositories where code lives, and it integrates with knowledge bases such as Stack Overflow, Datadog, Elastic, and others. So overall, an extremely powerful code generation model. You've heard me say on the show many times before that writing code is one of the most advanced capabilities of these models, because it's a very well-defined universe, and hence these models, if trained properly, become very good at it. Now, you've also heard me say that the problem this is going to create is that we may not need junior coders in the near future. And that raises the question: if we don't have junior coders, how will we have senior coders, and after that, system engineers? Google also announced Gemini Cloud Assist, to help teams design, operate, and optimize their application lifecycle through AI-generated architecture configurations and diagnostics. Where is this going for the whole ecosystem of humans writing code, designing systems, and creating software? I don't know if anybody knows, but we're walking with open eyes, and very fast, into very serious disruption in that field. Google also made several big announcements about changes in Google Workspace, which is their platform for businesses. I use Google Workspace in my business, and I also have the AI capabilities on top of it. One of the things they released is Google Vids, a new AI-powered video creation application for work that can create videos through storyboarding and scene creation with voiceovers. These videos are meant for business-related uses like training videos, marketing videos, and so on.
This is expected to be released in June of this year to Workspace Labs. Another interesting feature they released is automatically translated captions in Google Meet. This enhances a capability that existed before: previously they supported only 17 languages, and now 69 languages are going to be included, which means you can have a Google Meet with multiple people around the world speaking different languages and get real-time translations of everything being said. They've also added an AI security add-on for 10 dollars a month that is supposed to identify, classify, and protect sensitive files in Google Drive using AI. They've also released two very cool new features for Gmail. One of them is voice prompting: if you're on the go, whether you're driving or walking your dog, you can prompt your Gmail account through your voice, on your phone, to write an email for you on a specific topic, which I find really cool. The other is instant polish, which allows you to take a bunch of notes you scribbled down and turn them into a draft of an email within Gmail. They also announced a lot of other goodies within Google Docs and Google Sheets, but we don't have time to go into all of them; just go to their announcements and you can see everything else they've released. Now, Google also announced an update to Gemini 1.5 Pro, which is their free version and is currently better than their paid Ultra version. Gemini 1.5 Pro just got the capability to hear, which they claim allows it to understand and extract information from audio files without transcribing them. So far, we had to transcribe audio in order to analyze the data in it with AI models, and they're claiming that Gemini 1.5 Pro doesn't need to do that, because it literally understands the conversation as it is happening. Google is also updating Imagen, their text-to-image generation model.
That was at the center of a big controversy with the whole woke approach that we discussed a few episodes ago, but now it has new features like inpainting, meaning the capability to change things within an image; outpainting, meaning painting what lies outside an existing image; and removing and changing elements in images. All these capabilities did not exist before and are coming in the new Imagen text-to-image version. Imagen 2 will also come with their new SynthID, a digital watermark feature that allows detection of images that were created with AI. The only problem is that it works only on Google's platforms, meaning it cannot detect images from all the other tools. I really hope that sometime in the near future, either through government action or through collaboration between these big companies, there will be an agreement on a standard digital watermark that cannot be removed from these files and that will allow them to be detected easily across every single platform. Another thing Google shared is that they're working on a way to keep Gemini's responses up to date using Google Search, addressing one of the biggest problems of current models, which are always looking at outdated information. If Google can combine its training data with real-time search, that solves this problem, which is a big problem with most models right now. And now to another thing that's very practical for all of us: Google just released a 45-page prompting guide to help users effectively use Gemini, which works in its chatbot and in Workspace apps like Gmail and Google Docs.
There's nothing really new there for people who already know how to prompt, but the main elements are: define the persona, basically who the chatbot needs to be in order to do the specific task; define the task in as much detail as possible; provide as much context as possible to the model; and define the format in which you want the outcome. It also recommends speaking very naturally and breaking down complex tasks into multiple iterative steps. Again, nothing new, but it's highly detailed, with a lot of examples covering specific business use cases. So if you're not very good at prompting and you want to learn, this could be another very good resource. There has been a similar guide from OpenAI for a very long time now, and about two weeks ago Anthropic released a similar guide of their own, so there are a lot of free resources to learn from on how to get better at prompting. And the last thing I want to mention about Google, which will lead us to the last topic of this episode, is that Google just announced the development of its own ARM-based CPU called Axion, which is going to be optimized, you can guess, for AI. The chip shortage and the need for these AI chips has driven NVIDIA to become a 2 trillion dollar company, because they had the best capability to both train and run these models, but now all the big players are developing their own chips. Google's Axion is supposed to provide 30 to 50 percent better performance at 60 percent better energy efficiency. This comes in parallel to similar announcements from NVIDIA, Intel, Amazon, and Microsoft. So all of these companies are now developing new chips that will hopefully, A, reduce the amount of energy these things consume, and, B, allow us to develop better and better AI models. On the same topic, Intel just announced the release of a new AI chip called Gaudi 3 at a conference they held in Phoenix last week.
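The elements from the prompting guide above can be turned into a reusable template. A minimal sketch follows; the wording of the example persona, task, context, and format is mine, made up for illustration, not taken from Google's guide:

```python
def build_prompt(persona: str, task: str, context: str, fmt: str) -> str:
    """Assemble a prompt from the elements the prompting guides recommend:
    persona, task, context, and output format."""
    return (
        f"You are {persona}.\n"
        f"Task: {task}\n"
        f"Context: {context}\n"
        f"Format: {fmt}"
    )

prompt = build_prompt(
    persona="a marketing manager at a small B2B software company",
    task="draft a follow-up email to a prospect who attended our webinar",
    context="the webinar covered AI-assisted reporting; "
            "the prospect asked about pricing",
    fmt="three short paragraphs, friendly but professional tone",
)
print(prompt)
```

For a complex job, the guides' other recommendation applies on top of this: instead of one giant prompt, run several of these templates in sequence, feeding each step's output into the next step's context.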
They're claiming that Gaudi 3 is 50 percent faster at training and 30 percent faster at inference on large language models compared to NVIDIA's H100 GPUs, which are the most commonly used GPUs right now. But as we know, NVIDIA is just releasing its newer model, which is already faster and better. So again, this whole thing is accelerating very fast. Staying on the same topic, Meta just announced a new version of their AI chip, which they call the Meta Training and Inference Accelerator, or MTIA for short. Version 2 obviously runs faster and better, on less energy, than version 1, the chip they had before. These chips are built specifically to help them rank results faster, using AI for Meta's regular, historical needs, and they're claiming that they plan to eventually expand these chips to be capable of training and running AI models like Llama 3. So, in summary, the chip front is also at full throttle, with Google, Microsoft, Amazon, and obviously NVIDIA and Intel developing new AI-capable chips and other accelerators. That's it for this week. I know this show was longer than usual, but really a lot of important stuff happened this week. The fierce competition across every aspect of the AI world is intensifying, and things seem to be moving faster and faster. Where is it leading? As we know, a lot of good things will come out of it, but there are also a lot of scary aspects. If you've enjoyed this episode, and if you like this podcast in general, I would really appreciate it if you share it with people. Literally pull your phone out right now, click the share button, and send it to four or five people you know who can benefit from the show. And while you're at it, already in your podcasting app, I would really appreciate it if you review this podcast; it really helps us reach more people and helps more people learn about AI.
So hopefully, together, we can drive it toward the better rather than the worse. On Tuesday, we'll be back with another fascinating deep-dive interview, talking about practical ways you can use AI in your business. Until then, have an amazing weekend.