Leveraging AI

73 | NVIDIA's Project GR00T to revolutionize humanoid robots, Sam Altman shares expected releases this year, and more important AI news for the week ending on March 24

March 24, 2024 · Isar Meitis · Season 1, Episode 73

In this episode of Leveraging AI, Isar Meitis discusses groundbreaking developments and the potential future of artificial intelligence, focusing on the implications for businesses and personal lives.

IMPORTANT: Don't miss the second publicly available cohort of the AI Business Transformation course, starting on April 1. We don't know when the next open-to-the-public cohort will be offered, so check it out here: https://multiplai.ai/ai-course/

Topics we discussed:

  • NVIDIA's Project GR00T and its potential to revolutionize the field of humanoid robots.
  • Insights from Sam Altman's interview with Lex Fridman, highlighting OpenAI's future directions.
  • Elon Musk's company xAI releasing its AI model Grok as open source.
  • Satya Nadella's perspective on Microsoft's independence from OpenAI.
  • Mustafa Suleyman's new role at Microsoft and its significance for AI integration.
  • Advanced video generation technologies from Google and Stability AI that could reshape media and raise deepfake concerns.
  • Apple's research on efficient AI training methods, promising less resource-intensive model development.
  • Devin.ai, a groundbreaking tool that could democratize software development.
  • Open Interpreter's new voice interface, hinting at the future of human-computer interaction.

Stay ahead of the curve in the AI revolution. 

About Leveraging AI

If you’ve enjoyed or benefited from some of the insights of this episode, leave us a five-star review on your favorite podcast platform, and let us know what you learned, found helpful, or liked most about this show!

Isar Meitis:

Hello and welcome to a weekend news edition of Leveraging AI, the podcast that shares practical, ethical ways to improve efficiency, grow your business, and advance your career. This is Isar Meitis, your host, and we have a very exciting week of news from the AI world: some really interesting news from the biggest personas in the AI world, as well as very interesting news from different companies and new developments.

Before we get started, I would like to remind you that our AI Business Transformation course's open public cohort is starting on April 1st, roughly a week from the time this podcast gets released. If you are interested in learning how AI can impact your business, or if you're looking for a way to advance your career alongside the developments of AI, this course is an incredible kickstart to that process. We teach everything from the basics all the way to frameworks for implementing AI strategically in your business across different departments, and everything that comes with it. We have been teaching these courses since April of last year. Most of the courses we teach are private sessions for companies and organizations that book them in advance, so we open courses to the public roughly once a quarter. We did one at the beginning of January, we're doing one at the beginning of April, and I'm not sure when the next one is going to be. So if this is something that's interesting to you, and it should be, check out the course; there's going to be a link in the show notes. Don't miss out, because again, I'm not sure when the next one is going to be: we're fully booked for the next three months with private, closed-session courses. Speaking of which, if you are the head of an organization, association, or company, and you're looking for a way to dramatically advance the AI knowledge of people in your organization, reach out to me on LinkedIn, and I will gladly let you know how we can set this up.

And now for this week's news. We're going to start with the biggest news of this week. NVIDIA came out with a huge announcement: they have announced what they call Project GR00T, which is actually not pronounced "Groot." It's spelled GR00T, with two zeros, presumably so they don't get in trouble with Marvel and Disney. GR00T is a new general-purpose foundation model for humanoid robots that is built on NVIDIA's software and hardware infrastructure. What they're developing is basically an operating system and infrastructure for humanoid robots. The goal of this platform is to create an infrastructure for all the capabilities of humanoid robots, including understanding natural language, mimicking human movements, learning new skills, and so on. All the infrastructure that is going to be required for any new development of humanoid robots would come from this NVIDIA platform. And as I mentioned, it will include both hardware and software, including what NVIDIA calls a System on a Chip, or SoC, that is optimized for performance and power specifically geared towards humanoid robots. In addition, they've announced multiple upgrades to their previously announced Isaac platform, a set of foundation models and tools that will allow other companies to develop new models and new AI capabilities on top of NVIDIA's architecture. They're expecting the new Isaac platform capabilities to be released next quarter, so this is not sometime in the very far future; this is coming in the immediate future.
If you have been following everything that's happening in the humanoid robots world, you know that this space is accelerating very fast, with some of the biggest players in the world jumping in, but also smaller startups coming in. This new architecture will enable new startups, and even existing established companies, to begin from a very solid starting point. It also places NVIDIA at a very interesting junction: not just for training and running models, which drove their huge growth so far, but also as the infrastructure for humanoid robots, which is the next frontier that will allow AI to address not just knowledge work and knowledge capabilities, but also to do things in the physical world and, if you will, address blue-collar jobs as well.

As I told you, there have been some very interesting interviews and announcements on the personal level from people at the top of the totem pole when it comes to AI development. I will start with Sam Altman, the CEO and co-founder of OpenAI, who was interviewed by Lex Fridman this week. I highly recommend listening to the podcast because there's a lot of nuance, but they talk more or less about everything. They talk about the issue of Sam being fired and then reinstated as the CEO of OpenAI, they talk about the relationship and the lawsuit with Elon Musk, and they obviously talk about new and existing developments at OpenAI. It's a fascinating interview if you want to understand what's going through the head of the person leading the charge of AI development in the world today.

I want to share two important aspects of this interview, the way I see it. The first one is that Sam is really driven by trying to make our world and human lives better with AI. After listening to this interview and other interviews he's done in the past, I think it's very clear he's very sincere in his wish to make humanity better using AI tools. I think it's also clear that he believes it will require some serious governance that may or may not exist right now, and he admits that controlling what these systems can do may have to happen through a company, an organization, governments, or international cooperation. But I think it's very clear, listening to him, that he's still not sure there's a clear solution for that, and yet they're moving forward very fast with development. So that's the conceptual side.

On the very practical side, he wouldn't release any clear information on when or what they're going to release, and he wouldn't say exactly what's coming in GPT-5. But he did say, when Lex asked him what he thinks about GPT-4, and I'm quoting, "it sucks," which really tells you that what they have right now is probably significantly better, or the things they're seeing for the future are way better, than what we see in GPT-4, which is still most likely the most capable model out there today. Some would say that Claude 3 is better, but GPT-4 is still a top-of-the-line model, and if Sam Altman is saying that it sucks, it means what they have right now is so much better that it will make a very big difference. He said that the jump from GPT-3.5 to GPT-4 is not even as significant as the jump we're going to see going forward from GPT-4.
But, as I mentioned, he wouldn't say exactly when things are going to be released, whether asked about Sora or about GPT-5. He did say that they're going to release something significant that is going to be "much smarter," and you can decide what that means, still this year. Specifically about Sora, he said that it still has a lot of issues. Despite the fact that it looks amazing and everybody's really excited, there are still a lot of issues they're fighting through, and it's still not good enough to be a product that can be released to the public. So overall, a very interesting interview, giving a glimpse into how OpenAI thinks about and approaches the development of new models, how much they feel the responsibility of doing it right, as well as the fact that we're most likely going to get a really advanced model from them sometime this year.

As I mentioned, as part of the interview, Sam Altman refers to his interesting and complex relationship with Elon Musk and everything that happened with Elon leaving and now suing the company. In very interesting timing, just as Sam was being interviewed by Lex about this topic, Elon Musk's company xAI released the source code and everything you need for their AI model Grok. They literally released everything they have other than the training data itself: the actual source code, the base model weights, and the network architecture of the 314-billion-parameter mixture-of-experts model, for the public to use however people want. That's obviously part of Elon Musk's position that these really powerful models have to be open source in order to ensure the future of humanity, the way he refers to it, or at least the benefit of all humanity, versus closed models that will benefit a few giant companies. That is the reason Elon Musk jumped in and financed the beginning of OpenAI in the first place: to be a counterweight to Google after they bought DeepMind and made it into a closed-source environment controlled by Google. He wanted an open source alternative, and that's why he joined the team that started OpenAI, and that's part of the reason for the beef between Elon and Sam Altman right now. As I mentioned, in very interesting timing, they've just released Grok. Now, Grok hasn't yet proven to be a very powerful model. They developed it extremely fast, which is interesting, but they haven't shown anything as impressive as some of the leading models right now. That being said, I will never bet against Elon Musk, because everything he has done so far has been extremely successful after he pushed it far enough and hard enough, and he's definitely pushing hard on this particular angle. As of right now, Grok is a fully open source model, one of a few that could be a very powerful player in the future.
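For the technically curious: since the weights and code are public, anyone can pull them down. Here is a minimal sketch of fetching the open checkpoint with the huggingface_hub Python library, assuming the weights remain hosted under the xai-org/grok-1 repository; the checkpoint directory pattern is an assumption for illustration, so check the repository README for the current layout.

# A minimal sketch of downloading the open Grok-1 weights with huggingface_hub.
# The repo id matches xAI's public release; the "ckpt-0/*" pattern is an
# assumption for illustration - check the repository README for the layout.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="xai-org/grok-1",
    allow_patterns=["ckpt-0/*"],  # assumed checkpoint directory name
    local_dir="checkpoints",
)
# Inference then runs through xAI's open-sourced JAX code at
# https://github.com/xai-org/grok-1, not a standard transformers pipeline.

Keep in mind that open weights don't mean you can run this on a laptop: a 314-billion-parameter mixture-of-experts model needs hundreds of gigabytes of disk and multi-GPU hardware, which is the quiet catch of the release.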
Another very interesting person in this AI race is obviously Satya Nadella, the CEO of Microsoft, who was quoted saying something very interesting about their relationship with OpenAI. Most of what Microsoft is releasing right now as part of their AI capabilities is based on OpenAI's ChatGPT architecture, infrastructure, models, and so on, and they're including them in everything Microsoft, from Copilot to future operating systems to Office 365 AI capabilities, and so on and so forth. There's been a lot of negative feedback from within Microsoft about the deep reliance on OpenAI. And obviously there were all the questions about OpenAI itself, with the firing and return of Sam Altman and the board's control over everything happening at OpenAI back at the end of last year, as well as the current lawsuit by Elon Musk, and so on. So Satya Nadella was just recently asked about it, and he said the following: "We have all the IP rights and all the capability. I mean, look, if tomorrow OpenAI disappears, I don't want any customers of ours to be worried about it, quite honestly, because we have all the rights to continue the innovation, not just to serve the product, but we can go and just do what we were doing in partnership ourselves. And so we have the people, we have the compute, we have the data, we have everything." Well, this is not surprising, but it at least puts out in the open that while OpenAI has obviously benefited from this partnership with Microsoft, Microsoft made sure they are fully covered in this relationship. And as Satya Nadella says, if OpenAI disappears tomorrow, Microsoft can continue from that point.

And in a very interesting development on the same topic: Mustafa Suleyman, one of the co-founders of Google DeepMind, who left DeepMind to found Inflection AI, the company that gave us Pi (you can go and visit the website at pi.ai). Inflection was developing a new kind of AI model meant to be more of a personal companion, one you can have real, deep conversations with. A lot of people really like Pi for that reason, because you can have a deep conversation with it, including over voice, which is very interesting and different. If you haven't tried it, I suggest doing so, because it will give you a glimpse of what these systems will probably do in the very near future. But Mustafa Suleyman just announced that he's leaving his position as co-founder and CEO of Inflection and moving to Microsoft to serve as the Executive Vice President of Microsoft AI, and he's going to be joined by several other senior leaders who are leaving Inflection together with him. He's going to report directly to Satya Nadella, which shows you the importance of this role. He is going to be in charge of everything AI at Microsoft, including developing future consumer products and AI efforts, which puts Copilot, Bing, Edge, and everything else they're doing with AI right now under one umbrella. It has been very clear that Microsoft is all in on AI and that everything they're doing is going to be AI-based moving forward. It has made them the most valuable company in the world right now, so that definitely tells you that, at least from a shareholder perspective, they're moving in the right direction. Putting somebody with that level of experience, on both the research side from DeepMind and the product side from Inflection, in this role is a very logical move, and I'm sure it will yield very interesting results for integrating AI into everything Microsoft does.

And from these big tectonic moves and announcements from big people to a lot of smaller research-related news from this past week, much of it video-related. Google researchers just shared that they have developed what they call VLOGGER, an AI system built to generate lifelike videos of speaking people.
So, an avatar of a person, including the gestures and movement and so on, created from a single photo, using that photo together with text to generate a complete video clip of that person speaking. They have trained the model on over 800,000 diverse identities and over 2,200 hours of video, which, per them, enables it to generate videos of people of varied ethnicities, ages, clothing, poses, and surrounding environments, without bias. Those of you who have been using these kinds of tools, like Synthesia and HeyGen, know that they are already very capable, and if Google has now developed the capability to do this at even better quality, it is very promising and, at the same time, very disturbing. The reason it's promising is that it allows you to create training videos, marketing videos, product-explanation videos, or any other video you want, with any person, for any target audience, very easily, without the need for cameras, lighting, editing, and so on. So the creation of videos for the specific needs we have today will become significantly easier. That being said, it also raises very significant concerns, because deepfakes will also become very easy: you'll be able to take a single image of any person and make them say whatever you want in whatever scenario you want to create. That's obviously very troubling, because right now, and I don't see that changing in the near future, there's no reliable way to detect these videos. Today they're not fully realistic and you can tell they're not real, but sometime in the next six to eighteen months we will not be able to distinguish between a real video and a fake one, and those of you who have seen Sora know what I'm talking about. So combine that with the research capabilities of Google, and you have the perfect storm: on one hand, amazing efficiencies; on the other hand, very serious risks of deepfakes.

Staying in the field of video generation, Stability AI just released what they are calling SV3D, which stands for Stable Video 3D, an AI model that lets you generate 3D videos, building on top of their previous model, Stable Video Diffusion. You can do one of two things with SV3D. You can create a 360-degree orbital video of, say, an e-commerce object; the trick here is obviously keeping the object consistent through all 360 degrees without actually having the object, and that is the core of the development and innovation they are sharing. You can do a 360-degree video of a thing you want to sell without having the thing and without the ability to shoot a 360-degree orbital video. The other part of this model lets you define the 3D camera path around the object and have the camera move however you want, rather than just a 360-degree orbit, and still shoot the 3D object while keeping it consistent and having the background move in a way that looks realistic. So we are moving toward a situation where object consistency, which is the biggest issue with creating videos with AI, is getting resolved from multiple angles, whether by Google, Stability AI, or Midjourney.
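SV3D itself shipped as research code rather than a plug-and-play library, but the image-to-video workflow it builds on is public. Here is a minimal sketch using the earlier Stable Video Diffusion checkpoint through Hugging Face's diffusers library; the input file name is a placeholder, and SV3D's orbit-specific camera conditioning is not part of this sketch.

# A minimal sketch of single-image-to-video generation with Stable Video
# Diffusion, the model SV3D builds on. The product photo path is a
# placeholder; SV3D adds camera-orbit conditioning on top of this workflow.
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",  # public SVD checkpoint
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# One still photo of the product; no physical 360-degree camera rig needed.
image = load_image("product_photo.png").resize((1024, 576))

# Generate a short clip animating the scene around the (consistent) object.
frames = pipe(image, decode_chunk_size=8).frames[0]
export_to_video(frames, "orbit.mp4", fps=7)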
Speaking of Midjourney, they announced that they are working on new 3D video and real-time model creation that will allow simulating the entire world. Those of you who have seen the Sora examples from OpenAI and have followed the discussion know that the discussion basically said that what Sora is doing is not just generating video; it's actually simulating the real world, hence allowing it to generate these highly realistic videos. And this is exactly the direction Midjourney announced they're going. For those of you who don't know, Midjourney is currently the top image-generation model as far as creating realistic images, and they shared this news as part of their office hours on Discord, where people ask them questions and they share news about what they're working on. This new development is focused on creating a real-world model that will let people create video games, shoot movies, and do everything they want in a sandbox of the real world. They also shared that the jump from version six to version seven of Midjourney is going to be significantly bigger than the jump from version five to six. So big, new, interesting developments are coming from Midjourney. They also shared that the capability to create 3D models is most likely going to arrive before the capability to create videos. And this comes shortly after they released another capability in version six: the ability to create consistent objects and people across various images. As I just mentioned, the biggest thing here is the ability to create consistent objects, people, items, and backgrounds across a video and across various angles, because that is the key to creating highly realistic video. Looking at Sora and listening to this kind of news about what's in development, this is the direction we're going, and we're most likely going to have this capability, still this year, from multiple companies.

And speaking of research, a company we don't talk about a lot, but which is definitely doing a lot, at least behind the scenes, is Apple. Apple just released interesting research in which its researchers developed what they call MM1, a set of large language models. The main point of the research was to show that the data used to train a model is as important as the amount of compute or the number of parameters. They were able to achieve state-of-the-art performance by training on coupled data, such as image-text pairs and voice-text pairs, and by doing so achieved very successful results with significantly fewer parameters and significantly less compute. In the past few weeks, we've shared several different companies and research efforts showing multiple ways to achieve the results these models are achieving, or will need to achieve on the road to AGI, while investing fewer resources. That's very important, because right now training and running these models requires a huge amount of compute and resources, and hence has a very significant impact on the world. Learning how to do this more efficiently, with methods like the ones just shared by Apple's researchers, is a very important step in the right direction.
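Apple hasn't released MM1's code, but the core idea of learning from paired data can be illustrated with a toy, CLIP-style contrastive sketch in PyTorch. Everything here, names, dimensions, and the loss, is a generic illustration of training on image-text pairs, not Apple's recipe.

# A toy, CLIP-style sketch of learning from paired data (image, caption).
# It illustrates the general idea of coupling modalities during training;
# it is NOT Apple's MM1 method, and every name and dimension is assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyPairEncoder(nn.Module):
    def __init__(self, img_dim=512, txt_dim=256, shared_dim=128):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, shared_dim)  # image features -> shared space
        self.txt_proj = nn.Linear(txt_dim, shared_dim)  # text features  -> shared space

    def forward(self, img_feats, txt_feats):
        # Normalize so the dot product below is cosine similarity.
        img = F.normalize(self.img_proj(img_feats), dim=-1)
        txt = F.normalize(self.txt_proj(txt_feats), dim=-1)
        return img, txt

def contrastive_loss(img, txt, temperature=0.07):
    # Matched (image, text) pairs sit on the diagonal of the similarity
    # matrix; the loss pulls them together and pushes mismatches apart.
    logits = img @ txt.t() / temperature
    targets = torch.arange(len(img))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

model = TinyPairEncoder()
img_feats = torch.randn(8, 512)  # stand-ins for a batch of image features
txt_feats = torch.randn(8, 256)  # stand-ins for the paired caption features
img, txt = model(img_feats, txt_feats)
loss = contrastive_loss(img, txt)
loss.backward()

The point the sketch makes is the same one the paper argues: the pairing itself is a training signal, so well-chosen coupled data can substitute for some brute-force scale.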
And now, two small but very interesting releases that happened this week. One is Devin.ai, which was announced as what they call the first AI software engineer. It can generate complete source code from a single prompt; it can generate hundreds of lines of code, perform the debugging, and handle the deployment of the code. This is a very big step forward from the code generators we had before, which could generate snippets of code that you had to put together and wire up yourself. The other very interesting thing about this tool is that it's basically an agent geared towards creating code: it can search the internet and go through tutorials to learn how to accomplish different tasks, and troubleshoot issues with its existing code or code that you give it. So it is a software engineer in a box. It can really do all the steps of the process, including developing new algorithms based on existing ones, by learning how they work, understanding the needs, understanding the gaps, searching the internet, and then creating new code based on all of that information. The direction is very clear: in the future, creating code and creating software will be something extremely different from what it is right now. I don't know how far into the future, and I don't know how much of software development will change, but most likely almost every person who wants to will be able to create new applications and new software using natural language, with everything else that comes with the software done by the AI. Will this actually replace software engineers on a large scale? I don't know, but this new tool definitely hints that this is the direction things are going, and I think that's going to be very interesting. The biggest difference will probably be between major software and the small applications everybody will be able to develop: major software will probably still require serious architecture and planning beyond what AI can do, at least for now, but that doesn't mean AI won't be able to learn that as well in the next few years.

And the last thing: a company, or group, called Open Interpreter has released the open source code for a new, very interesting tool, a voice interface that controls your home computer. It's a little device that you can wear or hold in your pocket, however you want, that understands your voice when you speak to it and can take actions from your home computer with everything in it. That includes access to the internet to search anything you want, the ability to connect to your emails, your calendar, and basically any piece of data you have on your computer, and to take actions on your behalf while you are not next to your computer, or, in theory, while you are next to it; it's just an easier user interface. I think the direction this is going is very clear: we will be able to do a lot more by just talking to computers. Exactly what the interface will look like, whether it's going to be a wearable, our phones, or something that will replace our phones, and whether it's going to talk to the cloud or to our personal computers, or maybe all of the above, is something that will still take time and will evolve into something that I think will become common, just like cell phones are now common. But the direction is very clear: natural voice communication will control everything in our future lives, and it's just a matter of time until this becomes a product we use every single day.
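The software side of this is open source and pip-installable today. Here is a minimal sketch of driving the core Open Interpreter library from Python, based on the project's documented pattern at the time; treat the exact import path and settings as assumptions and check the repository if they've changed.

# A minimal sketch of Open Interpreter's core library (pip install open-interpreter).
# The import path and the auto_run setting follow the project's docs at the
# time of this episode; treat both as assumptions that may have changed.
from interpreter import interpreter

# Keep the safety default: ask before executing anything on this machine.
interpreter.auto_run = False

# The library turns the request into code, runs it locally with your
# confirmation, and iterates on errors - the loop the voice device wraps.
interpreter.chat("List the files in my Downloads folder and summarize them.")

The interesting design choice is that the intelligence lives on your computer, not on the wearable; the device is mostly a microphone and a speaker pointed at this loop.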
That's it for this news edition. We are coming back on Tuesday with a fascinating interview episode, deep-diving into a specific topic, so don't miss that. And a final reminder: the AI Business Transformation course is starting in about a week. Don't miss out on that, because the next one may be months away. So check out the link in the show notes, and until then, have an amazing rest of your day.