Leveraging AI

90 | How To Use AI to Chat With Your Data with Joanna Stoffregen

May 21, 2024 Isar Meitis, Joanna Stoffregen Season 1 Episode 90

Joanna is an expert in leveraging Large Language Models (LLMs) combined with RAG (Retrieval-Augmented Generation) to turn data into knowledge and action. Discover what it really takes, and the hidden but high costs of NOT implementing such solutions.

This webinar is tailored for business leaders who are considering or are in the midst of deploying LLM technology. We will cover every step of the project lifecycle, from initial planning and prototyping to testing, deployment, and ongoing maintenance. Joanna will share detailed examples and insights on tool selection, highlighting the pros, cons, and lessons learned from her extensive experience.

Expect to dive deep into the real costs and implications associated with NOT using AI to understand the insights hiding in your data. You'll gain invaluable perspectives to guide your decisions.

Joanna's recent viral posts on LinkedIn have sparked a widespread discussion on reducing LLM costs and optimizing project strategies. In this session, she will expand on these themes, providing clarity and actionable advice.

About Leveraging AI

If you’ve enjoyed or benefited from some of the insights of this episode, leave us a five-star review on your favorite podcast platform, and let us know what you learned, found helpful, or liked most about this show!


GMT20240516-154739_Recording_as_3440x1440:

Hello everyone, and welcome to a live episode of Leveraging AI, the podcast that shares practical, ethical ways to leverage AI to improve efficiency, grow your business, and advance your career. This is Isar Meitis, your host. Maybe the question I get asked the most by people who are a little more advanced and have started playing with AI is: how can I chat with my data? I have all these data sources: data in my CRM, data in my emails, data in my ERP, data in my Google Analytics, et cetera, et cetera. I want to be able to chat with my data and ask holistic questions, and I don't know how to do that. And most people don't know how to do that. But the reality is, beyond the fact that I get asked this a lot, it's a huge need. Not being able to access your data in an effective way actually costs businesses a huge amount of money in overhead, mistakes, missed clients, lost clients, and so on, just because they cannot get the right information accurately and in a timely manner. That is exactly the problem that the capability to chat with your data, across everything, solves. So how do you do that? I'm really excited to tell you that our guest today, Joanna, is an expert on exactly that. Her agency helps businesses develop these kinds of solutions. But beyond the custom-developed solution, we're also going to talk today about other approaches you can build on your own with tools that exist in the market. You can start very small, test things out, and then slowly upgrade to more and more advanced capabilities. This is maybe the holy grail for businesses when it comes to using AI and machine learning: the ability to really know what's going on across everything you do in your business, from finance to customer service to marketing to projections to emails, et cetera. And hence, this is obviously a very important topic.
So I'm really excited and honored to welcome Joanna to the show. Joanna, welcome to Leveraging AI. Hello, hello. Thanks for having me, excited for today. Yeah, listen, you and I have been talking about this for a while, and the content that you share on LinkedIn on this topic is actually pure gold. I think when most people hear the technical aspects of this, it's called RAG, they go, okay, I don't know what that is. And once people start diving into the technical stuff, they get really scared, and so they stop, even though it could actually be very simple. I know you prepared a full step-by-step process to take us through, first of all, why, then how, and then other considerations like safety, security, and costs. I'm sure everybody who joins us is going to learn a lot. So let's get started. I will give you the stage. Thank you. So yeah, let's get started. I will now demystify RAG for businesses, and I hope you will get value out of it. Let me share my screen. Can you see it? Yes. And by the way, for those of you listening to this as a podcast afterwards who cannot see the screen, a few comments. First of all, join us on the next live. We do this every Thursday, almost every Thursday, at noon Eastern time, so you can join us, see the screen, and even ask questions and participate. That's number one. Number two, we're going to describe everything that's on the screen, so you can understand it if you're driving and so on. And number three, we have a YouTube channel under the Multiplai brand; it's "multiply" spelled with "ai" at the end instead of a "y". So if you want to watch this afterwards, once you're done walking your dog or doing the dishes or yoga, whatever it is you're doing while listening to this podcast, you can later go and watch it again on YouTube. But let's dive in. Let's dive in.
And I will also try to make it understandable for anyone listening to this as a podcast afterwards. Okay. So we start with the billion-dollar problem, and I want to introduce you to Julia. Julia is a product marketing manager who works for SoftTech.com, a tech company based in Silicon Valley with 300 employees. She works remotely from Berlin, and in her day-to-day she talks with a lot of teams to do her job: the design team, tech team, sales, marketing. Of course, she also uses a lot of apps for communication and task management: Gmail, Slack for team communication, Jira for product tickets, tasks in Asana, Figma. Basically, all these familiar apps that we all know, whether we work in tech or not. So today Julia is working on a promotional campaign to launch their new product feature, and she urgently needs a market research report that her colleagues prepared a month or so ago. She remembers that someone actually sent it on Gmail, so she looks there, but she cannot find it. Then she thinks, okay, maybe it's in the marketing Slack channel, so she searches there and doesn't find it either. Then she looks in Google Drive, thinking maybe it's in the marketing folder. No success there either. A common scenario; it happens to me all the time, even though I'm not working in such a big corporation. Then she remembers that her colleague Max might actually know where it is, because he's also involved in this project, so she decides to Slack him. She waits 20 minutes; there's no answer. Then she remembers that Max is currently in San Francisco, where it's 3 a.m., so it will take a long time before she gets her answer. Frustrated, she decides to move on to another task, but she has lost a lot of valuable time.
She's now behind on her project, and she might even miss the deadline for its submission. So, a very common scenario that a lot of us working in tech, in big companies or smaller ones, experience, right? And we have data to support that. Not our own research, but the International Data Corporation (IDC) found that 2.5 hours per day, on average, is spent searching for information. So imagine: roughly one out of five working days, the average worker spends searching through different apps to find the information they're looking for. On average, we use around 11 apps, and it takes around eight searches before we find the right information. And that obviously carries a lot of cost, because time is money, right? Their estimate is that a company employing a thousand people loses around 2.5 million dollars per year on exactly this problem of not being able to search for and retrieve the right information. That has real impact: not being able to find the right information for your project, like Julia; missing an important update because you did not get the information on time; failing to act on important updates to your project; and making wrong decisions because you can't find the data. It costs companies a lot of money. This is the billion-dollar problem that RAG solves, and that companies like Glean help to solve. Glean is a 2.2-billion-dollar company, so they're working on a very good problem. And just to explain on a high level what Glean does, and not just Glean, there are a few companies that do the same: they help employees like Julia search through their company data and retrieve the relevant information. So for instance, we can see here an employee who is looking for the status of a project.
Normally, he would search Slack, search Google Docs to find this status, maybe search the GitHub repository to see the code commits. But with the solution that Glean and similar RAG-based technologies provide, he just asks a query, and the system gives him all the information, in natural language, in one place. So imagine: you don't wait, you can directly ask the system anything you want, because it knows what your company knows. This is also their promise: know what your company knows. Yeah. I want to touch on one short thing from everything you said. Every single company is sitting on a goldmine of data that is just not accessible today. And even smaller companies: I know a lot of people who are listening don't run billion-dollar companies; they run a 20-million, 50-million, or 100-million-dollar company, and they say, I don't have any unique data or any proprietary data. And that's not true. Because every email of every employee, every phone call you have recorded for customer relationships, every proposal you've ever written, every piece of data in your CRM or your ERP, all of that can be helpful in making future decisions and understanding what's currently happening in the business. And right now it requires a lot of time and effort to do that analysis across these platforms, just because the data is built into silos. So yes, we do this every now and then: a project manager will spend time once a month or once a quarter writing a review by collecting all that data. It's time-consuming, it's not a hundred percent accurate, and it's not accessible in real time when you need to make decisions; it's only there once a quarter, when the project manager did that review.
And again, it may or may not surface information that other people need. Having a solution like this literally enables, think about having a Slack channel with your entire data, where you can ask any question about anything, and you don't care where it's stored or what the source is. And this is exactly what this thing does. Then every employee in the company benefits: if you're the customer service person, you can find the answer you need for the call you're on right now; if you're the VP of marketing, you can understand how the recent strategies you deployed are doing; and if you're the CEO, you can understand trends across everything happening in the company. What's happening? What's not moving fast? Literally any question you want to ask, if you connect the right data sources, you can ask and get accurate answers in seconds. So it's a game changer compared to everything that we know. Exactly. And another very common use case is salespeople: before a call with a lead, they need to access a lot of information in their CRM, in Slack, and in their product to really understand what the lead and their company are about, and they spend a lot of time on that. Now, with RAG systems, you could just write, hey, give me all this information about lead X, and it pops up for you. So, super helpful. Yeah. And you don't need to be a hundred-million-dollar company or whatever. I know companies with 10 or 20 people that have exactly the same problems; it's super helpful for any company size. So, exactly, now: what is RAG? RAG stands for Retrieval-Augmented Generation, and it's essentially a way for large language models like ChatGPT to be aware of data they have not been trained on.
So for instance, ChatGPT is trained on all of this public data, but it's not trained on your Slack conversations, your company data, your CRM data, your Jira tickets, the data that is proprietary to you or your company. RAG is essentially a way to make a model aware of this data. And of course we have the problem of hallucinations, and of models not being up to date in their training data; RAG helps with that as well. To better understand this, I want to showcase an example, a case study we were working on. We developed, for car mechanics who repair Mazda cars, a way to get repair instructions from the system. Their problem was that it would take the mechanics very long to search through the Mazda instruction manuals to find out how to repair a particular problem with a car. And that had huge business impact: unhappy customers, because it would take a long time until they received their repaired car, and often the mechanics ordered the wrong repair parts because they found the wrong information about what needed repairing. Yeah. So you can see here: let's say a mechanic asks how to fix a broken airbag on a Mazda G4 car. If you ask this question to ChatGPT, it will give you just generic information, and then it will also prompt you: hey, ask a specialist. The system we developed is a RAG system, so it has all the Mazda manuals, and it's able to tell you exactly how to repair, let's say, a broken airbag for that particular Mazda car. So now, instead of searching manually and making all these mistakes, the mechanics have a system that directly tells them how to do the repair easily.
So this is basically what RAG is and how it is able to become aware of your data, and an example showcasing the difference between a normal chat with GPT, which might have general knowledge about car repair but not specific knowledge about the particular problem we're looking to solve here. Yeah. I'll touch on two things you said, just as a quick clarification for everyone. One, I like to say about all these large language models that they follow the common saying: they're the jack of all trades but the master of none. As you start deep-diving into specific topics where you really need expert advice, they usually don't know the data. If you're trying to go deeper and deeper on a specific professional topic, they just don't have it, and even if they have it, they may not have it for your business, your company, or your situation. So this is one thing RAG comes to solve: you give it the data you need it to know and then ask it to retrieve just from that data. And as Joanna said, the other thing it addresses is hallucinations, the level of accuracy. It will probably still hallucinate to an extent with RAG, but at a significantly lower rate, with a much higher level of accuracy in providing you the right information, compared with just using a market-available large language model. So how is the magic done? Take us through the process. Yes. On a very high, non-technical level: there is a user who asks a question. In our case, the mechanic asks how to fix a broken airbag on a Mazda car. Now, the system has all the Mazda manuals stored in a vector database, right? And the system will semantically search the database for the relevant chunks, the relevant documents, for the question asked. You can see here, it might search three, four documents.
They hold the specific information needed. So that's the retrieval part. The augmented part is where the user query is augmented with the information retrieved in the first step. And the last step, the generation step, is where the system generates the answer, right? So remember the example from before, where you ask a question and the answer might live in very different sources: these are the different sources, and what you as a user actually get is the answer generated from those different sources. So how do you... Yeah, no, it's a great explanation. I think what we're missing here for the common person, and I assume people are not technical, is that the combination of the words "vector database" gives them diarrhea. So how do you get the data from your CRM, from your emails, from your Slack channel, et cetera, into a vector database that the system can then retrieve from? Yeah, sure. So first, a vector database is storing, how would I say it, the semantic information that is relevant to the query. Let me just take it from the beginning, how you would go about it. You have all this data, and you need to load it into the vector database using data loaders from frameworks like LlamaIndex or LangChain. You will always need to pre-process the data: you need a unified format, usually Markdown. Then you need to split the data into chunks, so that it's easy to create embeddings from those chunks. Then the embeddings are stored in the vector database, so the system can retrieve from it. In the vector database, these embeddings basically capture the semantic meaning of the text. So what is semantic meaning?
Let's say "king" and "queen": they will sit at kind of the same level, they have similar semantic meaning, as opposed to "sound", which is not relevant to king or queen. So that's semantic meaning. Yeah. Sorry, back to the vector databases. I want to make this a little more specific. Can the common person use tools for this? For people who don't know LangChain and don't know how to do this from a technical perspective, are there easy ways, something a common business person can do, or specific tools they can use, to get to that outcome? Maybe as a first step, right? Let's do a test case, see that it's actually working, see that it's giving us something, and then come and hire a company like yours to do the more complex, sophisticated solution. Yeah, sure, exactly. There are levels to this. So first, as I mentioned, we need to collect and prepare the data we have available. It could be internal data, your PDFs, invoices, all of that, but it could also be external data: websites, blogs, or YouTube videos and their transcripts, and so on. And here is how you can get started very easily. First of all, if your use case is that you want to chat with your company knowledge data, so basically use RAG for your company knowledge, use out-of-the-box tools; don't try to develop this from scratch. Use tools like Glean or similar. But it depends on your use case. For instance, if you have a couple of documents with your company policies, let's say you want to train a chatbot for the users on your website to ask questions about your policies, about how your refunds work, or about your products, you might just prove and validate the use case with a simple GPT. So we start very simply, with a simple GPT from OpenAI.
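(For readers of the transcript: the load, chunk, embed, store, and retrieve steps Joanna just described can be sketched in a few lines of Python. This is a toy, self-contained illustration only: the embed() function below is a stand-in letter-frequency vectorizer, not a real embedding model, and a real system would use LangChain or LlamaIndex with a learned embedding model and a vector database such as Weaviate, Pinecone, or Qdrant.)

```python
# Toy sketch of a RAG ingestion + retrieval pipeline.
# embed() is a stand-in so the example runs with no dependencies;
# real systems use a learned embedding model that captures semantics.

import math

def chunk(text, size=40):
    """Split a document into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text):
    """Stand-in embedding: normalized letter-frequency vector."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

# "Vector database": chunks stored alongside their embeddings.
docs = [
    "To replace the airbag, first disconnect the battery.",
    "Oil changes are recommended every 10,000 km.",
]
store = [(c, embed(c)) for d in docs for c in chunk(d)]

def retrieve(query, k=2):
    """Retrieval step: return the k chunks closest to the query."""
    q = embed(query)
    return [c for c, v in sorted(store, key=lambda cv: -cosine(q, cv[1]))[:k]]

# Augmentation step: retrieved chunks are prepended to the prompt
# that the generation step would finally send to the LLM.
context = retrieve("how do I fix a broken airbag?")
prompt = "Answer using this context:\n" + "\n".join(context)
```

The three RAG steps map directly onto the code: retrieve() is retrieval, building the prompt is augmentation, and sending that prompt to an LLM would be generation.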
So take the documents and just upload them into a GPT, and validate the use case there: does it answer my questions correctly? Is it actually useful? And then, let's say you want to get buy-in from stakeholders, that's the perfect way to show them: hey, look, it works, it answers the questions that our customers ask us, and right now we have a person answering them. So that's level one. Before you go to level two: for those of you who don't know what GPTs are, OpenAI offers them on their platform within ChatGPT. By the way, until a few days ago this was only in the paid version; as of this Monday it's available to everyone, which is a great thing. It's a magical tool that they have now made available on their free platform. It allows you to develop these mini custom automations, which again sounds very technical, but you build them literally by giving instructions in regular words: I want you to allow the user to ask you questions about the information in the attached documents; questions they may ask might be one, two, three, four; I would like you to start the conversation by asking the user what they want to know about HR questions. And that's what it will do. It will say, hey, what do you want to know about HR? Then the person types, and if you've uploaded all the HR documentation into that GPT, which again is a simple drag and drop, all the stuff that Joanna talked about before happens in the back end, magically. You don't have to set up the data, structure the data, or convert the data. Literally all you have to do is drag and drop the HR documents, your employee handbook, your guidelines, these kinds of things, into the GPT, and that's it. People will be able to chat with that data at a very high level of accuracy.
Again, right now it costs exactly $0 to do, and after a little bit of training, figuring out how to create the GPT will take you minutes; including the training, it will take you an hour, and you'll have something you can use to chat with the data. The biggest problem with GPTs may be the amount of data you can upload to them. The HR example is a great example because there isn't a huge amount of data, but you are limited in how much data you can upload into these GPTs. Still, as I mentioned, it's one of the most amazing capabilities we have ever had in business. Definitely. Now it's completely free. Not that I think $20 a month should make a difference, but now you don't even have that excuse. So it's definitely worth checking out and testing, and as Joanna said, it's a great way to do a test case and see what results you're getting before you go to level two. So now let's talk about what level two is. Yeah. And before we go to level two: if you still have issues with data privacy, you don't want to share your HR data or whatever, you can actually generate synthetic data, modeled on your real data, and create a GPT with that to still validate the use case. So creating synthetic data might be viable as well. Sorry, level number two. At this level, you could use a drag-and-drop tool like Stack AI, for instance. It's no-code/low-code, and you can create this RAG system or RAG chatbot by drag and drop. They also do pretty much everything for you; it's just a more advanced version of a GPT, with more capabilities. And of course, if data privacy is a concern, you could actually host a tool locally, like Verba. Maybe we can show it. Do I share something now? I see a dog on your screen. Oh, here we go. Cool. They changed the user interface too. Yeah. Great. So basically, yeah, this is the no-code, sorry, not no-code,
basically, it's a way, if you have proprietary data, let's say you're a lawyer and you want to chat with your cases, but the cases are private data, to use a tool like Verba. So again, you upload your documents here and then chat with your documents. Yeah. They just changed the user interface. Crazy. Okay, anyway. So yeah, that's level two. And of course, level three is where you build it from scratch, like the solution I showed you with the car mechanics; that was built from scratch. There you use frameworks like LangChain or LlamaIndex. You need an embedding model to create the embeddings, and you store the embeddings in a vector database; you can use different ones, Qdrant, Pinecone, Weaviate, some of the most famous. We also used Streamlit for prototyping, so you can create prototypes like the one I showed you in the case study before. I want to summarize the three levels very quickly. Level one is using an existing model. You can use a GPT in OpenAI, or even just use Claude and upload documents, but that will be a one-time thing; every chat, you'll have to upload the documents again. By the way, if you want to chat with a lot of data as a one-time thing, Gemini 1.5 Pro from Google, as of this week, for developers who sign up, has a 2-million-token context window, and I'll explain what that means in a minute. The version that's available if you don't sign up for the new program is still 1 million tokens. One million tokens is about 700,000 words, which is a huge amount of data, way bigger than any of the other tools that exist today. The second best you're going to get is Claude, which has 200,000 tokens. And the cool thing about Gemini 1.5 Pro is that it's free.
So if you go to Google's AI Studio, you can upload documents and chat with those documents. So if it's a one-time thing, if you're working on a proposal and want to look at what's in the RFP or what's in the proposal so far, or if you're analyzing existing data and it's not an ongoing thing you'll need to do again and again, using a model directly is actually not a bad idea. Like I said, Gemini 1.5 Pro has two benefits: one, you can load a huge amount of data, and two, they're getting a very high level of accuracy in correct responses, north of 97%, which is obviously very helpful. So that's level number one. Level number two is using tools that were built to do exactly this. There are no-code tools that allow you to upload and connect various sources of information and then chat with that information. Some of them work in the cloud, meaning you actually give them the information and it gets loaded over there, and some of them, like Verba, work locally, which means you don't have to give up your privacy or think about who you're actually giving access to your data in order to chat with it. And then level number three is a custom-built solution, built specifically for your needs, based on your exact data, your use cases, and so on. The differences are obviously the amount of data you can upload and how customized it is to your needs; the flip side is how much time and money it's going to cost you to develop. And it's not a bad idea to go through all three steps: start with step one, make a quick test, see if it works; go to step two if that works well; and if you can't solve everything with these tools, then go and have somebody like Joanna build a custom solution to solve that problem. Yeah, no, I totally agree, a hundred percent.
If a tool or an out-of-the-box solution will do the job, don't even bother building from scratch. Yeah. There's a question. There's a question about Verba: is Verba an open-source option for DataStax? I don't know. What do you mean, from DataStax? That's probably a different tool. Verba is from Weaviate, actually, the vector database company; they created the Verba tool. It's open source, you can try it. But no, sorry for not being able to answer that. Okay, so let's continue. You're still able to see my screen, right? Okay. Oops. Ah, okay. So: what to be aware of when your RAG solution goes into production, besides the technical issues of not being able to retrieve things, for which there are very technical solutions. The main two things are, first, the inference cost when you scale these applications in production. For instance, generating one paragraph with GPT-4 costs the same money as generating a whole book with a Mistral model. And why am I saying that? Because for some RAG use cases, at least in my experience, you need to use models like GPT-4. Now, GPT-4o is actually half the cost, so it's a bit better. But I need to click that, and I can't click that. I will add something to what you said while you're opening the document. For those of you who don't understand how this works: behind the scenes, all these tools are actually using APIs to different large language models to generate the actual responses once you have the data in the database, and each and every one of these models, whether open source or closed source, has a different cost associated with using the API.
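(The per-token API pricing discussed here makes cost estimation simple multiplication. The sketch below uses illustrative placeholder prices and model labels, not any provider's actual rate card; check current pricing pages before budgeting.)

```python
# Back-of-the-envelope inference-cost estimate for a RAG app.
# Prices are illustrative placeholders, not real rate cards.

def monthly_cost(users, queries_per_user, tokens_per_query,
                 price_per_million_tokens):
    """Total monthly API cost in dollars for token-based pricing."""
    total_tokens = users * queries_per_user * tokens_per_query
    return total_tokens / 1_000_000 * price_per_million_tokens

# Example scale from the discussion: 300 users, 10 queries per month
# each, with token-heavy queries because retrieved context is
# stuffed into every prompt.
for model, price in [("pricey frontier model", 30.0),
                     ("mid-tier model", 2.0),
                     ("small open-source model", 0.25)]:
    print(f"{model}: ${monthly_cost(300, 10, 50_000, price):,.2f}/month")
```

Running the loop shows how the same workload can differ by two orders of magnitude purely on model choice, which is why routing cheap queries to cheap models matters at scale.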
And when Joanna says inference: inference is when these models are generating results. The actual generation of content, in tokens, is called inference; that's basically the generation of anything in these models. And you pay per token, which again is about 0.7 words. It doesn't matter why it's this way, just take it as it is. The price range varies dramatically between these models. In some cases, for a million tokens, so about 700,000 words, you will pay 20 cents, like with Mixtral 8x7B, an open-source model from a French company. Llama is another open-source model, from Meta, with similar pricing. If you go to Claude 3 Opus, which is right now the most advanced model from Anthropic, that's $70 instead of 20 cents for the same amount of generation. Quality-wise, you're probably going to get a higher quality, at least for now. So you've got to pick and choose in which cases you want to use which models, and I assume that's where you were about to go. Yeah, exactly. And I would have to say that the price will be commoditized at some point; the cost of intelligence will probably go to zero. We already see that with GPT-4o, which halves the cost with the same quality. Yeah, that's after GPT-4 Turbo already halved it from the original GPT-4. It's crazy. Yeah. Basically, as I said, I did not update my presentation on that, but this is how it is. So just to give you an example: a use case like the Julia example that we were talking about, with 300 users using the system 10 times per month, will cost with GPT-4 $14, almost $15, per user. So imagine a business that sells this; you also need to be viable. So yeah, it costs a lot, but you can also directly see how much less it is with smaller models. So one technique, basically... I want to go back to this just for a second to give the numbers for 300 users.
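To make the per-token pricing concrete, here is a minimal back-of-the-envelope cost calculator, using the rough output prices and the 0.7-words-per-token rule of thumb quoted in the conversation. Real prices change frequently, so treat the numbers as an illustrative snapshot, not a reference:

```python
# Illustrative output ("inference") prices per million tokens, as quoted
# in the discussion. These are assumptions for the sketch, not live prices.
PRICE_PER_M_OUTPUT_TOKENS = {
    "mixtral-8x7b": 0.20,    # ~20 cents per million output tokens
    "llama-open-source": 0.20,
    "claude-3-opus": 70.00,  # ~$70 per million output tokens
}

TOKENS_PER_WORD = 1 / 0.7  # rough heuristic: one token is about 0.7 words

def generation_cost(words_generated: int, model: str) -> float:
    """Estimate the dollar cost of generating `words_generated` words."""
    tokens = words_generated * TOKENS_PER_WORD
    return tokens / 1_000_000 * PRICE_PER_M_OUTPUT_TOKENS[model]

# 700,000 words is roughly a million tokens:
print(round(generation_cost(700_000, "mixtral-8x7b"), 2))   # 0.2
print(round(generation_cost(700_000, "claude-3-opus"), 2))  # 70.0
```

The same helper makes the 350x price spread between the cheapest and most expensive models obvious at a glance.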
It was over $4,000 a month with GPT-4, around $300 a month with GPT-3.5, and $159 with Mixtral 8x7B. So it's 20 times more to do it with GPT-4 Turbo. But sometimes you have to; there are use cases where these better models will just give you better results. But I agree with Joanna that this price is going to keep going down. Yeah. For instance, for that mechanic RAG use case, we needed to use GPT-4 too, because quality obviously matters, right? And of course, when you have few users, it doesn't even matter. But in case you scale with users, one solution is to have a router that, depending on the complexity of the query, routes it to a cheaper model. So this is one of many techniques. Then let's look into security. Very important, especially when you have a customer-facing RAG app. We had these examples: there was this chatbot that sold a car for $1, and it was actually legally binding; and also the example of an Air Canada chatbot that basically gave bad advice regarding a price. Yeah, exactly, there was this refund that the user wanted, and it gave it to them, and it was legally binding. So who is liable there? What happens here is that we need to apply some guardrails to eliminate any data leakage or prompt injections. For the guardrails part, there are tools like NeMo Guardrails or Guardrails AI, a lot of them. You can apply them to both the input and the output query; it's basically a step in the middle where, before the answer goes to the user, it filters out any harmful or inappropriate content, and it can also protect personal information. A very important step, as I said, for customer-facing solutions. And of course, you also need to monitor the token usage.
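The router idea Joanna describes, sending each query to the cheapest model that can handle it, can be sketched roughly like this. The complexity heuristic, threshold, and tier names below are all illustrative placeholders, not a real implementation:

```python
# A minimal sketch of a model router: simple queries go to a cheap model,
# complex ones to an expensive model. The heuristic and names are toys.
def estimate_complexity(query: str) -> float:
    """Toy heuristic: long, multi-question queries count as complex."""
    score = len(query.split()) / 50   # length contributes
    score += query.count("?") * 0.2   # multiple questions contribute
    return score

def route(query: str) -> str:
    # Threshold chosen arbitrarily for illustration.
    return "gpt-4-tier" if estimate_complexity(query) > 0.5 else "cheap-tier"

print(route("How do I reset the airbag light?"))  # short query -> "cheap-tier"
```

In production you would more likely use a small classifier model (or the cheap LLM itself) to score complexity, but the routing structure stays the same.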
And we also need a solution there; it's very important to understand where costs are coming from. There are tools like LangSmith, or Phoenix from Arize, where you can directly see how many tokens the queries from your users consume, and you can even track cost per user and per query. So that's very important when you think about finally putting the app into production, and also before, especially the token usage monitoring. Yeah, I think that's it. No, it's great. Before I dive into the bunch of questions from the audience, I want to summarize that aspect of it. So we said there are three different levels at which you can test this out. One is very simple. The other is a little more advanced, but still does not require any third-party custom development. And then the third requires custom development. In the custom-development universe you have to pay attention, and actually on the security side probably on the other two as well: you've got to understand what data you will be sharing, with whom, and how that company you're sharing it with is keeping your data secure, if they're keeping it secure at all. And different companies will have different comfort levels with different solutions, right? If you're Joe Schmo selling shoes in the market, maybe you don't have any information that is problematic to give to ChatGPT or to Claude or one of those. But if you're a doctor, a lawyer, a financial advisor, one of these highly regulated professions, it's just not even an option, which means it's on the company you're working for, and if you own it, then it's on you, to figure out how to run this in a secure way that does not expose your data, which means putting the right measures in place. So that's problem
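The per-user token accounting that tools like LangSmith or Phoenix provide can be approximated in a few lines. The prices below are placeholder values for illustration, not the tools' actual behavior:

```python
# Sketch of per-user token-usage tracking, the kind of accounting that
# observability tools do for you. Prices are illustrative placeholders.
from collections import defaultdict

PRICE_PER_M = {"input": 10.0, "output": 30.0}  # e.g. GPT-4-Turbo-like pricing

usage = defaultdict(lambda: {"input": 0, "output": 0})

def record(user: str, input_tokens: int, output_tokens: int) -> None:
    """Log one query's token counts against a user."""
    usage[user]["input"] += input_tokens
    usage[user]["output"] += output_tokens

def cost(user: str) -> float:
    """Dollar cost accumulated by one user so far."""
    u = usage[user]
    return (u["input"] * PRICE_PER_M["input"]
            + u["output"] * PRICE_PER_M["output"]) / 1_000_000

record("alice", 1_200, 400)   # one query: prompt tokens in, answer tokens out
record("alice", 2_000, 600)
print(round(cost("alice"), 4))  # 3,200 input + 1,000 output tokens -> 0.062
```

Aggregating like this per user and per query is what lets you spot which workflows are driving the bill before you scale to production.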
number one. Problem number two, that Joanna mentioned, is how you protect yourself from the mistakes these models make, either because they just made a mistake or because people know how to manipulate these models into giving them information they shouldn't. As Joanna said, people can use it against you, because if the model commits to a price, it committed on your behalf, and so you're legally bound to whatever the model says if it's in a chat with a client. So you've got to take these things into consideration with the solution you're providing and make sure that the solution you're putting in place is aligned with the needs of your business. And this could be anything, right? It could be "I don't care," and that's a fair enough answer. Like I said, there are cases where it's an internal tool that people use, and you give them the caveats, and they say, okay, fine, it's going to help 90 percent of the time, and the other 10 percent of the time people still have to do the old manual process. So different environments will require a different level of security, both in terms of data security and in terms of protecting you from getting wrong answers from a model. I want to jump into a few questions. I have one, and then people from the audience have a few, so I'll start with the first. The first question is: can you please re-mention how AI systems are priced? Do you want to take that one? Yeah, sure. I can actually show you here. Do you still see my screen? Yeah. So here we have basically all the costs for an LLM app, and specifically for a RAG app.
So we have here the embedding costs, which come from the embedding models, and they're very minimal, as you can see. To embed, let's say, 10,000 documents just costs you 65 cents, or 10 cents with a smaller embedding model. Very minimal; there's almost no point in mentioning it. Yeah, exactly. But then there's the inference cost. Each of these companies, like OpenAI, charges per token. So you have a query, and depending on how long the query is, it charges you for the tokens of that query. For instance, here you can see that the cost per 1,000,000 input tokens is $10 for GPT-4 Turbo, compared to just 50 cents for GPT-3.5 Turbo and 27 cents for Mixtral. Then they have a different price for the output tokens, the tokens they generate to give you the answer: it's $30 per million for GPT-4 Turbo, and so on for the other models. And we also have the costs from the document context and the system prompt of our app. Back to the example with the mechanic's car instructions: the prompt template is basically the instructions in the backend that tell the model, "Hey, you are now an expert in car repair. Here is the question of the user. Please check your documents and find the appropriate answer." So this prompt template is also part of the cost. Yeah. Again, to explain this: in all these solutions, even if you're using a basic GPT, you're adding an additional prompt to what the user is writing in order to tell the system exactly what to retrieve, how to retrieve it, and what data it's looking for. And all of this counts in your token count. A follow-up question from the audience: does one token equal one byte? No, the answer is no.
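The prompt template Joanna describes, where hidden instructions and retrieved document context are wrapped around every user question (and all of it is billed as input tokens), can be sketched like this. The wording of the template and the example context are illustrative:

```python
# Sketch of a RAG prompt template: backend instructions plus retrieved
# manual excerpts are prepended to the user's question. Every word here
# counts toward the input-token bill, not just the user's question.
PROMPT_TEMPLATE = """You are an expert in car repair.
Answer using only the excerpts from the repair manual below.

Manual excerpts:
{context}

Question: {question}
"""

def build_prompt(context: str, question: str) -> str:
    return PROMPT_TEMPLATE.format(context=context, question=question)

prompt = build_prompt(
    context="Section 4.2: Airbag warning light reset procedure ...",
    question="How do I reset the airbag light?",
)
print(len(prompt.split()))  # all of these words are billed as input tokens
```

This is why input-token pricing matters so much for RAG: the retrieved context is usually many times longer than the user's actual question.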
It's just the way these systems work, so just take it as that. A token is about 0.7 words, and that's what you're going to pay for. Most of the pricing models you'll look at give you the price per million tokens: a million tokens is going to be roughly 700,000 words, depending on how long the words are. Yeah, we can actually also see it here; let me make a quick example of what tokens are. We have this query, "how to fix a broken airbag in a car," and you can see here how it's split into tokens. Each of those pieces is a token. There are multiple tools online; what Joanna is showing now is called a tokenizer. You can literally just paste your text in there and it will tell you how many tokens it is, and also show you how the text is broken up. But that's totally unnecessary if you just want a rough number: just assume that every 0.7 words is one token, and you're going to pay for the tokens you use. And on some of these models there's a different price for tokens coming in than for tokens coming out. What I mean by coming in is your input: your prompt, the document you're loading, all of that is tokens coming in. Tokens coming out is inference, what the model is generating. And in most cases they're not equal; the inference, the generation, actually costs you more money, and sometimes a lot more money. It's still a very small amount. As I mentioned before, the most expensive model right now is Claude 3 Opus, and it's $70 for every million tokens of output. So 700,000 words; I don't know how many books that is, but it's probably two to three books of 300 pages each. That's going to cost you $70. If you compare that to any other way of generating that amount of content before, it's still basically free, right?
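The 0.7-words-per-token rule of thumb from the discussion, written as a tiny helper. For exact counts you would use the model's actual tokenizer (for example OpenAI's tiktoken library); this is only the back-of-the-envelope version:

```python
# Rough word/token conversion using the ~0.7 words-per-token heuristic
# mentioned above. Real token counts depend on the model's tokenizer.
WORDS_PER_TOKEN = 0.7

def tokens_to_words(tokens: int) -> int:
    """Roughly how many words a given token budget buys."""
    return round(tokens * WORDS_PER_TOKEN)

def words_to_tokens(words: int) -> int:
    """Roughly how many tokens a given word count will be billed as."""
    return round(words / WORDS_PER_TOKEN)

print(tokens_to_words(1_000_000))  # -> 700000
print(words_to_tokens(700_000))    # -> 1000000
```

So a "$70 per million output tokens" price translates to about $70 for 700,000 generated words, exactly the figure quoted for Claude 3 Opus above.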
But if you multiply that by 300 employees, times 10 times a day, these costs start adding up, and so optimizing for cost is important. So that was one topic we've covered. I have another question, still on this: what's the cost of actually hosting this? I need that vector database to reside somewhere; what is roughly the cost of hosting that? To host 50 million tokens in a vector database costs, I think, just $70-something. Yeah. Awesome. The next question is: what size of companies do you feel are best suited for this solution? I think we mentioned three different solutions, so I think we can answer for the different solutions and the different sizes of companies. Yes. Basically any company, even if you're an individual. If you have some documents that you want to chat with or gain more information from, or you might have, I don't know, five YouTube videos about a specific topic that you want to create new content from, generating original content out of them, you can use ChatGPT to do that: create a GPT and store the data there. But, sorry, back to the question: it's any company. Yeah. So, for instance, if you look at a company like Glean, if we go now to their website, they don't even have the pricing online, so that shows us it's for enterprise. This might be for companies that have 50, a hundred, 300,000 employees. Yes, and the fact that they raised whatever, $200 million, also shows you. Exactly. Yeah. But there are for sure other tools, beyond Glean, that are specialized essentially for smaller companies. I actually think I even have a spreadsheet with those companies. Yeah, there are many chatbot tools that do basically the same thing; Chatbase or Dante are basically chatbots. Yes.
You connect your data, they're really cheap, and you can connect whatever data you want to them and you'll be able to talk to that data. So there are cheaper solutions than Glean to give you an entry point to level two: okay, I'm not using ChatGPT or Gemini or Claude, I'm using an actual tool that does it. So Dante, Chatbase, there's a bunch of those that do the same thing, and you can use them to upload your company information, to connect to URLs, to upload videos, to connect to YouTube videos, multiple sources. And you can do that. I will say something beyond that, and it's half a question as well. Both Google and Microsoft are clearly working in that direction, right? Where you'll be able to use their chatbots integrated with everything within their universe and beyond. Even today, in the Microsoft environment, you can connect Microsoft Copilot to external data sources. It's not everything, but it's the big ones: you can connect it to Slack, you can connect it to Salesforce CRM, and things like that. So do you think this whole concept of RAG will become basically a given sometime in the next 12 to 18 months, at least to some level? Yeah, I actually had this discussion yesterday with someone about the fact that, yes, Google also has this ability even now: you pay a bit more, around $20-something, and you can search through your Google Drive and everything. However, right now you cannot connect it with your Slack or your Salesforce or these kinds of apps, and that might not be sufficient if you actually want access to all the apps you're using; then you might use a tool like Glean that has hundreds of connectors. But definitely, also with OpenAI, where they're going with their Assistants API, and now that they've also increased the context for their GPTs, you will actually be able to upload more.
So definitely, yes, I would say there is some competition and we will use that, but there is probably room for other companies as well, at least in the short term. Yeah. Awesome. So, a quick summary. First of all, this was fantastic; we touched on a lot of things and I think we gave people a lot. Both the comments in Zoom and the comments on LinkedIn are all very positive, and people really appreciate all the information that you shared. The quick summary: you can use multiple levels of tools to communicate and chat with your data. Not doing it is, by definition, costing you more money than searching the old way, where you're going to miss timelines, get the wrong information, potentially lose clients, not win proposals, et cetera. There are really very few excuses not to do it, and you can start very small with tools that require zero technical knowledge, other than literally connecting your data sources, and you can start chatting with them. Joanna, if people want to follow you, learn from you, work with you, what are the best ways to do that? Yeah, sure. Just add me on LinkedIn, Joanna Stoffregen; you can also see it here, and ask me whatever you want. I can share more tools, my spreadsheet of cost calculations, and everything. So feel free to just shoot me a message on LinkedIn. Yeah, I think we'll do that with the spreadsheet, because a lot of people asked for it. Most of the people on LinkedIn said, yeah, I want it, I want it. So what we're going to do is, I think, I will ask you to create a shared Google Sheet with it, and we'll put the link in the show notes once the podcast goes live, and then anybody who listens to the podcast can have it. I can't thank you enough.
I want to thank also the people who joined us. I'm not going to go through everybody, because it would take a while, but the people who asked questions and participated: Elsa, Paul, Katie Cope, Kyle King, James Lindsey, my man, and Cursad Paratas, I hope I'm not butchering names here, and also China and Daniele and Igor, who joined us and asked questions on the Zoom as well. So thank you everyone for participating. Thank you so much, Joanna, for sharing with us. This was awesome. We'll do it again sometime in the future. Definitely. Thank you for having me and thank you everyone who joined. Yeah, talk to you soon. Lovely, we'll take it from there. Ciao. Bye everyone.