    history += $"User: {userInput}\nChatBot: {bot_answer}\n\n";
    Console.WriteLine("User: " + userInput);
    Console.WriteLine("ChatBot: " + bot_answer);
}
```

### Step4. Use embeddings to avoid hitting the token limit

So far we have built our simple chat-bot, which is great, but you will find that depending on the size of the chat entries, the response size, and the token limit of the model you are using, you will hit the input token limit of your LLM after only a few turns. Long story short, you can't just keep building up a history and passing it as an input parameter to the kernel function.

To fix this, we should use another concept from machine learning called embeddings. You can think of embeddings as mathematical representations of values or objects like text, images, and audio that are designed to be consumed by machine learning models and semantic search algorithms. They translate objects like these into a mathematical form according to the factors or traits each one may or may not have, and the categories they belong to.

Essentially, embeddings enable machine learning models to find similar objects. Given a photo or a document, a machine learning model that uses embeddings can find a similar photo or document. Since embeddings make it possible for computers to understand the relationships between words and other objects, they are foundational for artificial intelligence (AI). Technically, embeddings are vectors created by machine learning models for the purpose of capturing meaningful data about each object. You can learn more by reading blog posts about it like this [definitive guide to embeddings](https://www.featureform.com/post/the-definitive-guide-to-embeddings).
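To build some intuition before we wire this into Semantic Kernel, here is a minimal, self-contained sketch of how similarity between two embedding vectors is typically measured with cosine similarity. The tiny 3-dimensional vectors below are made up purely for illustration; real embedding models produce much larger vectors:

```cs
// A minimal sketch: cosine similarity is the measure vector stores typically
// use to rank how close two embeddings are (1.0 means identical direction).
static double CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Hypothetical 3-dimensional embeddings; a real model such as
// text-embedding-ada-002 produces 1536-dimensional vectors.
var cat = new float[] { 0.9f, 0.1f, 0.2f };
var kitten = new float[] { 0.85f, 0.15f, 0.25f };
var car = new float[] { 0.1f, 0.9f, 0.3f };

Console.WriteLine(CosineSimilarity(cat, kitten)); // close to 1.0: semantically similar
Console.WriteLine(CosineSimilarity(cat, car));    // noticeably lower: less related
```

A similarity search is then just "embed the query, compare it against every stored vector, and return the closest matches," which is exactly what Semantic Kernel's memory mechanism does for us below.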
Thankfully, the folks who created Semantic Kernel have thought of this and built a cool mechanism called `Memories` for including these embeddings in our function calls. We used a kernel argument to fill the prompt with a `history` that continuously got populated as we chatted with the bot. Let's use memory instead!

For that we need to narrow our scope. In the case of our chatbot, Stars-AI, we already know we are building a career coach chatbot, so we need to gather some relevant facts about the user's professional goals. This is done with the `TextMemoryPlugin`, which exposes the `recall` native function. `recall` takes an input ask and performs a similarity search on the contents that have been embedded in the memory store. By default, `recall` returns the most relevant memory.

So here is roughly the exact same code I used for Stars-AI on this website:

```cs
using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.Plugins.Memory;

#pragma warning disable SKEXP0011, SKEXP0003, SKEXP0052 // memory builder is experimental

const string MemoryCollectionName = "chatHistory";

var memoryBuilder = new MemoryBuilder();

// Let's use another OpenAI model for text embeddings
// (text-embedding-ada-002 produces 1536-dimensional vectors)
string embeddingModel = "text-embedding-ada-002";
memoryBuilder.WithOpenAITextEmbeddingGeneration(embeddingModel, openAIKey, organizationID);

// For now we will use an in-memory store
memoryBuilder.WithMemoryStore(new VolatileMemoryStore());

var memory = memoryBuilder.Build();

const string skPrompt = @"
You are StarsAI, a very polite and professional chat-bot, and you are chatting with ""{{$user}}"" who is the user.
You are a Career and life coach and an expert teacher in different topics giving people advice to get them from their current level to the point that they can be hired as a professional or where they want to be. Don't answer random questions outside of learning and career topics. Just act as a responsible and patient teacher and career coach to help people with what they need to learn or do to advance their careers.

Consider the following facts, goals and personal information about ""{{$user}}"":
- {{$fact0}} {{recall $fact0}}
- {{$fact1}} {{recall $fact1}}
- {{$fact2}} {{recall $fact2}}
- {{$fact3}} {{recall $fact3}}
- {{$fact4}} {{recall $fact4}}
- {{$fact5}} {{recall $fact5}}

If the conversation has not started yet, start by prompting: ""Welcome to StarsAI ""{{$user}}"", Tell me a little bit about yourself, what is your education level and what are your career goals?"" but don't show that if the conversation has started. Always consider what has been asked before and don't ask the same question. Consider the user's answers when asking the next question.

After getting the input, lay out a study guide and steps, online or university courses or even certificates the user needs to take or pass to have the best shot at getting hired as a professional with the best salary possible. Always suggest the shortest and most affordable options for learning, for example taking online courses. Only suggest getting a university degree if it is absolutely necessary for the job function.

User: {{$userInput}}
StarsAI:";

var executionSettings = new OpenAIPromptExecutionSettings
{
    MaxTokens = 3000,
    Temperature = 0.8,
    TopP = 0.5
};

chatFunction = kernel.CreateFunctionFromPrompt(skPrompt, executionSettings);

arguments["fact0"] = "conversation started:";
arguments["fact1"] = "education level:";
arguments["fact2"] = "career goal:";
arguments["fact3"] = "work history:";
arguments["fact4"] = "desired job:";
arguments["fact5"] = "desired salary:";

arguments[TextMemoryPlugin.CollectionParam] = MemoryCollectionName;
arguments[TextMemoryPlugin.LimitParam] = "2";       // how many memories to recall for a specific fact
arguments[TextMemoryPlugin.RelevanceParam] = "0.6"; // relevance score from 0.0 to 1.0, where 1.0 means a perfect match

// We need to import the plugin into the kernel.
// Do this only once, either through the constructor or transient dependency injection.
kernel.ImportPluginFromObject(new TextMemoryPlugin(memory));

// ...
// Later in the code

// Chat object from the database
public class Chat
{
    public string Id { get; set; } = Guid.NewGuid().ToString();
    public string Message { get; set; }
    public string Role { get; set; }
    public string UserName { get; set; }
    public DateTime CreationDate { get; set; }
}

var msg = new Chat
{
    Message = "chat message",
    Role = "User",
    UserName = "John Doe",
    CreationDate = DateTime.UtcNow
};

// Embed and store the message so `recall` can find it later
await memory.SaveInformationAsync(collection: MemoryCollectionName, id: msg.Id, text: msg.Message);
```
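To make the `recall` flow concrete, here is a small sketch of how saved facts and the fact queries meet. The ids and texts below are hypothetical, not from the Stars-AI code; `SearchAsync` is the same similarity search that `recall` runs under the hood, so it is handy for checking what a given fact query would pull back:

```cs
// Hypothetical facts a user might have shared earlier in the conversation.
// recall performs a similarity search, so the "education level:" query
// (fact1) will match the first entry and "career goal:" (fact2) the second.
await memory.SaveInformationAsync(MemoryCollectionName, id: "fact-education",
    text: "My education level: bachelor's degree in computer science");
await memory.SaveInformationAsync(MemoryCollectionName, id: "fact-goal",
    text: "My career goal: become a cloud solutions architect");

// Preview what recall would return for one of the fact queries,
// using the same limit and relevance threshold we set in the arguments.
await foreach (var result in memory.SearchAsync(
    MemoryCollectionName, "education level:", limit: 2, minRelevanceScore: 0.6))
{
    Console.WriteLine($"{result.Metadata.Text} (relevance: {result.Relevance:F2})");
}
```

In Stars-AI the facts come straight from the chat itself: each message is saved with `SaveInformationAsync` as shown above, and the six `{{recall ...}}` calls in the prompt surface whichever stored snippets best match each fact query.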
### Step5. Use a vector database instead of the in-memory VolatileMemoryStore

We are almost there. So far, everything works great and we will never hit the token limit again. However, there is just one problem: our embeddings now live in memory. We can rebuild them, no problem (it will hurt performance, but it's doable), but the main issue is that as our application grows and many people start using it, the server memory will hit its limit very soon. So we need a better solution that stores our embeddings permanently, and that is nowhere but a vector database, because embeddings are vectors.

Again, Semantic Kernel has our back and supports a wide variety of databases. For Starspak, we are already using a PostgreSQL database, and it turns out PostgreSQL supports vector storage through the `vector` extension, so why not use it for this purpose too? All you have to do is enable the extension by running `CREATE EXTENSION vector;`.

> Note: "Azure Cosmos DB for PostgreSQL" uses `SELECT CREATE_EXTENSION('vector');` to enable the extension.

Here is the code modification you need:

```cs
#r "nuget: Microsoft.SemanticKernel.Connectors.Postgres, 1.0.0-rc4"
#r "nuget: Pgvector, 0.2.0"

#pragma warning disable SKEXP0032, SKEXP0052 // memory builder is experimental

using Microsoft.SemanticKernel.Connectors.Postgres;
using Npgsql;
using Pgvector.Npgsql; // provides the UseVector() extension method

// Use the Postgres memory store instead of VolatileMemoryStore
NpgsqlDataSourceBuilder dataSourceBuilder = new("Server=localhost;Database=db;User Id=user;Password=pw"); // TODO: replace with your connection string
dataSourceBuilder.UseVector();
NpgsqlDataSource dataSource = dataSourceBuilder.Build();

// Pass the data source and the vector size (1536 for text-embedding-ada-002)
PostgresMemoryStore memoryStore = new(dataSource, 1536);
memoryBuilder.WithMemoryStore(memoryStore);
```

## Conclusion

So far, we have created a specialized chatbot that can be fed additional context in the form of memories. We can of course enhance this with real-time or stored datasets through plugins and more advanced use of memories, and we can even chain this function to other functions using a planner. For more information on Semantic Kernel, head to the [official documentation](https://learn.microsoft.com/en-us/semantic-kernel/overview/).

Also, as mentioned earlier, you can download the notebook for the entire tutorial [here](https://raw.githubusercontent.com/pakbaz/LLMChatBot_SemanticKernel/refs/heads/main/chatbot.ipynb).