    history += $"User: {userInput}\nChatBot: {bot_answer}\n\n";
    Console.WriteLine("User: " + userInput);
    Console.WriteLine("ChatBot: " + bot_answer);
}
```

### Step4. Use embeddings to avoid hitting the token limit

So far we have built our simple chat-bot, which is great, but you will find that depending on the size of the chat entries, the response size, and the token limit of the model you are using, you will hit the input token limit of your LLM after only a few turns. Long story short, you can't just keep building up a history and passing it as an input parameter to the kernel function.

To fix this, we should use another concept from machine learning called embeddings. You can think of embeddings as mathematical representations of values or objects like text, images, and audio that are designed to be consumed by machine learning models and semantic search algorithms. They translate objects like these into a mathematical form according to the factors or traits each one may or may not have, and the categories they belong to.

Essentially, embeddings enable machine learning models to find similar objects. Given a photo or a document, a machine learning model that uses embeddings can find a similar photo or document. Since embeddings make it possible for computers to understand the relationships between words and other objects, they are foundational for artificial intelligence (AI). Technically, embeddings are vectors created by machine learning models for the purpose of capturing meaningful data about each object. You can learn more by reading blog posts about it like this [definitive guide to embeddings](https://www.featureform.com/post/the-definitive-guide-to-embeddings).
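To build some intuition before we wire this into Semantic Kernel, here is a minimal, self-contained sketch of how similarity between two embedding vectors is typically measured with cosine similarity. The tiny 3-dimensional vectors below are made up purely for illustration; real embedding models produce much larger vectors:

```cs
// A minimal sketch: cosine similarity is the measure vector stores typically
// use to rank how close two embeddings are (1.0 means identical direction).
static double CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Hypothetical 3-dimensional embeddings; a real model such as
// text-embedding-ada-002 produces 1536-dimensional vectors.
var cat = new float[] { 0.9f, 0.1f, 0.2f };
var kitten = new float[] { 0.85f, 0.15f, 0.25f };
var car = new float[] { 0.1f, 0.9f, 0.3f };

Console.WriteLine(CosineSimilarity(cat, kitten)); // close to 1.0: semantically similar
Console.WriteLine(CosineSimilarity(cat, car));    // noticeably lower: less related
```

A similarity search is then just "embed the query, compare it against every stored vector, and return the closest matches," which is exactly what Semantic Kernel's memory mechanism does for us below.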
Thankfully, the folks who created Semantic Kernel have thought of this and built a cool mechanism called `Memories` for including these embeddings in our function calls. We used a kernel argument to fill the prompt with a `history` that continuously got populated as we chatted with the bot. Let's use memory instead!

For that we need to narrow our scope. In the case of our chatbot, Stars-AI, we already know we are building a career coach chatbot, so we need to gather some relevant facts about the user's professional goals. This is done with the `TextMemoryPlugin`, which exposes the `recall` native function. `recall` takes an input ask and performs a similarity search on the contents that have been embedded in the memory store. By default, `recall` returns the most relevant memory.

So here is roughly the exact same code I used for Stars-AI on this website:

```cs
using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.Plugins.Memory;

#pragma warning disable SKEXP0011, SKEXP0003, SKEXP0052 // memory builder is experimental

const string MemoryCollectionName = "chatHistory";

var memoryBuilder = new MemoryBuilder();

// Let's use another OpenAI model for text embeddings
// (text-embedding-ada-002 produces 1536-dimensional vectors)
string embeddingModel = "text-embedding-ada-002";
memoryBuilder.WithOpenAITextEmbeddingGeneration(embeddingModel, openAIKey, organizationID);

// For now we will use an in-memory store
memoryBuilder.WithMemoryStore(new VolatileMemoryStore());

var memory = memoryBuilder.Build();

const string skPrompt = @"
You are StarsAI, a very polite and professional chat-bot, and you are chatting with ""{{$user}}"" who is the user.
You are a Career and life coach and an expert teacher in different topics giving people advice to get them from their current level to the point that they can be hired as a professional or where they want to be. Don't answer random questions outside of learning and career topics. Just act as a responsible and patient teacher and career coach to help people with what they need to learn or do to advance their careers.

Consider the following facts, goals and personal information about ""{{$user}}"":
- {{$fact0}} {{recall $fact0}}
- {{$fact1}} {{recall $fact1}}
- {{$fact2}} {{recall $fact2}}
- {{$fact3}} {{recall $fact3}}
- {{$fact4}} {{recall $fact4}}
- {{$fact5}} {{recall $fact5}}

If the conversation has not started yet, start by prompting: ""Welcome to StarsAI ""{{$user}}"", Tell me a little bit about yourself, what is your education level and what are your career goals?"" but don't show that if the conversation has started. Always consider what has been asked before and don't ask the same question. Consider the user's answers when asking the next question.

After getting the input, lay out a study guide and steps, online or university courses or even certificates the user needs to take or pass to have the best shot at getting hired as a professional with the best salary possible. Always suggest the shortest and most affordable options for learning, for example taking online courses. Only suggest getting a university degree if it is absolutely necessary for the job function.

User: {{$userInput}}
StarsAI:";

var executionSettings = new OpenAIPromptExecutionSettings
{
    MaxTokens = 3000,
    Temperature = 0.8,
    TopP = 0.5
};

chatFunction = kernel.CreateFunctionFromPrompt(skPrompt, executionSettings);

arguments["fact0"] = "conversation started:";
arguments["fact1"] = "education level:";
arguments["fact2"] = "career goal:";
arguments["fact3"] = "work history:";
arguments["fact4"] = "desired job:";
arguments["fact5"] = "desired salary:";

arguments[TextMemoryPlugin.CollectionParam] = MemoryCollectionName;
arguments[TextMemoryPlugin.LimitParam] = "2";       // how many memories to recall for a specific fact
arguments[TextMemoryPlugin.RelevanceParam] = "0.6"; // relevance score from 0.0 to 1.0, where 1.0 means a perfect match

// We need to import the plugin into the kernel.
// Do this only once, either through the constructor or transient dependency injection.
kernel.ImportPluginFromObject(new TextMemoryPlugin(memory));

// ...
// Later in the code

// Chat object from the database
public class Chat
{
    public string Id { get; set; } = Guid.NewGuid().ToString();
    public string Message { get; set; }
    public string Role { get; set; }
    public string UserName { get; set; }
    public DateTime CreationDate { get; set; }
}

var msg = new Chat
{
    Message = "chat message",
    Role = "User",
    UserName = "John Doe",
    CreationDate = DateTime.UtcNow
};

// Embed and store the message so `recall` can find it later
await memory.SaveInformationAsync(collection: MemoryCollectionName, id: msg.Id, text: msg.Message);
```
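To make the `recall` flow concrete, here is a small sketch of how saved facts and the fact queries meet. The ids and texts below are hypothetical, not from the Stars-AI code; `SearchAsync` is the same similarity search that `recall` runs under the hood, so it is handy for checking what a given fact query would pull back:

```cs
// Hypothetical facts a user might have shared earlier in the conversation.
// recall performs a similarity search, so the "education level:" query
// (fact1) will match the first entry and "career goal:" (fact2) the second.
await memory.SaveInformationAsync(MemoryCollectionName, id: "fact-education",
    text: "My education level: bachelor's degree in computer science");
await memory.SaveInformationAsync(MemoryCollectionName, id: "fact-goal",
    text: "My career goal: become a cloud solutions architect");

// Preview what recall would return for one of the fact queries,
// using the same limit and relevance threshold we set in the arguments.
await foreach (var result in memory.SearchAsync(
    MemoryCollectionName, "education level:", limit: 2, minRelevanceScore: 0.6))
{
    Console.WriteLine($"{result.Metadata.Text} (relevance: {result.Relevance:F2})");
}
```

In Stars-AI the facts come straight from the chat itself: each message is saved with `SaveInformationAsync` as shown above, and the six `{{recall ...}}` calls in the prompt surface whichever stored snippets best match each fact query.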
### Step5. Use a vector database instead of the in-memory VolatileMemoryStore

We are almost there. So far, everything works great and we will never hit the token limit again. However, there is just one problem: our embeddings now live in memory. We can rebuild them, no problem (it will hurt performance, but it's doable), but the main issue is that as our application grows and many people start using it, the server memory will hit its limit very soon. So we need a better solution that stores our embeddings permanently, and that is nowhere but a vector database, because embeddings are vectors.

Again, Semantic Kernel has our back and supports a wide variety of databases. For Starspak, we are already using a PostgreSQL database, and it turns out PostgreSQL supports vector storage through the `vector` extension, so why not use it for this purpose too? All you have to do is enable the extension by running `CREATE EXTENSION vector;`.

> Note: "Azure Cosmos DB for PostgreSQL" uses `SELECT CREATE_EXTENSION('vector');` to enable the extension.

Here is the code modification you need:

```cs
#r "nuget: Microsoft.SemanticKernel.Connectors.Postgres, 1.0.0-rc4"
#r "nuget: Pgvector, 0.2.0"

#pragma warning disable SKEXP0032, SKEXP0052 // memory builder is experimental

using Microsoft.SemanticKernel.Connectors.Postgres;
using Npgsql;
using Pgvector.Npgsql; // provides the UseVector() extension method

// Use the Postgres memory store instead of VolatileMemoryStore
NpgsqlDataSourceBuilder dataSourceBuilder = new("Server=localhost;Database=db;User Id=user;Password=pw"); // TODO: replace with your connection string
dataSourceBuilder.UseVector();
NpgsqlDataSource dataSource = dataSourceBuilder.Build();

// Pass the data source and the vector size (1536 for text-embedding-ada-002)
PostgresMemoryStore memoryStore = new(dataSource, 1536);
memoryBuilder.WithMemoryStore(memoryStore);
```

## Conclusion

So far, we have created a specialized chatbot that can be fed additional context in the form of memories. We can of course enhance this with real-time or stored datasets through plugins and more advanced use of memories, and we can even chain this function to other functions using a planner. For more information on Semantic Kernel, head to the [official documentation](https://learn.microsoft.com/en-us/semantic-kernel/overview/).

Also, as mentioned earlier, you can download the notebook for the entire tutorial [here](https://raw.githubusercontent.com/pakbaz/LLMChatBot_SemanticKernel/refs/heads/main/chatbot.ipynb).