
Chat With your SQL Database Using Langchain

AI Data Analyst

Background: The Challenge of Complex Data Interactions

In the digital age, the ability to swiftly access and analyze data is crucial. I found myself in need of a solution that could navigate through intricate Entity Relationship Diagrams, manage extensive data joins, and incorporate external data sources. The task was daunting: to create a chatbot that could handle such complex data interactions without the user having to dive into the depths of SQL queries.

Tools: Embracing Langchain for Its Versatility

The choice of Langchain for this project was a no-brainer. Its rich ecosystem, complete with out-of-the-box Large Language Model (LLM) and database connectors, made it an ideal fit. While I opted for the ChatGPT model, Langchain's flexibility meant I wasn't confined to just that. This adaptability was key in accommodating different data analysis needs.

Setting Up the Environment

To make this easy to follow, I set everything up from scratch so anyone can reproduce it:

Step 1: Establishing a Local SQL Server

First, we need a dummy database. Assuming you have Docker installed, we can spin up a local SQL Server instance with this command:

docker run -e "ACCEPT_EULA=Y" -e "MSSQL_SA_PASSWORD=password@1234" \
-p 1433:1433 --name sql --hostname sql \
-d \
mcr.microsoft.com/mssql/server:2022-latest

Then connect to it with any DB management tool you have: the server is localhost, the username is sa, and the password is whatever you chose above, which in our case is password@1234. Make sure SSL and certificate trust are enabled.

I personally recommend Azure Data Studio, which is completely free and cross-platform, has lots of community DB extensions, and ships with an out-of-the-box connector for SQL Server.

Step 2: Preparing the Database

Once the SQL Server is in place, the next step is to preload it with a sample database. For this tutorial, I used the Northwind Traders database, a popular example that mimics a real-world e-commerce data store.

First, create the database:

CREATE DATABASE "Northwind"

Then create the tables and preload data into them using this SQL query.
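The full script linked above creates and populates every Northwind table; just for orientation, an excerpt-style sketch of what it does looks like this (table and column names follow the standard Northwind schema):

```sql
USE Northwind;

CREATE TABLE Customers (
    CustomerID  NCHAR(5)     PRIMARY KEY,
    CompanyName NVARCHAR(40) NOT NULL,
    Country     NVARCHAR(15)
);

INSERT INTO Customers (CustomerID, CompanyName, Country)
VALUES ('ALFKI', N'Alfreds Futterkiste', N'Germany');
```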

Step 3: Choosing the Right LLM

For this project, I used an OpenAI model since they are the most popular; more specifically, GPT-3.5 Turbo. While it's much cheaper and less capable than the newer GPT-4, it is still more reliable than most other models and packs enough power for our purposes. The good thing is that with Langchain you can upgrade to any newer GPT model, or even switch to other popular models like Anthropic's, or to open-source models. In my case I used Azure OpenAI's gpt-35-turbo because I have free monthly Azure credits through MSDN.

You can sign up for a free account at OpenAI, or register for Azure OpenAI through the Azure OpenAI portal.

The Python Program: A Gateway to Simplified Data Analysis

I've prepared a complete Python notebook, available for download here, which serves as a practical guide. This program illustrates how to integrate Langchain with our SQL setup, enabling seamless data analytics chat.

Going Through the Notebook

We need to install langchain and langchain_openai, as well as sqlalchemy for talking to the SQL Server database:

!pip -q install langchain
!pip -q install langchain_openai
!pip -q install sqlalchemy

We will import the modules as follows. Make sure you import the right chat model according to the Langchain documentation: since I used Azure OpenAI, I import AzureChatOpenAI, but if you use OpenAI directly you would use ChatOpenAI, or whichever other model class applies:

import os
from langchain.agents import create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.agents.agent_types import AgentType
from langchain_openai import AzureChatOpenAI
from langchain.sql_database import SQLDatabase
from langchain.prompts.chat import ChatPromptTemplate
import sqlalchemy as sa

To set up the environment variables we will use the following. For OpenAI and Azure OpenAI the names of these environment variables matter, so make sure to follow the documentation:

os.environ["AZURE_OPENAI_ENDPOINT"]="your endpint" # Your Azure OpenAI resource endpoint
os.environ["AZURE_OPENAI_API_KEY"]="your key" # Your Azure OpenAI resource key

os.environ["SQL_SERVER"]="localhost" # Your az SQL server name
os.environ["SQL_DB"]="northwind"
os.environ["SQL_USERNAME"]="sa" # SQL server username
os.environ["SQL_PWD"]="password@1234" # SQL server password

Then this block connects to the SQL database via pymssql, using sqlalchemy and Langchain's SQLDatabase module:

connection_url = sa.URL.create(
	"mssql+pymssql",
	username=os.getenv('SQL_USERNAME'),
	password=os.getenv('SQL_PWD'),
	host=os.getenv('SQL_SERVER'),
	database=os.getenv('SQL_DB'))
  

db_engine=sa.create_engine(connection_url)
db=SQLDatabase(db_engine)
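One reason to use sa.URL.create instead of hand-building a connection string: special characters in the password (like the @ in password@1234) must be URL-escaped, or the DSN parses incorrectly. Here is a minimal stdlib sketch of what the escaped URL looks like; mssql_url is just an illustrative helper, not part of sqlalchemy:

```python
from urllib.parse import quote_plus

def mssql_url(username, password, host, database):
    # Escape the password so characters like '@' don't break the URL.
    return f"mssql+pymssql://{username}:{quote_plus(password)}@{host}/{database}"

print(mssql_url("sa", "password@1234", "localhost", "northwind"))
# mssql+pymssql://sa:password%401234@localhost/northwind
```

sa.URL.create does this escaping for you, which is why it is the safer choice when credentials come from environment variables.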

I ran into some issues with pymssql on Apple Silicon (an M3 machine), and running this script fixed my notebook error (assuming you have Homebrew and Python 3.11 like I do):

brew install freetds
export CFLAGS="-I$(brew --prefix openssl)/include"
export LDFLAGS="-L$(brew --prefix openssl)/lib -L/usr/local/opt/openssl/lib"
export CPPFLAGS="-I$(brew --prefix openssl)/include"
pip install --pre --no-binary :all: pymssql --no-cache

And last but not least, the final block puts everything together:

llm=AzureChatOpenAI(openai_api_version="2023-05-15", azure_deployment="gpt-35-turbo",)
sql_toolkit=SQLDatabaseToolkit(db=db,llm=llm)
sql_toolkit.get_tools()


prompt=ChatPromptTemplate.from_messages(
	[
		("system",
		"""
		you are a very intelligent AI assistant who is an expert at identifying relevant questions from the user and converting them into SQL queries to generate the correct answer.
		Please use the below context to write Microsoft SQL queries; don't use MySQL syntax.
		context:
		you must query against the connected database; you can use the database name as northwind.
		As an expert you must use joins whenever required, and make sure to look into the Order Details table for further order questions.
		"""
		),
		("user","{question}\ ai: ")
	]
)

agent=create_sql_agent(llm=llm, toolkit=sql_toolkit,
	agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
	verbose=True, max_execution_time=100, max_iterations=1000)


while True:
	question = input("Enter db query (or Enter to quit): ")
	if not question:
		break
	agent.run(prompt.format_prompt(question=question))

A few key points on this code block:

  1. The system prompt, or meta prompt, is crucial for setting the context. If your table names are not straightforward, make sure to clarify and set up as much context as possible to get better results. In this prompt we only have one input parameter, called "question", but as you can imagine you can have multiple parameters, including some that can be set dynamically at runtime, for instance from a user interface.
  2. It is common for a data-analytics job to query multiple times and perform double and triple checks to make sure the correct dataset is returned for LangChain to build its response. create_sql_agent is flexible; make sure to play with its parameters to get the most desirable outcome.
  3. In the last part of the code block we have a while loop that keeps asking questions about the database, so you can "chat with your database"; if you leave the prompt blank, it exits the loop.
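To expand on point 1, here is a stdlib-only sketch of a multi-parameter prompt. ChatPromptTemplate.from_messages uses the same {placeholder} style, so the idea carries over directly; the template text and parameter names here (db_name, today) are illustrative:

```python
SYSTEM_TEMPLATE = (
    "You are an expert SQL assistant for the {db_name} database. "
    "Today's date is {today}; use it for any date arithmetic. "
    "Answer the user's question: {question}"
)

def build_prompt(db_name, today, question):
    # Parameters such as 'today' could be filled in dynamically at
    # runtime, e.g. from a user-interface session.
    return SYSTEM_TEMPLATE.format(db_name=db_name, today=today,
                                  question=question)

print(build_prompt("northwind", "2024-01-16", "Who are our top 5 customers?"))
```

With ChatPromptTemplate, each extra placeholder simply becomes another keyword argument to format_prompt.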

Conclusion: A Glimpse into the Future of Data Analysis

This project is just the beginning of a larger trend. With plug-and-play tools like Langchain becoming increasingly accessible, we're on the brink of a revolution in the data analytics industry. The potential for these technologies to streamline data analysis tasks is immense, and it may very well redefine the role of data analysts in the future.


By Sepehr Pakbaz published at 1/16/2024

