Conversational Streaming Bot using Gemini, LangChain and Streamlit
ChatGPT was the first widely used AI chatbot, but the competition is now getting fierce. Other models are joining the scene, offering longer conversational memory, empathetic responses, and grounding in your own data, among many other possibilities. New LLMs are released more than once a month, each claiming better performance than the last. The community is now focused on solving real problems with LLMs; common use cases include document question-answering bots, assistant bots and customer support bots. In this blog, I will explain a simple and efficient way to create a conversational streaming bot that takes your queries and responds like ChatGPT.
For the sake of simplicity, I am using Google Gemini-pro as the language model, because we can make 60 requests per minute for free. If you want to learn more about the Gemini models, please read my previous articles, where I explained the different variants such as Gemini-nano, Gemini-pro and Gemini-ultra.
Google has also released a family of lightweight, state-of-the-art open models called Gemma, available in 2B and 7B sizes. These models were built from the same research and technology used to create the Gemini models. You can easily switch models if you want to try other language models such as Mistral, ChatGPT or Claude; you just need to change a few lines of code during model initialization, as sketched below.
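For instance, switching to OpenAI's chat model could look roughly like the following sketch, which assumes the langchain-openai integration package is installed and an OPENAI_API_KEY is set in your environment (the model name here is an assumption):

from langchain_openai import ChatOpenAI

# assumption: OPENAI_API_KEY is set in the environment
llm = ChatOpenAI(model="gpt-3.5-turbo")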
Here, we will use the open-source LangChain framework to access the language model and develop the request-response pipeline on top of it. LangChain provides ways to develop LLM-powered applications by connecting them with external data sources.
LangChain has the following three core concepts:
- Chains are sequences of processing components for the prompt. They are used within agents to define how the agent processes information.
- An agent is used as a reasoning engine to determine which actions to take and in which order. Agents are built to interact with the real world.
- Tools are specialized functionalities that can be used by agents or within chains for specific tasks (see the small sketch below).
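As a quick taste of the tools concept, here is a minimal sketch of a custom tool built with LangChain's tool decorator (the function name and logic are purely illustrative):

from langchain_core.tools import tool

@tool
def get_word_length(word: str) -> int:
    """Return the number of characters in a word."""
    return len(word)

An agent equipped with this tool could then decide to call it whenever a question involves counting characters.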
Let’s start the implementation part. For this, you have to install the following libraries in your environment (you can find the requirements here); a setup sketch follows the list below. It is best to create a new virtual environment and install the dependencies inside that environment instead of installing everything globally. If you are a beginner in Python and want to learn how to create virtual environments, read the following article.
- python-dotenv is used to read the values of critical variables (such as the API key) from the environment file.
- langchain is the framework used to create the conversational chain, which takes your prompt, passes it to the LLM and returns the LLM's response.
- langchain-google-genai is an integration package connecting Google’s genai package and LangChain.
- streamlit is used to create the UI of our chatbot and to track the conversation history using the session.
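For reference, a minimal setup could look like this (a sketch; pin whichever recent package versions work for you):

python -m venv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate
pip install python-dotenv langchain langchain-google-genai streamlit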
Once you have set up your environment, it's time to get the Gemini API key. You can find it at https://makersuite.google.com/app/apikey. Currently, you can make 60 queries per minute to gemini-pro using the API key obtained from the above location. See the following image for more insight into pricing.
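Once you have the key, store it in a .env file at the root of your project. The langchain-google-genai package reads it from the GOOGLE_API_KEY environment variable (the value below is a placeholder):

# .env
GOOGLE_API_KEY=your_api_key_here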
First, let's import the dependencies and initialize the LLM. Here, we load the Gemini API key from the .env file into the current environment using load_dotenv(). Then we initialize the LLM globally and reuse that instance later while creating the conversational chain.
from dotenv import load_dotenv
from langchain_google_genai import ChatGoogleGenerativeAI, HarmBlockThreshold, HarmCategory

load_dotenv()

llm = None


def get_llm_instance():
    """
    Return the global instance of the LLM model, creating it on first use.
    """
    global llm
    if llm is None:
        llm = ChatGoogleGenerativeAI(
            model="gemini-pro",
            stream=True,
            safety_settings={
                HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
            },
        )
    return llm
In the above code, we used ChatGoogleGenerativeAI() to initialize the gemini-pro chat model API. We set stream=True to stream the model's response while it is still generating the rest of the answer. We also specified safety settings to avoid automated response blocking in the Gemini model.
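Before wiring the model into a chain, you can sanity-check it with a one-off call (a minimal sketch; the question is just a placeholder):

llm = get_llm_instance()
# .invoke() returns an AIMessage; .content holds the generated text
print(llm.invoke("What is the capital of France?").content)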
We have now defined the LLM. Let's use that model to create the chain, which takes the input prompt, calls the LLM and generates the streaming response.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate


def get_response(user_query, conversation_history):
    """
    Return the streaming response of the chain for the given query and history.
    """
    # note: this must NOT be an f-string, so that ChatPromptTemplate can fill
    # the placeholders itself (an f-string would also break on braces in the input)
    prompt_template = """
    You are an AI assistant. Answer the following question considering the history of the conversation:
    Chat history: {conversation_history}
    User question: {user_query}
    """
    prompt = ChatPromptTemplate.from_template(template=prompt_template)
    llm = get_llm_instance()
    expression_language_chain = prompt | llm | StrOutputParser()
    # note: use the .invoke() method for a non-streaming response
    return expression_language_chain.stream(
        {
            "conversation_history": conversation_history,
            "user_query": user_query,
        }
    )
In the above code, we defined a prompt template prompt_template where we set the behaviour of the model, and we passed in the conversation history conversation_history along with the user query user_query, so that the model looks at the history before answering each question. We then built the chat prompt template prompt and fetched the LLM llm using the get_llm_instance() method defined above. Finally, we created the LangChain expression language chain expression_language_chain, which takes the prompt template, passes the prompt to the LLM and provides the streaming response through the .stream() method.
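You can sanity-check the chain outside Streamlit by iterating over the generator it returns (a minimal sketch; the question and the empty history are placeholders):

# print the answer chunk by chunk as it arrives
for chunk in get_response("What is LangChain?", []):
    print(chunk, end="", flush=True)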
Warning: We haven’t counted the tokens in the prompt, so after a few conversations the number of tokens in the prompt might exceed the token limit of the LLM. Please refresh your bot if you encounter that error. One simple mitigation is sketched below.
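The sketch below trims the history before passing it to the chain, under the assumption that a window of the last 10 messages is enough context (tune the number to your model's context window):

MAX_HISTORY_MESSAGES = 10  # assumption: adjust to your model's token limit

def trim_history(conversation_history):
    # keep only the most recent messages to stay under the token limit
    return conversation_history[-MAX_HISTORY_MESSAGES:]

You could then call get_response(user_query, trim_history(conversation_history)) instead of passing the full history.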
Awesome! We have completed the core working logic behind our conversational streaming bot. Now, it’s time to create the UI of the chatbot and invoke the chain to process the user prompt.
import streamlit as st
from langchain_core.messages import AIMessage, HumanMessage
# let's create the streamlit app
st.set_page_config(page_title=" Conversational Bot!")
st.title("Conversational Chatbot 💬")
In the above code, we imported the streamlit library and the message classes for both human and AI messages. We also set the title to Conversational Chatbot. Next, we will use the Streamlit session to track the user-bot history; Streamlit sessions let you share variables between reruns.
# initialize the messages key in streamlit session to store message history
if "messages" not in st.session_state:
    # add greeting message to user
    st.session_state.messages = [
        AIMessage(content="Hello, I am a bot. How can I help you?")
    ]
In the above code snippet, we first initialize the messages store in the Streamlit session and add "Hello, I am a bot. How can I help you?" as the default bot message. Once the Streamlit session is ready, we display the history it holds on the UI using the st.write() method.
# if there are messages already in session, write them on app
for message in st.session_state.messages:
    if isinstance(message, AIMessage):
        with st.chat_message("assistant"):
            st.write(message.content)
    elif isinstance(message, HumanMessage):
        with st.chat_message("user"):
            st.write(message.content)
In the above code, we looped over the messages, checked whether each one came from the user or the bot, and displayed it on the UI.
We have written the core logic behind the LLM using LangChain, and we have created the conversation store using the Streamlit session. Now it's time to write the final block of code, which takes the user query, passes it to the chain and displays the results.
prompt = st.chat_input("Say Something")
if prompt is not None and prompt != "":
    # add the message to chat message container
    if not isinstance(st.session_state.messages[-1], HumanMessage):
        st.session_state.messages.append(HumanMessage(content=prompt))
    # display to the streamlit application
    message = st.chat_message("user")
    message.write(f"{prompt}")
    if not isinstance(st.session_state.messages[-1], AIMessage):
        with st.chat_message("assistant"):
            # use .write() method for non-streaming, which means .invoke() method in chain
            response = st.write_stream(get_response(prompt, st.session_state.messages))
            st.session_state.messages.append(AIMessage(content=response))
In the above code snippet, we first read the user query prompt and add it to the Streamlit session. Then we pass the query to the chain defined above, which provides the streaming response. Finally, we display that response on the UI and also append it to the session so that the history stays up to date.
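To launch the app, run the script with the Streamlit CLI (assuming you saved the code above as app.py; the file name is just an assumption):

streamlit run app.py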
Congratulations! We have completed the implementation. I hope you enjoyed reading and implementing it yourself. Now it's time to test our chatbot. First, let's see how it looks:
It is a simple bot with minimal functionality to chat with the LLM. I hope you like it. Let's ask it a few questions.
Great! It responds as expected. When I asked the bot to write a Python program to sort a list in ascending order (write a python program to sort list in ascending order please use demo data and also provide response on data), it responded correctly, as expected:
Now, let me test the bot by providing some information and asking about that information later.
In the above conversation, I introduced myself to the bot by providing my name (hi, I am Netra). Then we continued with other conversations. At the end, I asked the bot about my name (do you remember my name), and the bot responded as expected: Yes, I remember your name is Netra. Please try different prompts, evaluate the bot's responses and share your valuable feedback; I will be waiting for it. I hope the bot will not let your expectations down.
Finally, let's summarize the overall working pipeline of the bot. First, we initialized the LLM and created a chain that takes the user query and provides a streaming response. We wrote the prompt using a prompt template in such a way that it contains both the conversation history and the user query. Then we created the Streamlit session to store the conversation history. Finally, we read input from the bot UI, passed it to the chain and wrote the streaming response back to the UI. We now have a fully functional chatbot like ChatGPT. You can find the entire code here in my GitHub repository.
Finally, if you have any queries, I am happy to answer them if possible. If you liked this post, please don't forget to clap and share it with your friends. See you in the next blog…