Topic Expert Conversational AI vs. Generalized Chatbots
Conversational bots make it possible to automate human tasks in many domains, including entertainment, healthcare, and fashion. Organizations that have adopted chatbots report higher customer satisfaction, reduced costs, and higher profit margins. Intelligent chatbots can handle diverse business functions, including sales, advertising, marketing, customer relations, and fraud detection, and can even automate several tasks simultaneously. Chatbots come in several flavors, ranging from simple rule-based bots with limited capabilities, to AI-enabled bots capable of a wider range of human-like interactions, to hybrid bots that combine the two approaches. Their history traces back to Joseph Weizenbaum, who developed the first chatbot at the Massachusetts Institute of Technology in 1966. Today, conversational chatbots are integral to many organizations.
Conversational bots are built to respond to questions the way humans do, and their performance has improved markedly in recent years. The internet is, of course, full of blog posts, articles, videos, and podcasts that address the same topics as chatbots. However, the interactive nature of bots makes information easier to access on demand and gives users a feeling of human interaction that is quite different from simply typing a query into a search engine. Chatbots must nevertheless be carefully designed, because at times they can provide misleading, irrelevant, redundant, or incomplete responses. Bots can also suffer from hallucinations, or even sometimes refuse to answer questions at all. Such issues are particularly sensitive for domain-specific chatbots, which are expected to reliably provide accurate responses to queries within their domain. To tackle these problems, the present cohort at Fellowship.ai has set out to develop a conversational bot that not only gives good answers for a specific domain, but can also be adapted to a different domain with relatively little effort.
Our conversational AI team is currently working on building a domain-expert chatbot. State-of-the-art open-domain chatbots, like Facebook’s BlenderBot 3, provide human-like user interaction, but their responses to questions in specialized domains are often quite generic. In contrast, this project focuses on designing a chatbot that provides quick and reliable answers to questions within its domain of expertise. Here, we describe our progress so far, showcasing some of the challenges we have encountered and possible solutions for them.
Our team scraped domain-specific data for our bot from several reliable online sources: articles, blog posts, and video captions on specialized websites. Our initial dataset consisted of captions extracted from over 2250 videos from a single website. The bots we developed were then tested on 150 questions extracted from Q&A sessions on blog posts and videos from the same website. Our dataset was later supplemented with articles from other sources.
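As a rough illustration of the scraping step, the sketch below pulls paragraph text out of an HTML page using only the Python standard library. The page content here is a made-up stand-in; our actual pipeline targets specific sites and also handles video caption formats, which are not shown.

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collects the text content of <p> elements from an HTML page."""
    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph:
            self.paragraphs[-1] += data

# Hypothetical page body standing in for a scraped blog post.
html = "<html><body><h1>Training tips</h1><p>Rest days matter.</p><p>Hydrate well.</p></body></html>"
parser = ParagraphExtractor()
parser.feed(html)
print(parser.paragraphs)  # ['Rest days matter.', 'Hydrate well.']
```

In production, a dedicated parsing library and per-site extraction rules would replace this minimal parser, but the principle is the same: strip markup, keep the prose, and store it with its source metadata.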
We tested several models and settled on the following architecture, based on the Haystack framework:
We used an off-the-shelf text2text generative model, pre-trained by Facebook on Wikipedia articles and books and then fine-tuned by deepset on the Long-Form Question Answering (LFQA) dataset. The pipeline stores our source documents and their metadata in the document store. Documents relevant to a query are selected by the retriever and then re-ranked by the cosine similarity between their sentence embeddings and the query embedding. We used the LFQA model for generation, and BERT and RoBERTa for extraction. During data extraction and preprocessing, we encountered sources that fell outside the domain of our bot being selected, so we opted to screen our data sources using zero-shot classification. This filtering helped us eliminate inappropriate answers provided by the bot.
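The re-ranking step can be sketched in a few lines of NumPy. The embeddings below are toy 3-d vectors standing in for real sentence embeddings, and the function names are ours, not Haystack's; the point is only the cosine-similarity ordering.

```python
import numpy as np

def cosine_rerank(query_emb, doc_embs, doc_ids):
    """Order candidate documents by cosine similarity to the query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = d @ q                      # cosine similarity of each doc to the query
    order = np.argsort(-scores)         # highest similarity first
    return [(doc_ids[i], float(scores[i])) for i in order]

# Toy embeddings standing in for real sentence-embedding vectors.
query = np.array([1.0, 0.2, 0.0])
docs = np.array([
    [0.9, 0.1, 0.1],   # close to the query
    [0.0, 1.0, 0.0],   # off-topic
    [1.0, 0.3, 0.0],   # closest
])
ranked = cosine_rerank(query, docs, ["doc_a", "doc_b", "doc_c"])
print(ranked[0][0])  # doc_c
```

In the real pipeline, a sentence-embedding model produces the vectors and the retriever supplies the candidate set; only the top-ranked documents are passed on to the reader or generator.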
Our bot’s response generator is based on an autoregressive language model that uses deep neural networks to produce text resembling human writing. It is a generative transformer model built on a pre-trained neural network (we use GPT-3), and it can generate elaborate textual responses from a small volume of input text. The system can pull articles from a website and is very portable: we can create a bot for an arbitrary domain that produces satisfactory answers simply by providing an appropriate corpus of source texts.
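To make the generation step concrete, here is a hedged sketch of how retrieved passages and a user question can be combined into a single prompt for the generative model. The template and function name are illustrative stand-ins, not our production prompt, and the actual API call to the model is only indicated in a comment.

```python
def build_prompt(question, passages, max_passages=3):
    """Assemble a generation prompt from retrieved passages and the user question.
    The template is a hypothetical stand-in for the production prompt."""
    context = "\n".join(f"- {p}" for p in passages[:max_passages])
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )

passages = [
    "Soreness usually peaks 24-72 hours after training.",
    "Light activity can aid recovery.",
]
prompt = build_prompt("Should I train a sore muscle?", passages)
print(prompt)
# The assembled prompt would then be sent to the generative model
# (GPT-3 in our case) through the provider's completion API.
```

Grounding the model in retrieved passages this way is what lets a general-purpose generator give domain-specific answers, and swapping in a different corpus is all that is needed to retarget the bot.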
We deployed our bot on Telegram and compared its responses to user queries with answers obtained from Google search and BlenderBot 3. The bot performed well on many questions, but struggled with questions that called for a numerical response and, in some instances, questions that called for a yes-or-no response. We have found that extractive models, which simply pull a sentence containing the relevant number or binary answer, can outperform generative models on these types of questions. A potentially promising approach that we are testing is a hybrid model that identifies the expected type of response to a query, then uses either the extractive or the generative model, as appropriate.
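The routing idea can be sketched with simple heuristics. The categories and rules below are illustrative, not our production classifier, which could equally be a learned model; the point is the dispatch between extractive and generative back ends.

```python
import re

YES_NO_STARTERS = ("is", "are", "do", "does", "can", "should", "will", "was", "were", "did")

def expected_answer_type(question):
    """Heuristically classify the expected response type of a question."""
    q = question.strip().lower()
    if re.match(r"^how (many|much|long|often|old)\b", q):
        return "numeric"
    if q.split()[0] in YES_NO_STARTERS:
        return "yes_no"
    return "open"

def route(question):
    """Send numeric and yes/no questions to the extractive model,
    everything else to the generative model."""
    if expected_answer_type(question) in ("numeric", "yes_no"):
        return "extractive"
    return "generative"

print(route("How many sets should I do per week?"))  # extractive
print(route("Is it OK to max out every session?"))   # extractive
print(route("How do I gain weight and muscle?"))     # generative
```

A learned intent classifier would generalize better than keyword rules, but even this simple router captures the cases where we observed extractive models outperforming the generator.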
Here are a few sample questions we have asked the bot and the answers that it provided:
Query: Is it OK to max out every session in the gym?
Answer: It depends on the type of exercise you’re doing. If you’re a power lifter, then yes, you can max out every session in the gym. However, if you’re an endurance athlete, then no, you won’t be able to do that.
Query: Is it healthy to exercise a muscle when it is still sore?
Answer: No, it’s not healthy to exercise a muscle when it’s still sore. It is better to take a day or two off and let your body recover. If you have a sore muscle, you should rest it for a few days to let it heal.
Query: I am underweight. How do I gain weight and muscles?
Answer: There are a few ways to gain weight and muscle. The first is to eat more calories than you burn. The second is to train your body to burn more calories. The third is to build muscle. There are a lot of different ways to do this, but these are the most common.
We are currently working on improving the model’s answers, as well as on automating the search for reliable data sources that the model can refer to.
Here are a few screenshots from our running application.