Embracing Deterministic Decision Making - The Critical Role of Intent Classification in Generative AI Chatbots

· 12 min read
Christopher Brox

As artificial intelligence (AI) continues to evolve, one area that remains crucial is intent classification. In the midst of the constant excitement about generative models like GPT-4 and their abilities to produce human-like text, one might wonder, do we still need intent classification? The answer is a resounding yes and is rooted in the deterministic nature of intent classification, a trait that generative AI models, no matter how advanced, cannot currently replicate.

What is intent classification?

To start, let's explain what intent classification is. This is a crucial component of natural language understanding (NLU) in which AI is trained to understand and categorize a user's intent behind their spoken or written command. For example, in a sentence like "Set an alarm for 7 AM", the intent is to 'Set an Alarm.' This is extremely valuable, especially in areas like chatbot development, voice assistants, and any scenario where understanding the user's purpose is crucial.

In the world of generative AI, the responses are probabilistic in nature – meaning given the same prompt, a generative AI could potentially provide different responses each time due to the inherent randomness in its design. While this may seem like a strength in many situations, such as creating diverse content, it poses a challenge in scenarios where deterministic results are paramount. This is where intent classification shines - its ability to consistently interpret the same input into the same output (intent) is invaluable.
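As a toy illustration of that difference, here is a minimal sketch. The candidate replies, their weights, and the function names are all made up for this example and do not come from any real model. Sampling from the distribution can return a different reply on each call, while picking the top-scoring candidate always returns the same one.

```python
import random

# Hypothetical reply distribution; the values are invented for illustration.
CANDIDATES = {
    "Your card is on its way.": 0.5,
    "Let me check on your card.": 0.3,
    "Cards usually arrive within a week.": 0.2,
}

def sample_reply(rng: random.Random) -> str:
    """Probabilistic: the reply depends on a random draw, so it can vary per call."""
    replies = list(CANDIDATES)
    weights = list(CANDIDATES.values())
    return rng.choices(replies, weights=weights, k=1)[0]

def pick_reply() -> str:
    """Deterministic: the same input always maps to the same output."""
    return max(CANDIDATES, key=CANDIDATES.get)

rng = random.Random()
sampled = {sample_reply(rng) for _ in range(200)}
fixed = {pick_reply() for _ in range(200)}

# 'sampled' will usually hold several distinct replies; 'fixed' always holds one.
print(len(sampled), len(fixed))
```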

Challenges of intent classification

Consider the scenario of a team tasked with developing an intelligent chatbot for a customer service application. The chatbot needs to understand a variety of user intents, from inquiring about service availability to requesting help with complex technical issues. The pressure is high: each misinterpreted user intent results in a frustrated customer, which can negatively impact the company's reputation and bottom line.

A popular approach to intent classification is training a neural network to classify text. However, training neural networks can be complex, computationally expensive, and slow, particularly for large datasets.

Neural networks for intent classification can create significant costs and can degrade the user experience

| Challenge | Description | Cost / impact |
| --- | --- | --- |
| Training a neural network | Incredibly demanding process requiring powerful hardware or cloud compute resources. The training can take anywhere from hours to days depending on the size of the training dataset. | Cloud compute cost. Cost of GPUs. Hours/days of productivity lost while training. |
| Understanding the black box | These models are typically black boxes, meaning their decision-making process is hard to interpret, making debugging and improving the models difficult. | Cost for sophisticated MLOps tools to help better understand the model, or inability to troubleshoot and tune the model. |
| Model inference | Especially for large models, GPUs may be needed to quickly run inference services. Inferring using a neural network can be slow relative to other methods. | Cost of running inference service. Slow inference times make responding to users' queries slower. |
| Updates and maintenance | Language evolves and so do customer needs. This means that new intents and phrases must be continually added and the neural network model retrained, starting the painstaking process all over again! | Slow updates. Less agile. Can't react to customer feedback quickly. |

The constant cycle of data preparation, training, optimization, and retraining can feel like an uphill battle, where the summit keeps moving further away. What if there was a better way?

Rather than training a neural network for intent classification, let's take a different approach. What if we could use semantic search, the same technology that powers Google search and Netflix recommendations, to build a deterministic intent classifier? Before jumping into the solution, let's define some key components: embeddings and vector databases.

What are embeddings?

Embeddings, at their core, are dense vector representations of data. In the context of NLP, they are a way of representing words, sentences, or even entire documents as points in a multi-dimensional space. The aim of these vector representations is to capture the semantic meaning of words or phrases. This means that words or phrases with similar meanings are close together in this multi-dimensional space, while those with different meanings are farther apart.

Embeddings are typically generated by machine learning models that learn the meaning of language from large amounts of text data. Well-known methods include Word2Vec, GloVe, and, more recently, transformer-based models such as BERT and OpenAI's GPT family. Word2Vec, for example, learns to predict a word given its context (or vice versa). In each case, the training objective yields a rich, high-dimensional representation that captures a word's context and semantic meaning.

What is a vector database?

A vector database is designed to efficiently handle high dimensional vectors (like our embeddings). Vector databases have become increasingly crucial with the rise of artificial intelligence and machine learning applications, particularly in areas such as image recognition, natural language processing (NLP), and recommendation systems, where data is often represented as high-dimensional vectors.

In a vector database you store vectors, each of which represents some piece of data such as an image, a word or phrase (in the case of NLP), or a user's browsing history (in the case of recommendation systems). You retrieve data by providing a query vector and the database returns the stored vectors that are most similar to the query. The measure of similarity can vary, but it is often based on the cosine similarity or Euclidean distance between vectors.
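To make the two similarity measures concrete, here is a minimal sketch in plain Python. The three-dimensional vectors and the label names attached to them are invented for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between two vectors: 0.0 means identical."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy vectors, made up for this example
query = [1.0, 0.0, 1.0]
stored = {
    "card_arrival": [0.9, 0.1, 0.8],
    "exchange_rate": [0.0, 1.0, 0.1],
}

# Retrieval boils down to ranking stored vectors by similarity to the query
best = max(stored, key=lambda name: cosine_similarity(query, stored[name]))
print(best)
```

A vector database does the same ranking, but with indexing structures that avoid comparing the query against every stored vector.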

Using semantic search and embeddings for intent classification offers numerous advantages. Firstly, with embeddings, each phrase, regardless of its linguistic complexity, is converted into a compact, numerical representation. This process captures the semantic context of the phrase allowing us to compare and classify user inputs based on their semantic similarity rather than exact keyword matching. When a new user input arrives, you can just calculate its embedding and use semantic search to find the closest matching intent in the database. This process is highly efficient and can be done in real-time, making it very scalable.
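The flow described above can be sketched in a few lines. This is a rough illustration only: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, the example phrases are a tiny hand-written sample, and the list called `INDEX` plays the role of the vector database.

```python
import math
from collections import Counter

def toy_embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    cleaned = text.lower().replace("?", "").replace(".", "")
    return Counter(cleaned.split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Labeled example phrases (illustrative sample)
EXAMPLES = [
    ("How do I locate my card?", "card_arrival"),
    ("I still have not received my new card", "card_arrival"),
    ("What is the exchange rate for euros?", "exchange_rate"),
]

# "Index" the examples once by embedding them
INDEX = [(toy_embed(text), label) for text, label in EXAMPLES]

def classify(text):
    """Embed the input and return the label of the most similar stored example."""
    query = toy_embed(text)
    _, label = max(INDEX, key=lambda item: cosine(query, item[0]))
    return label

print(classify("Where is my new card?"))
```

Because nothing in `classify` is random, the same input always yields the same intent.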

Imagine the same team from above now presented with the alternative of using semantic search and embeddings for intent classification. At first, there might be a healthy dose of skepticism. Could this new approach really relieve them of the constant cycle of data collection, training, and retraining? Could it help them create a chatbot that not only accurately interprets user intent but also does so consistently?

Generating embeddings doesn't require the same intensive computational resources as training a neural network. The process is faster, less resource-intensive, and offers the same, if not superior, level of semantic understanding. The results are consistent and deterministic, bringing a sense of predictability and control that was missing with the probabilistic neural networks.

Maintenance and updates, which were once dreaded activities, are now simpler. Adding new intents or phrases to their database doesn't require retraining the entire model but just calculating the new embeddings and adding them to the database. There's a sense of fluidity and dynamism now. They can respond more quickly to changes, adapting their chatbot to new languages, domains, or user behaviors with relative ease.
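Here is a small sketch of what "no retraining" looks like in practice. `InMemoryIntentIndex` is a hypothetical stand-in for a vector database (it is not Pinecone's API), and the vectors are invented stand-ins for real embeddings. Adding a new intent is a single write, and the very next query can already match it.

```python
import math

class InMemoryIntentIndex:
    """Hypothetical stand-in for a vector database; for illustration only."""

    def __init__(self):
        self._rows = {}  # row id -> (vector, label)

    def upsert(self, row_id, vector, label):
        """Adding or replacing an example is a dictionary write, not a training run."""
        self._rows[row_id] = (list(vector), label)

    def nearest_label(self, query):
        """Return the label of the stored vector most similar to the query."""
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        _, label = max(self._rows.values(), key=lambda row: cosine(query, row[0]))
        return label

index = InMemoryIntentIndex()
index.upsert("a", [1.0, 0.0, 0.0], "card_arrival")
index.upsert("b", [0.0, 1.0, 0.0], "exchange_rate")

# Pretend this is the embedding of a phrase that belongs to a brand-new intent
query = [0.1, 0.2, 0.9]
before = index.nearest_label(query)

# Add one example for the new intent; nothing is retrained
index.upsert("c", [0.0, 0.1, 1.0], "lost_or_stolen_card")
after = index.nearest_label(query)

print(before, "->", after)
```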

Overall, the solution is faster, cheaper and more reliable than a neural network.


Let's build an intent classifier using semantic search. With a few lines of code, we can have a robust and scalable intent classifier that can be deployed to a serverless application (like Google Cloud Run) to add deterministic intent classification to our chatbots and apps.

Prepare the dataset

First we need a dataset (labeled example data) that will serve as the initial intent examples. We can download a dataset from the 🤗 Hub using the 🤗 datasets Python package. The dataset in the example below is from Galileo and includes 16k+ records of banking chatbot queries labeled with intent.

from datasets import load_dataset, get_dataset_infos

dataset_name = 'rungalileo/banking_intent' # Name of the dataset from Huggingface

dataset = load_dataset(dataset_name) # Load the dataset
dataset_info = get_dataset_infos(dataset_name) # Load the dataset metadata (this contains the label names)

# Get the label names and index them. This is useful because we're going to replace the label indexes with label names
labels = dataset_info['rungalileo--banking_intent'].features['label'].names
indexed_labels = dict(enumerate(labels))

# Add a column with the label names
dataset = dataset.map(lambda x: x | {'label_name': indexed_labels[x['label']]})

# Convert the dataset to a list, combining the test, train, and validation data
dataset_list = []
for item in dataset.values():
    dataset_list += item


# [{'id': 0, 'text': 'How do I locate my card?', 'label': 33, 'label_name': 'card_arrival'}, {'id': 1, 'text': 'I still have not received my new card, I ordered over a week ago.', 'label': 33, 'label_name': 'card_arrival'}, ...]

Create semantic embeddings

Next we need to turn our dataset into embeddings, which will extract meaning out of the text and represent that meaning as a vector. We're going to use OpenAI's Embeddings API, but you could also use other proprietary models or even open-source models (like BERT or BLOOM).


This dataset contains 16k+ records and making embeddings this way will take a very long time. Additionally, OpenAI has rate limits that will limit the speed at which you can create embeddings. Consider an asynchronous approach like using Google Cloud Tasks to orchestrate a serverless, rate-limited pipeline.

import openai

openai.api_key = "<Your OpenAI API key>"

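def make_embeddings(row):
    # Uses the pre-1.0 openai SDK interface (openai.Embedding.create)
    response = openai.Embedding.create(input=row['text'], model="text-embedding-ada-002")
    embedding = response['data'][0]['embedding']
    return row | {'embedding': embedding}

# Materialize the results as a list so they can be counted and sliced later
dataset_with_embeddings = list(map(make_embeddings, dataset_list))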
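One way to soften the rate-limit problem mentioned in the note above is to send texts in batches and back off when a request fails. The sketch below is only an outline under assumptions: `embed_batch` is a hypothetical callable that takes a list of texts and returns one embedding per text (for example, a thin wrapper around your embeddings client), and the fake implementation exists purely to simulate a failure. In real code you would catch your client's specific rate-limit error rather than a bare `Exception`.

```python
import time

def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def embed_with_retry(embed_batch, texts, batch_size=100, max_retries=5,
                     base_delay=1.0, sleep=time.sleep):
    """Embed texts batch by batch, retrying failed batches with exponential backoff."""
    embeddings = []
    for batch in batched(texts, batch_size):
        for attempt in range(max_retries):
            try:
                embeddings.extend(embed_batch(batch))
                break
            except Exception:  # assumption: narrow this to the client's rate-limit error
                if attempt == max_retries - 1:
                    raise
                sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...
    return embeddings

# Fake embedding function that fails once, to exercise the retry path
calls = {"count": 0}

def fake_embed_batch(batch):
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("simulated rate limit")
    return [[float(len(text))] for text in batch]

result = embed_with_retry(fake_embed_batch, ["a", "bb", "ccc"],
                          batch_size=2, sleep=lambda seconds: None)
print(result)
```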

Upload embeddings to vector database

Now we need somewhere to store our embeddings. We could store our embeddings in a traditional database but then we wouldn't have a way to do a semantic search comparing our input embedding to the stored embeddings. So instead, let's use a vector database - a special kind of database meant to store and query vectors. Pinecone is a great vector database because they provide great APIs and documentation so it's easy to get started.

import pinecone
from random import shuffle
import itertools

pinecone.init('<Your Pinecone API key>', environment='<Your Pinecone project environment name>')
pinecone_index_name = '<Your Pinecone index name>'

def make_vector(row):
    # Build an (id, vector, metadata) tuple; the id is converted to a string
    vector_id = f'{row["id"]}'
    vector = row['embedding']
    metadata = {
        'type': 'classification',
        'text': row['text'],
        'label': row['label_name']
    }
    return (vector_id, vector, metadata)

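vectors = list(map(make_vector, dataset_with_embeddings))

# Shuffle before splitting so both sets contain a mix of intents
shuffle(vectors)

# Use 80% of the vectors as training data and hold out the rest for testing
training_amount = int(len(vectors) * .8)

training_data = vectors[:training_amount]
test_data = vectors[training_amount:]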

index = pinecone.Index(pinecone_index_name)

def chunks(iterable, batch_size=100):
    """A helper function to break an iterable into chunks of size batch_size."""
    it = iter(iterable)
    chunk = tuple(itertools.islice(it, batch_size))
    while chunk:
        yield chunk
        chunk = tuple(itertools.islice(it, batch_size))

# Upsert data with 100 vectors per upsert request
for ids_vectors_chunk in chunks(training_data, batch_size=100):
    index.upsert(vectors=ids_vectors_chunk)


Ok, now let's query our embeddings. In the example below, a user asks "Where is my debit card?". We'll take this query and convert it into an embedding and then query our Pinecone index for the top 5 similar embeddings using a cosine similarity algorithm.

input_text = "Where is my debit card?"
input_embedding = make_embeddings({'text': input_text})['embedding']

res = index.query(vector=input_embedding, top_k=5, include_metadata=True)
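Once the top 5 matches come back, you still need to turn them into a single intent. The evaluation later in this post simply takes the first match. The sketch below shows one alternative, a score-weighted vote, and is only an illustration: `vote_intent` and `min_score` are hypothetical names, and `sample_matches` is invented data shaped like the `matches` list used in this post (each match has a `score` and a `metadata` dictionary with a `label`).

```python
from collections import defaultdict

def vote_intent(matches, min_score=0.0):
    """Sum similarity scores per label and return the label with the highest total.

    Returns None when no match reaches min_score, so the caller can fall back
    to a default response instead of guessing.
    """
    totals = defaultdict(float)
    for match in matches:
        if match["score"] >= min_score:
            totals[match["metadata"]["label"]] += match["score"]
    if not totals:
        return None
    return max(totals, key=totals.get)

# Invented sample data in the same shape as res['matches']
sample_matches = [
    {"score": 0.93, "metadata": {"label": "card_arrival"}},
    {"score": 0.91, "metadata": {"label": "card_arrival"}},
    {"score": 0.90, "metadata": {"label": "card_delivery_estimate"}},
    {"score": 0.88, "metadata": {"label": "card_arrival"}},
    {"score": 0.85, "metadata": {"label": "lost_or_stolen_card"}},
]

print(vote_intent(sample_matches))
```

Like the top-1 approach, the vote is deterministic: the same matches always produce the same intent.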


Now we can take the test data we set aside and evaluate how the solution performs.

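def is_match(row):
    # row is an (id, vector, metadata) tuple from test_data
    res = index.query(vector=row[1], top_k=1, include_metadata=True)
    label = row[2].get('label')

    matches = res['matches']
    # Correct only when the top search result carries the expected label
    return bool(matches) and matches[0]['metadata']['label'] == label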

correctly_identified = list(filter(is_match, test_data))

performance = round(len(correctly_identified) / len(test_data) * 100, 2)
print(f'{performance}% of the queries were correctly identified.')

# 94.09% of the queries were correctly identified.

After evaluating the solution, 94.09% of the queries had the correct intent label identified in the first search result returned from Pinecone. This is incredible! These results outperform training and fine-tuning models, and the approach is faster and cheaper than traditional methods.
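A single overall percentage can hide intents that perform poorly, so it can help to break accuracy down by label. The following is a minimal sketch: `accuracy_by_label` is a hypothetical helper, and the `pairs` list is invented sample data standing in for the (expected label, predicted label) pairs you would collect while running the test set.

```python
from collections import Counter

def accuracy_by_label(pairs):
    """Return the fraction of correct predictions for each expected label."""
    totals, correct = Counter(), Counter()
    for expected, predicted in pairs:
        totals[expected] += 1
        if expected == predicted:
            correct[expected] += 1
    return {label: correct[label] / totals[label] for label in totals}

# Invented (expected, predicted) pairs for illustration
pairs = [
    ("card_arrival", "card_arrival"),
    ("card_arrival", "card_delivery_estimate"),
    ("exchange_rate", "exchange_rate"),
]

report = accuracy_by_label(pairs)
for label, accuracy in sorted(report.items(), key=lambda item: item[1]):
    print(f"{label}: {accuracy:.0%}")
```

Labels at the top of the printed list are good candidates for more example phrases.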

Next Steps

Now that you have a foundational understanding of embeddings, semantic search, vector databases, and their crucial role in intent classification, you're ready to take the next big leap - applying these concepts to build your own chatbot.

In the world of chatbot development, one platform stands out for its ease of use and powerful capabilities - Twilio. Twilio's APIs allow you to build intelligent chatbots with features like natural language processing and seamless integration with various messaging channels, including SMS, WhatsApp, and web-based chat. These tools will help you apply the concepts you've learned in real-world scenarios.

Begin by understanding the specific needs of your chatbot. What are the user intents you need to cater to? This will help you define the key phrases for which you will generate embeddings. You can use an OpenAI embedding model (such as text-embedding-ada-002, used in the example above) to create these embeddings, given its impressive performance in capturing the semantic meaning of phrases.

Next, set up a vector database to store these embeddings. When a user interacts with your chatbot, their input will be transformed into an embedding and the vector database will identify the most semantically similar phrases, effectively recognizing the user's intent.

As you embark on this journey, remember that building a chatbot isn't just about coding and algorithms. It's about creating an engaging, user-friendly experience. It's about understanding the needs of your users and providing them with meaningful, timely, and accurate responses. Start building with Twilio today, and unlock a world of possibilities with your newfound knowledge of embeddings, semantic search, and vector databases.

❤️ We can't wait to see what you will build!