GPT Models Can Do More Than Just Talk: How to Make Them Brew You Coffee ☕

Intent Classification 🎯
Intent classification is the task of figuring out what the user wants to achieve or do from their natural language input. For example, if the user says "Can you make me a coffee?", the intent could be "Make a coffee". This way, we can map the user's input to a specific action or function that we want our GPT model to perform.
To do this, I used OpenAI's Ada embedding model (text-embedding-ada-002), which I wrote about in a previous article. The model returns an embedding of the text: a numerical representation of its meaning. The embedding captures the semantic and syntactic features of the text, such as its words, phrases, and context.
By using a simple formula, we can find the closest embedding among a set of predefined category embeddings, which indicates the most likely intent. The formula is the Euclidean distance, which measures how close two vectors are in a multidimensional space: the smaller the distance, the more similar the vectors.
For example, if we have three categories: "Make a coffee", "Make a tea", and "Make a sandwich", and we have an embedding for each category, we can compare the embedding of the user's input with each category embedding and find the one with the smallest distance. If the user says "Can you brew me a coffee?", the embedding of their input will be closer to the embedding of "Make a coffee" than to the other two categories, so we can infer that their intent is "Make a coffee".
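To make this concrete, here is a toy sketch of picking the nearest category. The three-dimensional vectors are made up for illustration; real Ada embeddings have 1536 dimensions:

```javascript
// Toy 3-D vectors standing in for real 1536-dimensional Ada embeddings.
const categoryEmbeddings = {
  "Make a coffee": [0.9, 0.1, 0.0],
  "Make a tea": [0.1, 0.9, 0.0],
  "Make a sandwich": [0.0, 0.1, 0.9],
};

// Pretend embedding for "Can you brew me a coffee?" — it lands near the coffee vector.
const inputEmbedding = [0.85, 0.15, 0.05];

// Euclidean distance: sqrt of the sum of squared differences.
const distance = (a, b) =>
  Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));

let best = null;
let bestDistance = Infinity;
for (const [category, embedding] of Object.entries(categoryEmbeddings)) {
  const d = distance(inputEmbedding, embedding);
  if (d < bestDistance) {
    bestDistance = d;
    best = category;
  }
}
console.log(best); // "Make a coffee"
```

With real embeddings the loop is identical; only the vectors come from the API instead of being hard-coded.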
A simple way to find the intent is to organize intents into actions. For example, if you have an action like Make a coffee, you can define it as a category with a user-provided prompt and a function to execute:
Make a coffee => (User Provided Prompt = "Can you make me a coffee?") => MakeACoffee()
However, to increase the accuracy and flexibility of the system, we don't just define one category for each action, but more different ways of expressing the same intent. For example:
Make me a coffee => (User Provided Prompt = "Can you make me a coffee?") => MakeACoffee()
Bring me a coffee => (User Provided Prompt = "Can you bring me a coffee?") => MakeACoffee()
Buy me a coffee => (User Provided Prompt = "Can you buy me a coffee?") => MakeACoffee()
As you can see, all these categories will trigger the same function: MakeACoffee().
Rule-Based System 📜
The rule-based system is like any other chatbot system, but ours is dynamic and uses AI from end to end. The idea is to build a map with categories as keys and action objects as values:
const intents = {
  categories: {
    "Make a coffee": MakeACoffee,
    "Make me a coffee": MakeACoffee,
    "Bring me a coffee": MakeACoffee,
    "Buy me a coffee": MakeACoffee,
  },
};
Each object has an action (either an API call or a local action), a url to call if it is an API call, and a response function to handle the result of the call. For example:
const MakeACoffee = {
  action: "LOCAL_ACTION",
  response: (response) => "Your coffee is here!",
};
For API calls, a function makes a fetch request and returns the response. You can customize the response as much as you like.
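As a rough sketch of that function (the name executeAction and its exact shape are illustrative, not the article's actual helper), it might look like this:

```javascript
// Hypothetical helper: runs a category's action and formats the result
// with the category's response function.
const executeAction = async (category) => {
  if (category.action === "API_CALL") {
    const res = await fetch(category.url); // e.g. "http://localhost:3000/coffees"
    const data = await res.json();
    return category.response(data); // let the category format the API result
  }
  // LOCAL_ACTION: no network request, just produce a reply.
  return category.response();
};
```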
Now whenever you ask "Make me a coffee", the result will be "Your coffee is here!".
But what if you have follow-up questions or options? How would you handle that?
For example, when you ask the system to make a coffee, you might want to handle a follow-up such as what kind of coffee. To do that, we will use a recursive approach.
Each object can also have an ask function to generate a follow-up question and a categories object to define the possible options for the user. For example:
const MakeACoffee = {
  action: "API_CALL",
  url: "http://localhost:3000/coffees",
  ask: [
    (coffees) => `What kind of coffee would you like (${coffees.join(", ")})?`,
  ],
  categories: () => ({
    Mocha: MochaCoffee,
    Karak: KarakCoffee,
    Espresso: EspressoCoffee,
  }),
};
Now with this approach, we will have a list of coffees in the response with a follow-up question and another object to handle the user's choice.
const MochaCoffee = {
  action: "LOCAL_ACTION",
  response: (response) => "Your mocha coffee is here!",
};
And that's it! You have successfully integrated your GPT model with your own data sources and actions. You can now enjoy your coffee while chatting with your AI friend. ☕
Behind the Scenes 🕵️♂️
Now let's look at how this application works behind the scenes. I showed you the configuration; let's look at the internals.
The first step is to get the embedding of each category. Using the Ada model, here is how you can get the category embeddings:
const initializeCategoriesEmbedding = async (categories) => {
  const category_embeddings = {};
  const listOfPromise = [];
  categories.forEach((category) => {
    listOfPromise.push(
      new Promise((resolve) => {
        const input = { input: category, model: "text-embedding-ada-002" };
        openai.createEmbedding(input).then((response) => {
          const embedding = response.data.data[0].embedding; // extract embedding array
          category_embeddings[category] = embedding; // store it under its category
          resolve();
        });
      })
    );
  });
  await Promise.all(listOfPromise);
  return category_embeddings;
};
Then, let's put the user input into the classification and see in which category it will fall:
const classification = async (newInput, category_embeddings) => {
  const input = {
    input: newInput,
    model: "text-embedding-ada-002",
  };
  const response = await openai.createEmbedding(input);
  const new_input_embedding = response.data.data[0].embedding; // extract the input's embedding array

  // Compare the input embedding with each category embedding
  let min_distance = Infinity; // smallest distance seen so far
  let best_category = ""; // category with the smallest distance
  for (const category in category_embeddings) {
    const category_embedding = category_embeddings[category];
    const distance = euclideanDistance(new_input_embedding, category_embedding);
    if (distance < min_distance) {
      min_distance = distance; // update minimum distance
      best_category = category; // update best category
    }
  }
  return best_category;
};
Here is the Euclidean distance function if you are wondering:
function euclideanDistance(a, b) {
  // a and b are arrays of numbers with the same length
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i]; // difference between corresponding elements
    sum += diff * diff; // add the square of the difference
  }
  return Math.sqrt(sum); // square root of the sum of squares
}
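As a quick sanity check, identical points are distance 0 and a 3-4-5 right triangle gives distance 5 (the function is repeated here so the snippet runs on its own):

```javascript
// Same function as above, repeated so this snippet is self-contained.
function euclideanDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

console.log(euclideanDistance([0, 0], [0, 0])); // 0
console.log(euclideanDistance([0, 0], [3, 4])); // 5
```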
Lastly, let's build our own logic to extract the data and process it. The code below shows how we can select a category and execute its action:
const categorySelection = async (category, cate) => {
  const selectedCat = cate && cate[category];
  if (selectedCat?.action === "API_CALL") {
    fetch(selectedCat.url)
      .then((res) => res.json())
      .then(async (data) => {
        if (selectedCat.categories && selectedCat.ask) {
          // More options to offer: ask the follow-up question and recurse
          const newQuestion = selectedCat.ask.map((ask) => ask(data));
          init(newQuestion, selectedCat.categories(data));
        } else {
          // No more options: hand the API result to the chat model
          messages.push({
            content: JSON.stringify(data),
            role: "system",
          });
          openai
            .createChatCompletion({
              model: "gpt-3.5-turbo",
              messages,
            })
            .then((res) => console.log(res.data.choices[0].message.content))
            .catch((err) => console.log(err));
        }
      });
  } else if (selectedCat?.action === "LOCAL_ACTION") {
    if (selectedCat.categories && selectedCat.ask) {
      const newQuestion = selectedCat.ask.map((ask) => ask(selectedCat.categories()));
      await init(newQuestion, selectedCat.categories());
    } else {
      console.log(selectedCat.response());
    }
  } else {
    console.log({ conversation });
  }
};
This code demonstrates how we can select a category and execute its action based on the user input. When we pass a category to the function, it will look it up in the list of categories and check if it requires an API call. If it does, then we will make an API request and get the response. Then we will pass the response to the init function again, which will repeat the same process with the new input and the new categories. This way, we can handle multiple levels of interaction and options.
If there are no more categories to choose from, we will pass the response to the gpt-3.5-turbo model and let it generate a natural language reply for the user. This way, we can leverage the power of AI to create engaging and natural conversations.
If we want to handle other types of actions that don't require an API call, we can define them as LOCAL_ACTION and write our own logic for them. In the above example, we are not making an API request for this type of action.
Now let's look at the calling function for all the above processes:
const init = async (newInputs, cate) => {
  await initializeCategoriesEmbedding(Object.keys(cate || {})).then(
    async (embeddings) => {
      const inp = Array.isArray(newInputs) ? newInputs : [newInputs];
      const responses = inp.map((newInput) => {
        messages.push({
          content: newInput,
          role: "system",
        });
        const input = prompt(newInput + " : "); // ask the user and read their reply
        messages.push({
          content: input,
          role: "user",
        });
        conversation.push({ question: newInput, reply: input });
        return input;
      });
      const result = await classification(responses.join(" : "), embeddings);
      await categorySelection(result, cate);
    }
  );
};
init(
  [
    "Welcome to Our Support, Please Enter Your Name : ",
    "Great, what can I do for you?",
  ],
  data.categories
);
This function takes an array of new inputs and a categories object as arguments. It initializes the category embeddings using the Ada model, prompts the user for a reply to each input, passes the replies to the classification function to find the best category, and finally selects that category and executes its action.
That's how we can make our GPT model brew us a coffee using intent classification and a rule-based system. This approach can be applied to any other data sources, APIs, or actions that you want to integrate with your GPT model. The possibilities are endless!

Raja Osama
I Describe Myself as a Polyglot ~ Tech Agnostic ~ Rockstar Software Engineer. I Specialise in Javascript-based tech stack to create fascinating applications.
I am also open to freelance work, and in case you want to hire me, you can contact me at rajaosama.me@gmail.com or contact@rajaosama.me