OpenAI’s Assistants API — A hands-on demo
Introduction
Let me start by saying that I am not an AI expert, and some of the things I say here may sound simplistic, but that is the way that helps me understand them. So, let’s begin.
OpenAI just released the Assistants API, which greatly simplifies setting up a GPT-backed Q&A system on top of your own knowledge base.
To quote the OpenAI docs:
Retrieval augments the Assistant with knowledge from outside its model, such as proprietary product information or documents provided by your users. Once a file is uploaded and passed to the Assistant, OpenAI will automatically chunk your documents, index and store the embeddings, and implement vector search to retrieve relevant content to answer user queries.
This is pretty amazing if you consider that, until now, chunking, indexing, storing, and vector-searching the embeddings had to be done separately, with custom solutions.
I have a short section at the end of this document explaining what embeddings and vectors are, along with a brief explanation of how you would implement a knowledge-base Q&A without the Assistants API.
The demo
You can check out my Github repo which contains the example which I am going to explain here in detail:
https://github.com/mandarini/openai-assistant-demo
Please also take some time to go through the README of that repo.
Motivation
Building the Nx AI Assistant was a journey that taught us quite a lot about how to use the OpenAI APIs for chat completions and embeddings, and gave us a taste of how to interact with LLMs. So it was only natural that I wanted to try out the brand new API that would render some of that work obsolete (at least I think so, I’m no expert).
So, in the demo which I am going to present in this blog post, I take the Nx docs, upload them to OpenAI, and attach them to a new assistant. Then I use this assistant to ask questions about these docs.
Project Overview
I have created a super simple chat interface (ChatGPT helped a lot) that accepts a question from the user, and replies based on the knowledge base of the Assistant.
The logic is the following:
- Create an assistant: first upload your files to OpenAI, then create an Assistant and pass it the IDs of the files you uploaded
- Create a new message thread
- Create a new message in the message thread with the user’s query
- Run the thread using the Assistant that was just created
- Retrieve the run’s result once the run is complete
- Return the messages to the UI (a condensed sketch of the whole flow follows this list)
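Here is a condensed sketch of the whole flow, assuming openai is an initialized OpenAI client, assistantId points to an assistant created in step 1, and userQuery is the question typed by the user (error handling omitted):
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function askAssistant(assistantId: string, userQuery: string) {
  // 1. Start a conversation thread and add the user's question to it.
  const thread = await openai.beta.threads.create();
  await openai.beta.threads.messages.create(thread.id, {
    role: 'user',
    content: userQuery,
  });

  // 2. Run the thread with our assistant.
  const run = await openai.beta.threads.runs.create(thread.id, {
    assistant_id: assistantId,
  });

  // 3. Poll until the run completes (failure states are not handled here).
  let status = run.status;
  while (status !== 'completed') {
    await new Promise((r) => setTimeout(r, 1000));
    status = (await openai.beta.threads.runs.retrieve(thread.id, run.id)).status;
  }

  // 4. Return the full list of messages in the thread to the caller.
  return openai.beta.threads.messages.list(thread.id);
}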
Let’s see these steps in more detail.
Step 1: Creating the assistant
The assistant creation step happens once, and “offline”. You only need to run it again if you want to update your knowledge base. I am not going to cover this in the present blog post.
Uploading the files
The first thing that you need to do is upload all the files that you want to use. In this example, I am uploading the files programmatically, from a local directory, like this:
// Assumes `openai` is an initialized OpenAI client, `allFileNames` holds the
// names of the local docs files, and `files` collects the upload results.
for (const file of allFileNames) {
  // … (resolve the local path of this file into `filePath`)
  const oneFile = await openai.files.create({
    purpose: 'assistants',
    file: fs.createReadStream(filePath),
  });
  files.push(oneFile);
}
The result of openai.files.create gives us back, among other metadata, the file’s id. You can see the full list of your uploaded files at https://platform.openai.com/files. Each file can be deleted, you can copy its ID, or you can manually upload more files. All of these actions can also be achieved programmatically with the Files object and its methods.
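As a rough sketch, listing and deleting uploaded files programmatically could look like this (the exact method names may vary slightly between SDK versions):
// List every file uploaded to the account and print its id and name.
const uploaded = await openai.files.list();
for (const file of uploaded.data) {
  console.log(file.id, file.filename);
}

// Delete a single file by its id ('file-abc123' is a placeholder id).
await openai.files.del('file-abc123');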
Creating the assistant
Now, once you have all your files uploaded and their IDs handy, you can proceed to create your assistant, either programmatically or through the GUI. Here is how it can be done programmatically:
const assistant: OpenAI.Beta.Assistants.Assistant =
  await openai.beta.assistants.create({
    instructions:
      'You are Nx Assistant, a helpful assistant for Nx Dev Tools. Your primary role is to provide accurate and sourced information about Nx Dev Tools. Rely solely on the information in the files you have; do not use external knowledge. If the information is not in the files, respond with "Sorry I cannot help with that".',
    model: 'gpt-4-1106-preview',
    tools: [{ type: 'retrieval' }],
    file_ids: [...files.map((file) => file.id)],
  });
Of course you can customize the instructions, but make sure to state that you want your model to rely only on the existing knowledge base. The openai.beta.assistants.create function will give us back some metadata about our assistant, including its ID. We can see the list of our assistants and their information (id, etc.) at this URL: https://platform.openai.com/assistants
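Since the assistant is created once and reused, the chat backend only needs its ID at runtime. As a small sketch, fetching an existing assistant by ID could look like this (ASSISTANT_ID is an assumed environment variable):
// Fetch a previously created assistant by its id, e.g. to verify that it
// exists and has the expected files attached before serving requests.
const assistantId = process.env.ASSISTANT_ID as string; // assumed env variable
const existing = await openai.beta.assistants.retrieve(assistantId);
console.log(existing.name, existing.file_ids);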
Step 2: Creating a thread
Initialize the thread
The first thing that needs to happen when the chat page is initialized is for our client to create a message Thread. The message thread represents the conversation between the AI and the user. The command to create a new thread is simple:
await openai.beta.threads.create()
This function will return, among other metadata, the thread’s id, which we are going to use to run it. This should ideally happen once per session (or every time the user “resets” the chat).
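As a minimal sketch, creating the thread once and keeping its ID around for the rest of the session could look like this:
// Create the thread when the chat page loads and keep its id for the session.
const thread = await openai.beta.threads.create();
const threadId = thread.id; // reuse this for every message and run in this chat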
Add the user’s message to the thread
Once we have the id of our thread, we can pass in that thread the message that the user sends from the chat interface. The user’s question, that is. Let’s take a look at the code:
await openai.beta.threads.messages.create(thread.id, {
  role: 'user',
  content: userQuery,
});
The role: 'user' represents the (surprise!) user, while role: 'assistant' represents the OpenAI GPT assistant. The role names are defined by the API.
Step 3: Running the thread
Create a new “run” instance
Now that our thread is ready, we can run it. First we need to create a “run” instance for our thread and pass it the id of our assistant:
const run = await openai.beta.threads.runs.create(thread.id, {
  assistant_id: assistantId as string,
});
This function will return (among other metadata) a run id for this particular run.
Wait for run to complete
Now, this is where it gets tricky. You can read about the run lifecycle in the OpenAI docs: https://platform.openai.com/docs/assistants/how-it-works/runs-and-run-steps
Reading from the above page:
In order to keep the status of your run up to date, you will have to periodically retrieve the Run object. You can check the status of the run each time you retrieve the object to determine what your application should do next. We plan to add support for streaming to make this simpler in the near future.
We need to manually poll our run to see if it has completed. The openai.beta.threads.runs.retrieve(threadId, runId) function returns a status, and once the status is completed we can finally call the openai.beta.threads.messages.list(threadId) function to get back a full list of all the messages in the thread.
Here is a simple implementation of the polling logic:
// Poll the run until it completes or the timeout is reached. Assumes
// `interval` and `timeout` are in milliseconds, and that this loop runs
// inside a Promise executor that provides `resolve`.
while (timeElapsed < timeout) {
  const run = await openai.beta.threads.runs.retrieve(threadId, runId);
  if (run.status === 'completed') {
    const messagesFromThread: OpenAI.Beta.Threads.Messages.ThreadMessagesPage =
      await openai.beta.threads.messages.list(threadId);
    resolve({ runResult: run, messages: messagesFromThread });
    return;
  }
  // Wait for the polling interval before checking the run status again.
  await new Promise((resolve) => setTimeout(resolve, interval));
  timeElapsed += interval;
}
Once we get back our result, we can return the array of messages to the front end.
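As a sketch, extracting the assistant’s latest text reply from that list could look like this (assuming the default ordering, newest message first, and a plain-text response):
// Find the most recent assistant message and join its text parts into one string.
const latest = messagesFromThread.data.find((m) => m.role === 'assistant');
const replyText = latest?.content
  .map((part) => ('text' in part ? part.text.value : ''))
  .join('\n');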
Future Enhancements
Part of the immediate improvement plans is to add a list of sources at the end of each message. This is easy to implement, since each message object contains a list of IDs of the files that were used to create it. Then it is a matter of calling the openai.files.retrieve function (https://platform.openai.com/docs/api-reference) for each of those IDs and reading the file’s name.
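A rough sketch of what that could look like, assuming each message exposes the IDs of the files it used in a file_ids field (as the beta API did at the time of writing):
// Hypothetical helper: map the file ids attached to a message to their
// original filenames, so they can be shown as sources under the reply.
async function getSources(message: { file_ids: string[] }): Promise<string[]> {
  const sources: string[] = [];
  for (const fileId of message.file_ids) {
    const file = await openai.files.retrieve(fileId);
    sources.push(file.filename);
  }
  return sources;
}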
Of course, once streaming is implemented by the API, it should be used instead of polling, and it would make the response appear much faster.
Lots of other improvements could be added, and it remains to be seen whether implementing the assistant programmatically provides benefits over creating a custom GPT (ref: https://www.builder.io/blog/custom-gpt).
Conclusion
OpenAI’s Assistants API represents a significant step forward in making complex AI functionalities more accessible and practical for developers. By integrating a knowledge base directly into an AI assistant, we’re able to create a dynamic, responsive, and highly intelligent system that can provide specific, sourced information on demand. This was possible before, but only with custom solutions: “manually” chunking text and creating embeddings, “manually” storing them in a vector database, and implementing the vector search that returns the sections relevant to a user query’s embedding, to be added as context.
The new approach is much simpler and more straightforward, as it comes out of the box with the new API. Questions remain as to how this approach handles issues such as potential downtime, or whether it supports the extra configuration options that the other APIs offer, but the feature is still in beta, so let’s see what comes next!
Glossary
Core concepts
I find it useful to explain what some terms mean.
Embeddings
What they are
In the context of machine learning, embeddings are a type of representation for text data. Instead of treating words as mere strings of characters, embeddings transform them into vectors (lists of numbers) in a way that captures their meanings. Think of these vectors as digital fingerprints for words or phrases, capturing their essence as a series of numbers that can be easily analyzed and compared.
Why they matter
With embeddings, words or phrases with similar meanings end up having vectors that are close to each other, making it easier to compare and identify related content.
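As a rough sketch, here is how you could embed two phrases with the OpenAI API and compare them with cosine similarity (the model name is just an example):
// Cosine similarity between two vectors: values close to 1 mean the texts
// point in the same "direction", values near 0 mean they are unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, value, i) => sum + value * b[i], 0);
  const normA = Math.sqrt(a.reduce((sum, value) => sum + value * value, 0));
  const normB = Math.sqrt(b.reduce((sum, value) => sum + value * value, 0));
  return dot / (normA * normB);
}

// Embed two phrases and compare them.
const response = await openai.embeddings.create({
  model: 'text-embedding-ada-002', // example model name
  input: ['How do I cache builds in Nx?', 'Nx computation caching'],
});
const [first, second] = response.data.map((d) => d.embedding);
console.log(cosineSimilarity(first, second)); // closer to 1 = more similar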
Generative AI
What it is
Generative AI, the technology driving the Nx Docs AI Assistant, is a subset of AI that’s trained, not just to classify input data, but to generate new content.
How it works
Generative AI operates like a sophisticated software compiler. Just as a compiler takes in high-level code and translates it into machine instructions, generative AI takes in textual prompts and processes them through layers of neural network operations, resulting in detailed and coherent text outputs. It’s like providing a programmer with a high-level task description, and they write the necessary code to achieve it, except here the ‘programmer’ is the AI, and the ‘code’ is the generated text response.
What Does “Generation” Mean in AI Context?
In AI, especially with natural language processing models, “generation” refers to the process of producing sequences of data, in our case, text. It’s about creating content that wasn’t explicitly in the training data but follows the same patterns and structures.
How Does GPT Predict the Next Word?
GPT, which stands for “Generative Pre-trained Transformer”, works using a predictive mechanism. At its core, it’s trained to predict the next word in a sentence. When you provide GPT with a prompt, it uses that as a starting point and keeps predicting the next word until it completes the response or reaches a set limit.
It’s like reading a sentence and trying to guess the next word based on what you’ve read so far. GPT does this but by using a massive amount of textual data it has seen during training, enabling it to make highly informed predictions.
Links
Github: https://github.com/mandarini/openai-assistant-demo
The live demo: https://openai-assistant.vercel.app/ (which may or may not work because of API limits etc — it’s just a live demo)
Follow me on X: https://twitter.com/psybercity
Follow Nx: https://twitter.com/NxDevTools