Episode 14: Week 1 of Building Multi-Agent Applications - A Bootcamp

I have recently joined a mentorship program for building Multi-Agent LLM Applications and will be sharing my learnings in a series of detailed blog posts over the next couple of weeks.

My blog posts will be divided into two sections:

  1. The Bootcamp: What I learned from the bootcamp during the week.
  2. The Basics: What I had to learn on my own to understand #1.

The former talks about the interesting insights shared by industry practitioners and experts in this field and the latter is on the basics and fundamentals of generative AI technology that I had to catch up with in order to understand these complex discussions.

Let’s get right in!

Disclaimer: All memes in this post are AI-generated using Predis.ai

There is a short demo video of a personal vacation planning AI assistant built using no-code LangFlow at the end of this post!

History of LLMs [1]

The Basics:

What is the difference between monolithic models and compound AI systems? [2]

A model on its own has limited knowledge and adapts poorly to new problems. Suppose I train an LLM on a fixed training dataset and it performs perfectly on tasks like summarizing documents and drafting emails. If I then ask it a specific question like, “What is my company’s HR policy on taking sick days?”, it cannot give a correct answer because it has no access to my private data.

Therefore, people have started building compound AI systems, in which multiple specialized LLMs are integrated into an application, possibly alongside non-LLM models, to perform different workflows. For example, one LLM may specialize in summarizing documents and another in translation, and together they work within the application to perform different tasks and produce more accurate results. These LLMs have access to databases, can write search queries using different tools, and remember past interactions, so they can give specific answers that are more likely to be accurate. The system can also contain non-LLM models, such as a traditional ML-based recommender system, which an agent can use as a tool while reasoning through its tasks.

LLM agents using an external database to augment their knowledge is an example of the most common compound AI system: Retrieval Augmented Generation (RAG). [1]
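To make the retrieve-then-generate flow concrete, here is a minimal sketch of RAG in plain Python. The word-overlap retriever and the prompt-building `answer` function are toy stand-ins for a real vector store and LLM call; only the overall flow is illustrated.

```python
# Minimal sketch of Retrieval Augmented Generation (RAG).
# The retriever is a toy stand-in for a vector store, and answer()
# returns the augmented prompt instead of calling a real LLM.

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query."""
    query_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def answer(query: str, documents: list[str]) -> str:
    """Augment the prompt with retrieved context before the model call."""
    context = "\n".join(retrieve(query, documents))
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    # In a real system this prompt would be sent to an LLM;
    # here we return it so the augmentation step stays visible.
    return prompt

docs = [
    "Employees may take up to 10 paid sick days per year.",
    "The cafeteria serves lunch between 12 and 2 pm.",
]
print(answer("What is the HR policy on sick days?", docs))
```

This is why the HR-policy question from above becomes answerable: the private policy document is injected into the prompt at query time instead of being baked into the model's weights.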

What is an agent? [3]

Traditionally, control logic in systems was designed using fixed instructions, and the code followed a predictable, deterministic path. In an agentic flow, by contrast, multiple LLM agents can guide the flow of the application. They perform specialized tasks using dedicated tools, different data sources, and a history of past interactions with the end user, which lets the agents become increasingly specialized and build better domain knowledge with each iteration. So, in summary, an agent has three properties: the ability to reason and plan, access to tools and external data sources, and memory of past interactions.

What is the difference between agents and LLMs? [3]

It is possible to build a compound AI system with a non-agentic flow while still using LLMs. In such a system, the LLMs perform the tasks they are trained for, but the workflow being executed is more or less pre-determined. For example, an LLM can be asked to list all the national holidays of a country given as input. This LLM uses no reasoning to come up with the answer: it simply returns a pre-determined list of dates that are national holidays in the given country and are always the same for that country. The response never changes for the same prompt.

However, in an agentic system, the LLM has some level of “agency,” or autonomy, in the decision-making process. If an agent is provided with different tools and uses reasoning to decide which tools to use to perform the task better, that is an agentic workflow. This is similar to the concept of a policy in reinforcement learning: the agent learns a policy (a mapping from states to actions) to maximize its cumulative reward, guided by the feedback the environment provides about its actions.

Just because a system is complex doesn’t mean it is agentic, but most complex workflows might benefit from an agentic solution!
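The "agent decides which tool to use" idea can be sketched in a few lines. In this toy example the `choose_tool` function stands in for the LLM's reasoning step (in a real system the model itself would pick the tool); the two tools and their outputs are made up for illustration.

```python
# Sketch of an agentic step: the decision of which tool to call is
# delegated to a policy (here a stub standing in for an LLM's
# reasoning), instead of the code following one fixed sequence.

def search_holidays(country: str) -> str:
    return f"Holidays for {country}: Jan 1, Jul 4"  # stub tool

def get_weather(city: str) -> str:
    return f"Weather in {city}: sunny"  # stub tool

TOOLS = {"search_holidays": search_holidays, "get_weather": get_weather}

def choose_tool(task: str) -> str:
    """Stand-in for the LLM's reasoning: map the task to a tool name."""
    if "holiday" in task.lower():
        return "search_holidays"
    return "get_weather"

def run_agent(task: str, argument: str) -> str:
    tool_name = choose_tool(task)        # the agent decides...
    return TOOLS[tool_name](argument)    # ...then acts

print(run_agent("List the holidays", "Canada"))
```

With a real LLM choosing the tool, the same input could lead to different tool calls on different runs, which is exactly the autonomy the text above describes.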

How to decide between using a monolithic system and a compound AI system for your application? [3]

When the task at hand is simple, does not depend on past interactions, and involves only one function, then using a single LLM makes more sense than using LLM agents in a compound AI system.

Example: Summarizing documents.

But when the task involves several subtasks, when the objective becomes too complex for a single LLM because of multiple constraints, or when the response depends not only on the user prompt but also on other data sources, then using multiple specialized LLMs (agents) makes more sense.

Example: A chatbot that assists users. In addition to user prompts, the agents need access to the company’s confidential policy documents to make decisions, and those decisions must optimize multiple factors such as conversion and retention. Coming up with a single reward function that maximizes both is a challenge, so developing multiple agents that each specialize in optimizing one of these constraints is more useful.

What is the Lang family? [4]

The Lang Family [4]

LangChain: a framework for composing LLM calls, prompts, and tools into chains.

LangGraph: builds on LangChain to define stateful, graph-based agent workflows.

Example of code using LangChain. Source: [4]
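The referenced example shows LangChain code; as a dependency-free illustration of the same idea, here is a plain-Python sketch of LangChain-style composition, where a prompt template and a model are chained so the template's output feeds the model. The `fake_llm` and the class names here are made up for the sketch and are not LangChain's actual API.

```python
# Plain-Python sketch of the chaining idea behind LangChain's
# `prompt | llm` composition. No LangChain dependency; the "model"
# is a stub that echoes the prompt it received.

class PromptTemplate:
    def __init__(self, template: str):
        self.template = template
    def __call__(self, **kwargs) -> str:
        return self.template.format(**kwargs)

def fake_llm(prompt: str) -> str:
    """Stand-in for a model call."""
    return f"[model answer to: {prompt}]"

def chain(prompt_template, model):
    """Compose template -> model into a single callable."""
    def run(**kwargs):
        return model(prompt_template(**kwargs))
    return run

summarize = chain(PromptTemplate("Summarize this text: {text}"), fake_llm)
print(summarize(text="LangChain composes LLM calls into pipelines."))
```

The value of the abstraction is that each stage (template, model, parser, tool) stays swappable while the composed chain is called like one function.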

LangFlow: a no-code, drag-and-drop visual builder for LLM pipelines.

LangSmith: a platform for tracing, debugging, and evaluating LLM applications.


The Bootcamp

What is the product strategy formula for developing LLM based products? [5]

Without domain-driven product design and a service-oriented, microservice architecture, you cannot build rock-solid agentic applications. Selling products where plain automation is masked as an agentic workflow is “agentic washing,” which is deception. The product strategy for a good agentic application should be designed with the following components in mind:

The agent should have the following features: [9]

"Agentic Washing" is a crime

One example where an LLM product launch went badly for a big company? [6]

In February 2024, a tribunal found that Air Canada’s customer service chatbot had misled a customer about the carrier’s refund policy on bereavement fares. When the customer asked the chatbot about the policy, it replied:

“Air Canada offers reduced bereavement fares if you need to travel because of an imminent death or a death in your immediate family…If you need to travel immediately or have already travelled and would like to submit your ticket for a reduced bereavement rate, kindly do so within 90 days of the date your ticket was issued by completing our Ticket Refund Application form.”

The page the chatbot’s response linked to, on Air Canada’s own website, contradicted what the bot was saying, which resulted in the carrier losing the case and paying damages. The chatbot was clearly hallucinating and extrapolating.

Air Canada ended up shutting down the chatbot and reportedly letting go a number of data scientists. What could Air Canada have done differently? At a minimum, the chatbot should have warned users about potential inaccuracies in its responses.


One example of a rock solid LLM product launch? [7]

Navan (formerly TripActions) is an online travel management, corporate card and expense management company.

What was Navan’s product strategy? [9]

What is AI SDK 3.4 with Generative UI? [9]

Imagine asking a ChatBot:

“Help me plan an affordable vacation to Paris, including budget-friendly flights, accommodations, and activities,” and instead of getting a response in plain text, you get a dynamic, interactive response with filters and clickable elements for tailored adjustments, e.g.

Vercel’s AI SDK 3.4, released in September 2024, allows developers to build such applications.

What are the different steps in building user’s trust with your AI application? [5]

What are some of the components of a product requirements document for non-deterministic AI products? [5]

  1. Model requirements
    • Which model to use?
    • How big a context window is needed? (e.g., how much memory do different agents need for their specializations)
    • Latency (is it fast enough?)
    • Open- vs. closed-source models ($$$)
    • Is fine-tuning needed?
    • Modality (i.e., in what form the model takes input and generates output, e.g., text, image, etc.)
  2. Data:
    • Sources of training and feedback data.
    • Ways to collect, prepare, and store data.
    • Training method: fine-tuning or RLHF.
  3. Prompt requirements

  4. AI User Experience

  5. AI Testing & Measurement

  6. AI Risks

  7. AI Costs:
    • Data acquisition
    • Model Fine-tuning
    • Operational expenses with token usage
| Feature | Fine-Tuning | RLHF |
| --- | --- | --- |
| Data type | Labeled, task-specific | Rankings or feedback from humans |
| Objective | Improve accuracy for defined tasks | Align outputs with human preferences |
| Training complexity | Relatively simpler | More complex (requires human input and reward modeling) |
| Outcome | Task-specific optimization | Improved alignment with user expectations; adaptability to subjective contexts |
| Example | Updating a language model to improve its medical-terminology responses by training it on medical-specific data | Human feedback on the quality of an essay returned by the chatbot |

Demo Time

This application takes a city name as input from the user, retrieves statutory holidays, and identifies the best surrounding cities, creating an itinerary complete with budget, weather forecasts, and travel recommendations for that city and its surrounding areas for each vacation date. Time to book those vacation days at your job!

While this is a pretty cool application, it is not purely agentic, and it therefore serves as a good example for learning the difference between agentic and automated workflows.

Agents 1 and 2 are not actually agents but LLMs following a pre-determined workflow. For almost all relevant user queries, each of them follows the same workflow without any autonomous decision-making: it always calls the search API, looks up the holidays of the input city and neighboring cities, and returns the list. It always gives the same response for the same input city.

Agent 3, however, is truly agentic: it uses reasoning and planning to come up with an itinerary, and its response might change on the next iteration for the same input city.
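The contrast between the two kinds of steps can be sketched side by side. Everything here is a stub with invented names and data: the fixed pipeline always produces the same output for the same input, while the agentic step delegates "what to do next" to a planner (a lambda standing in for an LLM).

```python
# Contrast from the demo: a deterministic pipeline ("Agents" 1 and 2)
# vs. an agentic step (Agent 3). All functions and data are stubs.

def fetch_holidays(city: str) -> list[str]:
    return ["2025-01-01", "2025-07-01"]  # stub: same output every time

def deterministic_flow(city: str) -> list[str]:
    # Fixed workflow: same input -> same steps -> same output.
    return fetch_holidays(city)

def agentic_flow(city: str, decide_next_step) -> str:
    # The choice of next action is delegated to the planner
    # (`decide_next_step`, a stand-in for an LLM's reasoning),
    # so different runs could take different paths.
    holidays = fetch_holidays(city)
    return decide_next_step(city, holidays)

plan = agentic_flow(
    "Toronto",
    lambda city, hs: f"Draft an itinerary around {hs[0]} in {city}",
)
print(plan)
```

Replace the lambda with a real LLM planner and the output stops being a pure function of the input, which is the hallmark of the agentic part of the demo.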

One way to convert it into an agentic workflow would be to:

References:

[1] Large Language Models (LLMs) - Everything You NEED To Know
[2] What are AI Agents?
[3] LLM Agents, Part 2 - What the Heck are Agents, anyway?
[4] LangGraph vs LangChain vs LangFlow vs LangSmith : Which One To Use & Why?
[5] LLM Products vs Traditional Digital Products by Amin Bashi
[6] What Air Canada Lost In ‘Remarkable’ Lying AI Chatbot Case
[7] Navan Introduces World-First Generative AI Tool for CFOs and Travel Admins
[8] Bringing React Components to AI
[9] Building Agentic Products by Amin Bashi
