After working with Large Language Models (LLMs), I was amazed by their reasoning power, but what really excited me was making LLMs even more useful by connecting them to real-world data and tools. Before I learned about MCP, I tried to connect LLMs to external tools such as internet search, music playback, and email sending. Eventually I managed to build a smart assistant that could handle these tasks, but the implementation was messy and fragile. The code relied heavily on conditional prompting, for example:
“If the user asks to search for something on the internet, generate a function call like result = search_internet(query) with no explanation, output only Python code inside a code block.”
I then had to sanitize the LLM’s output, execute the generated code in Python, capture the result, and pass it back to the LLM to produce the final response. While this approach technically worked, it was error-prone, hard to maintain, and often broke in edge cases. Adding any new functionality was a huge pain: it meant rewriting prompts, adding more conditions, and increasing the overall complexity of the system.
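To make the fragility concrete, here is a minimal sketch of that old approach. Everything in it is hypothetical (the `search_internet` function, the simulated model reply, the regex whitelist); it only illustrates the extract-sanitize-dispatch loop described above:

```python
import re

# Hypothetical tool the model was prompted to call.
def search_internet(query: str) -> str:
    return f"results for '{query}'"

def run_llm_code(llm_output: str) -> str:
    """Extract the code block from the model's reply, sanitize it,
    and dispatch it to the matching local function by hand."""
    match = re.search(r"```(?:python)?\s*(.*?)```", llm_output, re.DOTALL)
    if not match:
        raise ValueError("model did not return a code block")
    code = match.group(1).strip()
    # Only allow the one call we expect; anything else is rejected.
    call = re.fullmatch(r"result = search_internet\((['\"])(.*)\1\)", code)
    if not call:
        raise ValueError(f"unexpected code: {code}")
    return search_internet(call.group(2))

# Simulated LLM reply following the conditional prompt.
reply = '```python\nresult = search_internet("weather Berlin")\n```'
print(run_llm_code(reply))  # results for 'weather Berlin'
```

Every new tool means another regex, another prompt branch, and another way for the model's output to slip past the sanitizer, which is exactly the maintenance burden MCP removes.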
When I learned about MCP, I literally said, “Wow.” That was exactly what I had been looking for. Instead of brittle prompt engineering and manual code execution, MCP offered a clear separation of concerns, better reliability, and far easier extensibility. What had previously felt like a hack suddenly became an elegant, scalable architecture.
As for RAG, my early experiments focused on enriching models with the right context to produce more accurate and context-aware responses. I found it to be a powerful approach for building smart agents that can answer questions within a specific domain, especially when grounded in private or proprietary databases. By retrieving relevant information at query time and feeding it to the model, RAG enables precise and up-to-date answers without requiring model retraining. This makes it particularly well suited for use cases such as internal knowledge assistants, documentation search, and domain-specific Q&A systems.
Indeed, LLMs are like geniuses locked in a room with no internet or tools. They can think and explain, but they can’t reach out and do anything. If you ask:
“What’s the temperature in New York right now?”
The LLM will hallucinate or guess, because it has no access to real data. And if you say:
“Play my favorite song”
It’ll describe how to play it… but won’t touch your speakers. To make AI useful, we need to connect these models to tools, data, and actions safely.
In this post, I’ll walk you through how to set up a RAG pipeline and an AI agent using a local LLM with Ollama and LangChain.
LangChain
LangChain is an open-source framework designed to help developers build applications powered by Large Language Models (LLMs), such as GPT, Claude, or Llama. Instead of using an LLM as a simple text generator, LangChain allows us to connect the model to real data sources, external APIs, or even custom Python functions. This gives the LLM the ability to reason, plan, and act through tools.
In short, LangChain helps you turn a language model into an intelligent agent that can think, use tools, and take actions autonomously.
Retrieval-Augmented Generation (RAG)
One of LangChain’s great features is how easy it makes setting up RAG. RAG lets our model fetch context from external data (documents, databases, etc.) before answering.
Here’s a minimal RAG example using LangChain. Assume we have a dataset of weather information and we want the AI to answer the user’s questions based on facts from that dataset. In simple terms, we feed the LLM our dataset and ask it to answer questions grounded in it:
# pip install langchain langchain-community langchain-ollama faiss-cpu
from langchain_ollama import ChatOllama, OllamaEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain_core.documents import Document

# 1. Create your "knowledge base"
docs = [
    Document(page_content="Tomorrow it is raining all across Germany."),
    Document(page_content="It's windy in London Today.")
]
splitter = CharacterTextSplitter(chunk_size=200)
chunks = splitter.split_documents(docs)

# 2. Build a retriever (index)
embeddings = OllamaEmbeddings(model="mxbai-embed-large")
vectorstore = FAISS.from_documents(chunks, embeddings)

# 3. Create a retrieval-based QA chain
qa = RetrievalQA.from_chain_type(
    llm=ChatOllama(model="llama3"),
    retriever=vectorstore.as_retriever()
)

# 4. Ask something beyond the model's memory
query = "What's the weather like tomorrow in Berlin?"
print(qa.invoke(query)["result"])
Output:
“Tomorrow is raining in Berlin.”
That’s RAG in action: giving our LLM fresh knowledge without retraining. The pipeline starts by defining a small knowledge base of text documents and splitting it into manageable chunks. Each chunk is then converted into a numerical representation (an embedding) using the OllamaEmbeddings model. These embeddings are stored in a FAISS vector database, which allows efficient similarity searches. When a user asks a question, the retriever searches for the most relevant text chunks based on their embeddings and passes them, along with the query, to the local language model (llama3). The model then generates a final answer grounded in the retrieved context, effectively extending its knowledge beyond its built-in training data.
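If the similarity-search step feels opaque, here is a toy version of what the retriever does under the hood. The 3-dimensional vectors are made up purely for illustration (real embedding models like mxbai-embed-large return high-dimensional vectors), but the ranking logic is the same idea FAISS implements at scale:

```python
import math

# Toy "embeddings": small made-up vectors standing in for real ones.
chunks = {
    "Tomorrow it is raining all across Germany.": [0.9, 0.1, 0.2],
    "It's windy in London Today.":                [0.1, 0.8, 0.3],
}

def cosine(a, b):
    """Cosine similarity: how close two vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def retrieve(query_vec, k=1):
    """Rank chunks by similarity to the query vector and return the top k."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, chunks[c]), reverse=True)
    return ranked[:k]

# Pretend embedding of the Berlin weather question: it lands near chunk 1.
query_vec = [0.85, 0.15, 0.25]
print(retrieve(query_vec))  # ['Tomorrow it is raining all across Germany.']
```

The retrieved chunk is then stuffed into the prompt alongside the question, which is all “augmented generation” really means.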
AI Agent
Now we understand that RAG gives the model knowledge, but not the arms to take actions like playing music or sending email. To let it perform tasks, LangChain introduces Agents, which can reason and then act using tools.
Example: a simple agent that can use a calculator, Wikipedia, and a custom function that reverses a given text.
# pip install langchain langchain-community langchain-ollama wikipedia numexpr
from langchain.agents import initialize_agent, load_tools, AgentType, Tool
from langchain_ollama import ChatOllama

# === Step 1: Initialize the local Ollama model ===
llm = ChatOllama(model="llama3", temperature=0)

# === Step 2: Load built-in tools ===
tools = load_tools(["wikipedia", "llm-math"], llm=llm)

# === Step 3: Define a custom tool ===
def reverse_text(text: str) -> str:
    """
    A simple custom tool that reverses a string.
    You can replace this with any real logic you want — e.g., playing music, reading files, etc.
    """
    return text[::-1]

custom_tool = Tool(
    name="ReverseText",
    func=reverse_text,
    description="Reverses the given text string."
)

# === Step 4: Add the custom tool to the list ===
tools.append(custom_tool)

# === Step 5: Initialize the agent ===
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)

# === Step 6: Run an example query ===
response = agent.invoke({
    "input": "Find the year Einstein was born, multiply it by 2, and then reverse that number as text."
})
print("\nFinal Answer:")
print(response["output"])
Output:
“Albert Einstein was born in 1879. 1879 × 2 = 3758. And the reverse of the result is 8573.”
The agent looked up Einstein’s birth year via Wikipedia, used the calculator tool, combined both steps into a reasoning chain, and finally reversed the result with the custom tool. That’s a small taste of agentic reasoning: the model doesn’t just answer, it decides how to answer. First it retrieves the relevant fact from Wikipedia, then it performs the calculation on the birth year, and finally it chooses the reverse tool to produce the final result.
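To see the shape of that reasoning loop without any LLM involved, here is a toy, scripted version of the ReAct think-act-observe cycle. In the real agent the LLM chooses each tool from its description at runtime; here the "decisions" and tool results are hard-coded so the control flow is visible:

```python
# Stand-in tools: the Wikipedia result is hard-coded and the calculator
# uses eval() purely for this toy example.
tools = {
    "Wikipedia":   lambda q: "1879",
    "Calculator":  lambda expr: str(eval(expr)),
    "ReverseText": lambda text: text[::-1],
}

# Scripted plan: (thought, tool, input built from the previous observation).
plan = [
    ("Find Einstein's birth year", "Wikipedia",   lambda prev: "Einstein birth year"),
    ("Multiply it by 2",           "Calculator",  lambda prev: f"{prev} * 2"),
    ("Reverse the number as text", "ReverseText", lambda prev: prev),
]

observation = ""
for thought, tool, make_input in plan:
    observation = tools[tool](make_input(observation))
    print(f"Thought: {thought} -> Action: {tool} -> Observation: {observation}")

print("Final Answer:", observation)  # Final Answer: 8573
```

The agent's transcript with verbose=True follows this same Thought/Action/Observation rhythm; the only difference is that the plan is generated token by token instead of being written in advance.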
MCP (Model Context Protocol)
The limitation of LangChain’s agent tools is that they must be local functions defined in the same project and language the LLM application runs in. MCP was created to solve this. Think of MCP as an open protocol that standardizes how LLMs use external tools. An MCP server tells connected clients which tools are available and how each one can be used, exposing every method’s name and signature. The langchain-mcp-adapters package lets LangChain agents connect directly to MCP servers, local or remote, and retrieve and use their tools automatically. Using the LangChain MCP adapter, our model can use any MCP tool, even if it’s written in another language or running remotely. Let’s build it step by step.
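MCP is built on JSON-RPC 2.0, and tool discovery happens through a `tools/list` request. The sketch below shows roughly what a server's response carries; the field names follow my reading of the MCP spec, so treat the exact shape as illustrative rather than normative:

```python
# Roughly what an MCP server returns for a tools/list request
# (JSON-RPC 2.0; exact field shape shown here is illustrative).
tools_list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "add",
                "description": "Add two numbers.",
                "inputSchema": {  # JSON Schema for the tool's arguments
                    "type": "object",
                    "properties": {"a": {"type": "integer"},
                                   "b": {"type": "integer"}},
                    "required": ["a", "b"],
                },
            }
        ]
    },
}

# An adapter turns each entry into an agent tool: the name and
# description go into the prompt, the schema validates arguments.
for tool in tools_list_response["result"]["tools"]:
    print(tool["name"], "-", tool["description"])
```

This is why any MCP-speaking client can use any MCP server: the tool's name, documentation, and argument schema all travel over the wire, with no shared code required.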
Here is an MCP server that runs on stdio. A stdio MCP server serves local requests, so only a local LLM application can make use of it:
# This is math_mcp_server.py
# Run `pip install mcp` to install the MCP package
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("Math")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

@mcp.tool()
def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b

if __name__ == "__main__":
    mcp.run(transport="stdio")
I’ve written another MCP server that serves over streamable-http. MCP servers running on streamable-http can serve all LLM clients on the network:
# This is weather_mcp_server.py
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("Weather")

@mcp.tool()
async def get_weather(location: str) -> str:
    """Return simple fake weather info."""
    return f"The weather in {location} is sunny and 25°C."

if __name__ == "__main__":
    mcp.run(transport="streamable-http")
Finally, here is a LangChain agent that connects to both MCP servers:
# pip install langchain-mcp-adapters langchain langchain-ollama langgraph
import asyncio
from langchain_mcp_adapters.client import MultiServerMCPClient
from langgraph.prebuilt import create_react_agent
from langchain_ollama import ChatOllama

async def main():
    client = MultiServerMCPClient({
        "math": {
            "transport": "stdio",
            "command": "python",
            "args": ["math_mcp_server.py"],
        },
        "weather": {
            "transport": "streamable_http",
            "url": "http://localhost:8000/mcp",
        },
    })
    tools = await client.get_tools()
    llm = ChatOllama(model="gpt-oss:120b-cloud", temperature=0)
    agent = create_react_agent(llm, tools)

    math_result = await agent.ainvoke({
        "messages": [{"role": "user", "content": "What is (3 + 5) x 12?"}]
    })
    print("Math:", math_result)

    weather_result = await agent.ainvoke({
        "messages": [{"role": "user", "content": "What's the weather in Paris?"}]
    })
    print("Weather:", weather_result)

if __name__ == "__main__":
    asyncio.run(main())
It first creates a MultiServerMCPClient that connects to two servers: the local math server via stdio and the remote weather service via HTTP. It then retrieves the available tools and initializes a ReAct agent (create_react_agent) powered by the ChatOllama model (gpt-oss:120b-cloud). The agent dynamically discovers the servers’ tools and chooses the right one based on our prompt. No hardcoding!
With MCP, tools become modular and shareable, any LLM that speaks MCP can use them. And with LangChain, your agent can orchestrate all of that, locally or remotely.
Thanks for reading the post.