{"id":2237,"date":"2025-10-12T07:27:59","date_gmt":"2025-10-12T17:27:59","guid":{"rendered":"https:\/\/mshaeri.com\/blog\/?p=2237"},"modified":"2025-12-13T01:11:03","modified_gmt":"2025-12-13T11:11:03","slug":"understanding-rag-agentic-ai-and-mcp-by-a-real-world-ai-assistant-example","status":"publish","type":"post","link":"https:\/\/mshaeri.com\/blog\/understanding-rag-agentic-ai-and-mcp-by-a-real-world-ai-assistant-example\/","title":{"rendered":"Understanding RAG, Agentic AI, and MCP By a Real-World AI Assistant Example"},"content":{"rendered":"\n<p>After working with Large Language Models (LLMs), I was amazed by their reasoning power, but what really excited me was making LLMs even more useful by connecting them to real-world data and tools. Before I learned about MCP, I tried to connect LLMs to external tools such as internet search, music playback, and email sending. Eventually I managed to build a smart assistant that could handle these tasks; however, the implementation was messy and fragile. The code relied heavily on conditional prompting, for example:<\/p>\n\n\n\n<p><em>\u201cIf the user asks to search for something on the internet, generate a function call like <code>result = search_internet(query)<\/code> with no explanation, and output only Python code inside a code block.\u201d<\/em><\/p>\n\n\n\n<p>I then had to sanitize the LLM\u2019s output, execute the generated code in Python, capture the result, and pass it back to the LLM again to produce the final response. While this approach technically worked, it was error-prone, hard to maintain, and often broke in edge cases. Adding any new functionality was a huge pain, as it required rewriting prompts, adding more conditions, and increasing the overall complexity of the system.<\/p>\n\n\n\n<p>When I learned about MCP, I literally said, <em>\u201c<strong>Wow<\/strong>.\u201d<\/em> That was exactly what I had been looking for. 
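<\/p>\n\n\n\n<p>For context, that pre-MCP loop looked roughly like the sketch below. It is a simplified, hypothetical reconstruction: <code>fake_llm<\/code>, <code>search_internet<\/code>, and <code>sanitize<\/code> are stand-in stubs, not my original code.<\/p>

```python
def fake_llm(prompt: str) -> str:
    """Stub standing in for the real chat-model call."""
    return "result = search_internet('current weather in Berlin')"

def search_internet(query: str) -> str:
    """Stub tool; the real one called a search API."""
    return f"Top result for: {query}"

def sanitize(reply: str) -> str:
    """Strip stray whitespace/backticks the model sometimes emitted."""
    return reply.strip().strip("`")

# 1. Conditional prompt asking the model to emit a tool call as code.
reply = fake_llm("If the user asks to search the internet, "
                 "output only Python code like result = search_internet(query).")

# 2. Sanitize and execute the generated code, capturing `result`.
scope = {"search_internet": search_internet}
exec(sanitize(reply), scope)

# 3. The captured result was then passed back to the LLM to phrase the final answer.
print(scope["result"])  # Top result for: current weather in Berlin
```

<p>Every new tool meant another prompt branch and another fragile string to parse, which is exactly the pain MCP removes.<\/p>\n\n\n\n<p>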
Instead of brittle prompt engineering and manual code execution, MCP offered a clear separation of concerns, better reliability, and far easier extensibility. What previously felt like a hack suddenly became an elegant and scalable architecture.<\/p>\n\n\n\n<p>As for RAG, my early experiments involved enriching models with the right context so they could produce more accurate and context-aware responses. I found it to be a powerful approach for building smart agents that can answer questions within a specific domain, especially when grounded in private or proprietary databases. By retrieving relevant information at query time and feeding it to the model, RAG enables precise and up-to-date answers<strong> without requiring model retraining<\/strong>. This makes it particularly well suited for use cases such as internal knowledge assistants, documentation search, and domain-specific Q&amp;A systems.<\/p>\n\n\n\n<p><strong>Indeed, LLMs are like geniuses locked in a room with no internet or tools. They can think and explain, but they can\u2019t reach out and <em>do<\/em> anything<\/strong>. If you ask:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote\">\n<p>\u201cWhat\u2019s the temperature in New York right now?\u201d<\/p>\n<\/blockquote>\n\n\n\n<p>The LLM will hallucinate or guess, because it has no access to real data. And if you say:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote\">\n<p>\u201cPlay my favorite song\u201d<\/p>\n<\/blockquote>\n\n\n\n<p>It\u2019ll describe how to play it\u2026 but won\u2019t touch your speakers. 
To make AI <strong>useful<\/strong>, we need to connect these models to <strong>tools<\/strong>, <strong>data<\/strong>, and <strong>actions<\/strong> safely.<\/p>\n\n\n\n<p>In this post, I\u2019ll walk you through how to set up a RAG pipeline and an AI agent using a local LLM with Ollama and LangChain.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">LangChain<\/h2>\n\n\n\n<p><strong>LangChain <\/strong>is an open-source framework designed to help developers build applications powered by Large Language Models (LLMs), such as GPT, Claude, or Llama. Instead of using an LLM as a simple text generator, <strong>LangChain <\/strong>allows us to connect the model to real data sources, external APIs, or even custom Python functions. This gives the LLM the ability to reason, plan, and act through tools.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote\">\n<p><strong>In short,<\/strong> <strong>LangChain helps you turn a language model into an <em>intelligent agent<\/em> that can think, use tools, and take actions autonomously.<\/strong><\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\">Retrieval-Augmented Generation (RAG)<\/h2>\n\n\n\n<p>One of LangChain\u2019s great features is its easy RAG setup. It enables our model to <strong>fetch context<\/strong> from external data (documents, databases, etc.) before answering.<\/p>\n\n\n\n<p>Here\u2019s a minimal RAG example using <strong>LangChain<\/strong>. Assume we have a dataset of weather information and we want the AI to answer the user&#8217;s questions based on the facts in that dataset. 
In simple words, we feed the LLM our dataset and ask it to answer questions based on it:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"python\" class=\"language-python\"># pip install langchain langchain-community langchain-ollama faiss-cpu\n\nfrom langchain_ollama import ChatOllama, OllamaEmbeddings\nfrom langchain_text_splitters import CharacterTextSplitter\nfrom langchain_community.vectorstores import FAISS\nfrom langchain.chains import RetrievalQA\nfrom langchain_core.documents import Document\n\n# 1. Create your \"knowledge base\"\ndocs = [\n    Document(page_content=\"Tomorrow it is raining all across Germany.\"),\n    Document(page_content=\"It's windy in London Today.\")\n]\n\nsplitter = CharacterTextSplitter(chunk_size=200)\nchunks = splitter.split_documents(docs)\n\n# 2. Build a retriever (index)\nembeddings = OllamaEmbeddings(model=\"mxbai-embed-large\")\nvectorstore = FAISS.from_documents(chunks, embeddings)\n\n# 3. Create a retrieval-based QA chain\nqa = RetrievalQA.from_chain_type(\n    llm=ChatOllama(model=\"llama3\"),\n    retriever=vectorstore.as_retriever()\n)\n\n# 4. Ask something beyond the model\u2019s memory\nquery = \"What's the weather like tomorrow in Berlin?\"\nprint(qa.invoke(query)[\"result\"])\n<\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote\">\n<p>\u201cTomorrow is raining in Berlin.\u201d<\/p>\n<\/blockquote>\n\n\n\n<p>That\u2019s RAG in action: giving our LLM fresh memory without retraining. The code starts by defining a small knowledge base of text documents and splits them into manageable chunks. Each chunk is then converted into a <strong>numerical representation <\/strong>(<strong>embeddings<\/strong>) using the <strong><code>OllamaEmbeddings<\/code> <\/strong>model. These embeddings are stored in a <strong>FAISS <\/strong>vector database, which allows efficient similarity searches. 
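<\/p>\n\n\n\n<p>Under the hood, \u201csimilarity\u201d here is typically cosine similarity between embedding vectors. The toy sketch below uses made-up 3-dimensional vectors to show the idea (real embeddings have hundreds of dimensions, and these numbers are purely illustrative):<\/p>

```python
import math

def cosine(a, b):
    """Cosine similarity: near 1.0 for similar directions, near 0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up toy vectors standing in for real embeddings.
query = [0.9, 0.1, 0.0]  # "What's the weather like tomorrow in Berlin?"
rain  = [0.8, 0.2, 0.1]  # "Tomorrow it is raining all across Germany."
wind  = [0.1, 0.2, 0.9]  # "It's windy in London Today."

# The retriever hands the model the chunk whose embedding scores highest.
print(cosine(query, rain) > cosine(query, wind))  # True
```

<p>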
When a user asks a question, the retriever searches for the most relevant text chunks based on their embeddings and passes them, along with the query, to the local language model (<code><strong>llama3<\/strong><\/code>). The model then generates a final answer grounded in the retrieved context, effectively extending its knowledge beyond its built-in training data.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">AI Agent<\/h2>\n\n\n\n<p>Now we understand that RAG gives the model <em>knowledge<\/em>, but no arms to take actions like playing music or sending email. To let it perform tasks, <strong>LangChain <\/strong>introduces <strong>Agents<\/strong> that can <em>reason<\/em> and then <em>act<\/em> using tools.<\/p>\n\n\n\n<p>Example: a simple agent that can use a calculator, Wikipedia, and a function that reverses a given text.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"python\" class=\"language-python\"># pip install langchain langchain-community langchain-ollama wikipedia\n\nfrom langchain.agents import initialize_agent, load_tools, AgentType, Tool\nfrom langchain_ollama import ChatOllama\n\n# === Step 1: Initialize the local Ollama model ===\nllm = ChatOllama(model=\"llama3\", temperature=0)\n\n# === Step 2: Load built-in tools ===\ntools = load_tools([\"wikipedia\", \"llm-math\"], llm=llm)\n\n# === Step 3: Define a custom tool ===\ndef reverse_text(text: str) -&gt; str:\n    \"\"\"\n    A simple custom tool that reverses a string.\n    You can replace this with any real logic you want \u2014 e.g., playing music, reading files, etc.\n    \"\"\"\n    return text[::-1]\n\ncustom_tool = Tool(\n    name=\"ReverseText\",\n    func=reverse_text,\n    description=\"Reverses the given text string.\"\n)\n\n# === Step 4: Add the custom tool to the list ===\ntools.append(custom_tool)\n\n# === Step 5: Initialize the agent ===\nagent = initialize_agent(\n    tools,\n    llm,\n    
agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,\n    verbose=True\n)\n\n# === Step 6: Run an example query ===\nresponse = agent.invoke({\n    \"input\": \"Find the year Einstein was born, multiply it by 2, and then reverse that number as text.\"\n})\n\nprint(\"\\nFinal Answer:\")\nprint(response[\"output\"])\n<\/code><\/pre>\n\n\n\n<p><strong>Output<\/strong>:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote\">\n<p>\u201cAlbert Einstein was born in 1879. 1879 \u00d7 2 = 3758. And the reverse of the result is 8573.\u201d<\/p>\n<\/blockquote>\n\n\n\n<p>The agent looked up Einstein\u2019s birth year via Wikipedia, used the calculator tool to double it, and finally called the reverse tool on the result. That\u2019s a small taste of <strong>agentic reasoning<\/strong>. The model doesn\u2019t just answer, it <em>decides how<\/em> to answer: it retrieves the relevant information from Wikipedia, performs the calculation on the birth year, and then uses the reverse tool to produce the final result.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">MCP (Model Context Protocol)<\/h2>\n\n\n\n<p>The problem with <strong>LangChain agentic tools<\/strong> is that we can only use local functions defined in the same project and language as the LLM application. MCP was created to solve this issue. Think of MCP as an open protocol that standardizes how LLMs use external tools. An MCP server communicates with LLMs, telling them which tools are available and how each one can be used; it exposes each method&#8217;s name and signature. The <a href=\"https:\/\/pypi.org\/project\/langchain-mcp-adapters\/\"><code><strong>langchain-mcp-adapters<\/strong><\/code><\/a> package lets LangChain agents connect directly to <strong>MCP servers<\/strong>, <strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">local or remote<\/mark><\/strong>, and retrieve and use their tools automatically. 
Using the LangChain MCP adapter, our model can use any MCP tool, even if it\u2019s written in another language or running remotely. Let\u2019s build it step by step.<\/p>\n\n\n\n<p>Here I wrote an MCP server that runs on <strong>stdio<\/strong>. A <strong>stdio<\/strong> MCP server serves local requests, so only a local LLM application can make use of it:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"python\" class=\"language-python\"># This is the math_mcp_server.py file\n# Run pip install mcp to install the mcp package\nfrom mcp.server.fastmcp import FastMCP\n\nmcp = FastMCP(\"Math\")\n\n@mcp.tool()\ndef add(a: int, b: int) -&gt; int:\n    \"\"\"Add two numbers.\"\"\"\n    return a + b\n\n@mcp.tool()\ndef multiply(a: int, b: int) -&gt; int:\n    \"\"\"Multiply two numbers.\"\"\"\n    return a * b\n\nif __name__ == \"__main__\":\n    mcp.run(transport=\"stdio\")\n<\/code><\/pre>\n\n\n\n<p>I&#8217;ve written another MCP server that serves over <strong>streamable-http<\/strong>. 
MCP servers running on <strong>streamable-http<\/strong> can serve all LLMs on the network:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"python\" class=\"language-python\"># This is the weather_mcp_server.py file\nfrom mcp.server.fastmcp import FastMCP\n\nmcp = FastMCP(\"Weather\")\n\n@mcp.tool()\nasync def get_weather(location: str) -&gt; str:\n    \"\"\"Return simple fake weather info.\"\"\"\n    return f\"The weather in {location} is sunny and 25\u00b0C.\"\n\nif __name__ == \"__main__\":\n    mcp.run(transport=\"streamable-http\")\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>Here is a LangChain agent that connects to both MCP servers:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"python\" class=\"language-python\"># pip install langchain-mcp-adapters langchain langchain-ollama langgraph\n\nimport asyncio\nfrom langchain_mcp_adapters.client import MultiServerMCPClient\nfrom langgraph.prebuilt import create_react_agent\nfrom langchain_ollama import ChatOllama\n\nasync def main():\n    client = MultiServerMCPClient({\n        \"math\": {\n            \"transport\": \"stdio\",\n            \"command\": \"python\",\n            \"args\": [\"math_mcp_server.py\"],\n        },\n        \"weather\": {\n            \"transport\": \"streamable_http\",\n            \"url\": \"http:\/\/localhost:8000\/mcp\",\n        },\n    })\n\n    tools = await client.get_tools()\n    llm = ChatOllama(model=\"gpt-oss:120b-cloud\", temperature=0)\n    agent = create_react_agent(\n        llm,\n        tools\n    )\n\n    math_result = await agent.ainvoke({\n        \"messages\": [{\"role\": \"user\", \"content\": \"What is (3 + 5) x 12?\"}]\n    })\n    print(\"Math:\", math_result)\n\n    weather_result = await agent.ainvoke({\n        \"messages\": [{\"role\": \"user\", \"content\": \"What's the weather in Paris?\"}]\n    })\n    print(\"Weather:\", weather_result)\n\nif __name__ == \"__main__\":\n    
asyncio.run(main())\n<\/code><\/pre>\n\n\n\n<p>It first creates a <strong><code>MultiServerMCPClient<\/code><\/strong> that connects to two MCP servers: a local <code>math_mcp_server.py<\/code> via stdio and a remote weather service via HTTP. It then retrieves the available tools and initializes a <strong>ReAct agent<\/strong> (<code>create_react_agent<\/code>) powered by the <strong>ChatOllama<\/strong> model (<code><strong>gpt-oss:120b-cloud<\/strong><\/code>). The agent dynamically discovers each server&#8217;s tools and chooses the right one based on our prompt. No hardcoding!<\/p>\n\n\n\n<p>With <strong>MCP<\/strong>, tools become <em>modular and shareable<\/em>: any <strong>LLM <\/strong>that speaks <strong>MCP <\/strong>can use them. And with <strong>LangChain<\/strong>, your agent can orchestrate all of that, locally or remotely.<\/p>\n\n\n\n<p>Thanks for reading the post.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>After working with Large Language Models (LLMs), I was amazed by their reasoning power, but what really excited me was making LLMs even more useful &hellip; 
<\/p>\n","protected":false},"author":1,"featured_media":2250,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[290,1,320,28,41],"tags":[326,327,324,329,328,321,325,322,323,39],"_links":{"self":[{"href":"https:\/\/mshaeri.com\/blog\/wp-json\/wp\/v2\/posts\/2237"}],"collection":[{"href":"https:\/\/mshaeri.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mshaeri.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mshaeri.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mshaeri.com\/blog\/wp-json\/wp\/v2\/comments?post=2237"}],"version-history":[{"count":4,"href":"https:\/\/mshaeri.com\/blog\/wp-json\/wp\/v2\/posts\/2237\/revisions"}],"predecessor-version":[{"id":2326,"href":"https:\/\/mshaeri.com\/blog\/wp-json\/wp\/v2\/posts\/2237\/revisions\/2326"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/mshaeri.com\/blog\/wp-json\/wp\/v2\/media\/2250"}],"wp:attachment":[{"href":"https:\/\/mshaeri.com\/blog\/wp-json\/wp\/v2\/media?parent=2237"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mshaeri.com\/blog\/wp-json\/wp\/v2\/categories?post=2237"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mshaeri.com\/blog\/wp-json\/wp\/v2\/tags?post=2237"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}