[{"data":1,"prerenderedAt":1110},["ShallowReactive",2],{"article-alternates":3,"article-\u002Fen\u002Fai\u002Fmulti-agent-orchestration-single-llm-call":13},{"i18nKey":4,"paths":5},"ai-008-2026-05",{"de":6,"en":7,"es":8,"fr":9,"it":10,"ru":11,"tr":12},"\u002Fde\u002Fai\u002Fmulti-agent-orchestrierung-llm-aufrufe-systeme","\u002Fen\u002Fai\u002Fmulti-agent-orchestration-single-llm-call","\u002Fes\u002Fai\u002Forquestracion-multi-agente","\u002Ffr\u002Fai\u002Fmulti-agent-orchestration-systemes","\u002Fit\u002Fai\u002Fmulti-agent-orchestration-sistemi","\u002Fru\u002Fai\u002Fmulti-agent-orchestration-llm","\u002Ftr\u002Fai\u002Fmulti-agent-orchestration-tek-llm-cagrisindan-sistemlere",{"_path":7,"_dir":14,"_draft":15,"_partial":15,"_locale":16,"title":17,"description":18,"publishedAt":19,"modifiedAt":19,"category":14,"i18nKey":4,"tags":20,"readingTime":26,"author":27,"body":28,"_type":1104,"_id":1105,"_source":1106,"_file":1107,"_stem":1108,"_extension":1109},"ai",false,"","Multi-Agent Orchestration: From Single LLM Call to Production Systems","Agent SDKs, tool use, and parallel\u002Fserial topologies transform LLMs into production infrastructure — managing latency, cost, and reliability tradeoffs.","2026-05-23",[21,22,23,24,25],"multi-agent","llm-orchestration","tool-use","agent-sdk","ai-engineering",8,"Roibase",{"type":29,"children":30,"toc":1094},"root",[31,39,46,51,56,61,231,236,242,262,272,289,305,312,419,424,430,442,667,672,682,859,864,870,875,883,923,931,962,972,982,988,993,1001,1006,1018,1024,1036,1041,1074,1079,1083,1088],{"type":32,"tag":33,"props":34,"children":35},"element","p",{},[36],{"type":37,"value":38},"text","In 2024, \"AI assistant\" meant a single prompt-response loop. In 2026, what's running in production is different: parallel agent meshes, serial orchestration pipelines, agents wired to external systems via tool use. Moving from a single LLM call to a system of agents signaling each other rewrites the reliability and cost\u002Flatency balance. 
Multi-agent orchestration is the architectural layer that transforms the LLM into a production infrastructure component.",{"type":32,"tag":40,"props":41,"children":43},"h2",{"id":42},"agent-sdks-and-the-tool-use-layer",[44],{"type":37,"value":45},"Agent SDKs and the Tool Use Layer",{"type":32,"tag":33,"props":47,"children":48},{},[49],{"type":37,"value":50},"Agent frameworks — LangGraph, Autogen, CrewAI — give the LLM permission to \"call functions.\" Tool use is the model converting its own output into function calls conforming to JSON schema, the interpreter executing that function, and feeding the result back into the prompt. OpenAI function calling, Anthropic Claude's tool use API, Google Gemini's function declaration all rest on the same principle: the LLM cannot run deterministic code, but it can decide which function to call with which parameters.",{"type":32,"tag":33,"props":52,"children":53},{},[54],{"type":37,"value":55},"SDKs manage this loop: user query arrives, model says \"call the weather API with city=Istanbul,\" orchestrator invokes the API, appends the result to the prompt, model produces final output. That's 3 roundtrips = 3× latency. In production, a tool call chain can reach 5–7 steps; each adding 200–800ms means 1–5 seconds total response time. 
In multi-agent systems, the goal is breaking that latency through parallelization and caching.",{"type":32,"tag":33,"props":57,"children":58},{},[59],{"type":37,"value":60},"Example tool definition:",{"type":32,"tag":62,"props":63,"children":67},"pre",{"code":64,"language":65,"meta":16,"className":66,"style":16},"tools = [\n    {\n        \"name\": \"query_analytics\",\n        \"description\": \"Fetch specified metric from BigQuery\",\n        \"parameters\": {\n            \"metric\": \"string (revenue|sessions|conversions)\",\n            \"date_range\": \"string (7d|30d|90d)\"\n        }\n    }\n]\n","python","language-python shiki shiki-themes github-dark",[68],{"type":32,"tag":69,"props":70,"children":71},"code",{"__ignoreMap":16},[72,95,104,129,151,165,187,205,213,222],{"type":32,"tag":73,"props":74,"children":77},"span",{"class":75,"line":76},"line",1,[78,84,90],{"type":32,"tag":73,"props":79,"children":81},{"style":80},"--shiki-default:#E1E4E8",[82],{"type":37,"value":83},"tools ",{"type":32,"tag":73,"props":85,"children":87},{"style":86},"--shiki-default:#F97583",[88],{"type":37,"value":89},"=",{"type":32,"tag":73,"props":91,"children":92},{"style":80},[93],{"type":37,"value":94}," [\n",{"type":32,"tag":73,"props":96,"children":98},{"class":75,"line":97},2,[99],{"type":32,"tag":73,"props":100,"children":101},{"style":80},[102],{"type":37,"value":103},"    {\n",{"type":32,"tag":73,"props":105,"children":107},{"class":75,"line":106},3,[108,114,119,124],{"type":32,"tag":73,"props":109,"children":111},{"style":110},"--shiki-default:#9ECBFF",[112],{"type":37,"value":113},"        \"name\"",{"type":32,"tag":73,"props":115,"children":116},{"style":80},[117],{"type":37,"value":118},": 
",{"type":32,"tag":73,"props":120,"children":121},{"style":110},[122],{"type":37,"value":123},"\"query_analytics\"",{"type":32,"tag":73,"props":125,"children":126},{"style":80},[127],{"type":37,"value":128},",\n",{"type":32,"tag":73,"props":130,"children":132},{"class":75,"line":131},4,[133,138,142,147],{"type":32,"tag":73,"props":134,"children":135},{"style":110},[136],{"type":37,"value":137},"        \"description\"",{"type":32,"tag":73,"props":139,"children":140},{"style":80},[141],{"type":37,"value":118},{"type":32,"tag":73,"props":143,"children":144},{"style":110},[145],{"type":37,"value":146},"\"Fetch specified metric from BigQuery\"",{"type":32,"tag":73,"props":148,"children":149},{"style":80},[150],{"type":37,"value":128},{"type":32,"tag":73,"props":152,"children":154},{"class":75,"line":153},5,[155,160],{"type":32,"tag":73,"props":156,"children":157},{"style":110},[158],{"type":37,"value":159},"        \"parameters\"",{"type":32,"tag":73,"props":161,"children":162},{"style":80},[163],{"type":37,"value":164},": {\n",{"type":32,"tag":73,"props":166,"children":168},{"class":75,"line":167},6,[169,174,178,183],{"type":32,"tag":73,"props":170,"children":171},{"style":110},[172],{"type":37,"value":173},"            \"metric\"",{"type":32,"tag":73,"props":175,"children":176},{"style":80},[177],{"type":37,"value":118},{"type":32,"tag":73,"props":179,"children":180},{"style":110},[181],{"type":37,"value":182},"\"string (revenue|sessions|conversions)\"",{"type":32,"tag":73,"props":184,"children":185},{"style":80},[186],{"type":37,"value":128},{"type":32,"tag":73,"props":188,"children":190},{"class":75,"line":189},7,[191,196,200],{"type":32,"tag":73,"props":192,"children":193},{"style":110},[194],{"type":37,"value":195},"            \"date_range\"",{"type":32,"tag":73,"props":197,"children":198},{"style":80},[199],{"type":37,"value":118},{"type":32,"tag":73,"props":201,"children":202},{"style":110},[203],{"type":37,"value":204},"\"string 
(7d|30d|90d)\"\n",{"type":32,"tag":73,"props":206,"children":207},{"class":75,"line":26},[208],{"type":32,"tag":73,"props":209,"children":210},{"style":80},[211],{"type":37,"value":212},"        }\n",{"type":32,"tag":73,"props":214,"children":216},{"class":75,"line":215},9,[217],{"type":32,"tag":73,"props":218,"children":219},{"style":80},[220],{"type":37,"value":221},"    }\n",{"type":32,"tag":73,"props":223,"children":225},{"class":75,"line":224},10,[226],{"type":32,"tag":73,"props":227,"children":228},{"style":80},[229],{"type":37,"value":230},"]\n",{"type":32,"tag":33,"props":232,"children":233},{},[234],{"type":37,"value":235},"When the model decides to use this tool, the orchestrator invokes the BigQuery client, appends the result to the prompt, and the model produces final synthesis. Tool use's power: the LLM can query the external world without sacrificing determinism.",{"type":32,"tag":40,"props":237,"children":239},{"id":238},"parallel-and-serial-agent-topologies",[240],{"type":37,"value":241},"Parallel and Serial Agent Topologies",{"type":32,"tag":33,"props":243,"children":244},{},[245,247,253,255,260],{"type":37,"value":246},"Single agent = serial processing. Multi-agent = parallel + serial mix. Two primary patterns: ",{"type":32,"tag":248,"props":249,"children":250},"strong",{},[251],{"type":37,"value":252},"scatter-gather",{"type":37,"value":254}," and ",{"type":32,"tag":248,"props":256,"children":257},{},[258],{"type":37,"value":259},"pipeline",{"type":37,"value":261},".",{"type":32,"tag":33,"props":263,"children":264},{},[265,270],{"type":32,"tag":248,"props":266,"children":267},{},[268],{"type":37,"value":269},"Scatter-gather:",{"type":37,"value":271}," The main orchestrator splits the task into 3 sub-agents; each runs simultaneously with a different tool; results merge at a central agent. Example: \"Analyze last month's campaign performance\" → agent_1 hits Google Ads API, agent_2 hits Meta Ads API, agent_3 hits BigQuery, all in parallel. 
The orchestrator collects the 3 responses, synthesizes them, and delivers the final report. Latency: max(agent_1, agent_2, agent_3) + synthesis latency. If serial: agent_1 + agent_2 + agent_3 + synthesis. With three 800ms agents and 300ms of synthesis, that is 800ms + 300ms = 1.1s instead of 3 × 800ms + 300ms = 2.7s.",{"type":32,"tag":33,"props":273,"children":274},{},[275,280,282,287],{"type":32,"tag":248,"props":276,"children":277},{},[278],{"type":37,"value":279},"Pipeline:",{"type":37,"value":281}," Agent_A's output is agent_B's input. Example: (1) query planner agent writes SQL → (2) execution agent runs the SQL → (3) visualization agent produces chart spec. Each step depends on the one before it. Latency is serial, but ",{"type":32,"tag":248,"props":283,"children":284},{},[285],{"type":37,"value":286},"each agent is specialized",{"type":37,"value":288},": the query planner doesn't need heavy reasoning and can be a small model (GPT-4o-mini, 50ms), and the visualization agent can use Gemini Flash. Three small models instead of one large model is often cheaper and faster.",{"type":32,"tag":33,"props":290,"children":291},{},[292,294,303],{"type":37,"value":293},"In Roibase's ",{"type":32,"tag":295,"props":296,"children":300},"a",{"href":297,"rel":298},"https:\u002F\u002Fwww.roibase.com.tr\u002Fen\u002Ffirstparty",[299],"nofollow",[301],{"type":37,"value":302},"First-Party Data & Measurement Architecture",{"type":37,"value":304}," service, we use multi-agent orchestration in attribution pipelines: one agent parses raw events, one binds them to sessions, one maps revenue, and a final agent computes cross-channel attribution. Pipeline topology = deterministic steps, each with its own tool set.",{"type":32,"tag":306,"props":307,"children":309},"h3",{"id":308},"parallel-vs-serial-tradeoff",[310],{"type":37,"value":311},"Parallel vs. 
Serial Tradeoff",{"type":32,"tag":313,"props":314,"children":315},"table",{},[316,345],{"type":32,"tag":317,"props":318,"children":319},"thead",{},[320],{"type":32,"tag":321,"props":322,"children":323},"tr",{},[324,330,335,340],{"type":32,"tag":325,"props":326,"children":327},"th",{},[328],{"type":37,"value":329},"Topology",{"type":32,"tag":325,"props":331,"children":332},{},[333],{"type":37,"value":334},"Latency",{"type":32,"tag":325,"props":336,"children":337},{},[338],{"type":37,"value":339},"Cost",{"type":32,"tag":325,"props":341,"children":342},{},[343],{"type":37,"value":344},"Use Case",{"type":32,"tag":346,"props":347,"children":348},"tbody",{},[349,373,396],{"type":32,"tag":321,"props":350,"children":351},{},[352,358,363,368],{"type":32,"tag":353,"props":354,"children":355},"td",{},[356],{"type":37,"value":357},"Parallel (scatter-gather)",{"type":32,"tag":353,"props":359,"children":360},{},[361],{"type":37,"value":362},"Low (max operation time)",{"type":32,"tag":353,"props":364,"children":365},{},[366],{"type":37,"value":367},"High (N agents × LLM call)",{"type":32,"tag":353,"props":369,"children":370},{},[371],{"type":37,"value":372},"Independent queries (multi-source data pull)",{"type":32,"tag":321,"props":374,"children":375},{},[376,381,386,391],{"type":32,"tag":353,"props":377,"children":378},{},[379],{"type":37,"value":380},"Serial (pipeline)",{"type":32,"tag":353,"props":382,"children":383},{},[384],{"type":37,"value":385},"High (total time)",{"type":32,"tag":353,"props":387,"children":388},{},[389],{"type":37,"value":390},"Medium (each agent can be small model)",{"type":32,"tag":353,"props":392,"children":393},{},[394],{"type":37,"value":395},"Dependent operations (parse → enrich → analyze)",{"type":32,"tag":321,"props":397,"children":398},{},[399,404,409,414],{"type":32,"tag":353,"props":400,"children":401},{},[402],{"type":37,"value":403},"Hybrid (parallel → merge → 
serial)",{"type":32,"tag":353,"props":405,"children":406},{},[407],{"type":37,"value":408},"Medium",{"type":32,"tag":353,"props":410,"children":411},{},[412],{"type":37,"value":413},"Medium-High",{"type":32,"tag":353,"props":415,"children":416},{},[417],{"type":37,"value":418},"Complex tasks (data gathering parallel, result processing serial)",{"type":32,"tag":33,"props":420,"children":421},{},[422],{"type":37,"value":423},"In production, we cap concurrency on scatter-gather to avoid rate limits (e.g., max 5 parallel LLM calls). On serial pipelines, we use intermediate caching — if agent_A's output is valid for 10 minutes, the same query sends agent_B directly from the cached output.",{"type":32,"tag":40,"props":425,"children":427},{"id":426},"the-orchestrators-job-routing-and-error-handling",[428],{"type":37,"value":429},"The Orchestrator's Job: Routing and Error Handling",{"type":32,"tag":33,"props":431,"children":432},{},[433,435,440],{"type":37,"value":434},"The orchestrator doesn't just trigger agents; it ",{"type":32,"tag":248,"props":436,"children":437},{},[438],{"type":37,"value":439},"decides which agent owns which task",{"type":37,"value":441},". In LangGraph, this is called the \"supervisor agent\": it categorizes the incoming query and routes accordingly. 
Example logic:",{"type":32,"tag":62,"props":443,"children":445},{"code":444,"language":65,"meta":16,"className":66,"style":16},"def route_query(user_query: str) -> str:\n    # LLM-based router (small model, fast)\n    classification = llm.classify(user_query, categories=[\"data_query\", \"content_gen\", \"code_review\"])\n    \n    if classification == \"data_query\":\n        return \"analytics_agent\"\n    elif classification == \"content_gen\":\n        return \"writer_agent\"\n    else:\n        return \"code_agent\"\n",[446],{"type":32,"tag":69,"props":447,"children":448},{"__ignoreMap":16},[449,488,497,558,566,593,606,631,643,655],{"type":32,"tag":73,"props":450,"children":451},{"class":75,"line":76},[452,457,463,468,474,479,483],{"type":32,"tag":73,"props":453,"children":454},{"style":86},[455],{"type":37,"value":456},"def",{"type":32,"tag":73,"props":458,"children":460},{"style":459},"--shiki-default:#B392F0",[461],{"type":37,"value":462}," route_query",{"type":32,"tag":73,"props":464,"children":465},{"style":80},[466],{"type":37,"value":467},"(user_query: ",{"type":32,"tag":73,"props":469,"children":471},{"style":470},"--shiki-default:#79B8FF",[472],{"type":37,"value":473},"str",{"type":32,"tag":73,"props":475,"children":476},{"style":80},[477],{"type":37,"value":478},") -> ",{"type":32,"tag":73,"props":480,"children":481},{"style":470},[482],{"type":37,"value":473},{"type":32,"tag":73,"props":484,"children":485},{"style":80},[486],{"type":37,"value":487},":\n",{"type":32,"tag":73,"props":489,"children":490},{"class":75,"line":97},[491],{"type":32,"tag":73,"props":492,"children":494},{"style":493},"--shiki-default:#6A737D",[495],{"type":37,"value":496},"    # LLM-based router (small model, fast)\n",{"type":32,"tag":73,"props":498,"children":499},{"class":75,"line":106},[500,505,509,514,520,524,529,534,539,544,548,553],{"type":32,"tag":73,"props":501,"children":502},{"style":80},[503],{"type":37,"value":504},"    classification 
",{"type":32,"tag":73,"props":506,"children":507},{"style":86},[508],{"type":37,"value":89},{"type":32,"tag":73,"props":510,"children":511},{"style":80},[512],{"type":37,"value":513}," llm.classify(user_query, ",{"type":32,"tag":73,"props":515,"children":517},{"style":516},"--shiki-default:#FFAB70",[518],{"type":37,"value":519},"categories",{"type":32,"tag":73,"props":521,"children":522},{"style":86},[523],{"type":37,"value":89},{"type":32,"tag":73,"props":525,"children":526},{"style":80},[527],{"type":37,"value":528},"[",{"type":32,"tag":73,"props":530,"children":531},{"style":110},[532],{"type":37,"value":533},"\"data_query\"",{"type":32,"tag":73,"props":535,"children":536},{"style":80},[537],{"type":37,"value":538},", ",{"type":32,"tag":73,"props":540,"children":541},{"style":110},[542],{"type":37,"value":543},"\"content_gen\"",{"type":32,"tag":73,"props":545,"children":546},{"style":80},[547],{"type":37,"value":538},{"type":32,"tag":73,"props":549,"children":550},{"style":110},[551],{"type":37,"value":552},"\"code_review\"",{"type":32,"tag":73,"props":554,"children":555},{"style":80},[556],{"type":37,"value":557},"])\n",{"type":32,"tag":73,"props":559,"children":560},{"class":75,"line":131},[561],{"type":32,"tag":73,"props":562,"children":563},{"style":80},[564],{"type":37,"value":565},"    \n",{"type":32,"tag":73,"props":567,"children":568},{"class":75,"line":153},[569,574,579,584,589],{"type":32,"tag":73,"props":570,"children":571},{"style":86},[572],{"type":37,"value":573},"    if",{"type":32,"tag":73,"props":575,"children":576},{"style":80},[577],{"type":37,"value":578}," classification ",{"type":32,"tag":73,"props":580,"children":581},{"style":86},[582],{"type":37,"value":583},"==",{"type":32,"tag":73,"props":585,"children":586},{"style":110},[587],{"type":37,"value":588}," 
\"data_query\"",{"type":32,"tag":73,"props":590,"children":591},{"style":80},[592],{"type":37,"value":487},{"type":32,"tag":73,"props":594,"children":595},{"class":75,"line":167},[596,601],{"type":32,"tag":73,"props":597,"children":598},{"style":86},[599],{"type":37,"value":600},"        return",{"type":32,"tag":73,"props":602,"children":603},{"style":110},[604],{"type":37,"value":605}," \"analytics_agent\"\n",{"type":32,"tag":73,"props":607,"children":608},{"class":75,"line":189},[609,614,618,622,627],{"type":32,"tag":73,"props":610,"children":611},{"style":86},[612],{"type":37,"value":613},"    elif",{"type":32,"tag":73,"props":615,"children":616},{"style":80},[617],{"type":37,"value":578},{"type":32,"tag":73,"props":619,"children":620},{"style":86},[621],{"type":37,"value":583},{"type":32,"tag":73,"props":623,"children":624},{"style":110},[625],{"type":37,"value":626}," \"content_gen\"",{"type":32,"tag":73,"props":628,"children":629},{"style":80},[630],{"type":37,"value":487},{"type":32,"tag":73,"props":632,"children":633},{"class":75,"line":26},[634,638],{"type":32,"tag":73,"props":635,"children":636},{"style":86},[637],{"type":37,"value":600},{"type":32,"tag":73,"props":639,"children":640},{"style":110},[641],{"type":37,"value":642}," \"writer_agent\"\n",{"type":32,"tag":73,"props":644,"children":645},{"class":75,"line":215},[646,651],{"type":32,"tag":73,"props":647,"children":648},{"style":86},[649],{"type":37,"value":650},"    else",{"type":32,"tag":73,"props":652,"children":653},{"style":80},[654],{"type":37,"value":487},{"type":32,"tag":73,"props":656,"children":657},{"class":75,"line":224},[658,662],{"type":32,"tag":73,"props":659,"children":660},{"style":86},[661],{"type":37,"value":600},{"type":32,"tag":73,"props":663,"children":664},{"style":110},[665],{"type":37,"value":666}," \"code_agent\"\n",{"type":32,"tag":33,"props":668,"children":669},{},[670],{"type":37,"value":671},"The router agent is typically a fast, cheap model like GPT-4o-mini or Claude 
Haiku. It adds 50–100ms overhead but prevents unnecessary use of large models. If the user says \"summarize campaign performance,\" it routes to analytics_agent (with BigQuery tool use); if \"write a blog post,\" to writer_agent (with web search + writing LLM).",{"type":32,"tag":33,"props":673,"children":674},{},[675,680],{"type":32,"tag":248,"props":676,"children":677},{},[678],{"type":37,"value":679},"Error handling is critical in multi-agent.",{"type":37,"value":681}," With a single agent, if the LLM hallucinates, you retry. In multi-agent, if agent_2 works with agent_1's faulty output, you get cascade failure. The orchestrator must validate each agent's output:",{"type":32,"tag":62,"props":683,"children":685},{"code":684,"language":65,"meta":16,"className":66,"style":16},"def validate_agent_output(output: dict, schema: dict) -> bool:\n    # JSON schema validation\n    if not matches_schema(output, schema):\n        raise AgentOutputError(\"Agent output does not match schema\")\n    \n    # Semantic check (optional, expensive)\n    if confidence_score(output) \u003C 0.7:\n        return False  # retry or fallback\n    \n    return True\n",[686],{"type":32,"tag":69,"props":687,"children":688},{"__ignoreMap":16},[689,733,741,758,781,788,796,822,839,846],{"type":32,"tag":73,"props":690,"children":691},{"class":75,"line":76},[692,696,701,706,711,716,720,724,729],{"type":32,"tag":73,"props":693,"children":694},{"style":86},[695],{"type":37,"value":456},{"type":32,"tag":73,"props":697,"children":698},{"style":459},[699],{"type":37,"value":700}," validate_agent_output",{"type":32,"tag":73,"props":702,"children":703},{"style":80},[704],{"type":37,"value":705},"(output: ",{"type":32,"tag":73,"props":707,"children":708},{"style":470},[709],{"type":37,"value":710},"dict",{"type":32,"tag":73,"props":712,"children":713},{"style":80},[714],{"type":37,"value":715},", schema: 
",{"type":32,"tag":73,"props":717,"children":718},{"style":470},[719],{"type":37,"value":710},{"type":32,"tag":73,"props":721,"children":722},{"style":80},[723],{"type":37,"value":478},{"type":32,"tag":73,"props":725,"children":726},{"style":470},[727],{"type":37,"value":728},"bool",{"type":32,"tag":73,"props":730,"children":731},{"style":80},[732],{"type":37,"value":487},{"type":32,"tag":73,"props":734,"children":735},{"class":75,"line":97},[736],{"type":32,"tag":73,"props":737,"children":738},{"style":493},[739],{"type":37,"value":740},"    # JSON schema validation\n",{"type":32,"tag":73,"props":742,"children":743},{"class":75,"line":106},[744,748,753],{"type":32,"tag":73,"props":745,"children":746},{"style":86},[747],{"type":37,"value":573},{"type":32,"tag":73,"props":749,"children":750},{"style":86},[751],{"type":37,"value":752}," not",{"type":32,"tag":73,"props":754,"children":755},{"style":80},[756],{"type":37,"value":757}," matches_schema(output, schema):\n",{"type":32,"tag":73,"props":759,"children":760},{"class":75,"line":131},[761,766,771,776],{"type":32,"tag":73,"props":762,"children":763},{"style":86},[764],{"type":37,"value":765},"        raise",{"type":32,"tag":73,"props":767,"children":768},{"style":80},[769],{"type":37,"value":770}," AgentOutputError(",{"type":32,"tag":73,"props":772,"children":773},{"style":110},[774],{"type":37,"value":775},"\"Agent output does not match schema\"",{"type":32,"tag":73,"props":777,"children":778},{"style":80},[779],{"type":37,"value":780},")\n",{"type":32,"tag":73,"props":782,"children":783},{"class":75,"line":153},[784],{"type":32,"tag":73,"props":785,"children":786},{"style":80},[787],{"type":37,"value":565},{"type":32,"tag":73,"props":789,"children":790},{"class":75,"line":167},[791],{"type":32,"tag":73,"props":792,"children":793},{"style":493},[794],{"type":37,"value":795},"    # Semantic check (optional, 
expensive)\n",{"type":32,"tag":73,"props":797,"children":798},{"class":75,"line":189},[799,803,808,813,818],{"type":32,"tag":73,"props":800,"children":801},{"style":86},[802],{"type":37,"value":573},{"type":32,"tag":73,"props":804,"children":805},{"style":80},[806],{"type":37,"value":807}," confidence_score(output) ",{"type":32,"tag":73,"props":809,"children":810},{"style":86},[811],{"type":37,"value":812},"\u003C",{"type":32,"tag":73,"props":814,"children":815},{"style":470},[816],{"type":37,"value":817}," 0.7",{"type":32,"tag":73,"props":819,"children":820},{"style":80},[821],{"type":37,"value":487},{"type":32,"tag":73,"props":823,"children":824},{"class":75,"line":26},[825,829,834],{"type":32,"tag":73,"props":826,"children":827},{"style":86},[828],{"type":37,"value":600},{"type":32,"tag":73,"props":830,"children":831},{"style":470},[832],{"type":37,"value":833}," False",{"type":32,"tag":73,"props":835,"children":836},{"style":493},[837],{"type":37,"value":838},"  # retry or fallback\n",{"type":32,"tag":73,"props":840,"children":841},{"class":75,"line":215},[842],{"type":32,"tag":73,"props":843,"children":844},{"style":80},[845],{"type":37,"value":565},{"type":32,"tag":73,"props":847,"children":848},{"class":75,"line":224},[849,854],{"type":32,"tag":73,"props":850,"children":851},{"style":86},[852],{"type":37,"value":853},"    return",{"type":32,"tag":73,"props":855,"children":856},{"style":470},[857],{"type":37,"value":858}," True\n",{"type":32,"tag":33,"props":860,"children":861},{},[862],{"type":37,"value":863},"If agent_1 fails, the orchestrator enters a fallback chain: first retry (1×), then alternative agent (larger model), then human-in-the-loop. 
Without this logic, multi-agent is unreliable in production.",{"type":32,"tag":40,"props":865,"children":867},{"id":866},"latency-and-cost-benchmark-scenarios",[868],{"type":37,"value":869},"Latency and Cost: Benchmark Scenarios",{"type":32,"tag":33,"props":871,"children":872},{},[873],{"type":37,"value":874},"Test scenario: \"Analyze revenue trend for the last 30 days, summarize campaign performance, write a summary email for the CEO\" — 3 independent tasks.",{"type":32,"tag":33,"props":876,"children":877},{},[878],{"type":32,"tag":248,"props":879,"children":880},{},[881],{"type":37,"value":882},"Single agent (GPT-4, serial):",{"type":32,"tag":884,"props":885,"children":886},"ul",{},[887,893,898,903,913],{"type":32,"tag":888,"props":889,"children":890},"li",{},[891],{"type":37,"value":892},"Query BigQuery → 800ms (LLM + API)",{"type":32,"tag":888,"props":894,"children":895},{},[896],{"type":37,"value":897},"Query ad platforms → 900ms",{"type":32,"tag":888,"props":899,"children":900},{},[901],{"type":37,"value":902},"Generate email → 600ms",{"type":32,"tag":888,"props":904,"children":905},{},[906,911],{"type":32,"tag":248,"props":907,"children":908},{},[909],{"type":37,"value":910},"Total:",{"type":37,"value":912}," 2300ms",{"type":32,"tag":888,"props":914,"children":915},{},[916,921],{"type":32,"tag":248,"props":917,"children":918},{},[919],{"type":37,"value":920},"Cost:",{"type":37,"value":922}," 3 turns × $0.03\u002F1K tokens = ~$0.09 (typical input\u002Foutput mix)",{"type":32,"tag":33,"props":924,"children":925},{},[926],{"type":32,"tag":248,"props":927,"children":928},{},[929],{"type":37,"value":930},"Multi-agent (scatter-gather + pipeline):",{"type":32,"tag":884,"props":932,"children":933},{},[934,939,944,953],{"type":32,"tag":888,"props":935,"children":936},{},[937],{"type":37,"value":938},"Agents 1, 2, 3 in parallel (BigQuery, ads, email prep) → max 900ms",{"type":32,"tag":888,"props":940,"children":941},{},[942],{"type":37,"value":943},"Orchestrator merge 
+ synthesis → 400ms",{"type":32,"tag":888,"props":945,"children":946},{},[947,951],{"type":32,"tag":248,"props":948,"children":949},{},[950],{"type":37,"value":910},{"type":37,"value":952}," 1300ms",{"type":32,"tag":888,"props":954,"children":955},{},[956,960],{"type":32,"tag":248,"props":957,"children":958},{},[959],{"type":37,"value":920},{"type":37,"value":961}," 3 agents × $0.02 (small model) + synthesis $0.03 = ~$0.09 (same, but reducible via model selection)",{"type":32,"tag":33,"props":963,"children":964},{},[965,970],{"type":32,"tag":248,"props":966,"children":967},{},[968],{"type":37,"value":969},"Gain:",{"type":37,"value":971}," 43% latency reduction. Cost is similar, but with model optimization (agent_1 → Gemini Flash, agent_2 → Claude Haiku, orchestrator → GPT-4o-mini), it drops to $0.05.",{"type":32,"tag":33,"props":973,"children":974},{},[975,980],{"type":32,"tag":248,"props":976,"children":977},{},[978],{"type":37,"value":979},"But:",{"type":37,"value":981}," Parallel agents consume rate limits in parallel. If your OpenAI tier allows 500 RPM and each user request fans out into 10 parallel LLM calls, you can serve about 50 requests per minute; with one call per request, the same limit covers 500. In production, we manage this via queuing + caching.",{"type":32,"tag":40,"props":983,"children":985},{"id":984},"observability-and-debugging",[986],{"type":37,"value":987},"Observability and Debugging",{"type":32,"tag":33,"props":989,"children":990},{},[991],{"type":37,"value":992},"In multi-agent systems, answering \"where did it break?\" is hard. Tools like LangSmith, Helicone, and Arize Phoenix visualize agent traces: which agent called which tool when, with which prompt, what it returned, where it retried. 
Example trace:",{"type":32,"tag":62,"props":994,"children":996},{"code":995},"orchestrator → classify_query (50ms, GPT-4o-mini) → \"data_query\"\n→ analytics_agent → query_bigquery (800ms, tool_call) → success\n→ writer_agent → generate_summary (600ms, GPT-4) → success\n→ orchestrator → merge_results (200ms) → final_output\n",[997],{"type":32,"tag":69,"props":998,"children":999},{"__ignoreMap":16},[1000],{"type":37,"value":995},{"type":32,"tag":33,"props":1002,"children":1003},{},[1004],{"type":37,"value":1005},"Each step logs token count, latency, and cost. Without this telemetry in production, multi-agent is impossible to debug. If agent A's tool call times out, you see it in the trace and add retry logic.",{"type":32,"tag":33,"props":1007,"children":1008},{},[1009,1011,1016],{"type":37,"value":1010},"Another metric: ",{"type":32,"tag":248,"props":1012,"children":1013},{},[1014],{"type":37,"value":1015},"agent utilization",{"type":37,"value":1017},". If you define 5 agents but 80% of user queries route to a single agent, your routing logic is broken. We measure the orchestrator's classification accuracy — building labeled datasets from user feedback and fine-tuning the router (moving from few-shot prompts to lightweight classifiers).",{"type":32,"tag":40,"props":1019,"children":1021},{"id":1020},"multi-agents-limits",[1022],{"type":37,"value":1023},"Multi-Agent's Limits",{"type":32,"tag":33,"props":1025,"children":1026},{},[1027,1029,1034],{"type":37,"value":1028},"Multi-agent doesn't solve everything. There's ",{"type":32,"tag":248,"props":1030,"children":1031},{},[1032],{"type":37,"value":1033},"coordination overhead",{"type":37,"value":1035},": inter-agent messaging, orchestration logic, error handling — all add latency. A simple query that a single agent completes in 1 second might take 1.5 seconds with multi-agent (routing + orchestrator + merge). 
Architectural complexity grows — larger codebase, harder to test, deployment more fragile.",{"type":32,"tag":33,"props":1037,"children":1038},{},[1039],{"type":37,"value":1040},"Multi-agent makes sense in these scenarios:",{"type":32,"tag":884,"props":1042,"children":1043},{},[1044,1054,1064],{"type":32,"tag":888,"props":1045,"children":1046},{},[1047,1052],{"type":32,"tag":248,"props":1048,"children":1049},{},[1050],{"type":37,"value":1051},"Parallel data pull required:",{"type":37,"value":1053}," Pulling from 5 different APIs benefits from scatter-gather",{"type":32,"tag":888,"props":1055,"children":1056},{},[1057,1062],{"type":32,"tag":248,"props":1058,"children":1059},{},[1060],{"type":37,"value":1061},"Specialized models are optimal:",{"type":37,"value":1063}," Query planning with a small model, code generation with a large one — pipeline topology cuts cost",{"type":32,"tag":888,"props":1065,"children":1066},{},[1067,1072],{"type":32,"tag":248,"props":1068,"children":1069},{},[1070],{"type":37,"value":1071},"Long-running tasks:",{"type":37,"value":1073}," Agent_1 starts work, agent_2 monitors async, agent_3 completes, orchestrator notifies — event-driven architecture beats synchronous LLM calls",{"type":32,"tag":33,"props":1075,"children":1076},{},[1077],{"type":37,"value":1078},"For short, frequent, simple queries, a single agent + caching is better. Multi-agent creates value when a complex task can be decomposed and optimized.",{"type":32,"tag":1080,"props":1081,"children":1082},"hr",{},[],{"type":32,"tag":33,"props":1084,"children":1085},{},[1086],{"type":37,"value":1087},"Multi-agent orchestration transforms the LLM from a stateless function call into a stateful, observable, scalable system. Parallel topology breaks latency, pipeline topology cuts cost, orchestrator ensures reliability. In production, start with scatter-gather, monitor rate limits and cost, move to pipelines as needed. Log agent traces, layer error handling, test routing logic. 
Multi-agent is the inflection point from LLM engineering to LLM infrastructure.",{"type":32,"tag":1089,"props":1090,"children":1091},"style",{},[1092],{"type":37,"value":1093},"html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}",{"title":16,"searchDepth":106,"depth":106,"links":1095},[1096,1097,1100,1101,1102,1103],{"id":42,"depth":97,"text":45},{"id":238,"depth":97,"text":241,"children":1098},[1099],{"id":308,"depth":106,"text":311},{"id":426,"depth":97,"text":429},{"id":866,"depth":97,"text":869},{"id":984,"depth":97,"text":987},{"id":1020,"depth":97,"text":1023},"markdown","content:en:ai:multi-agent-orchestration-single-llm-call.md","content","en\u002Fai\u002Fmulti-agent-orchestration-single-llm-call.md","en\u002Fai\u002Fmulti-agent-orchestration-single-llm-call","md",1780898611423]