[{"data":1,"prerenderedAt":1110},["ShallowReactive",2],{"article-alternates":3,"article-\u002Fde\u002Fai\u002Fmulti-agent-orchestrierung-llm-aufrufe-systeme":13},{"i18nKey":4,"paths":5},"ai-008-2026-05",{"de":6,"en":7,"es":8,"fr":9,"it":10,"ru":11,"tr":12},"\u002Fde\u002Fai\u002Fmulti-agent-orchestrierung-llm-aufrufe-systeme","\u002Fen\u002Fai\u002Fmulti-agent-orchestration-single-llm-call","\u002Fes\u002Fai\u002Forquestracion-multi-agente","\u002Ffr\u002Fai\u002Fmulti-agent-orchestration-systemes","\u002Fit\u002Fai\u002Fmulti-agent-orchestration-sistemi","\u002Fru\u002Fai\u002Fmulti-agent-orchestration-llm","\u002Ftr\u002Fai\u002Fmulti-agent-orchestration-tek-llm-cagrisindan-sistemlere",{"_path":6,"_dir":14,"_draft":15,"_partial":15,"_locale":16,"title":17,"description":18,"publishedAt":19,"modifiedAt":19,"category":14,"i18nKey":4,"tags":20,"readingTime":26,"author":27,"body":28,"_type":1104,"_id":1105,"_source":1106,"_file":1107,"_stem":1108,"_extension":1109},"ai",false,"","Multi-Agent-Orchestrierung: Von einzelnen LLM-Aufrufen zu Systemen","Agent-SDKs, Tool Use und parallele\u002Fserielle Topologien verwandeln LLMs in Production-Systeme — Latenz-, Cost- und Zuverlässigkeits-Tradeoffs.","2026-05-23",[21,22,23,24,25],"multi-agent","llm-orchestrierung","tool-use","agent-sdk","ai-engineering",9,"Roibase",{"type":29,"children":30,"toc":1094},"root",[31,39,46,51,56,61,231,236,242,262,272,289,305,312,419,424,430,442,667,672,682,859,864,870,875,883,923,931,962,972,982,988,993,1001,1006,1018,1024,1036,1041,1074,1079,1083,1088],{"type":32,"tag":33,"props":34,"children":35},"element","p",{},[36],{"type":37,"value":38},"text","2024 bedeutete „KI-Assistent\" noch: ein Prompt-Response-Zyklus. 2026 in Production ist anders: parallele Agent-Meshes, serielle Orchestrierungs-Pipelines, Agenten mit Tool Use an externe Systeme gebunden. Statt einzelner LLM-Aufrufe ein System von Agenten, die sich gegenseitig Signale schicken — das schreibt das Gleichgewicht zwischen Zuverlässigkeit und Cost\u002FLatenz neu. Multi-Agent-Orchestrierung ist die Architektur-Schicht, die das LLM vom Funktionsaufruf zum Production-Infrastructure-Element macht.",{"type":32,"tag":40,"props":41,"children":43},"h2",{"id":42},"agent-sdks-und-tool-use-schicht",[44],{"type":37,"value":45},"Agent-SDKs und Tool-Use-Schicht",{"type":32,"tag":33,"props":47,"children":48},{},[49],{"type":37,"value":50},"Agent-Frameworks — LangGraph, Autogen, CrewAI — geben dem LLM die Berechtigung: „Du kannst Funktionen aufrufen.\" Tool Use bedeutet: Das Modell transformiert seine eigene Ausgabe in einen Function Call (JSON-Schema-konform), und der Interpreter führt diese Funktion aus und fügt das Ergebnis zurück ins Prompt ein. OpenAIs Function Calling, Anthropic Claude's Tool-Use API, Google Geminis Function Declaration folgen demselben Prinzip: LLMs können keinen deterministischen Code ausführen, aber sie können sagen, welche Funktion mit welchen Parametern aufgerufen werden soll.",{"type":32,"tag":33,"props":52,"children":53},{},[54],{"type":37,"value":55},"SDKs managen diesen Loop: Nutzer-Query kommt an, Modell sagt „rufe Wetter-API mit city=Berlin auf\", Orchestrator ruft API auf, fügt Antwort ins Prompt ein, Modell produziert finale Ausgabe. Diese 3 Roundtrips = 3× Latenz. In Production kann eine Tool-Call-Kette 5–7 Schritte lang sein, jeder addiert 200–800ms, zusammen 1–5 Sekunden Response Time. In Multi-Agent geht es darum, diese Latenz durch Parallelisierung und Caching zu brechen.",{"type":32,"tag":33,"props":57,"children":58},{},[59],{"type":37,"value":60},"Beispiel einer Tool-Definition:",{"type":32,"tag":62,"props":63,"children":67},"pre",{"code":64,"language":65,"meta":16,"className":66,"style":16},"tools = [\n    {\n        \"name\": \"query_analytics\",\n        \"description\": \"Metrik aus BigQuery abrufen\",\n        \"parameters\": {\n            \"metric\": \"string (revenue|sessions|conversions)\",\n            \"date_range\": \"string (7d|30d|90d)\"\n        }\n    }\n]\n","python","language-python shiki shiki-themes github-dark",[68],{"type":32,"tag":69,"props":70,"children":71},"code",{"__ignoreMap":16},[72,95,104,129,151,165,187,205,214,222],{"type":32,"tag":73,"props":74,"children":77},"span",{"class":75,"line":76},"line",1,[78,84,90],{"type":32,"tag":73,"props":79,"children":81},{"style":80},"--shiki-default:#E1E4E8",[82],{"type":37,"value":83},"tools ",{"type":32,"tag":73,"props":85,"children":87},{"style":86},"--shiki-default:#F97583",[88],{"type":37,"value":89},"=",{"type":32,"tag":73,"props":91,"children":92},{"style":80},[93],{"type":37,"value":94}," [\n",{"type":32,"tag":73,"props":96,"children":98},{"class":75,"line":97},2,[99],{"type":32,"tag":73,"props":100,"children":101},{"style":80},[102],{"type":37,"value":103},"    {\n",{"type":32,"tag":73,"props":105,"children":107},{"class":75,"line":106},3,[108,114,119,124],{"type":32,"tag":73,"props":109,"children":111},{"style":110},"--shiki-default:#9ECBFF",[112],{"type":37,"value":113},"        \"name\"",{"type":32,"tag":73,"props":115,"children":116},{"style":80},[117],{"type":37,"value":118},": ",{"type":32,"tag":73,"props":120,"children":121},{"style":110},[122],{"type":37,"value":123},"\"query_analytics\"",{"type":32,"tag":73,"props":125,"children":126},{"style":80},[127],{"type":37,"value":128},",\n",{"type":32,"tag":73,"props":130,"children":132},{"class":75,"line":131},4,[133,138,142,147],{"type":32,"tag":73,"props":134,"children":135},{"style":110},[136],{"type":37,"value":137},"        \"description\"",{"type":32,"tag":73,"props":139,"children":140},{"style":80},[141],{"type":37,"value":118},{"type":32,"tag":73,"props":143,"children":144},{"style":110},[145],{"type":37,"value":146},"\"Metrik aus BigQuery abrufen\"",{"type":32,"tag":73,"props":148,"children":149},{"style":80},[150],{"type":37,"value":128},{"type":32,"tag":73,"props":152,"children":154},{"class":75,"line":153},5,[155,160],{"type":32,"tag":73,"props":156,"children":157},{"style":110},[158],{"type":37,"value":159},"        \"parameters\"",{"type":32,"tag":73,"props":161,"children":162},{"style":80},[163],{"type":37,"value":164},": {\n",{"type":32,"tag":73,"props":166,"children":168},{"class":75,"line":167},6,[169,174,178,183],{"type":32,"tag":73,"props":170,"children":171},{"style":110},[172],{"type":37,"value":173},"            \"metric\"",{"type":32,"tag":73,"props":175,"children":176},{"style":80},[177],{"type":37,"value":118},{"type":32,"tag":73,"props":179,"children":180},{"style":110},[181],{"type":37,"value":182},"\"string (revenue|sessions|conversions)\"",{"type":32,"tag":73,"props":184,"children":185},{"style":80},[186],{"type":37,"value":128},{"type":32,"tag":73,"props":188,"children":190},{"class":75,"line":189},7,[191,196,200],{"type":32,"tag":73,"props":192,"children":193},{"style":110},[194],{"type":37,"value":195},"            \"date_range\"",{"type":32,"tag":73,"props":197,"children":198},{"style":80},[199],{"type":37,"value":118},{"type":32,"tag":73,"props":201,"children":202},{"style":110},[203],{"type":37,"value":204},"\"string (7d|30d|90d)\"\n",{"type":32,"tag":73,"props":206,"children":208},{"class":75,"line":207},8,[209],{"type":32,"tag":73,"props":210,"children":211},{"style":80},[212],{"type":37,"value":213},"        }\n",{"type":32,"tag":73,"props":215,"children":216},{"class":75,"line":26},[217],{"type":32,"tag":73,"props":218,"children":219},{"style":80},[220],{"type":37,"value":221},"    }\n",{"type":32,"tag":73,"props":223,"children":225},{"class":75,"line":224},10,[226],{"type":32,"tag":73,"props":227,"children":228},{"style":80},[229],{"type":37,"value":230},"]\n",{"type":32,"tag":33,"props":232,"children":233},{},[234],{"type":37,"value":235},"Entscheidet sich das Modell für dieses Tool, ruft der Orchestrator den BigQuery-Client auf, fügt das Ergebnis ins Prompt ein, das Modell synthetisiert die finale Antwort. Die Kraft von Tool Use: LLMs können die externe Welt abfragen, ohne auf Determinismus zu verzichten.",{"type":32,"tag":40,"props":237,"children":239},{"id":238},"parallele-und-serielle-agent-topologien",[240],{"type":37,"value":241},"Parallele und serielle Agent-Topologien",{"type":32,"tag":33,"props":243,"children":244},{},[245,247,253,255,260],{"type":37,"value":246},"Ein Agent = serieller Prozess. Multi-Agent = Mischung aus parallel + seriell. Zwei grundlegende Muster: ",{"type":32,"tag":248,"props":249,"children":250},"strong",{},[251],{"type":37,"value":252},"Scatter-Gather",{"type":37,"value":254}," und ",{"type":32,"tag":248,"props":256,"children":257},{},[258],{"type":37,"value":259},"Pipeline",{"type":37,"value":261},".",{"type":32,"tag":33,"props":263,"children":264},{},[265,270],{"type":32,"tag":248,"props":266,"children":267},{},[268],{"type":37,"value":269},"Scatter-Gather:",{"type":37,"value":271}," Der zentrale Orchestrator teilt die Aufgabe auf 3 Sub-Agenten auf, jeder arbeitet gleichzeitig mit einem anderen Tool. Beispiel: „Analysiere die Kampagnen-Performance des letzten Monats\" → agent_1 zur Google-Ads-API, agent_2 zur Meta-Ads-API, agent_3 zu BigQuery, alle parallel. Der Orchestrator sammelt 3 Responses, synthetisiert, liefert finalen Report. Latenz: max(agent_1, agent_2, agent_3) + Synthese-Latenz. Seriell wäre es agent_1 + agent_2 + agent_3 + Synthese. Statt 3×800ms = 2400ms sind es 800ms + 300ms = 1100ms.",{"type":32,"tag":33,"props":273,"children":274},{},[275,280,282,287],{"type":32,"tag":248,"props":276,"children":277},{},[278],{"type":37,"value":279},"Pipeline:",{"type":37,"value":281}," Output von Agent_A ist Input für Agent_B. Beispiel: (1) Query-Planer-Agent schreibt SQL → (2) Ausführungs-Agent führt SQL aus → (3) Visualisierungs-Agent erzeugt Graph-Spec. Jeder Schritt ist Abhängigkeit des nächsten. Latenz ist seriell, aber ",{"type":32,"tag":248,"props":283,"children":284},{},[285],{"type":37,"value":286},"jeder Agent ist spezialisiert",{"type":37,"value":288}," — Query-Planer kann ein kleines Modell sein (GPT-4o-mini, 50ms), erfordert keine Execution-Logik, Visualisierungs-Agent kann Gemini Flash verwenden. 3 kleine Modelle statt 1 großes = billiger + schneller (manchmal).",{"type":32,"tag":33,"props":290,"children":291},{},[292,294,303],{"type":37,"value":293},"Bei Roibases ",{"type":32,"tag":295,"props":296,"children":300},"a",{"href":297,"rel":298},"https:\u002F\u002Fwww.roibase.com.tr\u002Fde\u002Ffirstparty",[299],"nofollow",[301],{"type":37,"value":302},"First-Party-Daten & Messung-Architektur",{"type":37,"value":304}," nutzen wir Multi-Agent-Orchestrierung in Attribution-Pipelines: ein Agent parsed Raw Events, ein Agent bindet sie an Sessions, ein Agent mapped Revenue, finaler Agent berechnet Cross-Channel-Attribution. Pipeline-Topologie = deterministische Schritte, jeder mit eigenem Tool-Set.",{"type":32,"tag":306,"props":307,"children":309},"h3",{"id":308},"paralleles-vs-serielles-tradeoff",[310],{"type":37,"value":311},"Paralleles vs. serielles Tradeoff",{"type":32,"tag":313,"props":314,"children":315},"table",{},[316,345],{"type":32,"tag":317,"props":318,"children":319},"thead",{},[320],{"type":32,"tag":321,"props":322,"children":323},"tr",{},[324,330,335,340],{"type":32,"tag":325,"props":326,"children":327},"th",{},[328],{"type":37,"value":329},"Topologie",{"type":32,"tag":325,"props":331,"children":332},{},[333],{"type":37,"value":334},"Latenz",{"type":32,"tag":325,"props":336,"children":337},{},[338],{"type":37,"value":339},"Cost",{"type":32,"tag":325,"props":341,"children":342},{},[343],{"type":37,"value":344},"Einsatzfall",{"type":32,"tag":346,"props":347,"children":348},"tbody",{},[349,373,396],{"type":32,"tag":321,"props":350,"children":351},{},[352,358,363,368],{"type":32,"tag":353,"props":354,"children":355},"td",{},[356],{"type":37,"value":357},"Parallel (Scatter-Gather)",{"type":32,"tag":353,"props":359,"children":360},{},[361],{"type":37,"value":362},"Niedrig (max-Prozess)",{"type":32,"tag":353,"props":364,"children":365},{},[366],{"type":37,"value":367},"Hoch (N Agent × LLM-Aufruf)",{"type":32,"tag":353,"props":369,"children":370},{},[371],{"type":37,"value":372},"Unabhängige Abfragen (Multi-Source-Datenzug)",{"type":32,"tag":321,"props":374,"children":375},{},[376,381,386,391],{"type":32,"tag":353,"props":377,"children":378},{},[379],{"type":37,"value":380},"Seriell (Pipeline)",{"type":32,"tag":353,"props":382,"children":383},{},[384],{"type":37,"value":385},"Hoch (Gesamtdauer)",{"type":32,"tag":353,"props":387,"children":388},{},[389],{"type":37,"value":390},"Mittel (jeder Agent könnte kleines Modell sein)",{"type":32,"tag":353,"props":392,"children":393},{},[394],{"type":37,"value":395},"Abhängige Verarbeitung (Parse → Enrichment → Analyse)",{"type":32,"tag":321,"props":397,"children":398},{},[399,404,409,414],{"type":32,"tag":353,"props":400,"children":401},{},[402],{"type":37,"value":403},"Hybrid (Parallel → Merge → Seriell)",{"type":32,"tag":353,"props":405,"children":406},{},[407],{"type":37,"value":408},"Mittel",{"type":32,"tag":353,"props":410,"children":411},{},[412],{"type":37,"value":413},"Mittel-Hoch",{"type":32,"tag":353,"props":415,"children":416},{},[417],{"type":37,"value":418},"Komplexe Aufgabe (Datenbeschaffung parallel, Ergebnis-Pipeline)",{"type":32,"tag":33,"props":420,"children":421},{},[422],{"type":37,"value":423},"In Production legen wir Concurrency-Limits für Scatter-Gather fest, um Rate Limits zu vermeiden (z.B. max 5 parallele LLM-Aufrufe). Bei seriellen Pipelines nutzen wir Intermediate-Cache — wenn Agent_As Output 10 Minuten gültig ist, startet Agent_B bei derselben Query direkt vom gecachten Output.",{"type":32,"tag":40,"props":425,"children":427},{"id":426},"aufgaben-des-orchestrators-routing-und-error-handling",[428],{"type":37,"value":429},"Aufgaben des Orchestrators: Routing und Error Handling",{"type":32,"tag":33,"props":431,"children":432},{},[433,435,440],{"type":37,"value":434},"Der Orchestrator tetigt Agent nicht nur, sondern ",{"type":32,"tag":248,"props":436,"children":437},{},[438],{"type":37,"value":439},"entscheidet, welcher Agent welche Aufgabe übernimmt",{"type":37,"value":441},". In LangGraph heißt das „Supervisor Agent\": kategorisiert eingehende Query und routet. Beispiel-Logik:",{"type":32,"tag":62,"props":443,"children":445},{"code":444,"language":65,"meta":16,"className":66,"style":16},"def route_query(user_query: str) -> str:\n    # LLM-basiertes Routing (kleines Modell, schnell)\n    classification = llm.classify(user_query, categories=[\"data_query\", \"content_gen\", \"code_review\"])\n    \n    if classification == \"data_query\":\n        return \"analytics_agent\"\n    elif classification == \"content_gen\":\n        return \"writer_agent\"\n    else:\n        return \"code_agent\"\n",[446],{"type":32,"tag":69,"props":447,"children":448},{"__ignoreMap":16},[449,488,497,558,566,593,606,631,643,655],{"type":32,"tag":73,"props":450,"children":451},{"class":75,"line":76},[452,457,463,468,474,479,483],{"type":32,"tag":73,"props":453,"children":454},{"style":86},[455],{"type":37,"value":456},"def",{"type":32,"tag":73,"props":458,"children":460},{"style":459},"--shiki-default:#B392F0",[461],{"type":37,"value":462}," route_query",{"type":32,"tag":73,"props":464,"children":465},{"style":80},[466],{"type":37,"value":467},"(user_query: ",{"type":32,"tag":73,"props":469,"children":471},{"style":470},"--shiki-default:#79B8FF",[472],{"type":37,"value":473},"str",{"type":32,"tag":73,"props":475,"children":476},{"style":80},[477],{"type":37,"value":478},") -> ",{"type":32,"tag":73,"props":480,"children":481},{"style":470},[482],{"type":37,"value":473},{"type":32,"tag":73,"props":484,"children":485},{"style":80},[486],{"type":37,"value":487},":\n",{"type":32,"tag":73,"props":489,"children":490},{"class":75,"line":97},[491],{"type":32,"tag":73,"props":492,"children":494},{"style":493},"--shiki-default:#6A737D",[495],{"type":37,"value":496},"    # LLM-basiertes Routing (kleines Modell, schnell)\n",{"type":32,"tag":73,"props":498,"children":499},{"class":75,"line":106},[500,505,509,514,520,524,529,534,539,544,548,553],{"type":32,"tag":73,"props":501,"children":502},{"style":80},[503],{"type":37,"value":504},"    classification ",{"type":32,"tag":73,"props":506,"children":507},{"style":86},[508],{"type":37,"value":89},{"type":32,"tag":73,"props":510,"children":511},{"style":80},[512],{"type":37,"value":513}," llm.classify(user_query, ",{"type":32,"tag":73,"props":515,"children":517},{"style":516},"--shiki-default:#FFAB70",[518],{"type":37,"value":519},"categories",{"type":32,"tag":73,"props":521,"children":522},{"style":86},[523],{"type":37,"value":89},{"type":32,"tag":73,"props":525,"children":526},{"style":80},[527],{"type":37,"value":528},"[",{"type":32,"tag":73,"props":530,"children":531},{"style":110},[532],{"type":37,"value":533},"\"data_query\"",{"type":32,"tag":73,"props":535,"children":536},{"style":80},[537],{"type":37,"value":538},", ",{"type":32,"tag":73,"props":540,"children":541},{"style":110},[542],{"type":37,"value":543},"\"content_gen\"",{"type":32,"tag":73,"props":545,"children":546},{"style":80},[547],{"type":37,"value":538},{"type":32,"tag":73,"props":549,"children":550},{"style":110},[551],{"type":37,"value":552},"\"code_review\"",{"type":32,"tag":73,"props":554,"children":555},{"style":80},[556],{"type":37,"value":557},"])\n",{"type":32,"tag":73,"props":559,"children":560},{"class":75,"line":131},[561],{"type":32,"tag":73,"props":562,"children":563},{"style":80},[564],{"type":37,"value":565},"    \n",{"type":32,"tag":73,"props":567,"children":568},{"class":75,"line":153},[569,574,579,584,589],{"type":32,"tag":73,"props":570,"children":571},{"style":86},[572],{"type":37,"value":573},"    if",{"type":32,"tag":73,"props":575,"children":576},{"style":80},[577],{"type":37,"value":578}," classification ",{"type":32,"tag":73,"props":580,"children":581},{"style":86},[582],{"type":37,"value":583},"==",{"type":32,"tag":73,"props":585,"children":586},{"style":110},[587],{"type":37,"value":588}," \"data_query\"",{"type":32,"tag":73,"props":590,"children":591},{"style":80},[592],{"type":37,"value":487},{"type":32,"tag":73,"props":594,"children":595},{"class":75,"line":167},[596,601],{"type":32,"tag":73,"props":597,"children":598},{"style":86},[599],{"type":37,"value":600},"        return",{"type":32,"tag":73,"props":602,"children":603},{"style":110},[604],{"type":37,"value":605}," \"analytics_agent\"\n",{"type":32,"tag":73,"props":607,"children":608},{"class":75,"line":189},[609,614,618,622,627],{"type":32,"tag":73,"props":610,"children":611},{"style":86},[612],{"type":37,"value":613},"    elif",{"type":32,"tag":73,"props":615,"children":616},{"style":80},[617],{"type":37,"value":578},{"type":32,"tag":73,"props":619,"children":620},{"style":86},[621],{"type":37,"value":583},{"type":32,"tag":73,"props":623,"children":624},{"style":110},[625],{"type":37,"value":626}," \"content_gen\"",{"type":32,"tag":73,"props":628,"children":629},{"style":80},[630],{"type":37,"value":487},{"type":32,"tag":73,"props":632,"children":633},{"class":75,"line":207},[634,638],{"type":32,"tag":73,"props":635,"children":636},{"style":86},[637],{"type":37,"value":600},{"type":32,"tag":73,"props":639,"children":640},{"style":110},[641],{"type":37,"value":642}," \"writer_agent\"\n",{"type":32,"tag":73,"props":644,"children":645},{"class":75,"line":26},[646,651],{"type":32,"tag":73,"props":647,"children":648},{"style":86},[649],{"type":37,"value":650},"    else",{"type":32,"tag":73,"props":652,"children":653},{"style":80},[654],{"type":37,"value":487},{"type":32,"tag":73,"props":656,"children":657},{"class":75,"line":224},[658,662],{"type":32,"tag":73,"props":659,"children":660},{"style":86},[661],{"type":37,"value":600},{"type":32,"tag":73,"props":663,"children":664},{"style":110},[665],{"type":37,"value":666}," \"code_agent\"\n",{"type":32,"tag":33,"props":668,"children":669},{},[670],{"type":37,"value":671},"Der Router-Agent nutzt üblicherweise ein schnelles, billiges Modell wie GPT-4o-mini oder Claude Haiku. Es addiert 50–100ms Overhead, aber verhindert unnötige große Modelle. Sagt der Nutzer „Fasse Kampagnen-Performance zusammen\", geht es zum analytics_agent (BigQuery Tool Use), sagt er „Schreibe Blogartikel\", zum writer_agent (Web-Search-Tool + Writing-LLM).",{"type":32,"tag":33,"props":673,"children":674},{},[675,680],{"type":32,"tag":248,"props":676,"children":677},{},[678],{"type":37,"value":679},"Error Handling ist in Multi-Agent kritisch.",{"type":37,"value":681}," Mit einzelnem Agent: LLM halluziniert → Retry. Mit Multi-Agent: agent_2 arbeitet mit fehlerhafter Output von agent_1 → Cascade Failure. Der Orchestrator muss jede Agent-Ausgabe validieren:",{"type":32,"tag":62,"props":683,"children":685},{"code":684,"language":65,"meta":16,"className":66,"style":16},"def validate_agent_output(output: dict, schema: dict) -> bool:\n    # JSON-Schema-Validierung\n    if not matches_schema(output, schema):\n        raise AgentOutputError(\"Agent-Ausgabe entspricht nicht dem Schema\")\n    \n    # Semantische Prüfung (optional, teuer)\n    if confidence_score(output) \u003C 0.7:\n        return False  # retry oder Fallback\n    \n    return True\n",[686],{"type":32,"tag":69,"props":687,"children":688},{"__ignoreMap":16},[689,733,741,758,781,788,796,822,839,846],{"type":32,"tag":73,"props":690,"children":691},{"class":75,"line":76},[692,696,701,706,711,716,720,724,729],{"type":32,"tag":73,"props":693,"children":694},{"style":86},[695],{"type":37,"value":456},{"type":32,"tag":73,"props":697,"children":698},{"style":459},[699],{"type":37,"value":700}," validate_agent_output",{"type":32,"tag":73,"props":702,"children":703},{"style":80},[704],{"type":37,"value":705},"(output: ",{"type":32,"tag":73,"props":707,"children":708},{"style":470},[709],{"type":37,"value":710},"dict",{"type":32,"tag":73,"props":712,"children":713},{"style":80},[714],{"type":37,"value":715},", schema: ",{"type":32,"tag":73,"props":717,"children":718},{"style":470},[719],{"type":37,"value":710},{"type":32,"tag":73,"props":721,"children":722},{"style":80},[723],{"type":37,"value":478},{"type":32,"tag":73,"props":725,"children":726},{"style":470},[727],{"type":37,"value":728},"bool",{"type":32,"tag":73,"props":730,"children":731},{"style":80},[732],{"type":37,"value":487},{"type":32,"tag":73,"props":734,"children":735},{"class":75,"line":97},[736],{"type":32,"tag":73,"props":737,"children":738},{"style":493},[739],{"type":37,"value":740},"    # JSON-Schema-Validierung\n",{"type":32,"tag":73,"props":742,"children":743},{"class":75,"line":106},[744,748,753],{"type":32,"tag":73,"props":745,"children":746},{"style":86},[747],{"type":37,"value":573},{"type":32,"tag":73,"props":749,"children":750},{"style":86},[751],{"type":37,"value":752}," not",{"type":32,"tag":73,"props":754,"children":755},{"style":80},[756],{"type":37,"value":757}," matches_schema(output, schema):\n",{"type":32,"tag":73,"props":759,"children":760},{"class":75,"line":131},[761,766,771,776],{"type":32,"tag":73,"props":762,"children":763},{"style":86},[764],{"type":37,"value":765},"        raise",{"type":32,"tag":73,"props":767,"children":768},{"style":80},[769],{"type":37,"value":770}," AgentOutputError(",{"type":32,"tag":73,"props":772,"children":773},{"style":110},[774],{"type":37,"value":775},"\"Agent-Ausgabe entspricht nicht dem Schema\"",{"type":32,"tag":73,"props":777,"children":778},{"style":80},[779],{"type":37,"value":780},")\n",{"type":32,"tag":73,"props":782,"children":783},{"class":75,"line":153},[784],{"type":32,"tag":73,"props":785,"children":786},{"style":80},[787],{"type":37,"value":565},{"type":32,"tag":73,"props":789,"children":790},{"class":75,"line":167},[791],{"type":32,"tag":73,"props":792,"children":793},{"style":493},[794],{"type":37,"value":795},"    # Semantische Prüfung (optional, teuer)\n",{"type":32,"tag":73,"props":797,"children":798},{"class":75,"line":189},[799,803,808,813,818],{"type":32,"tag":73,"props":800,"children":801},{"style":86},[802],{"type":37,"value":573},{"type":32,"tag":73,"props":804,"children":805},{"style":80},[806],{"type":37,"value":807}," confidence_score(output) ",{"type":32,"tag":73,"props":809,"children":810},{"style":86},[811],{"type":37,"value":812},"\u003C",{"type":32,"tag":73,"props":814,"children":815},{"style":470},[816],{"type":37,"value":817}," 0.7",{"type":32,"tag":73,"props":819,"children":820},{"style":80},[821],{"type":37,"value":487},{"type":32,"tag":73,"props":823,"children":824},{"class":75,"line":207},[825,829,834],{"type":32,"tag":73,"props":826,"children":827},{"style":86},[828],{"type":37,"value":600},{"type":32,"tag":73,"props":830,"children":831},{"style":470},[832],{"type":37,"value":833}," False",{"type":32,"tag":73,"props":835,"children":836},{"style":493},[837],{"type":37,"value":838},"  # retry oder Fallback\n",{"type":32,"tag":73,"props":840,"children":841},{"class":75,"line":26},[842],{"type":32,"tag":73,"props":843,"children":844},{"style":80},[845],{"type":37,"value":565},{"type":32,"tag":73,"props":847,"children":848},{"class":75,"line":224},[849,854],{"type":32,"tag":73,"props":850,"children":851},{"style":86},[852],{"type":37,"value":853},"    return",{"type":32,"tag":73,"props":855,"children":856},{"style":470},[857],{"type":37,"value":858}," True\n",{"type":32,"tag":33,"props":860,"children":861},{},[862],{"type":37,"value":863},"Schlägt agent_1 fehl, geht der Orchestrator zur Fallback-Chain: erst Retry (1×), dann alternativer Agent (größeres Modell), dann Human-in-the-Loop. Ohne diese Logik ist Multi-Agent unreliabel.",{"type":32,"tag":40,"props":865,"children":867},{"id":866},"latenz-und-cost-benchmark-szenarien",[868],{"type":37,"value":869},"Latenz und Cost: Benchmark-Szenarien",{"type":32,"tag":33,"props":871,"children":872},{},[873],{"type":37,"value":874},"Test-Szenario: „Analysiere Umsatz-Trend der letzten 30 Tage, fasse Kampagnen-Performance zusammen, schreibe Übersichts-Email für CEO\" — 3 unabhängige Aufgaben.",{"type":32,"tag":33,"props":876,"children":877},{},[878],{"type":32,"tag":248,"props":879,"children":880},{},[881],{"type":37,"value":882},"Single Agent (GPT-4, seriell):",{"type":32,"tag":884,"props":885,"children":886},"ul",{},[887,893,898,903,913],{"type":32,"tag":888,"props":889,"children":890},"li",{},[891],{"type":37,"value":892},"BigQuery abfragen → 800ms (LLM + API)",{"type":32,"tag":888,"props":894,"children":895},{},[896],{"type":37,"value":897},"Ad Platforms abfragen → 900ms",{"type":32,"tag":888,"props":899,"children":900},{},[901],{"type":37,"value":902},"Email generieren → 600ms",{"type":32,"tag":888,"props":904,"children":905},{},[906,911],{"type":32,"tag":248,"props":907,"children":908},{},[909],{"type":37,"value":910},"Gesamt:",{"type":37,"value":912}," 2300ms",{"type":32,"tag":888,"props":914,"children":915},{},[916,921],{"type":32,"tag":248,"props":917,"children":918},{},[919],{"type":37,"value":920},"Cost:",{"type":37,"value":922}," 3 Durchläufe × $0.03\u002F1K Token = ~$0.09 (Standard-Input\u002FOutput-Mix)",{"type":32,"tag":33,"props":924,"children":925},{},[926],{"type":32,"tag":248,"props":927,"children":928},{},[929],{"type":37,"value":930},"Multi-Agent (Scatter-Gather + Pipeline):",{"type":32,"tag":884,"props":932,"children":933},{},[934,939,944,953],{"type":32,"tag":888,"props":935,"children":936},{},[937],{"type":37,"value":938},"Agent_1, 2, 3 parallel (BigQuery, Ads, Email-Vorbereitung) → max 900ms",{"type":32,"tag":888,"props":940,"children":941},{},[942],{"type":37,"value":943},"Orchestrator Merge + Synthese → 400ms",{"type":32,"tag":888,"props":945,"children":946},{},[947,951],{"type":32,"tag":248,"props":948,"children":949},{},[950],{"type":37,"value":910},{"type":37,"value":952}," 1300ms",{"type":32,"tag":888,"props":954,"children":955},{},[956,960],{"type":32,"tag":248,"props":957,"children":958},{},[959],{"type":37,"value":920},{"type":37,"value":961}," 3 Agent × $0.02 (kleines Modell) + Synthese $0.03 = ~$0.09 (gleich, aber mit Modell-Optimierung auf $0.05 reduzierbar)",{"type":32,"tag":33,"props":963,"children":964},{},[965,970],{"type":32,"tag":248,"props":966,"children":967},{},[968],{"type":37,"value":969},"Gewinn:",{"type":37,"value":971}," 43% Latenz-Reduktion. Cost gleich, aber mit Modell-Optimierung (agent_1 → Gemini Flash, agent_2 → Claude Haiku, Orchestrator → GPT-4o-mini) auf $0.05 reduzierbar.",{"type":32,"tag":33,"props":973,"children":974},{},[975,980],{"type":32,"tag":248,"props":976,"children":977},{},[978],{"type":37,"value":979},"Aber:",{"type":37,"value":981}," Parallele Agenten = parallele Rate-Limit-Auslastung. Wenn OpenAI-Tier 500 RPM erlaubt, bedeuten 10 parallele Agenten 50 User in 5 Minuten. Einzelner Agent hätte 500 User in 5 Minuten bedient. In Production managen wir diesen Tradeoff mit Queue + Cache.",{"type":32,"tag":40,"props":983,"children":985},{"id":984},"beobachtbarkeit-und-debugging",[986],{"type":37,"value":987},"Beobachtbarkeit und Debugging",{"type":32,"tag":33,"props":989,"children":990},{},[991],{"type":37,"value":992},"In Multi-Agent-Systemen ist die Antwort auf „Wo ist es schief gelaufen?\" schwer. Tools wie LangSmith, Helicone, Arize Phoenix visualisieren Agent-Trace: welcher Agent wann welches Tool aufgerufen hat, mit welchem Prompt, was zurückgekommen ist, wo Retries stattgefunden haben. Beispiel-Trace:",{"type":32,"tag":62,"props":994,"children":996},{"code":995},"orchestrator → classify_query (50ms, GPT-4o-mini) → \"data_query\"\n→ analytics_agent → query_bigquery (800ms, tool_call) → success\n→ writer_agent → generate_summary (600ms, GPT-4) → success\n→ orchestrator → merge_results (200ms) → final_output\n",[997],{"type":32,"tag":69,"props":998,"children":999},{"__ignoreMap":16},[1000],{"type":37,"value":995},{"type":32,"tag":33,"props":1002,"children":1003},{},[1004],{"type":37,"value":1005},"Bei jedem Schritt werden Token-Count, Latenz und Cost geloggt. Ohne dieses Telemetry in Production ist Multi-Agent nicht debugbar. Wenn Agent As Tool Call timeoutet, sieht man es im Trace, fügt Retry-Logik ein.",{"type":32,"tag":33,"props":1007,"children":1008},{},[1009,1011,1016],{"type":37,"value":1010},"Eine weitere Metrik: ",{"type":32,"tag":248,"props":1012,"children":1013},{},[1014],{"type":37,"value":1015},"Agent-Auslastung",{"type":37,"value":1017},". Wenn du 5 Agenten definiert hast, aber 80% der User-Queries an einen Agent gehen, ist die Routing-Logik fehlerhaft. Wir messen die Classification-Accuracy des Orchestrators — mit User-Feedback schaffen wir ein gelabeltes Dataset und Fine-Tune den Router-Agent (Few-Shot-Prompt statt Lightweight-Classifier).",{"type":32,"tag":40,"props":1019,"children":1021},{"id":1020},"limits-von-multi-agent",[1022],{"type":37,"value":1023},"Limits von Multi-Agent",{"type":32,"tag":33,"props":1025,"children":1026},{},[1027,1029,1034],{"type":37,"value":1028},"Multi-Agent löst nicht jedes Problem. Es gibt ",{"type":32,"tag":248,"props":1030,"children":1031},{},[1032],{"type":37,"value":1033},"Coordination Overhead",{"type":37,"value":1035},": Nachrichtenfluss zwischen Agenten, Orchestrierungs-Logik, Error Handling — alles addiert Latenz. Eine einfache Query, die Single-Agent in 1 Sekunde beendet, könnte Multi-Agent 1,5 Sekunden kosten (Orchestrator + Routing + Merge). Architektur-Komplexität wächst — Codebasis wird größer, Testen schwerer, Deployment heikler.",{"type":32,"tag":33,"props":1037,"children":1038},{},[1039],{"type":37,"value":1040},"Multi-Agent macht Sinn bei:",{"type":32,"tag":884,"props":1042,"children":1043},{},[1044,1054,1064],{"type":32,"tag":888,"props":1045,"children":1046},{},[1047,1052],{"type":32,"tag":248,"props":1048,"children":1049},{},[1050],{"type":37,"value":1051},"Paralleler Datenzug erforderlich:",{"type":37,"value":1053}," 5 verschiedene APIs-Abfragen → Scatter-Gather spart Zeit",{"type":32,"tag":888,"props":1055,"children":1056},{},[1057,1062],{"type":32,"tag":248,"props":1058,"children":1059},{},[1060],{"type":37,"value":1061},"Spezialisierte Modelle optimal:",{"type":37,"value":1063}," Kleine für Query-Planung, große für Code-Generation — Pipeline senkt Cost",{"type":32,"tag":888,"props":1065,"children":1066},{},[1067,1072],{"type":32,"tag":248,"props":1068,"children":1069},{},[1070],{"type":37,"value":1071},"Long-Running-Task:",{"type":37,"value":1073}," Agent_1 startet Arbeit, agent_2 überwacht async, agent_3 beendet, Orchestrator notifiziert — Event-Driven statt Sync-Call",{"type":32,"tag":33,"props":1075,"children":1076},{},[1077],{"type":37,"value":1078},"Bei kurzen, häufigen, einfachen Queries schlägt Single-Agent + Caching Multi-Agent. Multi-Agent schafft Wert durch Decomposition und Optimierung komplexer Aufgaben.",{"type":32,"tag":1080,"props":1081,"children":1082},"hr",{},[],{"type":32,"tag":33,"props":1084,"children":1085},{},[1086],{"type":37,"value":1087},"Multi-Agent-Orchestrierung transformiert LLMs von stateless Funktionsaufrufen zu stateful, beobachtbaren, skalierbaren Systemen. Parallele Topologie bricht Latenz, Pipeline senkt Cost, Orchestrator bringt Zuverlässigkeit. In Production: starte mit Scatter-Gather, überwache Rate Limits und Cost, wechsle bei Bedarf zur Pipeline. Logge Agent-Traces, schichte Error Handling, teste Routing-Logik. Multi-Agent ist der Übergangspunkt von LLM-Engineering zu LLM-Infrastructure.",{"type":32,"tag":1089,"props":1090,"children":1091},"style",{},[1092],{"type":37,"value":1093},"html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}",{"title":16,"searchDepth":106,"depth":106,"links":1095},[1096,1097,1100,1101,1102,1103],{"id":42,"depth":97,"text":45},{"id":238,"depth":97,"text":241,"children":1098},[1099],{"id":308,"depth":106,"text":311},{"id":426,"depth":97,"text":429},{"id":866,"depth":97,"text":869},{"id":984,"depth":97,"text":987},{"id":1020,"depth":97,"text":1023},"markdown","content:de:ai:multi-agent-orchestrierung-llm-aufrufe-systeme.md","content","de\u002Fai\u002Fmulti-agent-orchestrierung-llm-aufrufe-systeme.md","de\u002Fai\u002Fmulti-agent-orchestrierung-llm-aufrufe-systeme","md",1780898612929]