[{"data":1,"prerenderedAt":1603},["ShallowReactive",2],{"article-alternates":3,"article-\u002Fde\u002Fai\u002Frag-production-retrieval-kalitesi":13},{"i18nKey":4,"paths":5},"ai-003-2026-05",{"de":6,"en":7,"es":8,"fr":9,"it":10,"ru":11,"tr":12},"\u002Fde\u002Fai\u002Frag-production-retrieval-kalitesi","\u002Fen\u002Fai\u002Frag-retrieval-quality-over-cost","\u002Fes\u002Fai\u002Frag-en-produccion-calidad-de-recuperacion-antes-que-costo","\u002Ffr\u002Fai\u002Frag-production-retrieval-kalitesi","\u002Fit\u002Fai\u002Frag-production-retrieval-kalitesi-once-gelir","\u002Fru\u002Fai\u002Fproduction-rag-retrieval-quality-over-cost","\u002Ftr\u002Fai\u002Fproductionda-rag-retrieval-kalitesi-costtan-once-gelir",{"_path":6,"_dir":14,"_draft":15,"_partial":15,"_locale":16,"title":17,"description":18,"publishedAt":19,"modifiedAt":19,"category":14,"i18nKey":4,"tags":20,"readingTime":26,"author":27,"body":28,"_type":1597,"_id":1598,"_source":1599,"_file":1600,"_stem":1601,"_extension":1602},"ai",false,"","Production RAG: Retrieval-Qualität vor Kosteneinsparungen","Falsche Embedding-Modell, Chunking-Strategie oder Eval-Setup führen zu teuren oder langsamen RAG-Systemen. Was muss man in Production beachten?","2026-05-11",[21,22,23,24,25],"rag","embedding","chunking","llm-eval","retrieval-qualität",9,"Roibase",{"type":29,"children":30,"toc":1585},"root",[31,39,46,51,73,94,103,123,139,145,150,155,174,218,223,256,269,276,281,287,292,300,827,837,842,848,860,1241,1246,1252,1264,1285,1290,1308,1313,1319,1324,1332,1350,1355,1361,1373,1478,1483,1501,1515,1521,1526,1569,1574,1579],{"type":32,"tag":33,"props":34,"children":35},"element","p",{},[36],{"type":37,"value":38},"text","RAG-Systeme sind seit 2024 in der Production weit verbreitet. Unternehmen integrieren ihre eigenen Dokumentkorpora in LLMs über Embedding + Vector-DB-Stack. Aber die meisten Pilotprojekte stoßen auf dasselbe Problem: Retrieval-Qualität ist niedrig, Antworten sind inkonsistent, Kosten außer Kontrolle. 
Das liegt meist daran, dass Embedding-Modell-Auswahl, Chunking-Strategie und Eval-Setup zu schnell abgehandelt werden. Dieser Artikel zeigt, welche Entscheidungen kaum umkehrbar sind und deshalb fallen müssen, bevor du die RAG-Pipeline in die Production schiebst.",{"type":32,"tag":40,"props":41,"children":43},"h2",{"id":42},"embedding-modell-dimension-ist-nicht-alles-domain-ausrichtung-zählt",[44],{"type":37,"value":45},"Embedding-Modell: Dimension ist nicht alles, Domain-Ausrichtung zählt",{"type":32,"tag":33,"props":47,"children":48},{},[49],{"type":37,"value":50},"Die erste Reaktion bei der Embedding-Modell-Wahl ist \"welches hat den höchsten MTEB-Score?\" Aber Benchmark-Rankings garantieren keine Production-Performance. Entscheidend ist, wie gut das Modell zu deinen Dokumenttypen und Query-Patterns passt.",{"type":32,"tag":33,"props":52,"children":53},{},[54,56,63,65,71],{"type":37,"value":55},"Als wir OpenAI ",{"type":32,"tag":57,"props":58,"children":60},"code",{"className":59},[],[61],{"type":37,"value":62},"text-embedding-3-large",{"type":37,"value":64}," (3072 dim) mit Cohere ",{"type":32,"tag":57,"props":66,"children":68},{"className":67},[],[69],{"type":37,"value":70},"embed-v3",{"type":37,"value":72}," (1024 dim) verglichen, lieferte Cohere bei Marketing-Dokumenten (Blogs, Case Studies, Landing Pages) besseren Recall@10 — das Training-Set enthielt mehr Geschäftsinhalte. OpenAIs größere Dimension ist zwar in allgemeinen Benchmarks überlegen, aber die Query-Distribution in der spezifischen Domäne unterscheidet sich davon.",{"type":32,"tag":33,"props":74,"children":75},{},[76,78,84,86,92],{"type":37,"value":77},"Ein weiteres Beispiel: ",{"type":32,"tag":57,"props":79,"children":81},{"className":80},[],[82],{"type":37,"value":83},"bge-large-en-v1.5",{"type":37,"value":85}," (1024 dim, selbstgehostet) reicht für juristische Dokumente aus. 
Aber bei mehrsprachigem Corpus schlägt ",{"type":32,"tag":57,"props":87,"children":89},{"className":88},[],[90],{"type":37,"value":91},"multilingual-e5-large",{"type":37,"value":93}," (1024 dim) die Konkurrenz deutlich. Die Modellgröße ist nicht immer ein Qualitätssignal — die Übereinstimmung der Training-Daten mit deiner Domäne ist kritischer.",{"type":32,"tag":33,"props":95,"children":96},{},[97],{"type":32,"tag":98,"props":99,"children":100},"strong",{},[101],{"type":37,"value":102},"Auswahlkriterien:",{"type":32,"tag":104,"props":105,"children":106},"ol",{},[107,113,118],{"type":32,"tag":108,"props":109,"children":110},"li",{},[111],{"type":37,"value":112},"Nicht der MTEB-Score, sondern Recall@5 \u002F MRR auf dem eigenen Eval-Set",{"type":32,"tag":108,"props":114,"children":115},{},[116],{"type":37,"value":117},"Latenz (selbstgehostet vs. API) — Batch-Embedding-Zeit für 512 Dokumente",{"type":32,"tag":108,"props":119,"children":120},{},[121],{"type":37,"value":122},"Kosten pro 1M Token — OpenAI 3-large $0,13, Cohere v3 $0,10, selbstgehostet $0 plus Infrastruktur",{"type":32,"tag":33,"props":124,"children":125},{},[126,128,137],{"type":37,"value":127},"Wenn dein Dokumentset spezifische Domain-Begriffe enthält (Pharma, Finanzen, Legal), steigert Fine-Tuning eines Embedding-Modells oder ein selbsttrainierter Sentence Transformer die Retrieval-Qualität um 15-20%. 
Das fällt unter ",{"type":32,"tag":129,"props":130,"children":134},"a",{"href":131,"rel":132},"https:\u002F\u002Fwww.roibase.com.tr\u002Fde\u002Fverianalizi",[133],"nofollow",[135],{"type":37,"value":136},"Datenanalyse & Insights-Engineering",{"type":37,"value":138}," — du musst eine Training-Pipeline aufbauen und Datenqualität überwachen.",{"type":32,"tag":40,"props":140,"children":142},{"id":141},"chunking-strategie-feste-größe-funktioniert-nicht",[143],{"type":37,"value":144},"Chunking-Strategie: Feste Größe funktioniert nicht",{"type":32,"tag":33,"props":146,"children":147},{},[148],{"type":37,"value":149},"Die meisten RAG-Implementierungen starten mit \"512 Token mit Overlap-Fenster\" als Standard. Bei Mixed-Format-Corpus (PDF, HTML, JSON) funktioniert das sofort nicht mehr.",{"type":32,"tag":33,"props":151,"children":152},{},[153],{"type":37,"value":154},"Probleme mit fester Größe:",{"type":32,"tag":156,"props":157,"children":158},"ul",{},[159,164,169],{"type":32,"tag":108,"props":160,"children":161},{},[162],{"type":37,"value":163},"Überschriften werden zerrissen, semantische Integrität geht verloren",{"type":32,"tag":108,"props":165,"children":166},{},[167],{"type":37,"value":168},"Tabellen, Code-Blöcke werden mitten durchgeteilt",{"type":32,"tag":108,"props":170,"children":171},{},[172],{"type":37,"value":173},"Overlap-Strategie dupliziert überlappendes Context, Retrieval-Rauschen nimmt zu",{"type":32,"tag":33,"props":175,"children":176},{},[177,179,184,186,192,194,200,202,208,210,216],{"type":37,"value":178},"Alternative: ",{"type":32,"tag":98,"props":180,"children":181},{},[182],{"type":37,"value":183},"Semantic Chunking",{"type":37,"value":185},". Dokumentfragmente nach Satzbegrenzungen, Überschriften-Hierarchie aufteilen und semantische Integrität bewahren. 
Nutze ",{"type":32,"tag":57,"props":187,"children":189},{"className":188},[],[190],{"type":37,"value":191},"langchain",{"type":37,"value":193},"'s ",{"type":32,"tag":57,"props":195,"children":197},{"className":196},[],[198],{"type":37,"value":199},"MarkdownTextSplitter",{"type":37,"value":201}," statt ",{"type":32,"tag":57,"props":203,"children":205},{"className":204},[],[206],{"type":37,"value":207},"RecursiveCharacterTextSplitter",{"type":37,"value":209},". Bei PDFs ",{"type":32,"tag":57,"props":211,"children":213},{"className":212},[],[214],{"type":37,"value":215},"pdfplumber",{"type":37,"value":217}," nutzen für Tabel + Text-Trennung und unterschiedliche Chunk-Strategien pro Typ.",{"type":32,"tag":33,"props":219,"children":220},{},[221],{"type":37,"value":222},"Bei einer E-Commerce-Firma haben wir die Produkt-Dokumentation in 3 Chunk-Typen aufgeteilt:",{"type":32,"tag":156,"props":224,"children":225},{},[226,236,246],{"type":32,"tag":108,"props":227,"children":228},{},[229,234],{"type":32,"tag":98,"props":230,"children":231},{},[232],{"type":37,"value":233},"Titel + Kurzbeschreibung:",{"type":37,"value":235}," 128 Token, leicht für Retrieval",{"type":32,"tag":108,"props":237,"children":238},{},[239,244],{"type":32,"tag":98,"props":240,"children":241},{},[242],{"type":37,"value":243},"Technische Spezifikationen + Tabelle:",{"type":37,"value":245}," 256 Token, strukturierte Daten",{"type":32,"tag":108,"props":247,"children":248},{},[249,254],{"type":32,"tag":98,"props":250,"children":251},{},[252],{"type":37,"value":253},"Langform (Blog, Guides):",{"type":37,"value":255}," 512 Token, semantische Aufteilung",{"type":32,"tag":33,"props":257,"children":258},{},[259,261,267],{"type":37,"value":260},"Wir markierten jeden Chunk mit Metadaten (chunk_type, source_page). Im Retrieval filterten wir nach Query-Typ. Z.B. 
\"Produktvergleich\"-Anfragen schauten nur auf ",{"type":32,"tag":57,"props":262,"children":264},{"className":263},[],[265],{"type":37,"value":266},"technical_specs",{"type":37,"value":268},"-Chunks. Das steigerte Precision@3 um 18%.",{"type":32,"tag":270,"props":271,"children":273},"h3",{"id":272},"overlap-strategie-wie-viel-ist-genug",[274],{"type":37,"value":275},"Overlap-Strategie: Wie viel ist genug?",{"type":32,"tag":33,"props":277,"children":278},{},[279],{"type":37,"value":280},"Overlap wird typischerweise auf 10-20% empfohlen, aber das ist willkürlich. Unser Test: 50 Token Overlap bei 512 Token Chunk erhält semantische Kontinuität. 100 Token Overlap steigert Retrieval-Latenz um 12%, ohne Qualitätsgewinn. Der Sweet Spot hängt vom Domain ab — teste mit deinem Eval-Set.",{"type":32,"tag":40,"props":282,"children":284},{"id":283},"eval-setup-muss-vor-production-aufgebaut-werden",[285],{"type":37,"value":286},"Eval Setup: Muss vor Production aufgebaut werden",{"type":32,"tag":33,"props":288,"children":289},{},[290],{"type":37,"value":291},"Die meisten RAG-Systeme gehen \"sieht visuell gut aus\" in Production. 
Ohne strukturiertes Eval-Setup für Retrieval-Qualität wirst du in den ersten 1000 Queries nicht zuverlässig sein.",{"type":32,"tag":33,"props":293,"children":294},{},[295],{"type":32,"tag":98,"props":296,"children":297},{},[298],{"type":37,"value":299},"Minimale Eval-Pipeline:",{"type":32,"tag":301,"props":302,"children":306},"pre",{"className":303,"code":304,"language":305,"meta":16,"style":16},"language-python shiki shiki-themes github-dark","# eval_set.json — Golden Dataset\n[\n  {\n    \"query\": \"Wie kann man DSGVO-konform Benutzereinwilligung einholen?\",\n    \"expected_docs\": [\"doc_42\", \"doc_89\"],\n    \"expected_answer_contains\": [\"Cookie-Hinweis\", \"explizite Zustimmung\"]\n  },\n  ...\n]\n\n# Eval-Metriken\ndef evaluate_retrieval(query, retrieved_docs, expected_docs):\n    recall_at_k = len(set(retrieved_docs[:5]) & set(expected_docs)) \u002F len(expected_docs)\n    mrr = 1 \u002F (retrieved_docs.index(expected_docs[0]) + 1) if expected_docs[0] in retrieved_docs else 0\n    return {\"recall@5\": recall_at_k, \"mrr\": mrr}\n\ndef evaluate_generation(generated_answer, expected_contains):\n    # LLM-as-judge: Frage Claude: \"Enthält diese Antwort erwartete Inhalte?\"\n    prompt = f\"Erwartet: {expected_contains}\\nGeneriert: {generated_answer}\\nScore 0-1:\"\n    score = claude_api(prompt)\n    return float(score)\n","python",[307],{"type":32,"tag":57,"props":308,"children":309},{"__ignoreMap":16},[310,322,332,341,366,400,432,441,451,458,468,477,498,571,661,695,703,721,730,791,809],{"type":32,"tag":311,"props":312,"children":315},"span",{"class":313,"line":314},"line",1,[316],{"type":32,"tag":311,"props":317,"children":319},{"style":318},"--shiki-default:#6A737D",[320],{"type":37,"value":321},"# eval_set.json — Golden 
Dataset\n",{"type":32,"tag":311,"props":323,"children":325},{"class":313,"line":324},2,[326],{"type":32,"tag":311,"props":327,"children":329},{"style":328},"--shiki-default:#E1E4E8",[330],{"type":37,"value":331},"[\n",{"type":32,"tag":311,"props":333,"children":335},{"class":313,"line":334},3,[336],{"type":32,"tag":311,"props":337,"children":338},{"style":328},[339],{"type":37,"value":340},"  {\n",{"type":32,"tag":311,"props":342,"children":344},{"class":313,"line":343},4,[345,351,356,361],{"type":32,"tag":311,"props":346,"children":348},{"style":347},"--shiki-default:#9ECBFF",[349],{"type":37,"value":350},"    \"query\"",{"type":32,"tag":311,"props":352,"children":353},{"style":328},[354],{"type":37,"value":355},": ",{"type":32,"tag":311,"props":357,"children":358},{"style":347},[359],{"type":37,"value":360},"\"Wie kann man DSGVO-konform Benutzereinwilligung einholen?\"",{"type":32,"tag":311,"props":362,"children":363},{"style":328},[364],{"type":37,"value":365},",\n",{"type":32,"tag":311,"props":367,"children":369},{"class":313,"line":368},5,[370,375,380,385,390,395],{"type":32,"tag":311,"props":371,"children":372},{"style":347},[373],{"type":37,"value":374},"    \"expected_docs\"",{"type":32,"tag":311,"props":376,"children":377},{"style":328},[378],{"type":37,"value":379},": [",{"type":32,"tag":311,"props":381,"children":382},{"style":347},[383],{"type":37,"value":384},"\"doc_42\"",{"type":32,"tag":311,"props":386,"children":387},{"style":328},[388],{"type":37,"value":389},", ",{"type":32,"tag":311,"props":391,"children":392},{"style":347},[393],{"type":37,"value":394},"\"doc_89\"",{"type":32,"tag":311,"props":396,"children":397},{"style":328},[398],{"type":37,"value":399},"],\n",{"type":32,"tag":311,"props":401,"children":403},{"class":313,"line":402},6,[404,409,413,418,422,427],{"type":32,"tag":311,"props":405,"children":406},{"style":347},[407],{"type":37,"value":408},"    
\"expected_answer_contains\"",{"type":32,"tag":311,"props":410,"children":411},{"style":328},[412],{"type":37,"value":379},{"type":32,"tag":311,"props":414,"children":415},{"style":347},[416],{"type":37,"value":417},"\"Cookie-Hinweis\"",{"type":32,"tag":311,"props":419,"children":420},{"style":328},[421],{"type":37,"value":389},{"type":32,"tag":311,"props":423,"children":424},{"style":347},[425],{"type":37,"value":426},"\"explizite Zustimmung\"",{"type":32,"tag":311,"props":428,"children":429},{"style":328},[430],{"type":37,"value":431},"]\n",{"type":32,"tag":311,"props":433,"children":435},{"class":313,"line":434},7,[436],{"type":32,"tag":311,"props":437,"children":438},{"style":328},[439],{"type":37,"value":440},"  },\n",{"type":32,"tag":311,"props":442,"children":444},{"class":313,"line":443},8,[445],{"type":32,"tag":311,"props":446,"children":448},{"style":447},"--shiki-default:#79B8FF",[449],{"type":37,"value":450},"  ...\n",{"type":32,"tag":311,"props":452,"children":453},{"class":313,"line":26},[454],{"type":32,"tag":311,"props":455,"children":456},{"style":328},[457],{"type":37,"value":431},{"type":32,"tag":311,"props":459,"children":461},{"class":313,"line":460},10,[462],{"type":32,"tag":311,"props":463,"children":465},{"emptyLinePlaceholder":464},true,[466],{"type":37,"value":467},"\n",{"type":32,"tag":311,"props":469,"children":471},{"class":313,"line":470},11,[472],{"type":32,"tag":311,"props":473,"children":474},{"style":318},[475],{"type":37,"value":476},"# Eval-Metriken\n",{"type":32,"tag":311,"props":478,"children":480},{"class":313,"line":479},12,[481,487,493],{"type":32,"tag":311,"props":482,"children":484},{"style":483},"--shiki-default:#F97583",[485],{"type":37,"value":486},"def",{"type":32,"tag":311,"props":488,"children":490},{"style":489},"--shiki-default:#B392F0",[491],{"type":37,"value":492}," evaluate_retrieval",{"type":32,"tag":311,"props":494,"children":495},{"style":328},[496],{"type":37,"value":497},"(query, retrieved_docs, 
expected_docs):\n",{"type":32,"tag":311,"props":499,"children":501},{"class":313,"line":500},13,[502,507,512,517,522,527,532,537,542,547,552,557,562,566],{"type":32,"tag":311,"props":503,"children":504},{"style":328},[505],{"type":37,"value":506},"    recall_at_k ",{"type":32,"tag":311,"props":508,"children":509},{"style":483},[510],{"type":37,"value":511},"=",{"type":32,"tag":311,"props":513,"children":514},{"style":447},[515],{"type":37,"value":516}," len",{"type":32,"tag":311,"props":518,"children":519},{"style":328},[520],{"type":37,"value":521},"(",{"type":32,"tag":311,"props":523,"children":524},{"style":447},[525],{"type":37,"value":526},"set",{"type":32,"tag":311,"props":528,"children":529},{"style":328},[530],{"type":37,"value":531},"(retrieved_docs[:",{"type":32,"tag":311,"props":533,"children":534},{"style":447},[535],{"type":37,"value":536},"5",{"type":32,"tag":311,"props":538,"children":539},{"style":328},[540],{"type":37,"value":541},"]) ",{"type":32,"tag":311,"props":543,"children":544},{"style":483},[545],{"type":37,"value":546},"&",{"type":32,"tag":311,"props":548,"children":549},{"style":447},[550],{"type":37,"value":551}," set",{"type":32,"tag":311,"props":553,"children":554},{"style":328},[555],{"type":37,"value":556},"(expected_docs)) ",{"type":32,"tag":311,"props":558,"children":559},{"style":483},[560],{"type":37,"value":561},"\u002F",{"type":32,"tag":311,"props":563,"children":564},{"style":447},[565],{"type":37,"value":516},{"type":32,"tag":311,"props":567,"children":568},{"style":328},[569],{"type":37,"value":570},"(expected_docs)\n",{"type":32,"tag":311,"props":572,"children":574},{"class":313,"line":573},14,[575,580,584,589,594,599,604,608,613,617,622,627,632,636,641,646,651,656],{"type":32,"tag":311,"props":576,"children":577},{"style":328},[578],{"type":37,"value":579},"    mrr 
",{"type":32,"tag":311,"props":581,"children":582},{"style":483},[583],{"type":37,"value":511},{"type":32,"tag":311,"props":585,"children":586},{"style":447},[587],{"type":37,"value":588}," 1",{"type":32,"tag":311,"props":590,"children":591},{"style":483},[592],{"type":37,"value":593}," \u002F",{"type":32,"tag":311,"props":595,"children":596},{"style":328},[597],{"type":37,"value":598}," (retrieved_docs.index(expected_docs[",{"type":32,"tag":311,"props":600,"children":601},{"style":447},[602],{"type":37,"value":603},"0",{"type":32,"tag":311,"props":605,"children":606},{"style":328},[607],{"type":37,"value":541},{"type":32,"tag":311,"props":609,"children":610},{"style":483},[611],{"type":37,"value":612},"+",{"type":32,"tag":311,"props":614,"children":615},{"style":447},[616],{"type":37,"value":588},{"type":32,"tag":311,"props":618,"children":619},{"style":328},[620],{"type":37,"value":621},") ",{"type":32,"tag":311,"props":623,"children":624},{"style":483},[625],{"type":37,"value":626},"if",{"type":32,"tag":311,"props":628,"children":629},{"style":328},[630],{"type":37,"value":631}," expected_docs[",{"type":32,"tag":311,"props":633,"children":634},{"style":447},[635],{"type":37,"value":603},{"type":32,"tag":311,"props":637,"children":638},{"style":328},[639],{"type":37,"value":640},"] ",{"type":32,"tag":311,"props":642,"children":643},{"style":483},[644],{"type":37,"value":645},"in",{"type":32,"tag":311,"props":647,"children":648},{"style":328},[649],{"type":37,"value":650}," retrieved_docs ",{"type":32,"tag":311,"props":652,"children":653},{"style":483},[654],{"type":37,"value":655},"else",{"type":32,"tag":311,"props":657,"children":658},{"style":447},[659],{"type":37,"value":660}," 0\n",{"type":32,"tag":311,"props":662,"children":664},{"class":313,"line":663},15,[665,670,675,680,685,690],{"type":32,"tag":311,"props":666,"children":667},{"style":483},[668],{"type":37,"value":669},"    
return",{"type":32,"tag":311,"props":671,"children":672},{"style":328},[673],{"type":37,"value":674}," {",{"type":32,"tag":311,"props":676,"children":677},{"style":347},[678],{"type":37,"value":679},"\"recall@5\"",{"type":32,"tag":311,"props":681,"children":682},{"style":328},[683],{"type":37,"value":684},": recall_at_k, ",{"type":32,"tag":311,"props":686,"children":687},{"style":347},[688],{"type":37,"value":689},"\"mrr\"",{"type":32,"tag":311,"props":691,"children":692},{"style":328},[693],{"type":37,"value":694},": mrr}\n",{"type":32,"tag":311,"props":696,"children":698},{"class":313,"line":697},16,[699],{"type":32,"tag":311,"props":700,"children":701},{"emptyLinePlaceholder":464},[702],{"type":37,"value":467},{"type":32,"tag":311,"props":704,"children":706},{"class":313,"line":705},17,[707,711,716],{"type":32,"tag":311,"props":708,"children":709},{"style":483},[710],{"type":37,"value":486},{"type":32,"tag":311,"props":712,"children":713},{"style":489},[714],{"type":37,"value":715}," evaluate_generation",{"type":32,"tag":311,"props":717,"children":718},{"style":328},[719],{"type":37,"value":720},"(generated_answer, expected_contains):\n",{"type":32,"tag":311,"props":722,"children":724},{"class":313,"line":723},18,[725],{"type":32,"tag":311,"props":726,"children":727},{"style":318},[728],{"type":37,"value":729},"    # LLM-as-judge: Frage Claude: \"Enthält diese Antwort erwartete Inhalte?\"\n",{"type":32,"tag":311,"props":731,"children":733},{"class":313,"line":732},19,[734,739,743,748,753,758,763,768,773,777,782,786],{"type":32,"tag":311,"props":735,"children":736},{"style":328},[737],{"type":37,"value":738},"    prompt ",{"type":32,"tag":311,"props":740,"children":741},{"style":483},[742],{"type":37,"value":511},{"type":32,"tag":311,"props":744,"children":745},{"style":483},[746],{"type":37,"value":747}," f",{"type":32,"tag":311,"props":749,"children":750},{"style":347},[751],{"type":37,"value":752},"\"Erwartet: 
",{"type":32,"tag":311,"props":754,"children":755},{"style":447},[756],{"type":37,"value":757},"{",{"type":32,"tag":311,"props":759,"children":760},{"style":328},[761],{"type":37,"value":762},"expected_contains",{"type":32,"tag":311,"props":764,"children":765},{"style":447},[766],{"type":37,"value":767},"}\\n",{"type":32,"tag":311,"props":769,"children":770},{"style":347},[771],{"type":37,"value":772},"Generiert: ",{"type":32,"tag":311,"props":774,"children":775},{"style":447},[776],{"type":37,"value":757},{"type":32,"tag":311,"props":778,"children":779},{"style":328},[780],{"type":37,"value":781},"generated_answer",{"type":32,"tag":311,"props":783,"children":784},{"style":447},[785],{"type":37,"value":767},{"type":32,"tag":311,"props":787,"children":788},{"style":347},[789],{"type":37,"value":790},"Score 0-1:\"\n",{"type":32,"tag":311,"props":792,"children":794},{"class":313,"line":793},20,[795,800,804],{"type":32,"tag":311,"props":796,"children":797},{"style":328},[798],{"type":37,"value":799},"    score ",{"type":32,"tag":311,"props":801,"children":802},{"style":483},[803],{"type":37,"value":511},{"type":32,"tag":311,"props":805,"children":806},{"style":328},[807],{"type":37,"value":808}," claude_api(prompt)\n",{"type":32,"tag":311,"props":810,"children":812},{"class":313,"line":811},21,[813,817,822],{"type":32,"tag":311,"props":814,"children":815},{"style":483},[816],{"type":37,"value":669},{"type":32,"tag":311,"props":818,"children":819},{"style":447},[820],{"type":37,"value":821}," float",{"type":32,"tag":311,"props":823,"children":824},{"style":328},[825],{"type":37,"value":826},"(score)\n",{"type":32,"tag":33,"props":828,"children":829},{},[830,835],{"type":32,"tag":98,"props":831,"children":832},{},[833],{"type":37,"value":834},"Eval-Häufigkeit:",{"type":37,"value":836}," Nach jeder Embedding-Modell-Änderung, Chunking-Tweak. Im CI\u002FCD automatisch ausführen. 
Falls Recall@5 \u003C 0,7, Deployment blockieren.",{"type":32,"tag":33,"props":838,"children":839},{},[840],{"type":37,"value":841},"Im realen Szenario: Für einen Kunden präparierte ich ein Eval-Set mit 200 Queries. Die Eval-Pipeline lief bei jedem Commit. Eine Chunking-Änderung steigerte Recall@5 von 0,68 auf 0,81, aber die P95-Latenz sprang von 340ms auf 520ms. Da der Cost\u002FLatenz-Tradeoff auf dem Dashboard sichtbar war, rollten wir das Chunking zurück und testeten einen anderen Ansatz. Ohne Eval hätten wir diese Regression nicht bemerkt.",{"type":32,"tag":40,"props":843,"children":845},{"id":844},"hybrid-search-sparse-dense-retrieval-kombinieren",[846],{"type":37,"value":847},"Hybrid Search: Sparse + Dense Retrieval kombinieren",{"type":32,"tag":33,"props":849,"children":850},{},[851,853,858],{"type":37,"value":852},"Nur auf Vector-Ähnlichkeit zu setzen schlägt bei Edge Cases fehl. Exact-Keyword-Matches (Produktcode, API-Endpoint-Name) bekommen oft niedrige Vector-Scores. Hier kommt ",{"type":32,"tag":98,"props":854,"children":855},{},[856],{"type":37,"value":857},"Hybrid Search",{"type":37,"value":859}," ins Spiel: Kombiniere BM25- (Sparse) und Embedding-Scores (Dense).",{"type":32,"tag":301,"props":861,"children":863},{"className":303,"code":862,"language":305,"meta":16,"style":16},"# Hybrid-Retrieval-Beispiel\nbm25_results = bm25_index.search(query, top_k=20)\nvector_results = vector_db.search(query_embedding, top_k=20)\n\n# RRF (Reciprocal Rank Fusion)\ndef rrf_score(rank, k=60):\n    return 1 \u002F (k + rank)\n\ncombined_scores = {}\nfor rank, doc in enumerate(bm25_results):\n    combined_scores[doc.id] = combined_scores.get(doc.id, 0) + rrf_score(rank)\nfor rank, doc in enumerate(vector_results):\n    combined_scores[doc.id] = combined_scores.get(doc.id, 0) + rrf_score(rank)\n\nfinal_results = sorted(combined_scores.items(), key=lambda x: x[1], 
reverse=True)[:5]\n",[864],{"type":32,"tag":57,"props":865,"children":866},{"__ignoreMap":16},[867,875,912,945,952,960,991,1020,1027,1044,1071,1105,1129,1160,1167],{"type":32,"tag":311,"props":868,"children":869},{"class":313,"line":314},[870],{"type":32,"tag":311,"props":871,"children":872},{"style":318},[873],{"type":37,"value":874},"# Hybrid-Retrieval-Beispiel\n",{"type":32,"tag":311,"props":876,"children":877},{"class":313,"line":324},[878,883,887,892,898,902,907],{"type":32,"tag":311,"props":879,"children":880},{"style":328},[881],{"type":37,"value":882},"bm25_results ",{"type":32,"tag":311,"props":884,"children":885},{"style":483},[886],{"type":37,"value":511},{"type":32,"tag":311,"props":888,"children":889},{"style":328},[890],{"type":37,"value":891}," bm25_index.search(query, ",{"type":32,"tag":311,"props":893,"children":895},{"style":894},"--shiki-default:#FFAB70",[896],{"type":37,"value":897},"top_k",{"type":32,"tag":311,"props":899,"children":900},{"style":483},[901],{"type":37,"value":511},{"type":32,"tag":311,"props":903,"children":904},{"style":447},[905],{"type":37,"value":906},"20",{"type":32,"tag":311,"props":908,"children":909},{"style":328},[910],{"type":37,"value":911},")\n",{"type":32,"tag":311,"props":913,"children":914},{"class":313,"line":334},[915,920,924,929,933,937,941],{"type":32,"tag":311,"props":916,"children":917},{"style":328},[918],{"type":37,"value":919},"vector_results ",{"type":32,"tag":311,"props":921,"children":922},{"style":483},[923],{"type":37,"value":511},{"type":32,"tag":311,"props":925,"children":926},{"style":328},[927],{"type":37,"value":928}," vector_db.search(query_embedding, 
",{"type":32,"tag":311,"props":930,"children":931},{"style":894},[932],{"type":37,"value":897},{"type":32,"tag":311,"props":934,"children":935},{"style":483},[936],{"type":37,"value":511},{"type":32,"tag":311,"props":938,"children":939},{"style":447},[940],{"type":37,"value":906},{"type":32,"tag":311,"props":942,"children":943},{"style":328},[944],{"type":37,"value":911},{"type":32,"tag":311,"props":946,"children":947},{"class":313,"line":343},[948],{"type":32,"tag":311,"props":949,"children":950},{"emptyLinePlaceholder":464},[951],{"type":37,"value":467},{"type":32,"tag":311,"props":953,"children":954},{"class":313,"line":368},[955],{"type":32,"tag":311,"props":956,"children":957},{"style":318},[958],{"type":37,"value":959},"# RRF (Reciprocal Rank Fusion)\n",{"type":32,"tag":311,"props":961,"children":962},{"class":313,"line":402},[963,967,972,977,981,986],{"type":32,"tag":311,"props":964,"children":965},{"style":483},[966],{"type":37,"value":486},{"type":32,"tag":311,"props":968,"children":969},{"style":489},[970],{"type":37,"value":971}," rrf_score",{"type":32,"tag":311,"props":973,"children":974},{"style":328},[975],{"type":37,"value":976},"(rank, k",{"type":32,"tag":311,"props":978,"children":979},{"style":483},[980],{"type":37,"value":511},{"type":32,"tag":311,"props":982,"children":983},{"style":447},[984],{"type":37,"value":985},"60",{"type":32,"tag":311,"props":987,"children":988},{"style":328},[989],{"type":37,"value":990},"):\n",{"type":32,"tag":311,"props":992,"children":993},{"class":313,"line":434},[994,998,1002,1006,1011,1015],{"type":32,"tag":311,"props":995,"children":996},{"style":483},[997],{"type":37,"value":669},{"type":32,"tag":311,"props":999,"children":1000},{"style":447},[1001],{"type":37,"value":588},{"type":32,"tag":311,"props":1003,"children":1004},{"style":483},[1005],{"type":37,"value":593},{"type":32,"tag":311,"props":1007,"children":1008},{"style":328},[1009],{"type":37,"value":1010}," (k 
",{"type":32,"tag":311,"props":1012,"children":1013},{"style":483},[1014],{"type":37,"value":612},{"type":32,"tag":311,"props":1016,"children":1017},{"style":328},[1018],{"type":37,"value":1019}," rank)\n",{"type":32,"tag":311,"props":1021,"children":1022},{"class":313,"line":443},[1023],{"type":32,"tag":311,"props":1024,"children":1025},{"emptyLinePlaceholder":464},[1026],{"type":37,"value":467},{"type":32,"tag":311,"props":1028,"children":1029},{"class":313,"line":26},[1030,1035,1039],{"type":32,"tag":311,"props":1031,"children":1032},{"style":328},[1033],{"type":37,"value":1034},"combined_scores ",{"type":32,"tag":311,"props":1036,"children":1037},{"style":483},[1038],{"type":37,"value":511},{"type":32,"tag":311,"props":1040,"children":1041},{"style":328},[1042],{"type":37,"value":1043}," {}\n",{"type":32,"tag":311,"props":1045,"children":1046},{"class":313,"line":460},[1047,1052,1057,1061,1066],{"type":32,"tag":311,"props":1048,"children":1049},{"style":483},[1050],{"type":37,"value":1051},"for",{"type":32,"tag":311,"props":1053,"children":1054},{"style":328},[1055],{"type":37,"value":1056}," rank, doc ",{"type":32,"tag":311,"props":1058,"children":1059},{"style":483},[1060],{"type":37,"value":645},{"type":32,"tag":311,"props":1062,"children":1063},{"style":447},[1064],{"type":37,"value":1065}," enumerate",{"type":32,"tag":311,"props":1067,"children":1068},{"style":328},[1069],{"type":37,"value":1070},"(bm25_results):\n",{"type":32,"tag":311,"props":1072,"children":1073},{"class":313,"line":470},[1074,1079,1083,1088,1092,1096,1100],{"type":32,"tag":311,"props":1075,"children":1076},{"style":328},[1077],{"type":37,"value":1078},"    combined_scores[doc.id] ",{"type":32,"tag":311,"props":1080,"children":1081},{"style":483},[1082],{"type":37,"value":511},{"type":32,"tag":311,"props":1084,"children":1085},{"style":328},[1086],{"type":37,"value":1087}," combined_scores.get(doc.id, 
",{"type":32,"tag":311,"props":1089,"children":1090},{"style":447},[1091],{"type":37,"value":603},{"type":32,"tag":311,"props":1093,"children":1094},{"style":328},[1095],{"type":37,"value":621},{"type":32,"tag":311,"props":1097,"children":1098},{"style":483},[1099],{"type":37,"value":612},{"type":32,"tag":311,"props":1101,"children":1102},{"style":328},[1103],{"type":37,"value":1104}," rrf_score(rank)\n",{"type":32,"tag":311,"props":1106,"children":1107},{"class":313,"line":479},[1108,1112,1116,1120,1124],{"type":32,"tag":311,"props":1109,"children":1110},{"style":483},[1111],{"type":37,"value":1051},{"type":32,"tag":311,"props":1113,"children":1114},{"style":328},[1115],{"type":37,"value":1056},{"type":32,"tag":311,"props":1117,"children":1118},{"style":483},[1119],{"type":37,"value":645},{"type":32,"tag":311,"props":1121,"children":1122},{"style":447},[1123],{"type":37,"value":1065},{"type":32,"tag":311,"props":1125,"children":1126},{"style":328},[1127],{"type":37,"value":1128},"(vector_results):\n",{"type":32,"tag":311,"props":1130,"children":1131},{"class":313,"line":500},[1132,1136,1140,1144,1148,1152,1156],{"type":32,"tag":311,"props":1133,"children":1134},{"style":328},[1135],{"type":37,"value":1078},{"type":32,"tag":311,"props":1137,"children":1138},{"style":483},[1139],{"type":37,"value":511},{"type":32,"tag":311,"props":1141,"children":1142},{"style":328},[1143],{"type":37,"value":1087},{"type":32,"tag":311,"props":1145,"children":1146},{"style":447},[1147],{"type":37,"value":603},{"type":32,"tag":311,"props":1149,"children":1150},{"style":328},[1151],{"type":37,"value":621},{"type":32,"tag":311,"props":1153,"children":1154},{"style":483},[1155],{"type":37,"value":612},{"type":32,"tag":311,"props":1157,"children":1158},{"style":328},[1159],{"type":37,"value":1104},{"type":32,"tag":311,"props":1161,"children":1162},{"class":313,"line":573},[1163],{"type":32,"tag":311,"props":1164,"children":1165},{"emptyLinePlaceholder":464},[1166],{"type":37,"value":467},{
"type":32,"tag":311,"props":1168,"children":1169},{"class":313,"line":663},[1170,1175,1179,1184,1189,1194,1199,1204,1209,1214,1219,1223,1228,1233,1237],{"type":32,"tag":311,"props":1171,"children":1172},{"style":328},[1173],{"type":37,"value":1174},"final_results ",{"type":32,"tag":311,"props":1176,"children":1177},{"style":483},[1178],{"type":37,"value":511},{"type":32,"tag":311,"props":1180,"children":1181},{"style":447},[1182],{"type":37,"value":1183}," sorted",{"type":32,"tag":311,"props":1185,"children":1186},{"style":328},[1187],{"type":37,"value":1188},"(combined_scores.items(), ",{"type":32,"tag":311,"props":1190,"children":1191},{"style":894},[1192],{"type":37,"value":1193},"key",{"type":32,"tag":311,"props":1195,"children":1196},{"style":483},[1197],{"type":37,"value":1198},"=lambda",{"type":32,"tag":311,"props":1200,"children":1201},{"style":328},[1202],{"type":37,"value":1203}," x: x[",{"type":32,"tag":311,"props":1205,"children":1206},{"style":447},[1207],{"type":37,"value":1208},"1",{"type":32,"tag":311,"props":1210,"children":1211},{"style":328},[1212],{"type":37,"value":1213},"], ",{"type":32,"tag":311,"props":1215,"children":1216},{"style":894},[1217],{"type":37,"value":1218},"reverse",{"type":32,"tag":311,"props":1220,"children":1221},{"style":483},[1222],{"type":37,"value":511},{"type":32,"tag":311,"props":1224,"children":1225},{"style":447},[1226],{"type":37,"value":1227},"True",{"type":32,"tag":311,"props":1229,"children":1230},{"style":328},[1231],{"type":37,"value":1232},")[:",{"type":32,"tag":311,"props":1234,"children":1235},{"style":447},[1236],{"type":37,"value":536},{"type":32,"tag":311,"props":1238,"children":1239},{"style":328},[1240],{"type":37,"value":431},{"type":32,"tag":33,"props":1242,"children":1243},{},[1244],{"type":37,"value":1245},"Test-Ergebnis: Hybrid Search steigerte Recall@5 bei technischen Queries um 22%. Aber Latenz verdoppelte sich, weil zwei separate Indizes abgefragt werden. 
Falls dieser Tradeoff akzeptabel ist (z.B. internes Tool, \u003C500ms ausreichend), funktioniert Hybrid Search in Production.",{"type":32,"tag":40,"props":1247,"children":1249},{"id":1248},"reranking-zweite-filterbühne",[1250],{"type":37,"value":1251},"Reranking: Zweite Filterbühne",{"type":32,"tag":33,"props":1253,"children":1254},{},[1255,1257,1262],{"type":37,"value":1256},"Die erste Retrieval-Phase (BM25 + Vector) holt 20-50 Dokumente. Aber nicht alle passen in den LLM-Context (Cost + Token-Limit). Ein ",{"type":32,"tag":98,"props":1258,"children":1259},{},[1260],{"type":37,"value":1261},"Reranker-Modell",{"type":37,"value":1263}," kommt ins Spiel: Es bewertet die Relevanz jedes Dokuments zur Query neu und wählt Top-5.",{"type":32,"tag":33,"props":1265,"children":1266},{},[1267,1269,1275,1277,1283],{"type":37,"value":1268},"Modelle wie Cohere ",{"type":32,"tag":57,"props":1270,"children":1272},{"className":1271},[],[1273],{"type":37,"value":1274},"rerank-english-v2.0",{"type":37,"value":1276}," oder ",{"type":32,"tag":57,"props":1278,"children":1280},{"className":1279},[],[1281],{"type":37,"value":1282},"bge-reranker-large",{"type":37,"value":1284}," werden genutzt. 
Rerankers use a cross-encoder architecture: they encode query and document together, which makes them more expensive than embeddings but more accurate.",{"type":32,"tag":33,"props":1286,"children":1287},{},[1288],{"type":37,"value":1289},"Benchmark: reranking over 50 documents:",{"type":32,"tag":156,"props":1291,"children":1292},{},[1293,1298,1303],{"type":32,"tag":108,"props":1294,"children":1295},{},[1296],{"type":37,"value":1297},"Recall@5: 0.73 → 0.89",{"type":32,"tag":108,"props":1299,"children":1300},{},[1301],{"type":37,"value":1302},"Latency: +180ms (acceptable)",{"type":32,"tag":108,"props":1304,"children":1305},{},[1306],{"type":37,"value":1307},"Cost: +$0.002 per retrieval (Cohere API)",{"type":32,"tag":33,"props":1309,"children":1310},{},[1311],{"type":37,"value":1312},"If the budget is tight, use a self-hosted reranker; that requires GPU inference, though, so you are weighing infrastructure cost against API cost.",{"type":32,"tag":40,"props":1314,"children":1316},{"id":1315},"context-fenster-optimieren-weniger-dokumente-bessere-antworten",[1317],{"type":37,"value":1318},"Optimizing the Context Window: Fewer Documents, Better Answers",{"type":32,"tag":33,"props":1320,"children":1321},{},[1322],{"type":37,"value":1323},"Feeding 20 documents to the LLM does not always produce better answers. A large context leads to \"Lost in the Middle\": the model ignores information in the middle of the prompt. Test result: GPT-4 Turbo produces better answers with 5 documents than with 15 (an 11% difference in BLEU score).",{"type":32,"tag":33,"props":1325,"children":1326},{},[1327],{"type":32,"tag":98,"props":1328,"children":1329},{},[1330],{"type":37,"value":1331},"Optimization strategy:",{"type":32,"tag":104,"props":1333,"children":1334},{},[1335,1340,1345],{"type":32,"tag":108,"props":1336,"children":1337},{},[1338],{"type":37,"value":1339},"Use the reranker to select the top 5",{"type":32,"tag":108,"props":1341,"children":1342},{},[1343],{"type":37,"value":1344},"Filter out documents with a relevance score \u003C 0.6",{"type":32,"tag":108,"props":1346,"children":1347},{},[1348],{"type":37,"value":1349},"Send the remaining 3-5 documents to the LLM context",{"type":32,"tag":33,"props":1351,"children":1352},{},[1353],{"type":37,"value":1354},"This approach cuts token costs (70% fewer input tokens) and improves answer quality. In production you have to find the sweet spot in the cost\u002Flatency\u002Fquality triangle; the eval pipeline is what makes it visible.",{"type":32,"tag":40,"props":1356,"children":1358},{"id":1357},"production-monitoring-retrieval-drift",[1359],{"type":37,"value":1360},"Production Monitoring: Retrieval Drift",{"type":32,"tag":33,"props":1362,"children":1363},{},[1364,1366,1371],{"type":37,"value":1365},"Retrieval quality can degrade over time: new documents, a shifting query distribution. 
",{"type":32,"tag":98,"props":1367,"children":1368},{},[1369],{"type":37,"value":1370},"Retrieval-Drift",{"type":37,"value":1372}," muss mit einem Dashboard überwacht werden:",{"type":32,"tag":1374,"props":1375,"children":1376},"table",{},[1377,1401],{"type":32,"tag":1378,"props":1379,"children":1380},"thead",{},[1381],{"type":32,"tag":1382,"props":1383,"children":1384},"tr",{},[1385,1391,1396],{"type":32,"tag":1386,"props":1387,"children":1388},"th",{},[1389],{"type":37,"value":1390},"Metrik",{"type":32,"tag":1386,"props":1392,"children":1393},{},[1394],{"type":37,"value":1395},"Ziel",{"type":32,"tag":1386,"props":1397,"children":1398},{},[1399],{"type":37,"value":1400},"Alarm-Schwelle",{"type":32,"tag":1402,"props":1403,"children":1404},"tbody",{},[1405,1424,1442,1460],{"type":32,"tag":1382,"props":1406,"children":1407},{},[1408,1414,1419],{"type":32,"tag":1409,"props":1410,"children":1411},"td",{},[1412],{"type":37,"value":1413},"Recall@5 (wöchentlich Eval)",{"type":32,"tag":1409,"props":1415,"children":1416},{},[1417],{"type":37,"value":1418},"> 0,75",{"type":32,"tag":1409,"props":1420,"children":1421},{},[1422],{"type":37,"value":1423},"\u003C 0,70",{"type":32,"tag":1382,"props":1425,"children":1426},{},[1427,1432,1437],{"type":32,"tag":1409,"props":1428,"children":1429},{},[1430],{"type":37,"value":1431},"P95 Latenz",{"type":32,"tag":1409,"props":1433,"children":1434},{},[1435],{"type":37,"value":1436},"\u003C 400ms",{"type":32,"tag":1409,"props":1438,"children":1439},{},[1440],{"type":37,"value":1441},"> 600ms",{"type":32,"tag":1382,"props":1443,"children":1444},{},[1445,1450,1455],{"type":32,"tag":1409,"props":1446,"children":1447},{},[1448],{"type":37,"value":1449},"Null-Result-Queries (%)",{"type":32,"tag":1409,"props":1451,"children":1452},{},[1453],{"type":37,"value":1454},"\u003C 5%",{"type":32,"tag":1409,"props":1456,"children":1457},{},[1458],{"type":37,"value":1459},"> 
10%",{"type":32,"tag":1382,"props":1461,"children":1462},{},[1463,1468,1473],{"type":32,"tag":1409,"props":1464,"children":1465},{},[1466],{"type":37,"value":1467},"Durchschn. Relevance-Score",{"type":32,"tag":1409,"props":1469,"children":1470},{},[1471],{"type":37,"value":1472},"> 0,65",{"type":32,"tag":1409,"props":1474,"children":1475},{},[1476],{"type":37,"value":1477},"\u003C 0,55",{"type":32,"tag":33,"props":1479,"children":1480},{},[1481],{"type":37,"value":1482},"Falls Recall-Drift auftritt:",{"type":32,"tag":104,"props":1484,"children":1485},{},[1486,1491,1496],{"type":32,"tag":108,"props":1487,"children":1488},{},[1489],{"type":37,"value":1490},"Eval-Set aktualisieren (neue Query-Patterns hinzufügen)",{"type":32,"tag":108,"props":1492,"children":1493},{},[1494],{"type":37,"value":1495},"Embedding-Modell fine-tunen oder ersetzen",{"type":32,"tag":108,"props":1497,"children":1498},{},[1499],{"type":37,"value":1500},"Chunking-Strategie überprüfen",{"type":32,"tag":33,"props":1502,"children":1503},{},[1504,1506,1513],{"type":37,"value":1505},"Dieses Monitoring fällt unter ",{"type":32,"tag":129,"props":1507,"children":1510},{"href":1508,"rel":1509},"https:\u002F\u002Fwww.roibase.com.tr\u002Fde\u002Ffirstparty",[133],[1511],{"type":37,"value":1512},"First-Party-Daten & Measurement-Architektur",{"type":37,"value":1514}," — RAG ist auch eine Data-Pipeline und muss observable sein.",{"type":32,"tag":40,"props":1516,"children":1518},{"id":1517},"cost-vs-quality-tradeoff-pragmatische-entscheidungen",[1519],{"type":37,"value":1520},"Cost vs Quality Tradeoff: Pragmatische Entscheidungen",{"type":32,"tag":33,"props":1522,"children":1523},{},[1524],{"type":37,"value":1525},"Jede Production-RAG-Entscheidung enthält einen Cost\u002FQuality\u002FLatenz-Tradeoff. 
A few pragmatic choices:",{"type":32,"tag":156,"props":1527,"children":1528},{},[1529,1539,1549,1559],{"type":32,"tag":108,"props":1530,"children":1531},{},[1532,1537],{"type":32,"tag":98,"props":1533,"children":1534},{},[1535],{"type":37,"value":1536},"Embedding model:",{"type":37,"value":1538}," replace OpenAI 3-large with Cohere v3 → 30% cost savings, 2% quality loss (acceptable)",{"type":32,"tag":108,"props":1540,"children":1541},{},[1542,1547],{"type":32,"tag":98,"props":1543,"children":1544},{},[1545],{"type":37,"value":1546},"Reranking:",{"type":37,"value":1548}," rerank only ambiguous queries instead of every query → 40% lower latency",{"type":32,"tag":108,"props":1550,"children":1551},{},[1552,1557],{"type":32,"tag":98,"props":1553,"children":1554},{},[1555],{"type":37,"value":1556},"Hybrid search:",{"type":37,"value":1558}," vector-only instead of BM25 + vector (if exact matches do not matter) → 50% lower latency",{"type":32,"tag":108,"props":1560,"children":1561},{},[1562,1567],{"type":32,"tag":98,"props":1563,"children":1564},{},[1565],{"type":37,"value":1566},"Context window:",{"type":37,"value":1568}," 5 documents instead of 10 → 60% lower token cost, 8% quality gain",{"type":32,"tag":33,"props":1570,"children":1571},{},[1572],{"type":37,"value":1573},"Without an eval pipeline you never see these tradeoffs. You end up saying \"we switched the embedding model, it is cheaper\" without noticing that retrieval quality dropped by 15%.",{"type":32,"tag":33,"props":1575,"children":1576},{},[1577],{"type":37,"value":1578},"Before moving your RAG system to production, take the embedding model, chunking strategy, and eval setup seriously. Cost optimization comes later: stabilize retrieval quality first, then cut costs. Otherwise the unreliability becomes visible and adoption drops.",{"type":32,"tag":1580,"props":1581,"children":1582},"style",{},[1583],{"type":37,"value":1584},"html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}",{"title":16,"searchDepth":334,"depth":334,"links":1586},[1587,1588,1591,1592,1593,1594,1595,1596],{"id":42,"depth":324,"text":45},{"id":141,"depth":324,"text":144,"children":1589},[1590],{"id":272,"depth":334,"text":275},{"id":283,"depth":324,"text":286},{"id":844,"depth":324,"text":847},{"id":1248,"depth":324,"text":1251},{"id":1315,"depth":324,"text":1318},{"id":1357,"depth":324,"text":1360},{"id":1517,"depth":324,"text":1520},"markdown","content:de:ai:rag-production-retrieval-kalitesi.md","content","de\u002Fai\u002Frag-production-retrieval-kalitesi.md","de\u002Fai\u002Frag-production-retrieval-kalitesi","md",1778709808334]