[{"data":1,"prerenderedAt":1603},["ShallowReactive",2],{"article-alternates":3,"article-\u002Fru\u002Fai\u002Fproduction-rag-retrieval-quality-over-cost":13},{"i18nKey":4,"paths":5},"ai-003-2026-05",{"de":6,"en":7,"es":8,"fr":9,"it":10,"ru":11,"tr":12},"\u002Fde\u002Fai\u002Frag-production-retrieval-kalitesi","\u002Fen\u002Fai\u002Frag-retrieval-quality-over-cost","\u002Fes\u002Fai\u002Frag-en-produccion-calidad-de-recuperacion-antes-que-costo","\u002Ffr\u002Fai\u002Frag-production-retrieval-kalitesi","\u002Fit\u002Fai\u002Frag-production-retrieval-kalitesi-once-gelir","\u002Fru\u002Fai\u002Fproduction-rag-retrieval-quality-over-cost","\u002Ftr\u002Fai\u002Fproductionda-rag-retrieval-kalitesi-costtan-once-gelir",{"_path":11,"_dir":14,"_draft":15,"_partial":15,"_locale":16,"title":17,"description":18,"publishedAt":19,"modifiedAt":19,"category":14,"i18nKey":4,"tags":20,"readingTime":26,"author":27,"body":28,"_type":1597,"_id":1598,"_source":1599,"_file":1600,"_stem":1601,"_extension":1602},"ai",false,"","Production RAG: Retrieval Quality Comes Before Cost","Wrong embedding model, chunking strategy, and eval setup choice makes RAG either expensive or slow—or both. What to focus on in production?","2026-05-11",[21,22,23,24,25],"rag","embedding","chunking","llm-eval","retrieval-quality",8,"Roibase",{"type":29,"children":30,"toc":1585},"root",[31,39,46,51,73,94,103,123,139,145,150,155,174,218,223,256,269,276,281,287,292,300,827,837,842,848,860,1241,1246,1252,1264,1285,1290,1308,1313,1319,1324,1332,1350,1355,1361,1373,1478,1483,1501,1515,1521,1526,1569,1574,1579],{"type":32,"tag":33,"props":34,"children":35},"element","p",{},[36],{"type":37,"value":38},"text","RAG systems have become widespread in production since 2024. Companies are building embedding + vector DB stacks to feed their own document corpus to LLMs. But most pilot projects hit the same wall: retrieval quality is low, answers are inconsistent, costs spiral. 
The problem usually comes down to hasty decisions on embedding model choice, chunking strategy, and eval setup. This post shows which decisions have no undo button before shipping your RAG pipeline to production.",{"type":32,"tag":40,"props":41,"children":43},"h2",{"id":42},"embedding-model-alignment-over-dimensionality",[44],{"type":37,"value":45},"Embedding Model: Alignment Over Dimensionality",{"type":32,"tag":33,"props":47,"children":48},{},[49],{"type":37,"value":50},"The first instinct when choosing an embedding model is \"which one has the highest MTEB score?\" But benchmark rankings don't guarantee production performance. What matters is how well the model aligns with your document type and query pattern.",{"type":32,"tag":33,"props":52,"children":53},{},[54,56,63,65,71],{"type":37,"value":55},"When we compared OpenAI's ",{"type":32,"tag":57,"props":58,"children":60},"code",{"className":59},[],[61],{"type":37,"value":62},"text-embedding-3-large",{"type":37,"value":64}," (3072 dim) against Cohere's ",{"type":32,"tag":57,"props":66,"children":68},{"className":67},[],[69],{"type":37,"value":70},"embed-v3",{"type":37,"value":72}," (1024 dim), Cohere delivered more consistent recall@10 on marketing documents (blogs, case studies, landing pages) because its training set was heavy on business content. The larger OpenAI model scored well on general benchmarks, but that didn't carry over to our domain-specific query distribution.",{"type":32,"tag":33,"props":74,"children":75},{},[76,78,84,86,92],{"type":37,"value":77},"Another example: ",{"type":32,"tag":57,"props":79,"children":81},{"className":80},[],[82],{"type":37,"value":83},"bge-large-en-v1.5",{"type":37,"value":85}," (1024 dim, self-hosted) is sufficient for legal documents. But on a multilingual corpus, ",{"type":32,"tag":57,"props":87,"children":89},{"className":88},[],[90],{"type":37,"value":91},"multilingual-e5-large",{"type":37,"value":93}," (1024 dim) clearly outperforms. 
Model size isn't always a quality signal—training data overlap with your domain is more critical.",{"type":32,"tag":33,"props":95,"children":96},{},[97],{"type":32,"tag":98,"props":99,"children":100},"strong",{},[101],{"type":37,"value":102},"Selection criteria:",{"type":32,"tag":104,"props":105,"children":106},"ol",{},[107,113,118],{"type":32,"tag":108,"props":109,"children":110},"li",{},[111],{"type":37,"value":112},"Not the MTEB score—recall@5 \u002F MRR on your own eval set",{"type":32,"tag":108,"props":114,"children":115},{},[116],{"type":37,"value":117},"Latency (self-hosted vs API)—batch embedding time for 512 documents",{"type":32,"tag":108,"props":119,"children":120},{},[121],{"type":37,"value":122},"Cost per 1M tokens—OpenAI 3-large $0.13, Cohere v3 $0.10, self-hosted $0 per token but you pay for infrastructure",{"type":32,"tag":33,"props":124,"children":125},{},[126,128,137],{"type":37,"value":127},"If your document set has domain-specific jargon (pharma, finance, legal), fine-tuning an embedding model (e.g., with sentence-transformers) on your own data lifts retrieval quality by 15–20%. This falls under ",{"type":32,"tag":129,"props":130,"children":134},"a",{"href":131,"rel":132},"https:\u002F\u002Fwww.roibase.com.tr\u002Fru\u002Fverianalizi",[133],"nofollow",[135],{"type":37,"value":136},"data analytics & insight engineering",{"type":37,"value":138},"—you need a training pipeline and data quality monitoring.",{"type":32,"tag":40,"props":140,"children":142},{"id":141},"chunking-strategy-fixed-size-doesnt-scale",[143],{"type":37,"value":144},"Chunking Strategy: Fixed Size Doesn't Scale",{"type":32,"tag":33,"props":146,"children":147},{},[148],{"type":37,"value":149},"Most RAG implementations start with a \"512-token overlapping window\" as the default. 
This barely works for markdown blogs but breaks immediately on a mixed-format corpus (PDF, HTML, JSON).",{"type":32,"tag":33,"props":151,"children":152},{},[153],{"type":37,"value":154},"Problems with fixed-size chunking:",{"type":32,"tag":156,"props":157,"children":158},"ul",{},[159,164,169],{"type":32,"tag":108,"props":160,"children":161},{},[162],{"type":37,"value":163},"Headers get split, semantic integrity lost",{"type":32,"tag":108,"props":165,"children":166},{},[167],{"type":37,"value":168},"Tables, code blocks cut in half",{"type":32,"tag":108,"props":170,"children":171},{},[172],{"type":37,"value":173},"Overlap duplicates context across chunks, increasing retrieval noise",{"type":32,"tag":33,"props":175,"children":176},{},[177,179,184,186,192,194,200,202,208,210,216],{"type":37,"value":178},"Alternative: ",{"type":32,"tag":98,"props":180,"children":181},{},[182],{"type":37,"value":183},"semantic chunking",{"type":37,"value":185},". Split on sentence boundaries and heading hierarchy to preserve semantic units. Use ",{"type":32,"tag":57,"props":187,"children":189},{"className":188},[],[190],{"type":37,"value":191},"MarkdownTextSplitter",{"type":37,"value":193}," (from ",{"type":32,"tag":57,"props":195,"children":197},{"className":196},[],[198],{"type":37,"value":199},"langchain",{"type":37,"value":201},") instead of ",{"type":32,"tag":57,"props":203,"children":205},{"className":204},[],[206],{"type":37,"value":207},"RecursiveCharacterTextSplitter",{"type":37,"value":209},". 
Parse PDFs with ",{"type":32,"tag":57,"props":211,"children":213},{"className":212},[],[214],{"type":37,"value":215},"pdfplumber",{"type":37,"value":217}," to separate tables from text and apply different strategies to each.",{"type":32,"tag":33,"props":219,"children":220},{},[221],{"type":37,"value":222},"For an e-commerce company's RAG stack, we split product documentation into three chunk types:",{"type":32,"tag":156,"props":224,"children":225},{},[226,236,246],{"type":32,"tag":108,"props":227,"children":228},{},[229,234],{"type":32,"tag":98,"props":230,"children":231},{},[232],{"type":37,"value":233},"Title + short description:",{"type":37,"value":235}," 128 tokens, lightweight for retrieval",{"type":32,"tag":108,"props":237,"children":238},{},[239,244],{"type":32,"tag":98,"props":240,"children":241},{},[242],{"type":37,"value":243},"Technical specs + table:",{"type":37,"value":245}," 256 tokens, structured data",{"type":32,"tag":108,"props":247,"children":248},{},[249,254],{"type":32,"tag":98,"props":250,"children":251},{},[252],{"type":37,"value":253},"Long-form content (blog, guide):",{"type":37,"value":255}," 512 tokens, semantic split",{"type":32,"tag":33,"props":257,"children":258},{},[259,261,267],{"type":37,"value":260},"We added metadata to each chunk (chunk_type, source_page). During retrieval, we filtered by chunk_type based on query type. For example, \"product comparison\" queries only looked at ",{"type":32,"tag":57,"props":262,"children":264},{"className":263},[],[265],{"type":37,"value":266},"technical_specs",{"type":37,"value":268}," chunks. This lifted precision@3 by 18%.",{"type":32,"tag":270,"props":271,"children":273},"h3",{"id":272},"overlap-strategy-how-much-is-enough",[274],{"type":37,"value":275},"Overlap Strategy: How Much Is Enough?",{"type":32,"tag":33,"props":277,"children":278},{},[279],{"type":37,"value":280},"Overlap is usually recommended at 10–20% but that's arbitrary. 
Our test: 50-token overlap on 512-token chunks preserves semantic continuity. 100-token overlap bumped retrieval latency 12% with no quality gain. The sweet spot varies by domain—test on your eval set.",{"type":32,"tag":40,"props":282,"children":284},{"id":283},"eval-setup-must-exist-before-production",[285],{"type":37,"value":286},"Eval Setup: Must Exist Before Production",{"type":32,"tag":33,"props":288,"children":289},{},[290],{"type":37,"value":291},"Most RAG systems pass to production on a \"looks good visually\" test. But without a structured eval setup to measure retrieval quality, the system won't be trustworthy after the first 1,000 queries.",{"type":32,"tag":33,"props":293,"children":294},{},[295],{"type":32,"tag":98,"props":296,"children":297},{},[298],{"type":37,"value":299},"Minimal eval pipeline:",{"type":32,"tag":301,"props":302,"children":306},"pre",{"className":303,"code":304,"language":305,"meta":16,"style":16},"language-python shiki shiki-themes github-dark","# eval_set.json — golden dataset\n[\n  {\n    \"query\": \"How to collect user consent in GDPR-compliant way?\",\n    \"expected_docs\": [\"doc_42\", \"doc_89\"],\n    \"expected_answer_contains\": [\"cookie notice\", \"explicit consent\"]\n  },\n  ...\n]\n\n# eval metrics\ndef evaluate_retrieval(query, retrieved_docs, expected_docs):\n    recall_at_k = len(set(retrieved_docs[:5]) & set(expected_docs)) \u002F len(expected_docs)\n    mrr = 1 \u002F (retrieved_docs.index(expected_docs[0]) + 1) if expected_docs[0] in retrieved_docs else 0\n    return {\"recall@5\": recall_at_k, \"mrr\": mrr}\n\ndef evaluate_generation(generated_answer, expected_contains):\n    # LLM-as-judge: ask Claude \"does this answer cover the expected content?\"\n    prompt = f\"Expected: {expected_contains}\\nGenerated: {generated_answer}\\nScore 0-1:\"\n    score = claude_api(prompt)\n    return 
float(score)\n","python",[307],{"type":32,"tag":57,"props":308,"children":309},{"__ignoreMap":16},[310,322,332,341,366,400,432,441,450,458,468,477,498,571,661,695,703,721,730,791,809],{"type":32,"tag":311,"props":312,"children":315},"span",{"class":313,"line":314},"line",1,[316],{"type":32,"tag":311,"props":317,"children":319},{"style":318},"--shiki-default:#6A737D",[320],{"type":37,"value":321},"# eval_set.json — golden dataset\n",{"type":32,"tag":311,"props":323,"children":325},{"class":313,"line":324},2,[326],{"type":32,"tag":311,"props":327,"children":329},{"style":328},"--shiki-default:#E1E4E8",[330],{"type":37,"value":331},"[\n",{"type":32,"tag":311,"props":333,"children":335},{"class":313,"line":334},3,[336],{"type":32,"tag":311,"props":337,"children":338},{"style":328},[339],{"type":37,"value":340},"  {\n",{"type":32,"tag":311,"props":342,"children":344},{"class":313,"line":343},4,[345,351,356,361],{"type":32,"tag":311,"props":346,"children":348},{"style":347},"--shiki-default:#9ECBFF",[349],{"type":37,"value":350},"    \"query\"",{"type":32,"tag":311,"props":352,"children":353},{"style":328},[354],{"type":37,"value":355},": ",{"type":32,"tag":311,"props":357,"children":358},{"style":347},[359],{"type":37,"value":360},"\"How to collect user consent in GDPR-compliant way?\"",{"type":32,"tag":311,"props":362,"children":363},{"style":328},[364],{"type":37,"value":365},",\n",{"type":32,"tag":311,"props":367,"children":369},{"class":313,"line":368},5,[370,375,380,385,390,395],{"type":32,"tag":311,"props":371,"children":372},{"style":347},[373],{"type":37,"value":374},"    \"expected_docs\"",{"type":32,"tag":311,"props":376,"children":377},{"style":328},[378],{"type":37,"value":379},": [",{"type":32,"tag":311,"props":381,"children":382},{"style":347},[383],{"type":37,"value":384},"\"doc_42\"",{"type":32,"tag":311,"props":386,"children":387},{"style":328},[388],{"type":37,"value":389},", 
",{"type":32,"tag":311,"props":391,"children":392},{"style":347},[393],{"type":37,"value":394},"\"doc_89\"",{"type":32,"tag":311,"props":396,"children":397},{"style":328},[398],{"type":37,"value":399},"],\n",{"type":32,"tag":311,"props":401,"children":403},{"class":313,"line":402},6,[404,409,413,418,422,427],{"type":32,"tag":311,"props":405,"children":406},{"style":347},[407],{"type":37,"value":408},"    \"expected_answer_contains\"",{"type":32,"tag":311,"props":410,"children":411},{"style":328},[412],{"type":37,"value":379},{"type":32,"tag":311,"props":414,"children":415},{"style":347},[416],{"type":37,"value":417},"\"cookie notice\"",{"type":32,"tag":311,"props":419,"children":420},{"style":328},[421],{"type":37,"value":389},{"type":32,"tag":311,"props":423,"children":424},{"style":347},[425],{"type":37,"value":426},"\"explicit consent\"",{"type":32,"tag":311,"props":428,"children":429},{"style":328},[430],{"type":37,"value":431},"]\n",{"type":32,"tag":311,"props":433,"children":435},{"class":313,"line":434},7,[436],{"type":32,"tag":311,"props":437,"children":438},{"style":328},[439],{"type":37,"value":440},"  },\n",{"type":32,"tag":311,"props":442,"children":443},{"class":313,"line":26},[444],{"type":32,"tag":311,"props":445,"children":447},{"style":446},"--shiki-default:#79B8FF",[448],{"type":37,"value":449},"  ...\n",{"type":32,"tag":311,"props":451,"children":453},{"class":313,"line":452},9,[454],{"type":32,"tag":311,"props":455,"children":456},{"style":328},[457],{"type":37,"value":431},{"type":32,"tag":311,"props":459,"children":461},{"class":313,"line":460},10,[462],{"type":32,"tag":311,"props":463,"children":465},{"emptyLinePlaceholder":464},true,[466],{"type":37,"value":467},"\n",{"type":32,"tag":311,"props":469,"children":471},{"class":313,"line":470},11,[472],{"type":32,"tag":311,"props":473,"children":474},{"style":318},[475],{"type":37,"value":476},"# eval 
metrics\n",{"type":32,"tag":311,"props":478,"children":480},{"class":313,"line":479},12,[481,487,493],{"type":32,"tag":311,"props":482,"children":484},{"style":483},"--shiki-default:#F97583",[485],{"type":37,"value":486},"def",{"type":32,"tag":311,"props":488,"children":490},{"style":489},"--shiki-default:#B392F0",[491],{"type":37,"value":492}," evaluate_retrieval",{"type":32,"tag":311,"props":494,"children":495},{"style":328},[496],{"type":37,"value":497},"(query, retrieved_docs, expected_docs):\n",{"type":32,"tag":311,"props":499,"children":501},{"class":313,"line":500},13,[502,507,512,517,522,527,532,537,542,547,552,557,562,566],{"type":32,"tag":311,"props":503,"children":504},{"style":328},[505],{"type":37,"value":506},"    recall_at_k ",{"type":32,"tag":311,"props":508,"children":509},{"style":483},[510],{"type":37,"value":511},"=",{"type":32,"tag":311,"props":513,"children":514},{"style":446},[515],{"type":37,"value":516}," len",{"type":32,"tag":311,"props":518,"children":519},{"style":328},[520],{"type":37,"value":521},"(",{"type":32,"tag":311,"props":523,"children":524},{"style":446},[525],{"type":37,"value":526},"set",{"type":32,"tag":311,"props":528,"children":529},{"style":328},[530],{"type":37,"value":531},"(retrieved_docs[:",{"type":32,"tag":311,"props":533,"children":534},{"style":446},[535],{"type":37,"value":536},"5",{"type":32,"tag":311,"props":538,"children":539},{"style":328},[540],{"type":37,"value":541},"]) ",{"type":32,"tag":311,"props":543,"children":544},{"style":483},[545],{"type":37,"value":546},"&",{"type":32,"tag":311,"props":548,"children":549},{"style":446},[550],{"type":37,"value":551}," set",{"type":32,"tag":311,"props":553,"children":554},{"style":328},[555],{"type":37,"value":556},"(expected_docs)) 
",{"type":32,"tag":311,"props":558,"children":559},{"style":483},[560],{"type":37,"value":561},"\u002F",{"type":32,"tag":311,"props":563,"children":564},{"style":446},[565],{"type":37,"value":516},{"type":32,"tag":311,"props":567,"children":568},{"style":328},[569],{"type":37,"value":570},"(expected_docs)\n",{"type":32,"tag":311,"props":572,"children":574},{"class":313,"line":573},14,[575,580,584,589,594,599,604,608,613,617,622,627,632,636,641,646,651,656],{"type":32,"tag":311,"props":576,"children":577},{"style":328},[578],{"type":37,"value":579},"    mrr ",{"type":32,"tag":311,"props":581,"children":582},{"style":483},[583],{"type":37,"value":511},{"type":32,"tag":311,"props":585,"children":586},{"style":446},[587],{"type":37,"value":588}," 1",{"type":32,"tag":311,"props":590,"children":591},{"style":483},[592],{"type":37,"value":593}," \u002F",{"type":32,"tag":311,"props":595,"children":596},{"style":328},[597],{"type":37,"value":598}," (retrieved_docs.index(expected_docs[",{"type":32,"tag":311,"props":600,"children":601},{"style":446},[602],{"type":37,"value":603},"0",{"type":32,"tag":311,"props":605,"children":606},{"style":328},[607],{"type":37,"value":541},{"type":32,"tag":311,"props":609,"children":610},{"style":483},[611],{"type":37,"value":612},"+",{"type":32,"tag":311,"props":614,"children":615},{"style":446},[616],{"type":37,"value":588},{"type":32,"tag":311,"props":618,"children":619},{"style":328},[620],{"type":37,"value":621},") ",{"type":32,"tag":311,"props":623,"children":624},{"style":483},[625],{"type":37,"value":626},"if",{"type":32,"tag":311,"props":628,"children":629},{"style":328},[630],{"type":37,"value":631}," expected_docs[",{"type":32,"tag":311,"props":633,"children":634},{"style":446},[635],{"type":37,"value":603},{"type":32,"tag":311,"props":637,"children":638},{"style":328},[639],{"type":37,"value":640},"] 
",{"type":32,"tag":311,"props":642,"children":643},{"style":483},[644],{"type":37,"value":645},"in",{"type":32,"tag":311,"props":647,"children":648},{"style":328},[649],{"type":37,"value":650}," retrieved_docs ",{"type":32,"tag":311,"props":652,"children":653},{"style":483},[654],{"type":37,"value":655},"else",{"type":32,"tag":311,"props":657,"children":658},{"style":446},[659],{"type":37,"value":660}," 0\n",{"type":32,"tag":311,"props":662,"children":664},{"class":313,"line":663},15,[665,670,675,680,685,690],{"type":32,"tag":311,"props":666,"children":667},{"style":483},[668],{"type":37,"value":669},"    return",{"type":32,"tag":311,"props":671,"children":672},{"style":328},[673],{"type":37,"value":674}," {",{"type":32,"tag":311,"props":676,"children":677},{"style":347},[678],{"type":37,"value":679},"\"recall@5\"",{"type":32,"tag":311,"props":681,"children":682},{"style":328},[683],{"type":37,"value":684},": recall_at_k, ",{"type":32,"tag":311,"props":686,"children":687},{"style":347},[688],{"type":37,"value":689},"\"mrr\"",{"type":32,"tag":311,"props":691,"children":692},{"style":328},[693],{"type":37,"value":694},": mrr}\n",{"type":32,"tag":311,"props":696,"children":698},{"class":313,"line":697},16,[699],{"type":32,"tag":311,"props":700,"children":701},{"emptyLinePlaceholder":464},[702],{"type":37,"value":467},{"type":32,"tag":311,"props":704,"children":706},{"class":313,"line":705},17,[707,711,716],{"type":32,"tag":311,"props":708,"children":709},{"style":483},[710],{"type":37,"value":486},{"type":32,"tag":311,"props":712,"children":713},{"style":489},[714],{"type":37,"value":715}," evaluate_generation",{"type":32,"tag":311,"props":717,"children":718},{"style":328},[719],{"type":37,"value":720},"(generated_answer, expected_contains):\n",{"type":32,"tag":311,"props":722,"children":724},{"class":313,"line":723},18,[725],{"type":32,"tag":311,"props":726,"children":727},{"style":318},[728],{"type":37,"value":729},"    # LLM-as-judge: ask Claude \"does this answer 
cover the expected content?\"\n",{"type":32,"tag":311,"props":731,"children":733},{"class":313,"line":732},19,[734,739,743,748,753,758,763,768,773,777,782,786],{"type":32,"tag":311,"props":735,"children":736},{"style":328},[737],{"type":37,"value":738},"    prompt ",{"type":32,"tag":311,"props":740,"children":741},{"style":483},[742],{"type":37,"value":511},{"type":32,"tag":311,"props":744,"children":745},{"style":483},[746],{"type":37,"value":747}," f",{"type":32,"tag":311,"props":749,"children":750},{"style":347},[751],{"type":37,"value":752},"\"Expected: ",{"type":32,"tag":311,"props":754,"children":755},{"style":446},[756],{"type":37,"value":757},"{",{"type":32,"tag":311,"props":759,"children":760},{"style":328},[761],{"type":37,"value":762},"expected_contains",{"type":32,"tag":311,"props":764,"children":765},{"style":446},[766],{"type":37,"value":767},"}\\n",{"type":32,"tag":311,"props":769,"children":770},{"style":347},[771],{"type":37,"value":772},"Generated: ",{"type":32,"tag":311,"props":774,"children":775},{"style":446},[776],{"type":37,"value":757},{"type":32,"tag":311,"props":778,"children":779},{"style":328},[780],{"type":37,"value":781},"generated_answer",{"type":32,"tag":311,"props":783,"children":784},{"style":446},[785],{"type":37,"value":767},{"type":32,"tag":311,"props":787,"children":788},{"style":347},[789],{"type":37,"value":790},"Score 0-1:\"\n",{"type":32,"tag":311,"props":792,"children":794},{"class":313,"line":793},20,[795,800,804],{"type":32,"tag":311,"props":796,"children":797},{"style":328},[798],{"type":37,"value":799},"    score ",{"type":32,"tag":311,"props":801,"children":802},{"style":483},[803],{"type":37,"value":511},{"type":32,"tag":311,"props":805,"children":806},{"style":328},[807],{"type":37,"value":808}," 
claude_api(prompt)\n",{"type":32,"tag":311,"props":810,"children":812},{"class":313,"line":811},21,[813,817,822],{"type":32,"tag":311,"props":814,"children":815},{"style":483},[816],{"type":37,"value":669},{"type":32,"tag":311,"props":818,"children":819},{"style":446},[820],{"type":37,"value":821}," float",{"type":32,"tag":311,"props":823,"children":824},{"style":328},[825],{"type":37,"value":826},"(score)\n",{"type":32,"tag":33,"props":828,"children":829},{},[830,835],{"type":32,"tag":98,"props":831,"children":832},{},[833],{"type":37,"value":834},"Eval frequency:",{"type":37,"value":836}," After every embedding model change, every chunking strategy tweak. Run automatically in CI\u002FCD. If recall@5 drops below 0.7, block the deploy.",{"type":32,"tag":33,"props":838,"children":839},{},[840],{"type":37,"value":841},"Real scenario: we built a 200-query eval set for a customer. The eval pipeline ran automatically on every commit. One chunking change lifted recall@5 from 0.68 to 0.81 but p95 latency went from 340ms to 520ms. Without eval, this latency-quality tradeoff would have been invisible on the dashboard.",{"type":32,"tag":40,"props":843,"children":845},{"id":844},"hybrid-search-sparse-dense-retrieval-combined",[846],{"type":37,"value":847},"Hybrid Search: Sparse + Dense Retrieval Combined",{"type":32,"tag":33,"props":849,"children":850},{},[851,853,858],{"type":37,"value":852},"Relying only on vector similarity fails on edge cases. For example, queries needing exact keyword matches (product codes, API endpoint names) score low in vector search. 
This is where ",{"type":32,"tag":98,"props":854,"children":855},{},[856],{"type":37,"value":857},"hybrid search",{"type":37,"value":859}," enters: combine BM25 (sparse) + embedding (dense) scores.",{"type":32,"tag":301,"props":861,"children":863},{"className":303,"code":862,"language":305,"meta":16,"style":16},"# Hybrid retrieval example\nbm25_results = bm25_index.search(query, top_k=20)\nvector_results = vector_db.search(query_embedding, top_k=20)\n\n# RRF (Reciprocal Rank Fusion)\ndef rrf_score(rank, k=60):\n    return 1 \u002F (k + rank)\n\ncombined_scores = {}\nfor rank, doc in enumerate(bm25_results):\n    combined_scores[doc.id] = combined_scores.get(doc.id, 0) + rrf_score(rank)\nfor rank, doc in enumerate(vector_results):\n    combined_scores[doc.id] = combined_scores.get(doc.id, 0) + rrf_score(rank)\n\nfinal_results = sorted(combined_scores.items(), key=lambda x: x[1], reverse=True)[:5]\n",[864],{"type":32,"tag":57,"props":865,"children":866},{"__ignoreMap":16},[867,875,912,945,952,960,991,1020,1027,1044,1071,1105,1129,1160,1167],{"type":32,"tag":311,"props":868,"children":869},{"class":313,"line":314},[870],{"type":32,"tag":311,"props":871,"children":872},{"style":318},[873],{"type":37,"value":874},"# Hybrid retrieval example\n",{"type":32,"tag":311,"props":876,"children":877},{"class":313,"line":324},[878,883,887,892,898,902,907],{"type":32,"tag":311,"props":879,"children":880},{"style":328},[881],{"type":37,"value":882},"bm25_results ",{"type":32,"tag":311,"props":884,"children":885},{"style":483},[886],{"type":37,"value":511},{"type":32,"tag":311,"props":888,"children":889},{"style":328},[890],{"type":37,"value":891}," bm25_index.search(query, 
",{"type":32,"tag":311,"props":893,"children":895},{"style":894},"--shiki-default:#FFAB70",[896],{"type":37,"value":897},"top_k",{"type":32,"tag":311,"props":899,"children":900},{"style":483},[901],{"type":37,"value":511},{"type":32,"tag":311,"props":903,"children":904},{"style":446},[905],{"type":37,"value":906},"20",{"type":32,"tag":311,"props":908,"children":909},{"style":328},[910],{"type":37,"value":911},")\n",{"type":32,"tag":311,"props":913,"children":914},{"class":313,"line":334},[915,920,924,929,933,937,941],{"type":32,"tag":311,"props":916,"children":917},{"style":328},[918],{"type":37,"value":919},"vector_results ",{"type":32,"tag":311,"props":921,"children":922},{"style":483},[923],{"type":37,"value":511},{"type":32,"tag":311,"props":925,"children":926},{"style":328},[927],{"type":37,"value":928}," vector_db.search(query_embedding, ",{"type":32,"tag":311,"props":930,"children":931},{"style":894},[932],{"type":37,"value":897},{"type":32,"tag":311,"props":934,"children":935},{"style":483},[936],{"type":37,"value":511},{"type":32,"tag":311,"props":938,"children":939},{"style":446},[940],{"type":37,"value":906},{"type":32,"tag":311,"props":942,"children":943},{"style":328},[944],{"type":37,"value":911},{"type":32,"tag":311,"props":946,"children":947},{"class":313,"line":343},[948],{"type":32,"tag":311,"props":949,"children":950},{"emptyLinePlaceholder":464},[951],{"type":37,"value":467},{"type":32,"tag":311,"props":953,"children":954},{"class":313,"line":368},[955],{"type":32,"tag":311,"props":956,"children":957},{"style":318},[958],{"type":37,"value":959},"# RRF (Reciprocal Rank Fusion)\n",{"type":32,"tag":311,"props":961,"children":962},{"class":313,"line":402},[963,967,972,977,981,986],{"type":32,"tag":311,"props":964,"children":965},{"style":483},[966],{"type":37,"value":486},{"type":32,"tag":311,"props":968,"children":969},{"style":489},[970],{"type":37,"value":971}," 
rrf_score",{"type":32,"tag":311,"props":973,"children":974},{"style":328},[975],{"type":37,"value":976},"(rank, k",{"type":32,"tag":311,"props":978,"children":979},{"style":483},[980],{"type":37,"value":511},{"type":32,"tag":311,"props":982,"children":983},{"style":446},[984],{"type":37,"value":985},"60",{"type":32,"tag":311,"props":987,"children":988},{"style":328},[989],{"type":37,"value":990},"):\n",{"type":32,"tag":311,"props":992,"children":993},{"class":313,"line":434},[994,998,1002,1006,1011,1015],{"type":32,"tag":311,"props":995,"children":996},{"style":483},[997],{"type":37,"value":669},{"type":32,"tag":311,"props":999,"children":1000},{"style":446},[1001],{"type":37,"value":588},{"type":32,"tag":311,"props":1003,"children":1004},{"style":483},[1005],{"type":37,"value":593},{"type":32,"tag":311,"props":1007,"children":1008},{"style":328},[1009],{"type":37,"value":1010}," (k ",{"type":32,"tag":311,"props":1012,"children":1013},{"style":483},[1014],{"type":37,"value":612},{"type":32,"tag":311,"props":1016,"children":1017},{"style":328},[1018],{"type":37,"value":1019}," rank)\n",{"type":32,"tag":311,"props":1021,"children":1022},{"class":313,"line":26},[1023],{"type":32,"tag":311,"props":1024,"children":1025},{"emptyLinePlaceholder":464},[1026],{"type":37,"value":467},{"type":32,"tag":311,"props":1028,"children":1029},{"class":313,"line":452},[1030,1035,1039],{"type":32,"tag":311,"props":1031,"children":1032},{"style":328},[1033],{"type":37,"value":1034},"combined_scores ",{"type":32,"tag":311,"props":1036,"children":1037},{"style":483},[1038],{"type":37,"value":511},{"type":32,"tag":311,"props":1040,"children":1041},{"style":328},[1042],{"type":37,"value":1043}," 
{}\n",{"type":32,"tag":311,"props":1045,"children":1046},{"class":313,"line":460},[1047,1052,1057,1061,1066],{"type":32,"tag":311,"props":1048,"children":1049},{"style":483},[1050],{"type":37,"value":1051},"for",{"type":32,"tag":311,"props":1053,"children":1054},{"style":328},[1055],{"type":37,"value":1056}," rank, doc ",{"type":32,"tag":311,"props":1058,"children":1059},{"style":483},[1060],{"type":37,"value":645},{"type":32,"tag":311,"props":1062,"children":1063},{"style":446},[1064],{"type":37,"value":1065}," enumerate",{"type":32,"tag":311,"props":1067,"children":1068},{"style":328},[1069],{"type":37,"value":1070},"(bm25_results):\n",{"type":32,"tag":311,"props":1072,"children":1073},{"class":313,"line":470},[1074,1079,1083,1088,1092,1096,1100],{"type":32,"tag":311,"props":1075,"children":1076},{"style":328},[1077],{"type":37,"value":1078},"    combined_scores[doc.id] ",{"type":32,"tag":311,"props":1080,"children":1081},{"style":483},[1082],{"type":37,"value":511},{"type":32,"tag":311,"props":1084,"children":1085},{"style":328},[1086],{"type":37,"value":1087}," combined_scores.get(doc.id, ",{"type":32,"tag":311,"props":1089,"children":1090},{"style":446},[1091],{"type":37,"value":603},{"type":32,"tag":311,"props":1093,"children":1094},{"style":328},[1095],{"type":37,"value":621},{"type":32,"tag":311,"props":1097,"children":1098},{"style":483},[1099],{"type":37,"value":612},{"type":32,"tag":311,"props":1101,"children":1102},{"style":328},[1103],{"type":37,"value":1104}," 
rrf_score(rank)\n",{"type":32,"tag":311,"props":1106,"children":1107},{"class":313,"line":479},[1108,1112,1116,1120,1124],{"type":32,"tag":311,"props":1109,"children":1110},{"style":483},[1111],{"type":37,"value":1051},{"type":32,"tag":311,"props":1113,"children":1114},{"style":328},[1115],{"type":37,"value":1056},{"type":32,"tag":311,"props":1117,"children":1118},{"style":483},[1119],{"type":37,"value":645},{"type":32,"tag":311,"props":1121,"children":1122},{"style":446},[1123],{"type":37,"value":1065},{"type":32,"tag":311,"props":1125,"children":1126},{"style":328},[1127],{"type":37,"value":1128},"(vector_results):\n",{"type":32,"tag":311,"props":1130,"children":1131},{"class":313,"line":500},[1132,1136,1140,1144,1148,1152,1156],{"type":32,"tag":311,"props":1133,"children":1134},{"style":328},[1135],{"type":37,"value":1078},{"type":32,"tag":311,"props":1137,"children":1138},{"style":483},[1139],{"type":37,"value":511},{"type":32,"tag":311,"props":1141,"children":1142},{"style":328},[1143],{"type":37,"value":1087},{"type":32,"tag":311,"props":1145,"children":1146},{"style":446},[1147],{"type":37,"value":603},{"type":32,"tag":311,"props":1149,"children":1150},{"style":328},[1151],{"type":37,"value":621},{"type":32,"tag":311,"props":1153,"children":1154},{"style":483},[1155],{"type":37,"value":612},{"type":32,"tag":311,"props":1157,"children":1158},{"style":328},[1159],{"type":37,"value":1104},{"type":32,"tag":311,"props":1161,"children":1162},{"class":313,"line":573},[1163],{"type":32,"tag":311,"props":1164,"children":1165},{"emptyLinePlaceholder":464},[1166],{"type":37,"value":467},{"type":32,"tag":311,"props":1168,"children":1169},{"class":313,"line":663},[1170,1175,1179,1184,1189,1194,1199,1204,1209,1214,1219,1223,1228,1233,1237],{"type":32,"tag":311,"props":1171,"children":1172},{"style":328},[1173],{"type":37,"value":1174},"final_results 
",{"type":32,"tag":311,"props":1176,"children":1177},{"style":483},[1178],{"type":37,"value":511},{"type":32,"tag":311,"props":1180,"children":1181},{"style":446},[1182],{"type":37,"value":1183}," sorted",{"type":32,"tag":311,"props":1185,"children":1186},{"style":328},[1187],{"type":37,"value":1188},"(combined_scores.items(), ",{"type":32,"tag":311,"props":1190,"children":1191},{"style":894},[1192],{"type":37,"value":1193},"key",{"type":32,"tag":311,"props":1195,"children":1196},{"style":483},[1197],{"type":37,"value":1198},"=lambda",{"type":32,"tag":311,"props":1200,"children":1201},{"style":328},[1202],{"type":37,"value":1203}," x: x[",{"type":32,"tag":311,"props":1205,"children":1206},{"style":446},[1207],{"type":37,"value":1208},"1",{"type":32,"tag":311,"props":1210,"children":1211},{"style":328},[1212],{"type":37,"value":1213},"], ",{"type":32,"tag":311,"props":1215,"children":1216},{"style":894},[1217],{"type":37,"value":1218},"reverse",{"type":32,"tag":311,"props":1220,"children":1221},{"style":483},[1222],{"type":37,"value":511},{"type":32,"tag":311,"props":1224,"children":1225},{"style":446},[1226],{"type":37,"value":1227},"True",{"type":32,"tag":311,"props":1229,"children":1230},{"style":328},[1231],{"type":37,"value":1232},")[:",{"type":32,"tag":311,"props":1234,"children":1235},{"style":446},[1236],{"type":37,"value":536},{"type":32,"tag":311,"props":1238,"children":1239},{"style":328},[1240],{"type":37,"value":431},{"type":32,"tag":33,"props":1242,"children":1243},{},[1244],{"type":37,"value":1245},"Test result: hybrid search lifted recall@5 by 22% on technical queries. But latency doubled because you're querying two separate indexes. 
If this tradeoff is acceptable (e.g., an internal tool with a \u003C500ms latency requirement), hybrid search works in production.",{"type":32,"tag":40,"props":1247,"children":1249},{"id":1248},"reranking-second-stage-filtering",[1250],{"type":37,"value":1251},"Reranking: Second-Stage Filtering",{"type":32,"tag":33,"props":1253,"children":1254},{},[1255,1257,1262],{"type":37,"value":1256},"First-stage retrieval (BM25 + vector) returns 20–50 documents, but not all of them fit in the LLM context (cost + token limits). A ",{"type":32,"tag":98,"props":1258,"children":1259},{},[1260],{"type":37,"value":1261},"reranker model",{"type":37,"value":1263}," steps in: it rescores each document by relevance to the query and picks the top-5.",{"type":32,"tag":33,"props":1265,"children":1266},{},[1267,1269,1275,1277,1283],{"type":37,"value":1268},"Models like Cohere's ",{"type":32,"tag":57,"props":1270,"children":1272},{"className":1271},[],[1273],{"type":37,"value":1274},"rerank-english-v2.0",{"type":37,"value":1276}," or ",{"type":32,"tag":57,"props":1278,"children":1280},{"className":1279},[],[1281],{"type":37,"value":1282},"bge-reranker-large",{"type":37,"value":1284}," do this. 
Rerankers use a cross-encoder architecture: they encode the query and document together, so they're pricier than embedding-based scoring but more accurate.",{"type":32,"tag":33,"props":1286,"children":1287},{},[1288],{"type":37,"value":1289},"Benchmark: applying reranking over 50 documents:",{"type":32,"tag":156,"props":1291,"children":1292},{},[1293,1298,1303],{"type":32,"tag":108,"props":1294,"children":1295},{},[1296],{"type":37,"value":1297},"Recall@5: 0.73 → 0.89",{"type":32,"tag":108,"props":1299,"children":1300},{},[1301],{"type":37,"value":1302},"Latency: +180ms (acceptable)",{"type":32,"tag":108,"props":1304,"children":1305},{},[1306],{"type":37,"value":1307},"Cost: +$0.002 per retrieval (Cohere API)",{"type":32,"tag":33,"props":1309,"children":1310},{},[1311],{"type":37,"value":1312},"If budget is tight, use a self-hosted reranker, but you'll need GPU inference. At that point, compare self-hosted infra cost against API cost.",{"type":32,"tag":40,"props":1314,"children":1316},{"id":1315},"context-window-optimization-fewer-documents-better-answers",[1317],{"type":37,"value":1318},"Context Window Optimization: Fewer Documents, Better Answers",{"type":32,"tag":33,"props":1320,"children":1321},{},[1322],{"type":37,"value":1323},"Sending 20 documents to an LLM doesn't always produce better answers. Long context triggers the \"lost in the middle\" problem: the model overlooks information in the middle of the context. 
Test result: sending 5 documents to GPT-4 Turbo produced better answers than sending 15 (an 11% difference in BLEU score).",{"type":32,"tag":33,"props":1325,"children":1326},{},[1327],{"type":32,"tag":98,"props":1328,"children":1329},{},[1330],{"type":37,"value":1331},"Optimization strategy:",{"type":32,"tag":104,"props":1333,"children":1334},{},[1335,1340,1345],{"type":32,"tag":108,"props":1336,"children":1337},{},[1338],{"type":37,"value":1339},"Use a reranker to pick the top-5",{"type":32,"tag":108,"props":1341,"children":1342},{},[1343],{"type":37,"value":1344},"Drop documents with relevance score \u003C 0.6",{"type":32,"tag":108,"props":1346,"children":1347},{},[1348],{"type":37,"value":1349},"Send the remaining 3–5 documents to the LLM context",{"type":32,"tag":33,"props":1351,"children":1352},{},[1353],{"type":37,"value":1354},"This cuts input token cost by 70% and improves answer quality. In production, you're balancing the cost\u002Flatency\u002Fquality triangle; an eval pipeline makes this visible.",{"type":32,"tag":40,"props":1356,"children":1358},{"id":1357},"production-monitoring-retrieval-drift",[1359],{"type":37,"value":1360},"Production Monitoring: Retrieval Drift",{"type":32,"tag":33,"props":1362,"children":1363},{},[1364,1366,1371],{"type":37,"value":1365},"Retrieval quality can degrade over time: as new documents are added, the query distribution shifts. 
Set up a ",{"type":32,"tag":98,"props":1367,"children":1368},{},[1369],{"type":37,"value":1370},"retrieval drift",{"type":37,"value":1372}," dashboard:",{"type":32,"tag":1374,"props":1375,"children":1376},"table",{},[1377,1401],{"type":32,"tag":1378,"props":1379,"children":1380},"thead",{},[1381],{"type":32,"tag":1382,"props":1383,"children":1384},"tr",{},[1385,1391,1396],{"type":32,"tag":1386,"props":1387,"children":1388},"th",{},[1389],{"type":37,"value":1390},"Metric",{"type":32,"tag":1386,"props":1392,"children":1393},{},[1394],{"type":37,"value":1395},"Target",{"type":32,"tag":1386,"props":1397,"children":1398},{},[1399],{"type":37,"value":1400},"Alarm Threshold",{"type":32,"tag":1402,"props":1403,"children":1404},"tbody",{},[1405,1424,1442,1460],{"type":32,"tag":1382,"props":1406,"children":1407},{},[1408,1414,1419],{"type":32,"tag":1409,"props":1410,"children":1411},"td",{},[1412],{"type":37,"value":1413},"Recall@5 (weekly eval)",{"type":32,"tag":1409,"props":1415,"children":1416},{},[1417],{"type":37,"value":1418},"> 0.75",{"type":32,"tag":1409,"props":1420,"children":1421},{},[1422],{"type":37,"value":1423},"\u003C 0.70",{"type":32,"tag":1382,"props":1425,"children":1426},{},[1427,1432,1437],{"type":32,"tag":1409,"props":1428,"children":1429},{},[1430],{"type":37,"value":1431},"P95 latency",{"type":32,"tag":1409,"props":1433,"children":1434},{},[1435],{"type":37,"value":1436},"\u003C 400ms",{"type":32,"tag":1409,"props":1438,"children":1439},{},[1440],{"type":37,"value":1441},"> 600ms",{"type":32,"tag":1382,"props":1443,"children":1444},{},[1445,1450,1455],{"type":32,"tag":1409,"props":1446,"children":1447},{},[1448],{"type":37,"value":1449},"Zero-result queries (%)",{"type":32,"tag":1409,"props":1451,"children":1452},{},[1453],{"type":37,"value":1454},"\u003C 5%",{"type":32,"tag":1409,"props":1456,"children":1457},{},[1458],{"type":37,"value":1459},"> 
10%",{"type":32,"tag":1382,"props":1461,"children":1462},{},[1463,1468,1473],{"type":32,"tag":1409,"props":1464,"children":1465},{},[1466],{"type":37,"value":1467},"Average relevance score",{"type":32,"tag":1409,"props":1469,"children":1470},{},[1471],{"type":37,"value":1472},"> 0.65",{"type":32,"tag":1409,"props":1474,"children":1475},{},[1476],{"type":37,"value":1477},"\u003C 0.55",{"type":32,"tag":33,"props":1479,"children":1480},{},[1481],{"type":37,"value":1482},"If you see recall drift:",{"type":32,"tag":104,"props":1484,"children":1485},{},[1486,1491,1496],{"type":32,"tag":108,"props":1487,"children":1488},{},[1489],{"type":37,"value":1490},"Update your eval set (add new query patterns)",{"type":32,"tag":108,"props":1492,"children":1493},{},[1494],{"type":37,"value":1495},"Fine-tune the embedding model or swap it",{"type":32,"tag":108,"props":1497,"children":1498},{},[1499],{"type":37,"value":1500},"Revisit chunking strategy",{"type":32,"tag":33,"props":1502,"children":1503},{},[1504,1506,1513],{"type":37,"value":1505},"This monitoring falls under ",{"type":32,"tag":129,"props":1507,"children":1510},{"href":1508,"rel":1509},"https:\u002F\u002Fwww.roibase.com.tr\u002Fru\u002Ffirstparty",[133],[1511],{"type":37,"value":1512},"first-party data & measurement architecture",{"type":37,"value":1514},"—a RAG system is also a data pipeline and must be observable.",{"type":32,"tag":40,"props":1516,"children":1518},{"id":1517},"cost-vs-quality-tradeoff-pragmatic-choices",[1519],{"type":37,"value":1520},"Cost vs Quality Tradeoff: Pragmatic Choices",{"type":32,"tag":33,"props":1522,"children":1523},{},[1524],{"type":37,"value":1525},"In production RAG, every decision involves a cost\u002Fquality\u002Flatency tradeoff. 
Some pragmatic moves:",{"type":32,"tag":156,"props":1527,"children":1528},{},[1529,1539,1549,1559],{"type":32,"tag":108,"props":1530,"children":1531},{},[1532,1537],{"type":32,"tag":98,"props":1533,"children":1534},{},[1535],{"type":37,"value":1536},"Embedding model:",{"type":37,"value":1538}," Use Cohere v3 instead of OpenAI 3-large → 30% cost savings, 2% quality loss (acceptable)",{"type":32,"tag":108,"props":1540,"children":1541},{},[1542,1547],{"type":32,"tag":98,"props":1543,"children":1544},{},[1545],{"type":37,"value":1546},"Reranking:",{"type":37,"value":1548}," Rerank only ambiguous queries instead of all → 40% latency reduction",{"type":32,"tag":108,"props":1550,"children":1551},{},[1552,1557],{"type":32,"tag":98,"props":1553,"children":1554},{},[1555],{"type":37,"value":1556},"Hybrid search:",{"type":37,"value":1558}," Vector-only instead of BM25 + vector (if exact match isn't critical) → 50% latency reduction",{"type":32,"tag":108,"props":1560,"children":1561},{},[1562,1567],{"type":32,"tag":98,"props":1563,"children":1564},{},[1565],{"type":37,"value":1566},"Context window:",{"type":37,"value":1568}," 5 documents instead of 10 → 60% token cost reduction, 8% quality improvement",{"type":32,"tag":33,"props":1570,"children":1571},{},[1572],{"type":37,"value":1573},"To see these tradeoffs, you need an eval pipeline. Otherwise you say \"I swapped the embedding model, it's cheaper now\" but miss the 15% retrieval quality drop.",{"type":32,"tag":33,"props":1575,"children":1576},{},[1577],{"type":37,"value":1578},"Before shipping your RAG system to production, take embedding model, chunking strategy, and eval setup seriously. Cost optimization comes second—first stabilize retrieval quality, then optimize cost. 
Otherwise, the system's unreliability hits your users and adoption suffers.",{"type":32,"tag":1580,"props":1581,"children":1582},"style",{},[1583],{"type":37,"value":1584},"html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}",{"title":16,"searchDepth":334,"depth":334,"links":1586},[1587,1588,1591,1592,1593,1594,1595,1596],{"id":42,"depth":324,"text":45},{"id":141,"depth":324,"text":144,"children":1589},[1590],{"id":272,"depth":334,"text":275},{"id":283,"depth":324,"text":286},{"id":844,"depth":324,"text":847},{"id":1248,"depth":324,"text":1251},{"id":1315,"depth":324,"text":1318},{"id":1357,"depth":324,"text":1360},{"id":1517,"depth":324,"text":1520},"markdown","content:ru:ai:production-rag-retrieval-quality-over-cost.md","content","ru\u002Fai\u002Fproduction-rag-retrieval-quality-over-cost.md","ru\u002Fai\u002Fproduction-rag-retrieval-quality-over-cost","md",1778709808853]