[{"data":1,"prerenderedAt":1601},["ShallowReactive",2],{"article-alternates":3,"article-\u002Fen\u002Fai\u002Frag-retrieval-quality-over-cost":13},{"i18nKey":4,"paths":5},"ai-003-2026-05",{"de":6,"en":7,"es":8,"fr":9,"it":10,"ru":11,"tr":12},"\u002Fde\u002Fai\u002Frag-production-retrieval-kalitesi","\u002Fen\u002Fai\u002Frag-retrieval-quality-over-cost","\u002Fes\u002Fai\u002Frag-en-produccion-calidad-de-recuperacion-antes-que-costo","\u002Ffr\u002Fai\u002Frag-production-retrieval-kalitesi","\u002Fit\u002Fai\u002Frag-production-retrieval-kalitesi-once-gelir","\u002Fru\u002Fai\u002Fproduction-rag-retrieval-quality-over-cost","\u002Ftr\u002Fai\u002Fproductionda-rag-retrieval-kalitesi-costtan-once-gelir",{"_path":7,"_dir":14,"_draft":15,"_partial":15,"_locale":16,"title":17,"description":18,"publishedAt":19,"modifiedAt":19,"category":14,"i18nKey":4,"tags":20,"readingTime":26,"author":27,"body":28,"_type":1595,"_id":1596,"_source":1597,"_file":1598,"_stem":1599,"_extension":1600},"ai",false,"","Production RAG: Retrieval Quality Comes Before Cost","Choose your embedding model, chunking strategy, and eval setup wrong, and your RAG system becomes expensive or slow—or both. What matters in production?","2026-05-11",[21,22,23,24,25],"rag","embedding","chunking","llm-eval","retrieval-quality",8,"Roibase",{"type":29,"children":30,"toc":1583},"root",[31,39,46,51,73,94,103,123,139,145,150,155,174,218,223,256,269,276,281,287,292,300,827,837,842,848,860,1241,1246,1252,1264,1285,1290,1308,1313,1319,1324,1332,1350,1355,1361,1371,1476,1481,1499,1513,1519,1524,1567,1572,1577],{"type":32,"tag":33,"props":34,"children":35},"element","p",{},[36],{"type":37,"value":38},"text","RAG systems have become mainstream in production since 2024. Companies are building embedding + vector DB stacks to feed their own document corpus into LLMs. Yet most pilot projects hit the same wall: retrieval quality drops, answers become inconsistent, costs spiral. 
The culprit is usually hasty decisions on embedding model selection, chunking strategy, and eval setup. This piece shows which decisions in your RAG pipeline are hard to reverse once you're in production.",{"type":32,"tag":40,"props":41,"children":43},"h2",{"id":42},"embedding-model-alignment-not-dimension",[44],{"type":37,"value":45},"Embedding Model: Alignment, Not Dimension",{"type":32,"tag":33,"props":47,"children":48},{},[49],{"type":37,"value":50},"Your first instinct when choosing an embedding model is \"which one has the highest MTEB score.\" But benchmark rankings don't guarantee production performance. What matters is how well the model aligns with your document types and query patterns.",{"type":32,"tag":33,"props":52,"children":53},{},[54,56,63,65,71],{"type":37,"value":55},"When we compared OpenAI's ",{"type":32,"tag":57,"props":58,"children":60},"code",{"className":59},[],[61],{"type":37,"value":62},"text-embedding-3-large",{"type":37,"value":64}," (3072 dim) with Cohere's ",{"type":32,"tag":57,"props":66,"children":68},{"className":67},[],[69],{"type":37,"value":70},"embed-v3",{"type":37,"value":72}," (1024 dim), Cohere delivered more consistent recall@10 on marketing documents (blogs, case studies, landing pages)—because its training set was heavy on business content. OpenAI's larger model scored well on general benchmarks, but the distribution of our domain-specific queries was different.",{"type":32,"tag":33,"props":74,"children":75},{},[76,78,84,86,92],{"type":37,"value":77},"Another example: ",{"type":32,"tag":57,"props":79,"children":81},{"className":80},[],[82],{"type":37,"value":83},"bge-large-en-v1.5",{"type":37,"value":85}," (1024 dim, self-hosted) is sufficient for legal documents. But on a multilingual corpus, ",{"type":32,"tag":57,"props":87,"children":89},{"className":88},[],[90],{"type":37,"value":91},"multilingual-e5-large",{"type":37,"value":93}," (1024 dim) clearly wins. 
Model size isn't always a quality signal—alignment between training data and your domain is more critical.",{"type":32,"tag":33,"props":95,"children":96},{},[97],{"type":32,"tag":98,"props":99,"children":100},"strong",{},[101],{"type":37,"value":102},"Selection criteria:",{"type":32,"tag":104,"props":105,"children":106},"ol",{},[107,113,118],{"type":32,"tag":108,"props":109,"children":110},"li",{},[111],{"type":37,"value":112},"Not MTEB score, but recall@5 \u002F MRR metric on your own eval set",{"type":32,"tag":108,"props":114,"children":115},{},[116],{"type":37,"value":117},"Latency (self-hosted vs API)—batch embedding time for 512 documents",{"type":32,"tag":108,"props":119,"children":120},{},[121],{"type":37,"value":122},"Cost per 1M tokens—OpenAI 3-large costs $0.13, Cohere v3 $0.10, self-hosted $0 but infrastructure overhead exists",{"type":32,"tag":33,"props":124,"children":125},{},[126,128,137],{"type":37,"value":127},"If your document set contains domain-specific jargon (pharma, finance, legal), fine-tuning an embedding model or adapting sentence transformers to your data increases retrieval quality by 15–20%. This falls under ",{"type":32,"tag":129,"props":130,"children":134},"a",{"href":131,"rel":132},"https:\u002F\u002Fwww.roibase.com.tr\u002Fen\u002Fverianalizi",[133],"nofollow",[135],{"type":37,"value":136},"data analysis & insights engineering",{"type":37,"value":138},"—you need to build a training pipeline and monitor data quality.",{"type":32,"tag":40,"props":140,"children":142},{"id":141},"chunking-strategy-fixed-size-doesnt-work",[143],{"type":37,"value":144},"Chunking Strategy: Fixed Size Doesn't Work",{"type":32,"tag":33,"props":146,"children":147},{},[148],{"type":37,"value":149},"Most RAG implementations start with \"512 token overlapping window\" as default. 
This barely works for markdown blogs but breaks immediately on a mixed-format corpus (PDF, HTML, JSON).",{"type":32,"tag":33,"props":151,"children":152},{},[153],{"type":37,"value":154},"Fixed-size chunking problems:",{"type":32,"tag":156,"props":157,"children":158},"ul",{},[159,164,169],{"type":32,"tag":108,"props":160,"children":161},{},[162],{"type":37,"value":163},"Headings get split, semantic integrity is lost",{"type":32,"tag":108,"props":165,"children":166},{},[167],{"type":37,"value":168},"Tables and code blocks are severed mid-block",{"type":32,"tag":108,"props":170,"children":171},{},[172],{"type":37,"value":173},"The overlap strategy duplicates context across chunk boundaries, adding retrieval noise",{"type":32,"tag":33,"props":175,"children":176},{},[177,179,184,186,192,194,200,202,208,210,216],{"type":37,"value":178},"Alternative: ",{"type":32,"tag":98,"props":180,"children":181},{},[182],{"type":37,"value":183},"semantic chunking",{"type":37,"value":185},". Split documents respecting sentence boundaries, heading hierarchy, and structural integrity. Replace ",{"type":32,"tag":57,"props":187,"children":189},{"className":188},[],[190],{"type":37,"value":191},"langchain",{"type":37,"value":193},"'s ",{"type":32,"tag":57,"props":195,"children":197},{"className":196},[],[198],{"type":37,"value":199},"RecursiveCharacterTextSplitter",{"type":37,"value":201}," with ",{"type":32,"tag":57,"props":203,"children":205},{"className":204},[],[206],{"type":37,"value":207},"MarkdownTextSplitter",{"type":37,"value":209}," or a custom parser. 
On PDFs, use ",{"type":32,"tag":57,"props":211,"children":213},{"className":212},[],[214],{"type":37,"value":215},"pdfplumber",{"type":37,"value":217}," to separate tables from text and apply different chunk strategies to each.",{"type":32,"tag":33,"props":219,"children":220},{},[221],{"type":37,"value":222},"For an e-commerce firm, we split product documents into three chunk types:",{"type":32,"tag":156,"props":224,"children":225},{},[226,236,246],{"type":32,"tag":108,"props":227,"children":228},{},[229,234],{"type":32,"tag":98,"props":230,"children":231},{},[232],{"type":37,"value":233},"Title + short description:",{"type":37,"value":235}," 128 tokens, lightweight for retrieval",{"type":32,"tag":108,"props":237,"children":238},{},[239,244],{"type":32,"tag":98,"props":240,"children":241},{},[242],{"type":37,"value":243},"Technical specs + table:",{"type":37,"value":245}," 256 tokens, structured data",{"type":32,"tag":108,"props":247,"children":248},{},[249,254],{"type":32,"tag":98,"props":250,"children":251},{},[252],{"type":37,"value":253},"Long-form content (blog, guide):",{"type":37,"value":255}," 512 tokens, semantically split",{"type":32,"tag":33,"props":257,"children":258},{},[259,261,267],{"type":37,"value":260},"Each chunk type got metadata (chunk_type, source_page). During retrieval, we filtered by chunk_type based on query type. For example, \"product comparison\" queries only looked at ",{"type":32,"tag":57,"props":262,"children":264},{"className":263},[],[265],{"type":37,"value":266},"technical_specs",{"type":37,"value":268}," chunks. This improved precision@3 by 18%.",{"type":32,"tag":270,"props":271,"children":273},"h3",{"id":272},"overlap-strategy-how-much-is-enough",[274],{"type":37,"value":275},"Overlap Strategy: How Much Is Enough?",{"type":32,"tag":33,"props":277,"children":278},{},[279],{"type":37,"value":280},"Overlap is usually recommended at 10–20%, but that's arbitrary. 
Test results: 50-token overlap on 512-token chunks preserves semantic continuity. 100-token overlap increases retrieval latency by 12% without quality gains. The sweet spot varies by domain—test it on your own eval set.",{"type":32,"tag":40,"props":282,"children":284},{"id":283},"eval-setup-build-it-before-production",[285],{"type":37,"value":286},"Eval Setup: Build It Before Production",{"type":32,"tag":33,"props":288,"children":289},{},[290],{"type":37,"value":291},"Most RAG systems pass the \"looks good visually\" test into production. But without a structured eval pipeline to measure retrieval quality, the system won't be trustworthy on the first 1000 queries.",{"type":32,"tag":33,"props":293,"children":294},{},[295],{"type":32,"tag":98,"props":296,"children":297},{},[298],{"type":37,"value":299},"Minimal eval pipeline:",{"type":32,"tag":301,"props":302,"children":306},"pre",{"className":303,"code":304,"language":305,"meta":16,"style":16},"language-python shiki shiki-themes github-dark","# eval_set.json — golden dataset\n[\n  {\n    \"query\": \"How do I ensure GDPR-compliant user consent?\",\n    \"expected_docs\": [\"doc_42\", \"doc_89\"],\n    \"expected_answer_contains\": [\"cookie notice\", \"explicit consent\"]\n  },\n  ...\n]\n\n# eval metrics\ndef evaluate_retrieval(query, retrieved_docs, expected_docs):\n    recall_at_k = len(set(retrieved_docs[:5]) & set(expected_docs)) \u002F len(expected_docs)\n    mrr = 1 \u002F (retrieved_docs.index(expected_docs[0]) + 1) if expected_docs[0] in retrieved_docs else 0\n    return {\"recall@5\": recall_at_k, \"mrr\": mrr}\n\ndef evaluate_generation(generated_answer, expected_contains):\n    # LLM-as-judge: ask Claude \"does this answer cover expected content?\"\n    prompt = f\"Expected: {expected_contains}\\nGenerated: {generated_answer}\\nScore 0-1:\"\n    score = claude_api(prompt)\n    return 
float(score)\n","python",[307],{"type":32,"tag":57,"props":308,"children":309},{"__ignoreMap":16},[310,322,332,341,366,400,432,441,450,458,468,477,498,571,661,695,703,721,730,791,809],{"type":32,"tag":311,"props":312,"children":315},"span",{"class":313,"line":314},"line",1,[316],{"type":32,"tag":311,"props":317,"children":319},{"style":318},"--shiki-default:#6A737D",[320],{"type":37,"value":321},"# eval_set.json — golden dataset\n",{"type":32,"tag":311,"props":323,"children":325},{"class":313,"line":324},2,[326],{"type":32,"tag":311,"props":327,"children":329},{"style":328},"--shiki-default:#E1E4E8",[330],{"type":37,"value":331},"[\n",{"type":32,"tag":311,"props":333,"children":335},{"class":313,"line":334},3,[336],{"type":32,"tag":311,"props":337,"children":338},{"style":328},[339],{"type":37,"value":340},"  {\n",{"type":32,"tag":311,"props":342,"children":344},{"class":313,"line":343},4,[345,351,356,361],{"type":32,"tag":311,"props":346,"children":348},{"style":347},"--shiki-default:#9ECBFF",[349],{"type":37,"value":350},"    \"query\"",{"type":32,"tag":311,"props":352,"children":353},{"style":328},[354],{"type":37,"value":355},": ",{"type":32,"tag":311,"props":357,"children":358},{"style":347},[359],{"type":37,"value":360},"\"How do I ensure GDPR-compliant user consent?\"",{"type":32,"tag":311,"props":362,"children":363},{"style":328},[364],{"type":37,"value":365},",\n",{"type":32,"tag":311,"props":367,"children":369},{"class":313,"line":368},5,[370,375,380,385,390,395],{"type":32,"tag":311,"props":371,"children":372},{"style":347},[373],{"type":37,"value":374},"    \"expected_docs\"",{"type":32,"tag":311,"props":376,"children":377},{"style":328},[378],{"type":37,"value":379},": [",{"type":32,"tag":311,"props":381,"children":382},{"style":347},[383],{"type":37,"value":384},"\"doc_42\"",{"type":32,"tag":311,"props":386,"children":387},{"style":328},[388],{"type":37,"value":389},", 
",{"type":32,"tag":311,"props":391,"children":392},{"style":347},[393],{"type":37,"value":394},"\"doc_89\"",{"type":32,"tag":311,"props":396,"children":397},{"style":328},[398],{"type":37,"value":399},"],\n",{"type":32,"tag":311,"props":401,"children":403},{"class":313,"line":402},6,[404,409,413,418,422,427],{"type":32,"tag":311,"props":405,"children":406},{"style":347},[407],{"type":37,"value":408},"    \"expected_answer_contains\"",{"type":32,"tag":311,"props":410,"children":411},{"style":328},[412],{"type":37,"value":379},{"type":32,"tag":311,"props":414,"children":415},{"style":347},[416],{"type":37,"value":417},"\"cookie notice\"",{"type":32,"tag":311,"props":419,"children":420},{"style":328},[421],{"type":37,"value":389},{"type":32,"tag":311,"props":423,"children":424},{"style":347},[425],{"type":37,"value":426},"\"explicit consent\"",{"type":32,"tag":311,"props":428,"children":429},{"style":328},[430],{"type":37,"value":431},"]\n",{"type":32,"tag":311,"props":433,"children":435},{"class":313,"line":434},7,[436],{"type":32,"tag":311,"props":437,"children":438},{"style":328},[439],{"type":37,"value":440},"  },\n",{"type":32,"tag":311,"props":442,"children":443},{"class":313,"line":26},[444],{"type":32,"tag":311,"props":445,"children":447},{"style":446},"--shiki-default:#79B8FF",[448],{"type":37,"value":449},"  ...\n",{"type":32,"tag":311,"props":451,"children":453},{"class":313,"line":452},9,[454],{"type":32,"tag":311,"props":455,"children":456},{"style":328},[457],{"type":37,"value":431},{"type":32,"tag":311,"props":459,"children":461},{"class":313,"line":460},10,[462],{"type":32,"tag":311,"props":463,"children":465},{"emptyLinePlaceholder":464},true,[466],{"type":37,"value":467},"\n",{"type":32,"tag":311,"props":469,"children":471},{"class":313,"line":470},11,[472],{"type":32,"tag":311,"props":473,"children":474},{"style":318},[475],{"type":37,"value":476},"# eval 
metrics\n",{"type":32,"tag":311,"props":478,"children":480},{"class":313,"line":479},12,[481,487,493],{"type":32,"tag":311,"props":482,"children":484},{"style":483},"--shiki-default:#F97583",[485],{"type":37,"value":486},"def",{"type":32,"tag":311,"props":488,"children":490},{"style":489},"--shiki-default:#B392F0",[491],{"type":37,"value":492}," evaluate_retrieval",{"type":32,"tag":311,"props":494,"children":495},{"style":328},[496],{"type":37,"value":497},"(query, retrieved_docs, expected_docs):\n",{"type":32,"tag":311,"props":499,"children":501},{"class":313,"line":500},13,[502,507,512,517,522,527,532,537,542,547,552,557,562,566],{"type":32,"tag":311,"props":503,"children":504},{"style":328},[505],{"type":37,"value":506},"    recall_at_k ",{"type":32,"tag":311,"props":508,"children":509},{"style":483},[510],{"type":37,"value":511},"=",{"type":32,"tag":311,"props":513,"children":514},{"style":446},[515],{"type":37,"value":516}," len",{"type":32,"tag":311,"props":518,"children":519},{"style":328},[520],{"type":37,"value":521},"(",{"type":32,"tag":311,"props":523,"children":524},{"style":446},[525],{"type":37,"value":526},"set",{"type":32,"tag":311,"props":528,"children":529},{"style":328},[530],{"type":37,"value":531},"(retrieved_docs[:",{"type":32,"tag":311,"props":533,"children":534},{"style":446},[535],{"type":37,"value":536},"5",{"type":32,"tag":311,"props":538,"children":539},{"style":328},[540],{"type":37,"value":541},"]) ",{"type":32,"tag":311,"props":543,"children":544},{"style":483},[545],{"type":37,"value":546},"&",{"type":32,"tag":311,"props":548,"children":549},{"style":446},[550],{"type":37,"value":551}," set",{"type":32,"tag":311,"props":553,"children":554},{"style":328},[555],{"type":37,"value":556},"(expected_docs)) 
",{"type":32,"tag":311,"props":558,"children":559},{"style":483},[560],{"type":37,"value":561},"\u002F",{"type":32,"tag":311,"props":563,"children":564},{"style":446},[565],{"type":37,"value":516},{"type":32,"tag":311,"props":567,"children":568},{"style":328},[569],{"type":37,"value":570},"(expected_docs)\n",{"type":32,"tag":311,"props":572,"children":574},{"class":313,"line":573},14,[575,580,584,589,594,599,604,608,613,617,622,627,632,636,641,646,651,656],{"type":32,"tag":311,"props":576,"children":577},{"style":328},[578],{"type":37,"value":579},"    mrr ",{"type":32,"tag":311,"props":581,"children":582},{"style":483},[583],{"type":37,"value":511},{"type":32,"tag":311,"props":585,"children":586},{"style":446},[587],{"type":37,"value":588}," 1",{"type":32,"tag":311,"props":590,"children":591},{"style":483},[592],{"type":37,"value":593}," \u002F",{"type":32,"tag":311,"props":595,"children":596},{"style":328},[597],{"type":37,"value":598}," (retrieved_docs.index(expected_docs[",{"type":32,"tag":311,"props":600,"children":601},{"style":446},[602],{"type":37,"value":603},"0",{"type":32,"tag":311,"props":605,"children":606},{"style":328},[607],{"type":37,"value":541},{"type":32,"tag":311,"props":609,"children":610},{"style":483},[611],{"type":37,"value":612},"+",{"type":32,"tag":311,"props":614,"children":615},{"style":446},[616],{"type":37,"value":588},{"type":32,"tag":311,"props":618,"children":619},{"style":328},[620],{"type":37,"value":621},") ",{"type":32,"tag":311,"props":623,"children":624},{"style":483},[625],{"type":37,"value":626},"if",{"type":32,"tag":311,"props":628,"children":629},{"style":328},[630],{"type":37,"value":631}," expected_docs[",{"type":32,"tag":311,"props":633,"children":634},{"style":446},[635],{"type":37,"value":603},{"type":32,"tag":311,"props":637,"children":638},{"style":328},[639],{"type":37,"value":640},"] 
",{"type":32,"tag":311,"props":642,"children":643},{"style":483},[644],{"type":37,"value":645},"in",{"type":32,"tag":311,"props":647,"children":648},{"style":328},[649],{"type":37,"value":650}," retrieved_docs ",{"type":32,"tag":311,"props":652,"children":653},{"style":483},[654],{"type":37,"value":655},"else",{"type":32,"tag":311,"props":657,"children":658},{"style":446},[659],{"type":37,"value":660}," 0\n",{"type":32,"tag":311,"props":662,"children":664},{"class":313,"line":663},15,[665,670,675,680,685,690],{"type":32,"tag":311,"props":666,"children":667},{"style":483},[668],{"type":37,"value":669},"    return",{"type":32,"tag":311,"props":671,"children":672},{"style":328},[673],{"type":37,"value":674}," {",{"type":32,"tag":311,"props":676,"children":677},{"style":347},[678],{"type":37,"value":679},"\"recall@5\"",{"type":32,"tag":311,"props":681,"children":682},{"style":328},[683],{"type":37,"value":684},": recall_at_k, ",{"type":32,"tag":311,"props":686,"children":687},{"style":347},[688],{"type":37,"value":689},"\"mrr\"",{"type":32,"tag":311,"props":691,"children":692},{"style":328},[693],{"type":37,"value":694},": mrr}\n",{"type":32,"tag":311,"props":696,"children":698},{"class":313,"line":697},16,[699],{"type":32,"tag":311,"props":700,"children":701},{"emptyLinePlaceholder":464},[702],{"type":37,"value":467},{"type":32,"tag":311,"props":704,"children":706},{"class":313,"line":705},17,[707,711,716],{"type":32,"tag":311,"props":708,"children":709},{"style":483},[710],{"type":37,"value":486},{"type":32,"tag":311,"props":712,"children":713},{"style":489},[714],{"type":37,"value":715}," evaluate_generation",{"type":32,"tag":311,"props":717,"children":718},{"style":328},[719],{"type":37,"value":720},"(generated_answer, expected_contains):\n",{"type":32,"tag":311,"props":722,"children":724},{"class":313,"line":723},18,[725],{"type":32,"tag":311,"props":726,"children":727},{"style":318},[728],{"type":37,"value":729},"    # LLM-as-judge: ask Claude \"does this answer 
cover expected content?\"\n",{"type":32,"tag":311,"props":731,"children":733},{"class":313,"line":732},19,[734,739,743,748,753,758,763,768,773,777,782,786],{"type":32,"tag":311,"props":735,"children":736},{"style":328},[737],{"type":37,"value":738},"    prompt ",{"type":32,"tag":311,"props":740,"children":741},{"style":483},[742],{"type":37,"value":511},{"type":32,"tag":311,"props":744,"children":745},{"style":483},[746],{"type":37,"value":747}," f",{"type":32,"tag":311,"props":749,"children":750},{"style":347},[751],{"type":37,"value":752},"\"Expected: ",{"type":32,"tag":311,"props":754,"children":755},{"style":446},[756],{"type":37,"value":757},"{",{"type":32,"tag":311,"props":759,"children":760},{"style":328},[761],{"type":37,"value":762},"expected_contains",{"type":32,"tag":311,"props":764,"children":765},{"style":446},[766],{"type":37,"value":767},"}\\n",{"type":32,"tag":311,"props":769,"children":770},{"style":347},[771],{"type":37,"value":772},"Generated: ",{"type":32,"tag":311,"props":774,"children":775},{"style":446},[776],{"type":37,"value":757},{"type":32,"tag":311,"props":778,"children":779},{"style":328},[780],{"type":37,"value":781},"generated_answer",{"type":32,"tag":311,"props":783,"children":784},{"style":446},[785],{"type":37,"value":767},{"type":32,"tag":311,"props":787,"children":788},{"style":347},[789],{"type":37,"value":790},"Score 0-1:\"\n",{"type":32,"tag":311,"props":792,"children":794},{"class":313,"line":793},20,[795,800,804],{"type":32,"tag":311,"props":796,"children":797},{"style":328},[798],{"type":37,"value":799},"    score ",{"type":32,"tag":311,"props":801,"children":802},{"style":483},[803],{"type":37,"value":511},{"type":32,"tag":311,"props":805,"children":806},{"style":328},[807],{"type":37,"value":808}," 
claude_api(prompt)\n",{"type":32,"tag":311,"props":810,"children":812},{"class":313,"line":811},21,[813,817,822],{"type":32,"tag":311,"props":814,"children":815},{"style":483},[816],{"type":37,"value":669},{"type":32,"tag":311,"props":818,"children":819},{"style":446},[820],{"type":37,"value":821}," float",{"type":32,"tag":311,"props":823,"children":824},{"style":328},[825],{"type":37,"value":826},"(score)\n",{"type":32,"tag":33,"props":828,"children":829},{},[830,835],{"type":32,"tag":98,"props":831,"children":832},{},[833],{"type":37,"value":834},"Eval frequency:",{"type":37,"value":836}," After every embedding model change, chunking tweak, or strategy shift. Run it automatically in CI\u002FCD. Block deployment if recall@5 drops below 0.7.",{"type":32,"tag":33,"props":838,"children":839},{},[840],{"type":37,"value":841},"In practice: we built a 200-query eval set for a customer. The eval pipeline ran automatically on every commit. One chunking change lifted recall@5 from 0.68 to 0.81, but p95 latency jumped from 340ms to 520ms. Once we visualized the cost\u002Flatency tradeoff on the dashboard, we reverted the chunking and tried a different approach. Without eval, we'd never have seen this tradeoff.",{"type":32,"tag":40,"props":843,"children":845},{"id":844},"hybrid-search-sparse-dense-retrieval",[846],{"type":37,"value":847},"Hybrid Search: Sparse + Dense Retrieval",{"type":32,"tag":33,"props":849,"children":850},{},[851,853,858],{"type":37,"value":852},"Relying only on vector similarity fails on edge cases. For example, queries requiring exact keyword matches (product codes, API endpoints) score low on vector search. 
This is where ",{"type":32,"tag":98,"props":854,"children":855},{},[856],{"type":37,"value":857},"hybrid search",{"type":37,"value":859}," comes in: combine BM25 (sparse) + embedding (dense) scores.",{"type":32,"tag":301,"props":861,"children":863},{"className":303,"code":862,"language":305,"meta":16,"style":16},"# Hybrid retrieval example\nbm25_results = bm25_index.search(query, top_k=20)\nvector_results = vector_db.search(query_embedding, top_k=20)\n\n# RRF (Reciprocal Rank Fusion)\ndef rrf_score(rank, k=60):\n    return 1 \u002F (k + rank)\n\ncombined_scores = {}\nfor rank, doc in enumerate(bm25_results):\n    combined_scores[doc.id] = combined_scores.get(doc.id, 0) + rrf_score(rank)\nfor rank, doc in enumerate(vector_results):\n    combined_scores[doc.id] = combined_scores.get(doc.id, 0) + rrf_score(rank)\n\nfinal_results = sorted(combined_scores.items(), key=lambda x: x[1], reverse=True)[:5]\n",[864],{"type":32,"tag":57,"props":865,"children":866},{"__ignoreMap":16},[867,875,912,945,952,960,991,1020,1027,1044,1071,1105,1129,1160,1167],{"type":32,"tag":311,"props":868,"children":869},{"class":313,"line":314},[870],{"type":32,"tag":311,"props":871,"children":872},{"style":318},[873],{"type":37,"value":874},"# Hybrid retrieval example\n",{"type":32,"tag":311,"props":876,"children":877},{"class":313,"line":324},[878,883,887,892,898,902,907],{"type":32,"tag":311,"props":879,"children":880},{"style":328},[881],{"type":37,"value":882},"bm25_results ",{"type":32,"tag":311,"props":884,"children":885},{"style":483},[886],{"type":37,"value":511},{"type":32,"tag":311,"props":888,"children":889},{"style":328},[890],{"type":37,"value":891}," bm25_index.search(query, 
",{"type":32,"tag":311,"props":893,"children":895},{"style":894},"--shiki-default:#FFAB70",[896],{"type":37,"value":897},"top_k",{"type":32,"tag":311,"props":899,"children":900},{"style":483},[901],{"type":37,"value":511},{"type":32,"tag":311,"props":903,"children":904},{"style":446},[905],{"type":37,"value":906},"20",{"type":32,"tag":311,"props":908,"children":909},{"style":328},[910],{"type":37,"value":911},")\n",{"type":32,"tag":311,"props":913,"children":914},{"class":313,"line":334},[915,920,924,929,933,937,941],{"type":32,"tag":311,"props":916,"children":917},{"style":328},[918],{"type":37,"value":919},"vector_results ",{"type":32,"tag":311,"props":921,"children":922},{"style":483},[923],{"type":37,"value":511},{"type":32,"tag":311,"props":925,"children":926},{"style":328},[927],{"type":37,"value":928}," vector_db.search(query_embedding, ",{"type":32,"tag":311,"props":930,"children":931},{"style":894},[932],{"type":37,"value":897},{"type":32,"tag":311,"props":934,"children":935},{"style":483},[936],{"type":37,"value":511},{"type":32,"tag":311,"props":938,"children":939},{"style":446},[940],{"type":37,"value":906},{"type":32,"tag":311,"props":942,"children":943},{"style":328},[944],{"type":37,"value":911},{"type":32,"tag":311,"props":946,"children":947},{"class":313,"line":343},[948],{"type":32,"tag":311,"props":949,"children":950},{"emptyLinePlaceholder":464},[951],{"type":37,"value":467},{"type":32,"tag":311,"props":953,"children":954},{"class":313,"line":368},[955],{"type":32,"tag":311,"props":956,"children":957},{"style":318},[958],{"type":37,"value":959},"# RRF (Reciprocal Rank Fusion)\n",{"type":32,"tag":311,"props":961,"children":962},{"class":313,"line":402},[963,967,972,977,981,986],{"type":32,"tag":311,"props":964,"children":965},{"style":483},[966],{"type":37,"value":486},{"type":32,"tag":311,"props":968,"children":969},{"style":489},[970],{"type":37,"value":971}," 
rrf_score",{"type":32,"tag":311,"props":973,"children":974},{"style":328},[975],{"type":37,"value":976},"(rank, k",{"type":32,"tag":311,"props":978,"children":979},{"style":483},[980],{"type":37,"value":511},{"type":32,"tag":311,"props":982,"children":983},{"style":446},[984],{"type":37,"value":985},"60",{"type":32,"tag":311,"props":987,"children":988},{"style":328},[989],{"type":37,"value":990},"):\n",{"type":32,"tag":311,"props":992,"children":993},{"class":313,"line":434},[994,998,1002,1006,1011,1015],{"type":32,"tag":311,"props":995,"children":996},{"style":483},[997],{"type":37,"value":669},{"type":32,"tag":311,"props":999,"children":1000},{"style":446},[1001],{"type":37,"value":588},{"type":32,"tag":311,"props":1003,"children":1004},{"style":483},[1005],{"type":37,"value":593},{"type":32,"tag":311,"props":1007,"children":1008},{"style":328},[1009],{"type":37,"value":1010}," (k ",{"type":32,"tag":311,"props":1012,"children":1013},{"style":483},[1014],{"type":37,"value":612},{"type":32,"tag":311,"props":1016,"children":1017},{"style":328},[1018],{"type":37,"value":1019}," rank)\n",{"type":32,"tag":311,"props":1021,"children":1022},{"class":313,"line":26},[1023],{"type":32,"tag":311,"props":1024,"children":1025},{"emptyLinePlaceholder":464},[1026],{"type":37,"value":467},{"type":32,"tag":311,"props":1028,"children":1029},{"class":313,"line":452},[1030,1035,1039],{"type":32,"tag":311,"props":1031,"children":1032},{"style":328},[1033],{"type":37,"value":1034},"combined_scores ",{"type":32,"tag":311,"props":1036,"children":1037},{"style":483},[1038],{"type":37,"value":511},{"type":32,"tag":311,"props":1040,"children":1041},{"style":328},[1042],{"type":37,"value":1043}," 
{}\n",{"type":32,"tag":311,"props":1045,"children":1046},{"class":313,"line":460},[1047,1052,1057,1061,1066],{"type":32,"tag":311,"props":1048,"children":1049},{"style":483},[1050],{"type":37,"value":1051},"for",{"type":32,"tag":311,"props":1053,"children":1054},{"style":328},[1055],{"type":37,"value":1056}," rank, doc ",{"type":32,"tag":311,"props":1058,"children":1059},{"style":483},[1060],{"type":37,"value":645},{"type":32,"tag":311,"props":1062,"children":1063},{"style":446},[1064],{"type":37,"value":1065}," enumerate",{"type":32,"tag":311,"props":1067,"children":1068},{"style":328},[1069],{"type":37,"value":1070},"(bm25_results):\n",{"type":32,"tag":311,"props":1072,"children":1073},{"class":313,"line":470},[1074,1079,1083,1088,1092,1096,1100],{"type":32,"tag":311,"props":1075,"children":1076},{"style":328},[1077],{"type":37,"value":1078},"    combined_scores[doc.id] ",{"type":32,"tag":311,"props":1080,"children":1081},{"style":483},[1082],{"type":37,"value":511},{"type":32,"tag":311,"props":1084,"children":1085},{"style":328},[1086],{"type":37,"value":1087}," combined_scores.get(doc.id, ",{"type":32,"tag":311,"props":1089,"children":1090},{"style":446},[1091],{"type":37,"value":603},{"type":32,"tag":311,"props":1093,"children":1094},{"style":328},[1095],{"type":37,"value":621},{"type":32,"tag":311,"props":1097,"children":1098},{"style":483},[1099],{"type":37,"value":612},{"type":32,"tag":311,"props":1101,"children":1102},{"style":328},[1103],{"type":37,"value":1104}," 
rrf_score(rank)\n",{"type":32,"tag":311,"props":1106,"children":1107},{"class":313,"line":479},[1108,1112,1116,1120,1124],{"type":32,"tag":311,"props":1109,"children":1110},{"style":483},[1111],{"type":37,"value":1051},{"type":32,"tag":311,"props":1113,"children":1114},{"style":328},[1115],{"type":37,"value":1056},{"type":32,"tag":311,"props":1117,"children":1118},{"style":483},[1119],{"type":37,"value":645},{"type":32,"tag":311,"props":1121,"children":1122},{"style":446},[1123],{"type":37,"value":1065},{"type":32,"tag":311,"props":1125,"children":1126},{"style":328},[1127],{"type":37,"value":1128},"(vector_results):\n",{"type":32,"tag":311,"props":1130,"children":1131},{"class":313,"line":500},[1132,1136,1140,1144,1148,1152,1156],{"type":32,"tag":311,"props":1133,"children":1134},{"style":328},[1135],{"type":37,"value":1078},{"type":32,"tag":311,"props":1137,"children":1138},{"style":483},[1139],{"type":37,"value":511},{"type":32,"tag":311,"props":1141,"children":1142},{"style":328},[1143],{"type":37,"value":1087},{"type":32,"tag":311,"props":1145,"children":1146},{"style":446},[1147],{"type":37,"value":603},{"type":32,"tag":311,"props":1149,"children":1150},{"style":328},[1151],{"type":37,"value":621},{"type":32,"tag":311,"props":1153,"children":1154},{"style":483},[1155],{"type":37,"value":612},{"type":32,"tag":311,"props":1157,"children":1158},{"style":328},[1159],{"type":37,"value":1104},{"type":32,"tag":311,"props":1161,"children":1162},{"class":313,"line":573},[1163],{"type":32,"tag":311,"props":1164,"children":1165},{"emptyLinePlaceholder":464},[1166],{"type":37,"value":467},{"type":32,"tag":311,"props":1168,"children":1169},{"class":313,"line":663},[1170,1175,1179,1184,1189,1194,1199,1204,1209,1214,1219,1223,1228,1233,1237],{"type":32,"tag":311,"props":1171,"children":1172},{"style":328},[1173],{"type":37,"value":1174},"final_results 
",{"type":32,"tag":311,"props":1176,"children":1177},{"style":483},[1178],{"type":37,"value":511},{"type":32,"tag":311,"props":1180,"children":1181},{"style":446},[1182],{"type":37,"value":1183}," sorted",{"type":32,"tag":311,"props":1185,"children":1186},{"style":328},[1187],{"type":37,"value":1188},"(combined_scores.items(), ",{"type":32,"tag":311,"props":1190,"children":1191},{"style":894},[1192],{"type":37,"value":1193},"key",{"type":32,"tag":311,"props":1195,"children":1196},{"style":483},[1197],{"type":37,"value":1198},"=lambda",{"type":32,"tag":311,"props":1200,"children":1201},{"style":328},[1202],{"type":37,"value":1203}," x: x[",{"type":32,"tag":311,"props":1205,"children":1206},{"style":446},[1207],{"type":37,"value":1208},"1",{"type":32,"tag":311,"props":1210,"children":1211},{"style":328},[1212],{"type":37,"value":1213},"], ",{"type":32,"tag":311,"props":1215,"children":1216},{"style":894},[1217],{"type":37,"value":1218},"reverse",{"type":32,"tag":311,"props":1220,"children":1221},{"style":483},[1222],{"type":37,"value":511},{"type":32,"tag":311,"props":1224,"children":1225},{"style":446},[1226],{"type":37,"value":1227},"True",{"type":32,"tag":311,"props":1229,"children":1230},{"style":328},[1231],{"type":37,"value":1232},")[:",{"type":32,"tag":311,"props":1234,"children":1235},{"style":446},[1236],{"type":37,"value":536},{"type":32,"tag":311,"props":1238,"children":1239},{"style":328},[1240],{"type":37,"value":431},{"type":32,"tag":33,"props":1242,"children":1243},{},[1244],{"type":37,"value":1245},"Test result: hybrid search boosted recall@5 by 22% on technical queries. But latency doubled because you're hitting two separate indexes. 
If that tradeoff is acceptable (internal tool, sub-500ms is fine), hybrid search works in production.",{"type":32,"tag":40,"props":1247,"children":1249},{"id":1248},"reranking-second-pass-filtering",[1250],{"type":37,"value":1251},"Reranking: Second-Pass Filtering",{"type":32,"tag":33,"props":1253,"children":1254},{},[1255,1257,1262],{"type":37,"value":1256},"Initial retrieval (BM25 + vector) returns 20–50 documents. But not all fit into LLM context (token limits + cost). A ",{"type":32,"tag":98,"props":1258,"children":1259},{},[1260],{"type":37,"value":1261},"reranker model",{"type":37,"value":1263}," rescores each document's relevance to the query and picks the top-5.",{"type":32,"tag":33,"props":1265,"children":1266},{},[1267,1269,1275,1277,1283],{"type":37,"value":1268},"Models like Cohere's ",{"type":32,"tag":57,"props":1270,"children":1272},{"className":1271},[],[1273],{"type":37,"value":1274},"rerank-english-v2.0",{"type":37,"value":1276}," or ",{"type":32,"tag":57,"props":1278,"children":1280},{"className":1279},[],[1281],{"type":37,"value":1282},"bge-reranker-large",{"type":37,"value":1284}," are standard. 
Rerankers use cross-encoder architecture—encoding query + document together—so they're more expensive than embeddings but far more accurate.",{"type":32,"tag":33,"props":1286,"children":1287},{},[1288],{"type":37,"value":1289},"Benchmark from our work: reranking over 50 documents:",{"type":32,"tag":156,"props":1291,"children":1292},{},[1293,1298,1303],{"type":32,"tag":108,"props":1294,"children":1295},{},[1296],{"type":37,"value":1297},"Recall@5: 0.73 → 0.89",{"type":32,"tag":108,"props":1299,"children":1300},{},[1301],{"type":37,"value":1302},"Latency: +180ms (acceptable)",{"type":32,"tag":108,"props":1304,"children":1305},{},[1306],{"type":37,"value":1307},"Cost: +$0.002 per retrieval (Cohere API)",{"type":32,"tag":33,"props":1309,"children":1310},{},[1311],{"type":37,"value":1312},"If budget is tight, self-hosted rerankers are an option but require GPU inference. At that point, you need to do the math: self-hosted infrastructure cost vs API cost.",{"type":32,"tag":40,"props":1314,"children":1316},{"id":1315},"context-window-optimization-fewer-documents-better-answers",[1317],{"type":37,"value":1318},"Context Window Optimization: Fewer Documents, Better Answers",{"type":32,"tag":33,"props":1320,"children":1321},{},[1322],{"type":37,"value":1323},"Sending 20 documents to an LLM doesn't always produce better answers. Long context triggers the \"lost in the middle\" problem—the model skips information in the middle. 
Test result: giving GPT-4 Turbo five documents produced better answers than giving it fifteen (an 11% BLEU score difference).",{"type":32,"tag":33,"props":1325,"children":1326},{},[1327],{"type":32,"tag":98,"props":1328,"children":1329},{},[1330],{"type":37,"value":1331},"Optimization strategy:",{"type":32,"tag":104,"props":1333,"children":1334},{},[1335,1340,1345],{"type":32,"tag":108,"props":1336,"children":1337},{},[1338],{"type":37,"value":1339},"Use reranking to pick top-5",{"type":32,"tag":108,"props":1341,"children":1342},{},[1343],{"type":37,"value":1344},"Drop any document with relevance score \u003C 0.6",{"type":32,"tag":108,"props":1346,"children":1347},{},[1348],{"type":37,"value":1349},"Send the remaining 3–5 documents to LLM context",{"type":32,"tag":33,"props":1351,"children":1352},{},[1353],{"type":37,"value":1354},"This cuts input token cost (70% reduction) and improves answer quality. In production, you're balancing the cost\u002Flatency\u002Fquality triangle—your eval pipeline makes this visible.",{"type":32,"tag":40,"props":1356,"children":1358},{"id":1357},"production-monitoring-retrieval-drift",[1359],{"type":37,"value":1360},"Production Monitoring: Retrieval Drift",{"type":32,"tag":33,"props":1362,"children":1363},{},[1364,1366],{"type":37,"value":1365},"Retrieval quality degrades over time—as new documents are added, as query distribution shifts. \n
Set up a dashboard to track ",{"type":32,"tag":98,"props":1367,"children":1368},{},[1369],{"type":37,"value":1370},"retrieval drift:",{"type":32,"tag":1372,"props":1373,"children":1374},"table",{},[1375,1399],{"type":32,"tag":1376,"props":1377,"children":1378},"thead",{},[1379],{"type":32,"tag":1380,"props":1381,"children":1382},"tr",{},[1383,1389,1394],{"type":32,"tag":1384,"props":1385,"children":1386},"th",{},[1387],{"type":37,"value":1388},"Metric",{"type":32,"tag":1384,"props":1390,"children":1391},{},[1392],{"type":37,"value":1393},"Target",{"type":32,"tag":1384,"props":1395,"children":1396},{},[1397],{"type":37,"value":1398},"Alert Threshold",{"type":32,"tag":1400,"props":1401,"children":1402},"tbody",{},[1403,1422,1440,1458],{"type":32,"tag":1380,"props":1404,"children":1405},{},[1406,1412,1417],{"type":32,"tag":1407,"props":1408,"children":1409},"td",{},[1410],{"type":37,"value":1411},"Recall@5 (weekly eval)",{"type":32,"tag":1407,"props":1413,"children":1414},{},[1415],{"type":37,"value":1416},"> 0.75",{"type":32,"tag":1407,"props":1418,"children":1419},{},[1420],{"type":37,"value":1421},"\u003C 0.70",{"type":32,"tag":1380,"props":1423,"children":1424},{},[1425,1430,1435],{"type":32,"tag":1407,"props":1426,"children":1427},{},[1428],{"type":37,"value":1429},"P95 latency",{"type":32,"tag":1407,"props":1431,"children":1432},{},[1433],{"type":37,"value":1434},"\u003C 400ms",{"type":32,"tag":1407,"props":1436,"children":1437},{},[1438],{"type":37,"value":1439},"> 600ms",{"type":32,"tag":1380,"props":1441,"children":1442},{},[1443,1448,1453],{"type":32,"tag":1407,"props":1444,"children":1445},{},[1446],{"type":37,"value":1447},"Zero-result queries (%)",{"type":32,"tag":1407,"props":1449,"children":1450},{},[1451],{"type":37,"value":1452},"\u003C 5%",{"type":32,"tag":1407,"props":1454,"children":1455},{},[1456],{"type":37,"value":1457},"> 
10%",{"type":32,"tag":1380,"props":1459,"children":1460},{},[1461,1466,1471],{"type":32,"tag":1407,"props":1462,"children":1463},{},[1464],{"type":37,"value":1465},"Average relevance score",{"type":32,"tag":1407,"props":1467,"children":1468},{},[1469],{"type":37,"value":1470},"> 0.65",{"type":32,"tag":1407,"props":1472,"children":1473},{},[1474],{"type":37,"value":1475},"\u003C 0.55",{"type":32,"tag":33,"props":1477,"children":1478},{},[1479],{"type":37,"value":1480},"If you spot recall drift:",{"type":32,"tag":104,"props":1482,"children":1483},{},[1484,1489,1494],{"type":32,"tag":108,"props":1485,"children":1486},{},[1487],{"type":37,"value":1488},"Refresh your eval set (add new query patterns)",{"type":32,"tag":108,"props":1490,"children":1491},{},[1492],{"type":37,"value":1493},"Fine-tune your embedding model or swap it",{"type":32,"tag":108,"props":1495,"children":1496},{},[1497],{"type":37,"value":1498},"Review your chunking strategy",{"type":32,"tag":33,"props":1500,"children":1501},{},[1502,1504,1511],{"type":37,"value":1503},"This monitoring fits into ",{"type":32,"tag":129,"props":1505,"children":1508},{"href":1506,"rel":1507},"https:\u002F\u002Fwww.roibase.com.tr\u002Fen\u002Ffirstparty",[133],[1509],{"type":37,"value":1510},"first-party data & measurement architecture",{"type":37,"value":1512},"—your RAG system is a data pipeline and needs observability.",{"type":32,"tag":40,"props":1514,"children":1516},{"id":1515},"cost-vs-quality-tradeoff-pragmatic-choices",[1517],{"type":37,"value":1518},"Cost vs Quality Tradeoff: Pragmatic Choices",{"type":32,"tag":33,"props":1520,"children":1521},{},[1522],{"type":37,"value":1523},"Every production RAG decision involves a cost\u002Fquality\u002Flatency tradeoff. 
Some pragmatic picks:",{"type":32,"tag":156,"props":1525,"children":1526},{},[1527,1537,1547,1557],{"type":32,"tag":108,"props":1528,"children":1529},{},[1530,1535],{"type":32,"tag":98,"props":1531,"children":1532},{},[1533],{"type":37,"value":1534},"Embedding model:",{"type":37,"value":1536}," Swap OpenAI 3-large for Cohere v3 → 30% cost cut, 2% quality loss (acceptable)",{"type":32,"tag":108,"props":1538,"children":1539},{},[1540,1545],{"type":32,"tag":98,"props":1541,"children":1542},{},[1543],{"type":37,"value":1544},"Reranking:",{"type":37,"value":1546}," Rerank only ambiguous queries instead of every query → 40% latency drop",{"type":32,"tag":108,"props":1548,"children":1549},{},[1550,1555],{"type":32,"tag":98,"props":1551,"children":1552},{},[1553],{"type":37,"value":1554},"Hybrid search:",{"type":37,"value":1556}," Use vector alone instead of BM25 + vector (if exact match doesn't matter) → 50% latency drop",{"type":32,"tag":108,"props":1558,"children":1559},{},[1560,1565],{"type":32,"tag":98,"props":1561,"children":1562},{},[1563],{"type":37,"value":1564},"Context window:",{"type":37,"value":1566}," Send 5 docs instead of 10 → 60% token cost cut, 8% quality gain",{"type":32,"tag":33,"props":1568,"children":1569},{},[1570],{"type":37,"value":1571},"Without an eval pipeline, you don't see these tradeoffs. You change embedding models, get cheaper, and miss the 15% drop in retrieval quality.",{"type":32,"tag":33,"props":1573,"children":1574},{},[1575],{"type":37,"value":1576},"Before shipping RAG to production, take embedding models, chunking strategies, and eval setup seriously. Cost optimization comes second—first, nail retrieval quality and keep it stable, then reduce costs. \n
Otherwise, the system's unreliability surfaces to users and adoption tanks.",{"type":32,"tag":1578,"props":1579,"children":1580},"style",{},[1581],{"type":37,"value":1582},"html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}",{"title":16,"searchDepth":334,"depth":334,"links":1584},[1585,1586,1589,1590,1591,1592,1593,1594],{"id":42,"depth":324,"text":45},{"id":141,"depth":324,"text":144,"children":1587},[1588],{"id":272,"depth":334,"text":275},{"id":283,"depth":324,"text":286},{"id":844,"depth":324,"text":847},{"id":1248,"depth":324,"text":1251},{"id":1315,"depth":324,"text":1318},{"id":1357,"depth":324,"text":1360},{"id":1515,"depth":324,"text":1518},"markdown","content:en:ai:rag-retrieval-quality-over-cost.md","content","en\u002Fai\u002Frag-retrieval-quality-over-cost.md","en\u002Fai\u002Frag-retrieval-quality-over-cost","md",1778709810245]