[{"data":1,"prerenderedAt":1603},["ShallowReactive",2],{"article-alternates":3,"article-\u002Ffr\u002Fai\u002Frag-production-retrieval-kalitesi":13},{"i18nKey":4,"paths":5},"ai-003-2026-05",{"de":6,"en":7,"es":8,"fr":9,"it":10,"ru":11,"tr":12},"\u002Fde\u002Fai\u002Frag-production-retrieval-kalitesi","\u002Fen\u002Fai\u002Frag-retrieval-quality-over-cost","\u002Fes\u002Fai\u002Frag-en-produccion-calidad-de-recuperacion-antes-que-costo","\u002Ffr\u002Fai\u002Frag-production-retrieval-kalitesi","\u002Fit\u002Fai\u002Frag-production-retrieval-kalitesi-once-gelir","\u002Fru\u002Fai\u002Fproduction-rag-retrieval-quality-over-cost","\u002Ftr\u002Fai\u002Fproductionda-rag-retrieval-kalitesi-costtan-once-gelir",{"_path":9,"_dir":14,"_draft":15,"_partial":15,"_locale":16,"title":17,"description":18,"publishedAt":19,"modifiedAt":19,"category":14,"i18nKey":4,"tags":20,"readingTime":26,"author":27,"body":28,"_type":1597,"_id":1598,"_source":1599,"_file":1600,"_stem":1601,"_extension":1602},"ai",false,"","RAG en Production : La Qualité de Récupération Avant le Coût","Si tu choisis mal ton modèle d'embedding, ta stratégie de chunking et ta setup d'évaluation, ton système RAG devient soit trop cher, soit trop lent, soit les deux. Quelles décisions prendre avant d'aller en production ?","2026-05-11",[21,22,23,24,25],"rag","embedding","chunking","llm-eval","retrieval-quality",9,"Roibase",{"type":29,"children":30,"toc":1585},"root",[31,39,46,51,73,94,103,123,139,145,150,155,174,218,223,256,269,276,281,287,292,300,827,837,842,848,860,1241,1246,1252,1264,1285,1290,1308,1313,1319,1324,1332,1350,1355,1361,1373,1478,1483,1501,1515,1521,1526,1569,1574,1579],{"type":32,"tag":33,"props":34,"children":35},"element","p",{},[36],{"type":37,"value":38},"text","Les systèmes RAG se sont généralisés en production depuis 2024. Les entreprises construisent des stacks embedding + vector database pour injecter leurs corpus documentaires dans les LLM. Mais la plupart des projets pilotes se heurtent au même problème : qualité de récupération médiocre, réponses incohérentes, coûts hors de contrôle. La racine ? Le choix hâtif du modèle d'embedding, de la stratégie de chunking et de la setup d'évaluation. Dans cet article, nous montrons quelles décisions n'ont pas de retour en arrière avant de mettre le pipeline RAG en production.",{"type":32,"tag":40,"props":41,"children":43},"h2",{"id":42},"modèle-dembedding-lalignement-domain-pas-la-dimension",[44],{"type":37,"value":45},"Modèle d'Embedding : L'Alignement Domain, Pas la Dimension",{"type":32,"tag":33,"props":47,"children":48},{},[49],{"type":37,"value":50},"Quand tu sélectionnes un modèle d'embedding, le premier réflexe est : « Quel est le meilleur score MTEB ? » Or, le classement des benchmarks ne garantit pas la performance en production. Ce qui compte, c'est l'alignement du modèle avec ton type de document et ton pattern de requête.",{"type":32,"tag":33,"props":52,"children":53},{},[54,56,63,65,71],{"type":37,"value":55},"Nous avons comparé OpenAI ",{"type":32,"tag":57,"props":58,"children":60},"code",{"className":59},[],[61],{"type":37,"value":62},"text-embedding-3-large",{"type":37,"value":64}," (3072 dim) et Cohere ",{"type":32,"tag":57,"props":66,"children":68},{"className":67},[],[69],{"type":37,"value":70},"embed-v3",{"type":37,"value":72}," (1024 dim). Cohere a livré un recall@10 plus constant sur les documents marketing (blogs, case studies, landing pages), car son jeu de données d'entraînement était dominé par du contenu business. 
Another example: `bge-large-en-v1.5` (1024 dims, self-hosted) is sufficient for legal documents, but on a multilingual corpus `multilingual-e5-large` (1024 dims) clearly outperforms it. Model size is not always a quality signal; the overlap between the training data and your domain is more critical.

**Selection criteria:**

1. Not the MTEB score, but recall@5 / MRR on your own evaluation set (sketched below)
2. Latency (API vs. self-hosted): batch embedding time for 512 documents
3. Cost per 1M tokens: OpenAI 3-large $0.13, Cohere v3 $0.10, self-hosted $0 but infrastructure required

If your corpus contains domain-specific jargon (pharma, finance, legal), fine-tuning an embedding model or adapting sentence transformers on your own data improves retrieval quality by 15-20%. This is [data and insight engineering](https://www.roibase.com.tr/fr/verianalizi) territory: building a training pipeline and keeping an eye on data quality.
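To make criterion 1 concrete, here is a minimal harness for measuring recall@k against your own evaluation set. It is a sketch, not a fixed API: `embed_fn` stands for any callable that maps a list of strings to an array of vectors, with an OpenAI, Cohere, or self-hosted sentence-transformers call behind it.

```python
import numpy as np

def recall_at_k(embed_fn, queries, corpus, relevant_ids, k=5):
    """relevant_ids[i] is the set of corpus indices relevant to queries[i]."""
    doc_vecs = np.asarray(embed_fn(corpus), dtype=float)   # (n_docs, dim)
    doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    q_vecs = np.asarray(embed_fn(queries), dtype=float)    # (n_queries, dim)
    q_vecs /= np.linalg.norm(q_vecs, axis=1, keepdims=True)
    sims = q_vecs @ doc_vecs.T                             # cosine similarity matrix
    scores = []
    for i, relevant in enumerate(relevant_ids):
        top_k = set(np.argsort(-sims[i])[:k].tolist())
        scores.append(len(top_k & relevant) / len(relevant))
    return float(np.mean(scores))
```

Run it once per candidate model with the same queries and corpus, and the numbers become directly comparable; that is the comparison MTEB cannot do for you.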
## Chunking Strategy: Fixed Size Doesn't Work

Most RAG implementations start with "512-token window with overlap" chunking. That is acceptable for markdown blogs, but it falls apart on a heterogeneous corpus (PDF, HTML, JSON).

The problems with fixed-size chunking:

- Headings get fragmented and semantic integrity is lost
- Tables and code blocks are cut through the middle
- The overlap strategy duplicates context and introduces retrieval noise

The alternative: **semantic chunking**. Split along sentence boundaries and the heading hierarchy. Instead of langchain's `RecursiveCharacterTextSplitter`, use `MarkdownTextSplitter` or a custom parser. For PDFs, use `pdfplumber` to separate tables from prose and apply a different chunking strategy to each.

For an e-commerce RAG stack, we split product documents into 3 chunk types:

- **Title + short description:** 128 tokens, lightweight for retrieval
- **Technical specs + table:** 256 tokens, structured data
- **Long-form content (blog, guide):** 512 tokens, semantic split

We added metadata to every chunk (chunk_type, source_page). At retrieval time we applied chunk_type filters based on the query type; for example, "product comparison" queries only look at `technical_specs` chunks. This raised precision@3 by 18%.
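As a dependency-free illustration of heading-aware splitting (a simplified stand-in for `MarkdownTextSplitter`, using a whitespace word count as a rough token proxy):

```python
import re

def chunk_markdown(text, max_tokens=512):
    # Split at heading boundaries so a heading stays attached to its body.
    sections = re.split(r"(?m)^(?=#{1,3} )", text)
    chunks = []
    for section in sections:
        if not section.strip():
            continue
        if len(section.split()) <= max_tokens:
            chunks.append(section.strip())
        else:
            # Long section: fall back to paragraph-level splits.
            chunks.extend(p.strip() for p in section.split("\n\n") if p.strip())
    return chunks
```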
### Overlap Strategy: How Much Is Enough?

Overlap is usually recommended at 10-20%, but that figure is arbitrary. Test result: 50 tokens of overlap on 512-token chunks preserves semantic continuity, while 100 tokens of overlap adds 12% retrieval latency with no quality gain. The sweet spot varies by domain; test against your own evaluation set.

## Evaluation Setup: Build It Before Production

Most RAG systems ship to production on a "looks visually fine" test. But without a structured evaluation setup that measures retrieval quality, the system will only be reliable for the first 1000 queries.

**Minimal evaluation pipeline:**

```python
# eval_set.json: reference dataset
[
  {
    "query": "How do we collect GDPR-compliant user consent?",
    "expected_docs": ["doc_42", "doc_89"],
    "expected_answer_contains": ["cookie notice", "explicit consent"]
  },
  ...
]

# evaluation metrics
def evaluate_retrieval(query, retrieved_docs, expected_docs):
    recall_at_k = len(set(retrieved_docs[:5]) & set(expected_docs)) / len(expected_docs)
    # MRR simplified to the rank of the first expected doc
    mrr = 1 / (retrieved_docs.index(expected_docs[0]) + 1) if expected_docs[0] in retrieved_docs else 0
    return {"recall@5": recall_at_k, "mrr": mrr}

def evaluate_generation(generated_answer, expected_contains):
    # LLM-as-judge: ask Claude "does this answer cover the expected content?"
    prompt = f"Expected: {expected_contains}\nGenerated: {generated_answer}\nScore 0-1:"
    score = claude_api(prompt)  # claude_api() is a placeholder for your LLM client
    return float(score)
```

**Evaluation frequency:** after every embedding model change and every chunking strategy adjustment. It must be automated in CI/CD: if recall@5 < 0.7, the deployment must be blocked.

From real production: we built a 200-query evaluation set for a client, and the evaluation pipeline ran automatically on every commit. One chunking change raised recall@5 from 0.68 to 0.81, but p95 latency jumped from 340ms to 520ms. Seeing that tradeoff on the dashboard, we rejected the change and tested another approach. Without evaluation, the tradeoff would have been invisible.
## Hybrid Search: Sparse + Dense Retrieval

Relying on vector similarity alone fails on edge cases. For example, queries that demand an exact match (product codes, API endpoint names) can score low in vector search. This is where **hybrid search** comes in: combine BM25 (sparse) and embedding (dense) scores.

```python
# hybrid retrieval example
bm25_results = bm25_index.search(query, top_k=20)
vector_results = vector_db.search(query_embedding, top_k=20)

# RRF (Reciprocal Rank Fusion)
def rrf_score(rank, k=60):
    return 1 / (k + rank)

combined_scores = {}
for rank, doc in enumerate(bm25_results, start=1):   # ranks are 1-based in RRF
    combined_scores[doc.id] = combined_scores.get(doc.id, 0) + rrf_score(rank)
for rank, doc in enumerate(vector_results, start=1):
    combined_scores[doc.id] = combined_scores.get(doc.id, 0) + rrf_score(rank)

final_results = sorted(combined_scores.items(), key=lambda x: x[1], reverse=True)[:5]
```
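A note on the fusion step: BM25 scores and cosine similarities live on incompatible scales, so summing raw scores would require careful normalization. RRF sidesteps the problem by using only ranks, and `k=60` is the default popularized by the original RRF paper; a larger `k` flattens the differences between neighboring ranks.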
Test result: hybrid search lifted recall@5 by 22% on technical queries. But latency doubled, because you are running two separate index queries. If that tradeoff is acceptable (say, an internal tool where <500ms is fine), hybrid search holds up in production.

## Reranking: Second-Stage Filtering

The first retrieval stage (BM25 + vector) brings back 20-50 documents, but they won't all fit in the LLM context (cost + token limits). This is where a **reranking model** comes in: it recomputes each document's relevance score against the query and selects the top 5.

Models like Cohere `rerank-english-v2.0` or `bge-reranker-large` are used here. Reranking relies on a cross-encoder architecture: it encodes the query and the document together, which makes it more expensive than embedding but more precise.

Benchmark, applying reranking to 50 documents:

- Recall@5: 0.73 → 0.89
- Latency: +180ms (acceptable)
- Cost: +$0.002 per retrieval (Cohere API)

If the budget is tight you can run a self-hosted reranker, but it requires GPU inference. At that point you have to weigh infrastructure cost against API cost.
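A sketch of the self-hosted route, using the sentence-transformers `CrossEncoder` wrapper around the `bge-reranker-large` model mentioned above; the `doc.text` attribute is an assumed document shape, not a library contract:

```python
from sentence_transformers import CrossEncoder

# Cross-encoders score each (query, document) pair jointly, which is
# why they are slower but more precise than bi-encoder embeddings.
reranker = CrossEncoder("BAAI/bge-reranker-large")  # GPU strongly recommended

def rerank(query, docs, top_n=5):
    pairs = [(query, doc.text) for doc in docs]
    scores = reranker.predict(pairs)
    ranked = sorted(zip(docs, scores), key=lambda x: x[1], reverse=True)
    return ranked[:top_n]
```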
## Context Window Optimization: Fewer Documents, Better Answers

Sending 20 documents to the LLM does not always produce a better answer. Long context creates the "lost in the middle" problem: the model skips over information buried in the middle. Test result: sending 5 documents to GPT-4 Turbo produced better answers than sending 15 (an 11% BLEU gap).

**Optimization strategy:**

1. Use the reranker to select the top 5
2. Drop documents with a relevance score < 0.6
3. Send the remaining 3-5 documents to the LLM context

This approach cuts token cost (70% input reduction) while improving answer quality. In production you have to find the sweet spot in the cost/latency/quality triangle, and the evaluation pipeline is what makes it visible.
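Strung together, the three steps look like this; a sketch that reuses the `rerank()` function from the previous section, where the 0.6 cutoff assumes scores normalized to 0-1 (apply a sigmoid to raw cross-encoder logits if needed):

```python
def build_context(query, candidates, min_score=0.6, max_docs=5):
    ranked = rerank(query, candidates, top_n=max_docs)           # step 1: top-5 rerank
    kept = [doc for doc, score in ranked if score >= min_score]  # step 2: score cut
    return "\n\n---\n\n".join(doc.text for doc in kept)          # step 3: 3-5 docs as context
```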
%",{"type":32,"tag":1382,"props":1461,"children":1462},{},[1463,1468,1473],{"type":32,"tag":1409,"props":1464,"children":1465},{},[1466],{"type":37,"value":1467},"Score de pertinence moyen",{"type":32,"tag":1409,"props":1469,"children":1470},{},[1471],{"type":37,"value":1472},"> 0.65",{"type":32,"tag":1409,"props":1474,"children":1475},{},[1476],{"type":37,"value":1477},"\u003C 0.55",{"type":32,"tag":33,"props":1479,"children":1480},{},[1481],{"type":37,"value":1482},"Si tu observes un drift de recall :",{"type":32,"tag":104,"props":1484,"children":1485},{},[1486,1491,1496],{"type":32,"tag":108,"props":1487,"children":1488},{},[1489],{"type":37,"value":1490},"Mets à jour l'ensemble d'évaluation (ajoute les nouveaux patterns de requête)",{"type":32,"tag":108,"props":1492,"children":1493},{},[1494],{"type":37,"value":1495},"Fine-tune ou change le modèle d'embedding",{"type":32,"tag":108,"props":1497,"children":1498},{},[1499],{"type":37,"value":1500},"Réexamine la stratégie de chunking",{"type":32,"tag":33,"props":1502,"children":1503},{},[1504,1506,1513],{"type":37,"value":1505},"Ce monitoring relève de ",{"type":32,"tag":129,"props":1507,"children":1510},{"href":1508,"rel":1509},"https:\u002F\u002Fwww.roibase.com.tr\u002Ffr\u002Ffirstparty",[133],[1511],{"type":37,"value":1512},"l'architecture de données et de mesure first-party",{"type":37,"value":1514}," — le système RAG est un data pipeline, il doit être observable.",{"type":32,"tag":40,"props":1516,"children":1518},{"id":1517},"tradeoff-coût-vs-qualité-choix-pragmatiques",[1519],{"type":37,"value":1520},"Tradeoff Coût vs Qualité : Choix Pragmatiques",{"type":32,"tag":33,"props":1522,"children":1523},{},[1524],{"type":37,"value":1525},"En RAG production, chaque décision implique un tradeoff coût\u002Fqualité\u002Flatence. Quelques choix pragmatiques :",{"type":32,"tag":156,"props":1527,"children":1528},{},[1529,1539,1549,1559],{"type":32,"tag":108,"props":1530,"children":1531},{},[1532,1537],{"type":32,"tag":98,"props":1533,"children":1534},{},[1535],{"type":37,"value":1536},"Modèle d'embedding :",{"type":37,"value":1538}," Utilise Cohere v3 au lieu d'OpenAI 3-large → réduction de coût de 30 %, perte de qualité de 2 % (acceptable)",{"type":32,"tag":108,"props":1540,"children":1541},{},[1542,1547],{"type":32,"tag":98,"props":1543,"children":1544},{},[1545],{"type":37,"value":1546},"Reranking :",{"type":37,"value":1548}," Rerank seulement les requêtes ambiguës au lieu de toutes → latence réduite de 40 %",{"type":32,"tag":108,"props":1550,"children":1551},{},[1552,1557],{"type":32,"tag":98,"props":1553,"children":1554},{},[1555],{"type":37,"value":1556},"Recherche hybride :",{"type":37,"value":1558}," Vector seul au lieu de BM25 + vector (si la correspondance exacte n'est pas critique) → latence réduite de 50 %",{"type":32,"tag":108,"props":1560,"children":1561},{},[1562,1567],{"type":32,"tag":98,"props":1563,"children":1564},{},[1565],{"type":37,"value":1566},"Fenêtre contextuelle :",{"type":37,"value":1568}," 5 documents au lieu de 10 → réduction du coût en tokens de 60 %, augmentation de qualité de 8 %",{"type":32,"tag":33,"props":1570,"children":1571},{},[1572],{"type":37,"value":1573},"Pour voir ces tradeoffs, le pipeline d'évaluation est obligatoire. 
## Cost vs. Quality Tradeoff: Pragmatic Choices

In production RAG, every decision involves a cost/quality/latency tradeoff. A few pragmatic choices:

- **Embedding model:** Cohere v3 instead of OpenAI 3-large → 30% cost reduction, 2% quality loss (acceptable)
- **Reranking:** rerank only ambiguous queries instead of all of them → 40% lower latency
- **Hybrid search:** vector only instead of BM25 + vector (when exact match isn't critical) → 50% lower latency
- **Context window:** 5 documents instead of 10 → 60% token cost reduction, 8% quality gain

To see these tradeoffs, the evaluation pipeline is mandatory. Otherwise you say "I switched the embedding model, it's cheaper" and never notice that retrieval quality dropped 15%.

Before taking your RAG system to production, take the embedding model, the chunking strategy, and the evaluation setup seriously. Cost optimization is a second-phase concern: stabilize retrieval quality first, then cut costs. Otherwise the system's unreliability lands on your users and adoption collapses.