[{"data":1,"prerenderedAt":2637},["ShallowReactive",2],{"article-alternates":3,"article-\u002Fru\u002Fai\u002Fprompt-versionierung-und-ab-tests-llm-ops-disziplin":13},{"i18nKey":4,"paths":5},"ai-004-2026-05",{"de":6,"en":7,"es":8,"fr":9,"it":10,"ru":11,"tr":12},"\u002Fde\u002Fai\u002Fprompt-versionierung-llm-evaluation","\u002Fen\u002Fai\u002Fllm-ops-prompt-versioning-ab-testing","\u002Fes\u002Fai\u002Fversionado-prompts-ab-testing-llm-ops","\u002Ffr\u002Fai\u002Fversionamento-prompt-ab-test","\u002Fit\u002Fai\u002Fversionamento-prompt-e-a-b-test-disciplina-llm-ops","\u002Fru\u002Fai\u002Fprompt-versionierung-und-ab-tests-llm-ops-disziplin","\u002Ftr\u002Fai\u002Fprompt-versiyonlama-ve-a-b-testi-llm-operasyonun-disiplini",{"_path":11,"_dir":14,"_draft":15,"_partial":15,"_locale":16,"title":17,"description":18,"publishedAt":19,"modifiedAt":19,"category":14,"i18nKey":4,"tags":20,"readingTime":26,"author":27,"body":28,"_type":2631,"_id":2632,"_source":2633,"_file":2634,"_stem":2635,"_extension":2636},"ai",false,"","Prompt-Versionierung und A\u002FB-Tests: Die Disziplin von LLM-Operationen","Wie man in Production-LLM-Systemen Prompt-Versionierung, Evaluation Pipelines und deterministische Qualitätskontrolle mit Promptfoo\u002FLangSmith aufbaut.","2026-05-13",[21,22,23,24,25],"llm-ops","prompt-engineering","evaluation","mlops","ai-qualitaet",9,"Roibase",{"type":29,"children":30,"toc":2619},"root",[31,39,44,51,56,78,83,103,108,114,119,130,148,161,296,306,324,329,643,653,671,687,694,699,704,1020,1033,1117,1122,1128,1133,1185,1190,1480,1485,1704,1709,1715,1720,1732,1737,2062,2067,2073,2085,2090,2419,2424,2430,2435,2440,2473,2478,2490,2496,2501,2590,2595,2599,2613],{"type":32,"tag":33,"props":34,"children":35},"element","p",{},[36],{"type":37,"value":38},"text","Zwischen „funktioniert\" und „zuverlässig in Production\" liegen in LLM-Systemen 15 Schritte. 
Die Claude API generiert Markdown-Output in der Marketing-Automatisierung, GPT segmentiert Customer Journeys. Aber wie stellst du sicher, dass eine Prompt-Änderung keine Regression verursacht? In der Softwareentwicklung sind Versionierung, Test Coverage und CI\u002FCD Standard; ohne diese Disziplin ist in LLM-Operationen jedes Deployment ein Glücksspiel.",{"type":32,"tag":33,"props":40,"children":41},{},[42],{"type":37,"value":43},"Tools wie Promptfoo und LangSmith etablieren genau diese Disziplin: Prompt-Versionierung, deterministische Evaluations, A\u002FB-Tests, Metrik-Tracking. Dieser Artikel zeigt, wie du Qualitätskontrolle in Production-LLM-Systemen aufbaust, und zwar nicht auf Code-Ebene, sondern auf Infrastruktur-Ebene.",{"type":32,"tag":45,"props":46,"children":48},{"id":47},"der-irrglaube-dass-prompts-keine-software-dateien-sind",[49],{"type":37,"value":50},"Der Irrglaube, dass Prompts keine Software-Dateien sind",{"type":32,"tag":33,"props":52,"children":53},{},[54],{"type":37,"value":55},"Die meisten Teams behandeln Prompts als „Konfigurationsdateien\": Editor in der UI, Dokumentation in Notion, hardcodierter Text-Node im n8n Workflow. In Wirklichkeit ist ein Prompt eine ausführbare Spezifikation, die das Systemverhalten definiert. Trotzdem gibt es meist keine Versionierung, kein Diff, kein Rollback.",{"type":32,"tag":33,"props":57,"children":58},{},[59,61,68,70,76],{"type":37,"value":60},"Ein Git-Commit mit der Nachricht „fix typo\" kann den Ton des Model-Outputs ändern und Metriken senken. Besonders in Structured-Output-Szenarien (JSON Schema, Markdown Frontmatter, SQL Query) kann ein einzelnes Wort zu Format-Fehlern mit Kettenreaktionen führen. 
Beispiel: ",{"type":32,"tag":62,"props":63,"children":65},"code",{"className":64},[],[66],{"type":37,"value":67},"OUTPUT FORMAT: JSON",{"type":37,"value":69}," statt ",{"type":32,"tag":62,"props":71,"children":73},{"className":72},[],[74],{"type":37,"value":75},"OUTPUT FORMAT: Valid JSON",{"type":37,"value":77}," kann das Modell dazu bringen, erklärende Absätze hinzuzufügen: Der Downstream-Parser crasht, Alerts schlagen an, und es folgen drei Stunden Debugging.",{"type":32,"tag":33,"props":79,"children":80},{},[81],{"type":37,"value":82},"Eine Versionierungs-Disziplin muss diese Fragen beantworten:",{"type":32,"tag":84,"props":85,"children":86},"ul",{},[87,93,98],{"type":32,"tag":88,"props":89,"children":90},"li",{},[91],{"type":37,"value":92},"Welche Prompt-Version läuft gerade in Production?",{"type":32,"tag":88,"props":94,"children":95},{},[96],{"type":37,"value":97},"Welcher Leistungsunterschied besteht zwischen der Version von vor 2 Wochen und der aktuellen?",{"type":32,"tag":88,"props":99,"children":100},{},[101],{"type":37,"value":102},"Welche A\u002FB-Test-Variante hat die Conversion um 8% erhöht?",{"type":32,"tag":33,"props":104,"children":105},{},[106],{"type":37,"value":107},"Wenn du diese Fragen nicht beantworten kannst, führst du keine „KI-Operationen\" durch, sondern machst manuelle Experimente.",{"type":32,"tag":45,"props":109,"children":111},{"id":110},"evaluation-pipeline-drei-schichten-zum-messen-von-output",[112],{"type":37,"value":113},"Evaluation Pipeline: Drei Schichten zum Messen von Output",{"type":32,"tag":33,"props":115,"children":116},{},[117],{"type":37,"value":118},"LLM-Output zu evaluieren wirkt subjektiv, aber in Production-Systemen lassen sich deterministische Metriken etablieren. 
Evaluation funktioniert auf drei Schichten: Syntax, Semantik, Business Outcome.",{"type":32,"tag":33,"props":120,"children":121},{},[122,128],{"type":32,"tag":123,"props":124,"children":125},"strong",{},[126],{"type":37,"value":127},"Syntax-Schicht",{"type":37,"value":129}," — Format-Konformität:",{"type":32,"tag":84,"props":131,"children":132},{},[133,138,143],{"type":32,"tag":88,"props":134,"children":135},{},[136],{"type":37,"value":137},"Parsed JSON korrekt?",{"type":32,"tag":88,"props":139,"children":140},{},[141],{"type":37,"value":142},"Ist Markdown Frontmatter gültig?",{"type":32,"tag":88,"props":144,"children":145},{},[146],{"type":37,"value":147},"Sind alle erwarteten Felder vorhanden?",{"type":32,"tag":33,"props":149,"children":150},{},[151,153,159],{"type":37,"value":152},"In Promptfoo mit ",{"type":32,"tag":62,"props":154,"children":156},{"className":155},[],[157],{"type":37,"value":158},"javascript",{"type":37,"value":160}," Assertions:",{"type":32,"tag":162,"props":163,"children":166},"pre",{"className":164,"code":165,"language":158,"meta":16,"style":16},"language-javascript shiki shiki-themes github-dark","assert: [\n  {\n    type: \"javascript\",\n    value: \"JSON.parse(output).title.length \u003C= 60\"\n  },\n  {\n    type: \"is-json\",\n    value: true\n  }\n]\n",[167],{"type":32,"tag":62,"props":168,"children":169},{"__ignoreMap":16},[170,188,197,217,231,240,248,265,279,287],{"type":32,"tag":171,"props":172,"children":175},"span",{"class":173,"line":174},"line",1,[176,182],{"type":32,"tag":171,"props":177,"children":179},{"style":178},"--shiki-default:#B392F0",[180],{"type":37,"value":181},"assert",{"type":32,"tag":171,"props":183,"children":185},{"style":184},"--shiki-default:#E1E4E8",[186],{"type":37,"value":187},": [\n",{"type":32,"tag":171,"props":189,"children":191},{"class":173,"line":190},2,[192],{"type":32,"tag":171,"props":193,"children":194},{"style":184},[195],{"type":37,"value":196},"  
{\n",{"type":32,"tag":171,"props":198,"children":200},{"class":173,"line":199},3,[201,206,212],{"type":32,"tag":171,"props":202,"children":203},{"style":184},[204],{"type":37,"value":205},"    type: ",{"type":32,"tag":171,"props":207,"children":209},{"style":208},"--shiki-default:#9ECBFF",[210],{"type":37,"value":211},"\"javascript\"",{"type":32,"tag":171,"props":213,"children":214},{"style":184},[215],{"type":37,"value":216},",\n",{"type":32,"tag":171,"props":218,"children":220},{"class":173,"line":219},4,[221,226],{"type":32,"tag":171,"props":222,"children":223},{"style":184},[224],{"type":37,"value":225},"    value: ",{"type":32,"tag":171,"props":227,"children":228},{"style":208},[229],{"type":37,"value":230},"\"JSON.parse(output).title.length \u003C= 60\"\n",{"type":32,"tag":171,"props":232,"children":234},{"class":173,"line":233},5,[235],{"type":32,"tag":171,"props":236,"children":237},{"style":184},[238],{"type":37,"value":239},"  },\n",{"type":32,"tag":171,"props":241,"children":243},{"class":173,"line":242},6,[244],{"type":32,"tag":171,"props":245,"children":246},{"style":184},[247],{"type":37,"value":196},{"type":32,"tag":171,"props":249,"children":251},{"class":173,"line":250},7,[252,256,261],{"type":32,"tag":171,"props":253,"children":254},{"style":184},[255],{"type":37,"value":205},{"type":32,"tag":171,"props":257,"children":258},{"style":208},[259],{"type":37,"value":260},"\"is-json\"",{"type":32,"tag":171,"props":262,"children":263},{"style":184},[264],{"type":37,"value":216},{"type":32,"tag":171,"props":266,"children":268},{"class":173,"line":267},8,[269,273],{"type":32,"tag":171,"props":270,"children":271},{"style":184},[272],{"type":37,"value":225},{"type":32,"tag":171,"props":274,"children":276},{"style":275},"--shiki-default:#79B8FF",[277],{"type":37,"value":278},"true\n",{"type":32,"tag":171,"props":280,"children":281},{"class":173,"line":26},[282],{"type":32,"tag":171,"props":283,"children":284},{"style":184},[285],{"type":37,"value":286},"  
}\n",{"type":32,"tag":171,"props":288,"children":290},{"class":173,"line":289},10,[291],{"type":32,"tag":171,"props":292,"children":293},{"style":184},[294],{"type":37,"value":295},"]\n",{"type":32,"tag":33,"props":297,"children":298},{},[299,304],{"type":32,"tag":123,"props":300,"children":301},{},[302],{"type":37,"value":303},"Semantik-Schicht",{"type":37,"value":305}," (Inhaltsqualität):",{"type":32,"tag":84,"props":307,"children":308},{},[309,314,319],{"type":32,"tag":88,"props":310,"children":311},{},[312],{"type":37,"value":313},"Ist die Antwort themenrelevant? (Embedding Similarity, Cosine Similarity > 0.85)",{"type":32,"tag":88,"props":315,"children":316},{},[317],{"type":37,"value":318},"Gibt es verbotene Wörter? (Regex, Token Filtering)",{"type":32,"tag":88,"props":320,"children":321},{},[322],{"type":37,"value":323},"Ist der Ton korrekt? (Classifier Model, Sentiment Score)",{"type":32,"tag":33,"props":325,"children":326},{},[327],{"type":37,"value":328},"Custom Evaluator in LangSmith:",{"type":32,"tag":162,"props":330,"children":334},{"className":331,"code":332,"language":333,"meta":16,"style":16},"language-python shiki shiki-themes github-dark","from langsmith import evaluate\n\ndef check_brand_compliance(run, example):\n    forbidden = [\"experte\", \"marktführer\", \"revolutionär\"]\n    output = run.outputs[\"text\"].lower()\n    violations = [w for w in forbidden if w in output]\n    return {\"score\": 0 if violations else 1, \"violations\": violations}\n\nevaluate(\n    dataset_name=\"marketing_blog_posts\",\n    
evaluators=[check_brand_compliance]\n)\n","python",[335],{"type":32,"tag":62,"props":336,"children":337},{"__ignoreMap":16},[338,362,371,389,435,462,517,579,586,594,616,634],{"type":32,"tag":171,"props":339,"children":340},{"class":173,"line":174},[341,347,352,357],{"type":32,"tag":171,"props":342,"children":344},{"style":343},"--shiki-default:#F97583",[345],{"type":37,"value":346},"from",{"type":32,"tag":171,"props":348,"children":349},{"style":184},[350],{"type":37,"value":351}," langsmith ",{"type":32,"tag":171,"props":353,"children":354},{"style":343},[355],{"type":37,"value":356},"import",{"type":32,"tag":171,"props":358,"children":359},{"style":184},[360],{"type":37,"value":361}," evaluate\n",{"type":32,"tag":171,"props":363,"children":364},{"class":173,"line":190},[365],{"type":32,"tag":171,"props":366,"children":368},{"emptyLinePlaceholder":367},true,[369],{"type":37,"value":370},"\n",{"type":32,"tag":171,"props":372,"children":373},{"class":173,"line":199},[374,379,384],{"type":32,"tag":171,"props":375,"children":376},{"style":343},[377],{"type":37,"value":378},"def",{"type":32,"tag":171,"props":380,"children":381},{"style":178},[382],{"type":37,"value":383}," check_brand_compliance",{"type":32,"tag":171,"props":385,"children":386},{"style":184},[387],{"type":37,"value":388},"(run, example):\n",{"type":32,"tag":171,"props":390,"children":391},{"class":173,"line":219},[392,397,402,407,412,417,422,426,431],{"type":32,"tag":171,"props":393,"children":394},{"style":184},[395],{"type":37,"value":396},"    forbidden ",{"type":32,"tag":171,"props":398,"children":399},{"style":343},[400],{"type":37,"value":401},"=",{"type":32,"tag":171,"props":403,"children":404},{"style":184},[405],{"type":37,"value":406}," [",{"type":32,"tag":171,"props":408,"children":409},{"style":208},[410],{"type":37,"value":411},"\"experte\"",{"type":32,"tag":171,"props":413,"children":414},{"style":184},[415],{"type":37,"value":416},", 
",{"type":32,"tag":171,"props":418,"children":419},{"style":208},[420],{"type":37,"value":421},"\"marktführer\"",{"type":32,"tag":171,"props":423,"children":424},{"style":184},[425],{"type":37,"value":416},{"type":32,"tag":171,"props":427,"children":428},{"style":208},[429],{"type":37,"value":430},"\"revolutionär\"",{"type":32,"tag":171,"props":432,"children":433},{"style":184},[434],{"type":37,"value":295},{"type":32,"tag":171,"props":436,"children":437},{"class":173,"line":233},[438,443,447,452,457],{"type":32,"tag":171,"props":439,"children":440},{"style":184},[441],{"type":37,"value":442},"    output ",{"type":32,"tag":171,"props":444,"children":445},{"style":343},[446],{"type":37,"value":401},{"type":32,"tag":171,"props":448,"children":449},{"style":184},[450],{"type":37,"value":451}," run.outputs[",{"type":32,"tag":171,"props":453,"children":454},{"style":208},[455],{"type":37,"value":456},"\"text\"",{"type":32,"tag":171,"props":458,"children":459},{"style":184},[460],{"type":37,"value":461},"].lower()\n",{"type":32,"tag":171,"props":463,"children":464},{"class":173,"line":242},[465,470,474,479,484,489,494,499,504,508,512],{"type":32,"tag":171,"props":466,"children":467},{"style":184},[468],{"type":37,"value":469},"    violations ",{"type":32,"tag":171,"props":471,"children":472},{"style":343},[473],{"type":37,"value":401},{"type":32,"tag":171,"props":475,"children":476},{"style":184},[477],{"type":37,"value":478}," [w ",{"type":32,"tag":171,"props":480,"children":481},{"style":343},[482],{"type":37,"value":483},"for",{"type":32,"tag":171,"props":485,"children":486},{"style":184},[487],{"type":37,"value":488}," w ",{"type":32,"tag":171,"props":490,"children":491},{"style":343},[492],{"type":37,"value":493},"in",{"type":32,"tag":171,"props":495,"children":496},{"style":184},[497],{"type":37,"value":498}," forbidden 
",{"type":32,"tag":171,"props":500,"children":501},{"style":343},[502],{"type":37,"value":503},"if",{"type":32,"tag":171,"props":505,"children":506},{"style":184},[507],{"type":37,"value":488},{"type":32,"tag":171,"props":509,"children":510},{"style":343},[511],{"type":37,"value":493},{"type":32,"tag":171,"props":513,"children":514},{"style":184},[515],{"type":37,"value":516}," output]\n",{"type":32,"tag":171,"props":518,"children":519},{"class":173,"line":250},[520,525,530,535,540,545,550,555,560,565,569,574],{"type":32,"tag":171,"props":521,"children":522},{"style":343},[523],{"type":37,"value":524},"    return",{"type":32,"tag":171,"props":526,"children":527},{"style":184},[528],{"type":37,"value":529}," {",{"type":32,"tag":171,"props":531,"children":532},{"style":208},[533],{"type":37,"value":534},"\"score\"",{"type":32,"tag":171,"props":536,"children":537},{"style":184},[538],{"type":37,"value":539},": ",{"type":32,"tag":171,"props":541,"children":542},{"style":275},[543],{"type":37,"value":544},"0",{"type":32,"tag":171,"props":546,"children":547},{"style":343},[548],{"type":37,"value":549}," if",{"type":32,"tag":171,"props":551,"children":552},{"style":184},[553],{"type":37,"value":554}," violations ",{"type":32,"tag":171,"props":556,"children":557},{"style":343},[558],{"type":37,"value":559},"else",{"type":32,"tag":171,"props":561,"children":562},{"style":275},[563],{"type":37,"value":564}," 1",{"type":32,"tag":171,"props":566,"children":567},{"style":184},[568],{"type":37,"value":416},{"type":32,"tag":171,"props":570,"children":571},{"style":208},[572],{"type":37,"value":573},"\"violations\"",{"type":32,"tag":171,"props":575,"children":576},{"style":184},[577],{"type":37,"value":578},": 
violations}\n",{"type":32,"tag":171,"props":580,"children":581},{"class":173,"line":267},[582],{"type":32,"tag":171,"props":583,"children":584},{"emptyLinePlaceholder":367},[585],{"type":37,"value":370},{"type":32,"tag":171,"props":587,"children":588},{"class":173,"line":26},[589],{"type":32,"tag":171,"props":590,"children":591},{"style":184},[592],{"type":37,"value":593},"evaluate(\n",{"type":32,"tag":171,"props":595,"children":596},{"class":173,"line":289},[597,603,607,612],{"type":32,"tag":171,"props":598,"children":600},{"style":599},"--shiki-default:#FFAB70",[601],{"type":37,"value":602},"    dataset_name",{"type":32,"tag":171,"props":604,"children":605},{"style":343},[606],{"type":37,"value":401},{"type":32,"tag":171,"props":608,"children":609},{"style":208},[610],{"type":37,"value":611},"\"marketing_blog_posts\"",{"type":32,"tag":171,"props":613,"children":614},{"style":184},[615],{"type":37,"value":216},{"type":32,"tag":171,"props":617,"children":619},{"class":173,"line":618},11,[620,625,629],{"type":32,"tag":171,"props":621,"children":622},{"style":599},[623],{"type":37,"value":624},"    evaluators",{"type":32,"tag":171,"props":626,"children":627},{"style":343},[628],{"type":37,"value":401},{"type":32,"tag":171,"props":630,"children":631},{"style":184},[632],{"type":37,"value":633},"[check_brand_compliance]\n",{"type":32,"tag":171,"props":635,"children":637},{"class":173,"line":636},12,[638],{"type":32,"tag":171,"props":639,"children":640},{"style":184},[641],{"type":37,"value":642},")\n",{"type":32,"tag":33,"props":644,"children":645},{},[646,651],{"type":32,"tag":123,"props":647,"children":648},{},[649],{"type":37,"value":650},"Business Outcome-Schicht",{"type":37,"value":652}," — echte Auswirkungen:",{"type":32,"tag":84,"props":654,"children":655},{},[656,661,666],{"type":32,"tag":88,"props":657,"children":658},{},[659],{"type":37,"value":660},"Hat sich CTR verändert?",{"type":32,"tag":88,"props":662,"children":663},{},[664],{"type":37,"value":665},"Ist 
die Conversion gesunken?",{"type":32,"tag":88,"props":667,"children":668},{},[669],{"type":37,"value":670},"Ist die Bounce Rate gestiegen?",{"type":32,"tag":33,"props":672,"children":673},{},[674,676,685],{"type":37,"value":675},"Diese Schicht verbindet sich mit Production Telemetry — im ",{"type":32,"tag":677,"props":678,"children":682},"a",{"href":679,"rel":680},"https:\u002F\u002Fwww.roibase.com.tr\u002Fru\u002Ffirstparty",[681],"nofollow",[683],{"type":37,"value":684},"First-Party Daten & Messfundament",{"type":37,"value":686}," System wird die Prompt-Version als Metadaten zum Event hinzugefügt, in BigQuery gejoined, ein dbt Model berechnet die Conversion Rate jeder Version.",{"type":32,"tag":688,"props":689,"children":691},"h3",{"id":690},"promptfoo-deterministische-test-suite-aufbauen",[692],{"type":37,"value":693},"Promptfoo: Deterministische Test Suite aufbauen",{"type":32,"tag":33,"props":695,"children":696},{},[697],{"type":37,"value":698},"Promptfoo ist ein lokal laufendes, YAML-basiertes Eval Framework. 
Ziel: Jede Prompt-Änderung vor Regression testen.",{"type":32,"tag":33,"props":700,"children":701},{},[702],{"type":37,"value":703},"Einfache Konfiguration:",{"type":32,"tag":162,"props":705,"children":709},{"className":706,"code":707,"language":708,"meta":16,"style":16},"language-yaml shiki shiki-themes github-dark","prompts:\n  - file:\u002F\u002Fprompts\u002Fmarketing_blog_v1.md\n  - file:\u002F\u002Fprompts\u002Fmarketing_blog_v2.md\n\nproviders:\n  - anthropic:messages:claude-3-5-sonnet-20241022\n\ntests:\n  - vars:\n      topic: \"Server-Side GTM\"\n      category: \"tech\"\n    assert:\n      - type: is-json\n      - type: javascript\n        value: \"output.title.length \u003C= 60\"\n      - type: similar\n        value: \"Server-Side Tracking Architektur\"\n        threshold: 0.8\n      - type: not-contains\n        value: \"revolutionär\"\n","yaml",[710],{"type":32,"tag":62,"props":711,"children":712},{"__ignoreMap":16},[713,727,740,752,759,771,783,790,802,818,835,852,864,887,908,926,947,964,982,1003],{"type":32,"tag":171,"props":714,"children":715},{"class":173,"line":174},[716,722],{"type":32,"tag":171,"props":717,"children":719},{"style":718},"--shiki-default:#85E89D",[720],{"type":37,"value":721},"prompts",{"type":32,"tag":171,"props":723,"children":724},{"style":184},[725],{"type":37,"value":726},":\n",{"type":32,"tag":171,"props":728,"children":729},{"class":173,"line":190},[730,735],{"type":32,"tag":171,"props":731,"children":732},{"style":184},[733],{"type":37,"value":734},"  - 
",{"type":32,"tag":171,"props":736,"children":737},{"style":208},[738],{"type":37,"value":739},"file:\u002F\u002Fprompts\u002Fmarketing_blog_v1.md\n",{"type":32,"tag":171,"props":741,"children":742},{"class":173,"line":199},[743,747],{"type":32,"tag":171,"props":744,"children":745},{"style":184},[746],{"type":37,"value":734},{"type":32,"tag":171,"props":748,"children":749},{"style":208},[750],{"type":37,"value":751},"file:\u002F\u002Fprompts\u002Fmarketing_blog_v2.md\n",{"type":32,"tag":171,"props":753,"children":754},{"class":173,"line":219},[755],{"type":32,"tag":171,"props":756,"children":757},{"emptyLinePlaceholder":367},[758],{"type":37,"value":370},{"type":32,"tag":171,"props":760,"children":761},{"class":173,"line":233},[762,767],{"type":32,"tag":171,"props":763,"children":764},{"style":718},[765],{"type":37,"value":766},"providers",{"type":32,"tag":171,"props":768,"children":769},{"style":184},[770],{"type":37,"value":726},{"type":32,"tag":171,"props":772,"children":773},{"class":173,"line":242},[774,778],{"type":32,"tag":171,"props":775,"children":776},{"style":184},[777],{"type":37,"value":734},{"type":32,"tag":171,"props":779,"children":780},{"style":208},[781],{"type":37,"value":782},"anthropic:messages:claude-3-5-sonnet-20241022\n",{"type":32,"tag":171,"props":784,"children":785},{"class":173,"line":250},[786],{"type":32,"tag":171,"props":787,"children":788},{"emptyLinePlaceholder":367},[789],{"type":37,"value":370},{"type":32,"tag":171,"props":791,"children":792},{"class":173,"line":267},[793,798],{"type":32,"tag":171,"props":794,"children":795},{"style":718},[796],{"type":37,"value":797},"tests",{"type":32,"tag":171,"props":799,"children":800},{"style":184},[801],{"type":37,"value":726},{"type":32,"tag":171,"props":803,"children":804},{"class":173,"line":26},[805,809,814],{"type":32,"tag":171,"props":806,"children":807},{"style":184},[808],{"type":37,"value":734},{"type":32,"tag":171,"props":810,"children":811},{"style":718},[812],{"type":37,"value":8
13},"vars",{"type":32,"tag":171,"props":815,"children":816},{"style":184},[817],{"type":37,"value":726},{"type":32,"tag":171,"props":819,"children":820},{"class":173,"line":289},[821,826,830],{"type":32,"tag":171,"props":822,"children":823},{"style":718},[824],{"type":37,"value":825},"      topic",{"type":32,"tag":171,"props":827,"children":828},{"style":184},[829],{"type":37,"value":539},{"type":32,"tag":171,"props":831,"children":832},{"style":208},[833],{"type":37,"value":834},"\"Server-Side GTM\"\n",{"type":32,"tag":171,"props":836,"children":837},{"class":173,"line":618},[838,843,847],{"type":32,"tag":171,"props":839,"children":840},{"style":718},[841],{"type":37,"value":842},"      category",{"type":32,"tag":171,"props":844,"children":845},{"style":184},[846],{"type":37,"value":539},{"type":32,"tag":171,"props":848,"children":849},{"style":208},[850],{"type":37,"value":851},"\"tech\"\n",{"type":32,"tag":171,"props":853,"children":854},{"class":173,"line":636},[855,860],{"type":32,"tag":171,"props":856,"children":857},{"style":718},[858],{"type":37,"value":859},"    assert",{"type":32,"tag":171,"props":861,"children":862},{"style":184},[863],{"type":37,"value":726},{"type":32,"tag":171,"props":865,"children":867},{"class":173,"line":866},13,[868,873,878,882],{"type":32,"tag":171,"props":869,"children":870},{"style":184},[871],{"type":37,"value":872},"      - 
",{"type":32,"tag":171,"props":874,"children":875},{"style":718},[876],{"type":37,"value":877},"type",{"type":32,"tag":171,"props":879,"children":880},{"style":184},[881],{"type":37,"value":539},{"type":32,"tag":171,"props":883,"children":884},{"style":208},[885],{"type":37,"value":886},"is-json\n",{"type":32,"tag":171,"props":888,"children":890},{"class":173,"line":889},14,[891,895,899,903],{"type":32,"tag":171,"props":892,"children":893},{"style":184},[894],{"type":37,"value":872},{"type":32,"tag":171,"props":896,"children":897},{"style":718},[898],{"type":37,"value":877},{"type":32,"tag":171,"props":900,"children":901},{"style":184},[902],{"type":37,"value":539},{"type":32,"tag":171,"props":904,"children":905},{"style":208},[906],{"type":37,"value":907},"javascript\n",{"type":32,"tag":171,"props":909,"children":911},{"class":173,"line":910},15,[912,917,921],{"type":32,"tag":171,"props":913,"children":914},{"style":718},[915],{"type":37,"value":916},"        value",{"type":32,"tag":171,"props":918,"children":919},{"style":184},[920],{"type":37,"value":539},{"type":32,"tag":171,"props":922,"children":923},{"style":208},[924],{"type":37,"value":925},"\"output.title.length \u003C= 
60\"\n",{"type":32,"tag":171,"props":927,"children":929},{"class":173,"line":928},16,[930,934,938,942],{"type":32,"tag":171,"props":931,"children":932},{"style":184},[933],{"type":37,"value":872},{"type":32,"tag":171,"props":935,"children":936},{"style":718},[937],{"type":37,"value":877},{"type":32,"tag":171,"props":939,"children":940},{"style":184},[941],{"type":37,"value":539},{"type":32,"tag":171,"props":943,"children":944},{"style":208},[945],{"type":37,"value":946},"similar\n",{"type":32,"tag":171,"props":948,"children":950},{"class":173,"line":949},17,[951,955,959],{"type":32,"tag":171,"props":952,"children":953},{"style":718},[954],{"type":37,"value":916},{"type":32,"tag":171,"props":956,"children":957},{"style":184},[958],{"type":37,"value":539},{"type":32,"tag":171,"props":960,"children":961},{"style":208},[962],{"type":37,"value":963},"\"Server-Side Tracking Architektur\"\n",{"type":32,"tag":171,"props":965,"children":967},{"class":173,"line":966},18,[968,973,977],{"type":32,"tag":171,"props":969,"children":970},{"style":718},[971],{"type":37,"value":972},"        
threshold",{"type":32,"tag":171,"props":974,"children":975},{"style":184},[976],{"type":37,"value":539},{"type":32,"tag":171,"props":978,"children":979},{"style":275},[980],{"type":37,"value":981},"0.8\n",{"type":32,"tag":171,"props":983,"children":985},{"class":173,"line":984},19,[986,990,994,998],{"type":32,"tag":171,"props":987,"children":988},{"style":184},[989],{"type":37,"value":872},{"type":32,"tag":171,"props":991,"children":992},{"style":718},[993],{"type":37,"value":877},{"type":32,"tag":171,"props":995,"children":996},{"style":184},[997],{"type":37,"value":539},{"type":32,"tag":171,"props":999,"children":1000},{"style":208},[1001],{"type":37,"value":1002},"not-contains\n",{"type":32,"tag":171,"props":1004,"children":1006},{"class":173,"line":1005},20,[1007,1011,1015],{"type":32,"tag":171,"props":1008,"children":1009},{"style":718},[1010],{"type":37,"value":916},{"type":32,"tag":171,"props":1012,"children":1013},{"style":184},[1014],{"type":37,"value":539},{"type":32,"tag":171,"props":1016,"children":1017},{"style":208},[1018],{"type":37,"value":1019},"\"revolutionär\"\n",{"type":32,"tag":33,"props":1021,"children":1022},{},[1023,1025,1031],{"type":37,"value":1024},"Mit ",{"type":32,"tag":62,"props":1026,"children":1028},{"className":1027},[],[1029],{"type":37,"value":1030},"promptfoo eval",{"type":37,"value":1032}," werden alle Varianten getestet, eine Metrik-Tabelle wird ausgegeben:",{"type":32,"tag":1034,"props":1035,"children":1036},"table",{},[1037,1066],{"type":32,"tag":1038,"props":1039,"children":1040},"thead",{},[1041],{"type":32,"tag":1042,"props":1043,"children":1044},"tr",{},[1045,1051,1056,1061],{"type":32,"tag":1046,"props":1047,"children":1048},"th",{},[1049],{"type":37,"value":1050},"Prompt",{"type":32,"tag":1046,"props":1052,"children":1053},{},[1054],{"type":37,"value":1055},"Pass Rate",{"type":32,"tag":1046,"props":1057,"children":1058},{},[1059],{"type":37,"value":1060},"Avg 
Latency",{"type":32,"tag":1046,"props":1062,"children":1063},{},[1064],{"type":37,"value":1065},"Cost",{"type":32,"tag":1067,"props":1068,"children":1069},"tbody",{},[1070,1094],{"type":32,"tag":1042,"props":1071,"children":1072},{},[1073,1079,1084,1089],{"type":32,"tag":1074,"props":1075,"children":1076},"td",{},[1077],{"type":37,"value":1078},"v1",{"type":32,"tag":1074,"props":1080,"children":1081},{},[1082],{"type":37,"value":1083},"92%",{"type":32,"tag":1074,"props":1085,"children":1086},{},[1087],{"type":37,"value":1088},"2.3s",{"type":32,"tag":1074,"props":1090,"children":1091},{},[1092],{"type":37,"value":1093},"$0.012",{"type":32,"tag":1042,"props":1095,"children":1096},{},[1097,1102,1107,1112],{"type":32,"tag":1074,"props":1098,"children":1099},{},[1100],{"type":37,"value":1101},"v2",{"type":32,"tag":1074,"props":1103,"children":1104},{},[1105],{"type":37,"value":1106},"98%",{"type":32,"tag":1074,"props":1108,"children":1109},{},[1110],{"type":37,"value":1111},"2.1s",{"type":32,"tag":1074,"props":1113,"children":1114},{},[1115],{"type":37,"value":1116},"$0.014",{"type":32,"tag":33,"props":1118,"children":1119},{},[1120],{"type":37,"value":1121},"v2 hat eine höhere Pass Rate (98% statt 92%), aber die Kosten sind um rund 17% gestiegen (von $0.012 auf $0.014). Der Token Count wächst, das muss im Detail überprüft werden. Ohne diesen Vergleich hätte die Änderung das monatliche Budget unbemerkt belastet.",{"type":32,"tag":45,"props":1123,"children":1125},{"id":1124},"ab-tests-prompt-varianten-in-production-vergleichen",[1126],{"type":37,"value":1127},"A\u002FB-Tests: Prompt-Varianten in Production vergleichen",{"type":32,"tag":33,"props":1129,"children":1130},{},[1131],{"type":37,"value":1132},"Die Evaluation Suite ist grün, jetzt braucht es echten Traffic. 
A\u002FB-Tests für LLM-Systeme funktionieren so:",{"type":32,"tag":1134,"props":1135,"children":1136},"ol",{},[1137,1147,1165,1175],{"type":32,"tag":88,"props":1138,"children":1139},{},[1140,1145],{"type":32,"tag":123,"props":1141,"children":1142},{},[1143],{"type":37,"value":1144},"Variant Routing",{"type":37,"value":1146},": Prompt-Version anhand der User- oder Session-ID auswählen (prozentualer Split)",{"type":32,"tag":88,"props":1148,"children":1149},{},[1150,1155,1157,1163],{"type":32,"tag":123,"props":1151,"children":1152},{},[1153],{"type":37,"value":1154},"Metadata Tagging",{"type":37,"value":1156},": ",{"type":32,"tag":62,"props":1158,"children":1160},{"className":1159},[],[1161],{"type":37,"value":1162},"prompt_version",{"type":37,"value":1164}," zu jedem API Call hinzufügen",{"type":32,"tag":88,"props":1166,"children":1167},{},[1168,1173],{"type":32,"tag":123,"props":1169,"children":1170},{},[1171],{"type":37,"value":1172},"Metric Tracking",{"type":37,"value":1174},": Variant-Info in Downstream Events mitführen",{"type":32,"tag":88,"props":1176,"children":1177},{},[1178,1183],{"type":32,"tag":123,"props":1179,"children":1180},{},[1181],{"type":37,"value":1182},"Statistical Significance",{"type":37,"value":1184},": erst entscheiden, wenn genug Samples gesammelt sind (Faustregel: mindestens 385 Observations pro Variante, was bei 95% Konfidenz einer Fehlermarge von etwa ±5 Prozentpunkten entspricht; kleine Effekte erfordern deutlich mehr)",{"type":32,"tag":33,"props":1186,"children":1187},{},[1188],{"type":37,"value":1189},"n8n Workflow Beispiel:",{"type":32,"tag":162,"props":1191,"children":1193},{"className":164,"code":1192,"language":158,"meta":16,"style":16},"\u002F\u002F A\u002FB Variant-Auswahl\nconst userId = $json.user_id;\nconst variant = (userId % 100 \u003C 50) ? 
'v1' : 'v2';\nconst promptUrl = `https:\u002F\u002Fraw.githubusercontent.com\u002Froibase\u002Fprompts\u002Fmain\u002F${variant}.md`;\n\n\u002F\u002F Metadaten zum API Call hinzufügen\nreturn {\n  json: {\n    prompt: await fetch(promptUrl).then(r => r.text()),\n    metadata: {\n      prompt_version: variant,\n      experiment_id: 'blog_tone_test_2026_05'\n    }\n  }\n};\n",[1194],{"type":32,"tag":62,"props":1195,"children":1196},{"__ignoreMap":16},[1197,1206,1229,1300,1335,1342,1350,1363,1371,1428,1436,1444,1457,1465,1472],{"type":32,"tag":171,"props":1198,"children":1199},{"class":173,"line":174},[1200],{"type":32,"tag":171,"props":1201,"children":1203},{"style":1202},"--shiki-default:#6A737D",[1204],{"type":37,"value":1205},"\u002F\u002F A\u002FB Variant-Auswahl\n",{"type":32,"tag":171,"props":1207,"children":1208},{"class":173,"line":190},[1209,1214,1219,1224],{"type":32,"tag":171,"props":1210,"children":1211},{"style":343},[1212],{"type":37,"value":1213},"const",{"type":32,"tag":171,"props":1215,"children":1216},{"style":275},[1217],{"type":37,"value":1218}," userId",{"type":32,"tag":171,"props":1220,"children":1221},{"style":343},[1222],{"type":37,"value":1223}," =",{"type":32,"tag":171,"props":1225,"children":1226},{"style":184},[1227],{"type":37,"value":1228}," $json.user_id;\n",{"type":32,"tag":171,"props":1230,"children":1231},{"class":173,"line":199},[1232,1236,1241,1245,1250,1255,1260,1265,1270,1275,1280,1285,1290,1295],{"type":32,"tag":171,"props":1233,"children":1234},{"style":343},[1235],{"type":37,"value":1213},{"type":32,"tag":171,"props":1237,"children":1238},{"style":275},[1239],{"type":37,"value":1240}," variant",{"type":32,"tag":171,"props":1242,"children":1243},{"style":343},[1244],{"type":37,"value":1223},{"type":32,"tag":171,"props":1246,"children":1247},{"style":184},[1248],{"type":37,"value":1249}," (userId 
",{"type":32,"tag":171,"props":1251,"children":1252},{"style":343},[1253],{"type":37,"value":1254},"%",{"type":32,"tag":171,"props":1256,"children":1257},{"style":275},[1258],{"type":37,"value":1259}," 100",{"type":32,"tag":171,"props":1261,"children":1262},{"style":343},[1263],{"type":37,"value":1264}," \u003C",{"type":32,"tag":171,"props":1266,"children":1267},{"style":275},[1268],{"type":37,"value":1269}," 50",{"type":32,"tag":171,"props":1271,"children":1272},{"style":184},[1273],{"type":37,"value":1274},") ",{"type":32,"tag":171,"props":1276,"children":1277},{"style":343},[1278],{"type":37,"value":1279},"?",{"type":32,"tag":171,"props":1281,"children":1282},{"style":208},[1283],{"type":37,"value":1284}," 'v1'",{"type":32,"tag":171,"props":1286,"children":1287},{"style":343},[1288],{"type":37,"value":1289}," :",{"type":32,"tag":171,"props":1291,"children":1292},{"style":208},[1293],{"type":37,"value":1294}," 'v2'",{"type":32,"tag":171,"props":1296,"children":1297},{"style":184},[1298],{"type":37,"value":1299},";\n",{"type":32,"tag":171,"props":1301,"children":1302},{"class":173,"line":219},[1303,1307,1312,1316,1321,1326,1331],{"type":32,"tag":171,"props":1304,"children":1305},{"style":343},[1306],{"type":37,"value":1213},{"type":32,"tag":171,"props":1308,"children":1309},{"style":275},[1310],{"type":37,"value":1311}," promptUrl",{"type":32,"tag":171,"props":1313,"children":1314},{"style":343},[1315],{"type":37,"value":1223},{"type":32,"tag":171,"props":1317,"children":1318},{"style":208},[1319],{"type":37,"value":1320}," 
`https:\u002F\u002Fraw.githubusercontent.com\u002Froibase\u002Fprompts\u002Fmain\u002F${",{"type":32,"tag":171,"props":1322,"children":1323},{"style":184},[1324],{"type":37,"value":1325},"variant",{"type":32,"tag":171,"props":1327,"children":1328},{"style":208},[1329],{"type":37,"value":1330},"}.md`",{"type":32,"tag":171,"props":1332,"children":1333},{"style":184},[1334],{"type":37,"value":1299},{"type":32,"tag":171,"props":1336,"children":1337},{"class":173,"line":233},[1338],{"type":32,"tag":171,"props":1339,"children":1340},{"emptyLinePlaceholder":367},[1341],{"type":37,"value":370},{"type":32,"tag":171,"props":1343,"children":1344},{"class":173,"line":242},[1345],{"type":32,"tag":171,"props":1346,"children":1347},{"style":1202},[1348],{"type":37,"value":1349},"\u002F\u002F Metadaten zum API Call hinzufügen\n",{"type":32,"tag":171,"props":1351,"children":1352},{"class":173,"line":250},[1353,1358],{"type":32,"tag":171,"props":1354,"children":1355},{"style":343},[1356],{"type":37,"value":1357},"return",{"type":32,"tag":171,"props":1359,"children":1360},{"style":184},[1361],{"type":37,"value":1362}," {\n",{"type":32,"tag":171,"props":1364,"children":1365},{"class":173,"line":267},[1366],{"type":32,"tag":171,"props":1367,"children":1368},{"style":184},[1369],{"type":37,"value":1370},"  json: {\n",{"type":32,"tag":171,"props":1372,"children":1373},{"class":173,"line":26},[1374,1379,1384,1389,1394,1399,1404,1409,1414,1419,1423],{"type":32,"tag":171,"props":1375,"children":1376},{"style":184},[1377],{"type":37,"value":1378},"    prompt: ",{"type":32,"tag":171,"props":1380,"children":1381},{"style":343},[1382],{"type":37,"value":1383},"await",{"type":32,"tag":171,"props":1385,"children":1386},{"style":178},[1387],{"type":37,"value":1388}," 
fetch",{"type":32,"tag":171,"props":1390,"children":1391},{"style":184},[1392],{"type":37,"value":1393},"(promptUrl).",{"type":32,"tag":171,"props":1395,"children":1396},{"style":178},[1397],{"type":37,"value":1398},"then",{"type":32,"tag":171,"props":1400,"children":1401},{"style":184},[1402],{"type":37,"value":1403},"(",{"type":32,"tag":171,"props":1405,"children":1406},{"style":599},[1407],{"type":37,"value":1408},"r",{"type":32,"tag":171,"props":1410,"children":1411},{"style":343},[1412],{"type":37,"value":1413}," =>",{"type":32,"tag":171,"props":1415,"children":1416},{"style":184},[1417],{"type":37,"value":1418}," r.",{"type":32,"tag":171,"props":1420,"children":1421},{"style":178},[1422],{"type":37,"value":37},{"type":32,"tag":171,"props":1424,"children":1425},{"style":184},[1426],{"type":37,"value":1427},"()),\n",{"type":32,"tag":171,"props":1429,"children":1430},{"class":173,"line":289},[1431],{"type":32,"tag":171,"props":1432,"children":1433},{"style":184},[1434],{"type":37,"value":1435},"    metadata: {\n",{"type":32,"tag":171,"props":1437,"children":1438},{"class":173,"line":618},[1439],{"type":32,"tag":171,"props":1440,"children":1441},{"style":184},[1442],{"type":37,"value":1443},"      prompt_version: variant,\n",{"type":32,"tag":171,"props":1445,"children":1446},{"class":173,"line":636},[1447,1452],{"type":32,"tag":171,"props":1448,"children":1449},{"style":184},[1450],{"type":37,"value":1451},"      experiment_id: ",{"type":32,"tag":171,"props":1453,"children":1454},{"style":208},[1455],{"type":37,"value":1456},"'blog_tone_test_2026_05'\n",{"type":32,"tag":171,"props":1458,"children":1459},{"class":173,"line":866},[1460],{"type":32,"tag":171,"props":1461,"children":1462},{"style":184},[1463],{"type":37,"value":1464},"    
}\n",{"type":32,"tag":171,"props":1466,"children":1467},{"class":173,"line":889},[1468],{"type":32,"tag":171,"props":1469,"children":1470},{"style":184},[1471],{"type":37,"value":286},{"type":32,"tag":171,"props":1473,"children":1474},{"class":173,"line":910},[1475],{"type":32,"tag":171,"props":1476,"children":1477},{"style":184},[1478],{"type":37,"value":1479},"};\n",{"type":32,"tag":33,"props":1481,"children":1482},{},[1483],{"type":37,"value":1484},"Analyse in BigQuery:",{"type":32,"tag":162,"props":1486,"children":1490},{"className":1487,"code":1488,"language":1489,"meta":16,"style":16},"language-sql shiki shiki-themes github-dark","SELECT\n  metadata.value:prompt_version AS variant,\n  COUNT(DISTINCT user_id) AS users,\n  AVG(session_duration_sec) AS avg_duration,\n  SUM(conversion) \u002F COUNT(*) AS cvr\nFROM events\nWHERE experiment_id = 'blog_tone_test_2026_05'\n  AND event_date >= '2026-05-01'\nGROUP BY 1\n","sql",[1491],{"type":32,"tag":62,"props":1492,"children":1493},{"__ignoreMap":16},[1494,1502,1535,1566,1588,1633,1646,1668,1691],{"type":32,"tag":171,"props":1495,"children":1496},{"class":173,"line":174},[1497],{"type":32,"tag":171,"props":1498,"children":1499},{"style":343},[1500],{"type":37,"value":1501},"SELECT\n",{"type":32,"tag":171,"props":1503,"children":1504},{"class":173,"line":190},[1505,1510,1515,1520,1525,1530],{"type":32,"tag":171,"props":1506,"children":1507},{"style":275},[1508],{"type":37,"value":1509},"  metadata",{"type":32,"tag":171,"props":1511,"children":1512},{"style":184},[1513],{"type":37,"value":1514},".",{"type":32,"tag":171,"props":1516,"children":1517},{"style":275},[1518],{"type":37,"value":1519},"value",{"type":32,"tag":171,"props":1521,"children":1522},{"style":184},[1523],{"type":37,"value":1524},":prompt_version ",{"type":32,"tag":171,"props":1526,"children":1527},{"style":343},[1528],{"type":37,"value":1529},"AS",{"type":32,"tag":171,"props":1531,"children":1532},{"style":184},[1533],{"type":37,"value":1534}," 
variant,\n",{"type":32,"tag":171,"props":1536,"children":1537},{"class":173,"line":199},[1538,1543,1547,1552,1557,1561],{"type":32,"tag":171,"props":1539,"children":1540},{"style":275},[1541],{"type":37,"value":1542},"  COUNT",{"type":32,"tag":171,"props":1544,"children":1545},{"style":184},[1546],{"type":37,"value":1403},{"type":32,"tag":171,"props":1548,"children":1549},{"style":343},[1550],{"type":37,"value":1551},"DISTINCT",{"type":32,"tag":171,"props":1553,"children":1554},{"style":184},[1555],{"type":37,"value":1556}," user_id) ",{"type":32,"tag":171,"props":1558,"children":1559},{"style":343},[1560],{"type":37,"value":1529},{"type":32,"tag":171,"props":1562,"children":1563},{"style":184},[1564],{"type":37,"value":1565}," users,\n",{"type":32,"tag":171,"props":1567,"children":1568},{"class":173,"line":219},[1569,1574,1579,1583],{"type":32,"tag":171,"props":1570,"children":1571},{"style":275},[1572],{"type":37,"value":1573},"  AVG",{"type":32,"tag":171,"props":1575,"children":1576},{"style":184},[1577],{"type":37,"value":1578},"(session_duration_sec) ",{"type":32,"tag":171,"props":1580,"children":1581},{"style":343},[1582],{"type":37,"value":1529},{"type":32,"tag":171,"props":1584,"children":1585},{"style":184},[1586],{"type":37,"value":1587}," avg_duration,\n",{"type":32,"tag":171,"props":1589,"children":1590},{"class":173,"line":233},[1591,1596,1601,1606,1611,1615,1620,1624,1628],{"type":32,"tag":171,"props":1592,"children":1593},{"style":275},[1594],{"type":37,"value":1595},"  SUM",{"type":32,"tag":171,"props":1597,"children":1598},{"style":184},[1599],{"type":37,"value":1600},"(conversion) ",{"type":32,"tag":171,"props":1602,"children":1603},{"style":343},[1604],{"type":37,"value":1605},"\u002F",{"type":32,"tag":171,"props":1607,"children":1608},{"style":275},[1609],{"type":37,"value":1610}," 
COUNT",{"type":32,"tag":171,"props":1612,"children":1613},{"style":184},[1614],{"type":37,"value":1403},{"type":32,"tag":171,"props":1616,"children":1617},{"style":343},[1618],{"type":37,"value":1619},"*",{"type":32,"tag":171,"props":1621,"children":1622},{"style":184},[1623],{"type":37,"value":1274},{"type":32,"tag":171,"props":1625,"children":1626},{"style":343},[1627],{"type":37,"value":1529},{"type":32,"tag":171,"props":1629,"children":1630},{"style":184},[1631],{"type":37,"value":1632}," cvr\n",{"type":32,"tag":171,"props":1634,"children":1635},{"class":173,"line":242},[1636,1641],{"type":32,"tag":171,"props":1637,"children":1638},{"style":343},[1639],{"type":37,"value":1640},"FROM",{"type":32,"tag":171,"props":1642,"children":1643},{"style":184},[1644],{"type":37,"value":1645}," events\n",{"type":32,"tag":171,"props":1647,"children":1648},{"class":173,"line":250},[1649,1654,1659,1663],{"type":32,"tag":171,"props":1650,"children":1651},{"style":343},[1652],{"type":37,"value":1653},"WHERE",{"type":32,"tag":171,"props":1655,"children":1656},{"style":184},[1657],{"type":37,"value":1658}," experiment_id ",{"type":32,"tag":171,"props":1660,"children":1661},{"style":343},[1662],{"type":37,"value":401},{"type":32,"tag":171,"props":1664,"children":1665},{"style":208},[1666],{"type":37,"value":1667}," 'blog_tone_test_2026_05'\n",{"type":32,"tag":171,"props":1669,"children":1670},{"class":173,"line":267},[1671,1676,1681,1686],{"type":32,"tag":171,"props":1672,"children":1673},{"style":343},[1674],{"type":37,"value":1675},"  AND",{"type":32,"tag":171,"props":1677,"children":1678},{"style":184},[1679],{"type":37,"value":1680}," event_date ",{"type":32,"tag":171,"props":1682,"children":1683},{"style":343},[1684],{"type":37,"value":1685},">=",{"type":32,"tag":171,"props":1687,"children":1688},{"style":208},[1689],{"type":37,"value":1690}," 
'2026-05-01'\n",{"type":32,"tag":171,"props":1692,"children":1693},{"class":173,"line":26},[1694,1699],{"type":32,"tag":171,"props":1695,"children":1696},{"style":343},[1697],{"type":37,"value":1698},"GROUP BY",{"type":32,"tag":171,"props":1700,"children":1701},{"style":275},[1702],{"type":37,"value":1703}," 1\n",{"type":32,"tag":33,"props":1705,"children":1706},{},[1707],{"type":37,"value":1708},"Result: variant v2 raises CVR from 0.042 to 0.051 (+21%) with a p-value of 0.003, so it can be promoted to production with confidence.",{"type":32,"tag":45,"props":1710,"children":1712},{"id":1711},"langsmith-observability-und-long-term-regression-detection",[1713],{"type":37,"value":1714},"LangSmith: Observability and Long-Term Regression Detection",{"type":32,"tag":33,"props":1716,"children":1717},{},[1718],{"type":37,"value":1719},"Promptfoo covers local tests; LangSmith covers production observability. Every LLM call is traced: input, output, latency, token count, model version, prompt version.",{"type":32,"tag":33,"props":1721,"children":1722},{},[1723,1725,1730],{"type":37,"value":1724},"LangSmith's advantage: ",{"type":32,"tag":123,"props":1726,"children":1727},{},[1728],{"type":37,"value":1729},"long-term metric tracking",{"type":37,"value":1731},". 
A bug in a prompt version from three months ago surfaces today through feedback: open the trace, inspect the input\u002Foutput diff, identify which version was live at the time, and roll back.",{"type":32,"tag":33,"props":1733,"children":1734},{},[1735],{"type":37,"value":1736},"Example trace:",{"type":32,"tag":162,"props":1738,"children":1742},{"className":1739,"code":1740,"language":1741,"meta":16,"style":16},"language-json shiki shiki-themes github-dark","{\n  \"run_id\": \"abc123\",\n  \"prompt_version\": \"v2.1\",\n  \"model\": \"claude-3-5-sonnet-20241022\",\n  \"input\": {\"topic\": \"Server-Side GTM\", \"category\": \"tech\"},\n  \"output\": \"---\\ntitle: \\\"Server-Side GTM...\\\"\",\n  \"latency_ms\": 2341,\n  \"tokens\": {\"input\": 1842, \"output\": 1523},\n  \"cost_usd\": 0.0137,\n  \"feedback\": {\"score\": 4, \"comment\": \"Titel zu lang\"}\n}\n","json",[1743],{"type":32,"tag":62,"props":1744,"children":1745},{"__ignoreMap":16},[1746,1754,1775,1796,1817,1867,1917,1938,1986,2007,2055],{"type":32,"tag":171,"props":1747,"children":1748},{"class":173,"line":174},[1749],{"type":32,"tag":171,"props":1750,"children":1751},{"style":184},[1752],{"type":37,"value":1753},"{\n",{"type":32,"tag":171,"props":1755,"children":1756},{"class":173,"line":190},[1757,1762,1766,1771],{"type":32,"tag":171,"props":1758,"children":1759},{"style":275},[1760],{"type":37,"value":1761},"  \"run_id\"",{"type":32,"tag":171,"props":1763,"children":1764},{"style":184},[1765],{"type":37,"value":539},{"type":32,"tag":171,"props":1767,"children":1768},{"style":208},[1769],{"type":37,"value":1770},"\"abc123\"",{"type":32,"tag":171,"props":1772,"children":1773},{"style":184},[1774],{"type":37,"value":216},{"type":32,"tag":171,"props":1776,"children":1777},{"class":173,"line":199},[1778,1783,1787,1792],{"type":32,"tag":171,"props":1779,"children":1780},{"style":275},[1781],{"type":37,"value":1782},"  
\"prompt_version\"",{"type":32,"tag":171,"props":1784,"children":1785},{"style":184},[1786],{"type":37,"value":539},{"type":32,"tag":171,"props":1788,"children":1789},{"style":208},[1790],{"type":37,"value":1791},"\"v2.1\"",{"type":32,"tag":171,"props":1793,"children":1794},{"style":184},[1795],{"type":37,"value":216},{"type":32,"tag":171,"props":1797,"children":1798},{"class":173,"line":219},[1799,1804,1808,1813],{"type":32,"tag":171,"props":1800,"children":1801},{"style":275},[1802],{"type":37,"value":1803},"  \"model\"",{"type":32,"tag":171,"props":1805,"children":1806},{"style":184},[1807],{"type":37,"value":539},{"type":32,"tag":171,"props":1809,"children":1810},{"style":208},[1811],{"type":37,"value":1812},"\"claude-3-5-sonnet-20241022\"",{"type":32,"tag":171,"props":1814,"children":1815},{"style":184},[1816],{"type":37,"value":216},{"type":32,"tag":171,"props":1818,"children":1819},{"class":173,"line":233},[1820,1825,1830,1835,1839,1844,1848,1853,1857,1862],{"type":32,"tag":171,"props":1821,"children":1822},{"style":275},[1823],{"type":37,"value":1824},"  \"input\"",{"type":32,"tag":171,"props":1826,"children":1827},{"style":184},[1828],{"type":37,"value":1829},": {",{"type":32,"tag":171,"props":1831,"children":1832},{"style":275},[1833],{"type":37,"value":1834},"\"topic\"",{"type":32,"tag":171,"props":1836,"children":1837},{"style":184},[1838],{"type":37,"value":539},{"type":32,"tag":171,"props":1840,"children":1841},{"style":208},[1842],{"type":37,"value":1843},"\"Server-Side 
GTM\"",{"type":32,"tag":171,"props":1845,"children":1846},{"style":184},[1847],{"type":37,"value":416},{"type":32,"tag":171,"props":1849,"children":1850},{"style":275},[1851],{"type":37,"value":1852},"\"category\"",{"type":32,"tag":171,"props":1854,"children":1855},{"style":184},[1856],{"type":37,"value":539},{"type":32,"tag":171,"props":1858,"children":1859},{"style":208},[1860],{"type":37,"value":1861},"\"tech\"",{"type":32,"tag":171,"props":1863,"children":1864},{"style":184},[1865],{"type":37,"value":1866},"},\n",{"type":32,"tag":171,"props":1868,"children":1869},{"class":173,"line":242},[1870,1875,1879,1884,1889,1894,1899,1904,1908,1913],{"type":32,"tag":171,"props":1871,"children":1872},{"style":275},[1873],{"type":37,"value":1874},"  \"output\"",{"type":32,"tag":171,"props":1876,"children":1877},{"style":184},[1878],{"type":37,"value":539},{"type":32,"tag":171,"props":1880,"children":1881},{"style":208},[1882],{"type":37,"value":1883},"\"---",{"type":32,"tag":171,"props":1885,"children":1886},{"style":275},[1887],{"type":37,"value":1888},"\\n",{"type":32,"tag":171,"props":1890,"children":1891},{"style":208},[1892],{"type":37,"value":1893},"title: ",{"type":32,"tag":171,"props":1895,"children":1896},{"style":275},[1897],{"type":37,"value":1898},"\\\"",{"type":32,"tag":171,"props":1900,"children":1901},{"style":208},[1902],{"type":37,"value":1903},"Server-Side GTM...",{"type":32,"tag":171,"props":1905,"children":1906},{"style":275},[1907],{"type":37,"value":1898},{"type":32,"tag":171,"props":1909,"children":1910},{"style":208},[1911],{"type":37,"value":1912},"\"",{"type":32,"tag":171,"props":1914,"children":1915},{"style":184},[1916],{"type":37,"value":216},{"type":32,"tag":171,"props":1918,"children":1919},{"class":173,"line":250},[1920,1925,1929,1934],{"type":32,"tag":171,"props":1921,"children":1922},{"style":275},[1923],{"type":37,"value":1924},"  
\"latency_ms\"",{"type":32,"tag":171,"props":1926,"children":1927},{"style":184},[1928],{"type":37,"value":539},{"type":32,"tag":171,"props":1930,"children":1931},{"style":275},[1932],{"type":37,"value":1933},"2341",{"type":32,"tag":171,"props":1935,"children":1936},{"style":184},[1937],{"type":37,"value":216},{"type":32,"tag":171,"props":1939,"children":1940},{"class":173,"line":267},[1941,1946,1950,1955,1959,1964,1968,1973,1977,1982],{"type":32,"tag":171,"props":1942,"children":1943},{"style":275},[1944],{"type":37,"value":1945},"  \"tokens\"",{"type":32,"tag":171,"props":1947,"children":1948},{"style":184},[1949],{"type":37,"value":1829},{"type":32,"tag":171,"props":1951,"children":1952},{"style":275},[1953],{"type":37,"value":1954},"\"input\"",{"type":32,"tag":171,"props":1956,"children":1957},{"style":184},[1958],{"type":37,"value":539},{"type":32,"tag":171,"props":1960,"children":1961},{"style":275},[1962],{"type":37,"value":1963},"1842",{"type":32,"tag":171,"props":1965,"children":1966},{"style":184},[1967],{"type":37,"value":416},{"type":32,"tag":171,"props":1969,"children":1970},{"style":275},[1971],{"type":37,"value":1972},"\"output\"",{"type":32,"tag":171,"props":1974,"children":1975},{"style":184},[1976],{"type":37,"value":539},{"type":32,"tag":171,"props":1978,"children":1979},{"style":275},[1980],{"type":37,"value":1981},"1523",{"type":32,"tag":171,"props":1983,"children":1984},{"style":184},[1985],{"type":37,"value":1866},{"type":32,"tag":171,"props":1987,"children":1988},{"class":173,"line":26},[1989,1994,1998,2003],{"type":32,"tag":171,"props":1990,"children":1991},{"style":275},[1992],{"type":37,"value":1993},"  
\"cost_usd\"",{"type":32,"tag":171,"props":1995,"children":1996},{"style":184},[1997],{"type":37,"value":539},{"type":32,"tag":171,"props":1999,"children":2000},{"style":275},[2001],{"type":37,"value":2002},"0.0137",{"type":32,"tag":171,"props":2004,"children":2005},{"style":184},[2006],{"type":37,"value":216},{"type":32,"tag":171,"props":2008,"children":2009},{"class":173,"line":289},[2010,2015,2019,2023,2027,2032,2036,2041,2045,2050],{"type":32,"tag":171,"props":2011,"children":2012},{"style":275},[2013],{"type":37,"value":2014},"  \"feedback\"",{"type":32,"tag":171,"props":2016,"children":2017},{"style":184},[2018],{"type":37,"value":1829},{"type":32,"tag":171,"props":2020,"children":2021},{"style":275},[2022],{"type":37,"value":534},{"type":32,"tag":171,"props":2024,"children":2025},{"style":184},[2026],{"type":37,"value":539},{"type":32,"tag":171,"props":2028,"children":2029},{"style":275},[2030],{"type":37,"value":2031},"4",{"type":32,"tag":171,"props":2033,"children":2034},{"style":184},[2035],{"type":37,"value":416},{"type":32,"tag":171,"props":2037,"children":2038},{"style":275},[2039],{"type":37,"value":2040},"\"comment\"",{"type":32,"tag":171,"props":2042,"children":2043},{"style":184},[2044],{"type":37,"value":539},{"type":32,"tag":171,"props":2046,"children":2047},{"style":208},[2048],{"type":37,"value":2049},"\"Titel zu lang\"",{"type":32,"tag":171,"props":2051,"children":2052},{"style":184},[2053],{"type":37,"value":2054},"}\n",{"type":32,"tag":171,"props":2056,"children":2057},{"class":173,"line":618},[2058],{"type":32,"tag":171,"props":2059,"children":2060},{"style":184},[2061],{"type":37,"value":2054},{"type":32,"tag":33,"props":2063,"children":2064},{},[2065],{"type":37,"value":2066},"Feedback loop: the editorial team scores every blog post from 1 to 5, LangSmith attaches the score to the trace, and the weekly report shows \"average score for version v2.3 dropped to 3.2\". The response: roll back immediately → view the prompt diff → identify the problem → 
fix it.",{"type":32,"tag":688,"props":2068,"children":2070},{"id":2069},"dataset-management-golden-set-unter-versionskontrolle",[2071],{"type":37,"value":2072},"Dataset Management: Golden Set Under Version Control",{"type":32,"tag":33,"props":2074,"children":2075},{},[2076,2078,2083],{"type":37,"value":2077},"The heart of the eval pipeline is the ",{"type":32,"tag":123,"props":2079,"children":2080},{},[2081],{"type":37,"value":2082},"Golden Dataset",{"type":37,"value":2084},": known input\u002Foutput pairs that serve as the reference for expected behavior. Keeping it in Notion or updating it by hand in Google Sheets is a regression risk.",{"type":32,"tag":33,"props":2086,"children":2087},{},[2088],{"type":37,"value":2089},"LangSmith dataset under version control:",{"type":32,"tag":162,"props":2091,"children":2093},{"className":331,"code":2092,"language":333,"meta":16,"style":16},"from langsmith import Client\n\nclient = Client()\n\ndataset = client.create_dataset(\"marketing_blog_golden_v3\")\n\n# Golden Beispiele hinzufügen\nexamples = [\n    {\n        \"inputs\": {\"topic\": \"Server-Side GTM\", \"category\": \"tech\"},\n        \"outputs\": {\"title\": \"Server-Side GTM: Messung nach Cookies\"},\n        \"metadata\": {\"expected_h2_count\": 5, \"expected_word_count\": 1500}\n    },\n    # 50+ Beispiele...\n]\n\nfor ex in examples:\n    client.create_example(**ex, 
dataset_id=dataset.id)\n",[2094],{"type":32,"tag":62,"props":2095,"children":2096},{"__ignoreMap":16},[2097,2117,2124,2141,2148,2174,2181,2189,2206,2214,2258,2288,2336,2344,2352,2359,2366,2387],{"type":32,"tag":171,"props":2098,"children":2099},{"class":173,"line":174},[2100,2104,2108,2112],{"type":32,"tag":171,"props":2101,"children":2102},{"style":343},[2103],{"type":37,"value":346},{"type":32,"tag":171,"props":2105,"children":2106},{"style":184},[2107],{"type":37,"value":351},{"type":32,"tag":171,"props":2109,"children":2110},{"style":343},[2111],{"type":37,"value":356},{"type":32,"tag":171,"props":2113,"children":2114},{"style":184},[2115],{"type":37,"value":2116}," Client\n",{"type":32,"tag":171,"props":2118,"children":2119},{"class":173,"line":190},[2120],{"type":32,"tag":171,"props":2121,"children":2122},{"emptyLinePlaceholder":367},[2123],{"type":37,"value":370},{"type":32,"tag":171,"props":2125,"children":2126},{"class":173,"line":199},[2127,2132,2136],{"type":32,"tag":171,"props":2128,"children":2129},{"style":184},[2130],{"type":37,"value":2131},"client ",{"type":32,"tag":171,"props":2133,"children":2134},{"style":343},[2135],{"type":37,"value":401},{"type":32,"tag":171,"props":2137,"children":2138},{"style":184},[2139],{"type":37,"value":2140}," Client()\n",{"type":32,"tag":171,"props":2142,"children":2143},{"class":173,"line":219},[2144],{"type":32,"tag":171,"props":2145,"children":2146},{"emptyLinePlaceholder":367},[2147],{"type":37,"value":370},{"type":32,"tag":171,"props":2149,"children":2150},{"class":173,"line":233},[2151,2156,2160,2165,2170],{"type":32,"tag":171,"props":2152,"children":2153},{"style":184},[2154],{"type":37,"value":2155},"dataset ",{"type":32,"tag":171,"props":2157,"children":2158},{"style":343},[2159],{"type":37,"value":401},{"type":32,"tag":171,"props":2161,"children":2162},{"style":184},[2163],{"type":37,"value":2164}," 
client.create_dataset(",{"type":32,"tag":171,"props":2166,"children":2167},{"style":208},[2168],{"type":37,"value":2169},"\"marketing_blog_golden_v3\"",{"type":32,"tag":171,"props":2171,"children":2172},{"style":184},[2173],{"type":37,"value":642},{"type":32,"tag":171,"props":2175,"children":2176},{"class":173,"line":242},[2177],{"type":32,"tag":171,"props":2178,"children":2179},{"emptyLinePlaceholder":367},[2180],{"type":37,"value":370},{"type":32,"tag":171,"props":2182,"children":2183},{"class":173,"line":250},[2184],{"type":32,"tag":171,"props":2185,"children":2186},{"style":1202},[2187],{"type":37,"value":2188},"# Golden Beispiele hinzufügen\n",{"type":32,"tag":171,"props":2190,"children":2191},{"class":173,"line":267},[2192,2197,2201],{"type":32,"tag":171,"props":2193,"children":2194},{"style":184},[2195],{"type":37,"value":2196},"examples ",{"type":32,"tag":171,"props":2198,"children":2199},{"style":343},[2200],{"type":37,"value":401},{"type":32,"tag":171,"props":2202,"children":2203},{"style":184},[2204],{"type":37,"value":2205}," [\n",{"type":32,"tag":171,"props":2207,"children":2208},{"class":173,"line":26},[2209],{"type":32,"tag":171,"props":2210,"children":2211},{"style":184},[2212],{"type":37,"value":2213},"    {\n",{"type":32,"tag":171,"props":2215,"children":2216},{"class":173,"line":289},[2217,2222,2226,2230,2234,2238,2242,2246,2250,2254],{"type":32,"tag":171,"props":2218,"children":2219},{"style":208},[2220],{"type":37,"value":2221},"        
\"inputs\"",{"type":32,"tag":171,"props":2223,"children":2224},{"style":184},[2225],{"type":37,"value":1829},{"type":32,"tag":171,"props":2227,"children":2228},{"style":208},[2229],{"type":37,"value":1834},{"type":32,"tag":171,"props":2231,"children":2232},{"style":184},[2233],{"type":37,"value":539},{"type":32,"tag":171,"props":2235,"children":2236},{"style":208},[2237],{"type":37,"value":1843},{"type":32,"tag":171,"props":2239,"children":2240},{"style":184},[2241],{"type":37,"value":416},{"type":32,"tag":171,"props":2243,"children":2244},{"style":208},[2245],{"type":37,"value":1852},{"type":32,"tag":171,"props":2247,"children":2248},{"style":184},[2249],{"type":37,"value":539},{"type":32,"tag":171,"props":2251,"children":2252},{"style":208},[2253],{"type":37,"value":1861},{"type":32,"tag":171,"props":2255,"children":2256},{"style":184},[2257],{"type":37,"value":1866},{"type":32,"tag":171,"props":2259,"children":2260},{"class":173,"line":618},[2261,2266,2270,2275,2279,2284],{"type":32,"tag":171,"props":2262,"children":2263},{"style":208},[2264],{"type":37,"value":2265},"        \"outputs\"",{"type":32,"tag":171,"props":2267,"children":2268},{"style":184},[2269],{"type":37,"value":1829},{"type":32,"tag":171,"props":2271,"children":2272},{"style":208},[2273],{"type":37,"value":2274},"\"title\"",{"type":32,"tag":171,"props":2276,"children":2277},{"style":184},[2278],{"type":37,"value":539},{"type":32,"tag":171,"props":2280,"children":2281},{"style":208},[2282],{"type":37,"value":2283},"\"Server-Side GTM: Messung nach Cookies\"",{"type":32,"tag":171,"props":2285,"children":2286},{"style":184},[2287],{"type":37,"value":1866},{"type":32,"tag":171,"props":2289,"children":2290},{"class":173,"line":636},[2291,2296,2300,2305,2309,2314,2318,2323,2327,2332],{"type":32,"tag":171,"props":2292,"children":2293},{"style":208},[2294],{"type":37,"value":2295},"        
\"metadata\"",{"type":32,"tag":171,"props":2297,"children":2298},{"style":184},[2299],{"type":37,"value":1829},{"type":32,"tag":171,"props":2301,"children":2302},{"style":208},[2303],{"type":37,"value":2304},"\"expected_h2_count\"",{"type":32,"tag":171,"props":2306,"children":2307},{"style":184},[2308],{"type":37,"value":539},{"type":32,"tag":171,"props":2310,"children":2311},{"style":275},[2312],{"type":37,"value":2313},"5",{"type":32,"tag":171,"props":2315,"children":2316},{"style":184},[2317],{"type":37,"value":416},{"type":32,"tag":171,"props":2319,"children":2320},{"style":208},[2321],{"type":37,"value":2322},"\"expected_word_count\"",{"type":32,"tag":171,"props":2324,"children":2325},{"style":184},[2326],{"type":37,"value":539},{"type":32,"tag":171,"props":2328,"children":2329},{"style":275},[2330],{"type":37,"value":2331},"1500",{"type":32,"tag":171,"props":2333,"children":2334},{"style":184},[2335],{"type":37,"value":2054},{"type":32,"tag":171,"props":2337,"children":2338},{"class":173,"line":866},[2339],{"type":32,"tag":171,"props":2340,"children":2341},{"style":184},[2342],{"type":37,"value":2343},"    },\n",{"type":32,"tag":171,"props":2345,"children":2346},{"class":173,"line":889},[2347],{"type":32,"tag":171,"props":2348,"children":2349},{"style":1202},[2350],{"type":37,"value":2351},"    # 50+ 
Beispiele...\n",{"type":32,"tag":171,"props":2353,"children":2354},{"class":173,"line":910},[2355],{"type":32,"tag":171,"props":2356,"children":2357},{"style":184},[2358],{"type":37,"value":295},{"type":32,"tag":171,"props":2360,"children":2361},{"class":173,"line":928},[2362],{"type":32,"tag":171,"props":2363,"children":2364},{"emptyLinePlaceholder":367},[2365],{"type":37,"value":370},{"type":32,"tag":171,"props":2367,"children":2368},{"class":173,"line":949},[2369,2373,2378,2382],{"type":32,"tag":171,"props":2370,"children":2371},{"style":343},[2372],{"type":37,"value":483},{"type":32,"tag":171,"props":2374,"children":2375},{"style":184},[2376],{"type":37,"value":2377}," ex ",{"type":32,"tag":171,"props":2379,"children":2380},{"style":343},[2381],{"type":37,"value":493},{"type":32,"tag":171,"props":2383,"children":2384},{"style":184},[2385],{"type":37,"value":2386}," examples:\n",{"type":32,"tag":171,"props":2388,"children":2389},{"class":173,"line":966},[2390,2395,2400,2405,2410,2414],{"type":32,"tag":171,"props":2391,"children":2392},{"style":184},[2393],{"type":37,"value":2394},"    client.create_example(",{"type":32,"tag":171,"props":2396,"children":2397},{"style":343},[2398],{"type":37,"value":2399},"**",{"type":32,"tag":171,"props":2401,"children":2402},{"style":184},[2403],{"type":37,"value":2404},"ex, ",{"type":32,"tag":171,"props":2406,"children":2407},{"style":599},[2408],{"type":37,"value":2409},"dataset_id",{"type":32,"tag":171,"props":2411,"children":2412},{"style":343},[2413],{"type":37,"value":401},{"type":32,"tag":171,"props":2415,"children":2416},{"style":184},[2417],{"type":37,"value":2418},"dataset.id)\n",{"type":32,"tag":33,"props":2420,"children":2421},{},[2422],{"type":37,"value":2423},"Test every prompt change against this dataset. If the pass rate drops, do not deploy. 
Add new edge cases (bugs from production) to the dataset so they cannot regress.",{"type":32,"tag":45,"props":2425,"children":2427},{"id":2426},"tradeoff-deterministische-metriken-vs-creative-output",[2428],{"type":37,"value":2429},"Tradeoff: Deterministic Metrics vs Creative Output",{"type":32,"tag":33,"props":2431,"children":2432},{},[2433],{"type":37,"value":2434},"The strength of LLMs is that they are non-deterministic: same input, different output. In production systems that strength is also a risk: a user sees different Markdown on every page refresh, and some of it is broken.",{"type":32,"tag":33,"props":2436,"children":2437},{},[2438],{"type":37,"value":2439},"Temperature 0 makes output more deterministic but also monotonous. The tradeoff:",{"type":32,"tag":84,"props":2441,"children":2442},{},[2443,2453,2463],{"type":32,"tag":88,"props":2444,"children":2445},{},[2446,2451],{"type":32,"tag":123,"props":2447,"children":2448},{},[2449],{"type":37,"value":2450},"Temperature 0",{"type":37,"value":2452},": ideal for the eval suite, too monotonous for production",{"type":32,"tag":88,"props":2454,"children":2455},{},[2456,2461],{"type":32,"tag":123,"props":2457,"children":2458},{},[2459],{"type":37,"value":2460},"Temperature 0.3-0.5",{"type":37,"value":2462},": reasonable variety, still consistent",{"type":32,"tag":88,"props":2464,"children":2465},{},[2466,2471],{"type":32,"tag":123,"props":2467,"children":2468},{},[2469],{"type":37,"value":2470},"Temperature 0.7+",{"type":37,"value":2472},": creative, but production can surprise you even when the test suite is green",{"type":32,"tag":33,"props":2474,"children":2475},{},[2476],{"type":37,"value":2477},"Solution: temperature 0 in eval, 0.4 in production, and five acceptable outputs stored per input in the golden set (range check).",{"type":32,"tag":33,"props":2479,"children":2480},{},[2481,2483,2488],{"type":37,"value":2482},"Another tradeoff: 
",{"type":32,"tag":123,"props":2484,"children":2485},{},[2486],{"type":37,"value":2487},"Latency vs Quality",{"type":37,"value":2489},". Longer prompts produce better output, but input token costs rise and latency grows. In Promptfoo, flag any run whose latency exceeds 2.5 s so the user experience is not degraded.",{"type":32,"tag":45,"props":2491,"children":2493},{"id":2492},"production-checklist-llm-system-deployen",[2494],{"type":37,"value":2495},"Production Checklist: Deploying an LLM System",{"type":32,"tag":33,"props":2497,"children":2498},{},[2499],{"type":37,"value":2500},"Pre-deploy checklist:",{"type":32,"tag":84,"props":2502,"children":2505},{"className":2503},[2504],"contains-task-list",[2506,2518,2527,2536,2545,2554,2563,2572,2581],{"type":32,"tag":88,"props":2507,"children":2510},{"className":2508},[2509],"task-list-item",[2511,2516],{"type":32,"tag":2512,"props":2513,"children":2515},"input",{"disabled":367,"type":2514},"checkbox",[],{"type":37,"value":2517}," Prompt in the Git repo, commit history clean",{"type":32,"tag":88,"props":2519,"children":2521},{"className":2520},[2509],[2522,2525],{"type":32,"tag":2512,"props":2523,"children":2524},{"disabled":367,"type":2514},[],{"type":37,"value":2526}," Promptfoo eval suite pass rate > 95%",{"type":32,"tag":88,"props":2528,"children":2530},{"className":2529},[2509],[2531,2534],{"type":32,"tag":2512,"props":2532,"children":2533},{"disabled":367,"type":2514},[],{"type":37,"value":2535}," Golden dataset with at least 50 examples",{"type":32,"tag":88,"props":2537,"children":2539},{"className":2538},[2509],[2540,2543],{"type":32,"tag":2512,"props":2541,"children":2542},{"disabled":367,"type":2514},[],{"type":37,"value":2544}," A\u002FB test plan ready, sample size calculated",{"type":32,"tag":88,"props":2546,"children":2548},{"className":2547},[2509],[2549,2552],{"type":32,"tag":2512,"props":2550,"children":2551},{"disabled":367,"type":2514},[],{"type":37,"value":2553}," LangSmith tracing on, API key in 
Production",{"type":32,"tag":88,"props":2555,"children":2557},{"className":2556},[2509],[2558,2561],{"type":32,"tag":2512,"props":2559,"children":2560},{"disabled":367,"type":2514},[],{"type":37,"value":2562}," Feedback loop in place (editorial team rates outputs, BigQuery join)",{"type":32,"tag":88,"props":2564,"children":2566},{"className":2565},[2509],[2567,2570],{"type":32,"tag":2512,"props":2568,"children":2569},{"disabled":367,"type":2514},[],{"type":37,"value":2571}," Rollback procedure defined (which metric threshold triggers an automatic rollback)",{"type":32,"tag":88,"props":2573,"children":2575},{"className":2574},[2509],[2576,2579],{"type":32,"tag":2512,"props":2577,"children":2578},{"disabled":367,"type":2514},[],{"type":37,"value":2580}," Cost monitoring: daily token spend threshold of $X",{"type":32,"tag":88,"props":2582,"children":2584},{"className":2583},[2509],[2585,2588],{"type":32,"tag":2512,"props":2586,"children":2587},{"disabled":367,"type":2514},[],{"type":37,"value":2589}," Latency SLA: p95 \u003C 3s",{"type":32,"tag":33,"props":2591,"children":2592},{},[2593],{"type":37,"value":2594},"Without this checklist you are not running \"AI services\"; you are running controlled chaos.",{"type":32,"tag":2596,"props":2597,"children":2598},"hr",{},[],{"type":32,"tag":33,"props":2600,"children":2601},{},[2602,2604,2611],{"type":37,"value":2603},"Prompt versioning is a matter of discipline: its purpose is reliability, not speed. In tactics such as ",{"type":32,"tag":677,"props":2605,"children":2608},{"href":2606,"rel":2607},"https:\u002F\u002Fwww.roibase.com.tr\u002Fru\u002Fgeo",[681],[2609],{"type":37,"value":2610},"Generative Engine Optimization",{"type":37,"value":2612},", output quality is tied directly to business outcomes. Without an eval pipeline, every deployment puts existing performance at risk. Promptfoo provides local safety, LangSmith provides production visibility. 
Together they raise LLM operations to software engineering standards.",{"type":32,"tag":2614,"props":2615,"children":2616},"style",{},[2617],{"type":37,"value":2618},"html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}",{"title":16,"searchDepth":199,"depth":199,"links":2620},[2621,2622,2625,2626,2629,2630],{"id":47,"depth":190,"text":50},{"id":110,"depth":190,"text":113,"children":2623},[2624],{"id":690,"depth":199,"text":693},{"id":1124,"depth":190,"text":1127},{"id":1711,"depth":190,"text":1714,"children":2627},[2628],{"id":2069,"depth":199,"text":2072},{"id":2426,"depth":190,"text":2429},{"id":2492,"depth":190,"text":2495},"markdown","content:ru:ai:prompt-versionierung-und-ab-tests-llm-ops-disziplin.md","content","ru\u002Fai\u002Fprompt-versionierung-und-ab-tests-llm-ops-disziplin.md","ru\u002Fai\u002Fprompt-versionierung-und-ab-tests-llm-ops-disziplin","md",1778709808772]