[{"data":1,"prerenderedAt":2617},["ShallowReactive",2],{"article-alternates":3,"article-\u002Fen\u002Fai\u002Fllm-ops-prompt-versioning-ab-testing":13},{"i18nKey":4,"paths":5},"ai-004-2026-05",{"de":6,"en":7,"es":8,"fr":9,"it":10,"ru":11,"tr":12},"\u002Fde\u002Fai\u002Fprompt-versionierung-llm-evaluation","\u002Fen\u002Fai\u002Fllm-ops-prompt-versioning-ab-testing","\u002Fes\u002Fai\u002Fversionado-prompts-ab-testing-llm-ops","\u002Ffr\u002Fai\u002Fversionamento-prompt-ab-test","\u002Fit\u002Fai\u002Fversionamento-prompt-e-a-b-test-disciplina-llm-ops","\u002Fru\u002Fai\u002Fprompt-versionierung-und-ab-tests-llm-ops-disziplin","\u002Ftr\u002Fai\u002Fprompt-versiyonlama-ve-a-b-testi-llm-operasyonun-disiplini",{"_path":7,"_dir":14,"_draft":15,"_partial":15,"_locale":16,"title":17,"description":18,"publishedAt":19,"modifiedAt":19,"category":14,"i18nKey":4,"tags":20,"readingTime":26,"author":27,"body":28,"_type":2611,"_id":2612,"_source":2613,"_file":2614,"_stem":2615,"_extension":2616},"ai",false,"","Prompt Versioning and A\u002FB Testing: The Discipline of LLM Operations","How to build deterministic quality control in production LLM systems using prompt versioning, evaluation pipelines, and tools like Promptfoo and LangSmith.","2026-05-13",[21,22,23,24,25],"llm-ops","prompt-engineering","evaluation","mlops","ai-quality",8,"Roibase",{"type":29,"children":30,"toc":2599},"root",[31,39,44,51,56,78,83,103,108,114,119,130,148,161,296,306,324,329,643,653,671,676,683,688,693,1009,1022,1106,1111,1117,1122,1174,1179,1469,1474,1693,1698,1704,1709,1721,1726,2051,2056,2062,2074,2079,2408,2413,2419,2424,2429,2462,2467,2479,2485,2490,2579,2584,2588,2593],{"type":32,"tag":33,"props":34,"children":35},"element","p",{},[36],{"type":37,"value":38},"text","In systems using LLMs, there are 15 steps between \"it works\" and \"reliable in production.\" Your marketing automation generates Claude API markdown output, GPT handles customer journey segmentation — but when you change the prompt, 
how do you know you haven't introduced a regression? In software engineering, versioning, test coverage, and CI\u002FCD are standard. In LLM operations, without this discipline, every deployment is a gamble.",{"type":32,"tag":33,"props":40,"children":41},{},[42],{"type":37,"value":43},"Tools like Promptfoo and LangSmith make this discipline practical: prompt versioning, deterministic evaluation, A\u002FB testing, metric tracking. This article shows how to build quality control into production LLM systems — not at the code level, but at the infrastructure level.",{"type":32,"tag":45,"props":46,"children":48},{"id":47},"the-misconception-that-prompts-arent-software-code",[49],{"type":37,"value":50},"The Misconception That Prompts Aren't Software Code",{"type":32,"tag":33,"props":52,"children":53},{},[54],{"type":37,"value":55},"Most teams treat prompts as \"configuration files\": edited in UI text boxes, documented in Notion, hardcoded into text nodes in n8n workflows. In reality, prompts are executable specifications that define system behavior. Yet they usually get no versioning, no diffs, and no rollbacks.",{"type":32,"tag":33,"props":57,"children":58},{},[59,61,68,70,76],{"type":37,"value":60},"A commit labeled \"fix typo\" can change the tone of model output and drag metrics down. Especially in structured output scenarios (JSON schema, markdown frontmatter, SQL queries), a single-word change to the format instruction can break parsing and trigger cascading failures. 
Example: changing ",{"type":32,"tag":62,"props":63,"children":65},"code",{"className":64},[],[66],{"type":37,"value":67},"OUTPUT FORMAT: JSON",{"type":37,"value":69}," to ",{"type":32,"tag":62,"props":71,"children":73},{"className":72},[],[74],{"type":37,"value":75},"OUTPUT FORMAT: Valid JSON",{"type":37,"value":77}," sometimes causes the model to add an explanatory paragraph — downstream parser crashes, alerts fire, three hours of debugging.",{"type":32,"tag":33,"props":79,"children":80},{},[81],{"type":37,"value":82},"Versioning discipline should answer these questions:",{"type":32,"tag":84,"props":85,"children":86},"ul",{},[87,93,98],{"type":32,"tag":88,"props":89,"children":90},"li",{},[91],{"type":37,"value":92},"Which prompt version is currently in production?",{"type":32,"tag":88,"props":94,"children":95},{},[96],{"type":37,"value":97},"What's the performance difference between the current version and the one from two weeks ago?",{"type":32,"tag":88,"props":99,"children":100},{},[101],{"type":37,"value":102},"In an A\u002FB test, which variant increased conversion by 8%?",{"type":32,"tag":33,"props":104,"children":105},{},[106],{"type":37,"value":107},"If you can't answer these questions, you're not running \"AI operations\" — you're running manual experiments.",{"type":32,"tag":45,"props":109,"children":111},{"id":110},"evaluation-pipeline-three-layers-of-measuring-output",[112],{"type":37,"value":113},"Evaluation Pipeline: Three Layers of Measuring Output",{"type":32,"tag":33,"props":115,"children":116},{},[117],{"type":37,"value":118},"Evaluating LLM output seems subjective, but building deterministic metrics in production systems is possible. 
Evaluation works across three layers: syntax, semantics, and business outcome.",{"type":32,"tag":33,"props":120,"children":121},{},[122,128],{"type":32,"tag":123,"props":124,"children":125},"strong",{},[126],{"type":37,"value":127},"Syntax layer",{"type":37,"value":129}," — format compliance:",{"type":32,"tag":84,"props":131,"children":132},{},[133,138,143],{"type":32,"tag":88,"props":134,"children":135},{},[136],{"type":37,"value":137},"Does JSON parse correctly?",{"type":32,"tag":88,"props":139,"children":140},{},[141],{"type":37,"value":142},"Is markdown frontmatter valid?",{"type":32,"tag":88,"props":144,"children":145},{},[146],{"type":37,"value":147},"Are expected fields present?",{"type":32,"tag":33,"props":149,"children":150},{},[151,153,159],{"type":37,"value":152},"In Promptfoo, controlled with ",{"type":32,"tag":62,"props":154,"children":156},{"className":155},[],[157],{"type":37,"value":158},"javascript",{"type":37,"value":160}," assertions:",{"type":32,"tag":162,"props":163,"children":166},"pre",{"className":164,"code":165,"language":158,"meta":16,"style":16},"language-javascript shiki shiki-themes github-dark","assert: [\n  {\n    type: \"javascript\",\n    value: \"JSON.parse(output).title.length \u003C= 60\"\n  },\n  {\n    type: \"is-json\",\n    value: true\n  }\n]\n",[167],{"type":32,"tag":62,"props":168,"children":169},{"__ignoreMap":16},[170,188,197,217,231,240,248,265,278,287],{"type":32,"tag":171,"props":172,"children":175},"span",{"class":173,"line":174},"line",1,[176,182],{"type":32,"tag":171,"props":177,"children":179},{"style":178},"--shiki-default:#B392F0",[180],{"type":37,"value":181},"assert",{"type":32,"tag":171,"props":183,"children":185},{"style":184},"--shiki-default:#E1E4E8",[186],{"type":37,"value":187},": [\n",{"type":32,"tag":171,"props":189,"children":191},{"class":173,"line":190},2,[192],{"type":32,"tag":171,"props":193,"children":194},{"style":184},[195],{"type":37,"value":196},"  
{\n",{"type":32,"tag":171,"props":198,"children":200},{"class":173,"line":199},3,[201,206,212],{"type":32,"tag":171,"props":202,"children":203},{"style":184},[204],{"type":37,"value":205},"    type: ",{"type":32,"tag":171,"props":207,"children":209},{"style":208},"--shiki-default:#9ECBFF",[210],{"type":37,"value":211},"\"javascript\"",{"type":32,"tag":171,"props":213,"children":214},{"style":184},[215],{"type":37,"value":216},",\n",{"type":32,"tag":171,"props":218,"children":220},{"class":173,"line":219},4,[221,226],{"type":32,"tag":171,"props":222,"children":223},{"style":184},[224],{"type":37,"value":225},"    value: ",{"type":32,"tag":171,"props":227,"children":228},{"style":208},[229],{"type":37,"value":230},"\"JSON.parse(output).title.length \u003C= 60\"\n",{"type":32,"tag":171,"props":232,"children":234},{"class":173,"line":233},5,[235],{"type":32,"tag":171,"props":236,"children":237},{"style":184},[238],{"type":37,"value":239},"  },\n",{"type":32,"tag":171,"props":241,"children":243},{"class":173,"line":242},6,[244],{"type":32,"tag":171,"props":245,"children":246},{"style":184},[247],{"type":37,"value":196},{"type":32,"tag":171,"props":249,"children":251},{"class":173,"line":250},7,[252,256,261],{"type":32,"tag":171,"props":253,"children":254},{"style":184},[255],{"type":37,"value":205},{"type":32,"tag":171,"props":257,"children":258},{"style":208},[259],{"type":37,"value":260},"\"is-json\"",{"type":32,"tag":171,"props":262,"children":263},{"style":184},[264],{"type":37,"value":216},{"type":32,"tag":171,"props":266,"children":267},{"class":173,"line":26},[268,272],{"type":32,"tag":171,"props":269,"children":270},{"style":184},[271],{"type":37,"value":225},{"type":32,"tag":171,"props":273,"children":275},{"style":274},"--shiki-default:#79B8FF",[276],{"type":37,"value":277},"true\n",{"type":32,"tag":171,"props":279,"children":281},{"class":173,"line":280},9,[282],{"type":32,"tag":171,"props":283,"children":284},{"style":184},[285],{"type":37,"value":286},"  
}\n",{"type":32,"tag":171,"props":288,"children":290},{"class":173,"line":289},10,[291],{"type":32,"tag":171,"props":292,"children":293},{"style":184},[294],{"type":37,"value":295},"]\n",{"type":32,"tag":33,"props":297,"children":298},{},[299,304],{"type":32,"tag":123,"props":300,"children":301},{},[302],{"type":37,"value":303},"Semantics layer",{"type":37,"value":305}," — content quality:",{"type":32,"tag":84,"props":307,"children":308},{},[309,314,319],{"type":32,"tag":88,"props":310,"children":311},{},[312],{"type":37,"value":313},"Is the response on-topic? (embedding similarity, cosine similarity > 0.85)",{"type":32,"tag":88,"props":315,"children":316},{},[317],{"type":37,"value":318},"Are forbidden words present? (regex, token filtering)",{"type":32,"tag":88,"props":320,"children":321},{},[322],{"type":37,"value":323},"Is the tone correct? (classifier model, sentiment score)",{"type":32,"tag":33,"props":325,"children":326},{},[327],{"type":37,"value":328},"In LangSmith, with a custom evaluator:",{"type":32,"tag":162,"props":330,"children":334},{"className":331,"code":332,"language":333,"meta":16,"style":16},"language-python shiki shiki-themes github-dark","from langsmith import evaluate\n\ndef check_brand_compliance(run, example):\n    forbidden = [\"expert\", \"leader\", \"revolutionary\"]\n    output = run.outputs[\"text\"].lower()\n    violations = [w for w in forbidden if w in output]\n    return {\"score\": 0 if violations else 1, \"violations\": violations}\n\nevaluate(\n    dataset_name=\"marketing_blog_posts\",\n    
evaluators=[check_brand_compliance]\n)\n","python",[335],{"type":32,"tag":62,"props":336,"children":337},{"__ignoreMap":16},[338,362,371,389,435,462,517,579,586,594,616,634],{"type":32,"tag":171,"props":339,"children":340},{"class":173,"line":174},[341,347,352,357],{"type":32,"tag":171,"props":342,"children":344},{"style":343},"--shiki-default:#F97583",[345],{"type":37,"value":346},"from",{"type":32,"tag":171,"props":348,"children":349},{"style":184},[350],{"type":37,"value":351}," langsmith ",{"type":32,"tag":171,"props":353,"children":354},{"style":343},[355],{"type":37,"value":356},"import",{"type":32,"tag":171,"props":358,"children":359},{"style":184},[360],{"type":37,"value":361}," evaluate\n",{"type":32,"tag":171,"props":363,"children":364},{"class":173,"line":190},[365],{"type":32,"tag":171,"props":366,"children":368},{"emptyLinePlaceholder":367},true,[369],{"type":37,"value":370},"\n",{"type":32,"tag":171,"props":372,"children":373},{"class":173,"line":199},[374,379,384],{"type":32,"tag":171,"props":375,"children":376},{"style":343},[377],{"type":37,"value":378},"def",{"type":32,"tag":171,"props":380,"children":381},{"style":178},[382],{"type":37,"value":383}," check_brand_compliance",{"type":32,"tag":171,"props":385,"children":386},{"style":184},[387],{"type":37,"value":388},"(run, example):\n",{"type":32,"tag":171,"props":390,"children":391},{"class":173,"line":219},[392,397,402,407,412,417,422,426,431],{"type":32,"tag":171,"props":393,"children":394},{"style":184},[395],{"type":37,"value":396},"    forbidden ",{"type":32,"tag":171,"props":398,"children":399},{"style":343},[400],{"type":37,"value":401},"=",{"type":32,"tag":171,"props":403,"children":404},{"style":184},[405],{"type":37,"value":406}," [",{"type":32,"tag":171,"props":408,"children":409},{"style":208},[410],{"type":37,"value":411},"\"expert\"",{"type":32,"tag":171,"props":413,"children":414},{"style":184},[415],{"type":37,"value":416},", 
",{"type":32,"tag":171,"props":418,"children":419},{"style":208},[420],{"type":37,"value":421},"\"leader\"",{"type":32,"tag":171,"props":423,"children":424},{"style":184},[425],{"type":37,"value":416},{"type":32,"tag":171,"props":427,"children":428},{"style":208},[429],{"type":37,"value":430},"\"revolutionary\"",{"type":32,"tag":171,"props":432,"children":433},{"style":184},[434],{"type":37,"value":295},{"type":32,"tag":171,"props":436,"children":437},{"class":173,"line":233},[438,443,447,452,457],{"type":32,"tag":171,"props":439,"children":440},{"style":184},[441],{"type":37,"value":442},"    output ",{"type":32,"tag":171,"props":444,"children":445},{"style":343},[446],{"type":37,"value":401},{"type":32,"tag":171,"props":448,"children":449},{"style":184},[450],{"type":37,"value":451}," run.outputs[",{"type":32,"tag":171,"props":453,"children":454},{"style":208},[455],{"type":37,"value":456},"\"text\"",{"type":32,"tag":171,"props":458,"children":459},{"style":184},[460],{"type":37,"value":461},"].lower()\n",{"type":32,"tag":171,"props":463,"children":464},{"class":173,"line":242},[465,470,474,479,484,489,494,499,504,508,512],{"type":32,"tag":171,"props":466,"children":467},{"style":184},[468],{"type":37,"value":469},"    violations ",{"type":32,"tag":171,"props":471,"children":472},{"style":343},[473],{"type":37,"value":401},{"type":32,"tag":171,"props":475,"children":476},{"style":184},[477],{"type":37,"value":478}," [w ",{"type":32,"tag":171,"props":480,"children":481},{"style":343},[482],{"type":37,"value":483},"for",{"type":32,"tag":171,"props":485,"children":486},{"style":184},[487],{"type":37,"value":488}," w ",{"type":32,"tag":171,"props":490,"children":491},{"style":343},[492],{"type":37,"value":493},"in",{"type":32,"tag":171,"props":495,"children":496},{"style":184},[497],{"type":37,"value":498}," forbidden 
",{"type":32,"tag":171,"props":500,"children":501},{"style":343},[502],{"type":37,"value":503},"if",{"type":32,"tag":171,"props":505,"children":506},{"style":184},[507],{"type":37,"value":488},{"type":32,"tag":171,"props":509,"children":510},{"style":343},[511],{"type":37,"value":493},{"type":32,"tag":171,"props":513,"children":514},{"style":184},[515],{"type":37,"value":516}," output]\n",{"type":32,"tag":171,"props":518,"children":519},{"class":173,"line":250},[520,525,530,535,540,545,550,555,560,565,569,574],{"type":32,"tag":171,"props":521,"children":522},{"style":343},[523],{"type":37,"value":524},"    return",{"type":32,"tag":171,"props":526,"children":527},{"style":184},[528],{"type":37,"value":529}," {",{"type":32,"tag":171,"props":531,"children":532},{"style":208},[533],{"type":37,"value":534},"\"score\"",{"type":32,"tag":171,"props":536,"children":537},{"style":184},[538],{"type":37,"value":539},": ",{"type":32,"tag":171,"props":541,"children":542},{"style":274},[543],{"type":37,"value":544},"0",{"type":32,"tag":171,"props":546,"children":547},{"style":343},[548],{"type":37,"value":549}," if",{"type":32,"tag":171,"props":551,"children":552},{"style":184},[553],{"type":37,"value":554}," violations ",{"type":32,"tag":171,"props":556,"children":557},{"style":343},[558],{"type":37,"value":559},"else",{"type":32,"tag":171,"props":561,"children":562},{"style":274},[563],{"type":37,"value":564}," 1",{"type":32,"tag":171,"props":566,"children":567},{"style":184},[568],{"type":37,"value":416},{"type":32,"tag":171,"props":570,"children":571},{"style":208},[572],{"type":37,"value":573},"\"violations\"",{"type":32,"tag":171,"props":575,"children":576},{"style":184},[577],{"type":37,"value":578},": 
violations}\n",{"type":32,"tag":171,"props":580,"children":581},{"class":173,"line":26},[582],{"type":32,"tag":171,"props":583,"children":584},{"emptyLinePlaceholder":367},[585],{"type":37,"value":370},{"type":32,"tag":171,"props":587,"children":588},{"class":173,"line":280},[589],{"type":32,"tag":171,"props":590,"children":591},{"style":184},[592],{"type":37,"value":593},"evaluate(\n",{"type":32,"tag":171,"props":595,"children":596},{"class":173,"line":289},[597,603,607,612],{"type":32,"tag":171,"props":598,"children":600},{"style":599},"--shiki-default:#FFAB70",[601],{"type":37,"value":602},"    dataset_name",{"type":32,"tag":171,"props":604,"children":605},{"style":343},[606],{"type":37,"value":401},{"type":32,"tag":171,"props":608,"children":609},{"style":208},[610],{"type":37,"value":611},"\"marketing_blog_posts\"",{"type":32,"tag":171,"props":613,"children":614},{"style":184},[615],{"type":37,"value":216},{"type":32,"tag":171,"props":617,"children":619},{"class":173,"line":618},11,[620,625,629],{"type":32,"tag":171,"props":621,"children":622},{"style":599},[623],{"type":37,"value":624},"    evaluators",{"type":32,"tag":171,"props":626,"children":627},{"style":343},[628],{"type":37,"value":401},{"type":32,"tag":171,"props":630,"children":631},{"style":184},[632],{"type":37,"value":633},"[check_brand_compliance]\n",{"type":32,"tag":171,"props":635,"children":637},{"class":173,"line":636},12,[638],{"type":32,"tag":171,"props":639,"children":640},{"style":184},[641],{"type":37,"value":642},")\n",{"type":32,"tag":33,"props":644,"children":645},{},[646,651],{"type":32,"tag":123,"props":647,"children":648},{},[649],{"type":37,"value":650},"Business outcome layer",{"type":37,"value":652}," — real impact:",{"type":32,"tag":84,"props":654,"children":655},{},[656,661,666],{"type":32,"tag":88,"props":657,"children":658},{},[659],{"type":37,"value":660},"Did CTR change?",{"type":32,"tag":88,"props":662,"children":663},{},[664],{"type":37,"value":665},"Did conversion 
drop?",{"type":32,"tag":88,"props":667,"children":668},{},[669],{"type":37,"value":670},"Did bounce rate increase?",{"type":32,"tag":33,"props":672,"children":673},{},[674],{"type":37,"value":675},"This layer connects to production telemetry — in a first-party data measurement system, the prompt version is added as metadata to event tracking, joined in BigQuery, and a dbt model calculates each version's conversion rate.",{"type":32,"tag":677,"props":678,"children":680},"h3",{"id":679},"promptfoo-building-a-deterministic-test-suite",[681],{"type":37,"value":682},"Promptfoo: Building a Deterministic Test Suite",{"type":32,"tag":33,"props":684,"children":685},{},[686],{"type":37,"value":687},"Promptfoo is a local-running, YAML-based evaluation framework. Its goal: validate every prompt change with regression tests before deployment.",{"type":32,"tag":33,"props":689,"children":690},{},[691],{"type":37,"value":692},"Simple config:",{"type":32,"tag":162,"props":694,"children":698},{"className":695,"code":696,"language":697,"meta":16,"style":16},"language-yaml shiki shiki-themes github-dark","prompts:\n  - file:\u002F\u002Fprompts\u002Fmarketing_blog_v1.md\n  - file:\u002F\u002Fprompts\u002Fmarketing_blog_v2.md\n\nproviders:\n  - anthropic:messages:claude-3-5-sonnet-20241022\n\ntests:\n  - vars:\n      topic: \"Server-side GTM\"\n      category: \"tech\"\n    assert:\n      - type: is-json\n      - type: javascript\n        value: \"output.title.length \u003C= 60\"\n      - type: similar\n        value: \"server-side tracking architecture\"\n        threshold: 0.8\n      - type: not-contains\n        value: 
\"revolutionary\"\n","yaml",[699],{"type":32,"tag":62,"props":700,"children":701},{"__ignoreMap":16},[702,716,729,741,748,760,772,779,791,807,824,841,853,876,897,915,936,953,971,992],{"type":32,"tag":171,"props":703,"children":704},{"class":173,"line":174},[705,711],{"type":32,"tag":171,"props":706,"children":708},{"style":707},"--shiki-default:#85E89D",[709],{"type":37,"value":710},"prompts",{"type":32,"tag":171,"props":712,"children":713},{"style":184},[714],{"type":37,"value":715},":\n",{"type":32,"tag":171,"props":717,"children":718},{"class":173,"line":190},[719,724],{"type":32,"tag":171,"props":720,"children":721},{"style":184},[722],{"type":37,"value":723},"  - ",{"type":32,"tag":171,"props":725,"children":726},{"style":208},[727],{"type":37,"value":728},"file:\u002F\u002Fprompts\u002Fmarketing_blog_v1.md\n",{"type":32,"tag":171,"props":730,"children":731},{"class":173,"line":199},[732,736],{"type":32,"tag":171,"props":733,"children":734},{"style":184},[735],{"type":37,"value":723},{"type":32,"tag":171,"props":737,"children":738},{"style":208},[739],{"type":37,"value":740},"file:\u002F\u002Fprompts\u002Fmarketing_blog_v2.md\n",{"type":32,"tag":171,"props":742,"children":743},{"class":173,"line":219},[744],{"type":32,"tag":171,"props":745,"children":746},{"emptyLinePlaceholder":367},[747],{"type":37,"value":370},{"type":32,"tag":171,"props":749,"children":750},{"class":173,"line":233},[751,756],{"type":32,"tag":171,"props":752,"children":753},{"style":707},[754],{"type":37,"value":755},"providers",{"type":32,"tag":171,"props":757,"children":758},{"style":184},[759],{"type":37,"value":715},{"type":32,"tag":171,"props":761,"children":762},{"class":173,"line":242},[763,767],{"type":32,"tag":171,"props":764,"children":765},{"style":184},[766],{"type":37,"value":723},{"type":32,"tag":171,"props":768,"children":769},{"style":208},[770],{"type":37,"value":771},"anthropic:messages:claude-3-5-sonnet-20241022\n",{"type":32,"tag":171,"props":773,"children":774},{"class":
173,"line":250},[775],{"type":32,"tag":171,"props":776,"children":777},{"emptyLinePlaceholder":367},[778],{"type":37,"value":370},{"type":32,"tag":171,"props":780,"children":781},{"class":173,"line":26},[782,787],{"type":32,"tag":171,"props":783,"children":784},{"style":707},[785],{"type":37,"value":786},"tests",{"type":32,"tag":171,"props":788,"children":789},{"style":184},[790],{"type":37,"value":715},{"type":32,"tag":171,"props":792,"children":793},{"class":173,"line":280},[794,798,803],{"type":32,"tag":171,"props":795,"children":796},{"style":184},[797],{"type":37,"value":723},{"type":32,"tag":171,"props":799,"children":800},{"style":707},[801],{"type":37,"value":802},"vars",{"type":32,"tag":171,"props":804,"children":805},{"style":184},[806],{"type":37,"value":715},{"type":32,"tag":171,"props":808,"children":809},{"class":173,"line":289},[810,815,819],{"type":32,"tag":171,"props":811,"children":812},{"style":707},[813],{"type":37,"value":814},"      topic",{"type":32,"tag":171,"props":816,"children":817},{"style":184},[818],{"type":37,"value":539},{"type":32,"tag":171,"props":820,"children":821},{"style":208},[822],{"type":37,"value":823},"\"Server-side GTM\"\n",{"type":32,"tag":171,"props":825,"children":826},{"class":173,"line":618},[827,832,836],{"type":32,"tag":171,"props":828,"children":829},{"style":707},[830],{"type":37,"value":831},"      category",{"type":32,"tag":171,"props":833,"children":834},{"style":184},[835],{"type":37,"value":539},{"type":32,"tag":171,"props":837,"children":838},{"style":208},[839],{"type":37,"value":840},"\"tech\"\n",{"type":32,"tag":171,"props":842,"children":843},{"class":173,"line":636},[844,849],{"type":32,"tag":171,"props":845,"children":846},{"style":707},[847],{"type":37,"value":848},"    
assert",{"type":32,"tag":171,"props":850,"children":851},{"style":184},[852],{"type":37,"value":715},{"type":32,"tag":171,"props":854,"children":856},{"class":173,"line":855},13,[857,862,867,871],{"type":32,"tag":171,"props":858,"children":859},{"style":184},[860],{"type":37,"value":861},"      - ",{"type":32,"tag":171,"props":863,"children":864},{"style":707},[865],{"type":37,"value":866},"type",{"type":32,"tag":171,"props":868,"children":869},{"style":184},[870],{"type":37,"value":539},{"type":32,"tag":171,"props":872,"children":873},{"style":208},[874],{"type":37,"value":875},"is-json\n",{"type":32,"tag":171,"props":877,"children":879},{"class":173,"line":878},14,[880,884,888,892],{"type":32,"tag":171,"props":881,"children":882},{"style":184},[883],{"type":37,"value":861},{"type":32,"tag":171,"props":885,"children":886},{"style":707},[887],{"type":37,"value":866},{"type":32,"tag":171,"props":889,"children":890},{"style":184},[891],{"type":37,"value":539},{"type":32,"tag":171,"props":893,"children":894},{"style":208},[895],{"type":37,"value":896},"javascript\n",{"type":32,"tag":171,"props":898,"children":900},{"class":173,"line":899},15,[901,906,910],{"type":32,"tag":171,"props":902,"children":903},{"style":707},[904],{"type":37,"value":905},"        value",{"type":32,"tag":171,"props":907,"children":908},{"style":184},[909],{"type":37,"value":539},{"type":32,"tag":171,"props":911,"children":912},{"style":208},[913],{"type":37,"value":914},"\"output.title.length \u003C= 
60\"\n",{"type":32,"tag":171,"props":916,"children":918},{"class":173,"line":917},16,[919,923,927,931],{"type":32,"tag":171,"props":920,"children":921},{"style":184},[922],{"type":37,"value":861},{"type":32,"tag":171,"props":924,"children":925},{"style":707},[926],{"type":37,"value":866},{"type":32,"tag":171,"props":928,"children":929},{"style":184},[930],{"type":37,"value":539},{"type":32,"tag":171,"props":932,"children":933},{"style":208},[934],{"type":37,"value":935},"similar\n",{"type":32,"tag":171,"props":937,"children":939},{"class":173,"line":938},17,[940,944,948],{"type":32,"tag":171,"props":941,"children":942},{"style":707},[943],{"type":37,"value":905},{"type":32,"tag":171,"props":945,"children":946},{"style":184},[947],{"type":37,"value":539},{"type":32,"tag":171,"props":949,"children":950},{"style":208},[951],{"type":37,"value":952},"\"server-side tracking architecture\"\n",{"type":32,"tag":171,"props":954,"children":956},{"class":173,"line":955},18,[957,962,966],{"type":32,"tag":171,"props":958,"children":959},{"style":707},[960],{"type":37,"value":961},"        
threshold",{"type":32,"tag":171,"props":963,"children":964},{"style":184},[965],{"type":37,"value":539},{"type":32,"tag":171,"props":967,"children":968},{"style":274},[969],{"type":37,"value":970},"0.8\n",{"type":32,"tag":171,"props":972,"children":974},{"class":173,"line":973},19,[975,979,983,987],{"type":32,"tag":171,"props":976,"children":977},{"style":184},[978],{"type":37,"value":861},{"type":32,"tag":171,"props":980,"children":981},{"style":707},[982],{"type":37,"value":866},{"type":32,"tag":171,"props":984,"children":985},{"style":184},[986],{"type":37,"value":539},{"type":32,"tag":171,"props":988,"children":989},{"style":208},[990],{"type":37,"value":991},"not-contains\n",{"type":32,"tag":171,"props":993,"children":995},{"class":173,"line":994},20,[996,1000,1004],{"type":32,"tag":171,"props":997,"children":998},{"style":707},[999],{"type":37,"value":905},{"type":32,"tag":171,"props":1001,"children":1002},{"style":184},[1003],{"type":37,"value":539},{"type":32,"tag":171,"props":1005,"children":1006},{"style":208},[1007],{"type":37,"value":1008},"\"revolutionary\"\n",{"type":32,"tag":33,"props":1010,"children":1011},{},[1012,1014,1020],{"type":37,"value":1013},"Run ",{"type":32,"tag":62,"props":1015,"children":1017},{"className":1016},[],[1018],{"type":37,"value":1019},"promptfoo eval",{"type":37,"value":1021},", all variants are tested, metric table returned:",{"type":32,"tag":1023,"props":1024,"children":1025},"table",{},[1026,1055],{"type":32,"tag":1027,"props":1028,"children":1029},"thead",{},[1030],{"type":32,"tag":1031,"props":1032,"children":1033},"tr",{},[1034,1040,1045,1050],{"type":32,"tag":1035,"props":1036,"children":1037},"th",{},[1038],{"type":37,"value":1039},"Prompt",{"type":32,"tag":1035,"props":1041,"children":1042},{},[1043],{"type":37,"value":1044},"Pass Rate",{"type":32,"tag":1035,"props":1046,"children":1047},{},[1048],{"type":37,"value":1049},"Avg 
Latency",{"type":32,"tag":1035,"props":1051,"children":1052},{},[1053],{"type":37,"value":1054},"Cost",{"type":32,"tag":1056,"props":1057,"children":1058},"tbody",{},[1059,1083],{"type":32,"tag":1031,"props":1060,"children":1061},{},[1062,1068,1073,1078],{"type":32,"tag":1063,"props":1064,"children":1065},"td",{},[1066],{"type":37,"value":1067},"v1",{"type":32,"tag":1063,"props":1069,"children":1070},{},[1071],{"type":37,"value":1072},"92%",{"type":32,"tag":1063,"props":1074,"children":1075},{},[1076],{"type":37,"value":1077},"2.3s",{"type":32,"tag":1063,"props":1079,"children":1080},{},[1081],{"type":37,"value":1082},"$0.012",{"type":32,"tag":1031,"props":1084,"children":1085},{},[1086,1091,1096,1101],{"type":32,"tag":1063,"props":1087,"children":1088},{},[1089],{"type":37,"value":1090},"v2",{"type":32,"tag":1063,"props":1092,"children":1093},{},[1094],{"type":37,"value":1095},"98%",{"type":32,"tag":1063,"props":1097,"children":1098},{},[1099],{"type":37,"value":1100},"2.1s",{"type":32,"tag":1063,"props":1102,"children":1103},{},[1104],{"type":37,"value":1105},"$0.014",{"type":32,"tag":33,"props":1107,"children":1108},{},[1109],{"type":37,"value":1110},"v2 has a better pass rate but costs about 17% more per run ($0.014 vs. $0.012) because its token count is higher. Without seeing this tradeoff, you would deploy v2 and monthly spend would spike.",{"type":32,"tag":45,"props":1112,"children":1114},{"id":1113},"ab-testing-comparing-prompt-variants-in-production",[1115],{"type":37,"value":1116},"A\u002FB Testing: Comparing Prompt Variants in Production",{"type":32,"tag":33,"props":1118,"children":1119},{},[1120],{"type":37,"value":1121},"Once the evaluation suite is green, the next step is real traffic. 
A\u002FB testing in LLM systems works like this:",{"type":32,"tag":1123,"props":1124,"children":1125},"ol",{},[1126,1136,1154,1164],{"type":32,"tag":88,"props":1127,"children":1128},{},[1129,1134],{"type":32,"tag":123,"props":1130,"children":1131},{},[1132],{"type":37,"value":1133},"Variant routing",{"type":37,"value":1135}," — pick the prompt version deterministically from the user or session ID (percentage split)",{"type":32,"tag":88,"props":1137,"children":1138},{},[1139,1144,1146,1152],{"type":32,"tag":123,"props":1140,"children":1141},{},[1142],{"type":37,"value":1143},"Metadata tagging",{"type":37,"value":1145}," — add ",{"type":32,"tag":62,"props":1147,"children":1149},{"className":1148},[],[1150],{"type":37,"value":1151},"prompt_version",{"type":37,"value":1153}," to each API call",{"type":32,"tag":88,"props":1155,"children":1156},{},[1157,1162],{"type":32,"tag":123,"props":1158,"children":1159},{},[1160],{"type":37,"value":1161},"Metric tracking",{"type":37,"value":1163}," — carry the variant label through to downstream events",{"type":32,"tag":88,"props":1165,"children":1166},{},[1167,1172],{"type":32,"tag":123,"props":1168,"children":1169},{},[1170],{"type":37,"value":1171},"Statistical significance",{"type":37,"value":1173}," — decide only once enough samples are collected (about 385 observations per variant gives a ±5% margin of error at 95% confidence; detecting small lifts needs more)",{"type":32,"tag":33,"props":1175,"children":1176},{},[1177],{"type":37,"value":1178},"n8n workflow example:",{"type":32,"tag":162,"props":1180,"children":1182},{"className":164,"code":1181,"language":158,"meta":16,"style":16},"\u002F\u002F A\u002FB variant selection\nconst userId = $json.user_id;\nconst variant = (userId % 100 \u003C 50) ? 
'v1' : 'v2';\nconst promptUrl = `https:\u002F\u002Fraw.githubusercontent.com\u002Froibase\u002Fprompts\u002Fmain\u002F${variant}.md`;\n\n\u002F\u002F Add metadata to API call\nreturn {\n  json: {\n    prompt: await fetch(promptUrl).then(r => r.text()),\n    metadata: {\n      prompt_version: variant,\n      experiment_id: 'blog_tone_test_2026_05'\n    }\n  }\n};\n",[1183],{"type":32,"tag":62,"props":1184,"children":1185},{"__ignoreMap":16},[1186,1195,1218,1289,1324,1331,1339,1352,1360,1417,1425,1433,1446,1454,1461],{"type":32,"tag":171,"props":1187,"children":1188},{"class":173,"line":174},[1189],{"type":32,"tag":171,"props":1190,"children":1192},{"style":1191},"--shiki-default:#6A737D",[1193],{"type":37,"value":1194},"\u002F\u002F A\u002FB variant selection\n",{"type":32,"tag":171,"props":1196,"children":1197},{"class":173,"line":190},[1198,1203,1208,1213],{"type":32,"tag":171,"props":1199,"children":1200},{"style":343},[1201],{"type":37,"value":1202},"const",{"type":32,"tag":171,"props":1204,"children":1205},{"style":274},[1206],{"type":37,"value":1207}," userId",{"type":32,"tag":171,"props":1209,"children":1210},{"style":343},[1211],{"type":37,"value":1212}," =",{"type":32,"tag":171,"props":1214,"children":1215},{"style":184},[1216],{"type":37,"value":1217}," $json.user_id;\n",{"type":32,"tag":171,"props":1219,"children":1220},{"class":173,"line":199},[1221,1225,1230,1234,1239,1244,1249,1254,1259,1264,1269,1274,1279,1284],{"type":32,"tag":171,"props":1222,"children":1223},{"style":343},[1224],{"type":37,"value":1202},{"type":32,"tag":171,"props":1226,"children":1227},{"style":274},[1228],{"type":37,"value":1229}," variant",{"type":32,"tag":171,"props":1231,"children":1232},{"style":343},[1233],{"type":37,"value":1212},{"type":32,"tag":171,"props":1235,"children":1236},{"style":184},[1237],{"type":37,"value":1238}," (userId 
",{"type":32,"tag":171,"props":1240,"children":1241},{"style":343},[1242],{"type":37,"value":1243},"%",{"type":32,"tag":171,"props":1245,"children":1246},{"style":274},[1247],{"type":37,"value":1248}," 100",{"type":32,"tag":171,"props":1250,"children":1251},{"style":343},[1252],{"type":37,"value":1253}," \u003C",{"type":32,"tag":171,"props":1255,"children":1256},{"style":274},[1257],{"type":37,"value":1258}," 50",{"type":32,"tag":171,"props":1260,"children":1261},{"style":184},[1262],{"type":37,"value":1263},") ",{"type":32,"tag":171,"props":1265,"children":1266},{"style":343},[1267],{"type":37,"value":1268},"?",{"type":32,"tag":171,"props":1270,"children":1271},{"style":208},[1272],{"type":37,"value":1273}," 'v1'",{"type":32,"tag":171,"props":1275,"children":1276},{"style":343},[1277],{"type":37,"value":1278}," :",{"type":32,"tag":171,"props":1280,"children":1281},{"style":208},[1282],{"type":37,"value":1283}," 'v2'",{"type":32,"tag":171,"props":1285,"children":1286},{"style":184},[1287],{"type":37,"value":1288},";\n",{"type":32,"tag":171,"props":1290,"children":1291},{"class":173,"line":219},[1292,1296,1301,1305,1310,1315,1320],{"type":32,"tag":171,"props":1293,"children":1294},{"style":343},[1295],{"type":37,"value":1202},{"type":32,"tag":171,"props":1297,"children":1298},{"style":274},[1299],{"type":37,"value":1300}," promptUrl",{"type":32,"tag":171,"props":1302,"children":1303},{"style":343},[1304],{"type":37,"value":1212},{"type":32,"tag":171,"props":1306,"children":1307},{"style":208},[1308],{"type":37,"value":1309}," 
`https:\u002F\u002Fraw.githubusercontent.com\u002Froibase\u002Fprompts\u002Fmain\u002F${",{"type":32,"tag":171,"props":1311,"children":1312},{"style":184},[1313],{"type":37,"value":1314},"variant",{"type":32,"tag":171,"props":1316,"children":1317},{"style":208},[1318],{"type":37,"value":1319},"}.md`",{"type":32,"tag":171,"props":1321,"children":1322},{"style":184},[1323],{"type":37,"value":1288},{"type":32,"tag":171,"props":1325,"children":1326},{"class":173,"line":233},[1327],{"type":32,"tag":171,"props":1328,"children":1329},{"emptyLinePlaceholder":367},[1330],{"type":37,"value":370},{"type":32,"tag":171,"props":1332,"children":1333},{"class":173,"line":242},[1334],{"type":32,"tag":171,"props":1335,"children":1336},{"style":1191},[1337],{"type":37,"value":1338},"\u002F\u002F Add metadata to API call\n",{"type":32,"tag":171,"props":1340,"children":1341},{"class":173,"line":250},[1342,1347],{"type":32,"tag":171,"props":1343,"children":1344},{"style":343},[1345],{"type":37,"value":1346},"return",{"type":32,"tag":171,"props":1348,"children":1349},{"style":184},[1350],{"type":37,"value":1351}," {\n",{"type":32,"tag":171,"props":1353,"children":1354},{"class":173,"line":26},[1355],{"type":32,"tag":171,"props":1356,"children":1357},{"style":184},[1358],{"type":37,"value":1359},"  json: {\n",{"type":32,"tag":171,"props":1361,"children":1362},{"class":173,"line":280},[1363,1368,1373,1378,1383,1388,1393,1398,1403,1408,1412],{"type":32,"tag":171,"props":1364,"children":1365},{"style":184},[1366],{"type":37,"value":1367},"    prompt: ",{"type":32,"tag":171,"props":1369,"children":1370},{"style":343},[1371],{"type":37,"value":1372},"await",{"type":32,"tag":171,"props":1374,"children":1375},{"style":178},[1376],{"type":37,"value":1377}," 
fetch",{"type":32,"tag":171,"props":1379,"children":1380},{"style":184},[1381],{"type":37,"value":1382},"(promptUrl).",{"type":32,"tag":171,"props":1384,"children":1385},{"style":178},[1386],{"type":37,"value":1387},"then",{"type":32,"tag":171,"props":1389,"children":1390},{"style":184},[1391],{"type":37,"value":1392},"(",{"type":32,"tag":171,"props":1394,"children":1395},{"style":599},[1396],{"type":37,"value":1397},"r",{"type":32,"tag":171,"props":1399,"children":1400},{"style":343},[1401],{"type":37,"value":1402}," =>",{"type":32,"tag":171,"props":1404,"children":1405},{"style":184},[1406],{"type":37,"value":1407}," r.",{"type":32,"tag":171,"props":1409,"children":1410},{"style":178},[1411],{"type":37,"value":37},{"type":32,"tag":171,"props":1413,"children":1414},{"style":184},[1415],{"type":37,"value":1416},"()),\n",{"type":32,"tag":171,"props":1418,"children":1419},{"class":173,"line":289},[1420],{"type":32,"tag":171,"props":1421,"children":1422},{"style":184},[1423],{"type":37,"value":1424},"    metadata: {\n",{"type":32,"tag":171,"props":1426,"children":1427},{"class":173,"line":618},[1428],{"type":32,"tag":171,"props":1429,"children":1430},{"style":184},[1431],{"type":37,"value":1432},"      prompt_version: variant,\n",{"type":32,"tag":171,"props":1434,"children":1435},{"class":173,"line":636},[1436,1441],{"type":32,"tag":171,"props":1437,"children":1438},{"style":184},[1439],{"type":37,"value":1440},"      experiment_id: ",{"type":32,"tag":171,"props":1442,"children":1443},{"style":208},[1444],{"type":37,"value":1445},"'blog_tone_test_2026_05'\n",{"type":32,"tag":171,"props":1447,"children":1448},{"class":173,"line":855},[1449],{"type":32,"tag":171,"props":1450,"children":1451},{"style":184},[1452],{"type":37,"value":1453},"    
}\n",{"type":32,"tag":171,"props":1455,"children":1456},{"class":173,"line":878},[1457],{"type":32,"tag":171,"props":1458,"children":1459},{"style":184},[1460],{"type":37,"value":286},{"type":32,"tag":171,"props":1462,"children":1463},{"class":173,"line":899},[1464],{"type":32,"tag":171,"props":1465,"children":1466},{"style":184},[1467],{"type":37,"value":1468},"};\n",{"type":32,"tag":33,"props":1470,"children":1471},{},[1472],{"type":37,"value":1473},"Analysis in BigQuery:",{"type":32,"tag":162,"props":1475,"children":1479},{"className":1476,"code":1477,"language":1478,"meta":16,"style":16},"language-sql shiki shiki-themes github-dark","SELECT\n  metadata.value:prompt_version AS variant,\n  COUNT(DISTINCT user_id) AS users,\n  AVG(session_duration_sec) AS avg_duration,\n  SUM(conversion) \u002F COUNT(*) AS cvr\nFROM events\nWHERE experiment_id = 'blog_tone_test_2026_05'\n  AND event_date >= '2026-05-01'\nGROUP BY 1\n","sql",[1480],{"type":32,"tag":62,"props":1481,"children":1482},{"__ignoreMap":16},[1483,1491,1524,1555,1577,1622,1635,1657,1680],{"type":32,"tag":171,"props":1484,"children":1485},{"class":173,"line":174},[1486],{"type":32,"tag":171,"props":1487,"children":1488},{"style":343},[1489],{"type":37,"value":1490},"SELECT\n",{"type":32,"tag":171,"props":1492,"children":1493},{"class":173,"line":190},[1494,1499,1504,1509,1514,1519],{"type":32,"tag":171,"props":1495,"children":1496},{"style":274},[1497],{"type":37,"value":1498},"  metadata",{"type":32,"tag":171,"props":1500,"children":1501},{"style":184},[1502],{"type":37,"value":1503},".",{"type":32,"tag":171,"props":1505,"children":1506},{"style":274},[1507],{"type":37,"value":1508},"value",{"type":32,"tag":171,"props":1510,"children":1511},{"style":184},[1512],{"type":37,"value":1513},":prompt_version ",{"type":32,"tag":171,"props":1515,"children":1516},{"style":343},[1517],{"type":37,"value":1518},"AS",{"type":32,"tag":171,"props":1520,"children":1521},{"style":184},[1522],{"type":37,"value":1523}," 
variant,\n",{"type":32,"tag":171,"props":1525,"children":1526},{"class":173,"line":199},[1527,1532,1536,1541,1546,1550],{"type":32,"tag":171,"props":1528,"children":1529},{"style":274},[1530],{"type":37,"value":1531},"  COUNT",{"type":32,"tag":171,"props":1533,"children":1534},{"style":184},[1535],{"type":37,"value":1392},{"type":32,"tag":171,"props":1537,"children":1538},{"style":343},[1539],{"type":37,"value":1540},"DISTINCT",{"type":32,"tag":171,"props":1542,"children":1543},{"style":184},[1544],{"type":37,"value":1545}," user_id) ",{"type":32,"tag":171,"props":1547,"children":1548},{"style":343},[1549],{"type":37,"value":1518},{"type":32,"tag":171,"props":1551,"children":1552},{"style":184},[1553],{"type":37,"value":1554}," users,\n",{"type":32,"tag":171,"props":1556,"children":1557},{"class":173,"line":219},[1558,1563,1568,1572],{"type":32,"tag":171,"props":1559,"children":1560},{"style":274},[1561],{"type":37,"value":1562},"  AVG",{"type":32,"tag":171,"props":1564,"children":1565},{"style":184},[1566],{"type":37,"value":1567},"(session_duration_sec) ",{"type":32,"tag":171,"props":1569,"children":1570},{"style":343},[1571],{"type":37,"value":1518},{"type":32,"tag":171,"props":1573,"children":1574},{"style":184},[1575],{"type":37,"value":1576}," avg_duration,\n",{"type":32,"tag":171,"props":1578,"children":1579},{"class":173,"line":233},[1580,1585,1590,1595,1600,1604,1609,1613,1617],{"type":32,"tag":171,"props":1581,"children":1582},{"style":274},[1583],{"type":37,"value":1584},"  SUM",{"type":32,"tag":171,"props":1586,"children":1587},{"style":184},[1588],{"type":37,"value":1589},"(conversion) ",{"type":32,"tag":171,"props":1591,"children":1592},{"style":343},[1593],{"type":37,"value":1594},"\u002F",{"type":32,"tag":171,"props":1596,"children":1597},{"style":274},[1598],{"type":37,"value":1599}," 
COUNT",{"type":32,"tag":171,"props":1601,"children":1602},{"style":184},[1603],{"type":37,"value":1392},{"type":32,"tag":171,"props":1605,"children":1606},{"style":343},[1607],{"type":37,"value":1608},"*",{"type":32,"tag":171,"props":1610,"children":1611},{"style":184},[1612],{"type":37,"value":1263},{"type":32,"tag":171,"props":1614,"children":1615},{"style":343},[1616],{"type":37,"value":1518},{"type":32,"tag":171,"props":1618,"children":1619},{"style":184},[1620],{"type":37,"value":1621}," cvr\n",{"type":32,"tag":171,"props":1623,"children":1624},{"class":173,"line":242},[1625,1630],{"type":32,"tag":171,"props":1626,"children":1627},{"style":343},[1628],{"type":37,"value":1629},"FROM",{"type":32,"tag":171,"props":1631,"children":1632},{"style":184},[1633],{"type":37,"value":1634}," events\n",{"type":32,"tag":171,"props":1636,"children":1637},{"class":173,"line":250},[1638,1643,1648,1652],{"type":32,"tag":171,"props":1639,"children":1640},{"style":343},[1641],{"type":37,"value":1642},"WHERE",{"type":32,"tag":171,"props":1644,"children":1645},{"style":184},[1646],{"type":37,"value":1647}," experiment_id ",{"type":32,"tag":171,"props":1649,"children":1650},{"style":343},[1651],{"type":37,"value":401},{"type":32,"tag":171,"props":1653,"children":1654},{"style":208},[1655],{"type":37,"value":1656}," 'blog_tone_test_2026_05'\n",{"type":32,"tag":171,"props":1658,"children":1659},{"class":173,"line":26},[1660,1665,1670,1675],{"type":32,"tag":171,"props":1661,"children":1662},{"style":343},[1663],{"type":37,"value":1664},"  AND",{"type":32,"tag":171,"props":1666,"children":1667},{"style":184},[1668],{"type":37,"value":1669}," event_date ",{"type":32,"tag":171,"props":1671,"children":1672},{"style":343},[1673],{"type":37,"value":1674},">=",{"type":32,"tag":171,"props":1676,"children":1677},{"style":208},[1678],{"type":37,"value":1679}," 
'2026-05-01'\n",{"type":32,"tag":171,"props":1681,"children":1682},{"class":173,"line":280},[1683,1688],{"type":32,"tag":171,"props":1684,"children":1685},{"style":343},[1686],{"type":37,"value":1687},"GROUP BY",{"type":32,"tag":171,"props":1689,"children":1690},{"style":274},[1691],{"type":37,"value":1692}," 1\n",{"type":32,"tag":33,"props":1694,"children":1695},{},[1696],{"type":37,"value":1697},"Result: the v2 variant raised CVR from 0.042 to 0.051 (+21%) with a p-value of 0.003, so it can move to production with confidence.",{"type":32,"tag":45,"props":1699,"children":1701},{"id":1700},"langsmith-observability-and-long-term-regression-detection",[1702],{"type":37,"value":1703},"LangSmith: Observability and Long-Term Regression Detection",{"type":32,"tag":33,"props":1705,"children":1706},{},[1707],{"type":37,"value":1708},"Promptfoo covers local testing; LangSmith covers production observability. Every LLM call is traced: input, output, latency, token count, model version, prompt version.",{"type":32,"tag":33,"props":1710,"children":1711},{},[1712,1714,1719],{"type":37,"value":1713},"LangSmith's strength is ",{"type":32,"tag":123,"props":1715,"children":1716},{},[1717],{"type":37,"value":1718},"long-term metric tracking",{"type":37,"value":1720},". 
A bug in a prompt version from three months ago is discovered by feedback today — go back to the trace, see the input\u002Foutput diff, find which version that was, rollback.",{"type":32,"tag":33,"props":1722,"children":1723},{},[1724],{"type":37,"value":1725},"Example trace:",{"type":32,"tag":162,"props":1727,"children":1731},{"className":1728,"code":1729,"language":1730,"meta":16,"style":16},"language-json shiki shiki-themes github-dark","{\n  \"run_id\": \"abc123\",\n  \"prompt_version\": \"v2.1\",\n  \"model\": \"claude-3-5-sonnet-20241022\",\n  \"input\": {\"topic\": \"Server-side GTM\", \"category\": \"tech\"},\n  \"output\": \"---\\ntitle: \\\"Server-Side GTM...\\\"\",\n  \"latency_ms\": 2341,\n  \"tokens\": {\"input\": 1842, \"output\": 1523},\n  \"cost_usd\": 0.0137,\n  \"feedback\": {\"score\": 4, \"comment\": \"title too long\"}\n}\n","json",[1732],{"type":32,"tag":62,"props":1733,"children":1734},{"__ignoreMap":16},[1735,1743,1764,1785,1806,1856,1906,1927,1975,1996,2044],{"type":32,"tag":171,"props":1736,"children":1737},{"class":173,"line":174},[1738],{"type":32,"tag":171,"props":1739,"children":1740},{"style":184},[1741],{"type":37,"value":1742},"{\n",{"type":32,"tag":171,"props":1744,"children":1745},{"class":173,"line":190},[1746,1751,1755,1760],{"type":32,"tag":171,"props":1747,"children":1748},{"style":274},[1749],{"type":37,"value":1750},"  \"run_id\"",{"type":32,"tag":171,"props":1752,"children":1753},{"style":184},[1754],{"type":37,"value":539},{"type":32,"tag":171,"props":1756,"children":1757},{"style":208},[1758],{"type":37,"value":1759},"\"abc123\"",{"type":32,"tag":171,"props":1761,"children":1762},{"style":184},[1763],{"type":37,"value":216},{"type":32,"tag":171,"props":1765,"children":1766},{"class":173,"line":199},[1767,1772,1776,1781],{"type":32,"tag":171,"props":1768,"children":1769},{"style":274},[1770],{"type":37,"value":1771},"  
\"prompt_version\"",{"type":32,"tag":171,"props":1773,"children":1774},{"style":184},[1775],{"type":37,"value":539},{"type":32,"tag":171,"props":1777,"children":1778},{"style":208},[1779],{"type":37,"value":1780},"\"v2.1\"",{"type":32,"tag":171,"props":1782,"children":1783},{"style":184},[1784],{"type":37,"value":216},{"type":32,"tag":171,"props":1786,"children":1787},{"class":173,"line":219},[1788,1793,1797,1802],{"type":32,"tag":171,"props":1789,"children":1790},{"style":274},[1791],{"type":37,"value":1792},"  \"model\"",{"type":32,"tag":171,"props":1794,"children":1795},{"style":184},[1796],{"type":37,"value":539},{"type":32,"tag":171,"props":1798,"children":1799},{"style":208},[1800],{"type":37,"value":1801},"\"claude-3-5-sonnet-20241022\"",{"type":32,"tag":171,"props":1803,"children":1804},{"style":184},[1805],{"type":37,"value":216},{"type":32,"tag":171,"props":1807,"children":1808},{"class":173,"line":233},[1809,1814,1819,1824,1828,1833,1837,1842,1846,1851],{"type":32,"tag":171,"props":1810,"children":1811},{"style":274},[1812],{"type":37,"value":1813},"  \"input\"",{"type":32,"tag":171,"props":1815,"children":1816},{"style":184},[1817],{"type":37,"value":1818},": {",{"type":32,"tag":171,"props":1820,"children":1821},{"style":274},[1822],{"type":37,"value":1823},"\"topic\"",{"type":32,"tag":171,"props":1825,"children":1826},{"style":184},[1827],{"type":37,"value":539},{"type":32,"tag":171,"props":1829,"children":1830},{"style":208},[1831],{"type":37,"value":1832},"\"Server-side 
GTM\"",{"type":32,"tag":171,"props":1834,"children":1835},{"style":184},[1836],{"type":37,"value":416},{"type":32,"tag":171,"props":1838,"children":1839},{"style":274},[1840],{"type":37,"value":1841},"\"category\"",{"type":32,"tag":171,"props":1843,"children":1844},{"style":184},[1845],{"type":37,"value":539},{"type":32,"tag":171,"props":1847,"children":1848},{"style":208},[1849],{"type":37,"value":1850},"\"tech\"",{"type":32,"tag":171,"props":1852,"children":1853},{"style":184},[1854],{"type":37,"value":1855},"},\n",{"type":32,"tag":171,"props":1857,"children":1858},{"class":173,"line":242},[1859,1864,1868,1873,1878,1883,1888,1893,1897,1902],{"type":32,"tag":171,"props":1860,"children":1861},{"style":274},[1862],{"type":37,"value":1863},"  \"output\"",{"type":32,"tag":171,"props":1865,"children":1866},{"style":184},[1867],{"type":37,"value":539},{"type":32,"tag":171,"props":1869,"children":1870},{"style":208},[1871],{"type":37,"value":1872},"\"---",{"type":32,"tag":171,"props":1874,"children":1875},{"style":274},[1876],{"type":37,"value":1877},"\\n",{"type":32,"tag":171,"props":1879,"children":1880},{"style":208},[1881],{"type":37,"value":1882},"title: ",{"type":32,"tag":171,"props":1884,"children":1885},{"style":274},[1886],{"type":37,"value":1887},"\\\"",{"type":32,"tag":171,"props":1889,"children":1890},{"style":208},[1891],{"type":37,"value":1892},"Server-Side GTM...",{"type":32,"tag":171,"props":1894,"children":1895},{"style":274},[1896],{"type":37,"value":1887},{"type":32,"tag":171,"props":1898,"children":1899},{"style":208},[1900],{"type":37,"value":1901},"\"",{"type":32,"tag":171,"props":1903,"children":1904},{"style":184},[1905],{"type":37,"value":216},{"type":32,"tag":171,"props":1907,"children":1908},{"class":173,"line":250},[1909,1914,1918,1923],{"type":32,"tag":171,"props":1910,"children":1911},{"style":274},[1912],{"type":37,"value":1913},"  
\"latency_ms\"",{"type":32,"tag":171,"props":1915,"children":1916},{"style":184},[1917],{"type":37,"value":539},{"type":32,"tag":171,"props":1919,"children":1920},{"style":274},[1921],{"type":37,"value":1922},"2341",{"type":32,"tag":171,"props":1924,"children":1925},{"style":184},[1926],{"type":37,"value":216},{"type":32,"tag":171,"props":1928,"children":1929},{"class":173,"line":26},[1930,1935,1939,1944,1948,1953,1957,1962,1966,1971],{"type":32,"tag":171,"props":1931,"children":1932},{"style":274},[1933],{"type":37,"value":1934},"  \"tokens\"",{"type":32,"tag":171,"props":1936,"children":1937},{"style":184},[1938],{"type":37,"value":1818},{"type":32,"tag":171,"props":1940,"children":1941},{"style":274},[1942],{"type":37,"value":1943},"\"input\"",{"type":32,"tag":171,"props":1945,"children":1946},{"style":184},[1947],{"type":37,"value":539},{"type":32,"tag":171,"props":1949,"children":1950},{"style":274},[1951],{"type":37,"value":1952},"1842",{"type":32,"tag":171,"props":1954,"children":1955},{"style":184},[1956],{"type":37,"value":416},{"type":32,"tag":171,"props":1958,"children":1959},{"style":274},[1960],{"type":37,"value":1961},"\"output\"",{"type":32,"tag":171,"props":1963,"children":1964},{"style":184},[1965],{"type":37,"value":539},{"type":32,"tag":171,"props":1967,"children":1968},{"style":274},[1969],{"type":37,"value":1970},"1523",{"type":32,"tag":171,"props":1972,"children":1973},{"style":184},[1974],{"type":37,"value":1855},{"type":32,"tag":171,"props":1976,"children":1977},{"class":173,"line":280},[1978,1983,1987,1992],{"type":32,"tag":171,"props":1979,"children":1980},{"style":274},[1981],{"type":37,"value":1982},"  
\"cost_usd\"",{"type":32,"tag":171,"props":1984,"children":1985},{"style":184},[1986],{"type":37,"value":539},{"type":32,"tag":171,"props":1988,"children":1989},{"style":274},[1990],{"type":37,"value":1991},"0.0137",{"type":32,"tag":171,"props":1993,"children":1994},{"style":184},[1995],{"type":37,"value":216},{"type":32,"tag":171,"props":1997,"children":1998},{"class":173,"line":289},[1999,2004,2008,2012,2016,2021,2025,2030,2034,2039],{"type":32,"tag":171,"props":2000,"children":2001},{"style":274},[2002],{"type":37,"value":2003},"  \"feedback\"",{"type":32,"tag":171,"props":2005,"children":2006},{"style":184},[2007],{"type":37,"value":1818},{"type":32,"tag":171,"props":2009,"children":2010},{"style":274},[2011],{"type":37,"value":534},{"type":32,"tag":171,"props":2013,"children":2014},{"style":184},[2015],{"type":37,"value":539},{"type":32,"tag":171,"props":2017,"children":2018},{"style":274},[2019],{"type":37,"value":2020},"4",{"type":32,"tag":171,"props":2022,"children":2023},{"style":184},[2024],{"type":37,"value":416},{"type":32,"tag":171,"props":2026,"children":2027},{"style":274},[2028],{"type":37,"value":2029},"\"comment\"",{"type":32,"tag":171,"props":2031,"children":2032},{"style":184},[2033],{"type":37,"value":539},{"type":32,"tag":171,"props":2035,"children":2036},{"style":208},[2037],{"type":37,"value":2038},"\"title too long\"",{"type":32,"tag":171,"props":2040,"children":2041},{"style":184},[2042],{"type":37,"value":2043},"}\n",{"type":32,"tag":171,"props":2045,"children":2046},{"class":173,"line":618},[2047],{"type":32,"tag":171,"props":2048,"children":2049},{"style":184},[2050],{"type":37,"value":2043},{"type":32,"tag":33,"props":2052,"children":2053},{},[2054],{"type":37,"value":2055},"Feedback loop: editors score each blog post 1-5, LangSmith links these scores to traces, weekly report alerts \"v2.3 version dropped to avg score 3.2.\" Immediately rollback, see the prompt diff, find the problem, fix 
it.",{"type":32,"tag":677,"props":2057,"children":2059},{"id":2058},"dataset-management-keeping-the-golden-set-under-version-control",[2060],{"type":37,"value":2061},"Dataset Management: Keeping the Golden Set Under Version Control",{"type":32,"tag":33,"props":2063,"children":2064},{},[2065,2067,2072],{"type":37,"value":2066},"The heart of an eval pipeline is the ",{"type":32,"tag":123,"props":2068,"children":2069},{},[2070],{"type":37,"value":2071},"golden dataset",{"type":37,"value":2073}," — known input\u002Foutput pairs, reference for expected behavior. Keeping this dataset in Notion, updating it manually in Google Sheets, is a regression risk.",{"type":32,"tag":33,"props":2075,"children":2076},{},[2077],{"type":37,"value":2078},"LangSmith dataset under version control:",{"type":32,"tag":162,"props":2080,"children":2082},{"className":331,"code":2081,"language":333,"meta":16,"style":16},"from langsmith import Client\n\nclient = Client()\n\ndataset = client.create_dataset(\"marketing_blog_golden_v3\")\n\n# Add golden examples\nexamples = [\n    {\n        \"inputs\": {\"topic\": \"Server-side GTM\", \"category\": \"tech\"},\n        \"outputs\": {\"title\": \"Server-Side GTM: Post-Cookie Measurement\"},\n        \"metadata\": {\"expected_h2_count\": 5, \"expected_word_count\": 1500}\n    },\n    # 50+ examples...\n]\n\nfor ex in examples:\n    client.create_example(**ex, 
dataset_id=dataset.id)\n",[2083],{"type":32,"tag":62,"props":2084,"children":2085},{"__ignoreMap":16},[2086,2106,2113,2130,2137,2163,2170,2178,2195,2203,2247,2277,2325,2333,2341,2348,2355,2376],{"type":32,"tag":171,"props":2087,"children":2088},{"class":173,"line":174},[2089,2093,2097,2101],{"type":32,"tag":171,"props":2090,"children":2091},{"style":343},[2092],{"type":37,"value":346},{"type":32,"tag":171,"props":2094,"children":2095},{"style":184},[2096],{"type":37,"value":351},{"type":32,"tag":171,"props":2098,"children":2099},{"style":343},[2100],{"type":37,"value":356},{"type":32,"tag":171,"props":2102,"children":2103},{"style":184},[2104],{"type":37,"value":2105}," Client\n",{"type":32,"tag":171,"props":2107,"children":2108},{"class":173,"line":190},[2109],{"type":32,"tag":171,"props":2110,"children":2111},{"emptyLinePlaceholder":367},[2112],{"type":37,"value":370},{"type":32,"tag":171,"props":2114,"children":2115},{"class":173,"line":199},[2116,2121,2125],{"type":32,"tag":171,"props":2117,"children":2118},{"style":184},[2119],{"type":37,"value":2120},"client ",{"type":32,"tag":171,"props":2122,"children":2123},{"style":343},[2124],{"type":37,"value":401},{"type":32,"tag":171,"props":2126,"children":2127},{"style":184},[2128],{"type":37,"value":2129}," Client()\n",{"type":32,"tag":171,"props":2131,"children":2132},{"class":173,"line":219},[2133],{"type":32,"tag":171,"props":2134,"children":2135},{"emptyLinePlaceholder":367},[2136],{"type":37,"value":370},{"type":32,"tag":171,"props":2138,"children":2139},{"class":173,"line":233},[2140,2145,2149,2154,2159],{"type":32,"tag":171,"props":2141,"children":2142},{"style":184},[2143],{"type":37,"value":2144},"dataset ",{"type":32,"tag":171,"props":2146,"children":2147},{"style":343},[2148],{"type":37,"value":401},{"type":32,"tag":171,"props":2150,"children":2151},{"style":184},[2152],{"type":37,"value":2153}," 
client.create_dataset(",{"type":32,"tag":171,"props":2155,"children":2156},{"style":208},[2157],{"type":37,"value":2158},"\"marketing_blog_golden_v3\"",{"type":32,"tag":171,"props":2160,"children":2161},{"style":184},[2162],{"type":37,"value":642},{"type":32,"tag":171,"props":2164,"children":2165},{"class":173,"line":242},[2166],{"type":32,"tag":171,"props":2167,"children":2168},{"emptyLinePlaceholder":367},[2169],{"type":37,"value":370},{"type":32,"tag":171,"props":2171,"children":2172},{"class":173,"line":250},[2173],{"type":32,"tag":171,"props":2174,"children":2175},{"style":1191},[2176],{"type":37,"value":2177},"# Add golden examples\n",{"type":32,"tag":171,"props":2179,"children":2180},{"class":173,"line":26},[2181,2186,2190],{"type":32,"tag":171,"props":2182,"children":2183},{"style":184},[2184],{"type":37,"value":2185},"examples ",{"type":32,"tag":171,"props":2187,"children":2188},{"style":343},[2189],{"type":37,"value":401},{"type":32,"tag":171,"props":2191,"children":2192},{"style":184},[2193],{"type":37,"value":2194}," [\n",{"type":32,"tag":171,"props":2196,"children":2197},{"class":173,"line":280},[2198],{"type":32,"tag":171,"props":2199,"children":2200},{"style":184},[2201],{"type":37,"value":2202},"    {\n",{"type":32,"tag":171,"props":2204,"children":2205},{"class":173,"line":289},[2206,2211,2215,2219,2223,2227,2231,2235,2239,2243],{"type":32,"tag":171,"props":2207,"children":2208},{"style":208},[2209],{"type":37,"value":2210},"        
\"inputs\"",{"type":32,"tag":171,"props":2212,"children":2213},{"style":184},[2214],{"type":37,"value":1818},{"type":32,"tag":171,"props":2216,"children":2217},{"style":208},[2218],{"type":37,"value":1823},{"type":32,"tag":171,"props":2220,"children":2221},{"style":184},[2222],{"type":37,"value":539},{"type":32,"tag":171,"props":2224,"children":2225},{"style":208},[2226],{"type":37,"value":1832},{"type":32,"tag":171,"props":2228,"children":2229},{"style":184},[2230],{"type":37,"value":416},{"type":32,"tag":171,"props":2232,"children":2233},{"style":208},[2234],{"type":37,"value":1841},{"type":32,"tag":171,"props":2236,"children":2237},{"style":184},[2238],{"type":37,"value":539},{"type":32,"tag":171,"props":2240,"children":2241},{"style":208},[2242],{"type":37,"value":1850},{"type":32,"tag":171,"props":2244,"children":2245},{"style":184},[2246],{"type":37,"value":1855},{"type":32,"tag":171,"props":2248,"children":2249},{"class":173,"line":618},[2250,2255,2259,2264,2268,2273],{"type":32,"tag":171,"props":2251,"children":2252},{"style":208},[2253],{"type":37,"value":2254},"        \"outputs\"",{"type":32,"tag":171,"props":2256,"children":2257},{"style":184},[2258],{"type":37,"value":1818},{"type":32,"tag":171,"props":2260,"children":2261},{"style":208},[2262],{"type":37,"value":2263},"\"title\"",{"type":32,"tag":171,"props":2265,"children":2266},{"style":184},[2267],{"type":37,"value":539},{"type":32,"tag":171,"props":2269,"children":2270},{"style":208},[2271],{"type":37,"value":2272},"\"Server-Side GTM: Post-Cookie Measurement\"",{"type":32,"tag":171,"props":2274,"children":2275},{"style":184},[2276],{"type":37,"value":1855},{"type":32,"tag":171,"props":2278,"children":2279},{"class":173,"line":636},[2280,2285,2289,2294,2298,2303,2307,2312,2316,2321],{"type":32,"tag":171,"props":2281,"children":2282},{"style":208},[2283],{"type":37,"value":2284},"        
\"metadata\"",{"type":32,"tag":171,"props":2286,"children":2287},{"style":184},[2288],{"type":37,"value":1818},{"type":32,"tag":171,"props":2290,"children":2291},{"style":208},[2292],{"type":37,"value":2293},"\"expected_h2_count\"",{"type":32,"tag":171,"props":2295,"children":2296},{"style":184},[2297],{"type":37,"value":539},{"type":32,"tag":171,"props":2299,"children":2300},{"style":274},[2301],{"type":37,"value":2302},"5",{"type":32,"tag":171,"props":2304,"children":2305},{"style":184},[2306],{"type":37,"value":416},{"type":32,"tag":171,"props":2308,"children":2309},{"style":208},[2310],{"type":37,"value":2311},"\"expected_word_count\"",{"type":32,"tag":171,"props":2313,"children":2314},{"style":184},[2315],{"type":37,"value":539},{"type":32,"tag":171,"props":2317,"children":2318},{"style":274},[2319],{"type":37,"value":2320},"1500",{"type":32,"tag":171,"props":2322,"children":2323},{"style":184},[2324],{"type":37,"value":2043},{"type":32,"tag":171,"props":2326,"children":2327},{"class":173,"line":855},[2328],{"type":32,"tag":171,"props":2329,"children":2330},{"style":184},[2331],{"type":37,"value":2332},"    },\n",{"type":32,"tag":171,"props":2334,"children":2335},{"class":173,"line":878},[2336],{"type":32,"tag":171,"props":2337,"children":2338},{"style":1191},[2339],{"type":37,"value":2340},"    # 50+ 
examples...\n",{"type":32,"tag":171,"props":2342,"children":2343},{"class":173,"line":899},[2344],{"type":32,"tag":171,"props":2345,"children":2346},{"style":184},[2347],{"type":37,"value":295},{"type":32,"tag":171,"props":2349,"children":2350},{"class":173,"line":917},[2351],{"type":32,"tag":171,"props":2352,"children":2353},{"emptyLinePlaceholder":367},[2354],{"type":37,"value":370},{"type":32,"tag":171,"props":2356,"children":2357},{"class":173,"line":938},[2358,2362,2367,2371],{"type":32,"tag":171,"props":2359,"children":2360},{"style":343},[2361],{"type":37,"value":483},{"type":32,"tag":171,"props":2363,"children":2364},{"style":184},[2365],{"type":37,"value":2366}," ex ",{"type":32,"tag":171,"props":2368,"children":2369},{"style":343},[2370],{"type":37,"value":493},{"type":32,"tag":171,"props":2372,"children":2373},{"style":184},[2374],{"type":37,"value":2375}," examples:\n",{"type":32,"tag":171,"props":2377,"children":2378},{"class":173,"line":955},[2379,2384,2389,2394,2399,2403],{"type":32,"tag":171,"props":2380,"children":2381},{"style":184},[2382],{"type":37,"value":2383},"    client.create_example(",{"type":32,"tag":171,"props":2385,"children":2386},{"style":343},[2387],{"type":37,"value":2388},"**",{"type":32,"tag":171,"props":2390,"children":2391},{"style":184},[2392],{"type":37,"value":2393},"ex, ",{"type":32,"tag":171,"props":2395,"children":2396},{"style":599},[2397],{"type":37,"value":2398},"dataset_id",{"type":32,"tag":171,"props":2400,"children":2401},{"style":343},[2402],{"type":37,"value":401},{"type":32,"tag":171,"props":2404,"children":2405},{"style":184},[2406],{"type":37,"value":2407},"dataset.id)\n",{"type":32,"tag":33,"props":2409,"children":2410},{},[2411],{"type":37,"value":2412},"Test every prompt change against this dataset. If pass rate drops, don't deploy. 
Add new edge cases to the dataset (bugs you find in production) so regression doesn't happen again.",{"type":32,"tag":45,"props":2414,"children":2416},{"id":2415},"tradeoff-deterministic-metrics-vs-creative-output",[2417],{"type":37,"value":2418},"Tradeoff: Deterministic Metrics vs Creative Output",{"type":32,"tag":33,"props":2420,"children":2421},{},[2422],{"type":37,"value":2423},"LLMs are powerful partly because they are non-deterministic: the same input can yield different output. In production that same property is a risk, because customers see different markdown on each page refresh, some of it broken.",{"type":32,"tag":33,"props":2425,"children":2426},{},[2427],{"type":37,"value":2428},"Temperature 0 increases determinism, but the output becomes monotonous. The tradeoff:",{"type":32,"tag":84,"props":2430,"children":2431},{},[2432,2442,2452],{"type":32,"tag":88,"props":2433,"children":2434},{},[2435,2440],{"type":32,"tag":123,"props":2436,"children":2437},{},[2438],{"type":37,"value":2439},"Temperature 0",{"type":37,"value":2441},": ideal for eval suites, monotonous for production",{"type":32,"tag":88,"props":2443,"children":2444},{},[2445,2450],{"type":32,"tag":123,"props":2446,"children":2447},{},[2448],{"type":37,"value":2449},"Temperature 0.3-0.5",{"type":37,"value":2451},": reasonable variety, still consistent",{"type":32,"tag":88,"props":2453,"children":2454},{},[2455,2460],{"type":32,"tag":123,"props":2456,"children":2457},{},[2458],{"type":37,"value":2459},"Temperature 0.7+",{"type":37,"value":2461},": creative, but production can still surprise you even when the eval suite passes",{"type":32,"tag":33,"props":2463,"children":2464},{},[2465],{"type":37,"value":2466},"Solution: run evals at temperature 0, run production at 0.4, and store 5 different acceptable outputs per input in the golden set (range checking).",{"type":32,"tag":33,"props":2468,"children":2469},{},[2470,2472,2477],{"type":37,"value":2471},"Another tradeoff: ",{"type":32,"tag":123,"props":2473,"children":2474},{},[2475],{"type":37,"value":2476},"latency vs 
quality",{"type":37,"value":2478},". Longer prompts produce better output but input token cost increases, latency rises. In Promptfoo, if latency metric exceeds 2.5s, alert — don't degrade user experience.",{"type":32,"tag":45,"props":2480,"children":2482},{"id":2481},"production-checklist-before-deploying-an-llm-system",[2483],{"type":37,"value":2484},"Production Checklist: Before Deploying an LLM System",{"type":32,"tag":33,"props":2486,"children":2487},{},[2488],{"type":37,"value":2489},"Pre-deployment checklist:",{"type":32,"tag":84,"props":2491,"children":2494},{"className":2492},[2493],"contains-task-list",[2495,2507,2516,2525,2534,2543,2552,2561,2570],{"type":32,"tag":88,"props":2496,"children":2499},{"className":2497},[2498],"task-list-item",[2500,2505],{"type":32,"tag":2501,"props":2502,"children":2504},"input",{"disabled":367,"type":2503},"checkbox",[],{"type":37,"value":2506}," Prompt in git repo, commit history clean",{"type":32,"tag":88,"props":2508,"children":2510},{"className":2509},[2498],[2511,2514],{"type":32,"tag":2501,"props":2512,"children":2513},{"disabled":367,"type":2503},[],{"type":37,"value":2515}," Promptfoo eval suite pass rate > 95%",{"type":32,"tag":88,"props":2517,"children":2519},{"className":2518},[2498],[2520,2523],{"type":32,"tag":2501,"props":2521,"children":2522},{"disabled":367,"type":2503},[],{"type":37,"value":2524}," Golden dataset min 50 examples",{"type":32,"tag":88,"props":2526,"children":2528},{"className":2527},[2498],[2529,2532],{"type":32,"tag":2501,"props":2530,"children":2531},{"disabled":367,"type":2503},[],{"type":37,"value":2533}," A\u002FB test plan ready, sample size calculated",{"type":32,"tag":88,"props":2535,"children":2537},{"className":2536},[2498],[2538,2541],{"type":32,"tag":2501,"props":2539,"children":2540},{"disabled":367,"type":2503},[],{"type":37,"value":2542}," LangSmith tracing on, API key in 
production",{"type":32,"tag":88,"props":2544,"children":2546},{"className":2545},[2498],[2547,2550],{"type":32,"tag":2501,"props":2548,"children":2549},{"disabled":367,"type":2503},[],{"type":37,"value":2551}," Feedback loop set up (editors scoring, BigQuery join)",{"type":32,"tag":88,"props":2553,"children":2555},{"className":2554},[2498],[2556,2559],{"type":32,"tag":2501,"props":2557,"children":2558},{"disabled":367,"type":2503},[],{"type":37,"value":2560}," Rollback procedure defined (which metric drop triggers auto-revert)",{"type":32,"tag":88,"props":2562,"children":2564},{"className":2563},[2498],[2565,2568],{"type":32,"tag":2501,"props":2566,"children":2567},{"disabled":367,"type":2503},[],{"type":37,"value":2569}," Cost monitoring — daily token spend threshold $X",{"type":32,"tag":88,"props":2571,"children":2573},{"className":2572},[2498],[2574,2577],{"type":32,"tag":2501,"props":2575,"children":2576},{"disabled":367,"type":2503},[],{"type":37,"value":2578}," Latency SLA — p95 \u003C 3s",{"type":32,"tag":33,"props":2580,"children":2581},{},[2582],{"type":37,"value":2583},"Without completing this checklist, you're not delivering \"AI services.\" Without versioning, eval, and observability, production LLM operations isn't discipline — it's controlled chaos.",{"type":32,"tag":2585,"props":2586,"children":2587},"hr",{},[],{"type":32,"tag":33,"props":2589,"children":2590},{},[2591],{"type":37,"value":2592},"Prompt versioning is a discipline matter — not for speed, but for reliability. In techniques like Generative Engine Optimization where output quality directly ties to business outcomes, an eval pipeline is non-negotiable. Without one, every deployment risks losing previous performance gains. Promptfoo provides local assurance, LangSmith provides production visibility. 
Together, they elevate LLM operations to software engineering standards.",{"type":32,"tag":2594,"props":2595,"children":2596},"style",{},[2597],{"type":37,"value":2598},"html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}",{"title":16,"searchDepth":199,"depth":199,"links":2600},[2601,2602,2605,2606,2609,2610],{"id":47,"depth":190,"text":50},{"id":110,"depth":190,"text":113,"children":2603},[2604],{"id":679,"depth":199,"text":682},{"id":1113,"depth":190,"text":1116},{"id":1700,"depth":190,"text":1703,"children":2607},[2608],{"id":2058,"depth":199,"text":2061},{"id":2415,"depth":190,"text":2418},{"id":2481,"depth":190,"text":2484},"markdown","content:en:ai:llm-ops-prompt-versioning-ab-testing.md","content","en\u002Fai\u002Fllm-ops-prompt-versioning-ab-testing.md","en\u002Fai\u002Fllm-ops-prompt-versioning-ab-testing","md",1778709810163]