[{"data":1,"prerenderedAt":2637},["ShallowReactive",2],{"article-alternates":3,"article-\u002Fes\u002Fai\u002Fversionado-prompts-ab-testing-llm-ops":13},{"i18nKey":4,"paths":5},"ai-004-2026-05",{"de":6,"en":7,"es":8,"fr":9,"it":10,"ru":11,"tr":12},"\u002Fde\u002Fai\u002Fprompt-versionierung-llm-evaluation","\u002Fen\u002Fai\u002Fllm-ops-prompt-versioning-ab-testing","\u002Fes\u002Fai\u002Fversionado-prompts-ab-testing-llm-ops","\u002Ffr\u002Fai\u002Fversionamento-prompt-ab-test","\u002Fit\u002Fai\u002Fversionamento-prompt-e-a-b-test-disciplina-llm-ops","\u002Fru\u002Fai\u002Fprompt-versionierung-und-ab-tests-llm-ops-disziplin","\u002Ftr\u002Fai\u002Fprompt-versiyonlama-ve-a-b-testi-llm-operasyonun-disiplini",{"_path":8,"_dir":14,"_draft":15,"_partial":15,"_locale":16,"title":17,"description":18,"publishedAt":19,"modifiedAt":19,"category":14,"i18nKey":4,"tags":20,"readingTime":26,"author":27,"body":28,"_type":2631,"_id":2632,"_source":2633,"_file":2634,"_stem":2635,"_extension":2636},"ai",false,"","Versionado de Prompts y A\u002FB Testing: La Disciplina de LLM Ops","Cómo construir versionado de prompts, pipelines de evaluación y control de calidad determinístico en sistemas LLM production con Promptfoo y LangSmith.","2026-05-13",[21,22,23,24,25],"llm-ops","prompt-engineering","evaluación","mlops","ai-quality",8,"Roibase",{"type":29,"children":30,"toc":2619},"root",[31,39,44,51,56,78,83,103,108,114,119,130,148,161,296,306,324,329,643,653,671,687,694,699,704,1020,1033,1117,1122,1128,1133,1185,1190,1480,1485,1704,1709,1715,1720,1732,1737,2062,2067,2073,2085,2090,2419,2424,2430,2435,2440,2473,2478,2490,2496,2501,2590,2595,2599,2613],{"type":32,"tag":33,"props":34,"children":35},"element","p",{},[36],{"type":37,"value":38},"text","En sistemas que usan LLM, hay 15 pasos entre \"funciona\" y \"confiable en producción\". 
La automatización de marketing genera markdown con Claude API, la segmentación de viajes de clientes usa GPT — pero cuando cambias el prompt, ¿cómo garantizas que no creaste una regresión? En ingeniería de software, versionado, cobertura de tests y CI\u002FCD son estándar; en operaciones LLM sin esa disciplina, cada deployment es una apuesta.",{"type":32,"tag":33,"props":40,"children":41},{},[42],{"type":37,"value":43},"Herramientas como Promptfoo y LangSmith proporcionan esa disciplina: versionado de prompts, evaluación determinística, A\u002FB testing, tracking de métricas. Este artículo muestra cómo construir control de calidad en un sistema LLM production — a nivel de infraestructura, no solo código.",{"type":32,"tag":45,"props":46,"children":48},"h2",{"id":47},"la-ilusión-de-que-el-prompt-no-es-software",[49],{"type":37,"value":50},"La Ilusión de que el Prompt No Es Software",{"type":32,"tag":33,"props":52,"children":53},{},[54],{"type":37,"value":55},"La mayoría de los equipos tratan el prompt como un \"archivo de configuración\" — editor en UI, documentación en Notion, texto hardcoded en un nodo de workflow de n8n. En realidad, el prompt es una especificación ejecutable que define el comportamiento del sistema. Pero no hay versionado, no hay diff, no hay rollback.",{"type":32,"tag":33,"props":57,"children":58},{},[59,61,68,70,76],{"type":37,"value":60},"Un cambio de commit con mensaje \"fix typo\" puede alterar el tono del output del modelo y degradar las métricas. Especialmente en escenarios de salida estructurada (JSON schema, frontmatter markdown, query SQL), una sola palabra rompiendo el formato causa errores en cadena. 
Ejemplo: escribir ",{"type":32,"tag":62,"props":63,"children":65},"code",{"className":64},[],[66],{"type":37,"value":67},"OUTPUT FORMAT: JSON",{"type":37,"value":69}," en lugar de ",{"type":32,"tag":62,"props":71,"children":73},{"className":72},[],[74],{"type":37,"value":75},"OUTPUT FORMAT: Valid JSON",{"type":37,"value":77}," hace que el modelo a veces agregue párrafos de explicación — falla en el parser downstream, alertas se disparan, 3 horas de debugging.",{"type":32,"tag":33,"props":79,"children":80},{},[81],{"type":37,"value":82},"La disciplina de versionado debe responder estas preguntas:",{"type":32,"tag":84,"props":85,"children":86},"ul",{},[87,93,98],{"type":32,"tag":88,"props":89,"children":90},"li",{},[91],{"type":37,"value":92},"¿Qué versión del prompt está en producción ahora?",{"type":32,"tag":88,"props":94,"children":95},{},[96],{"type":37,"value":97},"¿Cuál es la diferencia de rendimiento entre la versión de hace 2 semanas y la actual?",{"type":32,"tag":88,"props":99,"children":100},{},[101],{"type":37,"value":102},"¿Qué variante en el A\u002FB test aumentó la conversión un 8%?",{"type":32,"tag":33,"props":104,"children":105},{},[106],{"type":37,"value":107},"Si no puedes responder estas preguntas, no estás haciendo \"operaciones de IA\" — estás ejecutando experimentos manuales.",{"type":32,"tag":45,"props":109,"children":111},{"id":110},"pipeline-de-evaluación-tres-capas-para-medir-el-output",[112],{"type":37,"value":113},"Pipeline de Evaluación: Tres Capas para Medir el Output",{"type":32,"tag":33,"props":115,"children":116},{},[117],{"type":37,"value":118},"Evaluar output de LLM parece subjetivo, pero en sistemas production es posible construir métricas determinísticas. 
La evaluación funciona en tres capas: sintaxis, semántica, resultado de negocio.",{"type":32,"tag":33,"props":120,"children":121},{},[122,128],{"type":32,"tag":123,"props":124,"children":125},"strong",{},[126],{"type":37,"value":127},"Capa de sintaxis",{"type":37,"value":129}," — conformidad de formato:",{"type":32,"tag":84,"props":131,"children":132},{},[133,138,143],{"type":32,"tag":88,"props":134,"children":135},{},[136],{"type":37,"value":137},"¿Se parsea el JSON?",{"type":32,"tag":88,"props":139,"children":140},{},[141],{"type":37,"value":142},"¿Es válido el frontmatter markdown?",{"type":32,"tag":88,"props":144,"children":145},{},[146],{"type":37,"value":147},"¿Están presentes los campos esperados?",{"type":32,"tag":33,"props":149,"children":150},{},[151,153,159],{"type":37,"value":152},"En Promptfoo se controla con assertions ",{"type":32,"tag":62,"props":154,"children":156},{"className":155},[],[157],{"type":37,"value":158},"javascript",{"type":37,"value":160},":",{"type":32,"tag":162,"props":163,"children":166},"pre",{"className":164,"code":165,"language":158,"meta":16,"style":16},"language-javascript shiki shiki-themes github-dark","assert: [\n  {\n    type: \"javascript\",\n    value: \"JSON.parse(output).title.length \u003C= 60\"\n  },\n  {\n    type: \"is-json\",\n    value: true\n  }\n]\n",[167],{"type":32,"tag":62,"props":168,"children":169},{"__ignoreMap":16},[170,188,197,217,231,240,248,265,278,287],{"type":32,"tag":171,"props":172,"children":175},"span",{"class":173,"line":174},"line",1,[176,182],{"type":32,"tag":171,"props":177,"children":179},{"style":178},"--shiki-default:#B392F0",[180],{"type":37,"value":181},"assert",{"type":32,"tag":171,"props":183,"children":185},{"style":184},"--shiki-default:#E1E4E8",[186],{"type":37,"value":187},": [\n",{"type":32,"tag":171,"props":189,"children":191},{"class":173,"line":190},2,[192],{"type":32,"tag":171,"props":193,"children":194},{"style":184},[195],{"type":37,"value":196},"  
{\n",{"type":32,"tag":171,"props":198,"children":200},{"class":173,"line":199},3,[201,206,212],{"type":32,"tag":171,"props":202,"children":203},{"style":184},[204],{"type":37,"value":205},"    type: ",{"type":32,"tag":171,"props":207,"children":209},{"style":208},"--shiki-default:#9ECBFF",[210],{"type":37,"value":211},"\"javascript\"",{"type":32,"tag":171,"props":213,"children":214},{"style":184},[215],{"type":37,"value":216},",\n",{"type":32,"tag":171,"props":218,"children":220},{"class":173,"line":219},4,[221,226],{"type":32,"tag":171,"props":222,"children":223},{"style":184},[224],{"type":37,"value":225},"    value: ",{"type":32,"tag":171,"props":227,"children":228},{"style":208},[229],{"type":37,"value":230},"\"JSON.parse(output).title.length \u003C= 60\"\n",{"type":32,"tag":171,"props":232,"children":234},{"class":173,"line":233},5,[235],{"type":32,"tag":171,"props":236,"children":237},{"style":184},[238],{"type":37,"value":239},"  },\n",{"type":32,"tag":171,"props":241,"children":243},{"class":173,"line":242},6,[244],{"type":32,"tag":171,"props":245,"children":246},{"style":184},[247],{"type":37,"value":196},{"type":32,"tag":171,"props":249,"children":251},{"class":173,"line":250},7,[252,256,261],{"type":32,"tag":171,"props":253,"children":254},{"style":184},[255],{"type":37,"value":205},{"type":32,"tag":171,"props":257,"children":258},{"style":208},[259],{"type":37,"value":260},"\"is-json\"",{"type":32,"tag":171,"props":262,"children":263},{"style":184},[264],{"type":37,"value":216},{"type":32,"tag":171,"props":266,"children":267},{"class":173,"line":26},[268,272],{"type":32,"tag":171,"props":269,"children":270},{"style":184},[271],{"type":37,"value":225},{"type":32,"tag":171,"props":273,"children":275},{"style":274},"--shiki-default:#79B8FF",[276],{"type":37,"value":277},"true\n",{"type":32,"tag":171,"props":279,"children":281},{"class":173,"line":280},9,[282],{"type":32,"tag":171,"props":283,"children":284},{"style":184},[285],{"type":37,"value":286},"  
}\n",{"type":32,"tag":171,"props":288,"children":290},{"class":173,"line":289},10,[291],{"type":32,"tag":171,"props":292,"children":293},{"style":184},[294],{"type":37,"value":295},"]\n",{"type":32,"tag":33,"props":297,"children":298},{},[299,304],{"type":32,"tag":123,"props":300,"children":301},{},[302],{"type":37,"value":303},"Capa de semántica",{"type":37,"value":305}," — calidad del contenido:",{"type":32,"tag":84,"props":307,"children":308},{},[309,314,319],{"type":32,"tag":88,"props":310,"children":311},{},[312],{"type":37,"value":313},"¿La respuesta es relevante al tema? (similitud coseno entre embeddings > 0.85)",{"type":32,"tag":88,"props":315,"children":316},{},[317],{"type":37,"value":318},"¿Contiene palabras prohibidas? (regex, token filtering)",{"type":32,"tag":88,"props":320,"children":321},{},[322],{"type":37,"value":323},"¿Es el tono correcto? (modelo clasificador, sentiment score)",{"type":32,"tag":33,"props":325,"children":326},{},[327],{"type":37,"value":328},"Evaluador personalizado en LangSmith:",{"type":32,"tag":162,"props":330,"children":334},{"className":331,"code":332,"language":333,"meta":16,"style":16},"language-python shiki shiki-themes github-dark","from langsmith import evaluate\n\ndef check_brand_compliance(run, example):\n    forbidden = [\"experto\", \"líder\", \"revolucionario\"]\n    output = run.outputs[\"text\"].lower()\n    violations = [w for w in forbidden if w in output]\n    return {\"score\": 0 if violations else 1, \"violations\": violations}\n\nevaluate(\n    dataset_name=\"marketing_blog_posts\",\n    
evaluators=[check_brand_compliance]\n)\n","python",[335],{"type":32,"tag":62,"props":336,"children":337},{"__ignoreMap":16},[338,362,371,389,435,462,517,579,586,594,616,634],{"type":32,"tag":171,"props":339,"children":340},{"class":173,"line":174},[341,347,352,357],{"type":32,"tag":171,"props":342,"children":344},{"style":343},"--shiki-default:#F97583",[345],{"type":37,"value":346},"from",{"type":32,"tag":171,"props":348,"children":349},{"style":184},[350],{"type":37,"value":351}," langsmith ",{"type":32,"tag":171,"props":353,"children":354},{"style":343},[355],{"type":37,"value":356},"import",{"type":32,"tag":171,"props":358,"children":359},{"style":184},[360],{"type":37,"value":361}," evaluate\n",{"type":32,"tag":171,"props":363,"children":364},{"class":173,"line":190},[365],{"type":32,"tag":171,"props":366,"children":368},{"emptyLinePlaceholder":367},true,[369],{"type":37,"value":370},"\n",{"type":32,"tag":171,"props":372,"children":373},{"class":173,"line":199},[374,379,384],{"type":32,"tag":171,"props":375,"children":376},{"style":343},[377],{"type":37,"value":378},"def",{"type":32,"tag":171,"props":380,"children":381},{"style":178},[382],{"type":37,"value":383}," check_brand_compliance",{"type":32,"tag":171,"props":385,"children":386},{"style":184},[387],{"type":37,"value":388},"(run, example):\n",{"type":32,"tag":171,"props":390,"children":391},{"class":173,"line":219},[392,397,402,407,412,417,422,426,431],{"type":32,"tag":171,"props":393,"children":394},{"style":184},[395],{"type":37,"value":396},"    forbidden ",{"type":32,"tag":171,"props":398,"children":399},{"style":343},[400],{"type":37,"value":401},"=",{"type":32,"tag":171,"props":403,"children":404},{"style":184},[405],{"type":37,"value":406}," [",{"type":32,"tag":171,"props":408,"children":409},{"style":208},[410],{"type":37,"value":411},"\"experto\"",{"type":32,"tag":171,"props":413,"children":414},{"style":184},[415],{"type":37,"value":416},", 
",{"type":32,"tag":171,"props":418,"children":419},{"style":208},[420],{"type":37,"value":421},"\"líder\"",{"type":32,"tag":171,"props":423,"children":424},{"style":184},[425],{"type":37,"value":416},{"type":32,"tag":171,"props":427,"children":428},{"style":208},[429],{"type":37,"value":430},"\"revolucionario\"",{"type":32,"tag":171,"props":432,"children":433},{"style":184},[434],{"type":37,"value":295},{"type":32,"tag":171,"props":436,"children":437},{"class":173,"line":233},[438,443,447,452,457],{"type":32,"tag":171,"props":439,"children":440},{"style":184},[441],{"type":37,"value":442},"    output ",{"type":32,"tag":171,"props":444,"children":445},{"style":343},[446],{"type":37,"value":401},{"type":32,"tag":171,"props":448,"children":449},{"style":184},[450],{"type":37,"value":451}," run.outputs[",{"type":32,"tag":171,"props":453,"children":454},{"style":208},[455],{"type":37,"value":456},"\"text\"",{"type":32,"tag":171,"props":458,"children":459},{"style":184},[460],{"type":37,"value":461},"].lower()\n",{"type":32,"tag":171,"props":463,"children":464},{"class":173,"line":242},[465,470,474,479,484,489,494,499,504,508,512],{"type":32,"tag":171,"props":466,"children":467},{"style":184},[468],{"type":37,"value":469},"    violations ",{"type":32,"tag":171,"props":471,"children":472},{"style":343},[473],{"type":37,"value":401},{"type":32,"tag":171,"props":475,"children":476},{"style":184},[477],{"type":37,"value":478}," [w ",{"type":32,"tag":171,"props":480,"children":481},{"style":343},[482],{"type":37,"value":483},"for",{"type":32,"tag":171,"props":485,"children":486},{"style":184},[487],{"type":37,"value":488}," w ",{"type":32,"tag":171,"props":490,"children":491},{"style":343},[492],{"type":37,"value":493},"in",{"type":32,"tag":171,"props":495,"children":496},{"style":184},[497],{"type":37,"value":498}," forbidden 
",{"type":32,"tag":171,"props":500,"children":501},{"style":343},[502],{"type":37,"value":503},"if",{"type":32,"tag":171,"props":505,"children":506},{"style":184},[507],{"type":37,"value":488},{"type":32,"tag":171,"props":509,"children":510},{"style":343},[511],{"type":37,"value":493},{"type":32,"tag":171,"props":513,"children":514},{"style":184},[515],{"type":37,"value":516}," output]\n",{"type":32,"tag":171,"props":518,"children":519},{"class":173,"line":250},[520,525,530,535,540,545,550,555,560,565,569,574],{"type":32,"tag":171,"props":521,"children":522},{"style":343},[523],{"type":37,"value":524},"    return",{"type":32,"tag":171,"props":526,"children":527},{"style":184},[528],{"type":37,"value":529}," {",{"type":32,"tag":171,"props":531,"children":532},{"style":208},[533],{"type":37,"value":534},"\"score\"",{"type":32,"tag":171,"props":536,"children":537},{"style":184},[538],{"type":37,"value":539},": ",{"type":32,"tag":171,"props":541,"children":542},{"style":274},[543],{"type":37,"value":544},"0",{"type":32,"tag":171,"props":546,"children":547},{"style":343},[548],{"type":37,"value":549}," if",{"type":32,"tag":171,"props":551,"children":552},{"style":184},[553],{"type":37,"value":554}," violations ",{"type":32,"tag":171,"props":556,"children":557},{"style":343},[558],{"type":37,"value":559},"else",{"type":32,"tag":171,"props":561,"children":562},{"style":274},[563],{"type":37,"value":564}," 1",{"type":32,"tag":171,"props":566,"children":567},{"style":184},[568],{"type":37,"value":416},{"type":32,"tag":171,"props":570,"children":571},{"style":208},[572],{"type":37,"value":573},"\"violations\"",{"type":32,"tag":171,"props":575,"children":576},{"style":184},[577],{"type":37,"value":578},": 
violations}\n",{"type":32,"tag":171,"props":580,"children":581},{"class":173,"line":26},[582],{"type":32,"tag":171,"props":583,"children":584},{"emptyLinePlaceholder":367},[585],{"type":37,"value":370},{"type":32,"tag":171,"props":587,"children":588},{"class":173,"line":280},[589],{"type":32,"tag":171,"props":590,"children":591},{"style":184},[592],{"type":37,"value":593},"evaluate(\n",{"type":32,"tag":171,"props":595,"children":596},{"class":173,"line":289},[597,603,607,612],{"type":32,"tag":171,"props":598,"children":600},{"style":599},"--shiki-default:#FFAB70",[601],{"type":37,"value":602},"    dataset_name",{"type":32,"tag":171,"props":604,"children":605},{"style":343},[606],{"type":37,"value":401},{"type":32,"tag":171,"props":608,"children":609},{"style":208},[610],{"type":37,"value":611},"\"marketing_blog_posts\"",{"type":32,"tag":171,"props":613,"children":614},{"style":184},[615],{"type":37,"value":216},{"type":32,"tag":171,"props":617,"children":619},{"class":173,"line":618},11,[620,625,629],{"type":32,"tag":171,"props":621,"children":622},{"style":599},[623],{"type":37,"value":624},"    evaluators",{"type":32,"tag":171,"props":626,"children":627},{"style":343},[628],{"type":37,"value":401},{"type":32,"tag":171,"props":630,"children":631},{"style":184},[632],{"type":37,"value":633},"[check_brand_compliance]\n",{"type":32,"tag":171,"props":635,"children":637},{"class":173,"line":636},12,[638],{"type":32,"tag":171,"props":639,"children":640},{"style":184},[641],{"type":37,"value":642},")\n",{"type":32,"tag":33,"props":644,"children":645},{},[646,651],{"type":32,"tag":123,"props":647,"children":648},{},[649],{"type":37,"value":650},"Capa de resultado de negocio",{"type":37,"value":652}," — impacto real:",{"type":32,"tag":84,"props":654,"children":655},{},[656,661,666],{"type":32,"tag":88,"props":657,"children":658},{},[659],{"type":37,"value":660},"¿Cambió el CTR?",{"type":32,"tag":88,"props":662,"children":663},{},[664],{"type":37,"value":665},"¿Bajó la 
conversión?",{"type":32,"tag":88,"props":667,"children":668},{},[669],{"type":37,"value":670},"¿Subió la tasa de rebote?",{"type":32,"tag":33,"props":672,"children":673},{},[674,676,685],{"type":37,"value":675},"Esta capa se conecta con telemetría de producción — en un sistema de ",{"type":32,"tag":677,"props":678,"children":682},"a",{"href":679,"rel":680},"https:\u002F\u002Fwww.roibase.com.tr\u002Fes\u002Ffirstparty",[681],"nofollow",[683],{"type":37,"value":684},"Primera Parte: Datos y Arquitectura de Medición",{"type":37,"value":686},", la versión del prompt se agrega como metadata al tracking de eventos, se une en BigQuery, y un modelo dbt calcula el conversion rate de cada versión.",{"type":32,"tag":688,"props":689,"children":691},"h3",{"id":690},"promptfoo-construir-un-suite-de-tests-determinístico",[692],{"type":37,"value":693},"Promptfoo: Construir un Suite de Tests Determinístico",{"type":32,"tag":33,"props":695,"children":696},{},[697],{"type":37,"value":698},"Promptfoo es un framework de evaluación que corre localmente, basado en YAML. 
El objetivo: validar cada cambio de prompt con una suite de regresión antes de desplegar.",{"type":32,"tag":33,"props":700,"children":701},{},[702],{"type":37,"value":703},"Configuración simple:",{"type":32,"tag":162,"props":705,"children":709},{"className":706,"code":707,"language":708,"meta":16,"style":16},"language-yaml shiki shiki-themes github-dark","prompts:\n  - file:\u002F\u002Fprompts\u002Fmarketing_blog_v1.md\n  - file:\u002F\u002Fprompts\u002Fmarketing_blog_v2.md\n\nproviders:\n  - anthropic:messages:claude-3-5-sonnet-20241022\n\ntests:\n  - vars:\n      topic: \"Server-side GTM\"\n      category: \"tech\"\n    assert:\n      - type: is-json\n      - type: javascript\n        value: \"output.title.length \u003C= 60\"\n      - type: similar\n        value: \"arquitectura de tracking server-side\"\n        threshold: 0.8\n      - type: not-contains\n        value: \"revolucionario\"\n","yaml",[710],{"type":32,"tag":62,"props":711,"children":712},{"__ignoreMap":16},[713,727,740,752,759,771,783,790,802,818,835,852,864,887,908,926,947,964,982,1003],{"type":32,"tag":171,"props":714,"children":715},{"class":173,"line":174},[716,722],{"type":32,"tag":171,"props":717,"children":719},{"style":718},"--shiki-default:#85E89D",[720],{"type":37,"value":721},"prompts",{"type":32,"tag":171,"props":723,"children":724},{"style":184},[725],{"type":37,"value":726},":\n",{"type":32,"tag":171,"props":728,"children":729},{"class":173,"line":190},[730,735],{"type":32,"tag":171,"props":731,"children":732},{"style":184},[733],{"type":37,"value":734},"  - 
",{"type":32,"tag":171,"props":736,"children":737},{"style":208},[738],{"type":37,"value":739},"file:\u002F\u002Fprompts\u002Fmarketing_blog_v1.md\n",{"type":32,"tag":171,"props":741,"children":742},{"class":173,"line":199},[743,747],{"type":32,"tag":171,"props":744,"children":745},{"style":184},[746],{"type":37,"value":734},{"type":32,"tag":171,"props":748,"children":749},{"style":208},[750],{"type":37,"value":751},"file:\u002F\u002Fprompts\u002Fmarketing_blog_v2.md\n",{"type":32,"tag":171,"props":753,"children":754},{"class":173,"line":219},[755],{"type":32,"tag":171,"props":756,"children":757},{"emptyLinePlaceholder":367},[758],{"type":37,"value":370},{"type":32,"tag":171,"props":760,"children":761},{"class":173,"line":233},[762,767],{"type":32,"tag":171,"props":763,"children":764},{"style":718},[765],{"type":37,"value":766},"providers",{"type":32,"tag":171,"props":768,"children":769},{"style":184},[770],{"type":37,"value":726},{"type":32,"tag":171,"props":772,"children":773},{"class":173,"line":242},[774,778],{"type":32,"tag":171,"props":775,"children":776},{"style":184},[777],{"type":37,"value":734},{"type":32,"tag":171,"props":779,"children":780},{"style":208},[781],{"type":37,"value":782},"anthropic:messages:claude-3-5-sonnet-20241022\n",{"type":32,"tag":171,"props":784,"children":785},{"class":173,"line":250},[786],{"type":32,"tag":171,"props":787,"children":788},{"emptyLinePlaceholder":367},[789],{"type":37,"value":370},{"type":32,"tag":171,"props":791,"children":792},{"class":173,"line":26},[793,798],{"type":32,"tag":171,"props":794,"children":795},{"style":718},[796],{"type":37,"value":797},"tests",{"type":32,"tag":171,"props":799,"children":800},{"style":184},[801],{"type":37,"value":726},{"type":32,"tag":171,"props":803,"children":804},{"class":173,"line":280},[805,809,814],{"type":32,"tag":171,"props":806,"children":807},{"style":184},[808],{"type":37,"value":734},{"type":32,"tag":171,"props":810,"children":811},{"style":718},[812],{"type":37,"value":8
13},"vars",{"type":32,"tag":171,"props":815,"children":816},{"style":184},[817],{"type":37,"value":726},{"type":32,"tag":171,"props":819,"children":820},{"class":173,"line":289},[821,826,830],{"type":32,"tag":171,"props":822,"children":823},{"style":718},[824],{"type":37,"value":825},"      topic",{"type":32,"tag":171,"props":827,"children":828},{"style":184},[829],{"type":37,"value":539},{"type":32,"tag":171,"props":831,"children":832},{"style":208},[833],{"type":37,"value":834},"\"Server-side GTM\"\n",{"type":32,"tag":171,"props":836,"children":837},{"class":173,"line":618},[838,843,847],{"type":32,"tag":171,"props":839,"children":840},{"style":718},[841],{"type":37,"value":842},"      category",{"type":32,"tag":171,"props":844,"children":845},{"style":184},[846],{"type":37,"value":539},{"type":32,"tag":171,"props":848,"children":849},{"style":208},[850],{"type":37,"value":851},"\"tech\"\n",{"type":32,"tag":171,"props":853,"children":854},{"class":173,"line":636},[855,860],{"type":32,"tag":171,"props":856,"children":857},{"style":718},[858],{"type":37,"value":859},"    assert",{"type":32,"tag":171,"props":861,"children":862},{"style":184},[863],{"type":37,"value":726},{"type":32,"tag":171,"props":865,"children":867},{"class":173,"line":866},13,[868,873,878,882],{"type":32,"tag":171,"props":869,"children":870},{"style":184},[871],{"type":37,"value":872},"      - 
",{"type":32,"tag":171,"props":874,"children":875},{"style":718},[876],{"type":37,"value":877},"type",{"type":32,"tag":171,"props":879,"children":880},{"style":184},[881],{"type":37,"value":539},{"type":32,"tag":171,"props":883,"children":884},{"style":208},[885],{"type":37,"value":886},"is-json\n",{"type":32,"tag":171,"props":888,"children":890},{"class":173,"line":889},14,[891,895,899,903],{"type":32,"tag":171,"props":892,"children":893},{"style":184},[894],{"type":37,"value":872},{"type":32,"tag":171,"props":896,"children":897},{"style":718},[898],{"type":37,"value":877},{"type":32,"tag":171,"props":900,"children":901},{"style":184},[902],{"type":37,"value":539},{"type":32,"tag":171,"props":904,"children":905},{"style":208},[906],{"type":37,"value":907},"javascript\n",{"type":32,"tag":171,"props":909,"children":911},{"class":173,"line":910},15,[912,917,921],{"type":32,"tag":171,"props":913,"children":914},{"style":718},[915],{"type":37,"value":916},"        value",{"type":32,"tag":171,"props":918,"children":919},{"style":184},[920],{"type":37,"value":539},{"type":32,"tag":171,"props":922,"children":923},{"style":208},[924],{"type":37,"value":925},"\"output.title.length \u003C= 
60\"\n",{"type":32,"tag":171,"props":927,"children":929},{"class":173,"line":928},16,[930,934,938,942],{"type":32,"tag":171,"props":931,"children":932},{"style":184},[933],{"type":37,"value":872},{"type":32,"tag":171,"props":935,"children":936},{"style":718},[937],{"type":37,"value":877},{"type":32,"tag":171,"props":939,"children":940},{"style":184},[941],{"type":37,"value":539},{"type":32,"tag":171,"props":943,"children":944},{"style":208},[945],{"type":37,"value":946},"similar\n",{"type":32,"tag":171,"props":948,"children":950},{"class":173,"line":949},17,[951,955,959],{"type":32,"tag":171,"props":952,"children":953},{"style":718},[954],{"type":37,"value":916},{"type":32,"tag":171,"props":956,"children":957},{"style":184},[958],{"type":37,"value":539},{"type":32,"tag":171,"props":960,"children":961},{"style":208},[962],{"type":37,"value":963},"\"arquitectura de tracking server-side\"\n",{"type":32,"tag":171,"props":965,"children":967},{"class":173,"line":966},18,[968,973,977],{"type":32,"tag":171,"props":969,"children":970},{"style":718},[971],{"type":37,"value":972},"        
threshold",{"type":32,"tag":171,"props":974,"children":975},{"style":184},[976],{"type":37,"value":539},{"type":32,"tag":171,"props":978,"children":979},{"style":274},[980],{"type":37,"value":981},"0.8\n",{"type":32,"tag":171,"props":983,"children":985},{"class":173,"line":984},19,[986,990,994,998],{"type":32,"tag":171,"props":987,"children":988},{"style":184},[989],{"type":37,"value":872},{"type":32,"tag":171,"props":991,"children":992},{"style":718},[993],{"type":37,"value":877},{"type":32,"tag":171,"props":995,"children":996},{"style":184},[997],{"type":37,"value":539},{"type":32,"tag":171,"props":999,"children":1000},{"style":208},[1001],{"type":37,"value":1002},"not-contains\n",{"type":32,"tag":171,"props":1004,"children":1006},{"class":173,"line":1005},20,[1007,1011,1015],{"type":32,"tag":171,"props":1008,"children":1009},{"style":718},[1010],{"type":37,"value":916},{"type":32,"tag":171,"props":1012,"children":1013},{"style":184},[1014],{"type":37,"value":539},{"type":32,"tag":171,"props":1016,"children":1017},{"style":208},[1018],{"type":37,"value":1019},"\"revolucionario\"\n",{"type":32,"tag":33,"props":1021,"children":1022},{},[1023,1025,1031],{"type":37,"value":1024},"Con ",{"type":32,"tag":62,"props":1026,"children":1028},{"className":1027},[],[1029],{"type":37,"value":1030},"promptfoo eval",{"type":37,"value":1032}," se prueban todas las variantes, devolviendo una tabla de métricas:",{"type":32,"tag":1034,"props":1035,"children":1036},"table",{},[1037,1066],{"type":32,"tag":1038,"props":1039,"children":1040},"thead",{},[1041],{"type":32,"tag":1042,"props":1043,"children":1044},"tr",{},[1045,1051,1056,1061],{"type":32,"tag":1046,"props":1047,"children":1048},"th",{},[1049],{"type":37,"value":1050},"Prompt",{"type":32,"tag":1046,"props":1052,"children":1053},{},[1054],{"type":37,"value":1055},"Pass Rate",{"type":32,"tag":1046,"props":1057,"children":1058},{},[1059],{"type":37,"value":1060},"Latencia 
Promedio",{"type":32,"tag":1046,"props":1062,"children":1063},{},[1064],{"type":37,"value":1065},"Costo",{"type":32,"tag":1067,"props":1068,"children":1069},"tbody",{},[1070,1094],{"type":32,"tag":1042,"props":1071,"children":1072},{},[1073,1079,1084,1089],{"type":32,"tag":1074,"props":1075,"children":1076},"td",{},[1077],{"type":37,"value":1078},"v1",{"type":32,"tag":1074,"props":1080,"children":1081},{},[1082],{"type":37,"value":1083},"92%",{"type":32,"tag":1074,"props":1085,"children":1086},{},[1087],{"type":37,"value":1088},"2.3s",{"type":32,"tag":1074,"props":1090,"children":1091},{},[1092],{"type":37,"value":1093},"$0.012",{"type":32,"tag":1042,"props":1095,"children":1096},{},[1097,1102,1107,1112],{"type":32,"tag":1074,"props":1098,"children":1099},{},[1100],{"type":37,"value":1101},"v2",{"type":32,"tag":1074,"props":1103,"children":1104},{},[1105],{"type":37,"value":1106},"98%",{"type":32,"tag":1074,"props":1108,"children":1109},{},[1110],{"type":37,"value":1111},"2.1s",{"type":32,"tag":1074,"props":1113,"children":1114},{},[1115],{"type":37,"value":1116},"$0.014",{"type":32,"tag":33,"props":1118,"children":1119},{},[1120],{"type":37,"value":1121},"v2 mejoró el pass rate pero el costo subió 17% — el token count está aumentando, hay que investigar. Sin ver este tradeoff, hubieras desplegado v2 y el presupuesto mensual se habría reventado.",{"type":32,"tag":45,"props":1123,"children":1125},{"id":1124},"ab-testing-comparar-variantes-de-prompts-en-producción",[1126],{"type":37,"value":1127},"A\u002FB Testing: Comparar Variantes de Prompts en Producción",{"type":32,"tag":33,"props":1129,"children":1130},{},[1131],{"type":37,"value":1132},"La suite de evaluación devolvió verde, ahora necesitas tráfico real. 
El A\u002FB testing en sistemas LLM se configura así:",{"type":32,"tag":1134,"props":1135,"children":1136},"ol",{},[1137,1147,1165,1175],{"type":32,"tag":88,"props":1138,"children":1139},{},[1140,1145],{"type":32,"tag":123,"props":1141,"children":1142},{},[1143],{"type":37,"value":1144},"Ruteo de variantes",{"type":37,"value":1146}," — selecciona la versión del prompt según el ID de usuario\u002Fsesión (split %)",{"type":32,"tag":88,"props":1148,"children":1149},{},[1150,1155,1157,1163],{"type":32,"tag":123,"props":1151,"children":1152},{},[1153],{"type":37,"value":1154},"Etiquetado de metadata",{"type":37,"value":1156}," — agrega ",{"type":32,"tag":62,"props":1158,"children":1160},{"className":1159},[],[1161],{"type":37,"value":1162},"prompt_version",{"type":37,"value":1164}," a cada llamada a API",{"type":32,"tag":88,"props":1166,"children":1167},{},[1168,1173],{"type":32,"tag":123,"props":1169,"children":1170},{},[1171],{"type":37,"value":1172},"Tracking de métricas",{"type":37,"value":1174}," — mantén la información de variante en los eventos downstream",{"type":32,"tag":88,"props":1176,"children":1177},{},[1178,1183],{"type":32,"tag":123,"props":1179,"children":1180},{},[1181],{"type":37,"value":1182},"Significancia estadística",{"type":37,"value":1184}," — cuando haya suficiente muestra, toma una decisión (385 observaciones por variante es el mínimo habitual para un 95% de confianza con ±5% de margen de error; detectar diferencias pequeñas entre variantes suele exigir muestras mayores)",{"type":32,"tag":33,"props":1186,"children":1187},{},[1188],{"type":37,"value":1189},"Ejemplo en workflow n8n:",{"type":32,"tag":162,"props":1191,"children":1193},{"className":164,"code":1192,"language":158,"meta":16,"style":16},"\u002F\u002F Selección de variante A\u002FB\nconst userId = $json.user_id;\nconst variant = (userId % 100 \u003C 50) ? 
'v1' : 'v2';\nconst promptUrl = `https:\u002F\u002Fraw.githubusercontent.com\u002Froibase\u002Fprompts\u002Fmain\u002F${variant}.md`;\n\n\u002F\u002F Agrega metadata a la llamada a API\nreturn {\n  json: {\n    prompt: await fetch(promptUrl).then(r => r.text()),\n    metadata: {\n      prompt_version: variant,\n      experiment_id: 'blog_tone_test_2026_05'\n    }\n  }\n};\n",[1194],{"type":32,"tag":62,"props":1195,"children":1196},{"__ignoreMap":16},[1197,1206,1229,1300,1335,1342,1350,1363,1371,1428,1436,1444,1457,1465,1472],{"type":32,"tag":171,"props":1198,"children":1199},{"class":173,"line":174},[1200],{"type":32,"tag":171,"props":1201,"children":1203},{"style":1202},"--shiki-default:#6A737D",[1204],{"type":37,"value":1205},"\u002F\u002F Selección de variante A\u002FB\n",{"type":32,"tag":171,"props":1207,"children":1208},{"class":173,"line":190},[1209,1214,1219,1224],{"type":32,"tag":171,"props":1210,"children":1211},{"style":343},[1212],{"type":37,"value":1213},"const",{"type":32,"tag":171,"props":1215,"children":1216},{"style":274},[1217],{"type":37,"value":1218}," userId",{"type":32,"tag":171,"props":1220,"children":1221},{"style":343},[1222],{"type":37,"value":1223}," =",{"type":32,"tag":171,"props":1225,"children":1226},{"style":184},[1227],{"type":37,"value":1228}," $json.user_id;\n",{"type":32,"tag":171,"props":1230,"children":1231},{"class":173,"line":199},[1232,1236,1241,1245,1250,1255,1260,1265,1270,1275,1280,1285,1290,1295],{"type":32,"tag":171,"props":1233,"children":1234},{"style":343},[1235],{"type":37,"value":1213},{"type":32,"tag":171,"props":1237,"children":1238},{"style":274},[1239],{"type":37,"value":1240}," variant",{"type":32,"tag":171,"props":1242,"children":1243},{"style":343},[1244],{"type":37,"value":1223},{"type":32,"tag":171,"props":1246,"children":1247},{"style":184},[1248],{"type":37,"value":1249}," (userId 
",{"type":32,"tag":171,"props":1251,"children":1252},{"style":343},[1253],{"type":37,"value":1254},"%",{"type":32,"tag":171,"props":1256,"children":1257},{"style":274},[1258],{"type":37,"value":1259}," 100",{"type":32,"tag":171,"props":1261,"children":1262},{"style":343},[1263],{"type":37,"value":1264}," \u003C",{"type":32,"tag":171,"props":1266,"children":1267},{"style":274},[1268],{"type":37,"value":1269}," 50",{"type":32,"tag":171,"props":1271,"children":1272},{"style":184},[1273],{"type":37,"value":1274},") ",{"type":32,"tag":171,"props":1276,"children":1277},{"style":343},[1278],{"type":37,"value":1279},"?",{"type":32,"tag":171,"props":1281,"children":1282},{"style":208},[1283],{"type":37,"value":1284}," 'v1'",{"type":32,"tag":171,"props":1286,"children":1287},{"style":343},[1288],{"type":37,"value":1289}," :",{"type":32,"tag":171,"props":1291,"children":1292},{"style":208},[1293],{"type":37,"value":1294}," 'v2'",{"type":32,"tag":171,"props":1296,"children":1297},{"style":184},[1298],{"type":37,"value":1299},";\n",{"type":32,"tag":171,"props":1301,"children":1302},{"class":173,"line":219},[1303,1307,1312,1316,1321,1326,1331],{"type":32,"tag":171,"props":1304,"children":1305},{"style":343},[1306],{"type":37,"value":1213},{"type":32,"tag":171,"props":1308,"children":1309},{"style":274},[1310],{"type":37,"value":1311}," promptUrl",{"type":32,"tag":171,"props":1313,"children":1314},{"style":343},[1315],{"type":37,"value":1223},{"type":32,"tag":171,"props":1317,"children":1318},{"style":208},[1319],{"type":37,"value":1320}," 
`https:\u002F\u002Fraw.githubusercontent.com\u002Froibase\u002Fprompts\u002Fmain\u002F${",{"type":32,"tag":171,"props":1322,"children":1323},{"style":184},[1324],{"type":37,"value":1325},"variant",{"type":32,"tag":171,"props":1327,"children":1328},{"style":208},[1329],{"type":37,"value":1330},"}.md`",{"type":32,"tag":171,"props":1332,"children":1333},{"style":184},[1334],{"type":37,"value":1299},{"type":32,"tag":171,"props":1336,"children":1337},{"class":173,"line":233},[1338],{"type":32,"tag":171,"props":1339,"children":1340},{"emptyLinePlaceholder":367},[1341],{"type":37,"value":370},{"type":32,"tag":171,"props":1343,"children":1344},{"class":173,"line":242},[1345],{"type":32,"tag":171,"props":1346,"children":1347},{"style":1202},[1348],{"type":37,"value":1349},"\u002F\u002F Agrega metadata a la llamada a API\n",{"type":32,"tag":171,"props":1351,"children":1352},{"class":173,"line":250},[1353,1358],{"type":32,"tag":171,"props":1354,"children":1355},{"style":343},[1356],{"type":37,"value":1357},"return",{"type":32,"tag":171,"props":1359,"children":1360},{"style":184},[1361],{"type":37,"value":1362}," {\n",{"type":32,"tag":171,"props":1364,"children":1365},{"class":173,"line":26},[1366],{"type":32,"tag":171,"props":1367,"children":1368},{"style":184},[1369],{"type":37,"value":1370},"  json: {\n",{"type":32,"tag":171,"props":1372,"children":1373},{"class":173,"line":280},[1374,1379,1384,1389,1394,1399,1404,1409,1414,1419,1423],{"type":32,"tag":171,"props":1375,"children":1376},{"style":184},[1377],{"type":37,"value":1378},"    prompt: ",{"type":32,"tag":171,"props":1380,"children":1381},{"style":343},[1382],{"type":37,"value":1383},"await",{"type":32,"tag":171,"props":1385,"children":1386},{"style":178},[1387],{"type":37,"value":1388}," 
fetch",{"type":32,"tag":171,"props":1390,"children":1391},{"style":184},[1392],{"type":37,"value":1393},"(promptUrl).",{"type":32,"tag":171,"props":1395,"children":1396},{"style":178},[1397],{"type":37,"value":1398},"then",{"type":32,"tag":171,"props":1400,"children":1401},{"style":184},[1402],{"type":37,"value":1403},"(",{"type":32,"tag":171,"props":1405,"children":1406},{"style":599},[1407],{"type":37,"value":1408},"r",{"type":32,"tag":171,"props":1410,"children":1411},{"style":343},[1412],{"type":37,"value":1413}," =>",{"type":32,"tag":171,"props":1415,"children":1416},{"style":184},[1417],{"type":37,"value":1418}," r.",{"type":32,"tag":171,"props":1420,"children":1421},{"style":178},[1422],{"type":37,"value":37},{"type":32,"tag":171,"props":1424,"children":1425},{"style":184},[1426],{"type":37,"value":1427},"()),\n",{"type":32,"tag":171,"props":1429,"children":1430},{"class":173,"line":289},[1431],{"type":32,"tag":171,"props":1432,"children":1433},{"style":184},[1434],{"type":37,"value":1435},"    metadata: {\n",{"type":32,"tag":171,"props":1437,"children":1438},{"class":173,"line":618},[1439],{"type":32,"tag":171,"props":1440,"children":1441},{"style":184},[1442],{"type":37,"value":1443},"      prompt_version: variant,\n",{"type":32,"tag":171,"props":1445,"children":1446},{"class":173,"line":636},[1447,1452],{"type":32,"tag":171,"props":1448,"children":1449},{"style":184},[1450],{"type":37,"value":1451},"      experiment_id: ",{"type":32,"tag":171,"props":1453,"children":1454},{"style":208},[1455],{"type":37,"value":1456},"'blog_tone_test_2026_05'\n",{"type":32,"tag":171,"props":1458,"children":1459},{"class":173,"line":866},[1460],{"type":32,"tag":171,"props":1461,"children":1462},{"style":184},[1463],{"type":37,"value":1464},"    
}\n",{"type":32,"tag":171,"props":1466,"children":1467},{"class":173,"line":889},[1468],{"type":32,"tag":171,"props":1469,"children":1470},{"style":184},[1471],{"type":37,"value":286},{"type":32,"tag":171,"props":1473,"children":1474},{"class":173,"line":910},[1475],{"type":32,"tag":171,"props":1476,"children":1477},{"style":184},[1478],{"type":37,"value":1479},"};\n",{"type":32,"tag":33,"props":1481,"children":1482},{},[1483],{"type":37,"value":1484},"Análisis en BigQuery:",{"type":32,"tag":162,"props":1486,"children":1490},{"className":1487,"code":1488,"language":1489,"meta":16,"style":16},"language-sql shiki shiki-themes github-dark","SELECT\n  metadata.value:prompt_version AS variant,\n  COUNT(DISTINCT user_id) AS users,\n  AVG(session_duration_sec) AS avg_duration,\n  SUM(conversion) \u002F COUNT(*) AS cvr\nFROM events\nWHERE experiment_id = 'blog_tone_test_2026_05'\n  AND event_date >= '2026-05-01'\nGROUP BY 1\n","sql",[1491],{"type":32,"tag":62,"props":1492,"children":1493},{"__ignoreMap":16},[1494,1502,1535,1566,1588,1633,1646,1668,1691],{"type":32,"tag":171,"props":1495,"children":1496},{"class":173,"line":174},[1497],{"type":32,"tag":171,"props":1498,"children":1499},{"style":343},[1500],{"type":37,"value":1501},"SELECT\n",{"type":32,"tag":171,"props":1503,"children":1504},{"class":173,"line":190},[1505,1510,1515,1520,1525,1530],{"type":32,"tag":171,"props":1506,"children":1507},{"style":274},[1508],{"type":37,"value":1509},"  metadata",{"type":32,"tag":171,"props":1511,"children":1512},{"style":184},[1513],{"type":37,"value":1514},".",{"type":32,"tag":171,"props":1516,"children":1517},{"style":274},[1518],{"type":37,"value":1519},"value",{"type":32,"tag":171,"props":1521,"children":1522},{"style":184},[1523],{"type":37,"value":1524},":prompt_version ",{"type":32,"tag":171,"props":1526,"children":1527},{"style":343},[1528],{"type":37,"value":1529},"AS",{"type":32,"tag":171,"props":1531,"children":1532},{"style":184},[1533],{"type":37,"value":1534}," 
variant,\n",{"type":32,"tag":171,"props":1536,"children":1537},{"class":173,"line":199},[1538,1543,1547,1552,1557,1561],{"type":32,"tag":171,"props":1539,"children":1540},{"style":274},[1541],{"type":37,"value":1542},"  COUNT",{"type":32,"tag":171,"props":1544,"children":1545},{"style":184},[1546],{"type":37,"value":1403},{"type":32,"tag":171,"props":1548,"children":1549},{"style":343},[1550],{"type":37,"value":1551},"DISTINCT",{"type":32,"tag":171,"props":1553,"children":1554},{"style":184},[1555],{"type":37,"value":1556}," user_id) ",{"type":32,"tag":171,"props":1558,"children":1559},{"style":343},[1560],{"type":37,"value":1529},{"type":32,"tag":171,"props":1562,"children":1563},{"style":184},[1564],{"type":37,"value":1565}," users,\n",{"type":32,"tag":171,"props":1567,"children":1568},{"class":173,"line":219},[1569,1574,1579,1583],{"type":32,"tag":171,"props":1570,"children":1571},{"style":274},[1572],{"type":37,"value":1573},"  AVG",{"type":32,"tag":171,"props":1575,"children":1576},{"style":184},[1577],{"type":37,"value":1578},"(session_duration_sec) ",{"type":32,"tag":171,"props":1580,"children":1581},{"style":343},[1582],{"type":37,"value":1529},{"type":32,"tag":171,"props":1584,"children":1585},{"style":184},[1586],{"type":37,"value":1587}," avg_duration,\n",{"type":32,"tag":171,"props":1589,"children":1590},{"class":173,"line":233},[1591,1596,1601,1606,1611,1615,1620,1624,1628],{"type":32,"tag":171,"props":1592,"children":1593},{"style":274},[1594],{"type":37,"value":1595},"  SUM",{"type":32,"tag":171,"props":1597,"children":1598},{"style":184},[1599],{"type":37,"value":1600},"(conversion) ",{"type":32,"tag":171,"props":1602,"children":1603},{"style":343},[1604],{"type":37,"value":1605},"\u002F",{"type":32,"tag":171,"props":1607,"children":1608},{"style":274},[1609],{"type":37,"value":1610}," 
COUNT",{"type":32,"tag":171,"props":1612,"children":1613},{"style":184},[1614],{"type":37,"value":1403},{"type":32,"tag":171,"props":1616,"children":1617},{"style":343},[1618],{"type":37,"value":1619},"*",{"type":32,"tag":171,"props":1621,"children":1622},{"style":184},[1623],{"type":37,"value":1274},{"type":32,"tag":171,"props":1625,"children":1626},{"style":343},[1627],{"type":37,"value":1529},{"type":32,"tag":171,"props":1629,"children":1630},{"style":184},[1631],{"type":37,"value":1632}," cvr\n",{"type":32,"tag":171,"props":1634,"children":1635},{"class":173,"line":242},[1636,1641],{"type":32,"tag":171,"props":1637,"children":1638},{"style":343},[1639],{"type":37,"value":1640},"FROM",{"type":32,"tag":171,"props":1642,"children":1643},{"style":184},[1644],{"type":37,"value":1645}," events\n",{"type":32,"tag":171,"props":1647,"children":1648},{"class":173,"line":250},[1649,1654,1659,1663],{"type":32,"tag":171,"props":1650,"children":1651},{"style":343},[1652],{"type":37,"value":1653},"WHERE",{"type":32,"tag":171,"props":1655,"children":1656},{"style":184},[1657],{"type":37,"value":1658}," experiment_id ",{"type":32,"tag":171,"props":1660,"children":1661},{"style":343},[1662],{"type":37,"value":401},{"type":32,"tag":171,"props":1664,"children":1665},{"style":208},[1666],{"type":37,"value":1667}," 'blog_tone_test_2026_05'\n",{"type":32,"tag":171,"props":1669,"children":1670},{"class":173,"line":26},[1671,1676,1681,1686],{"type":32,"tag":171,"props":1672,"children":1673},{"style":343},[1674],{"type":37,"value":1675},"  AND",{"type":32,"tag":171,"props":1677,"children":1678},{"style":184},[1679],{"type":37,"value":1680}," event_date ",{"type":32,"tag":171,"props":1682,"children":1683},{"style":343},[1684],{"type":37,"value":1685},">=",{"type":32,"tag":171,"props":1687,"children":1688},{"style":208},[1689],{"type":37,"value":1690}," 
'2026-05-01'\n",{"type":32,"tag":171,"props":1692,"children":1693},{"class":173,"line":280},[1694,1699],{"type":32,"tag":171,"props":1695,"children":1696},{"style":343},[1697],{"type":37,"value":1698},"GROUP BY",{"type":32,"tag":171,"props":1700,"children":1701},{"style":274},[1702],{"type":37,"value":1703}," 1\n",{"type":32,"tag":33,"props":1705,"children":1706},{},[1707],{"type":37,"value":1708},"Resultado: la variante v2 aumentó el CVR de 0.042 a 0.051 (+21%) con un p-value de 0.003, evidencia suficiente para llevar v2 a producción.",{"type":32,"tag":45,"props":1710,"children":1712},{"id":1711},"langsmith-observabilidad-y-detección-de-regresiones-a-largo-plazo",[1713],{"type":37,"value":1714},"LangSmith: Observabilidad y Detección de Regresiones a Largo Plazo",{"type":32,"tag":33,"props":1716,"children":1717},{},[1718],{"type":37,"value":1719},"Promptfoo es testing local, LangSmith es observabilidad en producción. Cada llamada a LLM queda registrada: input, output, latencia, token count, versión del modelo, versión del prompt.",{"type":32,"tag":33,"props":1721,"children":1722},{},[1723,1725,1730],{"type":37,"value":1724},"La ventaja de LangSmith es el ",{"type":32,"tag":123,"props":1726,"children":1727},{},[1728],{"type":37,"value":1729},"tracking de métricas a largo plazo",{"type":37,"value":1731},". 
Un bug en la versión del prompt de hace 3 meses se descubre hoy por feedback — vuelves a la traza, ves el diff input\u002Foutput, encuentras qué versión era ese día, y haces rollback.",{"type":32,"tag":33,"props":1733,"children":1734},{},[1735],{"type":37,"value":1736},"Ejemplo de traza:",{"type":32,"tag":162,"props":1738,"children":1742},{"className":1739,"code":1740,"language":1741,"meta":16,"style":16},"language-json shiki shiki-themes github-dark","{\n  \"run_id\": \"abc123\",\n  \"prompt_version\": \"v2.1\",\n  \"model\": \"claude-3-5-sonnet-20241022\",\n  \"input\": {\"topic\": \"Server-side GTM\", \"category\": \"tech\"},\n  \"output\": \"---\\ntitle: \\\"Server-Side GTM...\\\"\",\n  \"latency_ms\": 2341,\n  \"tokens\": {\"input\": 1842, \"output\": 1523},\n  \"cost_usd\": 0.0137,\n  \"feedback\": {\"score\": 4, \"comment\": \"el título es demasiado largo\"}\n}\n","json",[1743],{"type":32,"tag":62,"props":1744,"children":1745},{"__ignoreMap":16},[1746,1754,1775,1796,1817,1867,1917,1938,1986,2007,2055],{"type":32,"tag":171,"props":1747,"children":1748},{"class":173,"line":174},[1749],{"type":32,"tag":171,"props":1750,"children":1751},{"style":184},[1752],{"type":37,"value":1753},"{\n",{"type":32,"tag":171,"props":1755,"children":1756},{"class":173,"line":190},[1757,1762,1766,1771],{"type":32,"tag":171,"props":1758,"children":1759},{"style":274},[1760],{"type":37,"value":1761},"  \"run_id\"",{"type":32,"tag":171,"props":1763,"children":1764},{"style":184},[1765],{"type":37,"value":539},{"type":32,"tag":171,"props":1767,"children":1768},{"style":208},[1769],{"type":37,"value":1770},"\"abc123\"",{"type":32,"tag":171,"props":1772,"children":1773},{"style":184},[1774],{"type":37,"value":216},{"type":32,"tag":171,"props":1776,"children":1777},{"class":173,"line":199},[1778,1783,1787,1792],{"type":32,"tag":171,"props":1779,"children":1780},{"style":274},[1781],{"type":37,"value":1782},"  
\"prompt_version\"",{"type":32,"tag":171,"props":1784,"children":1785},{"style":184},[1786],{"type":37,"value":539},{"type":32,"tag":171,"props":1788,"children":1789},{"style":208},[1790],{"type":37,"value":1791},"\"v2.1\"",{"type":32,"tag":171,"props":1793,"children":1794},{"style":184},[1795],{"type":37,"value":216},{"type":32,"tag":171,"props":1797,"children":1798},{"class":173,"line":219},[1799,1804,1808,1813],{"type":32,"tag":171,"props":1800,"children":1801},{"style":274},[1802],{"type":37,"value":1803},"  \"model\"",{"type":32,"tag":171,"props":1805,"children":1806},{"style":184},[1807],{"type":37,"value":539},{"type":32,"tag":171,"props":1809,"children":1810},{"style":208},[1811],{"type":37,"value":1812},"\"claude-3-5-sonnet-20241022\"",{"type":32,"tag":171,"props":1814,"children":1815},{"style":184},[1816],{"type":37,"value":216},{"type":32,"tag":171,"props":1818,"children":1819},{"class":173,"line":233},[1820,1825,1830,1835,1839,1844,1848,1853,1857,1862],{"type":32,"tag":171,"props":1821,"children":1822},{"style":274},[1823],{"type":37,"value":1824},"  \"input\"",{"type":32,"tag":171,"props":1826,"children":1827},{"style":184},[1828],{"type":37,"value":1829},": {",{"type":32,"tag":171,"props":1831,"children":1832},{"style":274},[1833],{"type":37,"value":1834},"\"topic\"",{"type":32,"tag":171,"props":1836,"children":1837},{"style":184},[1838],{"type":37,"value":539},{"type":32,"tag":171,"props":1840,"children":1841},{"style":208},[1842],{"type":37,"value":1843},"\"Server-side 
GTM\"",{"type":32,"tag":171,"props":1845,"children":1846},{"style":184},[1847],{"type":37,"value":416},{"type":32,"tag":171,"props":1849,"children":1850},{"style":274},[1851],{"type":37,"value":1852},"\"category\"",{"type":32,"tag":171,"props":1854,"children":1855},{"style":184},[1856],{"type":37,"value":539},{"type":32,"tag":171,"props":1858,"children":1859},{"style":208},[1860],{"type":37,"value":1861},"\"tech\"",{"type":32,"tag":171,"props":1863,"children":1864},{"style":184},[1865],{"type":37,"value":1866},"},\n",{"type":32,"tag":171,"props":1868,"children":1869},{"class":173,"line":242},[1870,1875,1879,1884,1889,1894,1899,1904,1908,1913],{"type":32,"tag":171,"props":1871,"children":1872},{"style":274},[1873],{"type":37,"value":1874},"  \"output\"",{"type":32,"tag":171,"props":1876,"children":1877},{"style":184},[1878],{"type":37,"value":539},{"type":32,"tag":171,"props":1880,"children":1881},{"style":208},[1882],{"type":37,"value":1883},"\"---",{"type":32,"tag":171,"props":1885,"children":1886},{"style":274},[1887],{"type":37,"value":1888},"\\n",{"type":32,"tag":171,"props":1890,"children":1891},{"style":208},[1892],{"type":37,"value":1893},"title: ",{"type":32,"tag":171,"props":1895,"children":1896},{"style":274},[1897],{"type":37,"value":1898},"\\\"",{"type":32,"tag":171,"props":1900,"children":1901},{"style":208},[1902],{"type":37,"value":1903},"Server-Side GTM...",{"type":32,"tag":171,"props":1905,"children":1906},{"style":274},[1907],{"type":37,"value":1898},{"type":32,"tag":171,"props":1909,"children":1910},{"style":208},[1911],{"type":37,"value":1912},"\"",{"type":32,"tag":171,"props":1914,"children":1915},{"style":184},[1916],{"type":37,"value":216},{"type":32,"tag":171,"props":1918,"children":1919},{"class":173,"line":250},[1920,1925,1929,1934],{"type":32,"tag":171,"props":1921,"children":1922},{"style":274},[1923],{"type":37,"value":1924},"  
\"latency_ms\"",{"type":32,"tag":171,"props":1926,"children":1927},{"style":184},[1928],{"type":37,"value":539},{"type":32,"tag":171,"props":1930,"children":1931},{"style":274},[1932],{"type":37,"value":1933},"2341",{"type":32,"tag":171,"props":1935,"children":1936},{"style":184},[1937],{"type":37,"value":216},{"type":32,"tag":171,"props":1939,"children":1940},{"class":173,"line":26},[1941,1946,1950,1955,1959,1964,1968,1973,1977,1982],{"type":32,"tag":171,"props":1942,"children":1943},{"style":274},[1944],{"type":37,"value":1945},"  \"tokens\"",{"type":32,"tag":171,"props":1947,"children":1948},{"style":184},[1949],{"type":37,"value":1829},{"type":32,"tag":171,"props":1951,"children":1952},{"style":274},[1953],{"type":37,"value":1954},"\"input\"",{"type":32,"tag":171,"props":1956,"children":1957},{"style":184},[1958],{"type":37,"value":539},{"type":32,"tag":171,"props":1960,"children":1961},{"style":274},[1962],{"type":37,"value":1963},"1842",{"type":32,"tag":171,"props":1965,"children":1966},{"style":184},[1967],{"type":37,"value":416},{"type":32,"tag":171,"props":1969,"children":1970},{"style":274},[1971],{"type":37,"value":1972},"\"output\"",{"type":32,"tag":171,"props":1974,"children":1975},{"style":184},[1976],{"type":37,"value":539},{"type":32,"tag":171,"props":1978,"children":1979},{"style":274},[1980],{"type":37,"value":1981},"1523",{"type":32,"tag":171,"props":1983,"children":1984},{"style":184},[1985],{"type":37,"value":1866},{"type":32,"tag":171,"props":1987,"children":1988},{"class":173,"line":280},[1989,1994,1998,2003],{"type":32,"tag":171,"props":1990,"children":1991},{"style":274},[1992],{"type":37,"value":1993},"  
\"cost_usd\"",{"type":32,"tag":171,"props":1995,"children":1996},{"style":184},[1997],{"type":37,"value":539},{"type":32,"tag":171,"props":1999,"children":2000},{"style":274},[2001],{"type":37,"value":2002},"0.0137",{"type":32,"tag":171,"props":2004,"children":2005},{"style":184},[2006],{"type":37,"value":216},{"type":32,"tag":171,"props":2008,"children":2009},{"class":173,"line":289},[2010,2015,2019,2023,2027,2032,2036,2041,2045,2050],{"type":32,"tag":171,"props":2011,"children":2012},{"style":274},[2013],{"type":37,"value":2014},"  \"feedback\"",{"type":32,"tag":171,"props":2016,"children":2017},{"style":184},[2018],{"type":37,"value":1829},{"type":32,"tag":171,"props":2020,"children":2021},{"style":274},[2022],{"type":37,"value":534},{"type":32,"tag":171,"props":2024,"children":2025},{"style":184},[2026],{"type":37,"value":539},{"type":32,"tag":171,"props":2028,"children":2029},{"style":274},[2030],{"type":37,"value":2031},"4",{"type":32,"tag":171,"props":2033,"children":2034},{"style":184},[2035],{"type":37,"value":416},{"type":32,"tag":171,"props":2037,"children":2038},{"style":274},[2039],{"type":37,"value":2040},"\"comment\"",{"type":32,"tag":171,"props":2042,"children":2043},{"style":184},[2044],{"type":37,"value":539},{"type":32,"tag":171,"props":2046,"children":2047},{"style":208},[2048],{"type":37,"value":2049},"\"el título es demasiado largo\"",{"type":32,"tag":171,"props":2051,"children":2052},{"style":184},[2053],{"type":37,"value":2054},"}\n",{"type":32,"tag":171,"props":2056,"children":2057},{"class":173,"line":618},[2058],{"type":32,"tag":171,"props":2059,"children":2060},{"style":184},[2061],{"type":37,"value":2054},{"type":32,"tag":33,"props":2063,"children":2064},{},[2065],{"type":37,"value":2066},"Loop de feedback: cada editor asigna 1-5 puntos a cada blog, LangSmith vincula esos puntajes a las trazas, el reporte semanal alerta \"la versión v2.3 bajó el score promedio a 3.2\". 
Rollback inmediato → ves el diff del prompt → identificas el problema → lo arreglas.",{"type":32,"tag":688,"props":2068,"children":2070},{"id":2069},"gestión-de-datasets-mantener-el-golden-set-bajo-control-de-versiones",[2071],{"type":37,"value":2072},"Gestión de Datasets: Mantener el Golden Set Bajo Control de Versiones",{"type":32,"tag":33,"props":2074,"children":2075},{},[2076,2078,2083],{"type":37,"value":2077},"El corazón del pipeline de evaluación es el ",{"type":32,"tag":123,"props":2079,"children":2080},{},[2081],{"type":37,"value":2082},"golden dataset",{"type":37,"value":2084}," — pares conocidos de input\u002Foutput, la referencia del comportamiento esperado. Mantener este dataset en Notion o actualizarlo manualmente en Google Sheets es un riesgo de regresión.",{"type":32,"tag":33,"props":2086,"children":2087},{},[2088],{"type":37,"value":2089},"El dataset de LangSmith se controla por versión:",{"type":32,"tag":162,"props":2091,"children":2093},{"className":331,"code":2092,"language":333,"meta":16,"style":16},"from langsmith import Client\n\nclient = Client()\n\ndataset = client.create_dataset(\"marketing_blog_golden_v3\")\n\n# Agregar ejemplos golden\nexamples = [\n    {\n        \"inputs\": {\"topic\": \"Server-side GTM\", \"category\": \"tech\"},\n        \"outputs\": {\"title\": \"Server-Side GTM: Medición Post-Cookie\"},\n        \"metadata\": {\"expected_h2_count\": 5, \"expected_word_count\": 1500}\n    },\n    # 50+ ejemplos...\n]\n\nfor ex in examples:\n    client.create_example(**ex, 
dataset_id=dataset.id)\n",[2094],{"type":32,"tag":62,"props":2095,"children":2096},{"__ignoreMap":16},[2097,2117,2124,2141,2148,2174,2181,2189,2206,2214,2258,2288,2336,2344,2352,2359,2366,2387],{"type":32,"tag":171,"props":2098,"children":2099},{"class":173,"line":174},[2100,2104,2108,2112],{"type":32,"tag":171,"props":2101,"children":2102},{"style":343},[2103],{"type":37,"value":346},{"type":32,"tag":171,"props":2105,"children":2106},{"style":184},[2107],{"type":37,"value":351},{"type":32,"tag":171,"props":2109,"children":2110},{"style":343},[2111],{"type":37,"value":356},{"type":32,"tag":171,"props":2113,"children":2114},{"style":184},[2115],{"type":37,"value":2116}," Client\n",{"type":32,"tag":171,"props":2118,"children":2119},{"class":173,"line":190},[2120],{"type":32,"tag":171,"props":2121,"children":2122},{"emptyLinePlaceholder":367},[2123],{"type":37,"value":370},{"type":32,"tag":171,"props":2125,"children":2126},{"class":173,"line":199},[2127,2132,2136],{"type":32,"tag":171,"props":2128,"children":2129},{"style":184},[2130],{"type":37,"value":2131},"client ",{"type":32,"tag":171,"props":2133,"children":2134},{"style":343},[2135],{"type":37,"value":401},{"type":32,"tag":171,"props":2137,"children":2138},{"style":184},[2139],{"type":37,"value":2140}," Client()\n",{"type":32,"tag":171,"props":2142,"children":2143},{"class":173,"line":219},[2144],{"type":32,"tag":171,"props":2145,"children":2146},{"emptyLinePlaceholder":367},[2147],{"type":37,"value":370},{"type":32,"tag":171,"props":2149,"children":2150},{"class":173,"line":233},[2151,2156,2160,2165,2170],{"type":32,"tag":171,"props":2152,"children":2153},{"style":184},[2154],{"type":37,"value":2155},"dataset ",{"type":32,"tag":171,"props":2157,"children":2158},{"style":343},[2159],{"type":37,"value":401},{"type":32,"tag":171,"props":2161,"children":2162},{"style":184},[2163],{"type":37,"value":2164}," 
client.create_dataset(",{"type":32,"tag":171,"props":2166,"children":2167},{"style":208},[2168],{"type":37,"value":2169},"\"marketing_blog_golden_v3\"",{"type":32,"tag":171,"props":2171,"children":2172},{"style":184},[2173],{"type":37,"value":642},{"type":32,"tag":171,"props":2175,"children":2176},{"class":173,"line":242},[2177],{"type":32,"tag":171,"props":2178,"children":2179},{"emptyLinePlaceholder":367},[2180],{"type":37,"value":370},{"type":32,"tag":171,"props":2182,"children":2183},{"class":173,"line":250},[2184],{"type":32,"tag":171,"props":2185,"children":2186},{"style":1202},[2187],{"type":37,"value":2188},"# Agregar ejemplos golden\n",{"type":32,"tag":171,"props":2190,"children":2191},{"class":173,"line":26},[2192,2197,2201],{"type":32,"tag":171,"props":2193,"children":2194},{"style":184},[2195],{"type":37,"value":2196},"examples ",{"type":32,"tag":171,"props":2198,"children":2199},{"style":343},[2200],{"type":37,"value":401},{"type":32,"tag":171,"props":2202,"children":2203},{"style":184},[2204],{"type":37,"value":2205}," [\n",{"type":32,"tag":171,"props":2207,"children":2208},{"class":173,"line":280},[2209],{"type":32,"tag":171,"props":2210,"children":2211},{"style":184},[2212],{"type":37,"value":2213},"    {\n",{"type":32,"tag":171,"props":2215,"children":2216},{"class":173,"line":289},[2217,2222,2226,2230,2234,2238,2242,2246,2250,2254],{"type":32,"tag":171,"props":2218,"children":2219},{"style":208},[2220],{"type":37,"value":2221},"        
\"inputs\"",{"type":32,"tag":171,"props":2223,"children":2224},{"style":184},[2225],{"type":37,"value":1829},{"type":32,"tag":171,"props":2227,"children":2228},{"style":208},[2229],{"type":37,"value":1834},{"type":32,"tag":171,"props":2231,"children":2232},{"style":184},[2233],{"type":37,"value":539},{"type":32,"tag":171,"props":2235,"children":2236},{"style":208},[2237],{"type":37,"value":1843},{"type":32,"tag":171,"props":2239,"children":2240},{"style":184},[2241],{"type":37,"value":416},{"type":32,"tag":171,"props":2243,"children":2244},{"style":208},[2245],{"type":37,"value":1852},{"type":32,"tag":171,"props":2247,"children":2248},{"style":184},[2249],{"type":37,"value":539},{"type":32,"tag":171,"props":2251,"children":2252},{"style":208},[2253],{"type":37,"value":1861},{"type":32,"tag":171,"props":2255,"children":2256},{"style":184},[2257],{"type":37,"value":1866},{"type":32,"tag":171,"props":2259,"children":2260},{"class":173,"line":618},[2261,2266,2270,2275,2279,2284],{"type":32,"tag":171,"props":2262,"children":2263},{"style":208},[2264],{"type":37,"value":2265},"        \"outputs\"",{"type":32,"tag":171,"props":2267,"children":2268},{"style":184},[2269],{"type":37,"value":1829},{"type":32,"tag":171,"props":2271,"children":2272},{"style":208},[2273],{"type":37,"value":2274},"\"title\"",{"type":32,"tag":171,"props":2276,"children":2277},{"style":184},[2278],{"type":37,"value":539},{"type":32,"tag":171,"props":2280,"children":2281},{"style":208},[2282],{"type":37,"value":2283},"\"Server-Side GTM: Medición Post-Cookie\"",{"type":32,"tag":171,"props":2285,"children":2286},{"style":184},[2287],{"type":37,"value":1866},{"type":32,"tag":171,"props":2289,"children":2290},{"class":173,"line":636},[2291,2296,2300,2305,2309,2314,2318,2323,2327,2332],{"type":32,"tag":171,"props":2292,"children":2293},{"style":208},[2294],{"type":37,"value":2295},"        
\"metadata\"",{"type":32,"tag":171,"props":2297,"children":2298},{"style":184},[2299],{"type":37,"value":1829},{"type":32,"tag":171,"props":2301,"children":2302},{"style":208},[2303],{"type":37,"value":2304},"\"expected_h2_count\"",{"type":32,"tag":171,"props":2306,"children":2307},{"style":184},[2308],{"type":37,"value":539},{"type":32,"tag":171,"props":2310,"children":2311},{"style":274},[2312],{"type":37,"value":2313},"5",{"type":32,"tag":171,"props":2315,"children":2316},{"style":184},[2317],{"type":37,"value":416},{"type":32,"tag":171,"props":2319,"children":2320},{"style":208},[2321],{"type":37,"value":2322},"\"expected_word_count\"",{"type":32,"tag":171,"props":2324,"children":2325},{"style":184},[2326],{"type":37,"value":539},{"type":32,"tag":171,"props":2328,"children":2329},{"style":274},[2330],{"type":37,"value":2331},"1500",{"type":32,"tag":171,"props":2333,"children":2334},{"style":184},[2335],{"type":37,"value":2054},{"type":32,"tag":171,"props":2337,"children":2338},{"class":173,"line":866},[2339],{"type":32,"tag":171,"props":2340,"children":2341},{"style":184},[2342],{"type":37,"value":2343},"    },\n",{"type":32,"tag":171,"props":2345,"children":2346},{"class":173,"line":889},[2347],{"type":32,"tag":171,"props":2348,"children":2349},{"style":1202},[2350],{"type":37,"value":2351},"    # 50+ 
ejemplos...\n",{"type":32,"tag":171,"props":2353,"children":2354},{"class":173,"line":910},[2355],{"type":32,"tag":171,"props":2356,"children":2357},{"style":184},[2358],{"type":37,"value":295},{"type":32,"tag":171,"props":2360,"children":2361},{"class":173,"line":928},[2362],{"type":32,"tag":171,"props":2363,"children":2364},{"emptyLinePlaceholder":367},[2365],{"type":37,"value":370},{"type":32,"tag":171,"props":2367,"children":2368},{"class":173,"line":949},[2369,2373,2378,2382],{"type":32,"tag":171,"props":2370,"children":2371},{"style":343},[2372],{"type":37,"value":483},{"type":32,"tag":171,"props":2374,"children":2375},{"style":184},[2376],{"type":37,"value":2377}," ex ",{"type":32,"tag":171,"props":2379,"children":2380},{"style":343},[2381],{"type":37,"value":493},{"type":32,"tag":171,"props":2383,"children":2384},{"style":184},[2385],{"type":37,"value":2386}," examples:\n",{"type":32,"tag":171,"props":2388,"children":2389},{"class":173,"line":966},[2390,2395,2400,2405,2410,2414],{"type":32,"tag":171,"props":2391,"children":2392},{"style":184},[2393],{"type":37,"value":2394},"    client.create_example(",{"type":32,"tag":171,"props":2396,"children":2397},{"style":343},[2398],{"type":37,"value":2399},"**",{"type":32,"tag":171,"props":2401,"children":2402},{"style":184},[2403],{"type":37,"value":2404},"ex, ",{"type":32,"tag":171,"props":2406,"children":2407},{"style":599},[2408],{"type":37,"value":2409},"dataset_id",{"type":32,"tag":171,"props":2411,"children":2412},{"style":343},[2413],{"type":37,"value":401},{"type":32,"tag":171,"props":2415,"children":2416},{"style":184},[2417],{"type":37,"value":2418},"dataset.id)\n",{"type":32,"tag":33,"props":2420,"children":2421},{},[2422],{"type":37,"value":2423},"Antes de cada cambio de prompt, prueba contra este dataset. Si el pass rate baja, no despliegues. 
Add every new edge case (bugs you find in production) to the dataset to prevent regressions.",{"type":32,"tag":45,"props":2425,"children":2427},{"id":2426},"tradeoff-métricas-determinísticas-vs-output-creativo",[2428],{"type":37,"value":2429},"Tradeoff: Deterministic Metrics vs Creative Output",{"type":32,"tag":33,"props":2431,"children":2432},{},[2433],{"type":37,"value":2434},"Part of an LLM's strength is that it is non-deterministic: same input, different output. In production systems, though, that is a risk: the user sees different markdown on every reload, and some of it contains errors.",{"type":32,"tag":33,"props":2436,"children":2437},{},[2438],{"type":37,"value":2439},"A lower temperature gives more determinism but less creative output. The tradeoff is:",{"type":32,"tag":84,"props":2441,"children":2442},{},[2443,2453,2463],{"type":32,"tag":88,"props":2444,"children":2445},{},[2446,2451],{"type":32,"tag":123,"props":2447,"children":2448},{},[2449],{"type":37,"value":2450},"Temperature 0",{"type":37,"value":2452},": ideal for the eval suite, monotonous output in production",{"type":32,"tag":88,"props":2454,"children":2455},{},[2456,2461],{"type":32,"tag":123,"props":2457,"children":2458},{},[2459],{"type":37,"value":2460},"Temperature 0.3-0.5",{"type":37,"value":2462},": reasonable variety, still consistent",{"type":32,"tag":88,"props":2464,"children":2465},{},[2466,2471],{"type":32,"tag":123,"props":2467,"children":2468},{},[2469],{"type":37,"value":2470},"Temperature 0.7+",{"type":37,"value":2472},": creative, but surprises in production even if the suite passed",{"type":32,"tag":33,"props":2474,"children":2475},{},[2476],{"type":37,"value":2477},"Solution: temperature 0 in eval, 0.4 in production, and a golden set with 5 acceptable outputs per input (range control).",{"type":32,"tag":33,"props":2479,"children":2480},{},[2481,2483,2488],{"type":37,"value":2482},"Another tradeoff: ",{"type":32,"tag":123,"props":2484,"children":2485},{},[2486],{"type":37,"value":2487},"latency vs 
quality",{"type":37,"value":2489},". Longer prompts tend to give better output, but input token cost rises and latency increases. In Promptfoo, raise an alert when latency exceeds 2.5s, because the user experience degrades.",{"type":32,"tag":45,"props":2491,"children":2493},{"id":2492},"checklist-de-producción-antes-de-desplegar-un-sistema-llm",[2494],{"type":37,"value":2495},"Production Checklist: Before Deploying an LLM System",{"type":32,"tag":33,"props":2497,"children":2498},{},[2499],{"type":37,"value":2500},"Pre-deployment checklist:",{"type":32,"tag":84,"props":2502,"children":2505},{"className":2503},[2504],"contains-task-list",[2506,2518,2527,2536,2545,2554,2563,2572,2581],{"type":32,"tag":88,"props":2507,"children":2510},{"className":2508},[2509],"task-list-item",[2511,2516],{"type":32,"tag":2512,"props":2513,"children":2515},"input",{"disabled":367,"type":2514},"checkbox",[],{"type":37,"value":2517}," Prompt in a git repo, clean commit history",{"type":32,"tag":88,"props":2519,"children":2521},{"className":2520},[2509],[2522,2525],{"type":32,"tag":2512,"props":2523,"children":2524},{"disabled":367,"type":2514},[],{"type":37,"value":2526}," Promptfoo evaluation suite with pass rate > 95%",{"type":32,"tag":88,"props":2528,"children":2530},{"className":2529},[2509],[2531,2534],{"type":32,"tag":2512,"props":2532,"children":2533},{"disabled":367,"type":2514},[],{"type":37,"value":2535}," Golden dataset with at least 50 examples",{"type":32,"tag":88,"props":2537,"children":2539},{"className":2538},[2509],[2540,2543],{"type":32,"tag":2512,"props":2541,"children":2542},{"disabled":367,"type":2514},[],{"type":37,"value":2544}," A\u002FB test plan ready, sample size calculated",{"type":32,"tag":88,"props":2546,"children":2548},{"className":2547},[2509],[2549,2552],{"type":32,"tag":2512,"props":2550,"children":2551},{"disabled":367,"type":2514},[],{"type":37,"value":2553}," LangSmith tracing enabled, API key in 
production",{"type":32,"tag":88,"props":2555,"children":2557},{"className":2556},[2509],[2558,2561],{"type":32,"tag":2512,"props":2559,"children":2560},{"disabled":367,"type":2514},[],{"type":37,"value":2562}," Feedback loop configured (editor scores, join in BigQuery)",{"type":32,"tag":88,"props":2564,"children":2566},{"className":2565},[2509],[2567,2570],{"type":32,"tag":2512,"props":2568,"children":2569},{"disabled":367,"type":2514},[],{"type":37,"value":2571}," Rollback procedure defined (which metric triggers an automatic rollback)",{"type":32,"tag":88,"props":2573,"children":2575},{"className":2574},[2509],[2576,2579],{"type":32,"tag":2512,"props":2577,"children":2578},{"disabled":367,"type":2514},[],{"type":37,"value":2580}," Cost monitoring: daily token spend threshold of $X",{"type":32,"tag":88,"props":2582,"children":2584},{"className":2583},[2509],[2585,2588],{"type":32,"tag":2512,"props":2586,"children":2587},{"disabled":367,"type":2514},[],{"type":37,"value":2589}," Latency SLA: p95 \u003C 3s",{"type":32,"tag":33,"props":2591,"children":2592},{},[2593],{"type":37,"value":2594},"Until this list is complete, claiming to offer an \"AI service\" is premature. Without versioning, evaluation, and observability, every LLM deployment to production risks the previous performance. That is not progress, it is controlled chaos.",{"type":32,"tag":2596,"props":2597,"children":2598},"hr",{},[],{"type":32,"tag":33,"props":2600,"children":2601},{},[2602,2604,2611],{"type":37,"value":2603},"Prompt versioning is a matter of discipline: the goal is not to move faster but to be reliable. In tactics such as ",{"type":32,"tag":677,"props":2605,"children":2608},{"href":2606,"rel":2607},"https:\u002F\u002Fwww.roibase.com.tr\u002Fes\u002Fgeo",[681],[2609],{"type":37,"value":2610},"Generative Engine Optimization",{"type":37,"value":2612},", output quality ties directly to the business outcome. 
Without an evaluation pipeline, every deployment puts the previous performance at risk. Promptfoo provides local safety, LangSmith provides visibility in production. Together they raise LLM operations to the standard of software engineering.",{"type":32,"tag":2614,"props":2615,"children":2616},{},[2617],{"type":37,"value":2618},"html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}",{"title":16,"searchDepth":199,"depth":199,"links":2620},[2621,2622,2625,2626,2629,2630],{"id":47,"depth":190,"text":50},{"id":110,"depth":190,"text":113,"children":2623},[2624],{"id":690,"depth":199,"text":693},{"id":1124,"depth":190,"text":1127},{"id":1711,"depth":190,"text":1714,"children":2627},[2628],{"id":2069,"depth":199,"text":2072},{"id":2426,"depth":190,"text":2429},{"id":2492,"depth":190,"text":2495},"markdown","content:es:ai:versionado-prompts-ab-testing-llm-ops.md","content","es\u002Fai\u002Fversionado-prompts-ab-testing-llm-ops.md","es\u002Fai\u002Fversionado-prompts-ab-testing-llm-ops","md",1778709810620]