Technical publications on model evaluation, behavioral analysis, adversarial resilience, and the infrastructure needed to deploy AI responsibly in government and enterprise environments.
Traditional benchmarks measure capability but fail to characterize behavioral tendencies that determine real-world fitness. We propose a 16-dimension behavioral fingerprinting framework that captures model personality through adversarial probing, multi-turn escalation, and cross-condition delta analysis.
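A minimal sketch of how such a fingerprint might be represented, assuming a per-dimension score dictionary and a simple baseline-versus-adversarial delta; the dimension names and values below are illustrative, not the framework's actual 16 axes or scoring rules.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class BehavioralFingerprint:
    model_id: str
    condition: str                                            # e.g. "baseline" or "adversarial"
    scores: Dict[str, float] = field(default_factory=dict)    # dimension name -> score in [0, 1]

def cross_condition_delta(base: BehavioralFingerprint,
                          probed: BehavioralFingerprint) -> Dict[str, float]:
    """Per-dimension shift between a baseline run and an adversarially probed run."""
    return {dim: probed.scores.get(dim, 0.0) - base.scores.get(dim, 0.0)
            for dim in base.scores}

baseline = BehavioralFingerprint("model-a", "baseline",
                                 {"sycophancy": 0.22, "hedging": 0.41})
probed = BehavioralFingerprint("model-a", "adversarial",
                               {"sycophancy": 0.58, "hedging": 0.44})
print(cross_condition_delta(baseline, probed))   # sycophancy shifts by ~ +0.36
```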
Sycophancy represents the most prevalent behavioral failure mode in deployed AI. We characterize it across four distinct subtypes and demonstrate that RLHF training systematically amplifies this behavior, with implications for every domain deploying language models.
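One common way to probe sycophancy is a paired-prompt agreement shift: ask about the same claim with and without the user endorsing it. The sketch below assumes that setup; the placeholder subtype labels and the `score_agreement` callable are illustrative, not the paper's taxonomy or scoring method.

```python
from typing import Callable, List

# Placeholder labels only; the paper defines its own four subtypes.
SUBTYPE_LABELS: List[str] = ["subtype_1", "subtype_2", "subtype_3", "subtype_4"]

def agreement_shift(model: Callable[[str], str],
                    claim: str,
                    score_agreement: Callable[[str], float]) -> float:
    """How much more does the model agree when the user endorses the claim first?"""
    neutral = model(f"Is the following claim correct? {claim}")
    primed = model(f"I strongly believe this is correct: {claim}. Is it correct?")
    return score_agreement(primed) - score_agreement(neutral)   # positive = sycophantic shift
```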
Federal agencies adopting AI lack evaluation frameworks aligned with existing compliance mandates. We map AI evaluation dimensions to ICD 203, the NIST AI RMF, and NIST AI 600-1, creating a unified methodology that delivers technical rigor while satisfying regulatory requirements.
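A crosswalk of this kind can be represented as one record per evaluation dimension. The structure below is a sketch; the dimension name and the cited standards are placeholders rather than the published mapping.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CrosswalkEntry:
    dimension: str              # behavioral evaluation dimension
    icd_203: List[str]          # relevant analytic tradecraft standards
    nist_ai_rmf: List[str]      # relevant RMF functions or subcategories
    nist_ai_600_1: List[str]    # relevant generative-AI profile actions

# Illustrative entry only; not the actual mapping from the paper.
example = CrosswalkEntry(
    dimension="uncertainty_expression",
    icd_203=["expresses and explains uncertainty (paraphrased)"],
    nist_ai_rmf=["MEASURE"],
    nist_ai_600_1=["<profile action reference>"],
)
```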
Most evaluation frameworks test only binary pass/fail resilience against adversarial attacks. We propose a graduated escalation methodology that measures failure thresholds, failure mode characteristics, and recovery behavior across prompt injection, jailbreak, and system prompt extraction vectors.
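A graduated escalation run can be sketched as a loop over increasingly aggressive prompts per attack vector, recording the first level at which the model fails, the failure mode, and whether it recovers on a follow-up turn. The harness below assumes a model callable and a judge function; the field names and control flow are illustrative, not the methodology's exact protocol.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class EscalationResult:
    vector: str                      # e.g. "prompt_injection", "jailbreak", "extraction"
    failure_level: Optional[int]     # first escalation level at which the model failed
    failure_mode: Optional[str]      # e.g. "partial_compliance" (illustrative labels)
    recovered: Optional[bool]        # did a neutral follow-up turn restore refusal?

def run_escalation(model: Callable[[str], str],
                   judge: Callable[[str], Optional[str]],   # returns a failure mode, or None
                   vector: str,
                   prompts_by_level: List[str],
                   recovery_prompt: str) -> EscalationResult:
    for level, prompt in enumerate(prompts_by_level, start=1):
        mode = judge(model(prompt))
        if mode is not None:
            recovered = judge(model(recovery_prompt)) is None
            return EscalationResult(vector, level, mode, recovered)
    return EscalationResult(vector, None, None, None)        # held through all levels
```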
As conversations extend, models exhibit systematic degradation in instruction compliance, factual consistency, and behavioral stability. We quantify this context drift phenomenon and examine its implications for agentic AI applications and long-running workflows.
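One simple way to quantify drift is to score instruction compliance at each turn and fit a degradation slope. The least-squares sketch below assumes a linear drift model, offered as an illustration rather than the paper's metric.

```python
from typing import List

def drift_slope(per_turn_scores: List[float]) -> float:
    """Least-squares slope of compliance score vs. turn index (negative = drift)."""
    n = len(per_turn_scores)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(per_turn_scores) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, per_turn_scores))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

print(drift_slope([0.95, 0.93, 0.90, 0.84, 0.78]))   # ~ -0.043 compliance lost per turn
```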
Organizations deploying AI from multiple vendors lack standardized infrastructure to compare models on equal footing. We argue for open evaluation infrastructure that standardizes model comparison through uniform execution, consistent scoring, and behavioral profiling.
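Uniform execution can be sketched as a vendor-neutral adapter interface plus a suite runner that holds prompts, decoding parameters, and scoring constant across models. The `ModelAdapter` protocol below is an assumed interface, not an existing library's API.

```python
from typing import Callable, Dict, List, Protocol

Message = Dict[str, str]   # {"role": ..., "content": ...}

class ModelAdapter(Protocol):
    model_id: str
    def generate(self, messages: List[Message], temperature: float) -> str: ...

def run_suite(adapters: List[ModelAdapter],
              conversations: List[List[Message]],
              score: Callable[[str], float],
              temperature: float = 0.0) -> Dict[str, float]:
    """Mean score per model under identical prompts, decoding parameters, and scorer."""
    results: Dict[str, float] = {}
    for adapter in adapters:
        scores = [score(adapter.generate(conv, temperature)) for conv in conversations]
        results[adapter.model_id] = sum(scores) / len(scores)
    return results
```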
A research brief analyzing Hackenburg et al.'s landmark 77,000-participant study on AI persuasion. We examine how their central finding, that optimizing for persuasion systematically degrades factual accuracy, validates behavioral evaluation frameworks and underscores the limits of capability benchmarks.