
# Beyond Fluency: A Human-Centered Evaluation Rubric for Grounding, Answerability, and Reliability in LLM Outputs

**Author:** Vladisav Jovanović
**Status:** Preprint
**Version:** Latest archived (February 2026)

## Abstract

Large language models can produce responses that are coherent, persuasive, and stylistically appropriate even when they are weakly grounded, poorly constrained, or factually unreliable. This creates a persistent evaluation problem: fluency is easy to mistake for quality. Existing evaluation practices often emphasize correctness, harmlessness, user preference, or instruction-following, but they do not always capture a deeper distinction between outputs that merely sound complete and outputs that remain answerable to evidence, uncertainty, correction, and practical consequence. This paper proposes a human-centered framework for evaluating LLM outputs beyond fluency. It introduces three primary dimensions: grounding, answerability, and reliability. The goal is not to replace existing benchmarks, but to add a missing evaluative layer: a human-centered account of what makes an AI response not merely readable, but responsibly usable.
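As a rough illustration of how the three dimensions differ from fluency, they can be read as independent axes of a scoring rubric applied to a single response. The sketch below is a minimal Python rendering under assumptions the abstract does not make: the 0–4 scale, the `min` aggregation, and all identifiers (`RubricScore`, `min_aggregate`) are hypothetical illustrations; only the three dimension names come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricScore:
    """One annotator's rating of a single LLM response.

    The three dimension names come from the paper; the 0-4 scale
    and everything else here are illustrative assumptions.
    """
    grounding: int      # is the response anchored in available evidence?
    answerability: int  # does it stay accountable to uncertainty and correction?
    reliability: int    # would acting on it be safe in practice?

    def __post_init__(self) -> None:
        for name in ("grounding", "answerability", "reliability"):
            value = getattr(self, name)
            if not 0 <= value <= 4:
                raise ValueError(f"{name} must be in 0..4, got {value}")


def min_aggregate(score: RubricScore) -> int:
    """Aggregate with min(): a fluent response is only as usable as its
    weakest dimension (an assumption of this sketch, not the paper)."""
    return min(score.grounding, score.answerability, score.reliability)
```

Under this sketch, a persuasive but ungrounded answer rated `RubricScore(grounding=0, answerability=3, reliability=3)` aggregates to 0, capturing the abstract's point that fluency alone should not carry an evaluation.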

## Keywords

artificial intelligence; large language models; LLM evaluation; grounding; answerability; reliability; hallucination; trustworthy AI; human–AI interaction; AI ethics; epistemology