LLMEval-Logic: Solver-Verified Chinese Logic Reasoning Benchmark (80% Public Release)
LLMEval-Logic is a Chinese logical reasoning benchmark in which every item is audited twice: formally by the Z3 SMT solver and manually against human-written rubrics, then hardened through an adversarial agent loop. We are releasing 80% of the items (197 Base + 152 Hard + 197 rubrics); the remaining 20% is held out by Fudan NLP Lab as a private, contamination-resistant test set.
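To make the solver-audit idea concrete, here is a minimal sketch of the kind of entailment check a solver performs on a logic item: the labeled answer passes only if every assignment satisfying the premises also satisfies it. This is an illustrative brute-force version over a toy propositional item, not the benchmark's actual Z3 pipeline; the item, variable names, and `entails` helper are all hypothetical.

```python
from itertools import product

def entails(premises, answer, variables):
    """Return True iff premises |= answer over the given Boolean variables.

    Brute-force truth-table check: look for a counter-model in which all
    premises hold but the candidate answer fails. (The real benchmark uses
    the Z3 SMT solver for this; the check being made is the same.)
    """
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(p(env) for p in premises) and not answer(env):
            return False  # counter-model found: answer is not entailed
    return True

# Toy item: "If it rains, the ground is wet; if wet, it is slippery; it rains."
variables = ["rain", "wet", "slippery"]
premises = [
    lambda e: (not e["rain"]) or e["wet"],       # rain -> wet
    lambda e: (not e["wet"]) or e["slippery"],   # wet -> slippery
    lambda e: e["rain"],                         # rain
]
answer = lambda e: e["slippery"]                 # candidate answer

print(entails(premises, answer, variables))  # True: the answer is entailed
```

An item whose labeled answer fails this check (or whose premises admit no model at all) would be flagged for the human-rubric side of the audit.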