Blog

Updates and announcements from the LLMEval team.

LLMEval Team

LLMEval-Logic: Solver-Verified Chinese Logic Reasoning Benchmark (80% Public Release)

LLMEval-Logic is a Chinese logical reasoning benchmark double-audited by the Z3 SMT solver and human rubrics, and toughened via an adversarial-hardening agent loop. We are releasing 80% of the items (197 Base + 152 Hard + 197 rubrics); the remaining 20% is held out as a private, contamination-resistant test set maintained by Fudan NLP Lab.

LLMEval-Logic · logical reasoning · Z3 · contamination-resistant