The second batch of “First Proof” problems is meant to evaluate AI’s usefulness for research-level math. The best model got ...
A series of recent research papers have shown that ChatGPT and related large language models can produce original, verifiable mathematical proofs, including solutions to problems that had not been ...
AI math proof verification reached a new frontier as DeepMind’s AlphaProof Nexus solved nine open Erdős research problems with Lean-verified proofs, some unsolved for 56 years. The May 2026 Science Ne ...
Since the start of the 20th century, the heart of mathematics has been the proof — a rigorous, logical argument for whether a given statement is true or false. Mathematicians’ careers are measured by ...
Large language models struggle to solve research-level math questions. It takes a human to assess just how poorly they perform. By Siobhan Roberts A few weeks ago, a high school student emailed Martin ...
Kendra Pierre-Louis: For Scientific American’s Science Quickly, I’m Kendra Pierre-Louis, in for Rachel Feltman. In 1997, Deep Blue, a supercomputer built by IBM, did the unexpected: it defeated chess ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results