"Proof by intimidation." Wait'll Trump hears about that.
I've long thought that computers would take over mathematics, but I
thought it would be software instantiating logic, like Prolog. How
ironic that mathematics is being taken over by reasoners that can no
more explain how we do it than Poincaré could.
Brent
On 6/8/2025 5:50 AM, John Clark wrote:
Two days ago the following article went online:
At Secret Math Meeting, Researchers Struggle to Outsmart AI
<https://www.scientificamerican.com/article/inside-the-secret-meeting-where-mathematicians-struggled-to-outsmart-ai/>
I think it's behind a paywall but it's super important so you can read
it below. It looks like professional human mathematicians will soon be
obsolete, if they're not already.
==
/The world's leading mathematicians were stunned by how adept
artificial intelligence is at doing their jobs/
By Lyndie Chiou
*On a weekend in mid-May, a clandestine mathematical conclave
convened. Thirty of the world’s most renowned mathematicians traveled
to Berkeley, Calif., with some coming from as far away as the U.K. The
group’s members faced off in a showdown with a “reasoning” chatbot
that was tasked with solving problems they had devised to test its
mathematical mettle. After throwing professor-level questions at the
bot for two days, the researchers were stunned to discover it was
capable of answering some of the world’s hardest solvable problems. “I
have colleagues who literally said these models are approaching
mathematical genius,” says Ken Ono, a mathematician at the University
of Virginia and a leader and judge at the meeting.*
*The chatbot in question is powered by o4-mini, a so-called reasoning
large language model (LLM). It was trained by OpenAI to be capable of
making highly intricate deductions. Google’s equivalent, Gemini 2.5
Flash, has similar abilities. Like the LLMs that powered earlier
versions of ChatGPT, o4-mini learns to predict the next word in a
sequence. Compared with those earlier LLMs, however, o4-mini and its
equivalents are lighter-weight, more nimble models that train on
specialized datasets with stronger reinforcement from humans. The
approach leads to a chatbot capable of diving much deeper into complex
problems in math than traditional LLMs.*
*To track the progress of o4-mini, OpenAI previously tasked Epoch AI,
a nonprofit that benchmarks LLMs, to come up with 300 math questions
whose solutions had not yet been published. Even traditional LLMs can
correctly answer many complicated math questions. Yet when Epoch AI
asked several such models these questions, which were dissimilar to
those they had been trained on, the most successful were able to solve
less than 2 percent, showing these LLMs lacked the ability to reason.
But o4-mini would prove to be very different.*
*Epoch AI hired Elliot Glazer, who had recently finished his math
Ph.D., to join the new collaboration for the benchmark, dubbed
FrontierMath, in September 2024. The project collected novel questions
over varying tiers of difficulty, with the first three tiers covering
undergraduate-, graduate- and research-level challenges. By April
2025, Glazer found that o4-mini could solve around 20 percent of the
questions. He then moved on to a fourth tier: a set of questions that
would be challenging even for an academic mathematician. Only a small
group of people in the world would be capable of developing such
questions, let alone answering them. The mathematicians who
participated had to sign a nondisclosure agreement requiring them to
communicate solely via the messaging app Signal. Other forms of
contact, such as traditional e-mail, could potentially be scanned by
an LLM and inadvertently train it, thereby contaminating the dataset.*
*Each problem that o4-mini couldn’t solve would garner the
mathematician who came up with it a $7,500 reward. The group made
slow, steady progress in finding questions. But Glazer wanted to speed
things up, so Epoch AI hosted the in-person meeting on Saturday, May
17, and Sunday, May 18. There, the participants would finalize the
last batch of challenge questions. The 30 attendees were split into
groups of six. For two days, the academics competed against themselves
to devise problems that they could solve but would trip up the AI
reasoning bot.*
*By the end of that Saturday night, Ono was frustrated with the bot,
whose unexpected mathematical prowess was foiling the group’s
progress. “I came up with a problem which experts in my field would
recognize as an open question in number theory—a good Ph.D.-level
problem,” he says. He asked o4-mini to solve the question. Over the
next 10 minutes, Ono watched in stunned silence as the bot unfurled a
solution in real time, showing its reasoning process along the way.
The bot spent the first two minutes finding and mastering the related
literature in the field. Then it wrote on the screen that it wanted to
try solving a simpler “toy” version of the question first in order to
learn. A few minutes later, it wrote that it was finally prepared to
solve the more difficult problem. Five minutes after that, o4-mini
presented a correct but sassy solution. “It was starting to get really
cheeky,” says Ono, who is also a freelance mathematical consultant for
Epoch AI. “And at the end, it says, ‘No citation necessary because the
mystery number was computed by me!’”
Defeated, Ono jumped onto Signal early that Sunday morning and alerted
the rest of the participants. “I was not prepared to be contending
with an LLM like this,” he says. “I’ve never seen that kind of
reasoning before in models. That’s what a scientist does. That’s
frightening.”
Although the group did eventually succeed in finding 10 questions that
stymied the bot, the researchers were astonished by how far AI had
progressed in the span of one year. Ono likened it to working with a
“strong collaborator.” Yang-Hui He, a mathematician at the London
Institute for Mathematical Sciences and an early pioneer of using AI
in math, says, “This is what a very, very good graduate student would
be doing—in fact, more.”
The bot was also much faster than a professional mathematician, taking
mere minutes to do what it would take such a human expert weeks or
months to complete.
While sparring with o4-mini was thrilling, its progress was also
alarming. Ono and He express concern that o4-mini’s results might
be trusted too much. “There’s proof by induction, proof by
contradiction, and then proof by intimidation,” He says. “If you say
something with enough authority, people just get scared. I think
o4-mini has mastered proof by intimidation; it says everything with so
much confidence.”
By the end of the meeting, the group started to consider what the
future might look like for mathematicians. Discussions turned to the
inevitable “tier five”—questions that even the best mathematicians
couldn't solve. If AI reaches that level, the role of mathematicians
would undergo a sharp change. For instance, mathematicians may shift
to simply posing questions and interacting with reasoning-bots to help
them discover new mathematical truths, much the same as a professor
does with graduate students. As such, Ono predicts that nurturing
creativity in higher education will be a key in keeping mathematics
going for future generations.
“I’ve been telling my colleagues that it’s a grave mistake to say that
generalized artificial intelligence will never come, [that] it’s just
a computer,” Ono says. “I don’t want to add to the hysteria, but in
some ways these large language models are already outperforming most
of our best graduate students in the world.”*
--
You received this message because you are subscribed to the Google
Groups "Everything List" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to [email protected].
To view this discussion visit
https://groups.google.com/d/msgid/everything-list/CAJPayv1zxVfKa3tdSNW9BCy%3Db2PzR-7cSE1O8h1M9DxE5F1yzw%40mail.gmail.com
<https://groups.google.com/d/msgid/everything-list/CAJPayv1zxVfKa3tdSNW9BCy%3Db2PzR-7cSE1O8h1M9DxE5F1yzw%40mail.gmail.com?utm_medium=email&utm_source=footer>.