"Proof by intimidation." Wait'll Trump hears about that.
I've long thought that computers would take over mathematics, but I
thought it would be software instantiating logic, like Prolog. How
ironic that mathematics is being taken over by reasoners that can no
more explain how we do it than Poincaré could.
Brent
On 6/8/2025 5:50 AM, John Clark wrote:
Two days ago the following article went online:
At Secret Math Meeting, Researchers Struggle to Outsmart AI
<https://www.scientificamerican.com/article/inside-the-secret-meeting-where-mathematicians-struggled-to-outsmart-ai/>
I think it's behind a paywall but it's super important so you can read
it below. It looks like professional human mathematicians will soon be
obsolete, if they're not already.
==
/The world's leading mathematicians were stunned by how adept
artificial intelligence is at doing their jobs/
By Lyndie Chiou
*On a weekend in mid-May, a clandestine mathematical conclave
convened. Thirty of the world’s most renowned mathematicians traveled
to Berkeley, Calif., with some coming from as far away as the U.K. The
group’s members faced off in a showdown with a “reasoning” chatbot
that was tasked with solving problems they had devised to test its
mathematical mettle. After throwing professor-level questions at the
bot for two days, the researchers were stunned to discover it was
capable of answering some of the world’s hardest solvable problems. “I
have colleagues who literally said these models are approaching
mathematical genius,” says Ken Ono, a mathematician at the University
of Virginia and a leader and judge at the meeting.*
*The chatbot in question is powered by o4-mini, a so-called reasoning
large language model (LLM). It was trained by OpenAI to be capable of
making highly intricate deductions. Google’s equivalent, Gemini 2.5
Flash, has similar abilities. Like the LLMs that powered earlier
versions of ChatGPT, o4-mini learns to predict the next word in a
sequence. Compared with those earlier LLMs, however, o4-mini and its
equivalents are lighter-weight, more nimble models that train on
specialized datasets with stronger reinforcement from humans. The
approach leads to a chatbot capable of diving much deeper into complex
problems in math than traditional LLMs.*
*To track the progress of o4-mini, OpenAI previously tasked Epoch AI,
a nonprofit that benchmarks LLMs, to come up with 300 math questions
whose solutions had not yet been published. Even traditional LLMs can
correctly answer many complicated math questions. Yet when Epoch AI
asked several such models these questions, which were dissimilar to
those they had been trained on, the most successful were able to solve
less than 2 percent, showing these LLMs lacked the ability to reason.
But o4-mini would prove to be very different.*
*Epoch AI hired Elliot Glazer, who had recently finished his math
Ph.D., to join the new collaboration for the benchmark, dubbed
FrontierMath, in September 2024. The project collected novel questions
over varying tiers of difficulty, with the first three tiers covering
undergraduate-, graduate- and research-level challenges. By April
2025, Glazer found that o4-mini could solve around 20 percent of the
questions. He then moved on to a fourth tier: a set of questions that
would be challenging even for an academic mathematician. Only a small
group of people in the world would be capable of developing such
questions, let alone answering them. The mathematicians who
participated had to sign a nondisclosure agreement requiring them to
communicate solely via the messaging app Signal. Other forms of
contact, such as traditional e-mail, could potentially be scanned by
an LLM and inadvertently train it, thereby contaminating the dataset.*
*Each problem that o4-mini couldn’t solve would garner the
mathematician who came up with it a $7,500 reward. The group made
slow, steady progress in finding questions. But Glazer wanted to speed
things up, so Epoch AI hosted the in-person meeting on Saturday, May
17, and Sunday, May 18. There, the participants would finalize the
last batch of challenge questions. The 30 attendees were split into
groups of six. For two days, the academics competed against themselves
to devise problems that they could solve but would trip up the AI
reasoning bot.*
*By the end of that Saturday night, Ono was frustrated with the bot,
whose unexpected mathematical prowess was foiling the group’s
progress. “I came up with a problem which experts in my field would
recognize as an open question in number theory—a good Ph.D.-level
problem,” he says. He asked o4-mini to solve the question. Over the
next 10 minutes, Ono watched in stunned silence as the bot unfurled a
solution in real time, showing its reasoning process along the way.
The bot spent the first two minutes finding and mastering the related
literature in the field. Then it wrote on the screen that it wanted to
try solving a simpler “toy” version of the question first in order to
learn. A few minutes later, it wrote that it was finally prepared to
solve the more difficult problem. Five minutes after that, o4-mini
presented a correct but sassy solution. “It was starting to get really
cheeky,” says Ono, who is also a freelance mathematical consultant for
Epoch AI. “And at the end, it says, ‘No citation necessary because the
mystery number was computed by me!’”
Defeated, Ono jumped onto Signal early that Sunday morning and alerted
the rest of the participants. “I was not prepared to be contending
with an LLM like this,” he says. “I’ve never seen that kind of
reasoning before in models. That’s what a scientist does. That’s
frightening.”
Although the group did eventually succeed in finding 10 questions that
stymied the bot, the researchers were astonished by how far AI had
progressed in the span of one year. Ono likened it to working with a
“strong collaborator.” Yang-Hui He, a mathematician at the London
Institute for Mathematical Sciences and an early pioneer of using AI
in math, says, “This is what a very, very good graduate student would
be doing—in fact, more.”
The bot was also much faster than a professional mathematician, taking
mere minutes to do what it would take such a human expert weeks or
months to complete.
While sparring with o4-mini was thrilling, its progress was also
alarming. Ono and He express concern that o4-mini’s results might
be trusted too much. “There’s proof by induction, proof by
contradiction, and then proof by intimidation,” He says. “If you say
something with enough authority, people just get scared. I think
o4-mini has mastered proof by intimidation; it says everything with so
much confidence.”
By the end of the meeting, the group started to consider what the
future might look like for mathematicians. Discussions turned to the
inevitable “tier five”—questions that even the best mathematicians
couldn't solve. If AI reaches that level, the role of mathematicians
would undergo a sharp change. For instance, mathematicians may shift
to simply posing questions and interacting with reasoning-bots to help
them discover new mathematical truths, much the same as a professor
does with graduate students. As such, Ono predicts that nurturing
creativity in higher education will be a key in keeping mathematics
going for future generations.
“I’ve been telling my colleagues that it’s a grave mistake to say that
generalized artificial intelligence will never come, [that] it’s just
a computer,” Ono says. “I don’t want to add to the hysteria, but in
some ways these large language models are already outperforming most
of our best graduate students in the world.”*
--
You received this message because you are subscribed to the Google
Groups "Everything List" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to [email protected].
To view this discussion visit
https://groups.google.com/d/msgid/everything-list/CAJPayv1zxVfKa3tdSNW9BCy%3Db2PzR-7cSE1O8h1M9DxE5F1yzw%40mail.gmail.com
<https://groups.google.com/d/msgid/everything-list/CAJPayv1zxVfKa3tdSNW9BCy%3Db2PzR-7cSE1O8h1M9DxE5F1yzw%40mail.gmail.com?utm_medium=email&utm_source=footer>.