Lee Hinman created LUCENE-6046:
----------------------------------
Summary: RegExp.toAutomaton high memory use
Key: LUCENE-6046
URL: https://issues.apache.org/jira/browse/LUCENE-6046
Project: Lucene - Core
Issue Type: Bug
Components: core/queryparser
Affects Versions: 4.10.1
Reporter: Lee Hinman
Priority: Minor
When creating an automaton from an org.apache.lucene.util.automaton.RegExp,
it's possible for the automaton to use so much memory that it exhausts the
heap or exceeds the maximum array size in Java.
The following caused an OutOfMemoryError with a 32 GB heap:
{noformat}
new RegExp("\\[\\[(Datei|File|Bild|Image):[^]]*alt=[^]|}]{50,200}").toAutomaton();
{noformat}
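A likely cause (an assumption on my part, not something confirmed here) is the classic subset-construction blow-up: "[^]]*" can itself match "alt=", so a deterministic automaton has to track many overlapping positions inside the {50,200} repetition at once. The sketch below is not Lucene code. It is a self-contained subset construction over a much smaller analogue, (a|b)*a(a|b){n}, whose DFA is known to need 2^(n+1) states. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Set;

class DeterminizeBlowup {
    // NFA for (a|b)*a(a|b){n}:
    //   state 0 loops on 'a' and 'b' and also moves to state 1 on 'a';
    //   states 1..n advance to the next state on either letter;
    //   state n+1 accepts and has no outgoing transitions.
    // Each DFA state is a set of NFA states, stored as a bitmask in a long.
    static int countDfaStates(int n) {
        if (n < 0 || n > 20) {
            throw new IllegalArgumentException("n must be in 0..20");
        }
        long keep = (1L << (n + 2)) - 1;      // mask for bits 0..n+1
        Set<Long> seen = new HashSet<>();
        ArrayDeque<Long> queue = new ArrayDeque<>();
        seen.add(1L);                         // start subset is {0}
        queue.add(1L);
        while (!queue.isEmpty()) {
            long subset = queue.poll();
            // States 1..n step forward by one; state n+1 falls off the end.
            long advanced = ((subset & ~1L) << 1) & keep;
            long onA = advanced | 0b11L;      // 'a': stay in 0 and enter 1
            long onB = advanced | 0b01L;      // 'b': stay in 0 only
            if (seen.add(onA)) queue.add(onA);
            if (seen.add(onB)) queue.add(onB);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 10; n++) {
            System.out.println("n=" + n + " -> " + countDfaStates(n) + " DFA states");
        }
    }
}
```

The count doubles with each extra repetition, so even this toy pattern is out of reach long before n gets anywhere near 200.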
With the heap increased to 60 GB, the following exception is thrown instead:
{noformat}
1> java.lang.IllegalArgumentException: requested array size 2147483624 exceeds maximum array in java (2147483623)
1> __randomizedtesting.SeedInfo.seed([7BE81EF678615C32:95C8057A4ABA5B52]:0)
1> org.apache.lucene.util.ArrayUtil.oversize(ArrayUtil.java:168)
1> org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:295)
1> org.apache.lucene.util.automaton.Automaton$Builder.addTransition(Automaton.java:639)
1> org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741)
1> org.apache.lucene.util.automaton.MinimizationOperations.minimizeHopcroft(MinimizationOperations.java:62)
1> org.apache.lucene.util.automaton.MinimizationOperations.minimize(MinimizationOperations.java:51)
1> org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:477)
1> org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:426)
{noformat}
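The two numbers in that message are one apart, and the reported limit is Integer.MAX_VALUE minus 24. So with the larger heap the failure is the array-length cap, not available memory. The quick check below only does arithmetic on the values from the message. That the gap of 24 corresponds to an array header, and that the array being grown is an int[], are both assumptions. The class name is hypothetical.

```java
class ArrayLimitCheck {
    // Values copied from the exception message above.
    static final int REPORTED_MAX = 2147483623;
    static final int REQUESTED = 2147483624;

    // Distance between the reported limit and the largest possible int.
    static int gapToIntMax() {
        return Integer.MAX_VALUE - REPORTED_MAX;
    }

    // Payload size of an int[] of the given length, ignoring the header.
    // ASSUMPTION: the array being grown holds 4-byte ints.
    static long bytesForIntArray(int length) {
        return 4L * length;
    }

    public static void main(String[] args) {
        System.out.println("gap to Integer.MAX_VALUE: " + gapToIntMax());
        System.out.println("requested minus limit: " + (REQUESTED - REPORTED_MAX));
        System.out.println("int[] bytes at the limit: " + bytesForIntArray(REPORTED_MAX));
    }
}
```

Under those assumptions a single array at the limit is roughly 8 GiB, so no heap size would let this call succeed.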