[ https://issues.apache.org/jira/browse/LUCENE-6672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626406#comment-14626406 ]
Michael McCandless commented on LUCENE-6672:
--------------------------------------------
I'm not sure we need to do anything here: the intention is to set the limit of
how many states {{determinize}} is allowed to create, not to be a hard limit on
the number of states the final automaton has after UTF8 conversion ...
> CompiledAutomaton can generate a binary automaton that has more than
> 12*maxDeterminizedStates
> ----------------------------------------------------------------------------------------------
>
> Key: LUCENE-6672
> URL: https://issues.apache.org/jira/browse/LUCENE-6672
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/index
> Affects Versions: 4.10.3, Trunk
> Reporter: David Causse
> Attachments: maxStates-overview.patch, quick-fix.patch
>
>
> The maxDeterminizedStates parameter to Automaton introduced a way to
> prevent massive state explosion while generating automata. This is a
> nice feature for protecting applications against DoS attacks.
> Unfortunately, in some cases, such as wildcard queries with many
> wildcards, the resulting binary automaton can exceed
> maxDeterminizedStates by a factor of ~12.
>
> If I configure my application with the default maxDeterminizedStates of
> 10,000, CompiledAutomaton can potentially generate automata with more
> than 120,000 states.
>
> This is because UTF32ToUTF8 ignores maxDeterminizedStates and can
> generate a large binary automaton that is then passed to the costly
> Operations.getCommonSuffixBytesRef.
>
> The current workaround is to set maxDeterminizedStates to
> expectedMaxStates/13.
>
> I'm not sure what the best way to fix this issue is.
> UTF32ToUTF8.convert() uses Automaton.Builder, which creates states very
> quickly, so adding a check after each state creation may not be the
> best idea.
>
> A partial quick fix could be to check the size of the resulting binary
> automaton and fail before running the costly
> Operations.getCommonSuffixBytesRef.
>
> Another fix would be to generalize maxDeterminizedStates to maxStates at
> the Automaton.Builder level. maxStates could then be checked before
> costly operations (before ArrayUtil.grow in addTransition and in
> finishState). Unfortunately this one requires more refactoring (not
> included in the patch).
>
> I included two patches to illustrate the above two fixes.
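
For readers who want to see the shape of the second proposal (a maxStates
budget checked only before costly operations), here is a minimal sketch in
plain Java. It is not the code from either attached patch and it does not
use the Lucene API: MaxStatesSketch, BoundedBuilder, TooManyStatesException
and the initial capacity of 4 are all hypothetical names and values chosen
for illustration. The budget is checked only when the backing array is about
to grow and once more in finish(), so the hot path of state creation stays
free of a per-state check.

```java
import java.util.Arrays;

// Hypothetical illustration only: none of these types exist in Lucene.
public class MaxStatesSketch {

    /** Hypothetical exception raised when the state budget is exceeded. */
    static class TooManyStatesException extends RuntimeException {
        TooManyStatesException(int numStates, int maxStates) {
            super("automaton has " + numStates + " states, maxStates=" + maxStates);
        }
    }

    /** Stand-in for a state builder with a budget checked before array growth. */
    static class BoundedBuilder {
        private final int maxStates;
        private int[] states = new int[4]; // assumed initial capacity
        private int numStates;

        BoundedBuilder(int maxStates) {
            this.maxStates = maxStates;
        }

        /** Creates a state; the budget is consulted only when the array must grow. */
        int createState() {
            if (numStates == states.length) {
                checkBudget(); // cheap check placed right before the costly grow
                states = Arrays.copyOf(states, states.length * 2);
            }
            states[numStates] = numStates;
            return numStates++;
        }

        /** Final check, so an overshoot between two growth steps is still caught. */
        int finish() {
            checkBudget();
            return numStates;
        }

        private void checkBudget() {
            if (numStates > maxStates) {
                throw new TooManyStatesException(numStates, maxStates);
            }
        }
    }

    public static void main(String[] args) {
        BoundedBuilder builder = new BoundedBuilder(5);
        try {
            for (int i = 0; i < 100; i++) {
                builder.createState();
            }
            builder.finish();
        } catch (TooManyStatesException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The trade-off in this sketch is that the builder can overshoot the budget by
up to one growth step before it fails, which is why finish() repeats the
check.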