[ 
https://issues.apache.org/jira/browse/LUCENE-6672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626406#comment-14626406
 ] 

Michael McCandless commented on LUCENE-6672:
--------------------------------------------

I'm not sure we need to do anything here: the intention is to set the limit of 
how many states {{determinize}} is allowed to create, not to be a hard limit on 
the number of states the final automaton has after UTF8 conversion ...

> CompiledAutomaton can generate a binary automaton that have more than 
> 12*maxDeterminizedStates
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-6672
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6672
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 4.10.3, Trunk
>            Reporter: David Causse
>         Attachments: maxStates-overview.patch, quick-fix.patch
>
>
> The maxDeterminizedStates parameter to Automaton has introduced a way to 
> prevent massive state explosion while building automata. This is a nice 
> feature for protecting applications against DoS attacks. Unfortunately, in 
> some cases, such as wildcard queries with many wildcards, the resulting 
> binary automaton can exceed maxDeterminizedStates by a factor of ~12.
> If I configure my application with the default maxDeterminizedStates of 
> 10,000, CompiledAutomaton can potentially generate automata with more than 
> 120,000 states.
> This is because UTF32ToUTF8 ignores maxDeterminizedStates and can generate a 
> large binary automaton that is then passed to the costly 
> Operations.getCommonSuffixBytesRef.
> The current workaround is to set maxDeterminizedStates to 
> expectedMaxStates/13.
> I'm not sure what the best way to fix this issue is. UTF32ToUTF8.convert() 
> uses Automaton.Builder, which creates states very quickly, so adding a 
> check after each state creation may not be the best idea.
> A partial quick fix could be to check the size of the resulting binary 
> automaton and fail before running the costly 
> Operations.getCommonSuffixBytesRef.
> Another fix would be to generalize maxDeterminizedStates to maxStates at the 
> Automaton.Builder level. The maxStates could be checked before costly 
> operations (before ArrayUtil.grow in addTransition and in finishState). 
> Unfortunately this one requires more refactoring (not included in the patch).
> I included two patches to illustrate the above two fixes.
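
For illustration only, the workaround and the partial quick fix described in 
the issue could be sketched roughly as below. This is not the content of 
either attached patch. The class StateBudget and both method names are 
hypothetical and do not exist in Lucene; the sketch works on plain state 
counts, which in Lucene would presumably come from Automaton.getNumStates() 
on the converted binary automaton.

```java
/**
 * Hypothetical helper, not part of Lucene: illustrates the two ideas from
 * the issue description using plain integer state counts.
 */
final class StateBudget {

    private StateBudget() {}

    /**
     * Workaround from the issue: to keep the final binary automaton under
     * expectedMaxStates, set maxDeterminizedStates to expectedMaxStates / 13.
     * The divisor 13 is taken from the issue text (reported ~12x growth
     * plus some headroom); it is not a guaranteed bound.
     */
    static int workaroundLimit(int expectedMaxStates) {
        if (expectedMaxStates < 0) {
            throw new IllegalArgumentException("expectedMaxStates must be >= 0");
        }
        return expectedMaxStates / 13;
    }

    /**
     * Partial quick fix from the issue: check the size of the binary
     * automaton and fail before any costly follow-up operation runs.
     * The exception type here is an assumption made for this sketch.
     */
    static void checkBinaryStates(int numStates, int maxStates) {
        if (numStates > maxStates) {
            throw new IllegalStateException(
                "binary automaton has " + numStates
                    + " states, which exceeds the limit of " + maxStates);
        }
    }
}
```

With these assumptions, a caller who can tolerate at most 120,000 binary 
states would pass workaroundLimit(120000) as maxDeterminizedStates, or would 
call checkBinaryStates(...) right after the UTF-8 conversion and before the 
common-suffix computation.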



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

