[ https://issues.apache.org/jira/browse/LUCENE-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-6367:
---------------------------------------
    Attachment: LUCENE-6367.patch

Patch, cutting over PrefixQuery to AutomatonQuery and removing
PrefixTermsEnum.

I explored having Byte/CharRunAutomaton.run short-circuit once it
reaches a sink state, but fixing all callers of .step to handle this
became quite difficult and invasive.  LUCENE-5879 also needs to know
the sink state under the hood, but that's separate from changing .run
to make use of it.
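
For illustration, here is a minimal sketch of the sink-state
short-circuit idea.  It does not use Lucene: PrefixRunSketch, its
fields, and the REJECT constant are hypothetical stand-ins, not the
real ByteRunAutomaton API.  It assumes a prefix automaton whose last
state accepts and loops to itself on every byte, so run can stop
stepping as soon as it gets there.

```java
/** Hypothetical stand-in for a byte-level run automaton; not Lucene's ByteRunAutomaton. */
final class PrefixRunSketch {
    static final int REJECT = -1;

    private final byte[] prefix;
    // States 0..prefix.length-1 each consume one prefix byte;
    // state prefix.length is the accepting sink.
    private final int sinkState;
    // Counts calls to step, only so the short-circuit is observable.
    int stepCalls = 0;

    PrefixRunSketch(byte[] prefix) {
        this.prefix = prefix.clone();
        this.sinkState = prefix.length;
    }

    /** One transition; label is an unsigned byte value (0..255). */
    int step(int state, int label) {
        stepCalls++;
        if (state == sinkState) {
            return sinkState; // the sink loops to itself on every byte
        }
        return (prefix[state] & 0xFF) == label ? state + 1 : REJECT;
    }

    /** Returns true if term starts with the prefix. */
    boolean run(byte[] term) {
        int state = 0;
        for (int i = 0; i < term.length; i++) {
            if (state == sinkState) {
                return true; // short-circuit: the remaining bytes cannot change the outcome
            }
            state = step(state, term[i] & 0xFF);
            if (state == REJECT) {
                return false;
            }
        }
        return state == sinkState;
    }
}
```

In this sketch only run knows about the sink.  Any other caller that
loops over step directly would need its own check to get the same
benefit, which hints at why pushing the change through all callers is
invasive.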

So I backed out that optimization and tried just doing the PrefixQuery
cutover without optimizing for sink states.  I'm running a perf test
with luceneutil and it looks like the impact is trivial (well within
noise).  Net/net I think it's fine to "just cut over" without the
invasive optimization?

I also changed PrefixQuery's semantics to apply to full binary-space
terms, not just UTF-8 space.  While this is technically a change in
behavior, it won't impact users who index only Unicode terms.  It's
also necessary for LUCENE-5879, because if prefixing is done only in
Unicode space (like today), then the resulting binary-space automaton
will not have a sink state and auto-prefix can't apply.
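
To make the difference concrete, here is a rough sketch that
approximates the two semantics with plain JDK calls.  It is not how
Lucene builds either automaton: PrefixSpaceSketch and its method names
are hypothetical, and "UTF-8 space" is approximated as "the bytes after
the prefix must decode as valid UTF-8".  Under that assumption, whether
a term is accepted still depends on the bytes after the prefix, so
there is no single accept-everything state.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration only; not part of Lucene. */
final class PrefixSpaceSketch {

    static boolean startsWith(byte[] term, byte[] prefix) {
        if (term.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (term[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** Binary space: once the prefix has matched, any trailing bytes are accepted. */
    static boolean matchesBinarySpace(byte[] term, byte[] prefix) {
        return startsWith(term, prefix);
    }

    /** Approximation of UTF-8 space: the trailing bytes must also be valid UTF-8. */
    static boolean matchesUtf8Space(byte[] term, byte[] prefix) {
        if (!startsWith(term, prefix)) {
            return false;
        }
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(term, prefix.length, term.length - prefix.length));
            return true;
        } catch (CharacterCodingException e) {
            return false; // trailing bytes are not valid UTF-8
        }
    }
}
```

A term such as the bytes 'a', 'b', 0xFF with prefix "ab" is accepted by
matchesBinarySpace but rejected by matchesUtf8Space, since 0xFF never
appears in valid UTF-8.  Terms that are valid UTF-8 behave the same
under both.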

If this part is somehow controversial I can revert and try to do it
only with LUCENE-5879 instead... if it's OK, I'll add some tests
showing that PrefixQuery on binary terms works.


> Can PrefixQuery subclass AutomatonQuery?
> ----------------------------------------
>
>                 Key: LUCENE-6367
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6367
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: Trunk, 5.1
>
>         Attachments: LUCENE-6367.patch
>
>
> Spinoff/blocker for LUCENE-5879.
> It seems like PrefixQuery should "simply" be an AutomatonQuery rather than 
> specializing its own TermsEnum ... with maybe some performance improvements 
> to ByteRunAutomaton.run to short-circuit once it's in a "sink state", 
> AutomatonTermsEnum could be just as fast as PrefixTermsEnum.
> If we can do this it will make LUCENE-5879 simpler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
