Erik Hatcher wrote:

On Aug 30, 2006, at 6:13 PM, Mark Miller wrote:
* An implementation tying Java's built-in java.util.regex to RegexQuery.
*
* Note that because this implementation currently only returns null from
* {@link #prefix} that queries using this implementation will enumerate and
* attempt to {@link #match} each term for the specified field in the index.

Is this another way to say I'm gonna be friggin' slow? Say it ain't so...

"slow" is relative. It will enumerate all the terms for the specified field and run a regular expression match on each one. The same thing happens with FuzzyQuery and prefixed WildcardQuery too. These aren't necessarily "slow", so try it and see.
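As a rough illustration of the enumeration described above, here is a minimal Java sketch that does not use Lucene at all. It assumes the field's terms can be modelled as a sorted set of strings; the class name `TermScanSketch` and the method `matchingTerms` are hypothetical and are not part of any Lucene API. With a null prefix every term is tested against the regex, while a known prefix lets the scan start at the first candidate and stop early.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical stand-in for a term enumeration; not Lucene code.
final class TermScanSketch {

    /**
     * Returns the terms that fully match the regex.
     * prefix == null: every term in the set is tested.
     * prefix != null: the scan seeks to the first term >= prefix and
     * stops at the first term that no longer starts with it.
     */
    static List<String> matchingTerms(SortedSet<String> terms, String regex, String prefix) {
        Pattern pattern = Pattern.compile(regex);
        SortedSet<String> candidates = (prefix == null) ? terms : terms.tailSet(prefix);
        List<String> hits = new ArrayList<>();
        for (String term : candidates) {
            if (prefix != null && !term.startsWith(prefix)) {
                break; // terms sharing a prefix are contiguous in sorted order
            }
            if (pattern.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>();
        terms.add("apple");
        terms.add("term1");
        terms.add("term2");
        terms.add("term3");
        terms.add("zebra");
        // Full scan: all five terms are tested.
        System.out.println(matchingTerms(terms, "term1|term2", null));
        // Prefix-assisted scan: only the "term..." range is tested.
        System.out.println(matchingTerms(terms, "term1|term2", "term"));
    }
}
```

Both calls return the same hits; the only difference is how many terms get run through the matcher, which is why the cost depends on the number of distinct terms in the field rather than on the number of documents.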

I want to use this as a multi-phrase query...a spannear with a term that could be the regex "term1|term2"

What about nesting a SpanOrQuery for those two terms inside a SpanNearQuery?

I need this. Pipe dream for speed on a huge index?

Feel free to implement a robust prefix method :) It's much more difficult than I wanted to tackle when I created this infrastructure. Thankfully, Jakarta Regexp implements it, so you could use it for the prefix computation and pair it with a different matcher implementation if you like.

    Erik



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Thanks for the info Erik. I did not realize that WildcardQuery and FuzzyQuery did this as well. A lot of my concern was that I needed to implement WildcardQuery as a SpanRegexQuery so that I could get nested wildcard searches in my proximity searches. If it's the same speed as WildcardQuery I am not worried. However, it seems like it could be even faster:

I only need to support * and ? as wildcard does. I don't want to include Jakarta regex with my distro. I made a new Regex implementation based on the Java 5 util stuff that only allows * and ?.

I pass the pattern string into a short method that:
    * Removes single backslashes, halves double backslashes, escapes
    * non-alphanumeric characters, and records the prefix. Ignores * and ?.

Then I replace * with .* and ? with .{1}.

Only supporting * and ? seems to make grabbing the prefix nice and simple.
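The conversion described above might look roughly like the following Java sketch. It is not the code from this thread: the class name `WildcardSketch` and its members are hypothetical, a backslash is assumed to make the next character literal, `?` is mapped to a single `.`, and characters outside the Basic Multilingual Plane are not handled. The prefix is simply every literal character seen before the first unescaped `*` or `?`.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of a "* and ? only" wildcard translator; not Lucene code.
final class WildcardSketch {
    final Pattern pattern; // compiled regex equivalent of the wildcard
    final String prefix;   // literal text before the first wildcard

    private WildcardSketch(Pattern pattern, String prefix) {
        this.pattern = pattern;
        this.prefix = prefix;
    }

    static WildcardSketch compile(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder prefix = new StringBuilder();
        boolean inPrefix = true;
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            if (c == '*') {
                regex.append(".*");   // any run of characters
                inPrefix = false;
            } else if (c == '?') {
                regex.append('.');    // exactly one character
                inPrefix = false;
            } else {
                if (c == '\\' && i + 1 < wildcard.length()) {
                    c = wildcard.charAt(++i); // assumption: backslash escapes the next char
                }
                if (!Character.isLetterOrDigit(c)) {
                    regex.append('\\'); // escape anything that could be a regex metacharacter
                }
                regex.append(c);
                if (inPrefix) {
                    prefix.append(c);
                }
            }
        }
        // DOTALL so that '.' matches every character, including line terminators.
        return new WildcardSketch(Pattern.compile(regex.toString(), Pattern.DOTALL),
                                  prefix.toString());
    }

    boolean matches(String term) {
        return pattern.matcher(term).matches();
    }

    public static void main(String[] args) {
        WildcardSketch w = WildcardSketch.compile("te?m*");
        System.out.println(w.prefix + " " + w.pattern.pattern() + " " + w.matches("term1"));
    }
}
```

Because no other regex syntax is accepted, the prefix never has to account for alternation, groups, or quantifiers, which is what keeps its computation to a single left-to-right pass.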

Now my question: should I use this instead of WildcardQuery even when not in a span search? Sounds like it would be more efficient.

Also, how does a SpanOrQuery work? Is the resulting span anchored at the start of the word and the length of the word, like a term span? So that it's an "or" term span? If there is more than one match, does the span cover all of them, or is each match its own span the size of that hit?

Thanks,

Mark
