Re: SpanRegex speed

Mark Miller Fri, 01 Sep 2006 07:29:19 -0700

Erick Erickson wrote:

OK, a not very helpful answer, but "of course they're slower, they do
more work" (the span versions). But that's fairly useless, since the
question is really "is it enough slower in my situation that I need to
find an alternative?". And the only way I know of to answer that
question is to make some tests with the data representing my particular
problem......


Sorry I can't be more help....
Erick

On 9/1/06, Mark Miller <[EMAIL PROTECTED]> wrote:


Erick Erickson wrote:
> Let me chime in here on a different note.... before you get happy with
> wildcard queries, take a look at the thread "I just don't get
> wildcards at all". There is lots of good info that Erik, Chris and
> Otis provided me.
>
> The danger with prefixquery and wildcard query is that they will throw
> TooManyClauses exceptions when you start matching a number of terms
> (the default is 1024, although you can make this much bigger if memory
> allows).
> If you're aware of this and it is and will be OK in your app, ignore
> this.
> But if your index is going to grow significantly, this is a real
> problem. I went with implementing filters with WildcardTermEnum (you
> could also use RegexTermEnum) for the wildcard portions of my query.
> This has interesting implications for spans: we elected to say spans
> didn't work with wildcards.
>
> Anyway, as I said, if you're aware of the TooManyClauses issue and are
> sure
> it doesn't matter, ignore me. After all, everybody else does <G>.....
>
>
> Best
> Erick
>
>
>
> On 8/30/06, Mark Miller <[EMAIL PROTECTED]> wrote:
>>
>> Ignore that last question. I see that you said prefix wildcard query
>> and not wildcard query. A quick look at the code seems to show it
>> grabbing a prefix as well.
>>
>> Do you think one would be any faster than the other? Should I use
>> Wildcardqueries outside of spanqueries and the regexquery inside
>> spanqueries or use regex both places?
>>
>> - Mark
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>>
>>
>
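The trade-off described above can be sketched in plain Java with no Lucene dependency. This is only an illustration under stated assumptions: the class and method names (WildcardExpansionSketch, expandToClauses, filterBits) are hypothetical and not Lucene API, the in-memory map stands in for an index's terms and their document ids, and the 1024 constant simply mirrors the default mentioned in the message. The first method builds one clause per matching term and fails past the limit; the second ORs matching documents into a bit set, which is roughly the idea behind the filter approach.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

class WildcardExpansionSketch {
    // Mirrors the default limit of 1024 mentioned above (illustrative constant only).
    static final int MAX_CLAUSES = 1024;

    // Translate a simple wildcard pattern ('*' = any run, '?' = one char) to a regex.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') sb.append(".*");
            else if (c == '?') sb.append('.');
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString());
    }

    // Query-style rewrite: one clause per matching term; fails once the limit is exceeded.
    static List<String> expandToClauses(String wildcard, Iterable<String> terms, int maxClauses) {
        Pattern p = toRegex(wildcard);
        List<String> clauses = new ArrayList<>();
        for (String term : terms) {
            if (!p.matcher(term).matches()) continue;
            if (clauses.size() >= maxClauses) {
                throw new IllegalStateException("too many clauses (limit " + maxClauses + ")");
            }
            clauses.add(term);
        }
        return clauses;
    }

    // Filter-style: OR the doc ids of every matching term into one bit set.
    // No clause list is built, so the number of matching terms is not capped.
    static BitSet filterBits(String wildcard, Map<String, int[]> postings, int maxDoc) {
        Pattern p = toRegex(wildcard);
        BitSet bits = new BitSet(maxDoc);
        for (Map.Entry<String, int[]> e : postings.entrySet()) {
            if (p.matcher(e.getKey()).matches()) {
                for (int doc : e.getValue()) bits.set(doc);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        // Toy "index": 2000 distinct terms spread over 50 documents.
        Map<String, int[]> postings = new TreeMap<>();
        for (int i = 0; i < 2000; i++) {
            postings.put("mark" + i, new int[] { i % 50 });
        }
        System.out.println("docs matched by filter: "
                + filterBits("mark*", postings, 50).cardinality());
        try {
            expandToClauses("mark*", postings.keySet(), MAX_CLAUSES);
        } catch (IllegalStateException ex) {
            System.out.println("query expansion failed: " + ex.getMessage());
        }
    }
}
```

The bit set only records which documents match, not where in the document the terms occur, which is one way to see why a filter of this kind does not combine with span (position-based) matching.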
Thanks a lot for the info Erick. Good stuff to know for sure.
I guess the real question I have been trying to spit out is this:
Is a span version of any of these searches--fuzzy, wildcard,
etc--inherently slower than their non-span brothers? If they have the
same limitations and speeds then that is all I am looking for.

P.S.
I realize I have been screwing up the threading by replying when
starting a new topic. I have been alerted and will stop this pernicious
activity.

- Mark


Thanks Erick. You're always more than helpful. The reason I only care
that they are as good as they can be is that I am looking for a general
solution and not one tailored to a particular dataset. This is for a
general query parser. I want to be able to search for wildcards,
fuzzies, etc. in a proximity search: mark*off NEAR Bork?on. This may
just be a slow query in general, but other search engines appear to
offer this, and they must face similar limitations. So if a fuzzy search
is slow in a proximity search just because it is slow...I don't mind. If
it is slow because Lucene implements spans in a way that makes wildcards
and fuzzies particularly slow in them...that's what I would like to
know. And if that is the case...someone should make a fuzzy and wildcard
that is fast in a span :)


- Mark
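
For a sense of what a query like mark*off NEAR Bork?on involves, here is a rough plain-Java sketch that does not use Lucene. Everything in it is an assumption for illustration: the names (WildcardNearSketch, positions, near) are hypothetical, the map holds term positions for a single document, terms are assumed to be lowercased at index time, and "slop" here is just the maximum distance between two positions, which is not necessarily how Lucene defines it. It shows the two pieces of work a wildcard-in-proximity search has to do: expand each wildcard to its matching terms, then compare positions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class WildcardNearSketch {
    // Translate a simple wildcard pattern ('*' = any run, '?' = one char) to a regex.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') sb.append(".*");
            else if (c == '?') sb.append('.');
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString());
    }

    // Step 1: expand the wildcard and collect the sorted positions of every matching term.
    static List<Integer> positions(String wildcard, Map<String, int[]> termPositions) {
        Pattern p = toRegex(wildcard);
        List<Integer> out = new ArrayList<>();
        for (Map.Entry<String, int[]> e : termPositions.entrySet()) {
            if (p.matcher(e.getKey()).matches()) {
                for (int pos : e.getValue()) out.add(pos);
            }
        }
        Collections.sort(out);
        return out;
    }

    // Step 2: two-pointer scan for any pair of positions at most `slop` apart (order ignored).
    static boolean near(String wildcardA, String wildcardB, int slop,
                        Map<String, int[]> termPositions) {
        List<Integer> a = positions(wildcardA, termPositions);
        List<Integer> b = positions(wildcardB, termPositions);
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int diff = a.get(i) - b.get(j);
            if (Math.abs(diff) <= slop) return true;
            if (diff < 0) i++; else j++;   // advance whichever side is behind
        }
        return false;
    }

    public static void main(String[] args) {
        // Toy document: term -> positions at which it occurs.
        Map<String, int[]> doc = new HashMap<>();
        doc.put("markoff", new int[] { 3 });
        doc.put("marksoff", new int[] { 40 });
        doc.put("borkton", new int[] { 5 });
        doc.put("other", new int[] { 4 });
        System.out.println(near("mark*off", "bork?on", 3, doc));
        System.out.println(near("mark*off", "bork?on", 1, doc));
    }
}
```

In this sketch the non-span equivalent would stop after step 1 and only need document ids, so the extra cost of the proximity version is the position bookkeeping in step 2. Whether that difference matters in a real index is something only measurement on real data can answer.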

