Erick Erickson wrote:
OK, a not very helpful answer, but "of course they're slower, they do more
work" (the span versions). But that's fairly useless, since the question is
really "is it enough slower in my situation that I need to find an
alternative?". And the only way I know of to answer that question is to run
some tests with data representing my particular problem...
Sorry I can't be more help...
Erick
On 9/1/06, Mark Miller <[EMAIL PROTECTED]> wrote:
Erick Erickson wrote:
> Let me chime in here on a different note.... before you get happy with
> wildcard queries, take a look at the thread "I just don't get wildcards at
> all". There is lots of good info that Erik, Chris and Otis provided me.
>
> The danger with PrefixQuery and WildcardQuery is that they will throw
> TooManyClauses exceptions when you start matching a large number of terms
> (the default limit is 1024, although you can make this much bigger if
> memory allows). If you're aware of this and it is and will be OK in your
> app, ignore this. But if your index is going to grow significantly, this
> is a real problem. I went with implementing filters with WildcardTermEnum
> (you could also use RegexTermEnum) for the wildcard portions of my query.
> This has interesting implications for spans; we elected to say spans
> didn't work with wildcards.
>
> Anyway, as I said, if you're aware of the TooManyClauses issue and are sure
> it doesn't matter, ignore me. After all, everybody else does <G>.....
>
>
> Best
> Erick
>
>
>
> On 8/30/06, Mark Miller <[EMAIL PROTECTED]> wrote:
>>
>> Ignore that last question. I see that you said prefix wildcard query and
>> not wildcard query. A quick look at the code seems to show it grabbing a
>> prefix as well.
>>
>> Do you think one would be any faster than the other? Should I use
>> WildcardQuery outside of span queries and RegexQuery inside span
>> queries, or use regex in both places?
>>
>> - Mark
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>>
>>
>
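The TooManyClauses problem in the quoted message, and the filter workaround, can be illustrated with a toy model. This is only a sketch under stated assumptions: none of the classes below are Lucene's, the "index" is a plain in-memory map from term to document ids, and the names WildcardExpansionSketch, expandToClauses and expandToFilter are hypothetical. In Lucene itself the limit is raised with BooleanQuery.setMaxClauseCount.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Toy model only: this is not Lucene code. It mimics the behaviour described
// above using an in-memory map of term -> document ids.
class WildcardExpansionSketch {

    // Translate a wildcard pattern ('*' = any run of chars, '?' = one char) to a regex.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') sb.append(".*");
            else if (c == '?') sb.append('.');
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString());
    }

    // Query-rewrite style: one clause per matching term, so the clause count
    // grows with the number of distinct terms and eventually hits the limit.
    static List<String> expandToClauses(Map<String, int[]> index, String wildcard, int maxClauses) {
        Pattern p = toRegex(wildcard);
        List<String> clauses = new ArrayList<>();
        for (String term : index.keySet()) {
            if (p.matcher(term).matches()) {
                if (clauses.size() >= maxClauses) {
                    throw new IllegalStateException("too many clauses: limit is " + maxClauses);
                }
                clauses.add(term);
            }
        }
        return clauses;
    }

    // Filter style: walk the same terms but only set one bit per matching
    // document, so memory depends on the document count, not the term count.
    static BitSet expandToFilter(Map<String, int[]> index, String wildcard) {
        Pattern p = toRegex(wildcard);
        BitSet docs = new BitSet();
        for (Map.Entry<String, int[]> e : index.entrySet()) {
            if (p.matcher(e.getKey()).matches()) {
                for (int doc : e.getValue()) docs.set(doc);
            }
        }
        return docs;
    }
}
```

The filter variant never fails on a broad pattern, but it also throws away the per-term information a span query would need, which fits the decision described above to say spans didn't work with wildcards.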
Thanks a lot for the info, Erick. Good stuff to know for sure.
I guess the real question I have been trying to spit out is this:
Is a span version of any of these searches (fuzzy, wildcard, etc.)
inherently slower than its non-span brother? If they have the
same limitations and speeds then that is all I am looking for.
P.S.
I realize I have been screwing up the threading by replying when
starting a new topic. I have been alerted and will stop this pernicious
activity.
- Mark
Thanks, Erick. You're always more than helpful. The reason I only care that
they are as good as they can be is that I am looking for a general
solution and not one tailored to a particular dataset. This is for a
general query parser. I want to be able to search for wildcards, fuzzies,
etc. in a proximity search, e.g. mark*off NEAR Bork?on. This may just be a
slow query in general, but other search engines appear to offer this, and
they must face similar limitations. So if a fuzzy search is slow in a
proximity search just because it is slow... I don't mind. If it is slow
because Lucene implements spans in a way that makes wildcards and fuzzies
particularly slow in them... that's what I would like to know. And if that
is the case... someone should make a fuzzy and wildcard that is fast in a
span :)
- Mark
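
Erick's point that the span versions "do more work" can be illustrated with a toy model of a query like mark*off NEAR bork?on against a single document. This is only a sketch and makes no claim about how Lucene actually implements spans. The class WildcardNearSketch, its methods, and the term-to-positions map are all hypothetical, and terms are assumed to be lowercased.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Toy model only: this is not Lucene code. One document is represented as a
// map of term -> positions at which that term occurs.
class WildcardNearSketch {

    // Translate a wildcard pattern ('*' = any run of chars, '?' = one char) to a regex.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') sb.append(".*");
            else if (c == '?') sb.append('.');
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString());
    }

    // Collect the sorted positions of every term in the document that matches the pattern.
    static List<Integer> positions(Map<String, int[]> docTerms, String wildcard) {
        Pattern p = toRegex(wildcard);
        List<Integer> out = new ArrayList<>();
        for (Map.Entry<String, int[]> e : docTerms.entrySet()) {
            if (p.matcher(e.getKey()).matches()) {
                for (int pos : e.getValue()) out.add(pos);
            }
        }
        Collections.sort(out);
        return out;
    }

    // Non-span version: it is enough to know that each pattern matches somewhere.
    static boolean bothPresent(Map<String, int[]> docTerms, String a, String b) {
        return !positions(docTerms, a).isEmpty() && !positions(docTerms, b).isEmpty();
    }

    // Proximity version: the positions must also be compared. A two-pointer
    // walk over the sorted lists finds any pair within maxDistance.
    static boolean near(Map<String, int[]> docTerms, String a, String b, int maxDistance) {
        List<Integer> pa = positions(docTerms, a);
        List<Integer> pb = positions(docTerms, b);
        int i = 0, j = 0;
        while (i < pa.size() && j < pb.size()) {
            int d = pa.get(i) - pb.get(j);
            if (Math.abs(d) <= maxDistance) return true;
            if (d < 0) i++; else j++;  // advance whichever side is behind
        }
        return false;
    }
}
```

In this model the term expansion costs the same in both versions. The proximity check adds the position comparison on top, which is the "more work" in question. Whether that extra cost matters for a given index is still something only a test can answer.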