You are right, Paul: 0 would not work; it should probably be something less
than zero, as Paul suggested. Give it a try and tell us if it worked ; )
On Sun, Nov 22, 2009 at 9:50 AM, Paul Elschot wrote:
> On Sunday, 22 November 2009 at 04:47:50, Adriano Crestani wrote:
> > Hi,
> >
> > I didn't test, but you might want to try SpanNearQuery and set slop to zero.
How are you invoking the spell checker?
On Nov 19, 2009, at 1:22 AM, m.harig wrote:
>
> hello all
>
> I have a doubt about the spell checker. When I search for the keyword
> "hoem", I get the spell suggestions in the following order (I'm
> retrieving 4 suggested words):
>
> form
> hol
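For reference, a minimal sketch of invoking the contrib SpellChecker
(assuming Lucene 2.9; the directory paths and the "contents" field name are
hypothetical):

  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.search.spell.LuceneDictionary;
  import org.apache.lucene.search.spell.SpellChecker;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.FSDirectory;

  // Build (or reuse) a spell index from the terms of the "contents" field.
  Directory spellDir = FSDirectory.open(new java.io.File("/tmp/spellindex"));
  SpellChecker spellChecker = new SpellChecker(spellDir);
  IndexReader reader = IndexReader.open(FSDirectory.open(new java.io.File("/tmp/index")));
  spellChecker.indexDictionary(new LuceneDictionary(reader, "contents"));

  // Ask for the 4 closest suggestions, as in the question above.
  String[] suggestions = spellChecker.suggestSimilar("hoem", 4);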
To call clear(), you can always downcast to AttributeImpl. But be aware that
it may also clear other attributes (for example, if the instance is actually
a Token). So setting termLength to 0 is the fastest approach if you only need
the term attribute.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
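A quick sketch of the two options (assuming Lucene 2.9; termAtt stands for
the stream's TermAttribute):

  // Option 1: downcast the interface to AttributeImpl and clear everything.
  // Careful: if the impl is actually a Token, this clears the other
  // attributes (offset, type, payload, ...) too.
  ((AttributeImpl) termAtt).clear();

  // Option 2: only reset the term, leaving other attributes untouched.
  termAtt.setTermLength(0);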
Ok I see you fixed it at the same time I sent the email :).
I think I get it ... so far.
So far I had to cache just TermAttribute. I think it'll get messy when I
need to cache more, like Type and PositionIncrement, but I haven't reached
those yet. Perhaps instead of creating many types of clon
I assume termAtt is the input's TermAttribute, right? Therefore it has no
copyTo ...
What I've done so far is create a TermAttribute like you proposed (fixed
from my previous TermAttributeImpl):
TermAttribute clone = (TermAttribute)
input.getAttributeFactory().createAttributeInstance(TermAttribute.class);
Sorry small error:
Class Initializer:
private final AttributeSource lastState = cloneAttributes();
private final TermAttribute lastTermAtt =
lastState.addAttribute(TermAttribute.class);
incrementToken:
if (input.incrementToken()) {
if (lastTermAtt.checkSomethingAsYouProposed) {
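Filling in the truncated snippet above, a minimal sketch of the whole
pattern (assuming Lucene 2.9; isAbbreviation is a hypothetical helper
standing in for "checkSomethingAsYouProposed"):

  private final AttributeSource lastState = cloneAttributes();
  private final TermAttribute lastTermAtt = lastState.addAttribute(TermAttribute.class);
  private final TermAttribute termAtt = addAttribute(TermAttribute.class);

  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    if (isAbbreviation(lastTermAtt)) {   // hypothetical check on the previous token
      // e.g. drop a "." token that follows an abbreviation
    }
    // Remember the current term for the next call; setTermBuffer reuses
    // the clone's char[] when it is large enough.
    lastTermAtt.setTermBuffer(termAtt.termBuffer(), 0, termAtt.termLength());
    return true;
  }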
The cast to TermAttributeImpl may not work if the factory creates a Token...
So declare termBuf as TermAttribute (without impl).
To clear, you can always downcast the interface to AttributeImpl. Or create
a second variable. Alternatively use my second approach.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
Did you mean something like:
TermAttributeImpl termBuf = (TermAttributeImpl)
input.getAttributeFactory().createAttributeInstance(TermAttribute.class);
I need to use the methods on TermAttributeImpl like clear() ...
Shai
On Sun, Nov 22, 2009 at 9:03 PM, Uwe Schindler wrote:
> I said, you *could* if it would be exposed.
Another idea: you can also create an AttributeSource instance in your
TokenStream one time, using the AttributeSource.cloneAttributes() call. You
can use this copy of the attributes in parallel, maybe update the
TermAttribute there, and so on. If you want to look at the last token, jus
I said you *could*, if it were exposed. But State is a holder class without
functionality. Because the internals are impl-dependent, maybe we will add
such a thing in the future. But if the state contained a real map, it would
be slow, because each captureState call would need to fill the map,
wh
Yes I can clone the term itself by instantiating a TermAttributeImpl, which
is better than storing the String, because the latter always allocates
char[], while the former will reuse the char[] if it's big enough.
What if State included a HashMap of all attributes, in addition to its
"linked-list"
Hmmm, could you show us what you do in your collector? Because
one of the gotchas about a collector is loading the documents in
the inner loop. Quick test: comment out whatever you're doing in
the underlying collector loop, and see if there's *any* noticeable
difference in speed. That'll tell you w
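To illustrate the gotcha, a minimal sketch of a collector that keeps the
inner loop cheap (assuming Lucene 2.9's Collector API); per-hit calls like
reader.document(doc) belong outside this loop:

  import java.io.IOException;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.search.Collector;
  import org.apache.lucene.search.Scorer;

  public class CountingCollector extends Collector {
    private int count;

    public void setScorer(Scorer scorer) {}   // scores not needed here
    public void setNextReader(IndexReader reader, int docBase) {}

    public void collect(int doc) {
      count++;   // no document loading, no stored-field access
    }

    public boolean acceptsDocsOutOfOrder() {
      return true;
    }

    public int getCount() {
      return count;
    }
  }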
Hi Jake,
Many thanks for your quick reply.
I shall check these out.
Thanks!
Peter
> Date: Sun, 22 Nov 2009 09:20:24 -0800
> Subject: Re: Top field count scoring across documents
> From: jake.man...@gmail.com
> To: java-user@lucene.apache.org
>
> Peter,
>
> You want to do a facet qu
On Sunday, 22 November 2009 at 17:23:53, Eran Sevi wrote:
> Thanks for the tips.
>
> I'm still using version 2.4, so I can't use MultiTermQueryWrapperFilter, but
> I'll definitely try to re-group the terms that are not changing in order
> to cache them.
> How can I join several such filters together?
Peter,
You want to do a facet query. This kind of functionality is not in
Lucene-core (sadly), but both Solr (the fully featured search application
built on Lucene) and bobo-browse (just a library, like Lucene itself) are
open-source and work with Lucene to provide faceting capabilities for yo
Hello Lucene Experts,
I wonder if someone might be able to shed some insight on this interesting
scoring question:
The problem:
Build a search query that will return [ordered] hits by the top number of
occurrences of field values across matched documents (or as close to this as
possible).
Th
I think it shouldn't take 5X as long, since the number of results is only
about 2X larger (and much smaller than the number of terms in the filter),
but maybe I'm wrong here since I'm not familiar with the filter internals.
Unfortunately, the time to construct the filter is mere millisec
Hmmm, I'm not very clear here. Are you saying that you effectively
form 10-50K filters and OR them all together? That would be
consistent with the 50K case taking approx. 5X as long as the 10K
case.
Do you know where in your code the time is being spent? That'd
be a big help in suggesting alter
Thanks for the tips.
I'm still using version 2.4, so I can't use MultiTermQueryWrapperFilter, but
I'll definitely try to re-group the terms that are not changing in order
to cache them.
How can I join several such filters together?
Using FieldCacheTermsFilter sounds promising. Fortunately it is
Maybe this helps you, but read the docs, it will work only with
single-value-fields:
http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/search/FieldCacheTermsFilter.html
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> --
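A minimal usage sketch (assuming Lucene 2.9; the "id" field and terms are
hypothetical, and searcher/query stand for the ones already in use):

  import org.apache.lucene.search.FieldCacheTermsFilter;
  import org.apache.lucene.search.Filter;
  import org.apache.lucene.search.TopDocs;

  // Matches documents whose single-valued "id" field is one of the terms.
  // The first use per reader pays the FieldCache population cost.
  Filter filter = new FieldCacheTermsFilter("id", new String[] {"17", "42", "93"});
  TopDocs hits = searcher.search(query, filter, 10);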
Try a MultiTermQueryWrapperFilter instead of the QueryFilter.
I'd expect a modest gain in performance.
In case it is possible to form a few groups of terms that are reused,
it could even be more efficient to also use a CachingWrapperFilter
for each of these groups.
Regards,
Paul Elschot
On Sunday,
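A sketch of the grouping idea (assuming Lucene 2.4's QueryWrapperFilter and
CachingWrapperFilter; stableIds is a hypothetical stable subset of the
terms):

  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.BooleanClause;
  import org.apache.lucene.search.BooleanQuery;
  import org.apache.lucene.search.CachingWrapperFilter;
  import org.apache.lucene.search.Filter;
  import org.apache.lucene.search.QueryWrapperFilter;
  import org.apache.lucene.search.TermQuery;

  // One reusable group of terms, wrapped once and cached across searches.
  // Note BooleanQuery's clause limit: BooleanQuery.setMaxClauseCount(...)
  // must be raised for 10K-50K clauses.
  BooleanQuery group = new BooleanQuery();
  for (String id : stableIds) {
    group.add(new TermQuery(new Term("id", id)), BooleanClause.Occur.SHOULD);
  }
  Filter cachedGroup = new CachingWrapperFilter(new QueryWrapperFilter(group));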
Hi,
I have a need to filter my queries using a rather large subset of terms (can
be 10K or even 50K).
All these terms are sure to exist in the index, so the number of results can
be about the same as the number of terms in the filter.
The terms are numbers but are not subsequent and are from a large set o
> Because that'd mean I'll check for abbreviations for every token, which is
> a big performance loss. That way, I can just check the abbreviations list
> if I encountered a "." (not even all end-of-sentence tokens).
OK, then simply copy the term to a String and store it. The cost is the same
as cloning/copying
Because that'd mean I'll check for abbreviations for every token, which is a
big performance loss. That way, I can just check the abbreviations list if I
encountered a "." (not even all end-of-sentence tokens).
Why can't State offer a "getAttribute" like AttributeSource?
Shai
On Sun, Nov 22, 2009 at 4:34 PM, Uwe
If you just want to look up whether "Mr" is an abbreviation, why not look it
up when you handle that token and set a boolean variable in the TokenStream
(lastTokenWasAbbreviation)? When you process the ".", remove it if the
boolean is set.
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
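A minimal sketch of that boolean-flag approach (assuming Lucene 2.9; the
ABBREVIATIONS set here is hypothetical):

  import java.io.IOException;
  import java.util.Arrays;
  import java.util.HashSet;
  import java.util.Set;
  import org.apache.lucene.analysis.TokenFilter;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.tokenattributes.TermAttribute;

  public final class AbbreviationFilter extends TokenFilter {
    private static final Set<String> ABBREVIATIONS =
        new HashSet<String>(Arrays.asList("Mr", "Mrs", "Dr"));  // hypothetical list

    private final TermAttribute termAtt = addAttribute(TermAttribute.class);
    private boolean lastTokenWasAbbreviation = false;

    public AbbreviationFilter(TokenStream input) {
      super(input);
    }

    public boolean incrementToken() throws IOException {
      while (input.incrementToken()) {
        String term = termAtt.term();
        if (lastTokenWasAbbreviation && ".".equals(term)) {
          lastTokenWasAbbreviation = false;
          continue;  // drop the "." that follows an abbreviation such as "Mr"
        }
        lastTokenWasAbbreviation = ABBREVIATIONS.contains(term);
        return true;
      }
      return false;
    }
  }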
What I've done is:
State state = in.captureState();
...
// Upon new call to incrementToken().
State tmp = in.captureState();
in.restoreState(state);
// Check whether termAttribute is an abbreviation;
// if not: in.restoreState(tmp);
But that seems like a lot of capturing/restoring to me ... how expensive is it?
Perhaps I misunderstand something. The current use case I'm trying to solve
is this: I have an abbreviations TokenFilter which reads a token and stores it.
If the next token is end-of-sentence, it checks whether the previous one is
in the abbreviations list, and discards the end-of-sentence token. I ne
Use captureState and save the state somewhere. You can later restore it into
the TokenStream with restoreState; CachingTokenFilter does this.
So the new API uses the State object to put away tokens for later reference.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
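In a filter, the round trip looks roughly like this (a sketch, assuming
Lucene 2.9):

  // Snapshot every attribute of the current token ...
  AttributeSource.State saved = input.captureState();
  // ... and later write those values back into the stream's attributes.
  input.restoreState(saved);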
OK, so from what I understand, I should stop working w/ Token and move to
working w/ the Attributes.
addAttribute indeed does not work. Even though it does not throw an
exception, if I call in.addAttribute(Token.class), I get a new instance of
Token and not the one that was added by in. So this
> But I do use addAttribute(Token.class), so I don't understand why you say
> it's not possible. And I completely don't understand why the new API
> allows
> me to just work w/ interfaces and not impls ... A while ago I got the
> impression that we're trying to get rid of interfaces because they're
On Sunday, 22 November 2009 at 04:47:50, Adriano Crestani wrote:
> Hi,
>
> I didn't test, but you might want to try SpanNearQuery and set slop to zero.
> Give it a try and let me know if it worked.
The slop is the number of positions "in between", so zero would still be too
much to only match at the
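A sketch of what the thread is converging on (assuming Lucene 2.9; the field
and terms are hypothetical). The slop counts positions in between, so 0
means adjacent, and matching at the very same position would need a slop
below zero:

  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.spans.SpanNearQuery;
  import org.apache.lucene.search.spans.SpanQuery;
  import org.apache.lucene.search.spans.SpanTermQuery;

  // slop = 0, inOrder = true: "quick" immediately followed by "fox".
  SpanQuery q = new SpanNearQuery(
      new SpanQuery[] {
        new SpanTermQuery(new Term("body", "quick")),
        new SpanTermQuery(new Term("body", "fox"))
      },
      0, true);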
But I do use addAttribute(Token.class), so I don't understand why you say
it's not possible. And I completely don't understand why the new API allows
me to just work w/ interfaces and not impls ... A while ago I got the
impression that we're trying to get rid of interfaces because they're not
easy
>
> I want to add Token.class, and then work w/ Token. Not TermAttribute,
> PosIncrAttribute, OffsetAttribute, PayloadAttribute and TypeAttribute
> (these
> are the five attributes I'm using from Token). Why can't the code add
> Token
> to the attributes map? If all of these are anyway mapped to t
> I started to migrate my Analyzers, Tokenizer, TokenStreams and
> TokenFilters
> to the new API. Since the entire set of classes handled Token before, I
> decided to not change it for now, and was happy to discover that Token
> extends AttributeImpl, which makes the migration easier.
>
> So I sta
Thanks Uwe for the response; however, that doesn't get me anywhere. I already
know that Token is added once, and that after I add Token I cannot add more
of them. And I understand why the double printing happens.
I want to add Token.class, and then work w/ Token. Not TermAttribute,
PosIncrAttribute, Offset
> To add to my previous email, If I do the following:
>
> StringReader sr = new StringReader("hello world");
> TokenStream ts = new WhitespaceTokenizer(Token.TOKEN_ATTRIBUTE_FACTORY,
> sr);
>
> for (Iterator<Class<? extends Attribute>> iter =
> ts.getAttributeClassesIterator(); iter.hasNext();) {
> Class<? extends Attri
To add to my previous email, If I do the following:
StringReader sr = new StringReader("hello world");
TokenStream ts = new WhitespaceTokenizer(Token.TOKEN_ATTRIBUTE_FACTORY, sr);
for (Iterator<Class<? extends Attribute>> iter =
ts.getAttributeClassesIterator(); iter.hasNext();) {
Class<? extends Attribute> type = iter.
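Completing the truncated loop, a sketch (assuming Lucene 2.9). With
Token.TOKEN_ATTRIBUTE_FACTORY every attribute interface is backed by the
same Token instance, which is presumably the double printing mentioned
later in the thread:

  import java.io.StringReader;
  import java.util.Iterator;
  import org.apache.lucene.analysis.Token;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.WhitespaceTokenizer;
  import org.apache.lucene.util.Attribute;

  StringReader sr = new StringReader("hello world");
  TokenStream ts = new WhitespaceTokenizer(Token.TOKEN_ATTRIBUTE_FACTORY, sr);
  for (Iterator<Class<? extends Attribute>> iter = ts.getAttributeClassesIterator();
       iter.hasNext();) {
    Class<? extends Attribute> type = iter.next();
    // Every interface resolves to the same Token impl here.
    System.out.println(type.getName() + " -> " + ts.getAttribute(type));
  }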
Hi
I started to migrate my Analyzers, Tokenizer, TokenStreams and TokenFilters
to the new API. Since the entire set of classes handled Token before, I
decided to not change it for now, and was happy to discover that Token
extends AttributeImpl, which makes the migration easier.
So I started w/ my