Hey folks.
I ran into a problem on Mac OS X that doesn't occur on Linux (CentOS or Ubuntu):
No signature of method: static
org.apache.lucene.store.FSDirectory.open() is applicable for argument
types: (java.io.File) values: [/home/kplummer/Development/tmp]
I'm actually running this through Groovy,
Thanks Eric and Ian!
Yes, time stamp is one of our sort fields. By splitting it into
year/month/day/... it'll reduce the memory usage dramatically. But I don't
know if we can specify the significance of the sort fields, like year first,
followed by month, day ... etc.
Another thing is about un
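On the significance question: a multi-field sort is normally evaluated in the order the fields are listed, so listing year first, then month, then day makes the year the most significant key. Lucene's Sort accepts several SortFields for this purpose. The sketch below only illustrates that ordering logic in plain Java; DateParts and BY_YEAR_MONTH_DAY are hypothetical names for illustration, not Lucene API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical holder for a timestamp that was split into separate sort keys.
final class DateParts {
    final int year;
    final int month;
    final int day;

    DateParts(int year, int month, int day) {
        this.year = year;
        this.month = month;
        this.day = day;
    }

    // Most significant key first; each later key only breaks ties in the earlier ones.
    static final Comparator<DateParts> BY_YEAR_MONTH_DAY =
            Comparator.comparingInt((DateParts d) -> d.year)
                      .thenComparingInt(d -> d.month)
                      .thenComparingInt(d -> d.day);

    // Returns a sorted copy and leaves the input list untouched.
    static List<DateParts> sorted(List<DateParts> input) {
        List<DateParts> copy = new ArrayList<>(input);
        copy.sort(BY_YEAR_MONTH_DAY);
        return copy;
    }

    @Override
    public String toString() {
        return String.format("%04d-%02d-%02d", year, month, day);
    }
}
```

Sorting 2010-06-09, 2009-12-31 and 2010-01-15 with this comparator yields 2009-12-31, 2010-01-15, 2010-06-09, the same order a single combined timestamp would give.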
Thanks Eric for the detailed explanation. Now I understand what Ian means.
-Fujian
--
View this message in context:
http://lucene.472066.n3.nabble.com/sort-field-should-not-be-tokenized-tp882569p884107.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
Consider analyzing on whitespace, without
removing stopwords for the input "the fox is in
his den". You'd have the terms:
the
fox
is
in
his
den
What does it mean to sort on this field? Which term
should be used?
What if you remove stopwords? What about casing?
Or any of a myriad of other possible
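To make the ambiguity concrete, here is a rough plain-Java sketch that splits the same input on whitespace, standing in for a whitespace analyzer. TokenizedSortDemo and its methods are hypothetical; which term Lucene actually ends up using for a tokenized sort field is not shown here, so treat the candidates below purely as illustrations of the possible choices.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo: several equally plausible "sort values" for one tokenized field.
final class TokenizedSortDemo {

    // Stand-in for whitespace analysis with no stopword removal and no lowercasing.
    static List<String> whitespaceTerms(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Candidate 1: the first term in document order.
    static String firstTerm(List<String> terms) {
        return terms.get(0);
    }

    // Candidate 2: the lexicographically smallest term.
    static String smallestTerm(List<String> terms) {
        return Collections.min(terms);
    }

    // Candidate 3: the lexicographically largest term.
    static String largestTerm(List<String> terms) {
        return Collections.max(terms);
    }
}
```

For "the fox is in his den" the first term is "the" and the smallest term is "den", so two documents can compare differently depending on which candidate is picked. An untokenized field has exactly one value per document and avoids the question.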
In addition to Ian's comment, an important question is what
kind of values you're sorting on. It sounds like a time stamp,
because most languages only have a (relatively) small number
of terms.
It's not the total terms in the field, it's the total *unique* terms
in the field. So even with a very l
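A small sketch of the counting argument, using made-up data: one timestamp per second for a single day, stored either as one combined term or split into separate hour, minute and second fields. UniqueTermsDemo is a hypothetical helper, and the numbers say nothing about Lucene's actual memory layout; they only show how splitting reduces the number of distinct values per field.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo: distinct values for a combined field versus split fields.
final class UniqueTermsDemo {

    private static final int SECONDS_PER_DAY = 86_400;

    // Distinct terms when every second of the day is stored as one "HHmmss" term.
    static int combinedUniqueTerms() {
        Set<String> terms = new HashSet<>();
        for (int s = 0; s < SECONDS_PER_DAY; s++) {
            terms.add(String.format("%02d%02d%02d", s / 3600, (s / 60) % 60, s % 60));
        }
        return terms.size();
    }

    // Distinct terms summed over three separate fields: hour, minute, second.
    static int splitUniqueTerms() {
        Set<Integer> hours = new HashSet<>();
        Set<Integer> minutes = new HashSet<>();
        Set<Integer> seconds = new HashSet<>();
        for (int s = 0; s < SECONDS_PER_DAY; s++) {
            hours.add(s / 3600);
            minutes.add((s / 60) % 60);
            seconds.add(s % 60);
        }
        return hours.size() + minutes.size() + seconds.size();
    }
}
```

Here the combined field has 86,400 unique terms, while the three split fields together have 24 + 60 + 60 = 144.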
For performance. There is a one-off initial hit then things get quick.
--
Ian.
On Wed, Jun 9, 2010 at 4:46 PM, fujian wrote:
>
>
> Hello,
>
> I'm using Lucene 2.9.0 and ran into an OutOfMemory error when doing a search
> with sort on a big index. After a bit of research, I found that when doing sort
Sorting on tokenized fields can work, but may not necessarily do what
you expect, depending on your requirements and how the field is
tokenized.
--
Ian.
On Wed, Jun 9, 2010 at 4:35 PM, fujian wrote:
>
>
> Hello,
>
> I'm using Lucene 2.9, and when reading the Javadoc for the Sort class I noticed
> it
Doing your own sorting is certainly an acceptable thing to do, and for
low numbers of hits might even be "the right way". There are also
some tips and tricks that you can use to reduce Lucene's memory usage
for sorting such as using NumericField or splitting the sortable
field(s) into chunks, e.g.
Hello,
We are using Lucene 2.9.0 and ran into an OutOfMemory error when sorting on a
highly unique field on a big index. After doing some research we learned
that Lucene will load the sort field value for all documents into memory to
do sorting, and ended up with the OutOfMemory if the index is to
Hello,
I'm using Lucene 2.9.0 and ran into an OutOfMemory error when doing a search
with sort on a big index. After a bit of research, I found that when sorting,
Lucene loads the field value for all docs in the index into memory, not just
the matched ones.
Just wondering why this is? Maybe for the pe
Hello,
I'm using Lucene 2.9, and when reading the Javadoc for the Sort class I noticed
it says "The field must be indexed, but should not be tokenized".
But when I tried sorting on a tokenized field, it worked too. Just wondering
what the difference is between tokenized and untokenized fields in terms of sorting?
W
A news item about the Google search index. Lucene's indexing system can also
support realtime search.
Is there some difference between them?
With Caffeine, we analyze the web in small portions and update our search
index on a continuous basis, globally. As we find new pages, or new
information on existin
Hello to all!
I have the _0.cfs file of a Lucene index directory, but segments.gen and
segments_2 are missing. Can I generate the segments.gen and segments_2 files
without having to regenerate the _0.cfs file? Do these "segments" files
contain any index specific data, which will thus force me to re