Problem loading file from Directory on MacOS X...

2010-06-09 Thread Kit Plummer
Hey folks. Ran into a problem on MacOS X...that doesn't exist in Linux (CentOS, or Ubuntu). No signature of method: static org.apache.lucene.store.FSDirectory.open() is applicable for argument types: (java.io.File) values: [/home/kplummer/Development/tmp] I'm actually running this through Groovy,

Re: is this the right way to go?

2010-06-09 Thread fujian
Thanks Eric and Ian! Yes, time stamp is one of our sort fields. By splitting it into year/month/day/... it'll reduce the memory usage dramatically. But I don't know if we can specify the significance of the sort fields, like year first, followed by month, day ... etc. Another thing is about un

Re: sort field should not be tokenized?

2010-06-09 Thread fujian
Thanks Eric for the detailed explanation. Now I understand what Ian means. -Fujian

Re: sort field should not be tokenized?

2010-06-09 Thread Erick Erickson
Consider analyzing on whitespace, without removing stopwords, for the input "the fox is in his den". You'd have the terms: "the", "fox", "is", "in", "his", "den". What does it mean to sort on this field? Which term should be used? What if you remove stopwords? What about casing? Or any of a myriad of other possible
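The point above can be sketched in plain Java. This is only an illustration: String.split stands in for whitespace analysis (it is not Lucene's analyzer), and the class name WhitespaceTerms is made up for this example. It shows that the one input value turns into six separate terms, so there is no single obvious sort key for the document.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene.
class WhitespaceTerms {
    // Split the input on runs of whitespace, keeping stopwords and case.
    static List<String> terms(String input) {
        return Arrays.asList(input.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        List<String> t = terms("the fox is in his den");
        // One field value, several candidate sort keys.
        System.out.println(t.size() + " terms: " + t);
    }
}
```

An untokenized field would keep the whole string as a single term, which gives the sort exactly one value per document.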

Re: is this the right way to go?

2010-06-09 Thread Erick Erickson
In addition to Ian's comment, an important question is what kind of values you're sorting on. It sounds like a time stamp, because most languages only have a (relatively) small number of terms. It's not the total terms in the field, it's the total *unique* terms in the field. So even with a very l
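A small plain-Java sketch of the total-versus-unique distinction, with made-up sample values and a hypothetical class name (UniqueTermCount). It does not touch Lucene; it only counts distinct values, which is the quantity the message says matters.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene.
class UniqueTermCount {
    // Number of distinct values, regardless of how often each one occurs.
    static int uniqueCount(List<String> values) {
        return new HashSet<>(values).size();
    }

    public static void main(String[] args) {
        // Sample data invented for this sketch: one value per document.
        List<String> languages =
            Arrays.asList("en", "fr", "en", "de", "en", "fr");
        List<String> timestamps = Arrays.asList(
            "20100609164601", "20100609164602", "20100609164603",
            "20100609164604", "20100609164605", "20100609164606");

        System.out.println("language:  " + languages.size() + " values, "
            + uniqueCount(languages) + " unique");
        System.out.println("timestamp: " + timestamps.size() + " values, "
            + uniqueCount(timestamps) + " unique");
    }
}
```

Both fields hold six values here, but the language field repeats a handful of terms while every timestamp is distinct.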

Re: why lucene loads field value for every doc (not only the matched docs) when doing sort?

2010-06-09 Thread Ian Lea
For performance. There is a one-off initial hit then things get quick. -- Ian. On Wed, Jun 9, 2010 at 4:46 PM, fujian wrote: > > > Hello, > > I'm using lucene 2.9.0 and ran into OutOfMemory error when doing a search > with sort on a big index. After a bit research, I found that when doing sort

Re: sort field should not be tokenized?

2010-06-09 Thread Ian Lea
Sorting on tokenized fields can work, but may not necessarily do what you expect, depending on your requirements and how the field is tokenized. -- Ian. On Wed, Jun 9, 2010 at 4:35 PM, fujian wrote: > > > Hello, > > I'm using Lucene 2.9 and when reading java doc for the Sort class I noticed > it

Re: is this the right way to go?

2010-06-09 Thread Ian Lea
Doing your own sorting is certainly an acceptable thing to do, and for low numbers of hits might even be "the right way". There are also some tips and tricks that you can use to reduce Lucene's memory usage for sorting such as using NumericField or splitting the sortable field(s) into chunks, e.g.
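As a rough sketch of the chunking idea, assuming a timestamp split into year, month and day: each chunk has only a few distinct values, and ordering by the chunks in sequence gives the same result as ordering by the full date. The code below is plain Java with a hypothetical class name (ChunkedDateSort) and invented sample dates; it uses a chained Comparator rather than any Lucene API. In Lucene the analogous step would presumably be a Sort built from several SortField entries, most significant first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene.
class ChunkedDateSort {
    // Each int[] is {year, month, day}. Year is compared first,
    // month breaks ties on year, day breaks ties on month.
    static final Comparator<int[]> BY_DATE =
        Comparator.<int[]>comparingInt(d -> d[0])
                  .thenComparingInt(d -> d[1])
                  .thenComparingInt(d -> d[2]);

    // Return a sorted copy so the caller's list is left untouched.
    static List<int[]> sorted(List<int[]> dates) {
        List<int[]> copy = new ArrayList<>(dates);
        copy.sort(BY_DATE);
        return copy;
    }

    public static void main(String[] args) {
        // Sample dates invented for this sketch.
        List<int[]> dates = Arrays.asList(
            new int[] {2010, 6, 9},
            new int[] {2009, 12, 31},
            new int[] {2010, 1, 15});
        for (int[] d : sorted(dates)) {
            System.out.println(Arrays.toString(d));
        }
    }
}
```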

is this the right way to go?

2010-06-09 Thread fujian
Hello, We are using Lucene 2.9.0 and ran into an OutOfMemory error when sorting on a highly unique field on a big index. After doing some research we learned that Lucene will load the sort field value for all documents into memory to do sorting, and ended up with the OutOfMemory if the index is to

why lucene loads field value for every doc (not only the matched docs) when doing sort?

2010-06-09 Thread fujian
Hello, I'm using Lucene 2.9.0 and ran into an OutOfMemory error when doing a search with sort on a big index. After a bit of research, I found that when sorting, Lucene loads the field value for all docs in the index into memory, not only the matched ones. Just wondering why this is? Maybe for the pe

sort field should not be tokenized?

2010-06-09 Thread fujian
Hello, I'm using Lucene 2.9 and when reading the Javadoc for the Sort class I noticed it says "The field must be indexed, but should not be tokenized". But when I tried sorting on a tokenized field, it worked too. Just wondering what the difference is between tokenized and untokenized in terms of sort? W

A question about Google search index?

2010-06-09 Thread luocanrao
Some news about the Google search index. The index system of Lucene can also support realtime search. Is there some difference between them? With Caffeine, we analyze the web in small portions and update our search index on a continuous basis, globally. As we find new pages, or new information on existin

segments_N file is missing

2010-06-09 Thread maryam ma'danipour
Hello all! I have the _0.cfs file of a Lucene index directory, but segments.gen and segments_2 are missing. Can I generate the segments.gen and segments_2 files without having to regenerate the _0.cfs file? Do these "segments" files contain any index-specific data, which will thus force me to re