Re: About Lucene ...

2009-12-03 Thread Weiwei Wang
You can do everything related to search (full text or just paths) with Lucene :-) On Wed, Dec 2, 2009 at 11:26 PM, Stefan Trcek wrote: > On Wednesday 02 December 2009 16:20:28 Stefan Trcek wrote: > > On Wednesday 02 December 2009 15:50:45 archibal wrote: > > > -optionnally i want to have a central

Sorting issues resolved in 3.0?

2009-12-03 Thread Ganesh
Hello all, Sorting consumes a huge amount of memory. Does 2.9.1/3.0 have the ability to customize the field cache? In my case, 80% of the documents need to be sorted. Currently the field cache loads all records. Is there any custom interface available to decide which documents get loaded in ca

RE: Sorting issues resolved in 3.0?

2009-12-03 Thread Uwe Schindler
Nothing changed about that in 3.0 (or in 2.9), except that the FieldCache is now per-segment, which makes IndexReader.reopen faster. But you can still do your own sorting with your own structures; you just have to write your own TopDocsCollector. - Uwe Schindler
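Uwe's suggestion can be illustrated in plain Java, without Lucene on the classpath: instead of a FieldCache array covering every document, keep sort keys only for the documents that actually need sorting, and keep the best N hits in a bounded priority queue. This is a sketch under stated assumptions: the class name SparseSortTopN, the Map-based key store, and the rule that documents without a key sort last are all hypothetical. In Lucene 2.9/3.0 the same logic would go into collect(int doc) of a TopDocsCollector subclass, remembering that doc ids there are per-segment (add the docBase passed to setNextReader).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical top-N collector that sorts by a sparse, caller-supplied key map. */
public class SparseSortTopN {
    private final Map<Integer, Long> sortKeys; // only docs that need sorting have a key
    private final int n;
    // Reversed order, so the head of the queue is the weakest hit currently kept.
    private final PriorityQueue<Integer> queue;

    public SparseSortTopN(Map<Integer, Long> sortKeys, int n) {
        this.sortKeys = sortKeys;
        this.n = n;
        this.queue = new PriorityQueue<>(n, (a, b) -> compare(b, a));
    }

    /** Ascending by key; docs without a key sort after all keyed docs; ties broken by doc id. */
    private int compare(int docA, int docB) {
        Long ka = sortKeys.get(docA);
        Long kb = sortKeys.get(docB);
        if (ka == null && kb == null) return Integer.compare(docA, docB);
        if (ka == null) return 1;
        if (kb == null) return -1;
        int c = Long.compare(ka, kb);
        return c != 0 ? c : Integer.compare(docA, docB);
    }

    /** Called once per matching document, like Collector.collect(int doc). */
    public void collect(int doc) {
        if (queue.size() < n) {
            queue.add(doc);
        } else if (compare(doc, queue.peek()) < 0) {
            queue.poll(); // drop the weakest hit
            queue.add(doc);
        }
    }

    /** Returns the kept doc ids, best first. */
    public List<Integer> topDocs() {
        List<Integer> out = new ArrayList<>(queue);
        out.sort((a, b) -> compare(a, b));
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, Long> keys = Map.of(3, 30L, 7, 10L, 9, 20L);
        SparseSortTopN collector = new SparseSortTopN(keys, 2);
        for (int doc : new int[] {1, 3, 7, 9, 12}) {
            collector.collect(doc);
        }
        System.out.println(collector.topDocs()); // [7, 9]
    }
}
```

Memory then scales with the number of documents that carry a key rather than with maxDoc, at the cost of a hash lookup per comparison.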

Re: About Lucene ...

2009-12-03 Thread 杨建华
Maybe you can try Omnifind Yahoo Edition. 2009/12/3 Weiwei Wang > You can do everything related to search(full text or just paths) with > Lucene:-) > > On Wed, Dec 2, 2009 at 11:26 PM, Stefan Trcek wrote: > > > On Wednesday 02 December 2009 16:20:28 Stefan Trcek wrote: > > > On Wednesday 02 De

Re: About Lucene ...

2009-12-03 Thread Lukáš Vlček
This might be off-topic, but did you consider Google Desktop Search? It seems that somebody reported success hacking it to allow indexing/searching of a network file system: http://www.geekzone.co.nz/content.asp?contentid=3939 Regards, Lukas http://blog.lukas-vlcek.com/ 2009/12/3 杨建华 > May be you can try Omni

Re: IndexDivisor

2009-12-03 Thread Ganesh
Thanks. IndexDivisor means: when set to N, one in every N*termIndexInterval terms in the index is loaded into memory. For example, I have 100,000 unique terms and IndexDivisor set to 5, so 100,000 / (5*128) ≈ 156 terms will be loaded into memory. If termIndexInterval is
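The arithmetic in Ganesh's example can be checked with a small plain-Java helper. It assumes Lucene's default termIndexInterval of 128 and counts one in-memory terms-index entry per (termIndexInterval * divisor) terms, rounding up for the partial last block. It estimates entries only, not bytes, and the class and method names are hypothetical.

```java
/** Rough count of terms-index entries held in RAM for a given divisor (hypothetical helper). */
public class TermsIndexEstimate {
    /** Lucene's default term index interval used by IndexWriter. */
    static final int DEFAULT_TERM_INDEX_INTERVAL = 128;

    /** One entry per (interval * divisor) terms; a partial final block still costs one entry. */
    static long termsInMemory(long uniqueTerms, int termIndexInterval, int indexDivisor) {
        long step = (long) termIndexInterval * indexDivisor;
        return (uniqueTerms + step - 1) / step; // ceiling division
    }

    public static void main(String[] args) {
        long uniqueTerms = 100_000;
        for (int divisor : new int[] {1, 5, 100}) {
            System.out.println("divisor " + divisor + ": "
                + termsInMemory(uniqueTerms, DEFAULT_TERM_INDEX_INTERVAL, divisor) + " entries");
        }
    }
}
```

For 100,000 unique terms this gives 782 entries with divisor 1, 157 with divisor 5 and 8 with divisor 100, so the terms index alone should shrink roughly 100X between divisor 1 and divisor 100.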

Re: IndexDivisor

2009-12-03 Thread Michael McCandless
That's indeed how the index divisor works, so the RAM consumed by the terms index alone should be 100X lower. Other things consume RAM (norms, deleted docs, field cache), so maybe those are messing up your measurements? Mike On Fri, Nov 27, 2009 at 5:30 AM, Danil ŢORIN wr

Re: IndexDivisor

2009-12-03 Thread Ganesh
I don't have norms, and I don't delete docs using IndexReader. I have switched off sorting (I think the FieldCache is used for sorting). I am just loading all the indexes with different divisor values and calculating the memory difference. I have 100 million records split across databases. I loaded all the

Re: IndexDivisor

2009-12-03 Thread Michael McCandless
How do you measure memory consumption? If you pass -1 for the divisor do you still see no difference? Can you post the output of CheckIndex on your index? Are you sure your index has no deletions? Mike On Thu, Dec 3, 2009 at 5:55 AM, Ganesh wrote: > I don't have norms, I don't delete docs usi

Re: IndexDivisor

2009-12-03 Thread Ganesh
I don't have deletions. No norms. No sorting. I am setting a 70 MB IndexWriter RAM buffer, but I am not indexing any documents (indexing is switched off). I have enabled TermVector for one field. I am opening all 30 databases with different IndexDivisor values. Below are my stats: IndexDivisor Memory

How to do relevancy ranking in lucene

2009-12-03 Thread DHIVYA M
Hi all, I am using Lucene 2.3.2. When I search using the Lucene demo, I get all the results that contain the query. But I would like to restrict my results to the relevant matches, not all the documents containing the query string. Ex: Query: how to search a string? Response I am getting is: a

Re: IndexDivisor

2009-12-03 Thread Michael McCandless
On Thu, Dec 3, 2009 at 7:15 AM, Ganesh wrote:
> Below are my stats
> IndexDivisor    Memory
>    -1           7 MB
>    1            486 MB
>    100          180 MB
>    1000         176 MB.
Do you simply create the IndexWriter & IndexReader, but do no searching/indexing? How

Re: Potential leak of file resources in SpellChecker

2009-12-03 Thread Michael McCandless
This sounds like an important bug fix -- could you open a Jira issue & attach a patch? Thanks! Mike 2009/12/2 Eirik Bjørsnøs : > Hi, > > I'm using SpellChecker (in Lucene contrib) to help users of SVNSearch > who can't type right: > > http://svnsearch.org/svnsearch/repos/ASF/search?logMessage=lu

Re: How to do relevancy ranking in lucene

2009-12-03 Thread Erick Erickson
I don't really understand your goal here. Lucene already does this with its relevancy ranking. By definition, it calculates a score for each document and ranks the documents in order of that score. This is NOT a simple "is the word in the document or not" test. You can read about the scoring algorithm here:
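Erick's point can be made concrete with a simplified, plain-Java version of the scoring formula documented in the javadoc of org.apache.lucene.search.Similarity (implemented by DefaultSimilarity): coord * sum over query terms of tf * idf^2 * norm. This sketch leaves out boosts, queryNorm (the same for every document in a given query, so it does not change the order) and the lossy one-byte norm encoding, so absolute scores will differ from Lucene's; the three sample documents are made up. Inside Lucene, Searcher.explain(Query, int) returns a breakdown of the real score for one document.

```java
import java.util.Collections;
import java.util.List;

/** Simplified sketch of DefaultSimilarity-style ranking (no boosts, no queryNorm, no norm encoding). */
public class TfIdfSketch {
    static double tf(int freq) { return Math.sqrt(freq); }

    static double idf(int docFreq, int numDocs) { return Math.log(numDocs / (double) (docFreq + 1)) + 1.0; }

    static double lengthNorm(int numTerms) { return 1.0 / Math.sqrt(numTerms); }

    /** score = coord * sum over matching query terms of tf * idf^2 * lengthNorm. */
    static double score(List<String> queryTerms, List<String> doc, List<List<String>> corpus) {
        double sum = 0;
        int overlap = 0; // how many query terms this doc contains
        for (String t : queryTerms) {
            int freq = Collections.frequency(doc, t);
            if (freq == 0) continue;
            overlap++;
            int docFreq = 0;
            for (List<String> d : corpus) {
                if (d.contains(t)) docFreq++;
            }
            double idf = idf(docFreq, corpus.size());
            sum += tf(freq) * idf * idf * lengthNorm(doc.size());
        }
        double coord = overlap / (double) queryTerms.size(); // rewards matching more query terms
        return coord * sum;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = List.of(
            List.of("how", "to", "search", "a", "string", "in", "lucene"),
            List.of("a", "string", "is", "a", "sequence", "of", "chars"),
            List.of("search", "engines", "rank", "results"));
        List<String> query = List.of("search", "string");
        for (int i = 0; i < corpus.size(); i++) {
            System.out.printf("doc %d score %.3f%n", i, score(query, corpus.get(i), corpus));
        }
    }
}
```

The first document contains both query terms and ranks highest; the other two match only one term each, so coord halves their scores and the shorter one wins through lengthNorm.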

Problem with close IndexWriter pausing due to locked files

2009-12-03 Thread Paul Williams
Hi, I have found an issue where a Searcher instance keeps files of a Lucene index locked, so that when mergeSegments runs during IndexWriter close, the close actually hangs until the Searcher instance is closed. This seems to happen only when the index is stored on a network SAN device. When stored elsewhe

Re: IndexDivisor

2009-12-03 Thread Ganesh
Thanks, Mike. I open the reader, warm it up, and then calculate the memory consumed: long usedMemory = runtime.totalMemory() - runtime.freeMemory(); Regards Ganesh - Original Message - From: "Michael McCandless" To: Sent: Thursday, December 03, 2009 6:22 PM Subject: Re:

Re: IndexDivisor

2009-12-03 Thread Danil ŢORIN
Run System.gc() immediately before measuring memory usage. On the Sun JVM it will FORCE a GC (unless -XX:+DisableExplicitGC is used). On Thu, Dec 3, 2009 at 16:30, Ganesh wrote: > Thanks mike. > > I am opening the reader and warming it up and then calculating the memory > consumed. > long usedMemory = runt
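Danil's advice and Ganesh's measurement line can be combined into a small helper. This is a sketch: System.gc() is only a request that the JVM may ignore (for example with -XX:+DisableExplicitGC), so calling it a few times with short pauses is a heuristic, not a guarantee. The 16 MB array in main is a hypothetical stand-in for whatever the opened and warmed-up readers hold.

```java
/** Sketch: measure used heap right after requesting garbage collection. */
public class HeapMeasure {
    /** Requests GC a few times with short pauses, then returns used heap in bytes. */
    static long usedMemoryAfterGc() {
        Runtime runtime = Runtime.getRuntime();
        for (int i = 0; i < 3; i++) { // a single request may not reclaim everything
            System.gc();
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                break;
            }
        }
        return runtime.totalMemory() - runtime.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedMemoryAfterGc();
        // Stand-in for opening and warming up readers: keep a 16 MB array reachable.
        byte[] held = new byte[16 * 1024 * 1024];
        long after = usedMemoryAfterGc();
        System.out.printf("delta: %.1f MB (held %d bytes)%n",
            (after - before) / (1024.0 * 1024.0), held.length);
    }
}
```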

Re: Problem with close IndexWriter pausing due to locked files

2009-12-03 Thread Michael McCandless
That's rather spooky. Could it be that your SAN device hangs on delete if another machine has that file open? Is it possible that it does this intentionally? You could make a simple standalone test case. Different OSs have different semantics. Windows refuses to do the delete (throws IOException
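Mike's "simple standalone test case" could start from something like the program below. It is a sketch that only checks the single-JVM case: it creates a file, opens it for reading, tries to delete it while the handle is still open, and reports how long delete() took, so a hang would show up as a long duration. To mimic the reported setup you would pass a directory on the SAN and, ideally, hold the file open from a second machine. The class and method names are hypothetical.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Standalone check: can a file be deleted while another handle still has it open? */
public class DeleteWhileOpen {
    /**
     * Creates a temp file in dir (null means the system temp dir), keeps it open for reading,
     * and returns whether File.delete() succeeded while the handle was still open.
     */
    static boolean probe(File dir) {
        try {
            File f = File.createTempFile("delete-probe", ".tmp", dir);
            boolean deleted;
            try (FileInputStream in = new FileInputStream(f)) {
                in.read();            // make sure the handle is actually in use
                deleted = f.delete(); // Windows usually refuses here; POSIX systems allow it
            }
            if (!deleted) {
                f.delete();           // clean up now that the handle is closed
            }
            return deleted;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Pass a directory on the SAN as the first argument to compare against a local disk.
        File dir = args.length > 0 ? new File(args[0]) : null;
        long start = System.nanoTime();
        boolean deleted = probe(dir);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("deleted while open: " + deleted + " (took " + millis + " ms)");
    }
}
```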

Re: IndexDivisor

2009-12-03 Thread Ganesh
I am running GC before calculating the memory. Even when I set my IndexDivisor to 1, there seems to be no change. Below are my stats:
IndexDivisor   Memory
-1             7 MB
1              486 MB
100            180 MB
1000           176 MB
1176MB
Regards Ganesh - Original Message - From: "Danil

Re: IndexDivisor

2009-12-03 Thread Michael McCandless
Can you run w/ a memory profiler? I don't trust that gc is truly running. Mike On Thu, Dec 3, 2009 at 10:47 AM, Ganesh wrote: > I am doing GC before calculating the memory. Even i set my indexdivisor to > 1 but there seems to be no change. > > Below are my stats >  IndexDivisor Memory >  -

Re: IndexDivisor

2009-12-03 Thread Benjamin Heilbrunn
Maybe the command-line argument "-verbose:gc" would help determine whether GC is running. But you are right: a profiler would be the best way. Benjamin
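Besides -verbose:gc, a program can check for itself whether an explicit collection happened by reading the cumulative collection counts exposed by the java.lang.management API. A minimal sketch, with a hypothetical class name:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Sketch: confirm that System.gc() actually triggered a collection. */
public class GcCheck {
    /** Sum of collection counts over all collectors; -1 ("undefined") entries are skipped. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalGcCount();
        System.gc();
        long after = totalGcCount();
        System.out.println("collections before=" + before + " after=" + after
            + (after > before ? " (GC ran)" : " (no GC observed; explicit GC may be disabled)"));
    }
}
```

If the count does not increase, memory numbers taken right after System.gc() should not be trusted; a heap profiler remains the more reliable tool.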

Getting score of explicit documents for a query

2009-12-03 Thread Erdinc Yilmazel
Hi, given a query, is there a way to find out the score of some specific documents in the index against this query? I don't want to do a global search of the index and rank and sort all the matching documents. What I want to do is learn the rank of a bunch of documents in the index that I can identify

Norm Value of not existing Field

2009-12-03 Thread Benjamin Heilbrunn
Hi, I'm using Lucene 2.9.1 patched with http://issues.apache.org/jira/browse/LUCENE-1260. For some special reason I need to find all documents which contain at least 1 term in a certain field. This works by iterating the norms array, but only as long as the field exists in every document. For documents

Re: Norm Value of not existing Field

2009-12-03 Thread Michael McCandless
This isn't easy to change; it's hardcoded to 1.0 in oal.index.NormsWriter, and also to 1.0 in SegmentReader (when the field doesn't have norms stored, but e.g. someone is requesting them anyway). 1.0 must encode to 124. I suppose we could empower Similarity to define what the "undefined norm val
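Mike's remark that 1.0 encodes to 124 can be checked in plain Java. The methods below mirror the SmallFloat.floatToByte315 / byte315ToFloat scheme (3 mantissa bits, zero-exponent 15) that Similarity.encodeNorm and decodeNorm use in 2.9, copied here so the sketch runs without Lucene. It also shows a caveat for Benjamin's trick: with DefaultSimilarity and no boosts, a field holding a single token gets lengthNorm 1/sqrt(1) = 1.0 and therefore byte 124, the same as a document that lacks the field.

```java
/** Plain-Java mirror of the 3-bit-mantissa float-to-byte scheme used for norms. */
public class NormEncoding {
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // zero/negative -> 0, tiny positive -> 1
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1; // overflow: clamp to the largest encodable value
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    static float byte315ToFloat(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        float oneTokenNorm = (float) (1.0 / Math.sqrt(1)); // lengthNorm of a 1-token field
        System.out.println(floatToByte315(1.0f));           // 124: the default for a missing field
        System.out.println(floatToByte315(oneTokenNorm));   // 124 as well: indistinguishable
        System.out.println(byte315ToFloat((byte) 124));     // 1.0
    }
}
```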

Re: Norm Value of not existing Field

2009-12-03 Thread Erick Erickson
It would be clumsier, but you could create a Filter by spinning through all the terms on a field and setting the appropriate bit. You could even do this at startup and store the filters around for all the fields you care about, or cache them when first used. The advantage I see here is that it wo
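Erick's Filter idea boils down to walking every term of the field and setting one bit per document that contains it. The plain-Java analogue below uses a hypothetical in-memory map (term to doc ids) in place of the index; in Lucene 2.9 the same loop would roughly be a TermEnum over the field's terms with a TermDocs positioned on each term, filling an OpenBitSet that the Filter returns. The per-field cache follows his suggestion to build each filter once and reuse it.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Analogue of Erick's idea: mark every doc that has at least one term in a field. */
public class FieldPresenceFilter {
    private final Map<String, BitSet> cache = new HashMap<>();

    /** postings: term -> ids of the docs containing that term in the field (hypothetical index). */
    static BitSet docsWithField(Map<String, int[]> postings, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int[] docs : postings.values()) { // spin through every term of the field
            for (int doc : docs) {
                bits.set(doc);                 // this doc has at least one term in the field
            }
        }
        return bits;
    }

    /** Builds the filter for a field the first time it is requested, then reuses it. */
    BitSet forField(String field, Map<String, int[]> postings, int maxDoc) {
        return cache.computeIfAbsent(field, f -> docsWithField(postings, maxDoc));
    }

    public static void main(String[] args) {
        Map<String, int[]> titlePostings = Map.of(
            "lucene", new int[] {0, 2},
            "search", new int[] {2, 5});
        FieldPresenceFilter filters = new FieldPresenceFilter();
        System.out.println(filters.forField("title", titlePostings, 6)); // {0, 2, 5}
    }
}
```

A cached filter has to be rebuilt whenever the reader is reopened, since new segments can add documents.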

Snowball Stemmer Question

2009-12-03 Thread Christopher Condit
The Snowball Analyzer works well for certain constructs but not others. In particular I'm having a problem with things like "colossal" vs "colossus" and "hippocampus" vs "hippocampal". Is there a way to customize the analyzer to include these rules? Thanks, -Chris

Re: Potential leak of file resources in SpellChecker

2009-12-03 Thread Eirik Bjørsnøs
On Thu, Dec 3, 2009 at 2:15 PM, Michael McCandless wrote: > This sounds like an important bug fix -- could you open a Jira issue & > attach a patch?  Thanks! Mike, I've opened an issue with a patch that should be pretty trivial: https://issues.apache.org/jira/browse/LUCENE-2108 Looking forward

Re: Potential leak of file resources in SpellChecker

2009-12-03 Thread Michael McCandless
Thanks! I'm glad to hear your upgrade to 3.0.0 was smooth. Mike 2009/12/3 Eirik Bjørsnøs : > On Thu, Dec 3, 2009 at 2:15 PM, Michael McCandless > wrote: >> This sounds like an important bug fix -- could you open a Jira issue & >> attach a patch?  Thanks! > > Mike, > > I've opened an issue with

Re: Snowball Stemmer Question

2009-12-03 Thread Otis Gospodnetic
Chris, You could look at KStem to see if that does a better job. Or perhaps WordNet can be used to get the lemma of those terms instead of using stemming. Finally what was I going to say... ah, yes, using synonyms may be another way this can be handled. Otis -- Sematext -- http://sematext.c
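One way to handle Chris's pairs, along the lines of Otis's synonym suggestion, is a hand-maintained override table consulted before the stemmer, so related forms collapse to one indexed term. This is a plain-Java sketch: the table entries come from the question, and the fallback is a toy plural stripper standing in for Snowball, not Snowball itself. In a real analyzer this logic would live in a custom TokenFilter that replaces the stemming step, and the same analyzer must be used at index and query time.

```java
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Sketch: consult a hand-made override table before falling back to a stemmer. */
public class OverrideStemmer {
    private final Map<String, String> overrides;  // hand-maintained: surface form -> canonical term
    private final UnaryOperator<String> fallback; // stand-in for the real stemmer

    OverrideStemmer(Map<String, String> overrides, UnaryOperator<String> fallback) {
        this.overrides = overrides;
        this.fallback = fallback;
    }

    String stem(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        String forced = overrides.get(lower);
        return forced != null ? forced : fallback.apply(lower); // table wins over the stemmer
    }

    public static void main(String[] args) {
        Map<String, String> overrides = Map.of(
            "colossal", "colossus",
            "colossus", "colossus",
            "hippocampal", "hippocampus",
            "hippocampus", "hippocampus");
        // Toy fallback that strips a plural "s"; NOT Snowball, just something to fall back to.
        UnaryOperator<String> toyStemmer =
            t -> t.length() > 3 && t.endsWith("s") ? t.substring(0, t.length() - 1) : t;
        OverrideStemmer stemmer = new OverrideStemmer(overrides, toyStemmer);
        for (String w : new String[] {"Colossal", "colossus", "hippocampal", "hippocampus", "neurons"}) {
            System.out.println(w + " -> " + stemmer.stem(w));
        }
    }
}
```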

Re: Getting score of explicit documents for a query

2009-12-03 Thread Otis Gospodnetic
I think you should be able to use one or more FilteredQuery instances (filtering on the IDs of your docs) around your main query, and thus get scores only for the docs that interest you. Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch - Original Message > From: Erdinc Yilmazel > To: java-user@lucen
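A FilteredQuery only scores documents accepted by both the wrapped query and the filter. A plain-Java analogue of that intersection is sketched below; the arrays standing in for the query's matches and scores, the BitSet standing in for the ID filter, and the method name are all hypothetical. Both sides are walked in ascending doc-id order, and the filter side jumps ahead with nextSetBit instead of visiting every document.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

/** Analogue of FilteredQuery: keep scores only for matched docs that pass the filter. */
public class FilteredScoring {
    /** matchDocs must be ascending; matchScores[i] is the query score of matchDocs[i]. */
    static Map<Integer, Float> filteredScores(int[] matchDocs, float[] matchScores, BitSet filter) {
        Map<Integer, Float> out = new LinkedHashMap<>();
        int i = 0;
        int doc = filter.nextSetBit(0);
        while (i < matchDocs.length && doc >= 0) {
            if (matchDocs[i] < doc) {
                i++;                                   // query side is behind: advance it
            } else if (matchDocs[i] > doc) {
                doc = filter.nextSetBit(matchDocs[i]); // filter side is behind: jump ahead
            } else {
                out.put(doc, matchScores[i]);          // accepted by both: keep the score
                i++;
                doc = filter.nextSetBit(doc + 1);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] docs = {1, 3, 5, 8, 9};
        float[] scores = {0.5f, 0.9f, 0.2f, 0.7f, 0.1f};
        BitSet wanted = new BitSet();
        wanted.set(3);
        wanted.set(8);
        wanted.set(12);
        System.out.println(filteredScores(docs, scores, wanted)); // {3=0.9, 8=0.7}
    }
}
```

This gives scores for the chosen documents only; it does not tell you where they would rank among all matches, which still requires knowing how many other documents score higher.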

Re: How to do relevancy ranking in lucene

2009-12-03 Thread DHIVYA M
Yes, of course, but I am a beginner with Lucene, so I couldn't find out where, in which part of the code, this ranking is handled. Could you kindly point me to the place or the code, if possible? Thanks in advance, Dhivya --- On Thu, 3/12/09, Erick Erickson wrote: From: Erick Erickson Subject: Re:

Re: IndexDivisor

2009-12-03 Thread Ganesh
Thanks, Mike. Please find the attached file. I ran the test for divisor values 1, 100, 1000 and 1. There is a difference from 1 to 100, but there is no difference between 100 and 1. I created a new application in which I opened all readers and searchers and warmed them up, then slept for a minute and