Re: how to use DuplicateFilter to get unique documents based on a fieldName

2010-03-04 Thread ani...@ekkitab
Hi Zhangchi, Thanks for your reply. We have about 3 million records (different ISBNs) in the database, and a little more than that in documents, and we wouldn't want to do the deduping at indexing time, because one book (one ISBN) can be available under 2 or more categories (like fiction, comics &

Re: how to use DuplicateFilter to get unique documents based on a fieldName

2010-03-04 Thread ani...@ekkitab
Hi Ian, Thanks for your reply. We had actually done what you had suggested first, and it wasn't working, so I was hoping for some sample code. But then we found out that the field name on which we wanted the duplicate filter to be applied was not actually indexed while adding it into the document

Re: how to use DuplicateFilter to get unique documents based on a fieldName

2010-03-04 Thread zhangchi
I think you should check the index first, using the lukeall jar to see if there are duplicate books. On Thu, 04 Mar 2010 20:43:26 +0800, ani...@ekkitab wrote: Hi there, Could someone help me with the usage of DuplicateFilters. Here is my problem: I have created a search index on book

Re: File descriptor leak in ParallelReader.reopen()

2010-03-04 Thread Justin
Makes sense. Thanks for the tip! I haven't seen a response to my 2-pass scoring question, so maybe I've asked at least one difficult one. :-) - Original Message From: Uwe Schindler To: java-user@lucene.apache.org Sent: Thu, March 4, 2010 6:32:06 PM Subject: RE: File descriptor lea

RE: File descriptor leak in ParallelReader.reopen()

2010-03-04 Thread Uwe Schindler
Sorry, small change: > You should not directly instantiate a TopScoreDocCollector but instead > use the Searcher method that returns TopDocs. This has the benefit > that the searcher automatically chooses the right parameter for scoring > docs out/in order. In your example, search would be a litt
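A minimal sketch of the suggestion above (Lucene 3.0-era API, assuming an already-open IndexSearcher `searcher` and a Query `query`): call the search overload that returns TopDocs and let the searcher build the collector internally, instead of instantiating TopScoreDocCollector yourself.

```java
// Let the searcher create the right collector for in-order vs.
// out-of-order scoring, instead of new TopScoreDocCollector(...):
TopDocs top = searcher.search(query, 10);
for (ScoreDoc sd : top.scoreDocs) {
    Document doc = searcher.doc(sd.doc);
    // ... use doc ...
}
```

This matters because some scorers deliver documents out of docid order, and the searcher knows which collector variant matches the query it is about to run.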

Re: File descriptor leak in ParallelReader.reopen()

2010-03-04 Thread Justin
We must have been getting lucky. Thanks Mark and Uwe! - Original Message From: Uwe Schindler To: java-user@lucene.apache.org Sent: Thu, March 4, 2010 6:20:56 PM Subject: RE: File descriptor leak in ParallelReader.reopen() That was always the same with reopen(). It's documented in the

RE: File descriptor leak in ParallelReader.reopen()

2010-03-04 Thread Uwe Schindler
See my other mail for your file descriptor leak. A short note about your search code: You should not directly instantiate a TopScoreDocCollector but instead use the Searcher method that returns TopDocs. This has the benefit that the searcher automatically chooses the right parameter for scoring

RE: File descriptor leak in ParallelReader.reopen()

2010-03-04 Thread Uwe Schindler
That was always the same with reopen(). It's documented in the javadocs, with a short example: http://lucene.apache.org/java/3_0_1/api/all/org/apache/lucene/index/IndexReader.html#reopen() also in 2.4.1: http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/index/IndexReader.html#reopen() Uwe
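The pattern from the linked javadocs, for reference: reopen() may return the same instance when nothing changed, so the caller only closes the old reader when a new one was actually returned.

```java
// Reopen pattern documented in IndexReader.reopen() javadocs:
IndexReader newReader = reader.reopen();
if (newReader != reader) {
    reader.close();      // the caller must close the old reader
    reader = newReader;
}
```

Skipping the close of the old reader is exactly what leaks file descriptors, since each reopened reader holds its own handles on the index files.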

Re: File descriptor leak in ParallelReader.reopen()

2010-03-04 Thread Justin
Has this changed since 2.4.1? Our application didn't explicitly close with 2.4.1 and that combination never had this problem. - Original Message From: Mark Miller To: java-user@lucene.apache.org Sent: Thu, March 4, 2010 6:00:02 PM Subject: Re: File descriptor leak in ParallelReader.r

Re: File descriptor leak in ParallelReader.reopen()

2010-03-04 Thread Mark Miller
On 03/04/2010 06:52 PM, Justin wrote: Hi Mike and others, I have a test case for you (attached) that exhibits a file descriptor leak in ParallelReader.reopen(). I listed the OS, JDK, and snapshot of Lucene that I'm using in the source code. A loop adds just over 4000 documents to an index, r

File descriptor leak in ParallelReader.reopen()

2010-03-04 Thread Justin
Hi Mike and others, I have a test case for you (attached) that exhibits a file descriptor leak in ParallelReader.reopen(). I listed the OS, JDK, and snapshot of Lucene that I'm using in the source code. A loop adds just over 4000 documents to an index, reopening the index after each, before m

Lucene Web Demo

2010-03-04 Thread DasHeap
Another newcomer to Lucene here. I've got the Lucene web demo up and running on my test server. The indexing and search functions are working perfectly. The problem I'm running into regards the format of URLs to found objects. For instance, Lucene will return a hit like this: '/Library/Apache2/htdocs/foo

RE: FastVectorHighlighter truncated queries

2010-03-04 Thread Digy
I don't think that it is related to the Lucene version. Please inspect the C# code below. "fragments1" has no highlight info; on the other hand, "fragments2" has one. RAMDirectory dir = new RAMDirectory(); IndexWriter wr = new IndexWriter(dir, new Whites

RE: FastVectorHighlighter truncated queries

2010-03-04 Thread halbtuerderschwarze
Not with Lucene 3.0.1. Tomorrow I will try it with 2.9.2. Arne -- View this message in context: http://old.nabble.com/FastVectorHighlighter-truncated-queries-tp27709797p27786722.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. -

Re: Problem running demo's - java classes not found

2010-03-04 Thread Paul Rogers
Erick What a star!! Hadn't thought of that. Assumed (always a mistake) that the classpath only pointed to the directory. Using the following command: java -cp /home/paul/bin/lucene-3.0.0/lucene-core-3.0.0.jar:/home/paul/bin/lucene-3.0.0/lucene-demos-3.0.0.jar org.apache.lucene.demo.IndexFiles d

Re: Problem running demo's - java classes not found

2010-03-04 Thread Erick Erickson
Doesn't your classpath need the full path to the jar, not just the containing directory? On Thu, Mar 4, 2010 at 1:22 PM, Paul Rogers wrote: > Dear All > > Further to my previous email I notice I made a mistake with the second > example. When I entered the second command it actually read: > > ja

Re: Why is frequency a float number

2010-03-04 Thread PlusPlus
Thanks for the reply. Actually what I'm looking for is to have a kind of fuzzy memberships for the terms of a document. That is, for each term of a document, I will have a membership value for that term and each term will be in each document, at most once. For that, I will need float TF and IDF

Re: SpanQueries in Luke

2010-03-04 Thread Andrzej Bialecki
On 2010-03-04 17:56, Otis Gospodnetic wrote: Andrzej, Does that mean the regular Lucene QP will get Span query syntax support (vs. having it in that separate Surround QP)? Or maybe that already happened and I missed it? :) I wish that were the case ;) No, this simply means that you will be

Fwd: Problem running demo's - java classes not found

2010-03-04 Thread Paul Rogers
Dear All Further to my previous email I notice I made a mistake with the second example. When I entered the second command it actually read: java -cp org.apache.lucene.demo.IndexFiles docs This is what gave the strange error about the docs class. If I issue the correct command: java org.a

Re: Why is frequency a float number

2010-03-04 Thread Chris Hostetter
: I was wondering why the TF method gets a float parameter. Isn't frequency : always considered to be an integer? : : public abstract float tf(float freq) Take a look at how PhraseQuery and SpanNearQuery use tf(float). For simple terms (and TermQuery) tf is always an integer, but when dealing
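A hedged sketch of where the float arrives, assuming a custom Similarity subclass: for a TermQuery the frequency passed in is a whole number, but a sloppy PhraseQuery or SpanNearQuery contributes a fractional "sloppy frequency" per match (roughly 1/(distance+1) in SloppyPhraseScorer), which is why the parameter is a float.

```java
// Sketch: override tf(float) in a DefaultSimilarity subclass.
// The body shown is the stock Lucene behaviour (sqrt of freq),
// written out to make the float input visible.
public class MyTfSimilarity extends DefaultSimilarity {
    @Override
    public float tf(float freq) {
        // freq may be fractional for sloppy phrase/span matches
        return (float) Math.sqrt(freq);
    }
}
```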

Problem running demo's - java classes not found

2010-03-04 Thread Paul Rogers
Dear All Hope someone can help. I'm trying to run the demo's that came with Lucene (3.0.0). I extracted the tar.gz to a directory /home/paul/bin/lucene-3.0.0 and changed into the directory. The contents of the directory are as follows: total 2288 -rw-r--r-- 1 paul paul 3759 2009-11-16 14:

Re: Fuzzy membership of a term to the document

2010-03-04 Thread Chris Hostetter
:I want to change the Lucene's similarity in a way that I can add Fuzzy : memberships to the terms of a document. Thus, TF value of a term in one : document is not always 1, it can add 0.7 to the value of the TF ( (In my : application, each term is contained in a document at most once). This :

RE: FastVectorHighlighter truncated queries

2010-03-04 Thread Digy
I used Lucene.Net 2.9.2. Didn't it work? DIGY -Original Message- From: halbtuerderschwarze [mailto:halbtuerderschwa...@web.de] Sent: Thursday, March 04, 2010 6:15 PM To: java-user@lucene.apache.org Subject: RE: FastVectorHighlighter truncated queries I tried MultiTermQuery in combinati

Re: SpanQueries in Luke

2010-03-04 Thread Otis Gospodnetic
Andrzej, Does that mean the regular Lucene QP will get Span query syntax support (vs. having it in that separate Surround QP)? Or maybe that already happened and I missed it? :) Thanks, Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://searc

RE: FastVectorHighlighter truncated queries

2010-03-04 Thread halbtuerderschwarze
I tried MultiTermQuery in combination with setRewriteMethod: MultiTermQuery mtq = new WildcardQuery(new Term(FIELD, queryString)); mtq.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE); Did you also use Lucene 3.0.0? -- View this message in context: http://old.nabble.com/FastVec
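A hedged sketch of the full flow being discussed (Lucene 2.9/3.0-era API; the field name and wildcard term are placeholders): FastVectorHighlighter cannot see inside a constant-score-rewritten wildcard query, so the query is rewritten into concrete TermQuerys first and the rewritten query is what gets highlighted.

```java
// Expand the wildcard into a BooleanQuery of plain TermQuerys so
// FastVectorHighlighter can extract the matched terms:
MultiTermQuery mtq = new WildcardQuery(new Term("content", "lucen*"));
mtq.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
Query rewritten = mtq.rewrite(reader);   // concrete terms from the index

FastVectorHighlighter fvh = new FastVectorHighlighter();
FieldQuery fq = fvh.getFieldQuery(rewritten);
String fragment = fvh.getBestFragment(fq, reader, docId, "content", 100);
```

Note the field must be indexed with term vectors (positions and offsets) for FastVectorHighlighter to produce fragments at all.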

Re: SpanQueries in Luke

2010-03-04 Thread Rene Hackl-Sommer
Hi Andrzej, Thanks! I'll keep my eyes open for that. FWIW, implementing this by replacing the QueryParser with the CoreParser worked fine. Thanks again, Rene On 04.03.2010 16:22, Andrzej Bialecki wrote: On 2010-03-04 14:13, Rene Hackl-Sommer wrote: Hi, I would like to submit SpanQueries

Re: SpanQueries in Luke

2010-03-04 Thread Andrzej Bialecki
On 2010-03-04 14:13, Rene Hackl-Sommer wrote: Hi, I would like to submit SpanQueries in Luke. AFAIK this isn't doable out of the box. What would be the way to go? Replace the built-in QueryParser by e.g. the xml-query-parser from the contrib section? The upcoming Luke 1.0.1 will support this

Re: how to use DuplicateFilter to get unique documents based on a fieldName

2010-03-04 Thread Ian Lea
If the field you want to use for deduping is ISBN, create a DuplicateFilter using whatever your ISBN field name is as the field name and pass that to one of the search methods that takes a filter. If your index is large, I'd be worried about performance and would look at deduping at indexing time i
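A minimal sketch of Ian's suggestion (Lucene contrib/queries, 2.9/3.0-era API), assuming the ISBN field is named "isbn" and that it was actually indexed — DuplicateFilter only works on indexed fields, which turned out to be the original poster's problem:

```java
// One hit per distinct value of the "isbn" field:
Filter dedup = new DuplicateFilter("isbn");
TopDocs top = searcher.search(query, dedup, 10);
```

DuplicateFilter walks the terms of the named field per search, so on a large index it can be noticeably slower than deduping once at indexing time.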

RE: Phrase search on NOT_ANALYZED content

2010-03-04 Thread Murdoch, Paul
Yep. PerFieldAnalyzerWrapper seems to have solved my problem. Thanks, Paul -Original Message- From: java-user-return-45289-paul.b.murdoch=saic@lucene.apache.org [mailto:java-user-return-45289-paul.b.murdoch=saic@lucene.apache.org ] On Behalf Of Erick Erickson Sent: Thursday, Mar

Re: Phrase search on NOT_ANALYZED content

2010-03-04 Thread Erick Erickson
I'm still struggling with your overall goal here, but... It sounds like what you're looking for is an exact match in some cases but not others? In which case you could think about indexing the info: field in a second field and adding a clause against *that* field for your phrase case. PerFieldAnal
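A hedged sketch of the PerFieldAnalyzerWrapper idea (Lucene 3.0-era API; the field name "info_exact" is hypothetical): most fields go through StandardAnalyzer, while the exact-match field is kept as a single token via KeywordAnalyzer, giving NOT_ANALYZED-style behaviour that works consistently at both index and query time.

```java
// Wrap a default analyzer and override one field:
PerFieldAnalyzerWrapper wrapper =
    new PerFieldAnalyzerWrapper(new StandardAnalyzer(Version.LUCENE_30));
wrapper.addAnalyzer("info_exact", new KeywordAnalyzer());
// Use `wrapper` for both the IndexWriter and the QueryParser, so
// phrase queries against "info_exact" match the whole stored value.
```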

RE: Phrase search on NOT_ANALYZED content

2010-03-04 Thread Murdoch, Paul
I'm using NOT_ANALYZED because I have a list of text items to index where some of the items are single words and some of the items are two or more words with punctuation. My problem is that sometimes one of the words in an item with two or more words matches one of the single text items. That soun

Re: In memory indexes in clucene

2010-03-04 Thread Erick Erickson
You'd probably get much more pertinent answers asking on the CLucene list; see: http://sourceforge.net/apps/mediawiki/clucene/index.php?title=Support Erick On Thu, Mar 4, 2010 at 3:42 AM, wrote: > > Hi, > > I was looking into Luc

SpanQueries in Luke

2010-03-04 Thread Rene Hackl-Sommer
Hi, I would like to submit SpanQueries in Luke. AFAIK this isn't doable out of the box. What would be the way to go? Replace the built-in QueryParser by e.g. the xml-query-parser from the contrib section? Thanks, Rene - To

how to use DuplicateFilter to get unique documents based on a fieldName

2010-03-04 Thread ani...@ekkitab
Hi there, Could someone help me with the usage of DuplicateFilters. Here is my problem: I have created a search index on book Id, title, and author from a database of books which fall under various categories. Some books fall under more than one category. Now, when I issue a search, I get back 'X

Re: Lucene Indexing out of memory

2010-03-04 Thread Michael McCandless
I agree, memory profiler or heap dump or small test case is the next step... the code looks fine. This is always a single thread adding docs? Are you really certain that the iterator only iterates over 2500 docs? What analyzer are you using? Mike On Thu, Mar 4, 2010 at 4:50 AM, Ian Lea wrote:

Re: Lucene Indexing out of memory

2010-03-04 Thread Ian Lea
Have you run it through a memory profiler yet? Seems the obvious next step. If that doesn't help, cut it down to the simplest possible self-contained program that demonstrates the problem and post it here. -- Ian. On Thu, Mar 4, 2010 at 6:04 AM, ajay_gupta wrote: > > Erick, > w_context and c

In memory indexes in clucene

2010-03-04 Thread suman . holani
Hi, I was looking into Lucene in-memory indexes using RAMDirectory. It also provides something called "MMapDirectory". I want the indexes to persist, so I want to go for FSDirectory. But to enhance searching capability, I need to put the indexes onto RAM. Now, the problem is how can I synchronise b
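One common pattern for this (Lucene 2.9/3.0-era API; the index path is a placeholder): keep the persistent index on disk with FSDirectory and load a snapshot into RAM for searching, using the RAMDirectory constructor that copies another Directory. The RAM copy is a snapshot, so it must be rebuilt (and the searcher reopened) after the disk index changes.

```java
// Persistent index on disk, snapshot copied into RAM for searching:
Directory disk = FSDirectory.open(new File("/path/to/index"));
Directory ram  = new RAMDirectory(disk);          // copies all index files
IndexSearcher searcher = new IndexSearcher(ram, true);  // read-only
```

An alternative worth noting: MMapDirectory leaves the files on disk but maps them into memory, letting the OS page cache do the caching without any explicit synchronisation between two copies.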