Re: Merging several taxonomy indexes for faceted search

2011-10-19 Thread Shai Erera
Hi Christoph, You can certainly do that, and there are a bunch of APIs that will help you with it. We have a very high-level utility called TaxonomyMergeUtils, which offers several merge() methods, each taking more parameters than the last. Perhaps start with the simplest one (the one taking 4 directories) a
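Roughly, merging taxonomies means adding each source category to the destination taxonomy, recording an old-to-new ordinal map along the way, and then rewriting each merged document's category ordinals through that map. The JDK-only toy below models just that step. ToyTaxonomy, addTaxonomy and remap are hypothetical names, not Lucene API, and the model ignores parent categories, which a real facet taxonomy also stores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a facet taxonomy: each category path gets a dense ordinal.
// Hypothetical class, NOT Lucene's taxonomy writer; parent categories are ignored.
class ToyTaxonomy {
    private final Map<String, Integer> ordinals = new HashMap<>();
    private final List<String> paths = new ArrayList<>();

    // Returns the existing ordinal for a path, or assigns the next free one.
    int addCategory(String path) {
        Integer ord = ordinals.get(path);
        if (ord != null) return ord;
        paths.add(path);
        ordinals.put(path, paths.size() - 1);
        return paths.size() - 1;
    }

    String path(int ordinal) { return paths.get(ordinal); }
    int size() { return paths.size(); }
}

class TaxonomyMergeSketch {
    // Adds every category of src to dest and returns map[srcOrdinal] = destOrdinal.
    static int[] addTaxonomy(ToyTaxonomy dest, ToyTaxonomy src) {
        int[] map = new int[src.size()];
        for (int ord = 0; ord < src.size(); ord++) {
            map[ord] = dest.addCategory(src.path(ord));
        }
        return map;
    }

    // Rewrites one document's category ordinals into the destination's ordinal space.
    static int[] remap(int[] docOrdinals, int[] map) {
        int[] out = new int[docOrdinals.length];
        for (int i = 0; i < docOrdinals.length; i++) out[i] = map[docOrdinals[i]];
        return out;
    }

    public static void main(String[] args) {
        ToyTaxonomy a = new ToyTaxonomy();
        a.addCategory("Author/Tolkien");
        a.addCategory("Genre/Fantasy");
        ToyTaxonomy b = new ToyTaxonomy();
        int bGenre = b.addCategory("Genre/Fantasy");
        int bAuthor = b.addCategory("Author/Le Guin");

        int[] map = addTaxonomy(a, b); // merge b's categories into a
        int[] doc = remap(new int[] {bGenre, bAuthor}, map);
        System.out.println(Arrays.toString(doc)); // [1, 2]
    }
}
```

Shared categories ("Genre/Fantasy") keep the destination's ordinal, and only new ones get fresh ordinals, which is why the document data has to be rewritten rather than copied as-is.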

Re: Picking single results out of a list of results

2011-10-19 Thread Herb Roitblat
Thanks, Ian. On 10/17/2011 2:21 AM, Ian Lea wrote: The Hits class was deprecated at some point and has been removed from recent releases. The 2.9.3 javadoc at http://lucene.apache.org/java/2_9_3/api/core/org/apache/lucene/search/Hits.html shows a little code sample TopDocs topDocs = searcher.s
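With Hits gone, the usual pattern in 2.9/3.x is to ask for the top n hits, e.g. `TopDocs topDocs = searcher.search(query, n)`, then pick result i as `topDocs.scoreDocs[i]` and pass its `doc` field to `searcher.doc(...)` to load the stored document. The JDK-only sketch below models that array so the "pick one result" step is visible. ScoredHit is a hypothetical stand-in for Lucene's ScoreDoc, and the min-heap is one common way to keep the n best hits, not a copy of Lucene's collector.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical stand-in for Lucene's ScoreDoc: a document id plus its score.
final class ScoredHit {
    final int doc;
    final float score;
    ScoredHit(int doc, float score) { this.doc = doc; this.score = score; }
}

class TopNSketch {
    // Keeps the n best-scoring hits in a min-heap and returns them best first,
    // roughly the shape of TopDocs.scoreDocs.
    static ScoredHit[] topN(float[] scores, int n) {
        PriorityQueue<ScoredHit> heap =
            new PriorityQueue<>(Math.max(1, n), Comparator.comparingDouble((ScoredHit h) -> h.score));
        for (int doc = 0; doc < scores.length; doc++) {
            heap.offer(new ScoredHit(doc, scores[doc]));
            if (heap.size() > n) heap.poll(); // drop the current worst hit
        }
        ScoredHit[] result = new ScoredHit[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) result[i] = heap.poll();
        return result;
    }

    public static void main(String[] args) {
        float[] scores = {0.2f, 0.9f, 0.5f, 0.7f};
        ScoredHit[] hits = topN(scores, 3);
        // Picking a single result is just indexing into the array.
        ScoredHit second = hits[1];
        System.out.println(second.doc + " " + second.score); // 3 0.7
    }
}
```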

RE: How do you see if a tokenstream has tokens without consuming the tokens ?

2011-10-19 Thread Steven A Rowe
Hi Paul, What version of Lucene are you using? The JFlex spec you quote below looks pre-v3.1? Steve > -Original Message- > From: Paul Taylor [mailto:paul_t...@fastmail.fm] > Sent: Wednesday, October 19, 2011 6:50 AM > To: Steven A Rowe; java-user@lucene.apache.org >> "'java- > u...@luc

RE: How do you see if a tokenstream has tokens without consuming the tokens ?

2011-10-19 Thread Steven A Rowe
Hi Paul, On 10/19/2011 at 5:26 AM, Paul Taylor wrote: > On 18/10/2011 15:25, Steven A Rowe wrote: > > On 10/18/2011 at 4:57 AM, Paul Taylor wrote: > > > On 18/10/2011 06:19, Steven A Rowe wrote: > > > > Another option is to create a char filter that substitutes > > > > PUNCT-EXCLAMATION for exclam
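A minimal sketch of the char-filter idea, assuming only two mappings: each punctuation character is replaced by a space-padded placeholder word before tokenization, so a tokenizer that would otherwise discard it sees a word instead. In Lucene this would be a CharFilter (MappingCharFilter with a NormalizeCharMap is the stock option in 3.x); the class below is hypothetical plain Java and, unlike a real CharFilter, does not correct token offsets back to the original text. A tokenizer that splits on hyphens would also break PUNCT-EXCLAMATION into two tokens, so the placeholder spelling may need adjusting.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class PunctuationMapper {
    // Hypothetical mapping; the placeholder names follow the suggestion in the thread.
    private static final Map<Character, String> MAP = new HashMap<>();
    static {
        MAP.put('!', " PUNCT-EXCLAMATION ");
        MAP.put('.', " PUNCT-PERIOD ");
    }

    // Replaces each mapped character with its placeholder word; other characters pass through.
    static String map(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String replacement = MAP.get(c);
            sb.append(replacement != null ? replacement : String.valueOf(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String mapped = map("Help!");
        // A whitespace split stands in for the downstream tokenizer here.
        System.out.println(Arrays.toString(mapped.trim().split("\\s+"))); // [Help, PUNCT-EXCLAMATION]
    }
}
```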

Re: FW: How to find the original position of the match in the pdf document

2011-10-19 Thread Ian Lea
Second person today who thinks that posting the same question a few hours later is a good idea. Please read http://catb.org/~esr/faqs/smart-questions.html, specifically http://catb.org/~esr/faqs/smart-questions.html#id479876 -- Ian. On Wed, Oct 19, 2011 at 1:41 PM, Vidya Kanigiluppai Sivasubra

FW: How to find the original position of the match in the pdf document

2011-10-19 Thread Vidya Kanigiluppai Sivasubramanian
Hi, can someone answer my question, please? Regards, Vidya From: Vidya Kanigiluppai Sivasubramanian Sent: Wednesday, October 19, 2011 6:06 PM To: 'java-user@lucene.apache.org' Subject: FW: How to find the original position of the match in the pdf document Hi Can someone answer my question p

Re: OutOfMemoryError

2011-10-19 Thread Tamara Bobic
Thank you all (Otis, Mead, Uwe) for your replies! They were very helpful, and the problem turned out to be trivial: I was running 32-bit Java instead of 64-bit, so not enough memory could be reserved. Thanks once again; I finally managed to do the whole run successfully :) All the best, Tamara
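To confirm which JVM is actually running before digging into heap settings, a quick check like the one below can help. It assumes a HotSpot-style JVM, which reports its data model in the `sun.arch.data.model` system property ("32" or "64"); other JVMs may not set it, so the code falls back to "unknown". `Runtime.maxMemory()` shows the heap ceiling the JVM will attempt to use, which on a 32-bit JVM is capped well below what a 64-bit JVM allows.

```java
class JvmCheck {
    // HotSpot sets "sun.arch.data.model" to "32" or "64"; other JVMs may not define it.
    static String dataModel() {
        return System.getProperty("sun.arch.data.model", "unknown");
    }

    // Maximum heap the JVM will try to use, in MiB (reflects -Xmx or the default).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("JVM data model: " + dataModel());
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("os.arch: " + System.getProperty("os.arch"));
    }
}
```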

Re: How do you see if a tokenstream has tokens without consuming the tokens ?

2011-10-19 Thread Paul Taylor
On 18/10/2011 05:19, Steven A Rowe wrote: Hi Paul, You could add a rule to the StandardTokenizer JFlex grammar to handle this case, bypassing its other rules. This seemed to be working; just to test it out, I changed the EMAIL rule to this: EMAIL = ("!"|"*"|"^"|"!"|"."|"@"|"%"|"♠"|"\"")+
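For quick experiments outside JFlex, the modified macro can be approximated with a Java regular-expression character class. This is a sketch under stated assumptions: it only checks whether a whole string consists of the listed characters, it does not reproduce JFlex's longest-match rule selection, and the repeated "!" in the macro is redundant, so it appears once in the class.

```java
import java.util.regex.Pattern;

class PunctOnlyRule {
    // Assumed regex equivalent of the modified EMAIL macro: one or more of
    // ! * ^ . @ % U+2660 and the double quote.
    private static final Pattern PUNCT_RUN = Pattern.compile("[!*\\^.@%\u2660\"]+");

    // True if the whole string would match the macro on its own.
    static boolean isPunctOnlyToken(String s) {
        return PUNCT_RUN.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"!!!", "*^*", "\u2660\u2660", "Help!", ""}) {
            System.out.println("\"" + s + "\" -> " + isPunctOnlyToken(s));
        }
    }
}
```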

Merging several taxonomy indexes for faceted search

2011-10-19 Thread Christoph Kaser
Hi all, I am planning to change my existing Lucene index to use the new facets introduced in Lucene 3.4.0. Unfortunately, I could not find an answer to my question in the documentation: I create a relatively large index of 8 million books by dividing it into several smaller groups of docume

How to find the original position of the match in the pdf document

2011-10-19 Thread Vidya Kanigiluppai Sivasubramanian
Hi, I am using Lucene 2.4.1 in my project and want to extract the original position of the matched term in the text document. I tried using TermPositions, but it does not give me the original position of the term in the text source. Kindly help me with this. Thanks, Vidya K S
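TermPositions reports token positions (0, 1, 2, ..., counting tokens), not character offsets, so on its own it cannot point back into the source text. Character offsets have to be recorded at analysis time; in Lucene 2.4 one route is to index the field with `Field.TermVector.WITH_POSITIONS_OFFSETS` and read them back through the document's TermPositionVector. The toy tokenizer below (hypothetical, JDK only) shows the difference between the two numbers. Either way, the offsets refer to the text extracted from the PDF, not to locations inside the PDF file itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class OffsetTokenizer {
    // One token: its position (a token count, like TermPositions) and its
    // character offsets [start, end) into the original text.
    static final class Token {
        final String term;
        final int position, start, end;
        Token(String term, int position, int start, int end) {
            this.term = term; this.position = position; this.start = start; this.end = end;
        }
    }

    // Splits on whitespace, lower-cases, and records position and offsets per token.
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0, position = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) {
                String term = text.substring(start, i).toLowerCase(Locale.ROOT);
                tokens.add(new Token(term, position++, start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Lucene finds   the Lucene term";
        for (Token t : tokenize(text)) {
            if (t.term.equals("lucene")) {
                // The offsets recover the original text; the position alone cannot.
                System.out.println("position=" + t.position + " offsets=[" + t.start + "," + t.end
                    + ") text=" + text.substring(t.start, t.end));
            }
        }
    }
}
```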

Re: How do you see if a tokenstream has tokens without consuming the tokens ?

2011-10-19 Thread Paul Taylor
On 18/10/2011 15:25, Steven A Rowe wrote: Hi Paul, On 10/18/2011 at 4:57 AM, Paul Taylor wrote: On 18/10/2011 06:19, Steven A Rowe wrote: Another option is to create a char filter that substitutes PUNCT-EXCLAMATION for exclamation points, PUNCT-PERIOD for periods, etc., Yes that is how I firs
