Re: Help with delimited text

2011-04-05 Thread Mark Wiltshire
To add more information, I am then wanting to search this field using part or all of the path, using wildcards, i.e. search category_path with /Top/My Prods* > Hi java-users, I need some help. I am indexing categories into a single field category_path

Highlighting a phrase with "Single"

2011-04-05 Thread shrinath.m
If there is a phrase in the search, the highlighter highlights every word separately, like this: I [love] [Lucene]. Instead, what I want is this: I [love Lucene]. Is there a way to ask Lucene to do this? I know we could ask CSS or jQuery to do the task, but what's the point, right? So, is
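Lucene's contrib Highlighter marks each matched term on its own, so a phrase comes back as separate fragments. One workaround is to post-process: locate the phrase in the stored text and wrap it as a single unit. A minimal plain-Java sketch of that idea (the `<b>` tag and method name are illustrative, not a Lucene API):

```java
// Sketch only: wraps the first case-insensitive occurrence of a whole
// phrase in a single pair of tags, rather than tagging each word.
public class PhraseHighlight {
    public static String highlightPhrase(String text, String phrase) {
        int i = text.toLowerCase().indexOf(phrase.toLowerCase());
        if (i < 0) return text; // phrase not present: return text unchanged
        return text.substring(0, i)
                + "<b>" + text.substring(i, i + phrase.length()) + "</b>"
                + text.substring(i + phrase.length());
    }

    public static void main(String[] args) {
        System.out.println(highlightPhrase("I love Lucene", "love lucene"));
    }
}
```

Another route is to keep the Highlighter output and merge adjacent highlighted terms afterwards; the direct pass above is simpler when you already have the stored field text.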

Re: Re: Re: A likely bug of TermsPosition.nextPosition

2011-04-05 Thread 袁武 [GMail]
Dear Mike: Thanks for your help, and apologies for the delayed reply. Yes, the exception still occurs in 3.1, and the index is now being rebuilt for 3.1. In the index, only the term '\1' has a payload; if the search switches to other terms, the exception isn't raised. If you can send me a new recent co

Re: Question about open files

2011-04-05 Thread Jean-Baptiste Reure
We are using version 3.0.3. So you can confirm that closing the writer (and the reader created from that writer) should be enough to release the file handles? If that is the case, that means our application has a bug somewhere that I need to track down. Thanks, JB. On 5 April 2011 19:48, Ian Lea

RE: word + ngram tokenization

2011-04-05 Thread Steven A Rowe
Hi Shambhu, ShingleFilter will construct word n-grams: http://lucene.apache.org/java/3_1_0/api/contrib-analyzers/org/apache/lucene/analysis/shingle/ShingleFilter.html Steve > -Original Message- > From: sham singh [mailto:shamsing...@gmail.com] > Sent: Tuesday, April 05, 2011 5:53 PM > T
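ShingleFilter (linked above) is the real API for this; in an analyzer chain it sits after the tokenizer. To show what it produces, here is a plain-Java sketch of word n-gram ("shingle") construction over a whitespace-split stream — an illustration of the output, not the Lucene implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of what ShingleFilter emits: every run of n
// consecutive words from the token stream, joined by single spaces.
public class Shingles {
    public static List<String> shingles(String text, int n) {
        String[] words = text.split("\\s+");
        List<String> out = new ArrayList<String>();
        for (int i = 0; i + n <= words.length; i++) {
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < n; j++) {
                if (j > 0) sb.append(' ');
                sb.append(words[i + j]);
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shingles("the quick brown fox jumped over the lazy dog", 3));
    }
}
```

With n = 3 the example sentence yields "the quick brown", "quick brown fox", "brown fox jumped", and so on, matching the requirement in the original question. ShingleFilter can also emit the unigrams alongside the shingles; see its javadoc for the options.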

word + ngram tokenization

2011-04-05 Thread sham singh
Hi All, I have to do tokenization which is a combination of NGram and Standard tokenization. For example, if the content is "the quick brown fox jumped over the lazy dog", the requirement is to tokenize it into: quick brown fox, brown fox jumped, fox jumped over, etc. Please help me find out the best analyzer

Help with delimited text

2011-04-05 Thread Mark Wiltshire
Hi java-users, I need some help. I am indexing categories into a single field category_path, which may contain items such as /Top/Books,/Top/My Prods/Book Prods/Text Books,/Maths/Books/TextBooks, i.e. category paths delimited by ','. I want to store this field, so the Analyser tokenizes the document
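The wildcard search described in the follow-up (/Top/My Prods*) amounts to splitting the comma-delimited value into individual paths and keeping those with a given prefix. A plain-Java sketch of that intended semantics — in Lucene you would instead index each path separately (e.g. as multiple field values, unanalyzed) and use a PrefixQuery:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the matching the wildcard query is meant to express:
// split category_path on commas, keep paths starting with the prefix.
public class CategoryPaths {
    public static List<String> matching(String field, String prefix) {
        List<String> out = new ArrayList<String>();
        for (String path : field.split(",")) {
            if (path.startsWith(prefix)) out.add(path);
        }
        return out;
    }
}
```

Splitting at index time rather than query time is usually the better design: tokenizing the field on ',' (or adding each path as its own Field instance) lets PrefixQuery do this matching directly against the term dictionary.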

Re: OutOfMemoryError with FSDirectory

2011-04-05 Thread Michael McCandless
Try 1) reducing the RAM buffer of your IndexWriter (IndexWriter.setRAMBufferSizeMB), 2) using a term divisor when opening your reader (pass 2 or 3 or 4 as termInfosIndexDivisor when opening IndexReader), and 3) disabling norms or not indexing as many fields as possible. 70Mb is not that much RAM t
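The three suggestions above can be sketched against the Lucene 2.9.x API roughly as follows — a hedged configuration fragment, so check the exact overloads available in your version:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

// Hedged sketch of the low-memory tuning suggestions for Lucene 2.9.x.
public class LowMemoryConfig {
    public static void configure(IndexWriter writer, FSDirectory dir,
                                 Document doc, String value) throws Exception {
        // 1) Shrink the indexing RAM buffer (the default is 16 MB).
        writer.setRAMBufferSizeMB(4.0);

        // 2) Open the reader with a term infos index divisor: only every
        //    Nth indexed term is held in RAM, trading seek speed for memory.
        IndexReader reader = IndexReader.open(dir, null, true, 4);

        // 3) Omit norms on fields that don't need length/boost scoring.
        doc.add(new Field("id", value, Field.Store.YES,
                Field.Index.NOT_ANALYZED_NO_NORMS));
        reader.close();
    }
}
```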

finding the first document created since 3 weeks ago - Numeric Field

2011-04-05 Thread tal steier
Hi, I'm indexing a time stamp for every document, using a NumericField. Was wondering if there's a correct way to find the first document newer than a specific date (say 3 weeks ago). I know I can perform a range search for a range starting 3 weeks ago and ending now, but was wondering if there i
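The underlying operation here is a lower-bound search: given values in ascending order, find the first one at or after a cutoff, without scanning the whole range. Lucene's term dictionary is sorted, which is why seeking with a TermEnum (discussed in the "Retrieving the first document in a range" thread) can answer this directly. The idea in plain Java:

```java
// Lower-bound binary search: index of the first timestamp >= cutoff,
// or the array length if every timestamp is older than the cutoff.
public class LowerBound {
    public static int firstAtOrAfter(long[] sortedTimestamps, long cutoff) {
        int lo = 0, hi = sortedTimestamps.length; // hi is exclusive
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedTimestamps[mid] < cutoff) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }
}
```

For a NumericField there is the extra wrinkle that trie-encoded terms at several precisions share the field, so the seek has to land on the right precision's terms; the range-query approach sidesteps that at the cost of enumerating the range.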

OutOfMemoryError with FSDirectory

2011-04-05 Thread Claudio R
Hi, I am using Lucene 2.9.4 with FSDirectory. My index has 80 thousand documents (each document has 12 fields). My JVM has 70MB of RAM (limited by my hosting). I am getting various OutOfMemoryErrors. I ran jmap and got: num   #instances    #bytes    Class description -

Re: DocIdSet to represent small number of hits in large Document set

2011-04-05 Thread Michael McCandless
This (HashDocSet, and any other impls that handle the sparse case well) could be useful to have in Lucene's core. For example, for certain MultiTermQuerys we have this CONSTANT_SCORE_AUTO_REWRITE, which has iffy smelling heuristics to try to determine the best cutover point from ConstantScoreQuer

Re: DocIdSet to represent small number of hits in large Document set

2011-04-05 Thread Yonik Seeley
On Tue, Apr 5, 2011 at 2:24 AM, Antony Bowesman wrote: > Seems like SortedVIntList can be used to store the info, but it has no > methods to build the list in the first place, requiring an array or bitset > in the constructor. It has a constructor that takes DocIdSetIterator - so you can pass an

Re: DocIdSet to represent small number of hits in large Document set

2011-04-05 Thread Jason Rutherglen
I think Solr has a HashDocSet implementation? On Tue, Apr 5, 2011 at 3:19 AM, Michael McCandless wrote: > Can we simply factor out (poach!) those useful-sounding classes from > Nutch into Lucene? > > Mike > > http://blog.mikemccandless.com > > On Tue, Apr 5, 2011 at 2:24 AM, Antony Bowesman > w
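The sparse case being discussed — a handful of hits in an index of millions of documents — comes down to storing only the matching ids rather than a bit per document. A plain-Java sketch of the idea (Solr's HashDocSet uses a hash table instead of a sorted array, but the space argument is the same):

```java
import java.util.Arrays;

// Sparse doc-id set: a sorted int[] of matching ids. For k hits this
// costs O(k) ints, versus one bit per document for a full bit set;
// membership is a binary search, iteration is a walk of the array.
public class SparseDocIdSet {
    private final int[] ids; // sorted doc ids

    public SparseDocIdSet(int[] docIds) {
        this.ids = docIds.clone();
        Arrays.sort(this.ids);
    }

    public boolean contains(int docId) {
        return Arrays.binarySearch(ids, docId) >= 0;
    }

    public int size() { return ids.length; }
}
```

The cutover question raised above is exactly when k grows large enough that the per-id cost (32 bits each) exceeds the bit set's one bit per document, roughly k > maxDoc/32.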

Re: Retrieving the first document in a range

2011-04-05 Thread Yonik Seeley
On Tue, Apr 5, 2011 at 10:06 AM, Shai Erera wrote: > Can we use TermEnum to skip to the first term 'after 3 weeks'? If so, we can > pull the first doc that appears in the TermDocs of that Term (if it's a > valid term). Yep. Try this to get the term you want to use to seek: BytesRef term

Retrieving the first document in a range

2011-04-05 Thread Shai Erera
Hi We have a date field which is indexed as NumericField and we'd like to get the first docid that is since 3 weeks ago. Currently we're doing something like this: {code} Query q = NumericRangeQuery.newLongRange("date", timeBefore3Weeks, System.currentTimeMillis(), true, true); Scorer s = q.weigh

Re: Concurrent Issue

2011-04-05 Thread Ian Lea
You don't say exactly how you are dealing with the concurrent access (one shared Reader/Searcher? Each user with own Reader/Searcher? Something else?) but the underlying problem is that the reader has been closed while something else is still using it. This can easily happen in a multi-threaded se
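The usual cure for "closed while something else is still using it" is reference counting: each searching thread acquires the shared reader before use and releases it after, and the reader is only really closed when the last user lets go. Lucene's IndexReader exposes this via incRef()/decRef(); the discipline itself, in plain Java:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the reference-counting discipline that avoids
// AlreadyClosedException on a shared resource. Simplified: a production
// version must make the acquire check-and-increment atomic.
public class RefCounted {
    private final AtomicInteger refs = new AtomicInteger(1); // creator holds one ref
    private volatile boolean closed = false;

    public void acquire() {
        if (closed) throw new IllegalStateException("already closed");
        refs.incrementAndGet();
    }

    public void release() {
        if (refs.decrementAndGet() == 0) {
            closed = true; // close the underlying files here
        }
    }

    public boolean isClosed() { return closed; }
}
```

With this in place, the thread that wants to swap in a fresh reader just releases its own reference; searches already in flight keep the old reader alive until they release it too.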

RE: Lucene 3.1

2011-04-05 Thread Steven A Rowe
Hi Tanuj, Can you be more specific? What file did you download? (Lucene 3.1 has three downloadable packages: -src.tar.gz, .tar.gz, and .zip.) What did you expect to find that is not there? (Some examples would help.) Steve > -Original Message- > From: Tanuj Jain [mailto:tanujjain.

Lucene 3.1

2011-04-05 Thread Tanuj Jain
Hi, I have downloaded Lucene 3.1 and want to use it in my program. I found a lot of files that differ from or are missing relative to Lucene 3.0. Is there any way I could get those files as a whole rather than searching for each file and downloading it?

Concurrent Issue

2011-04-05 Thread Yogesh Dabhi
Hi, My application is clustered on JBoss application servers and the Lucene directory is shared. Concurrently, 5 users access the same Lucene directory to search documents. At that time I get the below exception: org.apache.lucene.store.AlreadyClosedException: this IndexReader is closed. Is there a way

Re: DocIdSet to represent small number of hits in large Document set

2011-04-05 Thread Michael McCandless
Can we simply factor out (poach!) those useful-sounding classes from Nutch into Lucene? Mike http://blog.mikemccandless.com On Tue, Apr 5, 2011 at 2:24 AM, Antony Bowesman wrote: > I'm converting a Lucene 2.3.2 to 2.4.1 (with a view to going to 2.9.4). > > Many of our indexes are 5M+ Documents,

Re: Question about open files

2011-04-05 Thread Ian Lea
Which version of lucene are you using? Something changed in the 3.x release, and maybe 2.9.x, in the way that old file handles are closed. Previously it wasn't always necessary to explicitly close everything, now it is. Your usage sounds fine to me. When I've hit too many open files when using

Re: Lucene Merge failing on Open Files

2011-04-05 Thread Michael McCandless
Yeah, that mergeFactor is way too high and will cause too-many-open-files (if the index has enough segments). Also, you should setRamBufferSizeMB instead of maxBufferedDocs, for faster index throughput. Calling optimize from two threads doesn't help it run faster when using ConcurrentMergeSchedul

Re: Field Aware TokenFilter

2011-04-05 Thread Ian Lea
Can you use PerFieldAnalyzerWrapper? That would be the normal way to approach this, specifying a different, synonym-aware analyzer for the relevant field(s). -- Ian. On Mon, Apr 4, 2011 at 11:31 PM, Christopher Condit wrote: > I need to add synonyms to an index depending on the field being in
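What PerFieldAnalyzerWrapper does is route each field name to its own analyzer, with a fallback default; in Lucene 3.x you construct it with the default Analyzer and call addAnalyzer("body", synonymAnalyzer) for the synonym-aware field. The dispatch itself, sketched in plain Java (the nested Analyzer interface here is an illustration, not Lucene's):

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of per-field analyzer dispatch: a map from field
// name to analysis function, falling back to a default analyzer.
public class PerFieldDispatch {
    public interface Analyzer { String analyze(String text); }

    private final Analyzer defaultAnalyzer;
    private final Map<String, Analyzer> perField = new HashMap<String, Analyzer>();

    public PerFieldDispatch(Analyzer defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    public void addAnalyzer(String field, Analyzer a) {
        perField.put(field, a);
    }

    public String analyze(String field, String text) {
        Analyzer a = perField.get(field);
        return (a != null ? a : defaultAnalyzer).analyze(text);
    }
}
```

The important property, and the reason it answers the original question, is that the tokenizer chain never needs to know which field it is processing; the wrapper picks the whole chain per field up front.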

Question about open files

2011-04-05 Thread Jean-Baptiste Reure
Hi all, I have been looking for information about this and found a few things here and there but nothing very clear on when files are opened and closed by Lucene. We have an application that uses Lucene quite heavily in the following fashion: there are multiple indexes in use at all times. For ea