RE: too many file descriptors opened by Lucene shows (deleted) in /proc

2009-09-03 Thread Uwe Schindler
This is normal. When you open an IndexReader/IndexSearcher, it opens various file handles. If you additionally update/add/delete documents in parallel (even in another process), or optimize the index, the original IndexReader keeps using the "old" state of the index. IndexWriter deletes some files
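To make this concrete: the usual way to pick up the new index state and release the handles on the deleted files is to reopen the reader and close the old one. A minimal sketch against the 2.9-era API (variable names are illustrative):

    IndexReader oldReader = reader;
    IndexReader newReader = oldReader.reopen(); // cheap no-op if nothing changed
    if (newReader != oldReader) {
        reader = newReader;
        oldReader.close(); // releases the descriptors on the (deleted) files
    }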

too many file descriptors opened by Lucene shows (deleted) in /proc

2009-09-03 Thread Ganesh
Hello all, On my Linux PC there are too many open file descriptors for the Lucene database. /proc/<pid>/fd shows a very big list. I have provided a sample below. lr-x------ 1 root root 64 Sep 3 17:02 360 -> /opt/ganesh/lucenedb/_2w5.tvf (deleted) lr-x------ 1 root root 64 Sep 3 17:0

Re: First result in the group

2009-09-03 Thread Ganesh
Thanks Shai and Mark for your suggestions. I initially tried DuplicateFilter and it is not giving me the expected results. It removes the duplicates at query time, not in the results. Regards Ganesh - Original Message - From: "mark harwood" To: Sent: Wednesday, September 02, 2009 5:36

Re: Extending Sort/FieldCache

2009-09-03 Thread Shai Erera
Thanks. I plan to look into two things, and then probably create two separate issues: 1) Refactor the FieldCache API (and TopFieldCollector) so that one can provide one's own cache of native values. I'd hate to rewrite the FieldComparator logic just because the current API is not extensible. That

Re: Extending Sort/FieldCache

2009-09-03 Thread Chris Hostetter
: I wanted to avoid two things: : * Writing the logic that invokes cache-refresh upon IndexReader reload. Uh... i don't think there is any code that does FieldCache refreshing on reload (yet), so you wouldn't be missing out on anything. (as long as your custom cache works at the SegmentReader level
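For reference, a rough sketch of what a pluggable comparator looks like against the 2.9-era FieldComparatorSource API (class and field names are illustrative; a real implementation would swap the FieldCache.DEFAULT call for its own per-segment cache, which is exactly the extension point being discussed):

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.FieldCache;
    import org.apache.lucene.search.FieldComparator;
    import org.apache.lucene.search.FieldComparatorSource;

    public class CustomIntComparatorSource extends FieldComparatorSource {
      public FieldComparator newComparator(String field, int numHits,
                                           int sortPos, boolean reversed) {
        return new CustomIntComparator(field, numHits);
      }

      static class CustomIntComparator extends FieldComparator {
        private final int[] slots;   // values of the hits currently in the queue
        private final String field;
        private int[] current;       // per-segment values
        private int bottom;

        CustomIntComparator(String field, int numHits) {
          this.field = field;
          slots = new int[numHits];
        }
        public int compare(int slot1, int slot2) {
          // explicit comparison avoids integer-subtraction overflow
          return slots[slot1] < slots[slot2] ? -1
               : slots[slot1] == slots[slot2] ? 0 : 1;
        }
        public int compareBottom(int doc) {
          return bottom < current[doc] ? -1 : bottom == current[doc] ? 0 : 1;
        }
        public void copy(int slot, int doc) { slots[slot] = current[doc]; }
        public void setBottom(int slot) { bottom = slots[slot]; }
        public void setNextReader(IndexReader reader, int docBase) throws IOException {
          // called once per segment: a custom cache would plug in here
          current = FieldCache.DEFAULT.getInts(reader, field);
        }
        public Comparable value(int slot) { return Integer.valueOf(slots[slot]); }
      }
    }

Used as: new Sort(new SortField("price", new CustomIntComparatorSource())).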

Re: Can this regex be done?

2009-09-03 Thread Robert Muir
Just a side note: LUCENE-1606 is intended to address exactly the performance issue you described. Rather than depending upon a constant prefix or enumerating terms, it can efficiently skip through the term dictionary. The downside is that this behavior depends upon the ability to compile a regex
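For context, the RegexpQuery API that eventually grew out of LUCENE-1606 (in later Lucene versions, not 2.9; a hedged sketch with an illustrative field name) looks like this:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.RegexpQuery;

    // The pattern is compiled into an automaton and intersected with the
    // term dictionary, instead of testing every term one by one.
    Query q = new RegexpQuery(new Term("body", "lucen[ea].*"));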

Re: Field.Store.NO & Field.Index.NOT_ANALYZED & hashCode

2009-09-03 Thread Chris Hostetter
: As for the exact matching, I am wondering if I should store the hashcode of : the text in a separate field and convert the text in the query to a hashcode : before passing it on, or if Lucene already does something like that under the : covers when it sees Field.Store.NO & Field.Index.NOT_ANALYZED
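The usual answer here is that Lucene does not hash field values under the covers, but it also doesn't need to: a NOT_ANALYZED field is indexed as one single term, so whole-value exact matching falls out naturally. A minimal sketch against the 2.9-era API (field name illustrative):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    // Index: the whole string becomes a single, untokenized term
    Document doc = new Document();
    doc.add(new Field("exact", text, Field.Store.NO, Field.Index.NOT_ANALYZED));

    // Search: an exact whole-value match -- no separate hashcode field needed
    Query q = new TermQuery(new Term("exact", queryText));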

Re: Can this regex be done?

2009-09-03 Thread Chris Hostetter
: Because some of the queries that I have to convert (without modifying : them, unfortunately) have literally half a page of statements : expressed like that that, if expanded, would equal a several-page-long : lucene query. FWIW: the RegexQuery (in contrib) applies the regex input to every term
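To contrast with the LUCENE-1606 approach above: the contrib RegexQuery enumerates the term dictionary and tests each term against the pattern, so its cost grows with the number of indexed terms (a hedged sketch; package location per the 2.9-era contrib):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.regex.RegexQuery;

    // Every indexed term in "body" is matched against the pattern.
    Query q = new RegexQuery(new Term("body", "lucen[ea].*"));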

RE: TokenStream API, Quick Question.

2009-09-03 Thread Uwe Schindler
The indexer only calls getAttribute/addAttribute once, after initialization (see docs). It will never call them later. If you cache tokens, you always have to restore the state into the TokenStream's attributes. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u
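That consumer contract looks like this (a minimal sketch against the 2.9-era attribute API; field name illustrative):

    import java.io.StringReader;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.TermAttribute;

    TokenStream ts = analyzer.tokenStream("body", new StringReader(text));
    TermAttribute term = ts.addAttribute(TermAttribute.class); // fetched once, up front
    while (ts.incrementToken()) {
        // Same attribute instance on every iteration; only its value changes.
        // This is why cached tokens must be restored *into* these attributes.
        System.out.println(term.term());
    }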

TokenStream API, Quick Question.

2009-09-03 Thread Daniel Shane
Does a TokenStream always have to return the same number of attributes, with the same underlying classes, for all the tokens it generates? I mean, during the tokenization phase, can the first "token" have a Term and Offset Attribute and the second "token" only a Type Attribute, or does this mean

Re: Lucene 2.9.0-rc2 [PROBLEM] : TokenStream API (incrementToken / AttributeSource), cannot implement a LookaheadTokenFilter.

2009-09-03 Thread Daniel Shane
Ok, I got it: from checking other filters, I should call input.incrementToken() instead of super.incrementToken(). Do you feel this kind of breaks the object model (super.incrementToken() should also work)? Maybe when the old API is gone, we can stop checking if someone has overloaded next()
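A minimal skeleton of the pattern being described (2.9-era API): a filter delegates to the stream it wraps, because TokenFilter itself produces no tokens of its own (and in 2.9 the base-class path goes through the old-API backwards-compatibility layer):

    import java.io.IOException;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;

    public final class PassThroughFilter extends TokenFilter {
      public PassThroughFilter(TokenStream input) { super(input); }

      public boolean incrementToken() throws IOException {
        return input.incrementToken(); // not super.incrementToken()
      }
    }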

Re: Lucene 2.9.0-rc2 [PROBLEM] : TokenStream API (incrementToken / AttributeSource), cannot implement a LookaheadTokenFilter.

2009-09-03 Thread Daniel Shane
Uwe Schindler wrote: There may be a problem in that you may not want to restore the peeked token into the TokenFilter's attributes itself. It looks like you want to have a Token instance returned from peek, but the current stream should not reset to this Token (you only want to "look" into the next Token
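A hedged sketch of that idea using captureState/restoreState (2.9-era API; supports a single token of lookahead, class name illustrative). The filter shares its AttributeSource with the wrapped stream, so the peeked values must be rolled back after looking ahead:

    import java.io.IOException;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;

    public class LookaheadTokenFilter extends TokenFilter {
      private State peeked; // AttributeSource.State of the lookahead token, if any

      public LookaheadTokenFilter(TokenStream input) { super(input); }

      /** Look at the next token without disturbing the visible attributes. */
      public boolean peek() throws IOException {
        State save = captureState();            // remember the token we are "on"
        boolean hasNext = input.incrementToken();
        if (hasNext) peeked = captureState();   // snapshot the lookahead token
        restoreState(save);                     // roll the shared attributes back
        return hasNext;
      }

      public boolean incrementToken() throws IOException {
        if (peeked != null) {                   // replay the token we peeked at
          restoreState(peeked);
          peeked = null;
          return true;
        }
        return input.incrementToken();
      }
    }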

Re: New "Stream closed" exception with Java 6

2009-09-03 Thread Grant Ingersoll
On Sep 2, 2009, at 7:45 AM, Chris Bamford wrote: Hi Grant, I have now followed Daniel's advice and catch the exception with: try { indexWriter.addDocument(doc); What does your Document/Field creation code look like? In other words, how do you construct doc? Seems like something
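One common culprit with "Stream closed" (an assumption here, since the construction code isn't shown): Reader-valued fields. A Reader can only be consumed once, and it is consumed during addDocument, so each Document needs its own fresh Reader (2.9-era API, illustrative field name):

    import java.io.FileReader;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    Document doc = new Document();
    // A fresh Reader per document; reusing one across addDocument calls
    // leads to reads on an exhausted or closed stream.
    doc.add(new Field("body", new FileReader(file)));
    indexWriter.addDocument(doc); // consumes the Reader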

Re: Use of tika for parsing, offsets questions

2009-09-03 Thread Grant Ingersoll
On Sep 2, 2009, at 5:40 AM, David Causse wrote: Hi, If I use Tika for parsing HTML code and inject the parsed String into a Lucene analyzer, what about the offset information for KWIC and return-to-text (like the Google cache view)? How can I keep track of the offsets between the Tika parser and Lucene

RE: Use of tika for parsing, offsets questions

2009-09-03 Thread Uwe Schindler
An additional good solution for Lucene (from 2.9 on) would be to create a special Tika analyzer that can directly add Tika-parseable content and metadata to the TokenStream as Attributes (using the new API), or only text and offset data (old Lucene TokenStream API). I wrote something similar
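Short of the attribute-based integration Uwe describes, the straightforward pipeline is to parse with Tika first and feed the extracted text to a normal Lucene analyzer (a hedged sketch; note this loses the original HTML offsets, which is exactly the problem being discussed):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.sax.BodyContentHandler;

    // BodyContentHandler collects the extracted body text
    // (default write limit is 100,000 characters).
    BodyContentHandler handler = new BodyContentHandler();
    Metadata metadata = new Metadata();
    new AutoDetectParser().parse(inputStream, handler, metadata);

    Document doc = new Document();
    doc.add(new Field("body", handler.toString(), Field.Store.NO, Field.Index.ANALYZED));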

Re: Use of tika for parsing, offsets questions

2009-09-03 Thread Jukka Zitting
Hi, On Wed, Sep 2, 2009 at 2:40 PM, David Causse wrote: > If I use Tika for parsing HTML code and inject the parsed String into a Lucene > analyzer, what about the offset information for KWIC and return-to-text > (like the Google cache view)? How can I keep track of the offsets > between the Tika parser and