This is normal. When you open an IndexReader/IndexSearcher, it opens various
file handles. If you additionally update/add/delete documents in parallel
(even in another process), or optimize the index, the original IndexReader
keeps using the "old" state of the index. IndexWriter deletes files that are
no longer needed, but the still-open reader holds on to them, which is why
they show up as "(deleted)" in /proc until the reader is closed or reopened.
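A minimal sketch of the usual fix, assuming the 2.9-era API (IndexReader.reopen(); later releases replace it with openIfChanged) and an illustrative wrapper class: reopen the reader after changes and close the old instance so the handles on the "(deleted)" files are actually released.

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

public class ReaderRefresher {
    private IndexReader reader;
    private IndexSearcher searcher;

    public ReaderRefresher(IndexReader reader) {
        this.reader = reader;
        this.searcher = new IndexSearcher(reader);
    }

    /** Call periodically, or after the IndexWriter commits/optimizes. */
    public synchronized IndexSearcher refresh() throws IOException {
        IndexReader newReader = reader.reopen();   // returns the same instance if nothing changed
        if (newReader != reader) {
            reader.close();                        // releases the fds on files the writer deleted
            reader = newReader;
            searcher = new IndexSearcher(reader);
        }
        return searcher;
    }
}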
Hello all,
On my Linux PC, the file descriptor count for the Lucene index is very high.
/proc/<pid>/fd shows a very long list. I have provided a sample below.
lr-x------ 1 root root 64 Sep 3 17:02 360 -> /opt/ganesh/lucenedb/_2w5.tvf (deleted)
lr-x------ 1 root root 64 Sep 3 17:0
Thanks Shai and Mark for your suggestions.
I initially tried DuplicateFilter, but it is not giving me the expected results:
it removes the duplicates at query time, not in the results.
Regards
Ganesh
----- Original Message -----
From: "mark harwood"
Sent: Wednesday, September 02, 2009 5:36
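For reference, a minimal sketch of how the contrib-queries DuplicateFilter is normally wired in; the field name "dupKey" and the hit count are illustrative, not taken from this thread:

import java.io.IOException;
import org.apache.lucene.search.DuplicateFilter;   // contrib-queries jar
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

public class DedupSearch {
    /** Runs the query but keeps at most one hit per distinct value of the "dupKey" field. */
    public static TopDocs search(IndexSearcher searcher, Query query) throws IOException {
        DuplicateFilter dedup = new DuplicateFilter("dupKey");
        return searcher.search(query, dedup, 10);
    }
}

The filter marks duplicates across the whole index and then constrains the search, which matches the behaviour Ganesh describes above.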
Thanks
I plan to look into two things, and then probably create two separate
issues:
1) Refactor the FieldCache API (and TopFieldCollector) so that one can
provide one's own cache of native values. I'd hate to rewrite the
FieldComparators logic just because the current API is not extensible. That
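For context, a sketch of the extension point as it stands in the 2.9-era API, assuming an int-valued field: a custom FieldComparatorSource plugged into a SortField, where setNextReader is the spot a pluggable per-segment cache of native values would be consulted (here it still calls FieldCache.DEFAULT, with a comment marking the call you'd replace).

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.FieldComparator;
import org.apache.lucene.search.FieldComparatorSource;

public class MyIntComparatorSource extends FieldComparatorSource {
    public FieldComparator newComparator(String field, int numHits, int sortPos, boolean reversed)
            throws IOException {
        return new MyIntComparator(field, numHits);
    }

    static final class MyIntComparator extends FieldComparator {
        private final String field;
        private final int[] slots;   // values of the currently competitive hits
        private int[] current;       // values for the segment being collected
        private int bottom;

        MyIntComparator(String field, int numHits) {
            this.field = field;
            this.slots = new int[numHits];
        }

        public void setNextReader(IndexReader reader, int docBase) throws IOException {
            // This is the call a pluggable cache would replace:
            current = FieldCache.DEFAULT.getInts(reader, field);
        }

        public void copy(int slot, int doc) { slots[slot] = current[doc]; }
        public int compare(int slot1, int slot2) {
            return slots[slot1] < slots[slot2] ? -1 : (slots[slot1] == slots[slot2] ? 0 : 1);
        }
        public void setBottom(int slot) { bottom = slots[slot]; }
        public int compareBottom(int doc) {
            return bottom < current[doc] ? -1 : (bottom == current[doc] ? 0 : 1);
        }
        public Comparable value(int slot) { return Integer.valueOf(slots[slot]); }
    }
}

Usage would then be something like new SortField("price", new MyIntComparatorSource()).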
: I wanted to avoid two things:
: * Writing the logic that invokes cache-refresh upon IndexReader reload.
Uh... I don't think there is any code that does FieldCache refreshing on
reload (yet), so you wouldn't be missing out on anything (as long as
your custom cache works at the SegmentReader level).
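A sketch of that idea in plain Java (nothing here is Lucene-specific except the reader type, and load() is just a placeholder for however you compute your native values): keying the cache on the segment readers themselves means a reopened top-level reader reuses entries for unchanged segments, and entries disappear once a segment reader is garbage collected.

import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import org.apache.lucene.index.IndexReader;

/** Per-segment value cache: an entry lives exactly as long as its segment reader. */
public class SegmentValueCache {
    private final Map<IndexReader, int[]> cache =
            Collections.synchronizedMap(new WeakHashMap<IndexReader, int[]>());

    public int[] get(IndexReader segmentReader, String field) {
        int[] values = cache.get(segmentReader);
        if (values == null) {
            values = load(segmentReader, field);
            cache.put(segmentReader, values);
        }
        return values;
    }

    private int[] load(IndexReader segmentReader, String field) {
        // Placeholder: fill an int[maxDoc] from a TermEnum/TermDocs walk, stored fields, etc.
        return new int[segmentReader.maxDoc()];
    }
}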
Just a side note: LUCENE-1606 is intended to address exactly the
performance issue that you described.
Rather than depending upon a constant prefix or enumerating all terms, it
can efficiently skip through the term dictionary.
The downside is that this behavior depends upon the ability to compile
a regular expression into a finite automaton.
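For anyone reading this later: the LUCENE-1606 work eventually shipped as AutomatonQuery, with RegexpQuery as its regular-expression front end. A sketch of its use in those later releases (field name and pattern are made up):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.RegexpQuery;

public class RegexpExample {
    public static Query build() {
        // The pattern is compiled to an automaton up front; the query then walks the
        // term dictionary with it, skipping whole ranges of terms that cannot match.
        return new RegexpQuery(new Term("body", "lucen[ea].*"));
    }
}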
: As for the exact matching, I am wondering if I should store the hashcode of
: the text in a separate field and convert the text in the query to a hashcode
: before passing it on, or if Lucene already does something like that under the
: covers when it sees Field.Store.NO & Field.Index.NOT_ANALYZED.
: Because some of the queries that I have to convert (without modifying
: them, unfortunately) have literally half a page of statements
: expressed like that which, if expanded, would equal a several-page-long
: Lucene query.
FWIW: the RegexQuery (in contrib) applies the regex input to every term in the index.
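A sketch of the exact-match idea being discussed (field names are made up): Lucene does not hash NOT_ANALYZED values under the covers, it simply indexes the whole string as a single term, so exact matching becomes a single TermQuery lookup instead of a regex over every term. Hashing the text yourself into the key field is mainly worth it when the values are very long, since it keeps the term dictionary small.

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class ExactMatchField {
    /** Index the text analyzed for normal search, plus one untokenized term for exact matching. */
    public static Document build(String text) {
        Document doc = new Document();
        doc.add(new Field("body", text, Field.Store.YES, Field.Index.ANALYZED));
        doc.add(new Field("body_exact", text, Field.Store.NO, Field.Index.NOT_ANALYZED_NO_NORMS));
        return doc;
    }

    /** Exact match is then a cheap single-term lookup. */
    public static Query exactQuery(String text) {
        return new TermQuery(new Term("body_exact", text));
    }
}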
The indexer only calls getAttribute/addAttribute once, after initialization
(see the docs); it will never call them again later. If you cache tokens, you
always have to restore the cached state into the TokenStream's attributes.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u
Does a TokenStream always have to return the same number of attributes,
with the same underlying classes, for all the tokens it generates?
I mean, during the tokenization phase, can the first "token" have a Term
and Offset attribute and the second "token" only a Type attribute, or
does this mean
OK, I got it: from checking other filters, I should call
input.incrementToken() instead of super.incrementToken().
Do you feel this kind of breaks the object model (super.incrementToken()
should also work)?
Maybe when the old API is gone, we can stop checking if someone has
overridden next()
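A minimal sketch of the pattern in question (the filter and the attribute chosen are illustrative): attributes are looked up once in the constructor, and incrementToken() delegates to the wrapped input stream, not to super.

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public final class LowerCasingFilter extends TokenFilter {
    // addAttribute is called exactly once, here; the indexer looks attributes up once as well.
    private final TermAttribute termAtt = addAttribute(TermAttribute.class);

    public LowerCasingFilter(TokenStream input) {
        super(input);
    }

    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {   // delegate to the wrapped stream, not super
            return false;
        }
        // modify the shared attributes in place
        char[] buffer = termAtt.termBuffer();
        for (int i = 0; i < termAtt.termLength(); i++) {
            buffer[i] = Character.toLowerCase(buffer[i]);
        }
        return true;
    }
}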
Uwe Schindler wrote:
There may be a problem in that you may not want to restore the peeked token into
the TokenFilter's attributes itself. It looks like you want to have a Token
instance returned from peek, but the current stream should not be reset to this
Token (you only want to "look" at the next Token).
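A sketch of one way to do the peek without resetting the stream, assuming the 2.9 attribute API (captureState/restoreState/cloneAttributes): the filter reads one token ahead, hands the caller a detached copy, puts its own attributes back to the token consumers last saw, and replays the peeked token on the next incrementToken().

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.util.AttributeSource;

public final class PeekingTokenFilter extends TokenFilter {
    private AttributeSource.State pending;   // state of the token we read ahead

    public PeekingTokenFilter(TokenStream input) {
        super(input);
    }

    /** Returns a detached copy of the next token's attributes, or null at end of stream. */
    public AttributeSource peek() throws IOException {
        if (pending == null) {
            AttributeSource.State current = captureState();   // what consumers currently see
            if (!input.incrementToken()) {
                restoreState(current);
                return null;
            }
            pending = captureState();    // remember the peeked token for replay
            restoreState(current);       // the stream itself is NOT reset to the peeked token
        }
        AttributeSource copy = cloneAttributes();
        copy.restoreState(pending);      // the copy carries the peeked values, detached from the stream
        return copy;
    }

    public boolean incrementToken() throws IOException {
        if (pending != null) {
            restoreState(pending);       // now actually advance to the peeked token
            pending = null;
            return true;
        }
        return input.incrementToken();
    }
}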
On Sep 2, 2009, at 7:45 AM, Chris Bamford wrote:
Hi Grant,
I have now followed Daniel's advice and catch the exception with:
try {
indexWriter.addDocument(doc);
What does your Document/Field creation code look like? In other
words, how do you construct doc? Seems like something
On Sep 2, 2009, at 5:40 AM, David Causse wrote:
Hi,
If I use Tika for parsing HTML code and inject the parsed String into a
Lucene analyzer, what about the offset information for KWIC and return
to text (like the Google cache view)? How can I keep track of the
offsets between the Tika parser and Lucene?
An additional good solution for Lucene (from 2.9 on) would be to create a
special TIKA analyzer that can be used to directly add TIKA-parseable
content and metadata to the TokenStream as attributes (using the new API), or
only text and offset data (old Lucene TokenStream API).
I wrote something similar
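A minimal sketch of the plainer route (assuming Tika's AutoDetectParser and BodyContentHandler; field names are made up): index the string Tika extracted and keep offsets in term vectors, so highlighting/KWIC offsets refer to that extracted text, which you also store to serve as the "cache view".

import java.io.InputStream;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

public class TikaToLucene {
    /** Extracts plain text with Tika and indexes that text, so all offsets refer to it. */
    public static Document toDocument(InputStream html) throws Exception {
        BodyContentHandler text = new BodyContentHandler(-1);   // -1 = no write limit
        Metadata meta = new Metadata();
        new AutoDetectParser().parse(html, text, meta);

        Document doc = new Document();
        // Store the extracted text and keep offsets in the term vectors so that
        // snippets/KWIC can be produced against exactly this string.
        doc.add(new Field("body", text.toString(),
                Field.Store.YES, Field.Index.ANALYZED,
                Field.TermVector.WITH_POSITIONS_OFFSETS));
        doc.add(new Field("title", String.valueOf(meta.get("title")),
                Field.Store.YES, Field.Index.ANALYZED));
        return doc;
    }
}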
Hi,
On Wed, Sep 2, 2009 at 2:40 PM, David Causse wrote:
> If I use Tika for parsing HTML code and inject the parsed String into a Lucene
> analyzer, what about the offset information for KWIC and return to text
> (like the Google cache view)? How can I keep track of the offsets
> between the Tika parser and