> I managed to get rid of the Reader's "memory leak" by setting the
> reference to the underlying Tika Reader in my wrapper to null when the
> wrapper is closed. But I still think it would be nicer if IndexWriter
> didn't maintain references to the Readers after indexing.
The "simple" solution is very easy:
Index the markup-free document by adding a new Field with Field.Index.ANALYZED
and Field.Store.NO, so that it is not stored. Then add the same data again (but
with markup) to the index with Field.Store.YES and Field.Index.NO. If you like,
you can do this even with the
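A minimal Java sketch of the two-field idea above, written against plain JDK classes so that it runs without Lucene on the classpath. `FieldSpec`, `stripMarkup` and `twoFields` are hypothetical helpers, and the assumption that the markup consists of angle-bracket tags (such as `<pause/>`) is mine; real transcription markup may need a different stripper. The comments show the Lucene 2.9/3.x `Field` constructor each entry would correspond to. Both entries share one field name, which is one reading of the suggestion.

```java
import java.util.List;
import java.util.regex.Pattern;

public class TwoFieldSketch {

    // Hypothetical stand-in for a Lucene Field: name, value and the two flags.
    static final class FieldSpec {
        final String name;
        final String value;
        final boolean stored;   // Field.Store.YES vs Field.Store.NO
        final boolean indexed;  // Field.Index.ANALYZED vs Field.Index.NO

        FieldSpec(String name, String value, boolean stored, boolean indexed) {
            this.name = name;
            this.value = value;
            this.stored = stored;
            this.indexed = indexed;
        }
    }

    // Assumption: markup is made of angle-bracket tags like <pause/> or <overlap>...</overlap>.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static String stripMarkup(String markedUp) {
        // Replace every tag with a space, collapse whitespace runs, trim the ends.
        return TAG.matcher(markedUp).replaceAll(" ").replaceAll("\\s+", " ").trim();
    }

    static List<FieldSpec> twoFields(String name, String markedUp) {
        // Roughly corresponds to, in Lucene 2.9/3.x:
        //   doc.add(new Field(name, plain,    Field.Store.NO,  Field.Index.ANALYZED));
        //   doc.add(new Field(name, markedUp, Field.Store.YES, Field.Index.NO));
        String plain = stripMarkup(markedUp);
        return List.of(
                new FieldSpec(name, plain, false, true),     // searchable, not stored
                new FieldSpec(name, markedUp, true, false)); // stored, not searchable
    }

    public static void main(String[] args) {
        String line = "A: well <pause/> I <overlap>think</overlap> so";
        for (FieldSpec f : twoFields("content", line)) {
            System.out.println(f.name + " stored=" + f.stored
                    + " indexed=" + f.indexed + " value=" + f.value);
        }
    }
}
```

Queries then match only the analyzed, markup-free tokens, while the stored value returned with each hit still carries the original markup.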
Hi All,
I am working on some language data and I need to index and search it. I
have used Lucene for indexing plain-text documents before (no fancy
tricks, just plain-text indexing). The data that I have now is
transcribed text and is heavily marked up. (It's mostly conversations
and interviews
I managed to get rid of the Reader's "memory leak" by setting the reference
to the underlying Tika Reader in my wrapper to null when the wrapper is
closed. But I still think it would be nicer if IndexWriter didn't maintain
references to the Readers after indexing.
[3] http://img.skitch.com/20100407-ntn2kg13fx49wx4q118bp9h1hb.jpg
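One way to express the workaround described above is a wrapper `Reader` that drops its reference to the wrapped (for example Tika-produced) Reader in `close()`, so that even if something still holds the wrapper, the large underlying Reader can be garbage-collected. `ReleasingReader` and `isReleased()` are hypothetical names; this is a sketch, not the poster's actual wrapper, and it is not synchronized.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReleasingReader extends Reader {
    private Reader delegate;

    public ReleasingReader(Reader delegate) {
        this.delegate = delegate;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (delegate == null) {
            throw new IOException("Stream closed");
        }
        return delegate.read(cbuf, off, len);
    }

    @Override
    public void close() throws IOException {
        Reader d = delegate;
        delegate = null; // drop the reference first, so it is released even if d.close() throws
        if (d != null) {
            d.close();
        }
    }

    // Hypothetical helper, only here to make the release observable.
    boolean isReleased() {
        return delegate == null;
    }

    public static void main(String[] args) throws IOException {
        ReleasingReader r = new ReleasingReader(new StringReader("some extracted text"));
        char[] buf = new char[4];
        int n = r.read(buf, 0, buf.length);
        System.out.println(new String(buf, 0, n)); // prints "some"
        r.close();
        System.out.println(r.isReleased()); // prints true
    }
}
```

Whatever still references the wrapper after indexing now pins only a small object rather than the buffered document text.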
On Wed, Apr 7, 2010 at 10:35 PM, Ruben Laguna wrote:
>
closed after IndexWriter.updateDocument. Each of those Readers retains
1MB. The question is why IndexWriter holds references to those Readers after
the Documents have been indexed.
[1] http://img.skitch.com/20100407-1183815yiausisg73u9wfgscsj.jpg
[2] http://img.skitch.com/20100407-b86irkp7e4uif2wq1dd4t899qb.jpg
--
/Rubén
Just to update and close this thread (I forgot about it):
after investigation it turns out that 75% of the time of the custom
async-indexer (see original email) was spent in FieldInfos.add(...), more
specifically in the part where the field name is interned using String.intern().
Copy/pasting and usi
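The message breaks off before describing the fix, so what follows is only a sketch of one common technique, not necessarily what was done: put a `ConcurrentHashMap` in front of `String.intern()` so that repeated lookups of the same few field names avoid the JVM's intern table. The class name `FieldNameInterner` is hypothetical. Because the cache only ever stores results of `intern()`, the returned instance is the same one `String.intern()` would return, so code that compares interned names with `==` keeps working.

```java
import java.util.concurrent.ConcurrentHashMap;

public final class FieldNameInterner {
    private static final ConcurrentHashMap<String, String> CACHE = new ConcurrentHashMap<>();

    private FieldNameInterner() {}

    public static String intern(String name) {
        String cached = CACHE.get(name);
        if (cached != null) {
            return cached; // fast path: name already canonicalised
        }
        String canonical = name.intern(); // slow path, roughly once per distinct name
        String previous = CACHE.putIfAbsent(canonical, canonical);
        return previous != null ? previous : canonical;
    }

    public static void main(String[] args) {
        String a = new String("title");
        String b = new String("title");
        System.out.println(a == b);                   // prints false: two distinct objects
        System.out.println(intern(a) == intern(b));   // prints true: same canonical instance
        System.out.println(intern(a) == "title");     // prints true: literals are interned too
    }
}
```

The cache is unbounded, which suits a small, fixed set of field names but not arbitrary strings.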
Hi,
(I know that this is probably not recommended and not a common
scenario, but...)
Is it possible to have an application using Lucene and a separate
(i.e. different JVM) instance of Solr both pointing at the same
index, and reading from and writing to the index from both applications?
I am trying (separately