Re: Updating document with IndexWriter#updateDocument doesn't seem to take effect

2024-08-11, Wojtek
Hi, thanks for the link; indeed, https://issues.apache.org/jira/browse/LUCENE-7171 / https://github.com/apache/lucene/issues/8226 seems to be the issue here. > Maybe try a simple `new TermQuery(new Term("id", "flags-1-1"))` query during update and see if it returns the correct answer? That was t
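The mechanism behind LUCENE-7171, as this thread understands it, can be sketched without Lucene. This is a hypothetical toy model, not Lucene's API: `ToyField`, its `tokenized` flag, and the `readBack`/`rebuildForUpdate` helpers are invented for illustration, and the splitter (lowercase, break on non-alphanumerics) only stands in for a real analyzer. It shows the idea: a field read back from the index carries a generic tokenized type (compare the `stored,indexed,tokenized,omitNorms` flags printed in this thread), so passing the retrieved document straight to an update would index the id as several tokens, while re-creating the id field as untokenized keeps it a single term that an exact `TermQuery` can match.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the LUCENE-7171 symptom; none of these classes are Lucene API.
class ToyField {
    final String name;
    final String value;
    final boolean tokenized; // true: split into several terms; false: kept as one term

    ToyField(String name, String value, boolean tokenized) {
        this.name = name;
        this.value = value;
        this.tokenized = tokenized;
    }

    // Terms this field would contribute to the index. The splitter (lowercase,
    // break on non-alphanumeric characters) is a stand-in for a real analyzer.
    List<String> terms() {
        if (!tokenized) {
            return List.of(value);
        }
        List<String> out = new ArrayList<>();
        for (String t : value.toLowerCase().split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }
}

class RetrievedDocDemo {
    // Models reading a stored field back: the original type is lost and the
    // field is reported as tokenized.
    static ToyField readBack(ToyField original) {
        return new ToyField(original.name, original.value, true);
    }

    // Fix: re-create the id field with an explicit untokenized type before updating.
    static ToyField rebuildForUpdate(ToyField retrieved) {
        return new ToyField(retrieved.name, retrieved.value, false);
    }

    public static void main(String[] args) {
        ToyField original = new ToyField("id", "flags-1-1", false);
        ToyField retrieved = readBack(original);
        System.out.println(original.terms());                    // [flags-1-1]
        System.out.println(retrieved.terms());                   // [flags, 1, 1]
        System.out.println(rebuildForUpdate(retrieved).terms()); // [flags-1-1]
    }
}
```

With real Lucene, the counterpart of `rebuildForUpdate` would be building a fresh `Document` with a new `StringField` for the id (and `TextField`s only where tokenization is wanted) rather than re-adding the `Document` returned by the searcher.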

Re: Updating document with IndexWriter#updateDocument doesn't seem to take effect

2024-08-10, Gautam Worah
I'm confused as to what could be happening. Google led me to this Stack Overflow question: https://stackoverflow.com/questions/36402235/lucene-stringfield-gets-tokenized-when-doc-is-retrieved-and-stored-again which references some long-standing issues about fields changing their "types" and so on. Th

Re: Updating document with IndexWriter#updateDocument doesn't seem to take effect

2024-08-10, Wojtek
Hi, thank you for the reply, and apologies for being somewhat "all over the place". Regarding tokenization: should it happen if I use StringField? When the document is created (before writing), I see in the debugger that it's not tokenized and is of type StringField: ``` doc = {Document@4830} "Documen

Re: Updating document with IndexWriter#updateDocument doesn't seem to take effect

2024-08-10, Gautam Worah
Hey, I'm not sure I fully understand the email, but I'll try my best. In your printed docs, I see that the flag data is still tokenized. See the string that you printed: `DOCS stored,indexed,tokenized,omitNorms`. What does your code for adding the doc look like? Are you using StringField for adding

Re: Updating document with IndexWriter#updateDocument doesn't seem to take effect

2024-08-10, Wojtek
Addendum; the output is: ``` maxDoc: 3 maxDoc (after second flag): 3 Document stored,indexed,tokenized,omitNorms,indexOptions=DOCS stored,indexed,tokenized,omitNorms,indexOptions=DOCS stored> Document stored,indexed,tokenized,omitNorms,indexOptions=DOCS stored,indexed,tokenized,omitNorms,indexOpti
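When reading `maxDoc` figures like these, it helps to keep in mind that `maxDoc()` also counts deleted documents that have not been merged away yet, while `numDocs()` counts only live ones. The hypothetical toy below (not Lucene code; `ToyUpdateIndex` is an invented name) treats an update the way the IndexWriter#updateDocument javadoc describes it: delete the documents containing the term, then add the new one. It shows that `maxDoc()` can come out the same whether or not the old document was actually replaced, and that `numDocs()` is what tells the two cases apart.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not Lucene API: each document is just its list of
// indexed terms, and deleted documents keep their slot (a tombstone).
class ToyUpdateIndex {
    private final List<List<String>> docs = new ArrayList<>();
    private final List<Boolean> deleted = new ArrayList<>();

    void add(List<String> terms) {
        docs.add(terms);
        deleted.add(false);
    }

    // Update = delete every live doc containing the exact term, then add.
    void update(String term, List<String> newTerms) {
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i) && docs.get(i).contains(term)) {
                deleted.set(i, true);
            }
        }
        add(newTerms);
    }

    // Counts every slot, deleted or not (like maxDoc before merges reclaim space).
    int maxDoc() {
        return docs.size();
    }

    // Counts live documents only.
    int numDocs() {
        int live = 0;
        for (boolean d : deleted) {
            if (!d) {
                live++;
            }
        }
        return live;
    }

    public static void main(String[] args) {
        // Id indexed as one term: the update replaces the old document.
        ToyUpdateIndex exact = new ToyUpdateIndex();
        exact.add(List.of("flags-1-1"));
        exact.add(List.of("flags-1-2"));
        exact.update("flags-1-1", List.of("flags-1-1"));
        System.out.println(exact.maxDoc() + " " + exact.numDocs()); // 3 2

        // Id indexed as tokens: nothing matches the exact term, so the old doc survives.
        ToyUpdateIndex tokenized = new ToyUpdateIndex();
        tokenized.add(List.of("flags", "1", "1"));
        tokenized.add(List.of("flags", "1", "2"));
        tokenized.update("flags-1-1", List.of("flags-1-1"));
        System.out.println(tokenized.maxDoc() + " " + tokenized.numDocs()); // 3 3
    }
}
```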

Re: Updating document with IndexWriter#updateDocument doesn't seem to take effect

2024-08-10, Wojtek
Thank you, Gautam! This works. Now I went back to Lucene and I'm hitting a wall. In James, the document's "id" is constructed as "flag--" (e.g. ""). I run the code that updates the documents with flags and afterwards check the result. In the simple code, I use a new reader from the wri

Re: Updating document with IndexWriter#updateDocument doesn't seem to take effect

2024-08-10, Gautam Worah
Hey, use a StringField instead of a TextField for the title and your test will pass. Tokenization, which is enabled for TextFields, splits your fancy title into space-separated tokens, so your docs don't match. https://lucene.apache.org/core/9_11_0/core/org/apache/lucene/documen
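The effect described here can be reproduced with a tiny library-free inverted index. This is a hypothetical toy, not Lucene's API: `ToyPostings` maps each term to the documents containing it, the title "My Fancy Title" is invented, and splitting on whitespace is a simplification (real analyzers such as StandardAnalyzer also lowercase and handle punctuation). A TextField-like field contributes one term per word, so an exact lookup for the whole title finds nothing; a StringField-like field contributes the whole value as a single term, so the same lookup matches.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy inverted index (term -> doc ids); not Lucene API.
class ToyPostings {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // tokenized=true mimics a TextField (split on whitespace);
    // tokenized=false mimics a StringField (the whole value is one term).
    void index(int docId, String value, boolean tokenized) {
        String[] terms = tokenized ? value.split("\\s+") : new String[] {value};
        for (String t : terms) {
            postings.computeIfAbsent(t, k -> new HashSet<>()).add(docId);
        }
    }

    // Exact single-term lookup, similar in spirit to a TermQuery.
    Set<Integer> termQuery(String term) {
        return postings.getOrDefault(term, Set.of());
    }

    public static void main(String[] args) {
        ToyPostings text = new ToyPostings();   // TextField-like
        text.index(0, "My Fancy Title", true);
        ToyPostings string = new ToyPostings(); // StringField-like
        string.index(0, "My Fancy Title", false);

        System.out.println(text.termQuery("My Fancy Title"));   // []
        System.out.println(text.termQuery("Fancy"));            // [0]
        System.out.println(string.termQuery("My Fancy Title")); // [0]
    }
}
```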

Re: Updating document with IndexWriter#updateDocument doesn't seem to take effect

2024-08-10, Wojtek
Hi Froh, thank you for the information. I updated the code and re-opened the reader. The update now seems to be reflected, since a search for the old document returns nothing, but the search for the new term fails. I printed all documents (there are 2), and the second one has the new title, but when searching

Re: Updating document with IndexWriter#updateDocument doesn't seem to take effect

2024-08-09, Michael Froh
Hi Wojtek, thank you for linking to your test code! When you open an IndexReader, it is locked to a point-in-time view of the Lucene directory as it was when the reader was opened. If you make changes, you'll need to open a new IndexReader before those changes are visible. I see that you tried creating a new IndexS
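The point-in-time behavior can be illustrated with a toy model (hypothetical, not Lucene's API): `ToySnapshotWriter.openReader()` copies the writer's current documents into an immutable `ToySnapshotReader`, so a reader opened before a change never sees it, and only a newly opened reader does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical toy model, not Lucene API: a reader is a frozen copy of the
// writer's documents taken when the reader is opened.
class ToySnapshotReader {
    private final List<String> snapshot;

    ToySnapshotReader(List<String> docs) {
        this.snapshot = Collections.unmodifiableList(new ArrayList<>(docs));
    }

    boolean contains(String doc) {
        return snapshot.contains(doc);
    }
}

class ToySnapshotWriter {
    private final List<String> docs = new ArrayList<>();

    void add(String doc) {
        docs.add(doc);
    }

    // Opening a reader copies the current state; later writes are not visible to it.
    ToySnapshotReader openReader() {
        return new ToySnapshotReader(docs);
    }

    public static void main(String[] args) {
        ToySnapshotWriter writer = new ToySnapshotWriter();
        writer.add("old title");
        ToySnapshotReader before = writer.openReader();
        writer.add("new title");
        System.out.println(before.contains("new title"));              // false
        System.out.println(writer.openReader().contains("new title")); // true
    }
}
```

In Lucene itself, the refresh is typically done with `DirectoryReader.openIfChanged(...)`, which returns `null` when nothing has changed, or with a `SearcherManager` (`maybeRefresh()`, then `acquire()`), which hands out searchers over the refreshed reader.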

Re: updating document

2006-08-15, Karel Tejnora
I'm sending a code snippet showing how to reconstruct UNSTORED fields. It has two parts: DB+terms Class.forName("org.postgresql.Driver").newInstance(); con = DriverManager.getConnection("jdbc:postgresql:lucene", "lucene", "lucene"); PreparedStatement psCompany=con.prepareStatemen

Re: updating document

2006-08-15, Karel Tejnora
Well, you can have! :-) I haven't tested it; it's just an idea. You can get the document id after the add (numDocs()) and then insert into the DB; if the DB insert fails, you can delete the document from the RAMDir. Or, in my case of batches, I'm adding documents to the DB with a savepoint, then creating a clean index (create=true), and at the end if

Re: updating document

2006-08-12, Jason Polites
This strategy can also be nicely abstracted from your main app. Whilst I haven't yet implemented it, my plan is to create a template-style structure that tells me which fields are in Lucene and which are externalized. This way I don't bother storing data in Lucene that is stored elsewhere, but
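Jason's template idea (which he says he has not implemented yet) might look something like the following hypothetical sketch. `FieldTemplate`, `Source`, and `rebuild` are invented names, and the "external record" stands in for whatever store holds the externalized fields, such as the PostgreSQL database in Karel's snippet. The point it illustrates: a document can be fully reassembled before re-indexing, instead of re-adding only what the index happened to store.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not an existing API: a template that records, per field,
// whether its value lives in the Lucene index (stored) or in an external store.
class FieldTemplate {
    enum Source { INDEX, EXTERNAL }

    private final Map<String, Source> fields = new LinkedHashMap<>();

    FieldTemplate field(String name, Source source) {
        fields.put(name, source);
        return this;
    }

    // Assembles every value needed to re-index a document: INDEX fields come
    // from the stored fields read back from the index, EXTERNAL fields from
    // the external record (e.g. a database row). Missing values are skipped.
    Map<String, String> rebuild(Map<String, String> storedFields,
                                Map<String, String> externalRecord) {
        Map<String, String> full = new LinkedHashMap<>();
        for (Map.Entry<String, Source> e : fields.entrySet()) {
            Map<String, String> from =
                    e.getValue() == Source.INDEX ? storedFields : externalRecord;
            String value = from.get(e.getKey());
            if (value != null) {
                full.put(e.getKey(), value);
            }
        }
        return full;
    }

    public static void main(String[] args) {
        FieldTemplate template = new FieldTemplate()
                .field("id", Source.INDEX)
                .field("body", Source.EXTERNAL);
        Map<String, String> doc = template.rebuild(
                Map.of("id", "42"), Map.of("body", "full message text"));
        System.out.println(doc); // {id=42, body=full message text}
    }
}
```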

Re: updating document

2006-08-11, Karel Tejnora
Jason is right. I think (though I'm not a Lucene expert either) that your newly added document can't recreate the terms for the analyzed field, because the field text is empty. There is a very hairy solution: hack an IndexReader and FieldInfosWriter and use addIndexes. Lucene is "only" a full-text search library, n

Re: updating document

2006-08-10, Jason Polites
Unfortunately, yes. I don't think it really has anything to do with the way you access the index. The fact is that the data is simply not in the document. When you add the document again, it is effectively "re-indexed", so if the raw data of the field is empty, then it won't be indexed

Re: updating document

2006-08-10, Deepan Chakravarthy
On Fri, 2006-08-11 at 01:58 +1000, Jason Polites wrote: > Are you storing the contents of the fields in the index? That is, > specifying Field.Store.YES when creating the field? > > In my experience fields which are not stored are not recoverable from the > index (well.. they can be reconstructe

Re: updating document

2006-08-10, Karel Tejnora
Hi, I'm facing a similar problem. I found a possible way to copy part of an index (without copying the whole index, deleting, and optimizing), but I don't know how to change/add/remove a field (or, in my case, add a term vector) in an existing index. To copy part of an index, override methods in IndexReader /** Returns

Re: updating document

2006-08-10, Jason Polites
Are you storing the contents of the fields in the index? That is, specifying Field.Store.YES when creating the field? In my experience, fields that are not stored are not recoverable from the index (well, they can be reconstructed, but it's a lossy process). So when you retrieve the document,

Re: updating document

2006-08-10, Deepan Chakravarthy
On Thu, 2006-08-10 at 09:16 -0400, Erick Erickson wrote: > You say "Those documents that we updated are not searchable now". I've got > to ask the obvious question, did you close and re-open the *searcher* > (really, the indexreader you use in your searcher)? I suspect you have, but > thought I'd a

Re: updating document

2006-08-10, Erick Erickson
You say "Those documents that we updated are not searchable now". I've got to ask the obvious question: did you close and re-open the *searcher* (really, the IndexReader you use in your searcher)? I suspect you have, but thought I'd ask explicitly. I'd also get a copy of Luke (http://www.getopt.o

Re: updating document

2006-08-10, Doron Cohen
Hi Deepan, the steps below seem correct, given that all the fields of the original document are also stored. The javadoc for indexReader.document(int n) (which I assume is what you are using) says: "Returns the stored fields of the nth Document in this index." So only stored fields would exis
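Why re-adding a retrieved document loses unstored fields can be sketched with a hypothetical toy model (not Lucene's API): `ToyStoredIndex` makes every field searchable but keeps values only for the fields named as stored, and its `document(n)` hands back just those, mirroring the javadoc quoted here. Re-adding that result as an "update" silently drops the unstored field, because its text is no longer available to index.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model, not Lucene API: all fields are indexed (searchable),
// but only fields named in storedNames keep a retrievable value.
class ToyStoredIndex {
    private final Map<Integer, Map<String, String>> indexed = new HashMap<>();
    private final Map<Integer, Map<String, String>> stored = new HashMap<>();

    // Adds (or replaces) document n.
    void addDocument(int n, Map<String, String> fields, Set<String> storedNames) {
        indexed.put(n, new HashMap<>(fields));
        Map<String, String> keep = new HashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (storedNames.contains(e.getKey())) {
                keep.put(e.getKey(), e.getValue());
            }
        }
        stored.put(n, keep);
    }

    // Like document(n): returns only the stored fields.
    Map<String, String> document(int n) {
        return new HashMap<>(stored.get(n));
    }

    boolean isSearchable(int n, String field) {
        return indexed.get(n).containsKey(field);
    }

    public static void main(String[] args) {
        ToyStoredIndex index = new ToyStoredIndex();
        index.addDocument(0, Map.of("title", "Hello", "body", "unstored text"), Set.of("title"));

        // Read the document back and re-add it, as in the update steps discussed in this thread.
        Map<String, String> retrieved = index.document(0);
        index.addDocument(0, retrieved, retrieved.keySet());

        System.out.println(index.isSearchable(0, "title")); // true
        System.out.println(index.isSearchable(0, "body"));  // false: its text was never stored
    }
}
```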