Re: Updating specific fields of huge docs

2019-02-14 Thread Marcio Napoli
Hi Luís, If the contents of the files don't change, one solution is to store the text parsed by Tika in compressed form (~7% of the extracted text size). When updating the document, just look up the old one, which already holds the compressed contents, and update the other fields that you need. Best, Marcio http://w
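
A minimal sketch of the compress-and-keep idea above, using only java.util.zip from the JDK. The class name CompressedText and its helper methods are hypothetical, not part of Tika or Lucene, and the ratio you actually get depends on the text (the ~7% figure is Marcio's, not something this code guarantees). The resulting byte[] could, for instance, be kept in a stored binary field and decompressed when the document has to be rebuilt.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: packs Tika-extracted text so it can be kept
// alongside the document and reused on later updates.
final class CompressedText {

    // Deflate the UTF-8 bytes of the text at the highest compression level.
    static byte[] compress(String text) {
        byte[] input = text.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inflate the bytes back into the original text.
    static String decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No progress and nothing left to read: the input is broken.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or invalid compressed data");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("invalid compressed data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("text extracted by Tika, line ").append(i).append('\n');
        }
        String text = sb.toString();
        byte[] packed = compress(text);
        System.out.println("original bytes:   " + text.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("compressed bytes: " + packed.length);
        System.out.println("round trip ok:    " + decompress(packed).equals(text));
    }
}
```

Note that this only avoids re-running Tika; as long as the text field is indexed, the analyzer still processes the decompressed text again when the document is re-added.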

Re: Updating specific fields of huge docs

2019-02-14 Thread Luís Filipe Nassif
Thank you, Erick. Unfortunately we need to index those fields. Currently we do not store the text because of storage requirements, and it is slow to extract it again. Thank you for the tips. Luis On Wed, Feb 13, 2019 at 18:13, Erick Erickson If (and only if) the fields you need to update are sing

Re: Updating specific fields of huge docs

2019-02-13 Thread Erick Erickson
If (and only if) the fields you need to update are single-valued, docValues=true, indexed=false, you can do an in-place update of just the docValues field. Otherwise, you'll probably have to split the docs up. The question is whether you have evidence that reindexing is too expensive. If you do need to spl

Updating specific fields of huge docs

2019-02-13 Thread Luís Filipe Nassif
Hi all, As I understand it, Lucene 7 still deletes and re-adds docs when an update operation is done. When docs have dozens of fields, one of which is large text content (extracted by Tika), and I need to update some other small fields, what is the best approach to not reindex that large text fi