Re: How to handle corrupt Lucene index

2022-04-13 Thread Robert Muir
If you are looking at the files in hex, the file format docs for your version are online (https://lucene.apache.org/core/7_3_0/core/org/apache/lucene/index/SegmentInfos.html). SegID is written right after SegName; it is 16 bytes (a 128-bit number). On Wed, Apr 13, 2022 at 10:59 PM Robert Muir
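For reference, here is a minimal JDK-only sketch of decoding one SegName/SegID pair from raw segments_N bytes. It assumes the 7.x encoding, where a string is written as a VInt byte length followed by UTF-8 bytes and SegID is the 16 bytes immediately after it. Finding the entry's offset inside the file is not shown, and the class name `SegIdDump`, its methods, and the sample bytes are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: decodes one <SegName, SegID> pair from raw segments_N bytes.
// Assumes `pos` already points at the start of a SegName entry; finding that
// offset (past the header, versions and counters) is not shown here.
public class SegIdDump {

    // VInt as written by Lucene's DataOutput: 7 data bits per byte,
    // low-order group first, high bit set when more bytes follow.
    // Returns {value, positionAfterVInt}.
    static int[] readVInt(byte[] buf, int pos) {
        int value = 0;
        int shift = 0;
        byte b;
        do {
            b = buf[pos++];
            value |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return new int[] {value, pos};
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // Returns {segName, segIdAsHex}.
    static String[] readSegNameAndId(byte[] buf, int pos) {
        int[] lenAndPos = readVInt(buf, pos);     // SegName: VInt byte length...
        int len = lenAndPos[0];
        pos = lenAndPos[1];
        String name = new String(buf, pos, len, StandardCharsets.UTF_8); // ...then UTF-8 bytes
        pos += len;
        byte[] id = new byte[16];                 // SegID: 16 bytes (128-bit), right after SegName
        System.arraycopy(buf, pos, id, 0, 16);
        return new String[] {name, toHex(id)};
    }

    public static void main(String[] args) {
        // Hypothetical entry: name "_8w" followed by the ID bytes 0x00..0x0f.
        byte[] buf = new byte[1 + 3 + 16];
        buf[0] = 3;
        buf[1] = '_';
        buf[2] = '8';
        buf[3] = 'w';
        for (int i = 0; i < 16; i++) {
            buf[4 + i] = (byte) i;
        }
        String[] entry = readSegNameAndId(buf, 0);
        System.out.println(entry[0] + " " + entry[1]);
    }
}
```

Printing the ID as hex makes it easy to line up with the 16 bytes a hex viewer shows after the segment name.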

Re: How to handle corrupt Lucene index

2022-04-13 Thread Tim Whittington
Yeah, I really appreciate the paranoia in the file format. This is a distributed/replicated database (I'd forgotten to mention that until you mentioned distributed replication), so I suspect the database server is shunting actual segment files around during a recovery process and getting things mu

Re: How to handle corrupt Lucene index

2022-04-13 Thread Robert Muir
Honestly, the only time I've seen mixed-up files before (and the motivation for the paranoid checks in Lucene) was bugs in some distributed replication code. In that case, code that was copying files across the network had some bugs (e.g. used hashing of file contents to try to reduce network ch
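One of those paranoid checks is the checksummed footer at the end of each index file. Below is a minimal JDK-only sketch of verifying it, something copy/replication code could run on a received file before installing it. It assumes the 7.x footer layout: the last 16 bytes are FOOTER_MAGIC (`~CODEC_MAGIC`), an algorithm ID of 0, and a CRC32 of all bytes before the checksum field, stored as a big-endian long. `FooterCheck` and its methods are hypothetical names. A valid checksum only shows the bytes are intact; a complete file that belongs to a different segment would still pass, which is what the segment ID in the file header is there to catch.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Sketch only: verifies a Lucene 7.x-style codec footer on a file held in memory.
// Assumed footer (last 16 bytes, big-endian): FOOTER_MAGIC (int), algorithm ID (int, 0),
// checksum (long) = CRC32 of every byte that precedes the checksum field.
public class FooterCheck {
    static final int CODEC_MAGIC = 0x3fd76c17;
    static final int FOOTER_MAGIC = ~CODEC_MAGIC;
    static final int FOOTER_LENGTH = 16;

    static boolean footerOk(byte[] file) {
        if (file.length < FOOTER_LENGTH) {
            return false;
        }
        // ByteBuffer is big-endian by default, matching the assumed layout.
        ByteBuffer footer = ByteBuffer.wrap(file, file.length - FOOTER_LENGTH, FOOTER_LENGTH);
        if (footer.getInt() != FOOTER_MAGIC || footer.getInt() != 0) {
            return false;
        }
        long stored = footer.getLong();
        CRC32 crc = new CRC32();
        crc.update(file, 0, file.length - 8); // everything except the checksum itself
        return stored == crc.getValue();
    }

    // Hypothetical helper for the demo: appends a footer to arbitrary bytes.
    static byte[] withFooter(byte[] body) {
        ByteBuffer buf = ByteBuffer.allocate(body.length + FOOTER_LENGTH);
        buf.put(body).putInt(FOOTER_MAGIC).putInt(0);
        CRC32 crc = new CRC32();
        crc.update(buf.array(), 0, body.length + 8);
        buf.putLong(crc.getValue());
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] file = withFooter("segment bytes".getBytes());
        System.out.println("intact: " + footerOk(file));
        file[3] ^= 0x01; // simulate a single flipped bit in transit
        System.out.println("after bit flip: " + footerOk(file));
    }
}
```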

Re: How to handle corrupt Lucene index

2022-04-13 Thread Tim Whittington
Using a known-broken Lucene index directory, I dropped down to the Lucene API and tracked this down a bit further. My directory listing is this:
  17 Mar 13:39 _8w.fdt
  17 Mar 13:39 _8w.fdx
  17 Mar 13:39 _8w.fnm
  17 Mar 13:39 _8w.nvd
  17 Mar 13:39 _8w.nvm
  17 Mar 13:39 _8w.si
  17 Mar 13:
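When working from a raw listing like the one above, it can help to group files by segment before comparing them with what segments_N references. A minimal JDK-only sketch, assuming Lucene's file naming convention where the segment name is the prefix up to the second `_` (e.g. `_8w_Lucene50_0.doc`) or, failing that, up to the first `.` (e.g. `_8w.fdt`); to the best of my knowledge this mirrors `IndexFileNames.parseSegmentName`. `SegmentFiles` and its methods are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: groups per-segment files from a directory listing by segment name.
public class SegmentFiles {

    // Segment name = prefix before the second '_', else before the first '.'.
    static String segmentName(String fileName) {
        int idx = fileName.indexOf('_', 1);
        if (idx == -1) {
            idx = fileName.indexOf('.');
        }
        return idx == -1 ? fileName : fileName.substring(0, idx);
    }

    static Map<String, List<String>> groupBySegment(List<String> fileNames) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String f : fileNames) {
            if (!f.startsWith("_")) {
                continue; // not a per-segment file (e.g. segments_N, write.lock)
            }
            groups.computeIfAbsent(segmentName(f), k -> new ArrayList<>()).add(f);
        }
        return groups;
    }

    public static void main(String[] args) {
        // The complete entries from the listing above.
        List<String> listing = Arrays.asList(
                "_8w.fdt", "_8w.fdx", "_8w.fnm", "_8w.nvd", "_8w.nvm", "_8w.si");
        System.out.println(groupBySegment(listing));
    }
}
```

Each group can then be checked against the segment list in the latest commit; a group with no matching commit entry, or a commit entry with missing files, is a good place to start looking.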

Re: How to handle corrupt Lucene index

2022-04-13 Thread Baris Kazar
Yes, that is a great point to look at first, and it would eliminate any JDBC-related issues that may lead to such problems. Best regards From: Tim Whittington Sent: Wednesday, April 13, 2022 9:17:44 PM To: java-user@lucene.apache.org Subject: Re: How to handle co

Re: How to handle corrupt Lucene index

2022-04-13 Thread Baris Kazar
That is good practice, and I pointed it out because I saw Lucene 7.0 in the stack trace. Best regards From: Tim Whittington Sent: Wednesday, April 13, 2022 9:15 PM To: java-user@lucene.apache.org Subject: Re: How to handle corrupt Lucene index To be clear, th

Re: How to handle corrupt Lucene index

2022-04-13 Thread Tim Whittington
Thanks for this - I'll have a look at the database server code that is managing the Lucene indexes and see if I can track it down. Tim On Thu, 14 Apr 2022 at 12:41, Robert Muir wrote: > On Wed, Apr 13, 2022 at 8:24 PM Tim Whittington > wrote: > > > > I'm working with/on a database system that

Re: How to handle corrupt Lucene index

2022-04-13 Thread Tim Whittington
To be clear, these indexes are created and read with the same Lucene version (7.3.0). Tim On Thu, 14 Apr 2022 at 12:45, Baris Kazar wrote: > In my experience that if you built index at version x then use index also > in version x. > I never encountered any problems this way with Lucene. > > Can

Re: How to handle corrupt Lucene index

2022-04-13 Thread Baris Kazar
In my experience, if you build the index with version x, then also read it with version x. I have never encountered any problems with Lucene this way. Can you recreate the Lucene index on 7.3.0? Also, how do you use the database in your scenario? Are you using JDBC-like operations, as in an Oracle database?

Re: How to handle corrupt Lucene index

2022-04-13 Thread Robert Muir
On Wed, Apr 13, 2022 at 8:24 PM Tim Whittington wrote: > > I'm working with/on a database system that uses Lucene for full text > indexes (currently using 7.3.0). > We're encountering occasional problems that occur after unclean shutdowns > of the database , resulting in > "org.apache.lucene.index

How to handle corrupt Lucene index

2022-04-13 Thread Tim Whittington
I'm working with/on a database system that uses Lucene for full-text indexes (currently using 7.3.0). We're encountering occasional problems after unclean shutdowns of the database, resulting in "org.apache.lucene.index.CorruptIndexException: file mismatch" errors when the IndexWriter i
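As far as I know, in 7.x the "file mismatch" message comes from the index-header check: each per-segment file starts with a header carrying a 16-byte ID, and Lucene compares it with the ID of the segment it expects (the SegID from segments_N). A JDK-only sketch of that comparison, assuming the 7.x header layout of CODEC_MAGIC (int), codec name (VInt length + UTF-8), version (int), 16-byte ID, then a one-byte suffix length and suffix. `HeaderIdCheck`, its methods and the codec name `DemoCodec` are hypothetical, and the sketch throws `IllegalStateException` where Lucene throws `CorruptIndexException`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch only: checks that a file's index-header ID matches the expected segment ID.
public class HeaderIdCheck {
    static final int CODEC_MAGIC = 0x3fd76c17;

    static int readVInt(ByteBuffer in) {
        int value = 0;
        int shift = 0;
        byte b;
        do {
            b = in.get();
            value |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return value;
    }

    static byte[] readHeaderId(byte[] file) {
        ByteBuffer in = ByteBuffer.wrap(file);   // big-endian, as assumed for 7.x
        if (in.getInt() != CODEC_MAGIC) {
            throw new IllegalArgumentException("not a codec header");
        }
        int nameLen = readVInt(in);
        in.position(in.position() + nameLen);    // skip codec name
        in.getInt();                             // skip format version
        byte[] id = new byte[16];
        in.get(id);
        return id;
    }

    static void checkId(byte[] file, byte[] expectedId) {
        if (!Arrays.equals(readHeaderId(file), expectedId)) {
            throw new IllegalStateException("file mismatch: header ID differs from segment ID");
        }
    }

    // Hypothetical demo helper; assumes the codec name is under 128 bytes (1-byte VInt).
    static byte[] header(String codec, int version, byte[] id) {
        byte[] name = codec.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 1 + name.length + 4 + 16 + 1);
        buf.putInt(CODEC_MAGIC).put((byte) name.length).put(name)
           .putInt(version).put(id).put((byte) 0); // empty suffix
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] segId = new byte[16];
        segId[0] = 1;
        byte[] otherId = new byte[16];
        otherId[0] = 2;
        byte[] file = header("DemoCodec", 1, segId);
        checkId(file, segId);              // matching segment: no exception
        try {
            checkId(file, otherId);        // file copied from a different segment
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This is why a replication step that swaps in an intact file from another segment (or another copy of the index) is reported as "file mismatch" rather than as a checksum failure.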