Multi-threaded indexing can speed things up. Use two threads per CPU
to get maximum throughput. I wrote a simple Python program to do that.
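
A minimal sketch of that approach, not the original program. It assumes a
Solr core named "mycore" on localhost and documents that are already parsed
into dicts. The URL and the names batches, post_batch and index_all are
illustrative, not taken from this thread. os.cpu_count() counts the CPUs of
the machine running the script, so pass workers explicitly when Solr runs on
another host.

```python
import json
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: a local Solr core named "mycore"; adjust to your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update"


def batches(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def post_batch(batch, url=SOLR_UPDATE_URL):
    """POST one batch of documents as a JSON array to Solr's update handler."""
    request = urllib.request.Request(
        url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        response.read()
    return len(batch)


def index_all(docs, send=post_batch, batch_size=1000, workers=None):
    """Send batches from a pool of threads; return the number of docs sent."""
    if workers is None:
        workers = 2 * (os.cpu_count() or 1)  # two threads per CPU
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map submits every batch up front, so for a very large corpus
        # call index_all on one manageable chunk of documents at a time.
        return sum(pool.map(send, batches(docs, batch_size)))
```

The sketch never commits. It relies on autoCommit in solrconfig.xml or on a
single explicit commit once all batches are in, since committing after every
batch would slow indexing down.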

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Apr 6, 2025, at 5:11 PM, Robi Petersen <robip...@gmail.com> wrote:
> 
> Hi Bruno,
> 
> As an aside, in general you'd want your staging (pre-prod) Solr instance to
> match your production Solr instance in every way possible, including the
> Solr version.
> 
> Another thought is to have several indexing machines, each pointing at a
> portion of those 200M textfiles, to speed up indexing the entire corpus.
> 
> Cheers
> Robi
> 
> On Sat, Apr 5, 2025 at 4:08 AM Bruno Mannina <bmann...@matheo-software.com>
> wrote:
> 
>> Hi Colvin,
>> 
>> Thanks for your answer and your link. I will see if I can solve my problem.
>> 
>> I am using an old Solr, I know :'(.
>> This old version has been in use for several years, and I have a huge data
>> set (around 200M text files to index).
>> Re-indexing it would take too much time for me (several weeks).
>> 
>> It's a pre-production Solr (I use Solr 8.11.3 in production).
>> This pre-production instance is used to check data before loading it into
>> production.
>> 
>> 
>> Best regards
>> Bruno Mannina
>> www.matheo-software.com
>> www.patent-pulse.com
>> Mob. +33 0 634 421 817
>> 
>> 
>> -----Original Message-----
>> From: Colvin Cowie [mailto:colvin.cowie....@gmail.com]
>> Sent: Friday, April 4, 2025 11:57
>> To: users@solr.apache.org
>> Subject: Re: Solr error...
>> 
>> Hello,
>> 
>> I think we might need some more context here, that is to say, why are you
>> using Solr 5.5.1? That was released in 2016 and is very much out of date
>> and unsupported (and will contain a number of critical CVEs).
>> So rather than trying to make it work, can you instead move to the latest
>> release (9.8.1)? A lot of things have changed in the last 9 years, so maybe
>> consider it as a fresh start?
>> 
>> By the sounds of the error, the *file* is now corrupt; that doesn't mean
>> the disk is corrupt. The reason it happened is probably not going to be
>> apparent, though if you go back through your logs you might identify
>> the cause.
>> A little googling of org.apache.lucene.index.CorruptIndexException
>> suggests that you may be able to "fix" the corrupt index (and lose the
>> corrupted documents in the process): https://stackoverflow.com/a/14934177
>> 
>> But I would seriously recommend that you move to a supported version and
>> reindex your data from source instead either way.
>> 
>> 
>> 
>> On Thu, 3 Apr 2025 at 23:58, Bruno Mannina <bmann...@matheo-software.com>
>> wrote:
>> 
>>> Hi All,
>>> 
>>> On my new computer, which runs Solr 5.5.1, I have a collection with an
>>> error.
>>> 
>>> My new computer is 1.5 years old (4 x 4 TB NVMe).
>>> 
>>> I checked my disks and found no errors?!
>>> 
>>> Do you know if there is anything I can do to fix it?
>>> 
>>> Many thanks for your help!
>>> 
>>> The error message is:
>>> 
>>> 
>>> 
>>> java.lang.IllegalStateException: this writer hit an unrecoverable error; cannot complete commit
>>>         at org.apache.lucene.index.IndexWriter.finishCommit(IndexWriter.java:2985)
>>>         at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2970)
>>>         at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2930)
>>>         at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:619)
>>>         at org.apache.solr.update.UpdateLog$LogReplayer.doReplay(UpdateLog.java:1464)
>>>         at org.apache.solr.update.UpdateLog$LogReplayer.run(UpdateLog.java:1264)
>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>>>         at java.util.concurrent.FutureTask.run(Unknown Source)
>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>>>         at java.util.concurrent.FutureTask.run(Unknown Source)
>>>         at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:231)
>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>>>         at java.lang.Thread.run(Unknown Source)
>>> Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=d0a2833f actual=64e63211 (resource=BufferedChecksumIndexInput(MMapIndexInput(path="C:\Users\Utilisateur\INDEX\FTCLAIMS\index\_8znd.cfs") [slice=_8znd.fdt]))
>>>         at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:334)
>>>         at org.apache.lucene.codecs.CodecUtil.checksumEntireFile(CodecUtil.java:451)
>>>         at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.checkIntegrity(CompressingStoredFieldsReader.java:669)
>>>         at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:595)
>>>         at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:177)
>>>         at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:83)
>>>         at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4075)
>>>         at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3655)
>>>         at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
>>>         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)
>>> 
>>> Best regards
>>> Bruno Mannina
>>> www.matheo-software.com
>>> www.patent-pulse.com
>>> Mob. +33 0 634 421 817
>>> 
>>> --
>>> This e-mail has been checked by Avast antivirus software.
>>> www.avast.com
>> 
>> 
