Re: corrupted index Lucene 4.4

2013-10-29 Thread Chris
Hi Mike, I changed my program and indexing is better now. However, I have run into another issue: I get characters like - �� - CTA - in the Solr index. I am adding Java beans to Solr with the addBean() function. This seems to be a character encoding issue. Any poi

Re: corrupted index Lucene 4.4

2013-10-23 Thread Chris
Hi Mike, Thanks, I have asked there as well; they are investigating, and I will let you know if something turns up on that front :) On Thu, Oct 24, 2013 at 1:30 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > Hi Chris, > > Sorry, I don't know much about Solr cloud; maybe ask on the solr-use

Re: corrupted index Lucene 4.4

2013-10-23 Thread Michael McCandless
Hi Chris, Sorry, I don't know much about Solr cloud; maybe ask on the solr-user list, and give details about what went wrong? Mike McCandless http://blog.mikemccandless.com On Wed, Oct 23, 2013 at 11:25 AM, Chris wrote: > Wow !!! Thanks a lot for the helpful tips. I will implement this in the

Re: corrupted index Lucene 4.4

2013-10-23 Thread Chris
Wow !!! Thanks a lot for the helpful tips. I will implement this in the next two days & report back with my indexing speed. I have one more question... I tried committing to Solr cloud, but something was not correct, as it would stop indexing after a few documents... Also, there seems to be som

Re: corrupted index Lucene 4.4

2013-10-23 Thread Michael McCandless
Indexing 100M web pages really should not take months; if you fix committing after every row that should make things much faster. Use multiple index threads, set a highish RAM buffer (~512 MB), use a local disk not a remote mounted fileserver, ideally an SSD, etc. See http://wiki.apache.org/lucen
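A minimal sketch of the "don't commit after every row" advice above, assuming documents are sent in batches with a single commit at the end. The DocSink interface and the BatchedIndexer class are hypothetical stand-ins invented for this illustration; they are not Solr or Lucene types, and a real client would be plugged in behind the interface.

```java
import java.util.ArrayList;
import java.util.List;

class BatchedIndexer {
    /** Hypothetical stand-in for an index client; NOT a Solr or Lucene type. */
    interface DocSink<T> {
        void add(List<T> docs);
        void commit();
    }

    /**
     * Sends rows to the sink in batches of batchSize and commits once at the end,
     * instead of committing after every row. Returns the number of batches sent.
     */
    static <T> int indexAll(Iterable<T> rows, DocSink<T> sink, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> buffer = new ArrayList<>(batchSize);
        int batches = 0;
        for (T row : rows) {
            buffer.add(row);
            if (buffer.size() == batchSize) {
                sink.add(new ArrayList<>(buffer)); // hand over a copy, then reuse the buffer
                buffer.clear();
                batches++;
            }
        }
        if (!buffer.isEmpty()) {
            sink.add(new ArrayList<>(buffer)); // flush the final partial batch
            batches++;
        }
        sink.commit(); // one commit for the whole run
        return batches;
    }
}
```

The batch size is a tuning knob chosen for the sketch; the thread itself only gives the RAM buffer figure (~512 MB) and the advice to use multiple index threads and a local disk.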

Re: corrupted index Lucene 4.4

2013-10-23 Thread Chris
Actually, it contains about 100 million webpages and was built out of a web index for NLP processing :( I did the indexing & crawling on one small server, and researching and getting it all to this stage took me this much time... and now my index is unusable :( On Wed, Oct 23, 2013 at

Re: corrupted index Lucene 4.4

2013-10-23 Thread Michael McCandless
On Wed, Oct 23, 2013 at 10:33 AM, Chris wrote: > I am not exactly sure if the commit() was run, as i am inserting each row & > doing a commit right away. My solr will not load the index I'm confused: if you are doing a commit right away after every row (which is REALLY bad practice: that's in

Re: corrupted index Lucene 4.4

2013-10-23 Thread Chris
I am not exactly sure if the commit() was run, as I am inserting each row & doing a commit right away. My Solr will not load the index. Is there any way that I can fix this? I have a huge index & will lose months if I try to reindex :( I didn't know Lucene was not stable, I thought it was On W

Re: corrupted index Lucene 4.4

2013-10-23 Thread Michael McCandless
Hmm. Had you actually run a commit() on the index prior to the power loss? If so, a power loss should have left the index as of that last commit. Unfortunately, without a segments_N file, CheckIndex is unusable; a readable segments_N file is currently necessary to recover anything from the index

Re: corrupted index Lucene 4.4

2013-10-23 Thread Chris
Hi Mike, Thanks for the reply. I think it was due to power outage. I don't see any segments file except for segments.gen this is what i see in the folder. Please help - - _a73s_7sy.del _s91x.tvx _sa7s_Lucene41_0.tip _a73s.fdt _s9ez_9.del _sa7s.nvd _a

Re: corrupted index Lucene 4.4

2013-10-23 Thread Michael McCandless
How did this corruption happen? If you "ls" your index directory, is there any segments_N file? Mike McCandless http://blog.mikemccandless.com On Wed, Oct 23, 2013 at 9:01 AM, Chris wrote: > Hi, > > I am running solr 4.4 & one of my collections seems to have a corrupted > index... > > I tried
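The "ls" check asked for above can be sketched as a small stand-alone program. It assumes that a commit point is any file whose name starts with "segments_", and that segments.gen does not count as one; the class and method names are made up for this illustration, and the program does not use Lucene itself.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

class SegmentsCheck {
    /** Returns the names that look like segments_N commit files, sorted by name. */
    static List<String> commitFileNames(Collection<String> fileNames) {
        List<String> found = new ArrayList<>();
        for (String name : fileNames) {
            // "segments.gen" does not match: it has no underscore after "segments".
            if (name.startsWith("segments_") && name.length() > "segments_".length()) {
                found.add(name);
            }
        }
        Collections.sort(found);
        return found;
    }

    public static void main(String[] args) {
        String path = args.length > 0 ? args[0] : ".";
        String[] names = new File(path).list(); // null if path is not a readable directory
        if (names == null) {
            System.err.println("Not a readable directory: " + path);
            return;
        }
        List<String> commits = commitFileNames(Arrays.asList(names));
        if (commits.isEmpty()) {
            System.out.println("No segments_N file found in " + path);
        } else {
            System.out.println("Found commit file(s): " + commits);
        }
    }
}
```

Run it with the index directory as the only argument. A directory that holds only segments.gen, as described elsewhere in this thread, would report that no segments_N file was found.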

Re: Corrupted index

2005-04-11 Thread Doug Cutting
Bill Tschumy wrote: So, did this happen because he copied the data while in an inconsistent state? I'm a bit surprised that an inconsistent index is ever left on disk (except for temporarily while something is being written). Would this happen if there was a Writer that was not closed? An inde

Re: Corrupted index

2005-04-11 Thread Doug Cutting
Daniel Naber wrote: Yes, the *.cfs shows that this is a compound index which has *.fnm files only when it's being modified. When creating a compound segment, a "segments" file is never written that refers to the segment until the .cfs file is created and the .fnm files are removed. The real pro

Re: Corrupted index

2005-04-11 Thread Bill Tschumy
Daniel, Thanks for responding on this thread. I doubt the copy was made while the index was being updated and I don't see any indication of a crash. Just for my clarification, if I update the index, but don't close the IndexWriter (because I may need it again soon), can the index on disk be le

Re: Corrupted index

2005-04-08 Thread Daniel Naber
On Friday 08 April 2005 23:51, Bill Tschumy wrote: > Would > this happen if there was a Writer that was not closed? Either the copy was done while the index was being updated, or the previous index update didn't finish (e.g. because it crashed before the index was closed). Regards Daniel --

Re: Corrupted index

2005-04-08 Thread Bill Tschumy
So, did this happen because he copied the data while in an inconsistent state? I'm a bit surprised that an inconsistent index is ever left on disk (except for temporarily while something is being written). Would this happen if there was a Writer that was not closed? On Apr 8, 2005, at 1:22 PM

Re: Corrupted index

2005-04-08 Thread Daniel Naber
On Friday 08 April 2005 19:26, Bill Tschumy wrote: > The only thought I had was that he copied the data while the app was  > still running and perhaps it was in an inconsistent state. Yes, the *.cfs shows that this is a compound index which has *.fnm files only when it's being modified. You're
