Thanks very much for all your suggestions.
I will work through these to see what works. Appreciate that indexing takes
many hours, so it will take me a few days. Working with a subset isn't
really indicative, since the problems only manifest with larger indexes.
(Note that this might be a solut
Not sure the numbers are off w/ documents that big, although I imagine
you are hitting the token limit with docs that size. Is this all on one
machine as you described, or are you saying you have a couple of
these? If one, have you tried having just one index?
Since you are using 2.3 (note t
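(For context: in Lucene 2.3 the default token limit is 10,000 terms per
field (IndexWriter.DEFAULT_MAX_FIELD_LENGTH), and tokens past it are
silently dropped. A minimal sketch of raising it, with dir and analyzer
assumed to already exist:)

  // By default IndexWriter indexes only the first 10,000 terms of each
  // field and silently discards the rest.
  IndexWriter writer = new IndexWriter(dir, analyzer, false);
  writer.setMaxFieldLength(Integer.MAX_VALUE); // index every token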
Hi. Here are a couple of thoughts:
1. Your problem description would be a little easier to parse if you didn't use
the word "stored" to refer to fields which are not, in a Lucene sense, stored,
only indexed. For example, one doesn't "store" stemmed and unstemmed versions,
since stemming has ab
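(In Lucene terms, "stored" and "indexed" are independent flags on a
Field; a quick sketch against the 2.3 API, field names invented:)

  // Indexed but NOT stored: searchable, but the original text is not kept.
  doc.add(new Field("body_stemmed", text, Field.Store.NO, Field.Index.TOKENIZED));
  // Stored but NOT indexed: retrievable with a hit, but not searchable.
  doc.add(new Field("raw_xml", xml, Field.Store.YES, Field.Index.NO));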
For a start, I would lower the merge factor quite a bit. A high merge
factor is overrated :) You will build the index faster, but searches
will be slower and an optimize takes much longer. Essentially, the time
you save when indexing is paid when optimizing anyway. You might as well
amortize t
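(As a sketch of the suggested change, assuming the Lucene 2.3
IndexWriter and a writer that already exists:)

  writer.setMergeFactor(10); // back to the default; high values (e.g. 50)
                             // mostly defer merge cost to optimize()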
Hi,
Thanks for your help.
I'm using Lucene 2.3.
Raw document size is about 138G for 1.5M documents, which is about
92k per document.
IndexWriter settings are MergeFactor 50, MaxMergeDocs 2000,
RAMBufferSizeMB 32, MaxFieldLength Integer.MAX_VALUE.
Each document has about 10 short bibliographic
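(For reference, those settings correspond roughly to this, as a sketch
against the Lucene 2.3 API with dir and analyzer as placeholders:)

  IndexWriter writer = new IndexWriter(dir, analyzer, true);
  writer.setMergeFactor(50);                   // many segments before a merge
  writer.setMaxMergeDocs(2000);                // cap merged segments at 2000 docs
  writer.setRAMBufferSizeMB(32.0);             // flush at 32 MB of buffered docs
  writer.setMaxFieldLength(Integer.MAX_VALUE); // never truncate a document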
Hmmm, something doesn't sound quite right. You have 10 million docs,
split into 5 or so indexes, right? And each sub index is 150
gigabytes? How big are your documents?
Can you provide more info about what your Directory and IndexWriter
settings are? What version of Lucene are you using?
Hi,
Optimizing my index of 1.5 million documents takes days and days.
I have a collection of 10 million documents that I am trying to index
with Lucene. I've divided the collection into chunks of about 1.5 - 2
million documents each. Indexing 1.5 million documents is fast enough (about
12 hours), but t
Ryan, can you post the output of CheckIndex on your now-working index?
(1800 is still too many files I think, certainly after having
optimized).
ok, 1800 was wrong - that was from a botched attempt where I:
1. ran optimize on the broken 18K file index. It crashed midway through.
2. ran Check
"Yonik Seeley" <[EMAIL PROTECTED]> wrote:
> On Nov 11, 2007 1:59 PM, Michael McCandless <[EMAIL PROTECTED]>
> wrote:
> > Ryan, can you share any details of how you (Solr) are using Lucene? Are
> > you using autoCommit=false? I'd really love to get to the root cause
> > here.
>
> Unfortunately, So
On Nov 11, 2007 1:59 PM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> Ryan, can you share any details of how you (Solr) are using Lucene? Are
> you using autoCommit=false? I'd really love to get to the root cause
> here.
Unfortunately, Solr and Lucene both have something called autocommit now.
"Ryan McKinley" <[EMAIL PROTECTED]> wrote:
> I just used the CheckIndex tool to try to salvage a corrupt index
> (http://www.nabble.com/restoring-a-corrupt-index--tf4783866.html)
>
> It's a great tool, thanks!
Phew! I think you are the first user (besides me).
> I'm wondering about adding suppor
"Yonik Seeley" <[EMAIL PROTECTED]> wrote:
> On Nov 11, 2007 12:48 PM, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> > > Ryan are you able to update to that commit I just did? If so I think
> > > you should run the tool without -fix and post back what it printed. It
> > > should report an error on t
On Nov 11, 2007 12:48 PM, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> > Ryan are you able to update to that commit I just did? If so I think
> > you should run the tool without -fix and post back what it printed. It
> > should report an error on that one segment due to the missing file.
> > Then,
I just used the CheckIndex tool to try to salvage a corrupt index
(http://www.nabble.com/restoring-a-corrupt-index--tf4783866.html)
It's a great tool, thanks!
I'm wondering about adding support for this tool in the solr admin
interface, but have a few questions about how it works before I see if
Ryan are you able to update to that commit I just did? If so I think
you should run the tool without -fix and post back what it printed. It
should report an error on that one segment due to the missing file.
Then, run -fix to remove that segment (please backup your index first!).
Then, if you
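(For anyone trying this at home, the invocations look roughly like this;
the classpath and index path are placeholders, and -fix deletes broken
segments, so back up the index first:)

  # report problems only (read-only)
  java -cp lucene-core.jar org.apache.lucene.index.CheckIndex /path/to/index
  # remove any broken segments, losing the documents they contain
  java -cp lucene-core.jar org.apache.lucene.index.CheckIndex /path/to/index -fix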
On Nov 11, 2007 4:55 AM, Jason Bradfield <[EMAIL PROTECTED]> wrote:
> Basically, for any documents returned from a search, if they have the same
> score I need them to be returned in a round-robin type of ordering based
> on previous searches with the same query.
>
> ie. I have documents A, B and C,
Thank you for your reply.
The thing is, I'm trying to implement a weight for a word when indexing HTML
web pages.
The formula is something like:
Weight(word in doc d) = ...*50% + ...*20% + ...*10% + ...
The code is:
doc.add(new Field("url", httpd.
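(If the goal is per-field weights like that, one common approach is
index-time boosts; a hedged sketch against the Lucene 2.3 Field API,
with field names, variables, and boost values invented for illustration --
note boosts multiply scores rather than add percentages:)

  Document doc = new Document();
  Field title = new Field("title", titleText, Field.Store.YES, Field.Index.TOKENIZED);
  title.setBoost(5.0f);  // stands in for the 50% component
  Field body = new Field("body", bodyText, Field.Store.NO, Field.Index.TOKENIZED);
  body.setBoost(2.0f);   // stands in for the 20% component
  doc.add(title);
  doc.add(body);
  doc.add(new Field("url", url, Field.Store.YES, Field.Index.UN_TOKENIZED));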
Hi,
I am quite new to Lucene; I've read most of the documentation and can't
find what I need.
Basically, for any documents returned from a search, if they have the same
score I need them to be returned in a round-robin type of ordering based
on previous searches with the same query.
ie. I have
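(There is no built-in Lucene feature for this as far as I know; one
sketch is to post-process each page of results, rotating every run of
equal-score hits by a per-query counter. Everything below is an invented
helper, not a Lucene API:)

  import java.util.HashMap;
  import java.util.Map;

  // Re-orders one page of results so that runs of equal-score hits
  // rotate on each repeat of the same query string.
  public class RoundRobinTieBreaker {
      private final Map<String, Integer> timesSeen = new HashMap<String, Integer>();

      // hits and scores are parallel arrays, already sorted by descending score.
      public synchronized int[] reorder(String query, int[] hits, float[] scores) {
          Integer prev = timesSeen.get(query);
          int n = (prev == null) ? 0 : prev.intValue();
          timesSeen.put(query, n + 1);

          int[] out = (int[]) hits.clone();
          int start = 0;
          while (start < out.length) {
              int end = start + 1;
              while (end < out.length && scores[end] == scores[start]) {
                  end++;
              }
              rotate(out, start, end, n); // shift each tie group by n positions
              start = end;
          }
          return out;
      }

      private static void rotate(int[] a, int from, int to, int shift) {
          int len = to - from;
          if (len < 2) {
              return;
          }
          int[] tmp = new int[len];
          for (int i = 0; i < len; i++) {
              tmp[i] = a[from + (i + shift) % len];
          }
          System.arraycopy(tmp, 0, a, from, len);
      }
  }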