Thanks very much for all your suggestions.
I will work through these to see what works. Appreciate that indexing takes
many hours, so it will take me a few days. Working with a subset isn't
really indicative, since the problems only manifest with larger indexes.
(Note that this might be a solut
Not sure the numbers are off w/ documents that big, although I imagine
you are hitting the token limit with docs that size. Is this all on one
machine as you described, or are you saying you have a couple of
these? If one, have you tried having just one index?
Since you are using 2.3 (note t
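(For context: in Lucene 2.3 the default token limit is 10,000 terms per
field (IndexWriter.DEFAULT_MAX_FIELD_LENGTH), and tokens past it are
silently dropped. A minimal sketch of raising it, with dir and analyzer
assumed to already exist:)

  // By default IndexWriter indexes only the first 10,000 terms of each
  // field and silently discards the rest.
  IndexWriter writer = new IndexWriter(dir, analyzer, false);
  writer.setMaxFieldLength(Integer.MAX_VALUE); // index every token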
Hi. Here are a couple of thoughts:
1. Your problem description would be a little easier to parse if you didn't use
the word "stored" to refer to fields which are not, in a Lucene sense, stored,
only indexed. For example, one doesn't "store" stemmed and unstemmed versions,
since stemming has ab
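(In Lucene terms, "stored" and "indexed" are independent flags on a
Field; a quick sketch against the 2.3 API, field names invented:)

  // Indexed but NOT stored: searchable, but the original text is not kept.
  doc.add(new Field("body_stemmed", text, Field.Store.NO, Field.Index.TOKENIZED));
  // Stored but NOT indexed: retrievable with a hit, but not searchable.
  doc.add(new Field("raw_xml", xml, Field.Store.YES, Field.Index.NO));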
For a start, I would lower the merge factor quite a bit. A high merge
factor is overrated :) You will build the index faster, but searches
will be slower and an optimize takes much longer. Essentially, the time
you save when indexing is paid when optimizing anyway. You might as well
amortize t
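(As a sketch of the suggested change, assuming the Lucene 2.3
IndexWriter and a writer that already exists:)

  writer.setMergeFactor(10); // back to the default; high values (e.g. 50)
                             // mostly defer merge cost to optimize()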
Hi,
Thanks for your help.
I'm using Lucene 2.3.
Raw document size is about 138G for 1.5M documents, which is about
92k per document.
IndexWriter settings are MergeFactor 50, MaxMergeDocs 2000,
RAMBufferSizeMB 32, MaxFieldLength Integer.MAX_VALUE.
Each document has about 10 short bibliographic
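(For reference, those settings correspond roughly to this, as a sketch
against the Lucene 2.3 API with dir and analyzer as placeholders:)

  IndexWriter writer = new IndexWriter(dir, analyzer, true);
  writer.setMergeFactor(50);                   // many segments before a merge
  writer.setMaxMergeDocs(2000);                // cap merged segments at 2000 docs
  writer.setRAMBufferSizeMB(32.0);             // flush at 32 MB of buffered docs
  writer.setMaxFieldLength(Integer.MAX_VALUE); // never truncate a document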
Hmmm, something doesn't sound quite right. You have 10 million docs,
split into 5 or so indexes, right? And each sub index is 150
gigabytes? How big are your documents?
Can you provide more info about what your Directory and IndexWriter
settings are? What version of Lucene are you using?
Hi,
Optimizing my index of 1.5 million documents takes days and days.
I have a collection of 10 million documents that I am trying to index
with Lucene. I've divided the collection into chunks of about 1.5 - 2
million documents each. Indexing 1.5 million documents is fast enough (about
12 hours), but t
Ryan, can you post the output of CheckIndex on your now-working index?
(1800 is still too many files I think, certainly after having
optimized).
ok, 1800 was wrong - that was from a botched attempt where I:
1. ran optimize on the broken 18K file index. It crashed midway through.
2. ran Check
"Yonik Seeley" <[EMAIL PROTECTED]> wrote:
> On Nov 11, 2007 1:59 PM, Michael McCandless <[EMAIL PROTECTED]>
> wrote:
> > Ryan, can you share any details of how you (Solr) are using Lucene? Are
> > you using autoCommit=false? I'd really love to get to the root cause
> > here.
>
> Unfortunately, So
On Nov 11, 2007 1:59 PM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> Ryan, can you share any details of how you (Solr) are using Lucene? Are
> you using autoCommit=false? I'd really love to get to the root cause
> here.
Unfortunately, Solr and Lucene both have something called autocommit now.
"Ryan McKinley" <[EMAIL PROTECTED]> wrote:
> I just used the CheckIndex tool to try to salvage a corrupt index
> (http://www.nabble.com/restoring-a-corrupt-index--tf4783866.html)
>
> It's a great tool, thanks!
Phew! I think you are the first user (besides me).
> I'm wondering about adding suppor
"Yonik Seeley" <[EMAIL PROTECTED]> wrote:
> On Nov 11, 2007 12:48 PM, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> > > Ryan are you able to update to that commit I just did? If so I think
> > > you should run the tool without -fix and post back what it printed. It
> > > should report an error on t
On Nov 11, 2007 12:48 PM, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> > Ryan are you able to update to that commit I just did? If so I think
> > you should run the tool without -fix and post back what it printed. It
> > should report an error on that one segment due to the missing file.
> > Then,
I just used the CheckIndex tool to try to salvage a corrupt index
(http://www.nabble.com/restoring-a-corrupt-index--tf4783866.html)
It's a great tool, thanks!
I'm wondering about adding support for this tool in the solr admin
interface, but have a few questions about how it works before I see if
Ryan are you able to update to that commit I just did? If so I think
you should run the tool without -fix and post back what it printed. It
should report an error on that one segment due to the missing file.
Then, run -fix to remove that segment (please backup your index first!).
Then, if you
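(For anyone trying this at home, the invocations look roughly like this;
the classpath and index path are placeholders, and -fix deletes broken
segments, so back up the index first:)

  # report problems only (read-only)
  java -cp lucene-core.jar org.apache.lucene.index.CheckIndex /path/to/index
  # remove any broken segments, losing the documents they contain
  java -cp lucene-core.jar org.apache.lucene.index.CheckIndex /path/to/index -fix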
On Nov 11, 2007 4:55 AM, Jason Bradfield <[EMAIL PROTECTED]> wrote:
> Basically, for any documents returned from a search, if they have the same
> score I need them to be returned in a round-robin type of ordering based
> on previous searches with the same query.
>
> ie. I have documents A, B and C,
Thank you for your reply.
The thing is, I'm trying to implement a weight for a word when indexing HTML
web pages.
The formula is something like:
Weight(word in doc d) = ...*50% + ...*20% + ...*10% + ...
The code is:
doc.add(new Field("url", httpd.
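(If the goal is per-field weights like that, one common approach is
index-time boosts; a hedged sketch against the Lucene 2.3 Field API,
with field names, variables, and boost values invented for illustration --
note boosts multiply scores rather than add percentages:)

  Document doc = new Document();
  Field title = new Field("title", titleText, Field.Store.YES, Field.Index.TOKENIZED);
  title.setBoost(5.0f);  // stands in for the 50% component
  Field body = new Field("body", bodyText, Field.Store.NO, Field.Index.TOKENIZED);
  body.setBoost(2.0f);   // stands in for the 20% component
  doc.add(title);
  doc.add(body);
  doc.add(new Field("url", url, Field.Store.YES, Field.Index.UN_TOKENIZED));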
Hi,
I am quite new to Lucene; I've read most of the documentation and can't
find what I need.
Basically, for any documents returned from a search, if they have the same
score I need them to be returned in a round-robin type of ordering based
on previous searches with the same query.
ie. I have
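(There is no built-in Lucene feature for this as far as I know; one
sketch is to post-process each page of results, rotating every run of
equal-score hits by a per-query counter. Everything below is an invented
helper, not a Lucene API:)

  import java.util.HashMap;
  import java.util.Map;

  // Re-orders one page of results so that runs of equal-score hits
  // rotate on each repeat of the same query string.
  public class RoundRobinTieBreaker {
      private final Map<String, Integer> timesSeen = new HashMap<String, Integer>();

      // hits and scores are parallel arrays, already sorted by descending score.
      public synchronized int[] reorder(String query, int[] hits, float[] scores) {
          Integer prev = timesSeen.get(query);
          int n = (prev == null) ? 0 : prev.intValue();
          timesSeen.put(query, n + 1);

          int[] out = (int[]) hits.clone();
          int start = 0;
          while (start < out.length) {
              int end = start + 1;
              while (end < out.length && scores[end] == scores[start]) {
                  end++;
              }
              rotate(out, start, end, n); // shift each tie group by n positions
              start = end;
          }
          return out;
      }

      private static void rotate(int[] a, int from, int to, int shift) {
          int len = to - from;
          if (len < 2) {
              return;
          }
          int[] tmp = new int[len];
          for (int i = 0; i < len; i++) {
              tmp[i] = a[from + (i + shift) % len];
          }
          System.arraycopy(tmp, 0, a, from, len);
      }
  }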