Re: How best to handle a reasonable amount to data (25TB+)

2012-02-07 Thread Li Li
It's up to your machines. In our application, we index about 30,000,000 (30M) docs/shard, and the response time is about 150ms. Our machines have about 48GB of memory; about 25GB is allocated to Solr and the rest is left for the Linux disk cache. If calculated by our application, indexing 1.25T docs will
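Li Li's per-shard figure can be turned into a rough capacity sketch. A minimal Java sketch, assuming the 30M docs/shard figure from this post and the thread's 1.25T-document corpus as inputs (these are the thread's numbers, not measurements of any real cluster):

```java
// Rough capacity sketch from the figures in this thread (assumptions,
// not measurements): 30M docs per shard, against a 1.25T-doc corpus.
public class ShardCapacity {
    public static void main(String[] args) {
        long totalDocs = 1_250_000_000_000L; // the thread's 1.25T docs
        long docsPerShard = 30_000_000L;     // 30M docs/shard from this post

        // Ceiling division: any remainder needs one more shard.
        long shards = (totalDocs + docsPerShard - 1) / docsPerShard;
        System.out.println("shards needed: " + shards); // shards needed: 41667
    }
}
```

At 30M docs/shard, a true 1.25T-doc corpus would need tens of thousands of shards, which is why the later messages in the thread re-check the document count.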

RE: How best to handle a reasonable amount to data (25TB+)

2012-02-07 Thread Peter Miller
Well, I am sooo embarrassed: I haven't stuffed up this badly in quite a while. But in the end, 13 shards is the right number. My calculator work was OK; my English usage was atrocious. I'm still interested in opinions on using object storage for (static) indexes big enough that they won't all fit in m

Re: How best to handle a reasonable amount to data (25TB+)

2012-02-07 Thread Erick Erickson
I'm all confused. 100M x 13 shards = 1.3G records, not 1.25T. But I get it: 1.5 x 10^7 x 12 x 7 = 1.26 x 10^9 = 1.26 billion, or am I off base again? But yes, at 100M records that would be 13 servers. As for whether 100M documents/shard is reasonable... it depends (tm). There are so many variabl
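Erick's arithmetic can be checked directly. A small sketch, assuming the thread's figures (15 million audit records a month, retained for seven years, shards capped at 100M documents):

```java
// Verify the thread's sizing arithmetic: 15M records/month over
// seven years, divided into shards of at most 100M documents each.
public class SizingCheck {
    public static void main(String[] args) {
        long perMonth = 15_000_000L;         // 1.5 x 10^7 records/month
        long totalDocs = perMonth * 12 * 7;  // 1.26 x 10^9 = 1.26 billion
        long docsPerShard = 100_000_000L;    // 100M-doc shard ceiling

        // Ceiling division: any remainder needs one more shard.
        long shards = (totalDocs + docsPerShard - 1) / docsPerShard;
        System.out.println(totalDocs + " docs -> " + shards + " shards");
        // prints: 1260000000 docs -> 13 shards
    }
}
```

So the corpus is 1.26 billion documents, not 1.25 trillion, and 13 shards is the figure the thread settles on.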

Re: Why read past EOF

2012-02-07 Thread superruiye
public class PostponeCommitDeletionPolicy implements IndexDeletionPolicy {
    private final static long deletionPostPone = 60;

    public void onInit(List commits) {
        // Note that commits.size() should normally be 1:
        onCommit(commits);
    }
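The snippet is cut off before the onCommit method. As a standalone model of the postponement idea it appears to be heading toward (a stub Commit class stands in for Lucene's IndexCommit, and all of the bookkeeping here is an assumption, not the poster's actual code): a commit only becomes deletable once it has been visible for deletionPostpone seconds, and the newest commit is always kept.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Standalone model of a "postponed deletion" policy. Commit is a stub
// standing in for Lucene's IndexCommit; in the real policy, onCommit
// would receive Lucene's live commit points instead.
public class PostponeModel {
    static class Commit {
        final long generation;
        boolean deleted;
        Commit(long generation) { this.generation = generation; }
        void delete() { deleted = true; }
    }

    static final long deletionPostpone = 60; // seconds, as in the post
    final Map<Long, Long> firstSeenMillis = new HashMap<Long, Long>();

    // Called on every commit with all live commits, newest last.
    void onCommit(List<Commit> commits, long nowMillis) {
        for (int i = 0; i < commits.size() - 1; i++) { // keep the newest
            Commit c = commits.get(i);
            Long seen = firstSeenMillis.get(c.generation);
            if (seen == null) {
                firstSeenMillis.put(c.generation, nowMillis);
            } else if (nowMillis - seen >= deletionPostpone * 1000L) {
                c.delete(); // old enough: allow deletion
            }
        }
    }

    public static void main(String[] args) {
        PostponeModel policy = new PostponeModel();
        List<Commit> commits = new ArrayList<Commit>();
        commits.add(new Commit(1));
        commits.add(new Commit(2));

        policy.onCommit(commits, 0L);      // first sighting: nothing deleted
        policy.onCommit(commits, 61_000L); // 61s later: old commit deletable
        System.out.println(commits.get(0).deleted); // true
        System.out.println(commits.get(1).deleted); // false (newest kept)
    }
}
```

The point of the pattern is that a reader opened against a slightly stale commit survives for the postponement window instead of hitting "read past EOF" when the files vanish underneath it.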

RE: How best to handle a reasonable amount to data (25TB+)

2012-02-07 Thread Peter Miller
Oops again! Turns out I got to the right result earlier by the wrong means! I found this reference (http://www.dejavutechnologies.com/faq-solr-lucene.html) that states shards can be up to 100,000,000 documents. So, I'm back to 13 shards again. Phew! Now I'm just wondering if Cassandra/Lucandra

RE: How best to handle a reasonable amount to data (25TB+)

2012-02-07 Thread Peter Miller
Whoops! Very poor basic maths; I should have written it down. I was thinking 13 shards. But yes, 13,000 is a bit different. Now I'm in even more need of help. Here is the "easy" part - 15 million audit records a month, coming from several active systems, and a requirement to keep and search across seven

Applying LUCENE-3653 patch to Lucene 3.0.3

2012-02-07 Thread Dhruv
Hi, My company is using an older version of Lucene (3.0.3). In my profiling results with 3.0.3, I found that my app's threads were blocked due to the issue described in LUCENE-3653. Although I was able to use the 3.6 line, which fixes this problem, we are still in the process of conducting per

Re: How best to handle a reasonable amount to data (25TB+)

2012-02-07 Thread Erick Erickson
I'm curious what the nature of your data is such that you have 1.25 trillion documents. Even at 100M/shard, you're still talking 12,500 shards. The "laggard" problem will rear its ugly head, not to mention that administering that many machines will be, shall we say, non-trivial... Best Erick

Re: Custom Payload Analyzer and Query

2012-02-07 Thread Ian Lea
How does searching with PayloadSpanUtil/PayloadTermQuery/etc. work to exclude/filter the matching terms based on the payload within a query itself, the original question? The javadocs for PayloadSpanUtil say that the IndexReader should only contain docs of interest, so it's not much use for a general quer

Re: Custom Payload Analyzer and Query

2012-02-07 Thread Tommaso Teofili
2012/2/6 Ian Lea
> Not sure if you got an answer to this or not. Don't recall seeing one
> and gmail threading says not.
>
> > Is the use of payloads I've described appropriate?
>
> Sounds OK to me, although I'm not sure why you can't store the
> metadata as a Document Field.
>
> > Can I exclude