Re: Scaling out/up or a mix

2009-07-02 Thread Otis Gospodnetic
cores. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Marcus Herou > To: solr-u...@lucene.apache.org; java-user@lucene.apache.org > Sent: Wednesday, July 1, 2009 10:31:28 AM > Subject: Re: Scaling out/up or a mix > > Hi ag

Re: Scaling out/up or a mix

2009-07-01 Thread Marcus Herou
Hi, I agree that faceting might be the thing that defines this app. The app is mostly snappy during daytime since we optimize the index around 7:00 GMT. However, faceting is never snappy. We sped things up a whole bunch by creating various "less cardinal" fields from the originating publishedDate w
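
The "less cardinal" fields idea above can be sketched roughly as follows. This is only an illustration, not the poster's actual schema: the snippet is cut off before the real field names appear, so `publishedYear`, `publishedMonth`, `publishedDay` and `publishedHour` are hypothetical names, and the input is assumed to be a timezone-aware timestamp.

```python
from datetime import datetime, timezone


def coarse_date_fields(published: datetime) -> dict:
    """Derive lower-cardinality facet values from one full-resolution timestamp.

    Field names are hypothetical; `published` is assumed to be timezone-aware.
    """
    utc = published.astimezone(timezone.utc)
    return {
        "publishedYear": utc.strftime("%Y"),
        "publishedMonth": utc.strftime("%Y-%m"),
        "publishedDay": utc.strftime("%Y-%m-%d"),
        "publishedHour": utc.strftime("%Y-%m-%dT%H"),
    }


if __name__ == "__main__":
    stamp = datetime(2009, 7, 1, 14, 31, 28, tzinfo=timezone.utc)
    for name, value in coarse_date_fields(stamp).items():
        print(name, value)
```

Faceting on a per-day or per-month field means far fewer distinct values to count than faceting on the full timestamp, which is presumably where the speed-up comes from.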

Re: Scaling out/up or a mix

2009-07-01 Thread Toke Eskildsen
On Tue, 2009-06-30 at 22:59 +0200, Marcus Herou wrote: > The number of concurrent users today is insignificant but once we push > for the service we will get into trouble... I know that since even one > simple faceting query (which we will use to display trend graphs) can > take forever (talking abo

Re: Scaling out/up or a mix

2009-06-30 Thread Marcus Herou
Hi, I like the sound of this. What I am not familiar with in terms of Lucene is how the index gets swapped in and out of memory. When it comes to database tables (non-partitionable tables at least), I know that one should have enough memory to fit the entire index into memory to avoid file-sorts for

Re: Scaling out/up or a mix

2009-06-30 Thread Marcus Herou
Hi. The number of concurrent users today is insignificant, but once we push for the service we will get into trouble... I know that since even one simple faceting query (which we will use to display trend graphs) can take forever (talking about Solr, btw). "Normal" Lucene queries (title:blah OR desc

Re: Scaling out/up or a mix

2009-06-30 Thread Andy Goodell
I have improved date-sorted searching performance pretty dramatically by replacing the two step "search then sort" operation with a one step "use the date as the score" algorithm. The main gotcha was making sure to not affect which results get counted as hits in boolean searches, but overall I onl
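
A minimal sketch of the one-step idea described above, written in plain Python rather than against the Lucene API (the message does not show the actual implementation). It assumes each hit arrives as a hypothetical `(doc_id, date)` pair with the date as epoch seconds. The date acts as the ranking key while hits are collected, so no separate sort over the full result set is needed, and every match is still counted as a hit.

```python
import heapq
from typing import Iterable, List, Tuple


def top_k_by_date(
    matches: Iterable[Tuple[int, int]], k: int
) -> Tuple[int, List[Tuple[int, int]]]:
    """Return (total_hits, k newest hits, newest first) in a single pass.

    `matches` yields (doc_id, date) pairs; the date is used as the score.
    """
    total_hits = 0
    heap: List[Tuple[int, int]] = []  # min-heap of (date, doc_id), at most k entries
    for doc_id, date in matches:
        total_hits += 1  # ranking never changes which documents count as hits
        if k <= 0:
            continue
        entry = (date, doc_id)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            heapq.heapreplace(heap, entry)  # evict the oldest of the kept hits
    newest_first = [(doc_id, date) for date, doc_id in sorted(heap, reverse=True)]
    return total_hits, newest_first


if __name__ == "__main__":
    hits = [(1, 100), (2, 300), (3, 200), (4, 50)]
    print(top_k_by_date(hits, 2))
```

The bounded heap keeps memory proportional to `k` rather than to the number of hits, which is the main difference from collecting everything and sorting afterwards.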

RE: Scaling out/up or a mix

2009-06-30 Thread Uwe Schindler
> On Mon, 2009-06-29 at 09:47 +0200, Marcus Herou wrote: > > Index size(and growing): 16Gx8 = 128G > > Doc size (data): 20k > > Num docs: 90M > > Num users: Few hundred but most critical is that the admin staff which > is > > using the index all day long. > > Query types: Example: title:"Iphone" OR

RE: Scaling out/up or a mix

2009-06-30 Thread Toke Eskildsen
On Tue, 2009-06-30 at 11:29 +0200, Uwe Schindler wrote: > So the simple answer is always: > If 64 bit platform with lots of RAM, use MMapDirectory. Fair enough. That makes the RAM-focused solution much more scalable. My point still stands though, as Marcus is currently examining his hardware optio

RE: Scaling out/up or a mix

2009-06-30 Thread Uwe Schindler
> On Mon, 2009-06-29 at 09:47 +0200, Marcus Herou wrote: > > Index size(and growing): 16Gx8 = 128G > > Doc size (data): 20k > > Num docs: 90M > > Num users: Few hundred but most critical is that the admin staff which > is > > using the index all day long. > > Query types: Example: title:"Iphone" OR

Re: Scaling out/up or a mix

2009-06-30 Thread Toke Eskildsen
On Mon, 2009-06-29 at 09:47 +0200, Marcus Herou wrote: > Index size(and growing): 16Gx8 = 128G > Doc size (data): 20k > Num docs: 90M > Num users: Few hundred but most critical is that the admin staff which is > using the index all day long. > Query types: Example: title:"Iphone" OR description:"I

Re: Scaling out/up or a mix

2009-06-29 Thread Marcus Herou
uestion: Based on your findings what is the most challenging part to tune ? Sorting or querying or what else? //Marcus > > > > > > > - Original Message > > From: Marcus Herou > > To: java-user@lucene.apache.org > > Sent: Monday, 29 June, 2009 9:47:

Re: Scaling out/up or a mix

2009-06-29 Thread Toke Eskildsen
On Sat, 2009-06-27 at 00:00 +0200, Marcus Herou wrote: > We currently have about 90M documents and it is increasing rapidly so > getting into the G+ document range is not going to be too far away. We've performed fairly extensive tests regarding hardware for searches and some minor tests on hardwa

Re: Scaling out/up or a mix

2009-06-29 Thread eks dev
> From: Marcus Herou > To: java-user@lucene.apache.org > Sent: Monday, 29 June, 2009 9:47:13 > Subject: Re: Scaling out/up or a mix > > Thanks for the answer. > > Don't you think that part 1 of the email would give you a hint of nature of > the index ? > >

Re: Scaling out/up or a mix

2009-06-29 Thread Marcus Herou
Thanks for the answer. Don't you think that part 1 of the email would give you a hint of the nature of the index? Index size (and growing): 16G x 8 = 128G; Doc size (data): 20k; Num docs: 90M; Num users: a few hundred, but most critical is the admin staff, which is using the index all day long. Query typ
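
For orientation, the figures quoted above can be turned into per-machine numbers with a few lines of arithmetic. This is a back-of-the-envelope sketch only: it assumes documents are spread evenly across the 8 machines and that "G" means 10^9 bytes, neither of which the thread states.

```python
def sizing(index_gb_per_machine: float, machines: int, num_docs: int):
    """Return (total index GB, docs per machine, index bytes per document).

    Assumes an even spread of documents and 1 GB = 10**9 bytes.
    """
    total_gb = index_gb_per_machine * machines
    docs_per_machine = num_docs / machines
    index_bytes_per_doc = total_gb * 10**9 / num_docs
    return total_gb, docs_per_machine, index_bytes_per_doc


if __name__ == "__main__":
    total_gb, docs_per_machine, bytes_per_doc = sizing(16, 8, 90_000_000)
    print(f"total index: {total_gb} GB")
    print(f"docs per machine: {docs_per_machine:,.0f}")
    print(f"index bytes per doc: {bytes_per_doc:,.0f}")
```

Under those assumptions each machine holds roughly 11 million documents at about 1.4 KB of index per document, well below the 20k of externally stored data per document.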

Re: Scaling out/up or a mix

2009-06-28 Thread Eric Bowman
There is no single answer -- this is always application specific. Without knowing anything about what you are doing: 1. disk i/o is probably the most critical. Go SSD or even RAM disk if you can, if performance is absolutely critical 2. Sometimes CPU can become an issue, but 8 cores is probably

Re: Scaling out/up or a mix

2009-06-28 Thread Marcus Herou
Hi. I think I need to be more specific. What I am trying to find out is whether I should aim for: CPU (2x4 cores, 2.0-3.0 GHz)? Or perhaps just 4 cores is enough. Fast disk I/O: 8 disks, RAID 1+0? Or perhaps 2 disks is enough... RAM: if the index does not fit into RAM, how much RAM should I then buy?

Scaling out/up or a mix

2009-06-26 Thread Marcus Herou
Hi. I currently have an index which is 16GB per machine (8 machines = 128GB) (data is stored externally, not in index) and is growing like crazy (we are indexing blogs which is crazy by nature) and have only allocated 2GB per machine to the Lucene app since we are running some other stuff there in