cores.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Marcus Herou
> To: solr-u...@lucene.apache.org; java-user@lucene.apache.org
> Sent: Wednesday, July 1, 2009 10:31:28 AM
> Subject: Re: Scaling out/up or a mix
>
> Hi ag
Hi, I agree that faceting might be the thing that defines this app. The app is
mostly snappy during daytime since we optimize the index around 7.00 GMT.
However, faceting is never snappy.
We sped things up a whole bunch by creating various "less cardinal"
fields from the originating publishedDate w
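
To illustrate the "less cardinal" idea above: a minimal sketch, assuming the coarser fields are simply the timestamp truncated to hour, day, month, and year. The field names (publishedYear and so on) are hypothetical; the message does not say which fields were actually created. Fewer distinct values per field means fewer buckets for the faceting code to count.

```python
from datetime import datetime, timezone


def coarse_date_fields(published: datetime) -> dict:
    """Derive lower-cardinality facet fields from a full timestamp.

    The field names are illustrative assumptions, not taken from the thread.
    """
    utc = published.astimezone(timezone.utc)
    return {
        "publishedYear": utc.strftime("%Y"),
        "publishedMonth": utc.strftime("%Y-%m"),
        "publishedDay": utc.strftime("%Y-%m-%d"),
        "publishedHour": utc.strftime("%Y-%m-%dT%H"),
    }


# Example: index these next to the original publishedDate field.
fields = coarse_date_fields(datetime(2009, 7, 1, 10, 31, 28, tzinfo=timezone.utc))
```

A trend graph over months would then facet on the month field, which has only 12 distinct values per year, instead of on the full timestamp.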
On Tue, 2009-06-30 at 22:59 +0200, Marcus Herou wrote:
> The number of concurrent users today is insignificant but once we push
> for the service we will get into trouble... I know that since even one
> simple faceting query (which we will use to display trend graphs) can
> take forever (talking abo
Hi, I like the sound of this.
What I am not familiar with in terms of Lucene is how the index gets
swapped in and out of memory. When it comes to database tables
(non-partitionable tables at least) I know that one should have enough memory to
fit the entire index into memory to avoid file-sorts for
Hi.
The number of concurrent users today is insignificant but once we push for
the service we will get into trouble... I know that since even one simple
faceting query (which we will use to display trend graphs) can take forever
(talking about Solr btw). "Normal" Lucene queries (title:blah OR
desc
I have improved date-sorted searching performance pretty dramatically by
replacing the two step "search then sort" operation with a one step "use the
date as the score" algorithm. The main gotcha was making sure to not affect
which results get counted as hits in boolean searches, but overall I onl
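
The one-step idea described above can be sketched as follows. This is not the poster's implementation, which the message does not show; it is a plain-Python illustration, assuming the hits have already passed the boolean query and each carries a numeric publication timestamp. The date only orders the hits and never filters them, so the hit count is unaffected.

```python
import heapq


def top_k_by_date(matching_docs, k):
    """Keep the k newest matches in a single pass with a bounded min-heap.

    matching_docs: iterable of (doc_id, published_ts) pairs that already
    matched the query. Returns (doc ids newest first, total hit count).
    """
    heap = []  # min-heap of (published_ts, doc_id); oldest kept hit on top
    total_hits = 0
    for doc_id, published_ts in matching_docs:
        total_hits += 1  # every match counts as a hit, whatever its date
        entry = (published_ts, doc_id)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            heapq.heapreplace(heap, entry)  # evict the oldest kept hit
    newest_first = [doc_id for _, doc_id in sorted(heap, reverse=True)]
    return newest_first, total_hits


# Example with made-up doc ids and timestamps.
hits = [(1, 100), (2, 300), (3, 200), (4, 50)]
top, total = top_k_by_date(hits, 2)
```

Memory stays bounded by k rather than by the number of hits, which is the main gain over collecting everything and sorting afterwards.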
> On Mon, 2009-06-29 at 09:47 +0200, Marcus Herou wrote:
> > Index size(and growing): 16Gx8 = 128G
> > Doc size (data): 20k
> > Num docs: 90M
> > Num users: Few hundred but most critical is that the admin staff which is
> > using the index all day long.
> > Query types: Example: title:"Iphone" OR
On Tue, 2009-06-30 at 11:29 +0200, Uwe Schindler wrote:
> So the simple answer is always:
> If 64 bit platform with lots of RAM, use MMapDirectory.
Fair enough. That makes the RAM-focused solution much more scalable.
My point still stands though, as Marcus is currently examining his
hardware optio
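
For readers unfamiliar with what memory mapping buys: a small sketch using Python's standard mmap module, not Lucene's MMapDirectory. It only illustrates the general mechanism, in which the file is mapped into the process address space and reads are served from the OS page cache once the pages are resident. The file contents here are made up.

```python
import mmap
import os
import tempfile


def read_slice_mmap(path, offset, length):
    """Read `length` bytes at `offset` through a read-only memory map."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Slicing copies only the requested bytes; the OS pages in
            # just the parts of the file that are actually touched.
            return mm[offset:offset + length]


# Demo on a throwaway file standing in for an index segment.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"header" + b"x" * 4096 + b"term-data")
    demo_path = tmp.name
chunk = read_slice_mmap(demo_path, 6 + 4096, 9)
os.remove(demo_path)
```

The 64-bit condition in the advice above matters because the mapping needs address space for the whole file, which a 32-bit process cannot provide for a 16GB index.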
On Mon, 2009-06-29 at 09:47 +0200, Marcus Herou wrote:
> Index size(and growing): 16Gx8 = 128G
> Doc size (data): 20k
> Num docs: 90M
> Num users: Few hundred but most critical is that the admin staff which is
> using the index all day long.
> Query types: Example: title:"Iphone" OR description:"I
Question:
Based on your findings, what is the most challenging part to tune? Sorting
or querying or what else?
//Marcus
> - Original Message
> > From: Marcus Herou
> > To: java-user@lucene.apache.org
> > Sent: Monday, 29 June, 2009 9:47:
On Sat, 2009-06-27 at 00:00 +0200, Marcus Herou wrote:
> We currently have about 90M documents and it is increasing rapidly so
> getting into the G+ document range is not going to be too far away.
We've performed fairly extensive tests regarding hardware for searches
and some minor tests on hardwa
> From: Marcus Herou
> To: java-user@lucene.apache.org
> Sent: Monday, 29 June, 2009 9:47:13
> Subject: Re: Scaling out/up or a mix
>
> Thanks for the answer.
>
> Don't you think that part 1 of the email would give you a hint of the nature of
> the index?
>
>
Thanks for the answer.
Don't you think that part 1 of the email would give you a hint of the nature of
the index?
Index size(and growing): 16Gx8 = 128G
Doc size (data): 20k
Num docs: 90M
Num users: Few hundred but most critical is that the admin staff which is
using the index all day long.
Query typ
There is no single answer -- this is always application specific.
Without knowing anything about what you are doing:
1. disk i/o is probably the most critical. Go SSD or even RAM disk if
you can, if performance is absolutely critical
2. Sometimes CPU can become an issue, but 8 cores is probably
Hi. I think I need to be more specific.
What I am trying to find out is if I should aim for:
CPU (2x4 cores, 2.0-3.0 GHz)? Or perhaps just 4 cores is enough.
Fast disk IO: 8 disks, RAID 1+0? Or perhaps 2 disks is enough...
RAM - if the index does not fit into RAM, how much RAM should I then buy?
Hi.
I currently have an index which is 16GB per machine (8 machines = 128GB)
(data is stored externally, not in index) and is growing like crazy (we are
indexing blogs which is crazy by nature) and have only allocated 2GB per
machine to the Lucene app since we are running some other stuff there in
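
The figures quoted in this thread (16GB of index per machine, 8 machines, 2GB allocated to the Lucene app, 90M documents) can be put side by side with a little arithmetic. A rough sketch, assuming GB means 2**30 bytes and that documents are spread evenly across machines; neither assumption is stated in the messages.

```python
GB = 1024 ** 3  # assumption: binary gigabytes

index_per_machine = 16 * GB
machines = 8
memory_for_lucene = 2 * GB   # per machine, as stated in the message
num_docs = 90_000_000

total_index = index_per_machine * machines          # whole cluster
index_bytes_per_doc = total_index / num_docs        # average index footprint
docs_per_machine = num_docs / machines              # assumes an even spread
memory_fraction = memory_for_lucene / index_per_machine

print(f"total index: {total_index / GB:.0f} GB")
print(f"index bytes per doc: {index_bytes_per_doc:.0f}")
print(f"docs per machine: {docs_per_machine:,.0f}")
print(f"memory / index per machine: {memory_fraction:.3f}")
```

On these numbers the memory given to Lucene is one eighth of the per-machine index, so most reads depend on disk or on whatever the OS can cache in the remaining RAM.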