>> needs to skip a large number of bytes to locate the exact location. In other
>> words, the search summary retrieval might be slow.
>> 3. It only works well when a small number of concurrent users are searching
>> at a time.
>>
>> Regards
>> Ganesh
>>
Is it a bad idea to keep multiple shards in a single system?
Regards
Ganesh
- Original Message -
From: "Toke Eskildsen"
To:
Sent: Tuesday, June 14, 2011 12:58 PM
Subject: Re: Index size and performance degradation
On Sunday 12 June 2011 22:12:01 Michael McCandless wrote:
> Anyway, I don't think that's a good tradeoff, in general, for our
> users, because very few apps truly require immediate consistency from
> Lucene (can anyone give an example where their app depends on
> immediate consistency...?
For data [...]
[...]ations for that design choice.
Cheers
Mark
- Original Message
From: Itamar Syn-Hershko
To: java-user@lucene.apache.org
Sent: Tue, 14 June, 2011 9:03:15
Subject: Re: Index size and performance degradation
Thanks. Our product is pretty generic and we can't assume much about the
hardware, or about usage. Some users will want low latency, others
will prefer throughput. My job is to make as few compromises as
possible...
As for SSD, that's generally good advice, except they seem to be
failing
On Sun, 2011-06-12 at 10:10 +0200, Itamar Syn-Hershko wrote:
> The whole point of my question was to find out if and how to make
> balancing on the SAME machine. Apparently that's not going to help and at
> a certain point we will just have to prompt the user to buy more hardware...
It really depends
>
> but you'll still cache the results - so again this isn't viable when RT
> search, or even an NRT, is a requirement
>
No, I don't cache the results. The Filter is an OpenBitSet of all docs that
match the filter (e.g. have the specified language field's value), and it is
refreshed whenever new segments are added.
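A minimal sketch of such a per-language filter, assuming Lucene 3.x-era APIs
(Filter.getDocIdSet(IndexReader), OpenBitSet, TermDocs); the "lang" field name
is an illustrative assumption, not something taken from this thread:

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.Filter;
import org.apache.lucene.util.OpenBitSet;

public class LanguageFilter extends Filter {
    private final String language;

    public LanguageFilter(String language) {
        this.language = language;
    }

    @Override
    public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
        // One bit per document of this reader; set the bit for every doc
        // whose "lang" field matches the requested language.
        OpenBitSet bits = new OpenBitSet(reader.maxDoc());
        TermDocs termDocs = reader.termDocs(new Term("lang", language));
        try {
            while (termDocs.next()) {
                bits.set(termDocs.doc());
            }
        } finally {
            termDocs.close();
        }
        return bits;   // OpenBitSet is itself a DocIdSet in 3.x
    }
}

Wrapping this in a CachingWrapperFilter caches the bit set per reader, so after
a reopen only the readers of new segments recompute it, which is roughly the
"refreshed whenever new segments are added" behaviour described above.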
> deletions made by readers merely mark it for
> deletion, and once a doc has been marked for deletions it is deleted for all
> intents and purposes, right?
There's the point-in-timeness of a reader to consider.
> Does the N in NRT represent only the cost of reopening a searcher?
Aptly put, and
Since there should only be one writer, I'm not sure why you'd need
transactional storage for that? Deletions made by readers merely mark a doc
for deletion, and once a doc has been marked for deletion it is deleted
for all intents and purposes, right? But perhaps I need to refresh my
memory on th
> I don't think we'd do the post-filtering solution, but instead maybe
> resolve the deletes "live" and store them in a transactional data
I think Michael B. aptly described the sequence ID approach for 'live' deletes?
Yes, adding deletes to Twitter's approach will be a challenge!
I don't think we'd do the post-filtering solution, but instead maybe
resolve the deletes "live" and store them in a transactional data
structure of some kind... but even then we will pay a perf hit to
lookup del docs against it.
So, y
Thanks Mike, much appreciated.
Wouldn't Twitter's approach fall into the exact same pitfall you
described Zoie does (or did) once it handles deletes too? I don't
think there is any other way of handling deletes other than
post-filtering results. But perhaps the IW cache would be smaller tha
Here's a blog post describing some details of Twitter's approach:
http://engineering.twitter.com/2010/10/twitters-new-search-architecture.html
And here's a talk Michael did last October (Lucene Revolutions):
http://www.lucidimagination.com/events/revolution2010/video-Realtime-Search-Wit
>
> I'm not sure I understood the filters approach you described. Can you give
> an example?
>
A Language filter is one -- different users search in different languages
and want to view pages in those languages only. If you have a field attached
to your documents that identifies the language of the document, you can use
it to filter the queries to return only pages in that language.
Thanks for your detailed answer. We'll have to tackle this and see what's
more important to us then. I'd definitely love to hear that Zoie has overcome
all that...
Any pointers to Michael Busch's approach? I take it this has something to
do with the core itself or the index format, probably using the Flex
From what I understand of Zoie (and it's been some time since I last
looked... so this could be wrong now), the biggest difference vs NRT
is that Zoie aims for "immediate consistency", i.e. index changes are
always made visible to the very next query, vs NRT which is
"controlled consistency", a blend
Our problem is a bit different. There aren't always common searches, so
if we cache blindly we could end up with too much RAM allocated for
virtually nothing. And we need to allow for real-time search, so caching
will hardly help. We enforce some client-side caching, but again - the
real-time r
Mike,
Speaking of NRT, and completely off-topic, I know: Lucene's NRT
apparently isn't fast enough if Zoie was needed, and now that Zoie is
around are there any plans to make it Lucene's default? or: why would
one still use NRT when Zoie seem to work much better?
Itamar.
On 12/06/2011 13
>
> Shai, what would you call a smart app-level cache? remembering frequent
> searches and storing them handy?
Remembering frequent searches is good. If you do this, you can warm up the
cache whenever a new IndexSearcher is opened (e.g., if you use
SearcherManager from LIA2) and besides keeping t
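A minimal sketch of that warming idea, written independently of any particular
SearcherManager implementation; the list of frequent queries is assumed to come
from your own query logs:

import java.io.IOException;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class SearcherWarmer {
    private final List<Query> frequentQueries;   // remembered from query logs

    public SearcherWarmer(List<Query> frequentQueries) {
        this.frequentQueries = frequentQueries;
    }

    public IndexSearcher warm(IndexReader newReader) throws IOException {
        IndexSearcher searcher = new IndexSearcher(newReader);
        for (Query q : frequentQueries) {
            // The results are thrown away; the point is only to touch the
            // postings, norms and caches that the real traffic will need.
            searcher.search(q, 10);
        }
        return searcher;   // now warm enough to swap in as the live searcher
    }
}

The same loop fits naturally into whatever hook your searcher-management code
exposes for preparing a new searcher before it replaces the old one.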
Andrew, no particular hardware setup I'm afraid. This is a general
product, so we can't assume anything about the hardware it will run
on. Thanks for the tip on multi-core though.
On 12/06/2011 11:45, Andrew Kane wrote:
In the literature there is some evidence that sharding of in-memory index
Shai, what would you call a smart app-level cache? Remembering frequent
searches and keeping them handy? Or are there more advanced techniques
for that? Any pointers appreciated...
Thanks for all the advice!
On 12/06/2011 11:42, Shai Erera wrote:
isn't there anything that we can do to avoid that?
Remember that memory-mapping is not a panacea: at the end of the day,
if there just isn't enough RAM on the machine to keep your full
"working set" hot, then the OS will have to hit the disk, regardless
of whether the access is through MMap or a "traditional" IO request.
That said, on Fedora Linux
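For reference, a minimal sketch of explicitly choosing a memory-mapped directory,
assuming Lucene 3.x APIs; the index path is illustrative, and note that
FSDirectory.open() already picks a sensible default per platform:

import java.io.File;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;

public class MmapOpenSketch {
    public static void main(String[] args) throws Exception {
        Directory dir = new MMapDirectory(new File("/path/to/index"));
        IndexReader reader = IndexReader.open(dir, true);   // read-only reader
        IndexSearcher searcher = new IndexSearcher(reader);
        // ... run searches; pages are faulted in on demand, so if the working
        // set does not fit in RAM the OS still hits the disk, mmap or not ...
        searcher.close();
        reader.close();
        dir.close();
    }
}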
In the literature there is some evidence that sharding of in-memory indexes
on multi-core machines might be better. Has anyone tried this lately?
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4228359
Single disk machines (HDD or SSD) would be slower. Multi-disk or RAID type
setups
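A minimal sketch of searching several shards that live on the same machine, in
the spirit of the question above, assuming Lucene 3.x's ParallelMultiSearcher;
the shard paths are illustrative:

import java.io.File;
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ParallelMultiSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class LocalShardSearchSketch {
    public static TopDocs searchShards(Query query) throws IOException {
        IndexSearcher shard1 = new IndexSearcher(
            IndexReader.open(FSDirectory.open(new File("/path/to/shard1")), true));
        IndexSearcher shard2 = new IndexSearcher(
            IndexReader.open(FSDirectory.open(new File("/path/to/shard2")), true));

        // Each sub-searcher is queried on its own thread and the results are
        // merged, which is where a multi-core box could help.
        ParallelMultiSearcher searcher = new ParallelMultiSearcher(shard1, shard2);
        return searcher.search(query, 10);
    }
}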
>
> isn't there anything that we can do to avoid that?
>
That was my point :) --> you can optimize your search application, use mmap
files, smart caches etc., until it reaches a point where you need to shard.
But it's still application dependent, not much of an OS thing. You can count
on the OS to
Thanks.
The whole point of my question was to find out if and how to do
balancing on the SAME machine. Apparently that's not going to help, and at
a certain point we will just have to prompt the user to buy more hardware...
Out of curiosity, isn't there anything that we can do to avoid that?
I agree w/ Erick, there is no cutoff point (index size for that matter)
above which you start sharding.
What you can do is create a scheduled job in your system that runs a select
list of queries and monitors their performance. Once it degrades, it shards
the index by either splitting it (you can
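A minimal sketch of such a scheduled job; the canary queries, the hourly period
and the latency threshold are all illustrative assumptions:

import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class LatencyMonitor {
    public static void start(final IndexSearcher searcher,
                             final List<Query> canaryQueries,
                             final long thresholdMillis) {
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                try {
                    long start = System.currentTimeMillis();
                    for (Query q : canaryQueries) {
                        searcher.search(q, 10);   // time a fixed, representative set
                    }
                    long elapsed = System.currentTimeMillis() - start;
                    if (elapsed > thresholdMillis) {
                        // Latency has degraded: this is the point at which you
                        // would consider splitting the index or adding hardware.
                        System.err.println("Canary queries took " + elapsed + " ms");
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }, 1, 1, TimeUnit.HOURS);
    }
}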
Hmmm, then it's pretty hopeless I think. Problem is that
anything you say about running on a machine with
2G available memory on a single processor is completely
incomparable to running on a machine with 64G of
memory available for Lucene and 16 processors.
There's really no such thing as an
Hi all,
I understand Lucene indexes are at their optimum up to a certain size - said
to be around several GBs. I haven't found a good discussion of this,
but it's my understanding that at some point it's better to split an index
into parts (a la sharding) than to continue searching on a huge-sized index.