Re: Solr indexes consume a lot of space for large bucket sizes

Jeremiah Peschka Wed, 28 Sep 2011 11:43:14 -0700

I don't know the exact internals of Riak Search, but most of Riak uses 
log-structured storage. When an "update" occurs, a new copy of the data is 
written rather than the old copy being overwritten, so the old copy keeps 
taking up disk space. Eventually, during a compaction process, the dead 
records are cleaned up. There are rules for how often compactions run. You 
can change some of those settings for Riak Search: 
http://wiki.basho.com/Riak-Search---Operations-and-Troubleshooting.html


You can think of this as being similar to the ghost record cleanup/vacuum 
process that runs in your favorite RDBMS (as long as that RDBMS implements some 
form of MVCC).
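
To make the disk usage pattern concrete, here is a toy sketch of an 
append-only store with compaction. It is not how Riak Search is actually 
implemented; the class and method names (AppendOnlyStore, put, compact, and 
so on) are made up for illustration, and an in-memory list stands in for the 
on-disk log.

```python
class AppendOnlyStore:
    """Toy log-structured store: writes append, nothing is overwritten."""

    def __init__(self):
        self.log = []    # (key, value) records, oldest first (stands in for disk)
        self.index = {}  # key -> position of the live record in self.log

    def put(self, key, value):
        # An "update" appends a new record; the previous copy stays in the
        # log as a dead record until compaction runs.
        self.log.append((key, value))
        self.index[key] = len(self.log) - 1

    def get(self, key):
        return self.log[self.index[key]][1]

    def dead_records(self):
        return len(self.log) - len(self.index)

    def compact(self):
        # Keep only the live record for each key, preserving log order.
        live_positions = sorted(self.index.values())
        live = [self.log[pos] for pos in live_positions]
        self.log = live
        self.index = {key: pos for pos, (key, _) in enumerate(live)}


store = AppendOnlyStore()
for i in range(5):
    store.put("a b", f"v{i}")  # five writes to the same key
store.put("a c", "v0")

size_before = len(store.log)         # every write is still in the log
dead_before = store.dead_records()   # superseded copies of "a b"
store.compact()
size_after = len(store.log)          # only the live records remain
```

In this sketch the log holds six records for two keys until compact() runs, 
which is the same shape of problem as a write-heavy bucket growing on disk 
between compactions.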
---
Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
Microsoft SQL Server MVP

On Sep 28, 2011, at 2:26 PM, Harshal Dhir wrote:

> Hi,
> 
> We are indexing a write-heavy bucket, and it seems it's consuming a lot of 
> disk space. 
> 
> The logic is something like this:
> 
> "a", "b", and "c" are the logical categories, and we save combinations of 
> these categories: 
> 
> "a b", "a c", "b c", "a b c", so we are looking at a lot of data. But 10G 
> is quite a large number, and we are a little perplexed as to why it is so 
> large. Is it due to vector clocks, since multiple writes could be happening 
> at the same time?
> 
> Again, help is needed on this one.
> 
> Thanks
> Harshal
> 
> 
> 
> 
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

