Re: Solr indexes consume a lot of space for largebucket sizes

2011-09-28 Thread Ryan Zezeski
Harshal, I take it you are using the default backend, Bitcask? If so, you should know it's an append-only log format that is periodically merged. So until a merge is triggered, the data will only grow. Even if you delete objects it still grows, as a delete is actually just a special tombstone msg
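
To make the growth behaviour concrete, here is a toy sketch of an append-only store with tombstones and a merge step. It is not Bitcask's actual code or on-disk format; the names AppendOnlyLog, TOMBSTONE, put, delete, get and merge are all made up for illustration, and the "log" is just an in-memory list.

```python
# Toy model of an append-only log store. All names here are hypothetical;
# this is not Bitcask's real implementation or file format.

TOMBSTONE = object()  # marker record appended in place of a real delete


class AppendOnlyLog:
    def __init__(self):
        self.entries = []  # every write ever made, oldest first
        self.index = {}    # key -> position of the latest live record

    def put(self, key, value):
        # An update never overwrites: it appends a new record.
        self.entries.append((key, value))
        self.index[key] = len(self.entries) - 1

    def delete(self, key):
        # A delete also appends, so the log grows even when data is removed.
        self.entries.append((key, TOMBSTONE))
        self.index.pop(key, None)

    def get(self, key):
        pos = self.index.get(key)
        return None if pos is None else self.entries[pos][1]

    def merge(self):
        # Keep only the latest live record per key; drop stale versions
        # and tombstones. Only at this point does the log shrink.
        live = sorted(self.index.items(), key=lambda item: item[1])
        self.entries = [(key, self.entries[pos][1]) for key, pos in live]
        self.index = {key: i for i, (key, _) in enumerate(self.entries)}


log = AppendOnlyLog()
log.put("a", 1)
log.put("a", 2)   # second version of "a"; the first is now dead space
log.delete("a")   # tombstone record, log grows again
log.put("b", 3)
size_before_merge = len(log.entries)
log.merge()
size_after_merge = len(log.entries)
```

In this sketch the log holds four records before the merge and one after it, even though only a single key is live the whole time after the last write.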

Re: Solr indexes consume a lot of space for largebucket sizes

2011-09-28 Thread Jeremiah Peschka
I don't know the exact internals of Riak Search, but most of Riak uses log-structured hash trees. When an "update" occurs, a new copy of the data is written. Eventually, during a compaction process, the dead records are cleaned up. There are rules for how often compactions run. You can change som

Solr indexes consume a lot of space for largebucket sizes

2011-09-28 Thread Harshal Dhir
Hi, we are indexing a write-heavy bucket, and it seems to be consuming a lot of disk space. The logic is something like this: "a", "b", and "c" are the logical categories, and we save combinations of these categories ("a b", "a c", "b c", "a b c"), so we are looking at a lot of data. But, 10G
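
For a sense of how quickly the number of saved combinations grows, here is a rough sketch that enumerates every combination of two or more categories. How the application actually builds its entries is not shown in the message, so the function name category_combinations and the min_size parameter are assumptions made for illustration.

```python
from itertools import combinations


def category_combinations(categories, min_size=2):
    """Return every combination of at least min_size categories,
    each joined with a space (hypothetical helper, not from the thread)."""
    result = []
    for size in range(min_size, len(categories) + 1):
        for combo in combinations(categories, size):
            result.append(" ".join(combo))
    return result


three = category_combinations(["a", "b", "c"])
ten = category_combinations([str(n) for n in range(10)])
```

With three categories this yields the four combinations listed in the message. The count is 2^n - n - 1 for n categories, so ten categories already produce 1013 combinations per object.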