I don't know the exact internals of Riak Search, but most of Riak uses log-structured storage (Bitcask, the default backend, is a log-structured hash table). When an "update" occurs, a new copy of the data is written rather than the old copy being overwritten, so disk usage can grow well beyond the size of the live data until the dead records are cleaned up. That cleanup happens eventually, during a compaction process. There are rules for how often compactions run, and you can change some of those settings for Riak Search: http://wiki.basho.com/Riak-Search---Operations-and-Troubleshooting.html
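
To make the write-then-compact behaviour concrete, here is a minimal toy sketch in Python. It is not Riak's or Riak Search's actual implementation; the class and method names (AppendOnlyStore, put, compact) are made up for illustration, and a plain list stands in for the on-disk log.

```python
class AppendOnlyStore:
    """Toy log-structured store (hypothetical; a list stands in for the on-disk log)."""

    def __init__(self):
        self.log = []    # append-only list of (key, value) records
        self.index = {}  # key -> position of the live record in the log

    def put(self, key, value):
        # An update never overwrites in place: it appends a new record
        # and repoints the index, leaving the old record dead in the log.
        self.index[key] = len(self.log)
        self.log.append((key, value))

    def get(self, key):
        return self.log[self.index[key]][1]

    def dead_records(self):
        # Records still occupying space but no longer reachable via the index.
        return len(self.log) - len(self.index)

    def compact(self):
        # Rewrite only the live records, dropping every superseded one.
        new_log = []
        new_index = {}
        for key, pos in self.index.items():
            new_index[key] = len(new_log)
            new_log.append(self.log[pos])
        self.log, self.index = new_log, new_index


store = AppendOnlyStore()
for i in range(1000):
    store.put("a b", i)  # 1000 writes to the same key
store.put("a c", 0)

records_before = len(store.log)
dead_before = store.dead_records()
store.compact()
records_after = len(store.log)
```

In this toy model, a key that is rewritten many times takes up one log record per write until compact() runs, which is the kind of growth a write-heavy bucket would show between compactions.
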
You can think of this as being similar to the ghost record cleanup/vacuum process that runs in your favorite RDBMS (as long as that RDBMS implements some form of MVCC).

---
Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
Microsoft SQL Server MVP

On Sep 28, 2011, at 2:26 PM, Harshal Dhir wrote:

> Hi,
>
> We are indexing a "heavy" write bucket, and it seems it's consuming
> a lot of disk space.
>
> The logic is something like this:
>
> "a", "b", "c" are the logical categories, and we save combinations of these
> categories:
>
> "a b", "a c", "b c", "a b c", and so we are looking at a lot of data. But 10G
> is quite a huge number, so we are a little perplexed as to why this number is
> so huge. Is it due to vector clocks, since multiple writes could be happening
> at the same time?
>
> Again, help is needed on this one.
>
> Thanks
> Harshal
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com