nodetool compact is the ultimate "running with scissors" solution: far more
people manage to stab themselves in the eye than benefit, customers running
with scissors successfully notwithstanding.

My favorite discussions usually go something like this:

   1. "We still have tombstones" ( so they set gc_grace_seconds to 0)
   2. "We added a node after fixing it and now a bunch of records that were
   deleted have come back" (usually after setting gc_grace_seconds to 0 and
   then not blanking nodes that have been offline)
   3. Why are my read latencies so spikey?  (cause they're on STC and now
   have a giant single huge SStable which worked fine when their data set was
   tiny, now they're looking at 100 sstables on STC, which means sllllloooowww
   reads)
   4. "We still have tombstones" (yeah I know this again, but this is
   usually when they've switched to LCS, which basically noops with nodetool
   compact)
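
For the gc_grace_seconds cases above, this is roughly the change people make
(a minimal sketch with the DataStax Python driver; the keyspace and table
names are made up):

from cassandra.cluster import Cluster

# Connect to a local node; adjust contact points for a real cluster.
cluster = Cluster(['127.0.0.1'])
session = cluster.connect()

# Purge tombstones sooner by shrinking the grace period. The catch from item
# 2: any node that stays down longer than gc_grace_seconds has to be wiped
# and rebuilt before it rejoins, or deleted rows can come back to life.
session.execute("ALTER TABLE events.raw WITH gc_grace_seconds = 0")

cluster.shutdown()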

All of this is manageable when you have a team that understands the tradeoffs
of nodetool compact, but I categorically reject that it's a good experience
for new users; I've unfortunately had about a dozen fire drills this year as
a result of nodetool compact alone.

Data modeling around partitions that are truncated when they fall out of
scope is typically far more manageable, works with any compaction strategy,
and doesn't demand the same level of operational awareness.
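
Concretely, I mean something like a time-bucketed, TTL'd table (again just a
sketch with the Python driver; the names and the 30-day window are invented
for illustration):

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('metrics')

# The day bucket is part of the partition key, so whole partitions fall out
# of scope together, and default_time_to_live lets expired data age out
# through normal compaction on any strategy, no nodetool compact required.
session.execute("""
    CREATE TABLE IF NOT EXISTS readings (
        sensor_id text,
        day       text,       -- e.g. '2015-01-02', the bucket that expires
        ts        timestamp,
        value     double,
        PRIMARY KEY ((sensor_id, day), ts)
    ) WITH default_time_to_live = 2592000  -- 30 days
""")

cluster.shutdown()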

On Fri, Jan 2, 2015 at 2:15 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Fri, Jan 2, 2015 at 11:28 AM, Colin <co...@clark.ws> wrote:
>
>> Forcing a major compaction is usually a bad idea.  What is your reason
>> for doing that?
>>
>
> I'd say "often" and not "usually". Lots of people have schema where they
> create way too much garbage, and major compaction can be a good response.
> The docs' historic incoherent FUD notwithstanding.
>
> =Rob
>
>



-- 

Thanks,
Ryan Svihla
