> I wouldn't use TWCS if there's updates, you're going to risk having
> data that's never deleted and really small sstables sticking around
> forever.

How do you risk having data sticking around forever when everything is TTL'd?
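To be concrete about what I mean (a rough sketch only, with a made-up keyspace/table name and an arbitrary TTL): with a table-level TTL, every write carries an expiry, so nothing stays live past its TTL even if the SSTable holding it hangs around for a while:

    CREATE TABLE ks.sensor_data (
        sensor_id uuid,
        ts timestamp,
        reading double,
        PRIMARY KEY (sensor_id, ts)
    ) WITH default_time_to_live = 604800   -- every row expires 7 days after it is written
      AND compaction = {'class': 'TimeWindowCompactionStrategy'};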
> If you use really large buckets, what's the point of TWCS?

No one said anything about really large buckets. I'd also note that if the data per partition is that small, it would be entirely reasonable not to bucket by partition key (and window) at all, and thus updates would become irrelevant.

> Honestly this is such a small workload you could easily use STCS or
> LCS and you'd likely never, ever see a problem.

While the numbers sound small, there must be some logical reason to have so many nodes. In my experience STCS and LCS both have their own drawbacks with regard to updates, more so when you have high data density, which sounds like it might be the case here. It's not hard to test these things, and it's important to get them right at the start to save yourself some serious pain down the track.
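For example, the two candidates could be trialled side by side with something like the following (a sketch only: the table name, window size, and sstable size are placeholders, not recommendations):

    -- TWCS, with the window widened to cover the expected out-of-order updates
    ALTER TABLE ks.sensor_data WITH compaction = {
        'class': 'TimeWindowCompactionStrategy',
        'compaction_window_unit': 'DAYS',
        'compaction_window_size': 7
    };

    -- LCS, for comparison under the same load
    ALTER TABLE ks.sensor_data WITH compaction = {
        'class': 'LeveledCompactionStrategy',
        'sstable_size_in_mb': 160
    };

Running your real read/write mix against each and watching nodetool compactionstats and read latencies will tell you far more than any rule of thumb.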
On 13 June 2018 at 22:41, Jonathan Haddad <j...@jonhaddad.com> wrote:
> I wouldn't use TWCS if there's updates, you're going to risk having
> data that's never deleted and really small sstables sticking around
> forever. If you use really large buckets, what's the point of TWCS?
>
> Honestly this is such a small workload you could easily use STCS or
> LCS and you'd likely never, ever see a problem.
>
> On Wed, Jun 13, 2018 at 3:34 PM kurt greaves <k...@instaclustr.com> wrote:
> >
> > TWCS is probably still worth trying. If you mean updating old rows in
> > TWCS, "out of order updates" will only really mean you'll hit more
> > SSTables on read. This might add a bit of complexity in your client if
> > you're bucketing partitions (not strictly necessary), but that's about
> > it. As long as you're not specifying "USING TIMESTAMP" you still get
> > the main benefit of efficient dropping of SSTables - C* only cares
> > about the write timestamp of the data with regard to TTLs, not
> > timestamps stored in your partition/clustering key.
> > Also keep in mind that you can specify the window size in TWCS, so if
> > you can increase it enough to cover the "out of order" updates then
> > that will also solve the problem w.r.t. old buckets.
> >
> > With regard to LCS, the only way to really know if it'll be too much
> > compaction overhead is to test it, but for the most part you should
> > consider your read/write ratio, rather than the total number of
> > reads/writes (unless it's so small that it's irrelevant, which it may
> > well be).
> >
> > On 13 June 2018 at 19:25, manuj singh <s.manuj...@gmail.com> wrote:
> >>
> >> Hi all,
> >> I am trying to determine the compaction strategy for our use case.
> >> In our use case we will have updates on a row a few times, and we
> >> have a TTL defined at the table level.
> >> Our typical workload is less than 1000 writes + reads per second. At
> >> the max it could go up to 2500 per second.
> >> We use SSDs and have around 64 GB of RAM on each node. Our cluster
> >> size is around 70 nodes.
> >>
> >> I looked at time series (TWCS), but we can't guarantee that the
> >> updates will happen within a given time window, and out of order
> >> updates might affect when that data is removed from disk.
> >>
> >> So I was looking at leveled compaction (LCS), which supposedly is
> >> good when you have updates. However it's IO-bound and will affect
> >> writes; everywhere I read, it says it's not good for write-heavy
> >> workloads. But looking at our write velocity, is it really write
> >> heavy?
> >>
> >> I guess what I am trying to find out is: will leveled compaction
> >> impact the writes in our use case, or will it be fine given that our
> >> write rate is not that high?
> >> Also, is there anything else I should keep in mind while deciding on
> >> the compaction strategy?
> >>
> >> Thanks!!
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org