IMHO, having many tombstones can slow down both reads and writes, in the
following cases:
 - For reads, it is slow if the requested slice contains many tombstones.
 - For writes, it is slower if the row in the memtable already contains many
tombstones. That is because, if the IntervalTree contains N intervals and
one tombstone must be added, a whole new IntervalTree must be created.

But it's true that writes are less impacted than reads.
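
To make the write-side cost concrete, here is a rough sketch in Python. It
is only a toy model and not Cassandra's actual IntervalTree code: the "tree"
is an immutable sorted tuple, and the function names are made up for the
illustration. It just counts how many intervals get processed when every
added tombstone forces a full rebuild.

```python
# Toy model, NOT Cassandra's IntervalTree: the "tree" is an immutable sorted
# tuple, so adding one interval means building a new structure from all of
# the existing intervals plus the new one. All names here are hypothetical.

def add_by_rebuild(tree, interval):
    """Return (new_tree, intervals_processed) for one added tombstone."""
    new_tree = tuple(sorted(tree + (interval,)))
    return new_tree, len(new_tree)


def total_rebuild_work(n):
    """Total intervals processed when n tombstones are added one at a time."""
    tree, work = (), 0
    for i in range(n):
        tree, processed = add_by_rebuild(tree, (i, i + 1))
        work += processed
    return work


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, total_rebuild_work(n))
```

In this model, adding N tombstones one by one processes 1 + 2 + ... + N =
N(N+1)/2 intervals in total, so the cost per write grows with the number of
tombstones already in the row.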

Sylvain, if you need/want some help/info for CASSANDRA-5677, don't hesitate
to ask.



2013/6/28 Sylvain Lebresne <sylv...@datastax.com>

> As documented at http://cassandra.apache.org/doc/cql3/CQL.html#collections,
> the lists have 3 operations that require a read before a write (and should
> thus be avoided in performance sensitive code), namely setting and deleting
> by index, and removing by value. Outside of that, collections involve no
> read before write.
>
> But, as you said, if you do overwrite a collection, the previous
> collection is removed (using a range tombstone) while the new one is added.
> This should have almost no impact on the insertion itself however (the
> tombstone is in the same internal mutation as the update itself, it's not
> 2 operations). But yes, if you do often overwrite collections in the same
> partition, this might have some impact on reads due to CASSANDRA-5677, and
> we'll look at fixing that.
>
> So in theory collections should have no special impact on writes, at least
> nothing that is by design. If you do observe differently and have a way to
> reproduce, feel free to open a JIRA issue. But I'm afraid we'll need more
> than "two guys on stackoverflow claim they've seen write performance
> degradation due to collections" to get going.
>
> --
> Sylvain
>
>
> On Fri, Jun 28, 2013 at 7:30 AM, Theo Hultberg <t...@iconara.net> wrote:
>
>> the thing I was doing was definitely triggering the range tombstone
>> issue, this is what I was doing:
>>
>>     UPDATE clocks SET clock = ? WHERE shard = ?
>>
>> in this table:
>>
>>     CREATE TABLE clocks (shard INT PRIMARY KEY, clock MAP<TEXT,
>> TIMESTAMP>)
>>
>> however, from the stack overflow posts it sounds like they aren't
>> necessarily overwriting their collections. I've tried to replicate their
>> problem with these two statements
>>
>>     INSERT INTO clocks (shard, clock) VALUES (?, ?)
>>     UPDATE clocks SET clock = clock + ? WHERE shard = ?
>>
>> the first one should create range tombstones because it overwrites
>> the map on every insert, and the second should not because it adds to the
>> map. neither of those seems to have any performance issues, at least not on
>> inserts.
>>
>> and it's the slowdown on inserts that confuses me, both the stack
>> overflow questioners say that they saw a drop in insert performance. I
>> never saw that in my application, I just got slow reads (and Fabien's
>> explanation makes complete sense for that). I don't understand how insert
>> performance could be affected at all, and I know that for non-counter
>> columns cassandra doesn't read before it writes, but is it the same for
>> collections too? they are a bit special, but how special are they?
>>
>> T#
>>
>>
>> On Fri, Jun 28, 2013 at 7:04 AM, aaron morton <aa...@thelastpickle.com> wrote:
>>
>>> Can you provide details of the mutation statements you are running ? The
>>> Stack Overflow posts don't seem to include them.
>>>
>>> Cheers
>>>
>>>    -----------------
>>> Aaron Morton
>>> Freelance Cassandra Consultant
>>> New Zealand
>>>
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 27/06/2013, at 5:58 AM, Theo Hultberg <t...@iconara.net> wrote:
>>>
>>> do I understand it correctly if I think that collection modifications
>>> are done by reading the collection, writing a range tombstone that would
>>> cover the collection and then re-writing the whole collection again? or is
>>> it just the modified parts of the collection that are covered by the range
>>> tombstones, but you still get massive amounts of them and it's just their
>>> number that is the problem?
>>>
>>> would this explain the slowdown of writes too? I guess it would if
>>> cassandra needed to read the collection before it wrote the new values,
>>> otherwise I don't understand how this affects writes, but that only says
>>> how much I know about how this works.
>>>
>>> T#
>>>
>>>
>>> On Wed, Jun 26, 2013 at 10:48 AM, Fabien Rousseau <fab...@yakaz.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm pretty sure that it's related to this ticket :
>>>> https://issues.apache.org/jira/browse/CASSANDRA-5677
>>>>
>>>> I'd be happy if someone tests this patch.
>>>> It should apply easily on 1.2.5 & 1.2.6
>>>>
>>>> After applying the patch, by default, the current implementation is
>>>> still used, but modify your cassandra.yaml to add the following one :
>>>> interval_tree_provider: IntervalTreeAvlProvider
>>>>
>>>> (Note that implementations should be interchangeable, because they
>>>> share the same serializers and deserializers)
>>>>
>>>> Also, please note that this patch has not been reviewed nor intensively
>>>> tested... So, it may not be "production ready"
>>>>
>>>> Fabien
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> 2013/6/26 Theo Hultberg <t...@iconara.net>
>>>>
>>>>> Hi,
>>>>>
>>>>> I've seen a couple of people on Stack Overflow having problems with
>>>>> performance when they have maps that they continuously update, and in
>>>>> hindsight I think I might have run into the same problem myself (but I
>>>>> didn't suspect it as the reason and designed differently and by accident
>>>>> didn't use maps anymore).
>>>>>
>>>>> Is there any reason that maps (or lists or sets) in particular would
>>>>> become a performance issue when they're heavily modified? As I've
>>>>> understood them they're not special, and shouldn't be any different
>>>>> performance wise than overwriting regular columns. Is there something
>>>>> different going on that I'm missing?
>>>>>
>>>>> Here are the Stack Overflow questions:
>>>>>
>>>>>
>>>>> http://stackoverflow.com/questions/17282837/cassandra-insert-perfomance-issue-into-a-table-with-a-map-type/17290981
>>>>>
>>>>>
>>>>> http://stackoverflow.com/questions/17082963/bad-performance-when-writing-log-data-to-cassandra-with-timeuuid-as-a-column-nam/17123236
>>>>>
>>>>> yours,
>>>>> Theo
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Fabien Rousseau
>>>>  <aur...@yakaz.com>www.yakaz.com
>>>>
>>>
>>>
>>>
>>
>


-- 
Fabien Rousseau
 <aur...@yakaz.com>www.yakaz.com
