Re: [VOTE] Release Apache Cassandra 3.10 (Take 4)

2017-01-22 Thread Nate McCall
What was the resolution on this?

Looks like we resolved/Fixed CASSANDRA-13058. Can we re-roll and go again?

On Tue, Jan 17, 2017 at 4:26 AM, Sylvain Lebresne  wrote:
> I'm a bit sorry about it, but I'm kind of -1 on account of
> https://issues.apache.org/jira/browse/CASSANDRA-13025. It's a genuine
> regression during upgrade that we should really fix before it's released
> into the wild. I apologize for not having bumped the priority on this
> ticket sooner, but I think we need the fix in.
>
> On Mon, Jan 16, 2017 at 2:25 AM, Paulo Motta 
> wrote:
>
>> -1, since CASSANDRA-13058 introduces a regression that prevents a
>> successful decommission when the decommissioning node has hints to
>> transfer. While this is relatively minor and there is a workaround
>> (force hint replay before decommission), there is already a patch
>> available, so I committed it to cassandra-3.11 and later branches;
>> we will then also have a green testboard for cassandra-3.11_novnode_dtest.
>>
>> If there are no objections to getting this in, can you re-roll this once
>> again, Michael? Sorry for the late update on this; I had other things on
>> my plate and could only get to this now.
>>
>> 2017-01-15 10:48 GMT-02:00 Aleksey Yeschenko :
>>
>> > +1
>> >
>> > --
>> > AY
>> >
>> > On 14 January 2017 at 00:47:08, Michael Shuler (mich...@pbandjelly.org)
>> > wrote:
>> >
>> > I propose the following artifacts for release as 3.10.
>> >
>> > sha1: 9c2ab25556fad06a6a4d58f4bb652719a8a1bc27
>> > Git:
>> > http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.10-tentative
>> > Artifacts:
>> > https://repository.apache.org/content/repositories/orgapachecassandra-1136/org/apache/cassandra/apache-cassandra/3.10/
>> > Staging repository:
>> > https://repository.apache.org/content/repositories/orgapachecassandra-1136/
>> >
>> > The Debian packages are available here: http://people.apache.org/~mshuler
>> >
>> > The vote will be open for 72 hours (longer if needed).
>> >
>> > [1]: (CHANGES.txt) https://goo.gl/WaAEVn
>> > [2]: (NEWS.txt) https://goo.gl/7deAsG
>> >
>> > All of the unit tests passed and the main dtest job passed.
>> >
>> > https://cassci.datastax.com/job/cassandra-3.11_testall/47/
>> > https://cassci.datastax.com/job/cassandra-3.11_utest/55/
>> > https://cassci.datastax.com/job/cassandra-3.11_utest_cdc/25/
>> > https://cassci.datastax.com/job/cassandra-3.11_utest_compression/23/
>> > https://cassci.datastax.com/job/cassandra-3.11_dtest/31/
>> >
>> > --
>> > Kind regards,
>> > Michael Shuler
>> >
>> >
>>


Dropped messages on random nodes.

2017-01-22 Thread Dikang Gu
Hello there,

We have a cluster of roughly 100 nodes, and I see dropped messages on
random nodes in the cluster, which cause error spikes and P99 latency
spikes as well.

I tried to figure out the cause. I do not see any obvious bottleneck in the
cluster; the C* nodes still have plenty of idle CPU and disk I/O headroom.
But I do see some suspicious gossip events around that time; not sure if
they're related.

2017-01-21_16:43:56.71033 WARN  16:43:56 [GossipTasks:1]: Not marking nodes
down due to local pause of 13079498815 > 50
2017-01-21_16:43:56.85532 INFO  16:43:56 [ScheduledTasks:1]: MUTATION
messages were dropped in last 5000 ms: 65 for internal timeout and 10895
for cross node timeout
2017-01-21_16:43:56.85533 INFO  16:43:56 [ScheduledTasks:1]: READ messages
were dropped in last 5000 ms: 33 for internal timeout and 7867 for cross
node timeout
2017-01-21_16:43:56.85534 INFO  16:43:56 [ScheduledTasks:1]: Pool Name        Active   Pending   Completed  Blocked  All Time Blocked
2017-01-21_16:43:56.85534 INFO  16:43:56 [ScheduledTasks:1]: MutationStage       128     47794  1015525068        0                 0
2017-01-21_16:43:56.85535 INFO  16:43:56 [ScheduledTasks:1]: ReadStage            64     20202   450508940        0                 0
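For what it's worth, here is a minimal sketch (the helper and its names are mine, assuming only the stock log format shown above) that tallies dropped-message counts per verb and extracts local-pause durations from lines like these, which makes it easier to correlate drop spikes with pauses across nodes:

```python
import re

# Matches e.g. "MUTATION messages were dropped in last 5000 ms: 65 for
# internal timeout and 10895 for cross node timeout"
DROP_RE = re.compile(
    r"(\w+) messages were dropped in last \d+ ms: "
    r"(\d+) for internal timeout and (\d+) for cross node timeout"
)
# Matches the GossipTasks warning; the pause value is in nanoseconds.
PAUSE_RE = re.compile(r"local pause of (\d+)")

def scan(lines):
    """Return (total drops per verb, list of pause durations in seconds)."""
    drops, pauses = {}, []
    for line in lines:
        m = DROP_RE.search(line)
        if m:
            verb = m.group(1)
            drops[verb] = drops.get(verb, 0) + int(m.group(2)) + int(m.group(3))
        m = PAUSE_RE.search(line)
        if m:
            pauses.append(int(m.group(1)) / 1e9)  # ns -> s
    return drops, pauses

log = [
    "WARN  16:43:56 [GossipTasks:1]: Not marking nodes down due to local pause of 13079498815",
    "INFO  16:43:56 [ScheduledTasks:1]: MUTATION messages were dropped in last 5000 ms: 65 for internal timeout and 10895 for cross node timeout",
    "INFO  16:43:56 [ScheduledTasks:1]: READ messages were dropped in last 5000 ms: 33 for internal timeout and 7867 for cross node timeout",
]
drops, pauses = scan(log)
print(drops)   # {'MUTATION': 10960, 'READ': 7900}
print(pauses)  # [13.079498815]
```

A local pause of ~13 seconds on a node with idle CPU usually points at a JVM stall (GC or safepoint) or a host-level hiccup rather than load; checking the GC logs on the affected node around 16:43 would be a reasonable next step.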

Any suggestions?

Thanks!

-- 
Dikang


Re: Dropped messages on random nodes.

2017-01-22 Thread Dikang Gu
Btw, the C* version is 2.2.5, with several backported patches.

On Sun, Jan 22, 2017 at 10:36 PM, Dikang Gu  wrote:

> Hello there,
>
> We have a cluster of roughly 100 nodes, and I see dropped messages on
> random nodes in the cluster, which cause error spikes and P99 latency
> spikes as well.
>
> I tried to figure out the cause. I do not see any obvious bottleneck in
> the cluster; the C* nodes still have plenty of idle CPU and disk I/O
> headroom. But I do see some suspicious gossip events around that time;
> not sure if they're related.
>
> 2017-01-21_16:43:56.71033 WARN  16:43:56 [GossipTasks:1]: Not marking
> nodes down due to local pause of 13079498815 > 50
> 2017-01-21_16:43:56.85532 INFO  16:43:56 [ScheduledTasks:1]: MUTATION
> messages were dropped in last 5000 ms: 65 for internal timeout and 10895
> for cross node timeout
> 2017-01-21_16:43:56.85533 INFO  16:43:56 [ScheduledTasks:1]: READ messages
> were dropped in last 5000 ms: 33 for internal timeout and 7867 for cross
> node timeout
> 2017-01-21_16:43:56.85534 INFO  16:43:56 [ScheduledTasks:1]: Pool Name        Active   Pending   Completed  Blocked  All Time Blocked
> 2017-01-21_16:43:56.85534 INFO  16:43:56 [ScheduledTasks:1]: MutationStage       128     47794  1015525068        0                 0
> 2017-01-21_16:43:56.85535 INFO  16:43:56 [ScheduledTasks:1]: ReadStage            64     20202   450508940        0                 0
>
> Any suggestions?
>
> Thanks!
>
> --
> Dikang
>
>


-- 
Dikang