Re: Read timeouts when performing rolling restart

2018-09-13 Thread Riccardo Ferrari
Hi Shalom,

It happens at almost every restart, whether a single node or a rolling one.
I agree with you that it is good, at least on my setup, to wait a few
minutes to let the restarted node cool down before moving to the next.
The more I look at it, the more I think it is something coming from hint
dispatch; maybe I should try tuning hint throttling.

Thanks!

On Thu, Sep 13, 2018 at 8:55 AM, shalom sagges 
wrote:

> Hi Riccardo,
>
> Does this issue occur when performing a single restart or after several
> restarts during a rolling restart (as mentioned in your original post)?
> We have a cluster where, when performing a rolling restart, we prefer to
> wait ~10-15 minutes between restarts because we see an increase in GC
> for a few minutes.
> If we restart the nodes quickly one after the other, the
> applications experience timeouts (probably due to GC and hints).
>
> Hope this helps!
>
> On Thu, Sep 13, 2018 at 2:20 AM Riccardo Ferrari 
> wrote:
>
>> A little update on the progress.
>>
>> First:
>> Thank you Thomas. I checked the code in the patch and briefly skimmed
>> through the 3.0.6 code. Yup it should be fixed.
>> Thank you Surbhi. At the moment we don't need authentication as the
>> instances are locked down.
>>
>> Now:
>> - Unfortunately the start_native_transport trick does not always work: on
>> some nodes it works, on others it doesn't. I still experience
>> timeouts and dropped messages during startup.
>> - I realized that cutting concurrent_compactors to 1 was not really a
>> good idea; the minimum value should be 2, and I am currently testing 4 (that is
>> min(n_cores, n_disks)).
>> - After raising the compactors to 4 I still see some dropped messages for
>> HINT and MUTATION. This happens during startup. The reason is "for internal
>> timeout". Maybe too many compactors?
>>
>> Thanks!
>>
>>
>> On Wed, Sep 12, 2018 at 7:09 PM, Surbhi Gupta 
>> wrote:
>>
>>> Another thing to notice is :
>>>
>>> system_auth WITH replication = {'class': 'SimpleStrategy',
>>> 'replication_factor': '1'}
>>>
>>> system_auth has a replication factor of 1, so even a single node being
>>> down may impact the cluster.
>>>
>>>
>>>
>>> On Wed, 12 Sep 2018 at 09:46, Steinmaurer, Thomas <
>>> thomas.steinmau...@dynatrace.com> wrote:
>>>
 Hi,



 I remember an issue where a client using the native protocol gets
 notified too early that Cassandra is ready, due to the following:

 https://issues.apache.org/jira/browse/CASSANDRA-8236



 which looks similar, but the above was marked as fixed in 2.2.



 Thomas



 *From:* Riccardo Ferrari 
 *Sent:* Wednesday, 12 September 2018 18:25
 *To:* user@cassandra.apache.org
 *Subject:* Re: Read timeouts when performing rolling restart



 Hi Alain,



 Thank you for chiming in!



 I was thinking of performing the 'start_native_transport=false' test as
 well, and indeed the issue does not show up. Starting a node with
 native transport disabled and letting it cool down leads to no timeout
 exceptions and no dropped messages, simply a clean startup. Agreed, it
 is a workaround.



 # About upgrading:

 Yes, I desperately want to upgrade, even though it is a long and slow task.
 Just reviewing all the changes from 3.0.6 to 3.0.17
 is going to be a huge pain. Off the top of your head, are there any breaking
 changes I should absolutely review?



 # describecluster output: YES they agree on the same schema version



 # keyspaces:

 system WITH replication = {'class': 'LocalStrategy'}

 system_schema WITH replication = {'class': 'LocalStrategy'}

 system_auth WITH replication = {'class': 'SimpleStrategy',
 'replication_factor': '1'}

 system_distributed WITH replication = {'class': 'SimpleStrategy',
 'replication_factor': '3'}

 system_traces WITH replication = {'class': 'SimpleStrategy',
 'replication_factor': '2'}



  WITH replication = {'class': 'SimpleStrategy',
 'replication_factor': '3'}

   WITH replication = {'class': 'SimpleStrategy',
 'replication_factor': '3'}
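
 Related to the system_auth replication factor of 1 listed above, a minimal
 sketch of what raising it could look like (the RF value is illustrative, and
 SimpleStrategy is kept here only because that is what the keyspace uses today):

 ALTER KEYSPACE system_auth
     WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'};
 -- then repair the keyspace on each node, e.g. nodetool repair system_auth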



 # Snitch

 Ec2Snitch



 ## About Snitch and replication:

 - We have the default DC and all nodes are in the same RACK

 - We are planning to move to GossipingPropertyFileSnitch, configuring
 cassandra-rackdc.properties accordingly.

 -- This should be a transparent change, correct?



 - Once switched to GPFS, we plan to move to 'NetworkTopologyStrategy'
 with 'us-' DC and replica counts as before

 - Then adding a new DC inside the VPC, but this is another story...



 Any concerns here?
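
 For clarity, the kind of statement I have in mind once GPFS is in place,
 sketched with placeholder names (the real DC name will come from
 cassandra-rackdc.properties, and the keyspace name stands in for our
 application keyspaces):

 ALTER KEYSPACE my_app_keyspace
     WITH replication = {'class': 'NetworkTopologyStrategy', 'us-dc-placeholder': '3'};
 -- repeat for each application keyspace, then run a repair of each keyspace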



 # nodetool status 

 --  Address Lo

RE: Cassandra 2.2.7 Compaction after Truncate issue

2018-09-13 Thread David Payne
I was able to resolve the issue with a rolling restart of the cluster.

From: James Shaw 
Sent: Thursday, August 23, 2018 7:52 PM
To: user@cassandra.apache.org
Subject: Re: Cassandra 2.2.7 Compaction after Truncate issue

You may go to the OS level and delete the files; that's what I did before. Truncate
frequently fails on some remote nodes in a heavy-transaction environment.

Thanks,

James

On Thu, Aug 23, 2018 at 8:54 PM, Rahul Singh
<rahul.xavier.si...@gmail.com> wrote:
David ,

What CL do you set when running this command?

Rahul Singh
Chief Executive Officer
m 202.905.2818

Anant Corporation
1010 Wisconsin Ave NW, Suite 250
Washington, D.C. 20007

We build and manage digital business technology platforms.
On Aug 14, 2018, 11:49 AM -0500, David Payne
<dav...@cqg.com> wrote:

Scenario: Cassandra 2.2.7, 3 nodes, RF=3 keyspace.


Truncate a table.

More than 24 hours later… FileCacheService is still reporting cold readers for
sstables of truncated data for nodes 2 and 3, but not node 1.

The output of nodetool compactionstats shows a stuck compaction for the truncated
table for nodes 2 and 3, but not node 1.

This appears to be a defect that was fixed in 2.1.0. 
https://issues.apache.org/jira/browse/CASSANDRA-7803

Any ideas?

Thanks,
David Payne
| ̄ ̄|
_☆☆☆_
( ´_⊃`)
c. 303-717-0548
dav...@cqg.com




RE: Cassandra 2.2.7 Compaction after Truncate issue

2018-09-13 Thread David Payne
The truncation was performed via OpsCenter, which I believe uses CL ALL by default.
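
For reference, a minimal cqlsh sketch of running the same operation by hand
(the keyspace and table names are made up):

CONSISTENCY ALL;                  -- cqlsh session setting
TRUNCATE my_keyspace.my_table;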

From: Rahul Singh 
Sent: Thursday, August 23, 2018 6:55 PM
To: user@cassandra.apache.org
Subject: Re: Cassandra 2.2.7 Compaction after Truncate issue

David ,

What CL do you set when running this command?

Rahul Singh
Chief Executive Officer
m 202.905.2818

Anant Corporation
1010 Wisconsin Ave NW, Suite 250
Washington, D.C. 20007

We build and manage digital business technology platforms.
On Aug 14, 2018, 11:49 AM -0500, David Payne
<dav...@cqg.com> wrote:

Scenario: Cassandra 2.2.7, 3 nodes, RF=3 keyspace.


Truncate a table.

More than 24 hours later… FileCacheService is still reporting cold readers for
sstables of truncated data for nodes 2 and 3, but not node 1.

The output of nodetool compactionstats shows a stuck compaction for the truncated
table for nodes 2 and 3, but not node 1.

This appears to be a defect that was fixed in 2.1.0. 
https://issues.apache.org/jira/browse/CASSANDRA-7803

Any ideas?

Thanks,
David Payne
| ̄ ̄|
_☆☆☆_
( ´_⊃`)
c. 303-717-0548
dav...@cqg.com



Large partitions

2018-09-13 Thread Gedeon Kamga
Folks,

Based on the information found here:
https://docs.datastax.com/en/dse-planning/doc/planning/planningPartitionSize.html
the recommended limit for a partition size is 100MB. Even though DataStax
clearly states that this is a rule of thumb, some team members are claiming
that our Cassandra *write* path is very slow because the partitions on some
tables are over 100MB. I know for a fact that this rule has changed since
2.2. Starting with Cassandra 2.2, the new rule of thumb for partition
size is *a few hundred MB*, given the improvements in the architecture.
Now, I am unable to find the reference (maybe I got it at a Cassandra
training by DataStax). I would like to share it with my team. Did anyone
come across this information? If yes, can you please share it?

Thanks!


Re: Large partitions

2018-09-13 Thread Alexander Dejanovski
Hi Gedeon,

You should check Robert Stupp's 2016 talk about large partitions:
https://www.youtube.com/watch?v=N3mGxgnUiRY

Cheers,


On Thu, Sep 13, 2018 at 6:42 PM Gedeon Kamga  wrote:

> Folks,
>
> Based on the information found here
> https://docs.datastax.com/en/dse-planning/doc/planning/planningPartitionSize.html
>  ,
> the recommended limit for a partition size is 100MB. Even though, DataStax
> clearly states that this is a rule of thumb, some team members are claiming
> that our Cassandra *Write *is very slow because the partitions on some
> tables are over 100MB. I know for a fact that this rule has changed since
> 2.2. Starting Cassandra 2.2 and up, the new rule of thumb for partition
> size is *a few hundreds MB*, given the improvement on the architecture.
> Now, I am unable to find the reference (maybe I got it at a Cassandra
> training by DataStax). I would like to share it with my team. Did anyone
> come across this information? If yes, can you please share it?
>
> Thanks!
>
-- 
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Large partitions

2018-09-13 Thread Mun Dega
I disagree.

We had several partitions over 150MB in 3.11, and we were able to break the
cluster doing r/w on these partitions in a short period of time.

On Thu, Sep 13, 2018, 12:42 Gedeon Kamga  wrote:

> Folks,
>
> Based on the information found here
> https://docs.datastax.com/en/dse-planning/doc/planning/planningPartitionSize.html
>  ,
> the recommended limit for a partition size is 100MB. Even though, DataStax
> clearly states that this is a rule of thumb, some team members are claiming
> that our Cassandra *Write *is very slow because the partitions on some
> tables are over 100MB. I know for a fact that this rule has changed since
> 2.2. Starting Cassandra 2.2 and up, the new rule of thumb for partition
> size is *a few hundreds MB*, given the improvement on the architecture.
> Now, I am unable to find the reference (maybe I got it at a Cassandra
> training by DataStax). I would like to share it with my team. Did anyone
> come across this information? If yes, can you please share it?
>
> Thanks!
>


Re: Large partitions

2018-09-13 Thread Jonathan Haddad
It depends on a number of factors, such as compaction strategy and read
patterns.  I recommend sticking to the 100MB per partition limit (and I aim
for significantly less than that).

If you're doing time series with TWCS & TTL'ed data and small enough
windows, and you're only querying for a small subset of the data, sure, you
could do it.  Outside of that, I don't see a reason why you'd want to.  I
wrote a blog post on how to scale time series workloads in Cassandra a ways
back, might be worth a read:
http://thelastpickle.com/blog/2017/08/02/time-series-data-modeling-massive-scale.html
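
To make that concrete, here is a minimal sketch of the kind of bucketed
time-series table described above (the table name, columns, window size, and
TTL are all illustrative):

CREATE TABLE sensor_data_by_day (
    sensor_id uuid,
    day       date,       -- time bucket in the partition key keeps partitions bounded
    ts        timestamp,
    value     double,
    PRIMARY KEY ((sensor_id, day), ts)
) WITH CLUSTERING ORDER BY (ts DESC)
  AND compaction = {'class': 'TimeWindowCompactionStrategy',
                    'compaction_window_unit': 'DAYS',
                    'compaction_window_size': '1'}
  AND default_time_to_live = 2592000;  -- 30 days, matching the TTL'ed-data case

Queries then always target a single (sensor_id, day) partition, which is what
keeps both partition size and read cost bounded as the total data set grows.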

Regarding your write performance, since you're only bound by commit log
performance + memtable inserts, if your writes are slow there's a good
chance you're hitting long GC pauses.  Those *could* be caused by
compaction.  If your compaction throughput is too high you could see high
rates of object allocation which lead to long GC pauses, slowing down your
writes.  There are other things that can cause long GC pauses; sometimes you
just need some basic tuning. I recommend reading up on it:
http://thelastpickle.com/blog/2018/04/11/gc-tuning.html

Jon



On Thu, Sep 13, 2018 at 9:47 AM Mun Dega  wrote:

> I disagree.
>
> We had several over 150MB in 3.11 and we were able to break cluster doing
> r/w from these partitions in a short period of time.
>
> On Thu, Sep 13, 2018, 12:42 Gedeon Kamga  wrote:
>
>> Folks,
>>
>> Based on the information found here
>> https://docs.datastax.com/en/dse-planning/doc/planning/planningPartitionSize.html
>>  ,
>> the recommended limit for a partition size is 100MB. Even though, DataStax
>> clearly states that this is a rule of thumb, some team members are claiming
>> that our Cassandra *Write *is very slow because the partitions on some
>> tables are over 100MB. I know for a fact that this rule has changed since
>> 2.2. Starting Cassandra 2.2 and up, the new rule of thumb for partition
>> size is *a few hundreds MB*, given the improvement on the architecture.
>> Now, I am unable to find the reference (maybe I got it at a Cassandra
>> training by DataStax). I would like to share it with my team. Did anyone
>> come across this information? If yes, can you please share it?
>>
>> Thanks!
>>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Corrupt insert during ALTER TABLE add

2018-09-13 Thread Max C.
I ran “alter table” today to add the “task_output_capture_state” column (see
below), and we found that a few rows inserted around the time of the ALTER TABLE
did not contain the same values when selected as when they were inserted.
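
For context, the ALTER was essentially the following (reconstructed from the
schema below):

ALTER TABLE mars.test_instances_by_run_submission
    ADD task_output_capture_state text;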

When the row was selected, what we saw was:
- test_id —> OK (same as inserted)
- test_instance_group_id —> inserted as UNSET_ID; when selected, it contained
the same value as test_id
- test_name —> OK (same as inserted)
- ti_exec_flow —> inserted as “STD”; when selected, it contained the same value
as test_name

Does this ring a bell?  Is there a JIRA for this (hopefully fixed in a newer 
C*)?  We’re running 3.0.6.  Thanks everyone.

CREATE TABLE mars.test_instances_by_run_submission (
run_submission_id timeuuid,
rs_bucket_num int,
id timeuuid, 
active_task_id timeuuid,
active_task_name_path text,
exec_end timestamp,
exec_gpath text,
exec_host_id uuid,
exec_start timestamp,
exec_state text,
exec_subbuild_id timeuuid,
exec_vco text,
fstatus text,
grid_job_id uuid,
legacy_assert boolean,
legacy_core boolean,
source_gpath text,
task_output_capture_state text,   -- NEW
test_id uuid,
test_instance_group_id uuid,
test_name text,
ti_exec_flow text,
PRIMARY KEY ((run_submission_id, rs_bucket_num), id)
) WITH CLUSTERING ORDER BY (id ASC)

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Corrupt insert during ALTER TABLE add

2018-09-13 Thread Max C.
Correction — we’re running C* 3.0.8.  DataStax Python driver 3.4.1.

> On Sep 13, 2018, at 1:11 pm, Max C.  wrote:
> 
> I ran “alter table” today to add the “task_output_capture_state” column (see 
> below), and we found a few rows inserted around the time of the ALTER TABLE 
> did not contain the same values when selected as when they were inserted.
> 
> When the row was selected, what we saw was:
> - test_id —> OK (same as insert)
> - test_instance_group_id —> inserted as UNSET_ID, selected it contained the 
> same value as test_id
> - test_name —> OK (same as insert)
> - ti_exec_flow —> inserted as “STD”, selected it contained the same value as 
> test_name
> 
> Does this ring a bell?  Is there a JIRA for this (hopefully fixed in a newer 
> C*)?  We’re running 3.0.6.  Thanks everyone.
> 
> CREATE TABLE mars.test_instances_by_run_submission (
>run_submission_id timeuuid,
>rs_bucket_num int,
>id timeuuid, 
>active_task_id timeuuid,
>active_task_name_path text,
>exec_end timestamp,
>exec_gpath text,
>exec_host_id uuid,
>exec_start timestamp,
>exec_state text,
>exec_subbuild_id timeuuid,
>exec_vco text,
>fstatus text,
>grid_job_id uuid,
>legacy_assert boolean,
>legacy_core boolean,
>source_gpath text,
>task_output_capture_state text,   # NEW
>test_id uuid,
>test_instance_group_id uuid,
>test_name text,
>ti_exec_flow text,
>PRIMARY KEY ((run_submission_id, rs_bucket_num), id)
> ) WITH CLUSTERING ORDER BY (id ASC)
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Corrupt insert during ALTER TABLE add

2018-09-13 Thread Jeff Jirsa
CASSANDRA-13004 (fixed in recent 3.0 and 3.11 builds)


On Thu, Sep 13, 2018 at 1:12 PM Max C.  wrote:

> I ran “alter table” today to add the “task_output_capture_state” column
> (see below), and we found a few rows inserted around the time of the ALTER
> TABLE did not contain the same values when selected as when they were
> inserted.
>
> When the row was selected, what we saw was:
> - test_id —> OK (same as insert)
> - test_instance_group_id —> inserted as UNSET_ID, selected it contained
> the same value as test_id
> - test_name —> OK (same as insert)
> - ti_exec_flow —> inserted as “STD”, selected it contained the same value
> as test_name
>
> Does this ring a bell?  Is there a JIRA for this (hopefully fixed in a
> newer C*)?  We’re running 3.0.6.  Thanks everyone.
>
> CREATE TABLE mars.test_instances_by_run_submission (
> run_submission_id timeuuid,
> rs_bucket_num int,
> id timeuuid,
> active_task_id timeuuid,
> active_task_name_path text,
> exec_end timestamp,
> exec_gpath text,
> exec_host_id uuid,
> exec_start timestamp,
> exec_state text,
> exec_subbuild_id timeuuid,
> exec_vco text,
> fstatus text,
> grid_job_id uuid,
> legacy_assert boolean,
> legacy_core boolean,
> source_gpath text,
> task_output_capture_state text,   # NEW
> test_id uuid,
> test_instance_group_id uuid,
> test_name text,
> ti_exec_flow text,
> PRIMARY KEY ((run_submission_id, rs_bucket_num), id)
> ) WITH CLUSTERING ORDER BY (id ASC)
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: Corrupt insert during ALTER TABLE add

2018-09-13 Thread Max C.
Yep, that’s the problem!  Thanks Jeff (and Alex Petrov for fixing it).

- Max

> On Sep 13, 2018, at 1:24 pm, Jeff Jirsa  wrote:
> 
> CASSANDRA-13004 (fixed in recent 3.0 and 3.11 builds)

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



cold vs hot data

2018-09-13 Thread Alaa Zubaidi (PDF)
Hi,

We are using Apache Cassandra 3.11.2 on RedHat 7
The data can grow to 100+ TB; however, the hot data will in most cases be less
than 10TB, but we still need to keep the rest of the data accessible.
Does anyone else have this problem?
What is the best way to make the cluster more efficient?
Is there a way to somehow automatically move the old data to different
storage (rack, dc, etc)?
Any ideas?

Regards,

-- 

Alaa



Re: cold vs hot data

2018-09-13 Thread Ben Slater
Not quite a solution but you will probably be interested in the discussion
on this ticket: https://issues.apache.org/jira/browse/CASSANDRA-8460

On Fri, 14 Sep 2018 at 10:46 Alaa Zubaidi (PDF) 
wrote:

> Hi,
>
> We are using Apache Cassandra 3.11.2 on RedHat 7
> The data can grow to +100TB however the hot data will be in most cases
> less than 10TB but we still need to keep the rest of data accessible.
> Anyone has this problem?
> What is the best way to make the cluster more efficient?
> Is there a way to somehow automatically move the old data to different
> storage (rack, dc, etc)?
> Any ideas?
>
> Regards,
>
> --
>
> Alaa
>
>

-- 


*Ben Slater*

*Chief Product Officer *

   



Re: cold vs hot data

2018-09-13 Thread Mateusz
On Friday, 14 September 2018 02:46:43 CEST Alaa Zubaidi (PDF) wrote:
> The data can grow to +100TB however the hot data will be in most cases less
> than 10TB but we still need to keep the rest of data accessible.
> Anyone has this problem?
> What is the best way to make the cluster more efficient?
> Is there a way to somehow automatically move the old data to different
> storage (rack, dc, etc)?
> Any ideas?

We solved it using lvmcache.

-- 
Mateusz
(...) I have a brother - serious, a homebody, a penny-pincher, a hypocrite, sanctimonious;
in short - a pillar of society."
Nikos Kazantzakis - "Zorba the Greek"




-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org