Re: Bootstrapping fails with < 128GB RAM ...

2018-02-16 Thread Jürgen Albersdorfer
Hi Jon,
I was able to get a heap dump - I created a JIRA issue and attached as
many details as possible.

https://issues.apache.org/jira/browse/CASSANDRA-14239

The heap dump is 42 GB in size. I will keep it - if you need more
information, please don't hesitate to let me know.

thanks,
Jürgen

2018-02-09 12:07 GMT+01:00 Jürgen Albersdorfer:

> Hi Jon,
> should I register with JIRA and open an issue, or will you do so?
> I'm currently trying to bootstrap another node - with 100GB RAM this
> time - and I'm recording Java heap memory over time via JConsole and Top
> Threads, and monitoring the debug.log.
>
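> As a lighter-weight way to sample the heap over time, something like jstat
> should also work (assuming the JDK tools are available on the node; 10000
> is the sample interval in milliseconds):
>
> jstat -gcutil <cassandra-pid> 10000
>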
> There, in the debug.log, I can see that the other nodes seem to
> immediately start hinting to the joining node, indicated by entries like
> the following, of which I see hundreds per second:
>
> DEBUG [MutationStage-27] 2018-02-09 12:06:03,241 HintVerbHandler.java:95 - Failed to apply hint
> java.util.concurrent.CompletionException: org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed out - received only 0 responses.
>     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[na:1.8.0_151]
>     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[na:1.8.0_151]
>     at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:647) ~[na:1.8.0_151]
>     at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[na:1.8.0_151]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[na:1.8.0_151]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) ~[na:1.8.0_151]
>     at org.apache.cassandra.db.Keyspace.applyInternal(Keyspace.java:523) ~[apache-cassandra-3.11.1.jar:3.11.1]
>     at org.apache.cassandra.db.Keyspace.lambda$applyInternal$0(Keyspace.java:538) ~[apache-cassandra-3.11.1.jar:3.11.1]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_151]
>     at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) ~[apache-cassandra-3.11.1.jar:3.11.1]
>     at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) ~[apache-cassandra-3.11.1.jar:3.11.1]
>     at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_151]
> Caused by: org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed out - received only 0 responses.
> ... 6 common frames omitted
>
> Could this be connected - maybe causing the excessive RAM requirement?
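>
> One way to test whether the hinting is involved might be to pause hinted
> handoff on the existing nodes for the duration of the bootstrap (assuming
> it is acceptable to drop those hints and run a repair afterwards):
>
> nodetool pausehandoff    # on each existing node
> nodetool statushandoff   # confirm the handoff state
> nodetool resumehandoff   # once the join completes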
>
> Thanks so far, regards
> Juergen
>
> 2018-02-07 19:49 GMT+01:00 Jon Haddad :
>
>> It would be extremely helpful to get some info about your heap.  At a
>> bare minimum, a histogram of the heap dump would be useful, but ideally a
>> full heap dump would be best.
>>
>> jmap  -dump:live,format=b,file=heap.bin PID
>>
>> Taking a look at that in YourKit should give some pretty quick insight
>> into what kinds of objects are allocated, then we can get to the bottom of
>> the issue.  This should be moved to a JIRA
>> (https://issues.apache.org/jira/secure/Dashboard.jspa) in order to track
>> and fix it; if you could attach that heap dump it would be very helpful.
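>>
>> For just a histogram, something like this should work as well (JDK 8 jmap
>> assumed; the output is plain text and easy to attach to the ticket):
>>
>> jmap -histo:live PID > histo.txt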
>>
>> Jon
>>
>>
>> On Feb 7, 2018, at 6:11 AM, Nicolas Guyomar wrote:
>>
>> OK then, following up on the wild guess: because you have quite a lot of
>> concurrent compactors, maybe it is too many concurrent compactions for the
>> JVM to deal with (taking into account that your load average of 106 seems
>> really high IMHO).
>>
>> 55 GB of data is not that much; you can try to reduce the concurrent
>> compactors to make sure your box is not under too much stress (how many
>> compactions do you have in parallel during bootstrap?)
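>>
>> The number of in-flight compactions can be checked with:
>>
>> nodetool compactionstats
>>
>> and concurrent_compactors can be lowered in cassandra.yaml (it is read at
>> startup, so a restart is needed), e.g. concurrent_compactors: 4.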
>>
>> In the end, it does seem that you're gonna have to share a heap dump for
>> further investigation (sorry, I'm not gonna be much help on this matter)
>>
>> On 7 February 2018 at 14:43, Jürgen Albersdorfer wrote:
>>
>>> Hi Nicolas,
>>>
>>> Do you know how many sstables is this new node suppose to receive ?
>>>
>>>
>>> If I can find this out via nodetool netstats, then it would be 619, as
>>> follows:
>>>
>>> # nodetool netstats
>>> Bootstrap b95371e0-0c0a-11e8-932b-f775227bf21c
>>> /192.168.1.215 - Receiving 71 files, 7744612158 bytes total.
>>> Already received 0 files, 893897583 bytes total
>>> /192.168.1.214 - Receiving 58 files, 5693392001 bytes total.
>>> Already received 0 files, 1078372756 bytes total
>>> /192.168.1.206 - Receiving 52 files, 3389096409 bytes total.
>>> Already received 3 files, 508592758 bytes total
>>> /192.168.1.213 - Receiving 59 files, 6041633329 bytes total.
>>> Already received 0 files, 1038760653 bytes total
>>> /192.168.1.231 - Receiving 79 files, 7579181689
>

RE: overhead of empty tables

2018-02-16 Thread Jacques-Henri Berthemet
The main overhead is that each table reserves about 1 MB of Java heap, so if
you have 1000 tables that is about 1 GB of RAM just for managing the tables,
even if they are empty.
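
As a rough check of how many tables a cluster defines, they can be counted
from the schema tables (3.x assumed; on 2.x the equivalent table is
system.schema_columnfamilies):

cqlsh> SELECT COUNT(*) FROM system_schema.tables;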

--
Jacques-Henri Berthemet

From: Alaa Zubaidi (PDF) [mailto:alaa.zuba...@pdf.com]
Sent: Friday, February 16, 2018 1:05 AM
To: user@cassandra.apache.org; Dinesh Joshi 
Subject: Re: overhead of empty tables

Thanks Dinesh,
We have 36332 files under the data folder

On Thu, Feb 15, 2018 at 3:45 PM, Dinesh Joshi wrote:
Each table in a keyspace is stored as a separate directory in the data
directory. If you have many tables, you'll have a lot of files. Some file
systems have issues dealing with a lot of files in a single directory. Other
than that, there will likely be some bookkeeping overhead within the
Cassandra process. How many tables are we talking about here?
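
As a rough check (assuming the default data directory layout, where each
keyspace is a directory and each table a subdirectory), the table
directories can be counted with something like:

find /var/lib/cassandra/data -mindepth 2 -maxdepth 2 -type d | wc -l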

Here's more information about it: 
https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlHowDataWritten.html

Dinesh


On Thursday, February 15, 2018, 3:34:49 PM PST, Alaa Zubaidi (PDF) wrote:


Is there any overhead if my keyspace contains many empty tables?
Thanks
Alaa




--

Alaa Zubaidi
PDF Solutions, Inc.
333 West San Carlos Street, Suite 1000
San Jose, CA 95110  USA
Tel: 408-283-5639
fax: 408-938-6479
email: alaa.zuba...@pdf.com



"nodetool getendpoints" is inconsistent when cluster restarts

2018-02-16 Thread Marcel Villet
Hi All

I saw something curious when I restarted our cluster after a full shutdown:
the result of calling nodetool getendpoints changed as each node was started
in turn.

Prior to shutdown, we had the following:

$ nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.0.3.11  46.53 GiB  256     51.1%             35dcbc33-efb3-4c98-9063-aa451e6e0c7f  rack1
UN  10.0.3.12  44.44 GiB  256     48.9%             8c13e94a-2c8e-43a3-9fb0-11d4b5b4606f  rack1
UN  10.0.3.21  41.42 GiB  256     45.4%             6b619cc8-e398-4e48-8542-09d84c3710f6  rack2
UN  10.0.3.22  49.55 GiB  256     54.6%             962b04ee-e135-4ed5-87f9-7463c0b18860  rack2
UN  10.0.3.31  45.76 GiB  256     50.2%             2a90ac41-a741-472f-8c74-a382a24eb65f  rack3
UN  10.0.3.32  45.2 GiB   256     49.8%             e7dc5e68-bbde-4f6b-9426-43cf13c679c8  rack3

$ nodetool getendpoints ospreypro Users 39befa1f-dee1-48c4-9702-e8e8332b84df
10.0.3.12
10.0.3.22
10.0.3.32

As each node was started up, the results differed as follows:

administrator@OspreyProSRV1-R1N1:~$ nodetool getendpoints ospreypro Users 39befa1f-dee1-48c4-9702-e8e8332b84df
10.0.3.11

administrator@OspreyProSRV1-R1N1:~$ nodetool getendpoints ospreypro Users 39befa1f-dee1-48c4-9702-e8e8332b84df
10.0.3.11
10.0.3.12

administrator@OspreyProSRV1-R1N1:~$ nodetool getendpoints ospreypro Users 39befa1f-dee1-48c4-9702-e8e8332b84df
10.0.3.11
10.0.3.12
10.0.3.21

administrator@OspreyProSRV1-R1N1:~$ nodetool getendpoints ospreypro Users 39befa1f-dee1-48c4-9702-e8e8332b84df
10.0.3.11
10.0.3.12
10.0.3.22

administrator@OspreyProSRV1-R1N1:~$ nodetool getendpoints ospreypro Users 39befa1f-dee1-48c4-9702-e8e8332b84df
10.0.3.12
10.0.3.22
10.0.3.31

administrator@OspreyProSRV1-R1N1:~$ nodetool getendpoints ospreypro Users 39befa1f-dee1-48c4-9702-e8e8332b84df
10.0.3.12
10.0.3.22
10.0.3.32

Eventually, the endpoints settle on the expected values.

Seeing as the token ranges are stored in system.peers (on each node),
why would I see different endpoints like this?
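
One way to check what each node has actually persisted, for comparison with
what gossip reports, is to query the tokens directly on each node in turn:

cqlsh> SELECT peer, tokens FROM system.peers;
cqlsh> SELECT tokens FROM system.local;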

Regards
Marcel Villet
Osprey Pro Team Leader and Lead Architect
CT LAB (PTY) LTD - Bringing Quality To Power
T  +27 (0)21-880-9915  |  F  +27 (0)21-880-1088  |  www.ctlab.com  |  www.pq-portal.com
15 Termo Lane, Techno Park, Stellenbosch, South Africa, 7600  |  PO Box 897, 
Stellenbosch,7599.







Help needed to enable client-to-node encryption (SSL)

2018-02-16 Thread Prachi Rath
Hi,

I am using Cassandra version 2.0. My goal is to set up client-to-node
security using SSL with my self-signed CA.

What would be the recommended procedure for enabling SSL on Cassandra
version 2.0.17?

Thanks,
Prachi


Re: Help needed to enable client-to-node encryption (SSL)

2018-02-16 Thread Jeff Jirsa
http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html

https://www.youtube.com/watch?v=CKt0XVPogf4

(2.0 is getting pretty old and isn't supported, you may want to consider
upgrading; 2.1 would be the smallest change and least risk, but it, too, is
near end of life)
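
For reference, client-to-node encryption lives under client_encryption_options
in cassandra.yaml. A minimal sketch for 2.0 (paths and passwords are
placeholders; the keystore must hold a key pair whose certificate chains to
your CA, and clients need the CA certificate in their truststore):

client_encryption_options:
    enabled: true
    keystore: conf/.keystore
    keystore_password: <keystore-password>
    # to additionally authenticate clients by certificate:
    # require_client_auth: true
    # truststore: conf/.truststore
    # truststore_password: <truststore-password>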



On Fri, Feb 16, 2018 at 8:05 AM, Prachi Rath  wrote:

> Hi,
>
> I am using Cassandra version 2.0. My goal is to set up client-to-node
> security using SSL with my self-signed CA.
>
> What would be the recommended procedure for enabling SSL on Cassandra
> version 2.0.17?
>
> Thanks,
> Prachi
>


[RELEASE] Apache Cassandra 2.1.20 released - PLEASE READ NOTICE

2018-02-16 Thread Michael Shuler
PLEASE READ: MAXIMUM TTL EXPIRATION DATE NOTICE (CASSANDRA-14092)
--

The maximum expiration timestamp that can be represented by the storage
engine is 2038-01-19T03:14:06+00:00, which means that inserts with TTLs
that expire after this date are not currently supported. By default,
INSERTs with TTLs exceeding the maximum supported date are rejected, but
it's possible to choose a different expiration overflow policy. See
CASSANDRA-14092.txt for more details.

Prior to 3.0.16 (3.0.x) and 3.11.2 (3.11.x) there was no protection
against INSERTs with TTLs expiring after the maximum supported date,
causing the expiration time field to overflow and the records to expire
immediately. Clusters in the 2.X and lower series are not subject to
this when assertions are enabled. Backed-up SSTables can potentially be
recovered, and recovery instructions can be found in the
CASSANDRA-14092.txt file.

If you use or plan to use very large TTLs (10 to 20 years), read
CASSANDRA-14092.txt for more information.
--
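
To make the cap concrete: as of February 2018, 2038-01-19 is roughly 6.3e8
seconds away, so a 20-year TTL (630720000 seconds, the CQL maximum) already
lands past it. A sketch against a hypothetical table ks.t:

-- Rejected under the default overflow policy, because the computed
-- expiration timestamp would overflow the 2038 cap:
INSERT INTO ks.t (k, v) VALUES (1, 1) USING TTL 630720000;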

The Cassandra team is pleased to announce the release of Apache
Cassandra version 2.1.20.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always,
please pay attention to the release notes[2] and let us know[3] if you
encounter any problems.

Enjoy!

[1]: (CHANGES.txt) https://goo.gl/5M7w4X
[2]: (NEWS.txt) https://goo.gl/Kd2kF3
[3]: https://issues.apache.org/jira/browse/CASSANDRA





[RELEASE] Apache Cassandra 2.2.12 released - PLEASE READ NOTICE

2018-02-16 Thread Michael Shuler
PLEASE READ: MAXIMUM TTL EXPIRATION DATE NOTICE (CASSANDRA-14092)
--

The maximum expiration timestamp that can be represented by the storage
engine is 2038-01-19T03:14:06+00:00, which means that inserts with TTLs
that expire after this date are not currently supported. By default,
INSERTs with TTLs exceeding the maximum supported date are rejected, but
it's possible to choose a different expiration overflow policy. See
CASSANDRA-14092.txt for more details.

Prior to 3.0.16 (3.0.x) and 3.11.2 (3.11.x) there was no protection
against INSERTs with TTLs expiring after the maximum supported date,
causing the expiration time field to overflow and the records to expire
immediately. Clusters in the 2.X and lower series are not subject to
this when assertions are enabled. Backed-up SSTables can potentially be
recovered, and recovery instructions can be found in the
CASSANDRA-14092.txt file.

If you use or plan to use very large TTLs (10 to 20 years), read
CASSANDRA-14092.txt for more information.
--

The Cassandra team is pleased to announce the release of Apache
Cassandra version 2.2.12.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.2 series. As always,
please pay attention to the release notes[2] and let us know[3] if you
encounter any problems.

Enjoy!

[1]: (CHANGES.txt) https://goo.gl/a6L1TE
[2]: (NEWS.txt) https://goo.gl/M9jhdZ
[3]: https://issues.apache.org/jira/browse/CASSANDRA





Re: Rapid scaleup of cassandra nodes with snapshots and initial_token in the yaml

2018-02-16 Thread Carl Mueller
Thanks. Yeah, it appears this would only be doable if we didn't have vnodes
and used old single-token clusters. I guess Priam has something where you
increase the cluster by whole-number multiples. Then there's the issue of
doing quorum reads/writes if there is suddenly a new replica range with
grey-area ownership/responsibility for the range - LOCAL_QUORUM becomes a
bit ill-defined if more than one node is being added to a cluster.

I guess the only way that would work is if the node count were some multiple
of the vnode count and vnodes distributed themselves consistently, so that
expansions by RF multiples might be consistent and precomputable with
respect to the responsible ranges.

I will read that talk.

On Thu, Feb 15, 2018 at 7:39 PM, kurt greaves  wrote:

> Ben did a talk that might have some useful information. It's much more
> complicated with
> vnodes though and I doubt you'll be able to get it to be as rapid as you'd
> want.
>
> sets up schema to match
>
> This shouldn't be necessary. You'd just join the node as usual but with
> auto_bootstrap: false and let the schema be propagated.
>
> Is there an issue if the vnode tokens for two nodes are identical? Do
>> they have to be distinct for each node?
>
> Yeah. This is annoying I know. The new node will take over the tokens of
> the old node, which you don't want.
>
>
>> Basically, I was wondering if we just use this to double the number of
>> nodes with identical copies of the node data via snapshots, and then later
>> on cassandra can pare down which nodes own which data.
>
> There wouldn't be much point to adding nodes with the same (or almost the
> same) tokens. That would just be shifting load. You'd essentially need a
> very smart allocation algorithm to come up with good token ranges, but then
> you still have the problem of tracking down the relevant SSTables from the
> nodes. Basically, bootstrap does this for you ATM and only streams the
> relevant sections of SSTables for the new node. If you were doing it from
> backups/snapshots you'd need to either do the same thing (eek) or copy all
> the SSTables from all the relevant nodes.
>
> With single token nodes this becomes much easier. You can likely get away
> with only copying around double/triple the data (depending on how you add
> tokens to the ring and RF and node count).
>
> I'll just put it out there that C* is a database and really isn't designed
> to be rapidly scalable. If you're going to try, be prepared to invest A LOT
> of time into it.
>
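
For the single-token case, evenly spaced Murmur3 tokens are at least easy to
precompute. A sketch for doubling a hypothetical 6-node ring to 12 nodes (the
new nodes would take the odd positions, each configured via initial_token in
cassandra.yaml alongside auto_bootstrap: false; the even positions reproduce
the existing ring, assuming it was balanced to begin with):

python3 -c 'n = 12; print([str(i * (2**64 // n) - 2**63) for i in range(n)])'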