Re: Does Java driver v3.1.x degrade cluster connect/close performance?

2017-03-07 Thread Andrew Tolbert
Hi Satoshi,

One correction on my previous email: at 2.1.8 of the driver, Netty 4.0 was
>> in use, so please disregard my comments about the netty dependency changing
>> from 3.9 to 4.0. There is a difference in version, but it's only at the
>> patch level (4.0.27 to 4.0.37).
>>
>
Does your comment mean that Cluster#close takes at least 2 seconds with 2.1.8
> of the driver as well? If so, it is strange, because the response time of
> Cluster#close was around 20ms with v2.1.8 of the driver in my test.
>

The real reason for the two-second delay was the change made for JAVA-914,
which was introduced in 2.1.9 and 3.0.x, not the Netty 3.9-to-4.0 version
change, which I was incorrect about, as that change was made earlier (at
driver 2.1.6).

Thanks,
Andy
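
For reference, a minimal sketch (not from the thread) of the kind of timing
being discussed; it measures Cluster#connect and Cluster#close with the 3.x
driver API, and the contact point is a placeholder:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class CloseTiming {
    public static void main(String[] args) {
        // Placeholder contact point; adjust for your environment.
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();

        long t0 = System.nanoTime();
        Session session = cluster.connect();
        long connectMs = (System.nanoTime() - t0) / 1_000_000;

        long t1 = System.nanoTime();
        cluster.close(); // on 2.1.9+ / 3.0.x this is where the JAVA-914 delay shows up
        long closeMs = (System.nanoTime() - t1) / 1_000_000;

        System.out.printf("connect: %d ms, close: %d ms%n", connectMs, closeMs);
    }
}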


On Mon, Mar 6, 2017 at 11:11 PM, Satoshi Hikida  wrote:

> Hi Matija, Andrew
>
> Thank you for your reply.
>
> Matija:
> > Do you plan to misuse it and create a new cluster object and open a new
> connection for each request?
> No, my app never creates a new cluster for each request. However, each of
> its unit tests creates a new cluster and closes it every time.
> Of course I can change the tests to create and close a cluster only once or
> a few times. But I just wondered why connect/close performance degrades
> when I update the driver version.
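
A rough sketch (not from the thread) of creating and closing the cluster only
once per test class with JUnit 4; the class name, contact point, and query are
placeholders:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

public class SharedClusterTest {
    private static Cluster cluster;
    private static Session session;

    @BeforeClass
    public static void connectOnce() {
        // One Cluster/Session pair for the whole test class instead of one per test.
        cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        session = cluster.connect();
    }

    @AfterClass
    public static void closeOnce() {
        cluster.close(); // pays the close cost once, not once per test
    }

    @Test
    public void canQuerySystemLocal() {
        session.execute("SELECT release_version FROM system.local");
    }
}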
>
>
> Andrew:
> Thanks for the information about the driver's ML. I'll use it next time.
>
> > One correction on my previous email, at 2.1.8 of the driver, Netty 4.0
> was in use, so please disregard my comments about the netty dependency
> changing from 3.9 to 4.0, there is a difference in version, but it's only at
> the patch level (4.0.27 to 4.0.37)
> Does your comment mean that Cluster#close takes at least 2 seconds with
> 2.1.8 of the driver as well? If so, it is strange, because the response time
> of Cluster#close was around 20ms with v2.1.8 of the driver in my test.
>
> > I'd be interested to see if running the same test in your environment
> creates different results.
> I'll run the test in my test environment and share the result. Thank you
> again.
>
> Regards,
> Satoshi
>
> On Tue, Mar 7, 2017 at 12:38 AM, Andrew Tolbert <
> andrew.tolb...@datastax.com> wrote:
>
>> One correction on my previous email, at 2.1.8 of the driver, Netty 4.0
>> was in use, so please disregard my comments about the netty dependency
>> changing from 3.9 to 4.0, there is a difference in version, but it's only at
>> the patch level (4.0.27 to 4.0.37)
>>
>> Just to double check, I reran that connection initialization test (source)
>> where I got my previous numbers from (as that was from nearly 2 years ago)
>> and compared driver version 2.1.8 against 3.1.3.  I first ran against a
>> single node that is located in California, where my client is in Minnesota,
>> so roundtrip latency is a factor:
>>
>> v2.1.8:
>>
>> Single attempt took 1837ms.
>>
>> 10 warmup iterations (first 10 attempts discarded), 100 trials
>>
>>
>> -- Timers --
>> connectTimer
>>  count = 100
>>min = 458.40 milliseconds
>>max = 769.43 milliseconds
>>   mean = 493.45 milliseconds
>> stddev = 38.54 milliseconds
>> median = 488.38 milliseconds
>>   75% <= 495.71 milliseconds
>>   95% <= 514.73 milliseconds
>>   98% <= 724.05 milliseconds
>>   99% <= 769.02 milliseconds
>> 99.9% <= 769.43 milliseconds
>>
>> v3.1.3:
>>
>> Single attempt took 1781ms.
>>
>> 10 warmup iterations (first 10 attempts discarded), 100 trials
>>
>> -- Timers --
>> connectTimer
>>  count = 100
>>min = 457.32 milliseconds
>>max = 539.77 milliseconds
>>   mean = 485.68 milliseconds
>> stddev = 10.76 milliseconds
>> median = 485.52 milliseconds
>>   75% <= 490.39 milliseconds
>>   95% <= 499.83 milliseconds
>>   98% <= 511.52 milliseconds
>>   99% <= 535.56 milliseconds
>> 99.9% <= 539.77 milliseconds
>>
>> As you can see, at least for this test, initialization times are pretty
>> much identical.
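
The timer report above is in the format of Dropwizard Metrics' ConsoleReporter;
a rough sketch of such a harness (assuming that library, with placeholder
contact point and counts, not mirroring the linked source exactly) might look
like:

import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;
import com.datastax.driver.core.Cluster;

import java.util.concurrent.TimeUnit;

public class ConnectBenchmark {
    public static void main(String[] args) {
        MetricRegistry registry = new MetricRegistry();
        Timer connectTimer = registry.timer("connectTimer");

        int warmup = 10;
        int trials = 100;
        for (int i = 0; i < warmup + trials; i++) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            if (i < warmup) {
                cluster.connect(); // warmup iterations are not recorded
            } else {
                try (Timer.Context ignored = connectTimer.time()) {
                    cluster.connect();
                }
            }
            cluster.close();
        }

        ConsoleReporter.forRegistry(registry)
                .convertDurationsTo(TimeUnit.MILLISECONDS)
                .build()
                .report();
    }
}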
>>
>> I ran another set of trials using a local C* node (running on same host
>> as client) to limit the impact of round trip time:
>>
>> v2.1.8:
>>
>> Single attempt took 477ms.
>>
>> 10 warmup iterations, 100 trials
>>
>> -- Timers --
>> connectTimer
>>  count = 100
>>min = 2.38 milliseconds
>>max = 32.69 milliseconds
>>   mean = 3.79 milliseconds
>> stddev = 3.49 milliseconds
>> median = 3.05 milliseconds
>>   75% <=

Re: Changed node ID?

2017-03-07 Thread Vladimir Yudovin
Hi,

Why did the host ID change?



Probably this node's data folder (at least the system keyspace) was erased. Or the
node changed its IP; do you use dynamic IPs?



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Mon, 06 Mar 2017 22:44:50 -0500, Joe Olson wrote:






I have a 9-node cluster that I had shut down (Cassandra stopped on all nodes, all
nodes shut down) and just tried to start back up. I have done this several
times successfully. However, on this attempt, one of the nodes failed to join
the cluster. Upon inspection of /var/log/cassandra/system.log, I found the
following:



WARN  [GossipStage:1] 2017-03-06 21:06:36,648 TokenMetadata.java:252 - Changing 
/192.168.211.82's host ID from cff3ef25-9a47-4ea4-9519-b85d20bef3ee to 
59f2da9f-0b85-452f-b61a-fa990de53e4b



further down:



ERROR [main] 2017-03-06 21:20:14,718 CassandraDaemon.java:747 - Exception 
encountered during startup

java.lang.RuntimeException: A node with address /192.168.211.82 already exists, 
cancelling join. Use cassandra.replace_address if you want to replace this node.

at 
org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:491)
 ~[apache-cassandra-3.9.0.jar:3.9.0]

at 
org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:778)
 ~[apache-cassandra-3.9.0.jar:3.9.0]

at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:648) 
~[apache-cassandra-3.9.0.jar:3.9.0]

at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:548) 
~[apache-cassandra-3.9.0.jar:3.9.0]

at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:385) 
[apache-cassandra-3.9.0.jar:3.9.0]

at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:601) 
[apache-cassandra-3.9.0.jar:3.9.0]

at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:730) 
[apache-cassandra-3.9.0.jar:3.9.0]




nodetool status:



UN  192.168.211.88  2.58 TiB   256  32.0% 
9de2d3ef-5ae1-4c7f-8560-730757a6d1ae  rack1

UN  192.168.211.80  2.26 TiB   256  33.9% 
d83829d3-a1d3-4e6c-b014-7cfe45e22d67  rack1

UN  192.168.211.81  2.91 TiB   256  34.1% 
0cafd24e-d3ed-4e51-b586-0b496835a931  rack1

DN  192.168.211.82  551.45 KiB  256  31.9% 
59f2da9f-0b85-452f-b61a-fa990de53e4b  rack1

UN  192.168.211.83  2.32 TiB   256  32.7% 
db006e31-03fa-486a-8512-f88eb583bd0c  rack1

UN  192.168.211.84  2.54 TiB   256  34.3% 
a9a50a74-2fc2-4866-a03a-ec95a7866183  rack1

UN  192.168.211.85  2.4 TiB256  35.9% 
733e6703-c18f-432f-a787-3731f80ba42d  rack1

UN  192.168.211.86  2.34 TiB   256  32.1% 
0daa06fa-708f-4ff8-a15e-861f1a53113a  rack1

UN  192.168.211.87  4.07 TiB   256  33.1% 
2aa578c6-1332-4b94-81c6-c3ce005a52ef  rack1




My questions:

1. Why did the host ID change?

2. If I modify cassandra-env.sh to include 

JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=192.168.211.82", will I recover
the data on the original node? It is still on the node's hard drive. I really
don't want to have to restream 2.6TB of data onto a "new" node.










Splitting Cassandra Cluster between AWS availability zones

2017-03-07 Thread Ney, Richard
We’ve collapsed our 2 DC – 3 node Cassandra clusters into a single 6 node 
Cassandra cluster split between two AWS availability zones.

Are there any behaviors we need to take into account to ensure the Cassandra 
cluster stability with this configuration?

RICHARD NEY
TECHNICAL DIRECTOR, RESEARCH & DEVELOPMENT
UNITED STATES
richard@aspect.com
aspect.com

This email (including any attachments) is proprietary to Aspect Software, Inc. 
and may contain information that is confidential. If you have received this 
message in error, please do not read, copy or forward this message. Please 
notify the sender immediately, delete it from your system and destroy any 
copies. You may not further disclose or distribute this email or its 
attachments.


Re: Splitting Cassandra Cluster between AWS availability zones

2017-03-07 Thread Andrey Ilinykh
I'd recommend three availability zones. In this case, if you lose one AZ,
you still have a quorum (assuming a replication factor of 3).

Andrey

On Tue, Mar 7, 2017 at 9:05 AM, Ney, Richard  wrote:

> We’ve collapsed our 2 DC – 3 node Cassandra clusters into a single 6 node
> Cassandra cluster split between two AWS availability zones.
>
>
>
> Are there any behaviors we need to take into account to ensure the
> Cassandra cluster stability with this configuration?
>
>
>
> *RICHARD NEY*
>
> TECHNICAL DIRECTOR, RESEARCH & DEVELOPMENT
>
> *UNITED STATES*
>
> *richard@aspect.com *
>
> *aspect.com *
>
>
>
> This email (including any attachments) is proprietary to Aspect Software,
> Inc. and may contain information that is confidential. If you have received
> this message in error, please do not read, copy or forward this message.
> Please notify the sender immediately, delete it from your system and
> destroy any copies. You may not further disclose or distribute this email
> or its attachments.
>


Re: Frozen types in Cassandra

2017-03-07 Thread Tyler Hobbs
On Sun, Mar 5, 2017 at 11:53 PM, anuja jain  wrote:

> Is there a difference between creating a column of type
> frozen<list<double>> and frozen<list_double>, where list_double is a UDT
> with a single field of type frozen<list<double>>?
>

Yes, there is a difference in serialization format: the first will be
serialized directly as a list, the second will be serialized as a
single-field UDT containing a list.

Additionally, the second form supports altering the type by adding fields
to the UDT.  This can't be done with the first form.  If you don't need
this capability, I recommend going with the simpler option of
frozen<list<double>>.
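
To make the two forms concrete, here is a hedged sketch of the corresponding
schema statements executed through the Java driver (the keyspace, table, and
field names are made up):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class FrozenListForms {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {

            session.execute("CREATE KEYSPACE IF NOT EXISTS ks WITH replication = "
                    + "{'class': 'SimpleStrategy', 'replication_factor': 1}");

            // Form 1: the column is a frozen list, serialized directly as a list.
            session.execute("CREATE TABLE IF NOT EXISTS ks.direct_list "
                    + "(id int PRIMARY KEY, doubles frozen<list<double>>)");

            // Form 2: a single-field UDT wrapping the same frozen list.
            session.execute("CREATE TYPE IF NOT EXISTS ks.list_double "
                    + "(doubles frozen<list<double>>)");
            session.execute("CREATE TABLE IF NOT EXISTS ks.udt_list "
                    + "(id int PRIMARY KEY, doubles frozen<list_double>)");

            // Only the UDT form can grow new fields later.
            session.execute("ALTER TYPE ks.list_double ADD label text");
        }
    }
}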


> Also how to create a solr index on such columns?
>

I have no idea, sorry.

-- 
Tyler Hobbs
DataStax 


Re: Splitting Cassandra Cluster between AWS availability zones

2017-03-07 Thread tommaso barbugli
Hi Richard,

It depends on the snitch and the replication strategy in use.

Here's a link to a blogpost about how we deployed C* on 3AZ

http://highscalability.com/blog/2016/8/1/how-to-setup-a-highly-available-multi-az-cassandra-cluster-o.html


Best,
Tommaso


On Mar 7, 2017 18:05, "Ney, Richard"  wrote:

We’ve collapsed our 2 DC – 3 node Cassandra clusters into a single 6 node
Cassandra cluster split between two AWS availability zones.



Are there any behaviors we need to take into account to ensure the
Cassandra cluster stability with this configuration?



*RICHARD NEY*

TECHNICAL DIRECTOR, RESEARCH & DEVELOPMENT

*UNITED STATES*

*richard@aspect.com *

*aspect.com *



This email (including any attachments) is proprietary to Aspect Software,
Inc. and may contain information that is confidential. If you have received
this message in error, please do not read, copy or forward this message.
Please notify the sender immediately, delete it from your system and
destroy any copies. You may not further disclose or distribute this email
or its attachments.


Re: Splitting Cassandra Cluster between AWS availability zones

2017-03-07 Thread Romain Hardouin
Hi,
Before: 1 cluster with 2 DCs, 3 nodes in each DC.
Now: 1 cluster with 1 DC, 6 nodes in this DC.
Is that right?

If yes, depending on the RF - and assuming NetworkTopologyStrategy - I would do:
- RF = 2 => 2 C* racks, one rack in each AZ
- RF = 3 => 3 C* racks, one rack in each AZ

In other words, I would align C* racks with AZs. Note that AWS charges for
inter-AZ traffic, a.k.a. Regional Data Transfer.
Best,
Romain 
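
As an illustration (not from the thread), a sketch of aligning replication with
that topology through the Java driver; with Ec2Snitch the AWS region is used as
the DC name and the availability zone as the rack, and the "us-east" DC name and
"app" keyspace below are only examples:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class CreateNtsKeyspace {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // With one C* rack per AZ, NetworkTopologyStrategy places replicas on
            // distinct racks, so RF=3 across three AZs keeps one replica per AZ.
            session.execute("CREATE KEYSPACE IF NOT EXISTS app WITH replication = "
                    + "{'class': 'NetworkTopologyStrategy', 'us-east': 3}");
        }
    }
}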

On Tuesday, 7 March 2017 at 18:36, tommaso barbugli wrote:

 Hi Richard,
It depends on the snitch and the replication strategy in use.
Here's a link to a blogpost about how we deployed C* on 3AZ
http://highscalability.com/blog/2016/8/1/how-to-setup-a-highly-available-multi-az-cassandra-cluster-o.html

Best,
Tommaso


On Mar 7, 2017 18:05, "Ney, Richard"  wrote:

We’ve collapsed our 2 DC – 3 node Cassandra clusters into a single 6 node
Cassandra cluster split between two AWS availability zones.

Are there any behaviors we need to take into account to ensure the Cassandra
cluster stability with this configuration?

RICHARD NEY
TECHNICAL DIRECTOR, RESEARCH & DEVELOPMENT
UNITED STATES
richard@aspect.com
aspect.com

This email (including any attachments) is proprietary to Aspect Software, Inc.
and may contain information that is confidential. If you have received this
message in error, please do not read, copy or forward this message. Please
notify the sender immediately, delete it from your system and destroy any
copies. You may not further disclose or distribute this email or its
attachments.



   

unsubscribe

2017-03-07 Thread Daniel Rathbone
-- 
sent from Dan Rathbone's tech/work email account
http://rathboneventures.com - my company
http://danrathbone.com -- personal site


Re: OOM on Apache Cassandra on 30 Plus node at the same time

2017-03-07 Thread Jeff Jirsa


On 2017-03-04 07:23 (-0800), "Thakrar, Jayesh"  
wrote: 
> LCS does not rule out frequent updates - it just says that there will be more 
> frequent compaction, which can potentially increase compaction activity 
> (which again can be throttled as needed).
> But STCS will guarantee OOM when you have large datasets.
> Did you have a look at the offheap + onheap size of our jvm using "nodetool 
> -info" ?
> 
> 

STCS does not guarantee you OOM when you have large datasets, unless by large 
datasets you mean in the tens-of-terabytes range, which is already something we 
typically recommend against.




Re: OOM on Apache Cassandra on 30 Plus node at the same time

2017-03-07 Thread Jeff Jirsa


On 2017-03-03 09:18 (-0800), Shravan Ch  wrote: 
> 
> nodetool compactionstats -H
> pending tasks: 3
> compaction typekeyspace  table   
> completed  totalunit   progress
> Compaction  system  hints 
> 28.5 GB   92.38 GB   bytes 30.85%
> 
> 

The hint buildup is also something that could have caused OOMs, too. Hints are 
stored for a given host in a single partition, which means it's common for a 
single row/partition to get huge if you have a single host flapping.

If you see "Compacting large row" messages for the hint rows, I suspect you'll 
find that one of the hosts/rows is responsible for most of that 92GB of hints, 
which means when you try to deliver the hints, you'll read from a huge 
partition, which creates memory pressure (see: CASSANDRA-9754) leading to GC 
pauses (or ooms), which then causes you to flap, which causes you to create 
more hints, which causes an ugly spiral.

In 3.0, hints were rewritten to avoid this problem, but short term, you may 
need to truncate your hints to get healthy (assuming it's safe for you to do 
so, where 'safe' is based on your read+write consistency levels).
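
A hedged sketch of that short-term cleanup, assuming a pre-3.0 cluster where
hints live in the system.hints table (nodetool truncatehints, run per node, is
the usual operational alternative):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class TruncateHints {
    public static void main(String[] args) {
        // Short-term relief only: discarded hints mean the affected nodes should
        // be repaired afterwards to restore consistency.
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            session.execute("TRUNCATE system.hints");
        }
    }
}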




Re: Any way to control/limit off-heap memory?

2017-03-07 Thread Jeff Jirsa


On 2017-03-06 07:04 (-0800), "Thakrar, Jayesh"  
wrote: 
> Thanks Hannu - also considered that option.
> However, that's a trial and error and will have to play with the 
> collision/false-positive fraction.
> And each iteration will most likely result in a compaction storm - so I was 
> hoping for a way to throttle/limit the max off-heap size.
> 
> The reason I was thinking of eliminating bloom filters is because due to 
> application design, we search for data using a partial key (prefix columns),
> hence am thinking of completely eliminating the bloom filters as they do not 
> add any value in such a use case.
> 

If you don't want to use the bloom filters, don't set the FP ratio to 0; set it
to something like 0.1 or 0.5. An FP ratio of 0 says "no false positives", which
is only possible with HUGE bloom filters. A high FP ratio (since you're not
using them) basically says "don't try very hard", which corresponds to small
arrays, which means low accuracy but low offheap usage.

We probably shouldn't even allow a 0.0 FP ratio.
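
For illustration (the table name is made up), raising the false-positive chance
on a table through the Java driver so its bloom filters shrink:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class RaiseBloomFilterFpChance {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // A higher bloom_filter_fp_chance means smaller, less accurate filters
            // and therefore less off-heap memory; existing SSTables keep their old
            // filters until they are rewritten (e.g. by compaction).
            session.execute("ALTER TABLE myks.mytable "
                    + "WITH bloom_filter_fp_chance = 0.5");
        }
    }
}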





Re: OOM on Apache Cassandra on 30 Plus node at the same time

2017-03-07 Thread Shravan C
In fact, I truncated the hints table to stabilize the cluster. Through the heap
dumps I was able to identify the table on which there were numerous queries.
Then I focused on the system_traces.session table around the time the OOM
occurred. It turned out to be a full table scan on a large table that caused
the OOM.


Thanks everyone of you.

From: Jeff Jirsa 
Sent: Tuesday, March 7, 2017 1:19 PM
To: user@cassandra.apache.org
Subject: Re: OOM on Apache Cassandra on 30 Plus node at the same time



On 2017-03-03 09:18 (-0800), Shravan Ch  wrote:
>
> nodetool compactionstats -H
> pending tasks: 3
> compaction typekeyspace  table   
> completed  totalunit   progress
> Compaction  system  hints 
> 28.5 GB   92.38 GB   bytes 30.85%
>
>

The hint buildup is also something that could have caused OOMs, too. Hints are 
stored for a given host in a single partition, which means it's common for a 
single row/partition to get huge if you have a single host flapping.

If you see "Compacting large row" messages for the hint rows, I suspect you'll 
find that one of the hosts/rows is responsible for most of that 92GB of hints, 
which means when you try to deliver the hints, you'll read from a huge 
partition, which creates memory pressure (see: CASSANDRA-9754) leading to GC 
pauses (or ooms), which then causes you to flap, which causes you to create 
more hints, which causes an ugly spiral.

In 3.0, hints were rewritten to avoid this problem, but short term, you may 
need to truncate your hints to get healthy (assuming it's safe for you to do 
so, where 'safe' is based on your read+write consistency levels).