Re: Column family ID mismatch-Error on concurrent schema modifications

2014-11-28 Thread Marcus Olsson

Hi,

We encountered the same problem and created a JIRA for it: 
https://issues.apache.org/jira/browse/CASSANDRA-8387 .


/Marcus O

On 11/27/2014 04:19 PM, DuyHai Doan wrote:


Hello Peter

For safe concurrent table creation, use CREATE TABLE xxx IF NOT 
EXISTS. It will use a lightweight transaction, and you'll pay 
some performance penalty, but at least the table creation 
will be linearizable.
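A minimal sketch of that suggestion in CQL, using the keyspace and table name from this thread (the columns are illustrative, not from the original messages):

```sql
-- Hedged example: IF NOT EXISTS routes the CREATE through a
-- lightweight transaction (Paxos), so concurrent attempts from
-- different clients agree on a single table (one cfId) instead of
-- racing and producing the "Column family ID mismatch" error.
CREATE TABLE IF NOT EXISTS myplayground.test_table (
    id    text,
    ts    timestamp,
    value text,
    PRIMARY KEY (id, ts)
);
```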


On 27 Nov 2014 14:26, "Peter Lange" wrote:


Hi,

We use a four-node Cassandra cluster, version 2.1.2. Our
client applications create tables dynamically. At some point, two
(or more) of our clients, connected to two (or more) different
Cassandra nodes, will create the same table simultaneously. We then
get the "Column family ID mismatch" error messages on every node. Why
is this simultaneous schema modification not possible? How can we
handle this? Any help is appreciated.

The lengthy error messages from the two nodes follow:

On Node1 we got:

INFO  [SharedPool-Worker-2] 2014-11-26 13:37:28,987 MigrationManager.java:248 - Create new ColumnFamily: org.apache.cassandra.config.CFMetaData@7edad3a3[cfId=fbd24eb0-7568-11e4-bd04-b3ae3abaeff4,ksName=myplayground,cfName=test_table,
INFO  [MigrationStage:1] 2014-11-26 13:37:29,607 DefsTables.java:373 - Loading org.apache.cassandra.config.CFMetaData@7adc8efd[cfId=fbd24eb0-7568-11e4-bd04-b3ae3abaeff4,ksName=myplayground,cfName=test_table,
INFO  [MigrationStage:1] 2014-11-26 13:37:29,629 ColumnFamilyStore.java:284 - Initializing myplayground.test_table
ERROR [MigrationStage:1] 2014-11-26 13:37:30,282 CassandraDaemon.java:153 - Exception in thread Thread[MigrationStage:1,5,main]
java.lang.RuntimeException: org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found fbd275c0-7568-11e4-b9ea-3934eddce895; expected fbd24eb0-7568-11e4-bd04-b3ae3abaeff4)
        at org.apache.cassandra.config.CFMetaData.reload(CFMetaData.java:1171) ~[apache-cassandra-2.1.1.jar:2.1.1]
        at org.apache.cassandra.db.DefsTables.updateColumnFamily(DefsTables.java:422) ~[apache-cassandra-2.1.1.jar:2.1.1]
        at org.apache.cassandra.db.DefsTables.mergeColumnFamilies(DefsTables.java:295) ~[apache-cassandra-2.1.1.jar:2.1.1]
        at org.apache.cassandra.db.DefsTables.mergeSchemaInternal(DefsTables.java:194) ~[apache-cassandra-2.1.1.jar:2.1.1]
        at org.apache.cassandra.db.DefsTables.mergeSchema(DefsTables.java:166) ~[apache-cassandra-2.1.1.jar:2.1.1]
        at org.apache.cassandra.db.DefinitionsUpdateVerbHandler$1.runMayThrow(DefinitionsUpdateVerbHandler.java:49) ~[apache-cassandra-2.1.1.jar:2.1.1]
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.1.jar:2.1.1]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_25]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_25]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_25]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_25]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_25]
Caused by: org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found fbd275c0-7568-11e4-b9ea-3934eddce895; expected fbd24eb0-7568-11e4-bd04-b3ae3abaeff4)
        at org.apache.cassandra.config.CFMetaData.validateCompatility(CFMetaData.java:1254) ~[apache-cassandra-2.1.1.jar:2.1.1]
        at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:1186) ~[apache-cassandra-2.1.1.jar:2.1.1]
        at org.apache.cassandra.config.CFMetaData.reload(CFMetaData.java:1167) ~[apache-cassandra-2.1.1.jar:2.1.1]
        ... 11 common frames omitted

On Node2 we got:

INFO  [SharedPool-Worker-1] 2014-11-26 13:37:28,989 MigrationManager.java:248 - Create new ColumnFamily: org.apache.cassandra.config.CFMetaData@16d0bc0d[cfId=fbd275c0-7568-11e4-b9ea-3934eddce895,ksName=myplayground,cfName=test_table,
INFO  [MigrationStage:1] 2014-11-26 13:37:29,539 DefsTables.java:373 - Loading org.apache.cassandra.config.CFMetaData@3777e24b[cfId=fbd24eb0-7568-11e4-bd04-b3ae3abaeff4,ksName=myplayground,cfName=test_table,
INFO  [MigrationStage:1] 2014-11-26 13:37:29,541 ColumnFamilyStore.java:284 - Initializing myplayground.test_table
ERROR [SharedPool-Worker-1] 2014-11-26 13:37:29,984 QueryMessage.java:130 - Unexpected error during query
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch

Date Tiered Compaction Strategy and collections

2014-11-28 Thread Batranut Bogdan
Hello all,
If one has a table like this:

id text,
ts timestamp,
values list
PK (id, ts)

How will DTCS work? I am asking this because the writeTime() function does
not work on collections.

Re: mysql based columnar DB to Cassandra DB - Migration

2014-11-28 Thread Jack Krupansky
Planet Cassandra has some resource pages related to migrations to Cassandra. 
For HBase:
http://planetcassandra.org/hbase-to-cassandra-migration/

There are pages for migration from Oracle, MySQL, MongoDB, and Redis, as well.

-- Jack Krupansky

From: Akshay Ballarpure 
Sent: Friday, November 28, 2014 12:06 AM
To: user@cassandra.apache.org 
Subject: Re: mysql based columnar DB to Cassandra DB - Migration

Thanks, Kiran, for the reply.
How about other column-based databases like Infobright or HBase? Can we really
migrate those to Cassandra?




From: Kiran Ayyagari
To: user@cassandra.apache.org
Date: 11/28/2014 08:27 AM
Subject: Re: mysql based columnar DB to Cassandra DB - Migration







On Wed, Nov 26, 2014 at 2:15 PM, Akshay Ballarpure  
wrote: 
Hello folks,
I have a MySQL-based columnar DB and I want to migrate it to Cassandra. How is
this possible?

See if Troop [1] helps; note that it has only been tested with MySQL 5.x and
Cassandra 2.0.10.
[1] https://github.com/kayyagari/troop
Best Regards
Akshay Ballarpure
Tata Consultancy Services
Cell:- 9985084075
Mailto: akshay.ballarp...@tcs.com
Website: http://www.tcs.com

Experience certainty.IT Services
   Business Solutions
   Consulting
 



From: Akshay Ballarpure/HYD/TCS
To: user@cassandra.apache.org
Date: 11/18/2014 09:00 PM
Subject: mysql based columnar DB to Cassandra DB - Migration





I have a MySQL-based columnar DB and I want to migrate it to Cassandra. How is
this possible?

Best Regards
Akshay Ballarpure



-- 
Kiran Ayyagari
http://keydap.com 



Re: Column family ID mismatch-Error on concurrent schema modifications

2014-11-28 Thread Robert Wille
I would suggest that dynamic table creation is, in general, not a great idea, 
regardless of the database. I would seriously consider altering your approach 
to use a fixed set of tables.

On Nov 28, 2014, at 1:53 AM, Marcus Olsson 
mailto:marcus.ols...@ericsson.com>> wrote:

Hi,

We encountered the same problem and created a JIRA for it: 
https://issues.apache.org/jira/browse/CASSANDRA-8387 .

/Marcus O

On 11/27/2014 04:19 PM, DuyHai Doan wrote:

[snip]

open source cassandra and hadoop

2014-11-28 Thread Tim Dunphy
Hey all,

 I have a 3-node Cassandra cluster that I would like to hook into Hadoop for
processing the information in the Cassandra DB. I know that the DataStax
version of Cassandra includes support for Hadoop right out of the box. But
I've been googling around and I don't see any good information on how to do
this.

The Cassandra wiki does mention that there is a way to do this.

http://wiki.apache.org/cassandra/HadoopSupport

But the information is old. It only covers version 0.7. And there's still
not a lot of information to go on in that wiki page.

So I was wondering if anyone has ever heard of someone connecting a recent
version of the community edition of Cassandra to Hadoop. And does anybody
know of a guide I can use to do this?

Thanks
Tim

-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


Nodes get stuck in crazy GC loop after some time, leading to timeouts

2014-11-28 Thread Paulo Ricardo Motta Gomes
Hello,

This is a recurrent behavior of JVM GC in Cassandra that I never completely
understood: when a node is UP for many days (or even months), or receives a
very high load spike (3x-5x normal load), CMS GC pauses start becoming very
frequent and slow, causing periodic timeouts in Cassandra. Trying to run GC
manually doesn't free up memory. The only solution when a node reaches this
state is to restart the node.

We restart the whole cluster every 1 or 2 months to avoid machines getting
into this crazy state. We have tried tuning GC sizes and parameters and
different Cassandra versions (1.1, 1.2, 2.0), but this behavior keeps
happening. More recently, during Black Friday, we received about 5x our
normal load, and some machines started presenting this behavior. Once
again, we restarted the nodes and the GC behaved normally again.

I'm attaching a few pictures comparing the heap of "healthy" and "sick"
nodes: http://imgur.com/a/Tcr3w

You can clearly see that some memory is actually reclaimed during GC on
healthy nodes, while on sick machines very little memory is reclaimed.
Also, since GC runs more frequently on sick machines, they use about
2x more CPU than healthy nodes.

Have you ever observed this behavior in your cluster? Could this be related
to heap fragmentation? Would using the G1 collector help in this case? Any
GC tuning or monitoring advice to troubleshoot this issue?

Any advice or pointers will be kindly appreciated.

Cheers,

-- 
*Paulo Motta*

Chaordic | *Platform*
*www.chaordic.com.br *
+55 48 3232.3200


Re: Date Tiered Compaction Strategy and collections

2014-11-28 Thread Eric Stevens
The underlying write time is still tracked for each value in the collection
- it's part of how conflict resolution is managed - but it's not exposed
through CQL.

On Fri Nov 28 2014 at 4:18:47 AM Batranut Bogdan  wrote:

> Hello all,
>
> If one has a table like this:
> id text,
> ts timestamp
> values list
>
> PK (id,ts)
>
>
> How will the DTCS work? I am asking this because the writeTime() function
> does not work on collections.
>


Re: Column family ID mismatch-Error on concurrent schema modifications

2014-11-28 Thread Eric Stevens
@Jens,

> will "inactive" CFs be released from C*'s memory after i.e. a few days
> or when under resource pressure?

No, certain memory structures are allocated and will remain resident on
each node for as long as the table exists.

> These CFs are used as "time buckets", but are to be kept for speedy
recovery

I would recommend a structure where you include time bucket as part of your
primary key and use a single column family for all time buckets.  Use TTL's
if you want this old data to expire automatically after some certain amount
of time.
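One hedged way to sketch that layout in CQL (all names, the bucket granularity, and the TTL value are illustrative assumptions):

```sql
-- Single table for all time buckets; the bucket becomes part of the
-- partition key instead of being encoded in a per-bucket table name.
CREATE TABLE events_by_bucket (
    bucket  text,       -- e.g. '2014-11-28' for daily buckets
    id      text,
    ts      timestamp,
    payload text,
    PRIMARY KEY ((bucket, id), ts)
);

-- A TTL lets old buckets expire automatically, with no schema changes:
INSERT INTO events_by_bucket (bucket, id, ts, payload)
VALUES ('2014-11-28', 'sensor-1', '2014-11-28 12:00:00+0000', 'data')
USING TTL 2592000;  -- 30 days, as an example retention
```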

On Fri Nov 28 2014 at 10:04:38 AM Robert Wille  wrote:

>  I would suggest that dynamic table creation is, in general, not a great
> idea, regardless of the database. I would seriously consider altering your
> approach to use a fixed set of tables.
>
> [snip]

Re: Nodes get stuck in crazy GC loop after some time, leading to timeouts

2014-11-28 Thread graham sanderson
Your GC settings would be helpful, though you can guesstimate by eyeballing 
(assuming settings are the same across all 4 images).

Bursty load can be a big cause of old-gen fragmentation (small working-set 
objects tend to get spilled (promoted) along with memtable slabs that aren’t 
flushed quickly enough). That said, empty fragmentation holes wouldn’t show up 
as “used” in your graph, and it clearly looks like you are above your 
CMSInitiatingOccupancyFraction and CMS is running continuously, so they probably 
aren’t the issue here.

Other than trying a slightly larger heap to give you more headroom, I’d also 
suggest, from eyeballing, that you have probably let the JVM pick its own new-gen 
size, and I’d suggest it is too small. What to set it to really depends on your 
workload, but you could try something in the 0.5 GB range unless that makes 
your young-gen pauses too long. In that case (or indeed anyway), make sure you 
also have the latest GC settings (e.g. -XX:+CMSParallelInitialMarkEnabled 
-XX:+CMSEdenChunksRecordAlways) on newer JVMs, to help the young GC pauses.
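As a hedged sketch, those suggestions might translate into a cassandra-env.sh fragment like the following; the 512M new-gen size is an assumption, not a recommendation from the thread, and should be tuned against your own young-GC pause times:

```shell
# Illustrative only: explicit ~0.5 GB new gen plus the newer CMS flags
# mentioned above; measure before adopting any of these values.
JVM_OPTS="$JVM_OPTS -Xmn512M"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelInitialMarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:+CMSEdenChunksRecordAlways"
```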

> On Nov 28, 2014, at 2:55 PM, Paulo Ricardo Motta Gomes 
>  wrote:
> [snip]




Re: Nodes get stuck in crazy GC loop after some time, leading to timeouts

2014-11-28 Thread graham sanderson
I should note that the young gen size is just a tuning suggestion, not directly 
related to your problem at hand.

You might want to make sure you don’t have issues with key/row cache.

Also, I’m assuming that your extra load isn’t hitting tables that you wouldn’t 
normally be hitting.

> On Nov 28, 2014, at 6:54 PM, graham sanderson  wrote:
> [snip]





Re: open source cassandra and hadoop

2014-11-28 Thread Jason Wee
There are two examples of Hadoop with Cassandra in the example code:
https://github.com/apache/cassandra/tree/trunk/examples/hadoop_word_count
https://github.com/apache/cassandra/tree/trunk/examples/hadoop_cql3_word_count

Do these help?

Jason

On Sat, Nov 29, 2014 at 2:30 AM, Tim Dunphy  wrote:

> [snip]


Re: open source cassandra and hadoop

2014-11-28 Thread Tim Dunphy
Great! Thanks for these suggestions. I'll look into these tomorrow.

Tim

Sent from my iPhone

> On Nov 28, 2014, at 10:31 PM, Jason Wee  wrote:
> 
> [snip]