Read-repair working, repair not working?

2013-02-10 Thread Brian Fleming

Hi,


I have a 20-node cluster running v1.0.7 split across 5 data centres, each
with an RF of 2, containing a ~1TB unique dataset (~10TB of total data).


I’ve had some intermittent data consistency & availability issues with a new
data centre (3 nodes, RF=2) that I brought online late last year: I’d request
data, nothing would be returned, I would then re-request the data and it would
correctly be returned, i.e. read-repair appeared to be occurring.  However,
running repairs on the nodes (I tried general ‘repair’ commands as well as
targeted keyspace commands) didn’t alter the behaviour.


After a lot of fruitless investigation, I decided to wipe &
re-install/re-populate the nodes.  The re-install & repair operations are
now complete: I see the expected amount of data on the nodes; however, I am
still seeing the same behaviour, i.e. I only get data after one failed
attempt.


When I run repair commands, I don’t see any errors in the logs. 

I see the expected ‘AntiEntropySessions’ count in ‘nodetool tpstats’ during
repair sessions.

I see a number of dropped ‘MUTATION’ operations: just under 5% of the
total ‘MutationStage’ count.


Questions:

- Could anybody suggest anything specific to look at to see
why the repair operations aren’t having the desired effect?

- Would increasing the logging level to ‘DEBUG’ show read-repair
activity (to confirm that this is happening, when, & for what proportion of
total requests)?

- Is there something obvious that I could be missing here?


Many thanks,

Brian



High CPU usage during repair

2013-02-10 Thread Tamar Fraenkel
Hi!
I run repair weekly, using a scheduled cron job.
During repair I see high CPU consumption, and messages in the log file
"INFO [ScheduledTasks:1] 2013-02-10 11:48:06,396 GCInspector.java (line
122) GC for ParNew: 208 ms for 1 collections, 1704786200 used; max is
3894411264"
From time to time, there are also messages of the form
"INFO [ScheduledTasks:1] 2012-12-04 13:34:52,406 MessagingService.java
(line 607) 1 READ messages dropped in last 5000ms"

Using OpsCenter, JMX and nodetool compactionstats I can see that during the
time the CPU consumption is high, there are compactions waiting.

I run Cassandra version 1.0.11 on a 3-node setup on EC2 instances.
I have the default settings:
compaction_throughput_mb_per_sec: 16
in_memory_compaction_limit_in_mb: 64
multithreaded_compaction: false
compaction_preheat_key_cache: true

I am thinking of the following solution, and wanted to ask if I am on the
right track:
I thought of adding a call to my repair script, before repair starts, to do:
nodetool setcompactionthroughput 0
and then when repair finishes call
nodetool setcompactionthroughput 16

Is this the right solution?
Thanks,
Tamar

*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956

Re: Netflix/Astyanax Client for Cassandra

2013-02-10 Thread Renato Marroquín Mogrovejo
Sorry to hijack this email thread, but what are the use
cases/benefits of using the new binary protocol? And why doesn't
Cassandra offer a driver as part of the project?


Renato M.

2013/2/8 aaron morton :
> I'm going to guess Netflix are running Astyanax in production with Cassandra
> 1.1.
>
> cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 8/02/2013, at 6:50 AM, Cassa L  wrote:
>
> Thank you all for the responses to this thread. I am planning to use
> Cassandra 1.1.9 with Astyanax. Does anyone have a Cassandra 1.x version running
> in production with Astyanax? Did you come across any show-stopper issues?
>
> Thanks
> LCassa
>
>
> On Thu, Feb 7, 2013 at 8:50 AM, Bartłomiej Romański  wrote:
>>
>> Hi,
>>
>> Does anyone know about virtual node support in Astyanax? Are virtual nodes
>> handled correctly? Especially with ConnectionPoolType.TOKEN_AWARE?
>>
>> Thanks,
>> BR
>
>
>


Re: High CPU usage during repair

2013-02-10 Thread aaron morton
> During repair I see high CPU consumption, 
Repair reads the data and computes a hash; this is a CPU-intensive operation.
Is the CPU overloaded or just under load?

> I run Cassandra  version 1.0.11, on 3 node setup on EC2 instances.
What machine size?

> there are compactions waiting.
That's normally ok. How many are waiting?

> I thought of adding a call to my repair script, before repair starts to do:
> nodetool setcompactionthroughput 0
> and then when repair finishes call
> nodetool setcompactionthroughput 16
That will remove throttling on compaction and on the validation compaction used
for the repair, which may in turn add additional IO load, CPU load and GC
pressure. You probably do not want to do this.

Try reducing the compaction throughput to, say, 12 normally and see the effect.
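
A quick sketch of the difference in shell terms (localhost and the value 12
are just taken from this thread):

# Setting 0 removes the throttle entirely: validation and normal
# compactions then run as fast as IO allows.
nodetool -h localhost setcompactionthroughput 0

# The suggestion above keeps a throttle, just a slightly lower one:
nodetool -h localhost setcompactionthroughput 12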

Cheers


-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 11/02/2013, at 1:01 AM, Tamar Fraenkel  wrote:

> Hi!
> I run repair weekly, using a scheduled cron job.
> During repair I see high CPU consumption, and messages in the log file
> "INFO [ScheduledTasks:1] 2013-02-10 11:48:06,396 GCInspector.java (line 122) 
> GC for ParNew: 208 ms for 1 collections, 1704786200 used; max is 3894411264"
> From time to time, there are also messages of the form
> "INFO [ScheduledTasks:1] 2012-12-04 13:34:52,406 MessagingService.java (line 
> 607) 1 READ messages dropped in last 5000ms"
> 
> Using opscenter, jmx and nodetool compactionstats I can see that during the 
> time the CPU consumption is high, there are compactions waiting.
> 
> I run Cassandra  version 1.0.11, on 3 node setup on EC2 instances.
> I have the default settings:
> compaction_throughput_mb_per_sec: 16
> in_memory_compaction_limit_in_mb: 64
> multithreaded_compaction: false
> compaction_preheat_key_cache: true
> 
> I am thinking on the following solution, and wanted to ask if I am on the 
> right track:
> I thought of adding a call to my repair script, before repair starts to do:
> nodetool setcompactionthroughput 0
> and then when repair finishes call
> nodetool setcompactionthroughput 16
> 
> Is this a right solution?
> Thanks,
> Tamar
> 
> Tamar Fraenkel 
> Senior Software Engineer, TOK Media 
> 
> 
> 
> ta...@tok-media.com
> Tel:   +972 2 6409736 
> Mob:  +972 54 8356490 
> Fax:   +972 2 5612956 
> 
> 



Re: Read-repair working, repair not working?

2013-02-10 Thread aaron morton
> I’d request data, nothing would be returned, I would then re-request the data 
> and it would correctly be returned:
> 
What CL are you using for reads and writes?

> I see a number of dropped ‘MUTATION’ operations : just under 5% of the total 
> ‘MutationStage’ count.
> 
Dropped mutations in a multi DC setup may be a sign of network congestion or 
overloaded nodes. 


> -  Could anybody suggest anything specific to look at to see why the 
> repair operations aren’t having the desired effect? 
> 
I would first build a test case to ensure correct operation when using strong
consistency, i.e. QUORUM write and read. Because you are using RF 2 per DC I
assume you are not using LOCAL_QUORUM, because that is 2 and you would not have
any redundancy in the DC.
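
A minimal sketch of such a test case using cassandra-cli (the keyspace, column
family and key names here are hypothetical):

# Write, then immediately read, the same key at CL QUORUM; if the first
# read comes back empty, consistency is broken independent of read-repair.
cassandra-cli -h localhost <<'EOF'
use MyKeyspace;
consistencylevel as QUORUM;
set MyCF['test-row']['test-col'] = 'test-value';
get MyCF['test-row'];
EOF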

 
> 
> -  Would increasing logging level to ‘DEBUG’ show read-repair 
> activity (to confirm that this is happening, when & for what proportion of 
> total requests)?
It would, but the INFO logging for the AES is pretty good. I would hold off for 
now. 

> 
> -  Is there something obvious that I could be missing here?
When a new AES session starts it logs this:

logger.info(String.format("[repair #%s] new session: will sync %s on range %s for %s.%s",
    getName(), repairedNodes(), range, tablename, Arrays.toString(cfnames)));

When it completes it logs this:

logger.info(String.format("[repair #%s] session completed successfully", getName()));

Or this on failure:

logger.error(String.format("[repair #%s] session completed with the following error",
    getName()), exception);


Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 10/02/2013, at 9:56 PM, Brian Fleming  wrote:

> 
>  
> 
> Hi,
> 
>  
> 
> I have a 20 node cluster running v1.0.7 split between 5 data centres, each 
> with an RF of 2, containing a ~1TB unique dataset/~10TB of total data. 
> 
>  
> 
> I’ve had some intermittent issues with a new data centre (3 nodes, RF=2) I 
> brought online late last year with data consistency & availability: I’d 
> request data, nothing would be returned, I would then re-request the data and 
> it would correctly be returned: i.e. read-repair appeared to be occurring.  
> However running repairs on the nodes didn’t resolve this (I tried general 
> ‘repair’ commands as well as targeted keyspace commands) – this didn’t alter 
> the behaviour.
> 
>  
> 
> After a lot of fruitless investigation, I decided to wipe & 
> re-install/re-populate the nodes.  The re-install & repair operations are now 
> complete: I see the expected amount of data on the nodes, however I am still 
> seeing the same behaviour, i.e. I only get data after one failed attempt.
> 
>  
> 
> When I run repair commands, I don’t see any errors in the logs. 
> 
> I see the expected ‘AntiEntropySessions’ count in ‘nodetool tpstats’ during 
> repair sessions.
> 
> I see a number of dropped ‘MUTATION’ operations : just under 5% of the total 
> ‘MutationStage’ count.
> 
>  
> 
> Questions :
> 
> -  Could anybody suggest anything specific to look at to see why the 
> repair operations aren’t having the desired effect? 
> 
> -  Would increasing logging level to ‘DEBUG’ show read-repair 
> activity (to confirm that this is happening, when & for what proportion of 
> total requests)?
> 
> -  Is there something obvious that I could be missing here?
> 
>  
> 
> Many thanks,
> 
> Brian
> 
>  
> 



Re: High CPU usage during repair

2013-02-10 Thread Tamar Fraenkel
Hi!
Thanks for the response.
See my answers and questions below.
Thanks!
Tamar

*Tamar Fraenkel *
Senior Software Engineer, TOK Media

[image: Inline image 1]

ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956




On Sun, Feb 10, 2013 at 10:04 PM, aaron morton wrote:

> During repair I see high CPU consumption,
>
> Repair reads the data and computes a hash, this is a CPU intensive
> operation.
> Is the CPU over loaded or is just under load?
>
Usually just under load, but in the past two weeks I have seen CPU of over 90%!

> I run Cassandra  version 1.0.11, on 3 node setup on EC2 instances.
>
> What machine size?
>
m1.large

>
> there are compactions waiting.
>
> That's normally ok. How many are waiting?
>
I have seen 4 this morning.

> I thought of adding a call to my repair script, before repair starts to do:
> nodetool setcompactionthroughput 0
> and then when repair finishes call
> nodetool setcompactionthroughput 16
>
> That will remove throttling on compaction and the validation compaction
> used for the repair. Which may in turn add additional IO load, CPU load and
> GC pressure. You probably do not want to do this.
>
> Try reducing the compaction throughput to say 12 normally and see the
> effect.
>
Just to make sure I understand you correctly: you suggest that I change the
throughput to 12 regardless of whether repair is ongoing or not. I will do it
using nodetool, and change the yaml file in case a restart occurs in the
future?
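
Something like the following is what I have in mind (assuming a package
install with the yaml in the default location):

# Runtime change; takes effect immediately but is lost on restart:
nodetool -h 127.0.0.1 setcompactionthroughput 12
# Persist the same value for future restarts:
sed -i 's/^compaction_throughput_mb_per_sec: 16/compaction_throughput_mb_per_sec: 12/' /etc/cassandra/cassandra.yaml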

> Cheers
>
>
>-
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 11/02/2013, at 1:01 AM, Tamar Fraenkel  wrote:
>
> Hi!
> I run repair weekly, using a scheduled cron job.
> During repair I see high CPU consumption, and messages in the log file
> "INFO [ScheduledTasks:1] 2013-02-10 11:48:06,396 GCInspector.java (line
> 122) GC for ParNew: 208 ms for 1 collections, 1704786200 used; max is
> 3894411264"
> From time to time, there are also messages of the form
> "INFO [ScheduledTasks:1] 2012-12-04 13:34:52,406 MessagingService.java
> (line 607) 1 READ messages dropped in last 5000ms"
>
> Using opscenter, jmx and nodetool compactionstats I can see that during
> the time the CPU consumption is high, there are compactions waiting.
>
> I run Cassandra  version 1.0.11, on 3 node setup on EC2 instances.
> I have the default settings:
> compaction_throughput_mb_per_sec: 16
> in_memory_compaction_limit_in_mb: 64
> multithreaded_compaction: false
> compaction_preheat_key_cache: true
>
> I am thinking on the following solution, and wanted to ask if I am on the
> right track:
> I thought of adding a call to my repair script, before repair starts to do:
> nodetool setcompactionthroughput 0
> and then when repair finishes call
> nodetool setcompactionthroughput 16
>
> Is this a right solution?
> Thanks,
> Tamar
>
> *Tamar Fraenkel *
> Senior Software Engineer, TOK Media
>
> 
>
>
> ta...@tok-media.com
> Tel:   +972 2 6409736
> Mob:  +972 54 8356490
> Fax:   +972 2 5612956
>
>
>
>

Re: Issues with writing data to Cassandra column family using a Hive script

2013-02-10 Thread aaron morton
Don't use the variable-length Cassandra integer; use Int32Type. It also
sounds like you want to use DoubleType rather than FloatType.
http://www.datastax.com/docs/datastax_enterprise2.2/solutions/about_hive#hive-to-cassandra-table-mapping
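
For example, a sketch of redefining the target column family along those lines
in cassandra-cli (names taken from your script; untested):

# Recreate the target CF with a fixed-width comparator and double values.
# Note the drop discards any data already written to it.
cassandra-cli -h 127.0.0.1 <<'EOF'
use temp_ks;
drop column family cpu_avg_5min_new7;
create column family cpu_avg_5min_new7
  with comparator = 'Int32Type'
  and key_validation_class = 'UTF8Type'
  and default_validation_class = 'DoubleType';
EOF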
 
Cheers

 
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 10/02/2013, at 4:15 PM, Dinusha Dilrukshi  wrote:

> Hi All,
> 
> Data was originally stored in a column family called "test_cf". The definition
> of the column family is as follows:
> 
> CREATE COLUMN FAMILY test_cf 
> WITH COMPARATOR = 'IntegerType' 
>  AND key_validation_class = UTF8Type 
>  AND default_validation_class = FloatType;  
> 
> And following is the sample data set contained in "test_cf".
> 
> cqlsh:temp_ks> select * from test_cf;
>  key| column1| value
> --++---
>  localhost:8282 | 1350468600 |76
>  localhost:8282 | 1350468601 |76
> 
> 
> The Hive script (shown at the end of this mail) is used to take the data from the
> above column family "test_cf" and insert it into a new column family called
> "cpu_avg_5min_new7".  The column family definition of "cpu_avg_5min_new7" is
> the same as that of test_cf.  The issue is that the data written into the
> "cpu_avg_5min_new7" column family after executing the Hive script is as follows.
> It's not in the format of the data present in the original column family
> "test_cf". Any explanations would be highly appreciated.
> 
> 
> cqlsh:temp_ks> select * from cpu_avg_5min_new7;
>  key| column1  | value
> --+--+--
>  localhost:8282 | 232340574229062170849328 | 1.09e-05
>  localhost:8282 | 232340574229062170849329 | 1.09e-05
> 
> 
> Hive script:
> 
> drop table cpu_avg_5min_new7_hive;
> CREATE EXTERNAL TABLE IF NOT EXISTS cpu_avg_5min_new7_hive (src_id STRING, 
> start_time INT, cpu_avg FLOAT) STORED BY 
> 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler' WITH 
> SERDEPROPERTIES (
>  "cassandra.host" = "127.0.0.1" , "cassandra.port" = "9160" , 
> "cassandra.ks.name" = "temp_ks" , 
>  "cassandra.ks.username" = "xxx" , "cassandra.ks.password" = "xxx" , 
>  "cassandra.columns.mapping" = ":key,:column,:value" , "cassandra.cf.name" = 
> "cpu_avg_5min_new7" ); 
> 
> drop table xxx;
> CREATE EXTERNAL TABLE IF NOT EXISTS xxx (src_id STRING, start_time INT, 
> cpu_avg FLOAT) STORED BY
>  'org.apache.hadoop.hive.cassandra.CassandraStorageHandler' WITH 
> SERDEPROPERTIES ( 
>  "cassandra.host" = "127.0.0.1" , "cassandra.port" = "9160" , 
> "cassandra.ks.name" = "temp_ks" ,
>   "cassandra.ks.username" = "xxx" , "cassandra.ks.password" = "xxx" ,
>"cassandra.columns.mapping" = ":key,:column,:value" , "cassandra.cf.name" 
> = "test_cf" );
> 
> insert overwrite table cpu_avg_5min_new7_hive select 
> src_id,start_time,cpu_avg from xxx;
> 
> Regards,
> Dinusha.
> 
> 



Re: Cassandra 1.1.2 -> 1.1.8 upgrade

2013-02-10 Thread aaron morton
I would do #1.

You can play with nodetool setcompactionthroughput to speed things up, but 
beware nothing comes for free.
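
For example (values illustrative; watch IO and client latency while it runs):

# Temporarily raise the compaction throttle while sstables are rewritten,
# then put it back to the default:
nodetool -h localhost setcompactionthroughput 32
nodetool -h localhost upgradesstables
nodetool -h localhost setcompactionthroughput 16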

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 10/02/2013, at 6:40 AM, Mike  wrote:

> Thank you,
> 
> Another question on this topic.
> 
> Upgrading from 1.1.2->1.1.9 requires running upgradesstables, which will take
> many hours on our dataset (about 12 hours).  For this upgrade, is it recommended
> that I:
> 
> 1) Upgrade all the DB nodes to 1.1.9 first, then go around the ring and run a 
> staggered upgrade of the sstables over a number of days.
> 2) Upgrade one node at a time, running the cluster in a mixed 1.1.2/1.1.9
> configuration for a number of days.
> 
> I would prefer #1, as with #2, streaming will not work until all the nodes 
> are upgraded.
> 
> I appreciate your thoughts,
> -Mike
> 
> On 1/16/2013 11:08 AM, Jason Wee wrote:
>> Always check NEWS.txt. For instance, for Cassandra 1.1.3 you need to run
>> nodetool upgradesstables if your CF has counters.
>> 
>> 
>> On Wed, Jan 16, 2013 at 11:58 PM, Mike  wrote:
>> Hello,
>> 
>> We are looking to upgrade our Cassandra cluster from 1.1.2 -> 1.1.8 (or 
>> possibly 1.1.9 depending on timing).  It is my understanding that rolling 
>> upgrades of Cassandra is supported, so as we upgrade our cluster, we can do 
>> so one node at a time without experiencing downtime.
>> 
>> Has anyone had any gotchas recently that I should be aware of before 
>> performing this upgrade?
>> 
>> In order to upgrade, is the only thing that needs to change are the JAR 
>> files?  Can everything remain as-is?
>> 
>> Thanks,
>> -Mike
>> 
> 



Re: Cassandra flush spin?

2013-02-10 Thread aaron morton
Sounds like flushing due to memory consumption. 

The flush log messages include the number of ops, so you can see if this node
was processing more mutations than the others. Try to see if there was more
(serialised) data being written or more operations being processed.
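
For example, a quick comparison across nodes (log path assumed; in 1.1 the
memtable enqueue message should include serialized bytes and op counts, but
adjust the pattern if your logs differ):

# Run on each node and compare the trailing figures:
grep "Enqueuing flush of Memtable" /var/log/cassandra/system.log | tail -20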

Also, just for fun, check that the JVM and yaml settings are as expected.

Cheers 


-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 10/02/2013, at 6:29 AM, Mike  wrote:

> Hello,
> 
> We just hit a very odd issue in our Cassandra cluster.  We are running 
> Cassandra 1.1.2 in a 6 node cluster.  We use a replication factor of 3, and 
> all operations utilize LOCAL_QUORUM consistency.
> 
> We noticed a large performance hit in our application's maintenance 
> activities and I've been investigating.  I discovered a node in the cluster 
> that was flushing a memtable like crazy.  It was flushing every 2->3 minutes, 
> and has been apparently doing this for days. Typically, during this time of 
> day, a flush would happen every 30 minutes or so.
> 
> alldb.sh "cat /var/log/cassandra/system.log | grep \"flushing high-traffic 
> column family CFS(Keyspace='open', ColumnFamily='msgs')\" | grep 02-08 | wc 
> -l"
> [1] 18:41:04 [SUCCESS] db-1c-1
> 59
> [2] 18:41:05 [SUCCESS] db-1c-2
> 48
> [3] 18:41:05 [SUCCESS] db-1a-1
> 1206
> [4] 18:41:05 [SUCCESS] db-1d-2
> 54
> [5] 18:41:05 [SUCCESS] db-1a-2
> 56
> [6] 18:41:05 [SUCCESS] db-1d-1
> 52
> 
> 
> I restarted the database node, and, at least for now, the problem appears to 
> have stopped.
> 
> There are a number of things that don't make sense here.  We use a 
> replication factor of 3, so if this was being caused by our application, I 
> would have expected 3 nodes in the cluster to have issues.  Also, I would 
> have expected the issue to continue once the node restarted.
> 
> Another point of interest, and I'm wondering if it has exposed a bug: this
> node was recently converted to use ephemeral storage on EC2, and was restored
> from a snapshot.  After the restore, a nodetool repair was run.  However,
> repair was going to run into some heavy activity for our application, and we
> canceled that validation compaction (2 of the 3 anti-entropy sessions had
> completed).  The spin appears to have started at the start of the second
> session.
> 
> Any hints?
> 
> -Mike
> 
> 
> 
> 
> 



Re: persisted ring state

2013-02-10 Thread aaron morton
>  Is that the right way to do?
No. 
If you want to change the token for a node use nodetool move. 

Changing it like this will not make the node change its token, because after
startup the token is stored in the System.LocationInfo CF.
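
For example (the token value here is illustrative):

# Assign the node its new token; data is relocated as part of the move:
nodetool -h 127.0.0.1 move 85070591730234615865843651857942052864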

> or -Dcassandra.load_ring_state=false|true is only limited to changes to 
> seed/listen_address ?
It's used when a node somehow has a bad view of the ring and you want it to
forget things.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 10/02/2013, at 3:35 AM, S C  wrote:

> In one of the scenarios that I encountered, I needed to change the token on
> the node. I added the new token and started the node with
> -Dcassandra.load_ring_state=false in anticipation that the node would not pick
> it up from the locally persisted data. Is that the right way to do it? Or is
> -Dcassandra.load_ring_state=false|true only limited to changes to
> seed/listen_address?
> 
> 
> Thanks,
> SC



CQL 3 compound row key error

2013-02-10 Thread Shahryar Sedghi
I am moving my application from 1.1 to 1.2.1 to utilize secondary indexes
and simplify the data model. In 1.1 I was concatenating some fields into
one big string, separated by ":", for the row key. In v1.2 I use a
compound row key, shown in the following test case (interval and seq):


CREATE TABLE  test(
interval text,
seq int,
id int,
severity int,
PRIMARY KEY ((interval, seq), id))
WITH CLUSTERING ORDER BY (id DESC);
--
CREATE INDEX ON test(severity);


select * from test where severity = 3 and interval = 't' and seq = 1;

results:

Bad Request: Start key sorts after end key. This is not allowed; you
probably should not specify end key at all under random partitioner

If I define the table as this:

CREATE TABLE  test(
interval text,
id int,
severity int,
PRIMARY KEY (interval, id))
WITH CLUSTERING ORDER BY (id DESC);

select * from test where severity = 3 and interval = 't1';

Works fine. Is it a bug?

Thanks in Advance

Shahryar



-- 
"Life is what happens while you are making other plans." ~ John Lennon


Re: Issues with writing data to Cassandra column family using a Hive script

2013-02-10 Thread Dinusha Dilrukshi
Hi Aaron,

Thanks for the reply. I'll try out your suggestion.

Regards,
Dinusha.

On Mon, Feb 11, 2013 at 1:55 AM, aaron morton wrote:

> Don't use the variable length Cassandra integer, use the Int32Type. It
> also sounds like you want to use a DoubleType rather than FloatType.
>
> http://www.datastax.com/docs/datastax_enterprise2.2/solutions/about_hive#hive-to-cassandra-table-mapping
>
> Cheers
>
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 10/02/2013, at 4:15 PM, Dinusha Dilrukshi 
> wrote:
>
> Hi All,
>
> Data was originally stored in a column family called "test_cf". The definition
> of the column family is as follows:
>
> CREATE COLUMN FAMILY test_cf
> WITH COMPARATOR = 'IntegerType'
>  AND key_validation_class = UTF8Type
>  AND default_validation_class = FloatType;
>
> And following is the sample data set contained in "test_cf".
>
> cqlsh:temp_ks> select * from test_cf;
>  key| column1| value
> --++---
>  localhost:8282 | 1350468600 |76
>  localhost:8282 | 1350468601 |76
>
>
> The Hive script (shown at the end of this mail) is used to take the data from
> the above column family "test_cf" and insert it into a new column family
> called "cpu_avg_5min_new7". The column family definition
> of "cpu_avg_5min_new7" is the same as that of test_cf. The issue is that the
> data written into the "cpu_avg_5min_new7" column family after executing the
> Hive script is as follows. It's not in the format of the data present in the
> original column family "test_cf". Any explanations would be highly appreciated.
>
>
> cqlsh:temp_ks> select * from cpu_avg_5min_new7;
>  key| column1  | value
> --+--+--
>  localhost:8282 | 232340574229062170849328 | 1.09e-05
>  localhost:8282 | 232340574229062170849329 | 1.09e-05
>
>
> Hive script:
> 
> drop table cpu_avg_5min_new7_hive;
> CREATE EXTERNAL TABLE IF NOT EXISTS cpu_avg_5min_new7_hive (src_id STRING,
> start_time INT, cpu_avg FLOAT) STORED BY
> 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler' WITH
> SERDEPROPERTIES (
>  "cassandra.host" = "127.0.0.1" , "cassandra.port" = "9160" , "
> cassandra.ks.name" = "temp_ks" ,
>  "cassandra.ks.username" = "xxx" , "cassandra.ks.password" = "xxx" ,
>  "cassandra.columns.mapping" = ":key,:column,:value" , "cassandra.cf.name"
> = "cpu_avg_5min_new7" );
>
> drop table xxx;
> CREATE EXTERNAL TABLE IF NOT EXISTS xxx (src_id STRING, start_time INT,
> cpu_avg FLOAT) STORED BY
>  'org.apache.hadoop.hive.cassandra.CassandraStorageHandler' WITH
> SERDEPROPERTIES (
>  "cassandra.host" = "127.0.0.1" , "cassandra.port" = "9160" , "
> cassandra.ks.name" = "temp_ks" ,
>   "cassandra.ks.username" = "xxx" , "cassandra.ks.password" = "xxx" ,
>"cassandra.columns.mapping" = ":key,:column,:value" , "
> cassandra.cf.name" = "test_cf" );
>
> insert overwrite table cpu_avg_5min_new7_hive select
> src_id,start_time,cpu_avg from xxx;
>
> Regards,
> Dinusha.
>
>
>
>


Querying composite keys

2013-02-10 Thread Rishabh Agrawal
Hello

I have a key and columns defined in the following fashion:

Row key:       Key1:TimeStamp:VersionNum
Column names:  HotelName1:RoomNum1 | HotelName2:RoomNum2 | HotelName3:RoomNum3

Is there a way that I can query this schema by only 'key' or 'HotelName', i.e.
querying using part of the composite key and not the full key?


Thanks and Regards
Rishabh Agrawal











Re: Querying composite keys

2013-02-10 Thread Vivek Mishra
You can query over composite columns by:
1) The partition key.
2) The first part of the clustering key (using EQ ops); see the sketch below.

Secondary indexes over non-composite columns are not possible.
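
A minimal CQL 3 sketch of the two allowed query shapes (keyspace, table and
names are hypothetical, piped through cqlsh):

cqlsh localhost <<'EOF'
USE my_ks;
CREATE TABLE bookings (
  key text,
  hotel text,
  room int,
  version int,
  PRIMARY KEY (key, hotel, room)
);
-- 1) restrict by partition key only:
SELECT * FROM bookings WHERE key = 'Key1';
-- 2) partition key plus a leading part of the clustering key (EQ only):
SELECT * FROM bookings WHERE key = 'Key1' AND hotel = 'HotelName1';
EOF

Restricting a later clustering column (e.g. room) without also restricting
hotel is what Cassandra rejects.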

-Vivek
On Mon, Feb 11, 2013 at 12:06 PM, Rishabh Agrawal <
rishabh.agra...@impetus.co.in> wrote:

>  Hello
>
>
>
> I have a key and columns defined in the following fashion:
>
> Row key:       Key1:TimeStamp:VersionNum
> Column names:  HotelName1:RoomNum1 | HotelName2:RoomNum2 | HotelName3:RoomNum3
>
> Is there a way that I can query this schema by only ‘key’ or ‘HotelName’,
> i.e. querying using part of the composite key and not the full key?
>
>
>
>
>
> Thanks and Regards
>
> Rishabh Agrawal
>
>
>
> --
>


Re: Cassandra 1.1.2 -> 1.1.8 upgrade

2013-02-10 Thread Michal Michalski



2) Upgrade one node at a time, running the cluster in a mixed
1.1.2/1.1.9 configuration for a number of days.


I'm about to upgrade my 1.1.0 cluster and
http://www.datastax.com/docs/1.1/install/upgrading#info says:

"If you are upgrading to Cassandra 1.1.9 from a version earlier than 
1.1.7, all nodes must be upgraded before any streaming can take place. 
Until you upgrade all nodes, you cannot add version 1.1.7 nodes or later 
to a 1.1.7 or earlier cluster."


Which one is correct then? Can I run a mixed 1.1.2 (in my case 1.1.0) &
1.1.9 cluster or not?


M.