Re: is the select result grouped by the value of the partition key?

2013-04-14 Thread aaron morton
> 
>>> Is it guaranteed that the rows are grouped by the value of the
>>> partition key? That is, is it guaranteed that I'll get
yes.


-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 12/04/2013, at 7:24 PM, Sorin Manolache  wrote:

> On 2013-04-11 22:10, aaron morton wrote:
>>> Is it guaranteed that the rows are grouped by the value of the
>>> partition key? That is, is it guaranteed that I'll get
>> Your primary key (k1, k2) is considered in two parts (partition_key,
>> grouping_columns). In your case the partition key is k1 and the grouping
>> column is k2. Columns are ordered by the grouping column, k2.
>> 
>> See http://thelastpickle.com/2013/01/11/primary-keys-in-cql/
> 
> Thank you for the answer.
> 
> However my question was about the _grouping_ (not ordering) of _rows_ (not 
> columns).
> 
> Sorin
> 
>> 
>> Cheers
>> 
>> -
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 12/04/2013, at 3:19 AM, Sorin Manolache wrote:
>> 
>>> Hello,
>>> 
>>> Let us consider that we have a table t created as follows:
>>> 
>>> create table t(k1 varchar, k2 varchar, value varchar, primary key (k1,
>>> k2));
>>> 
>>> Its contents are
>>> 
>>> a m x
>>> a n y
>>> z 0 9
>>> z 1 8
>>> 
>>> and I perform a
>>> 
>>> select * from t where k1 in ('a', 'z');
>>> 
>>> Is it guaranteed that the rows are grouped by the value of the
>>> partition key? That is, is it guaranteed that I'll get
>>> 
>>> a m x
>>> a n y
>>> z 0 9
>>> z 1 8
>>> 
>>> or
>>> 
>>> a n y
>>> a m x
>>> z 1 8
>>> z 0 9
>>> 
>>> or even
>>> 
>>> z 0 9
>>> z 1 8
>>> a n y
>>> a m x
>>> 
>>> but NEVER
>>> 
>>> a m x
>>> z 0 9
>>> a n y
>>> z 1 8
>>> 
>>> 
>>> Thank you,
>>> Sorin
>> 
> 
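A minimal sketch of the guarantee confirmed above, using the table from the
original post: rows sharing a partition key are stored together on the same
replicas, so they always come back contiguously, ordered by the clustering
column within the partition:

    create table t (k1 varchar, k2 varchar, value varchar, primary key (k1, k2));

    -- k1 is the partition key, k2 the clustering column. All rows with
    -- k1 = 'a' are returned contiguously (sorted by k2) and all rows with
    -- k1 = 'z' contiguously; which group comes first depends on the
    -- partitioner's token order, but the groups are never interleaved.
    select * from t where k1 in ('a', 'z');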



Re: multiple Datacenter values in PropertyFileSnitch

2013-04-14 Thread aaron morton
> So that 2 apps with same and very high load pattern are not clashing.
I'm not sure what the advantage is of putting two apps in the same cluster but 
then using the replication strategy settings to keep them on different nodes. The 
reason to put apps in the same cluster is to share the resources. 
 
Having a different number of nodes in different DC's and mixing the RF between 
them can get complicated. 

What sort of load are you considering? IMHO the simple thing to do is some 
capacity planning, and when in doubt start with one multi-DC cluster with the 
same RF in both. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 12/04/2013, at 7:33 PM, Andras Szerdahelyi 
 wrote:

> I would replicate your different keyspaces to different DCs and scale those 
> appropriately 
> So, for example, HighLoad KS replicates to really-huge-dc, which would have, 
> 10 nodes, LowerLoad KS replicates to smaller-dc with 5 nodes.
> The idea is, you do not mix your different keyspaces in the same datacenter 
> (this is possible with NetworkTopologyStrategy); or, for redundancy/HA purposes, 
> you place a single replica in the other keyspace's DC but direct your 
> applications to the "primary" DC of the keyspace, with LOCAL_QUORUM or ONE 
> reads.
> 
> Regards,
> Andras
> 
> From: Matthias Zeilinger 
> Reply-To: "user@cassandra.apache.org" 
> Date: Friday 12 April 2013 07:57
> To: "user@cassandra.apache.org" 
> Subject: RE: multiple Datacenter values in PropertyFileSnitch
> 
> I'm using its own keyspace for each application.
> What I want is to split up for different load patterns.
> So that 2 apps with same and very high load pattern are not clashing.
>  
> For other load patterns I want to use another splitting.
>  
> Is there any best practice, or should I scale out so that the complete load 
> can be distributed across all nodes?
>  
> Br,
> Matthias Zeilinger
> Production Operation – Shared Services
>  
> P: +43 (0) 50 858-31185
> M: +43 (0) 664 85-34459
> E: matthias.zeilin...@bwinparty.com
>  
> bwin.party services (Austria) GmbH
> Marxergasse 1B
> A-1030 Vienna
>  
> www.bwinparty.com
>  
> From: aaron morton [mailto:aa...@thelastpickle.com] 
> Sent: Donnerstag, 11. April 2013 20:48
> To: user@cassandra.apache.org
> Subject: Re: multiple Datacenter values in PropertyFileSnitch
>  
> A node can only exist in one DC and one rack. 
>  
> Use different keyspaces as suggested. 
>  
> Cheers
>  
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>  
> @aaronmorton
> http://www.thelastpickle.com
>  
> On 12/04/2013, at 1:47 AM, Jabbar Azam  wrote:
> 
> 
> Hello,
> 
> I'm not an expert but I don't think you can do what you want. The way to 
> separate data for applications on the same cluster is to use different tables 
> for different applications or use multiple keyspaces, a keyspace per 
> application. The replication factor you specify for each keyspace specifies 
> how many copies of the data are stored in each datacenter.
> 
> You can't specify that data for a particular application is stored on a 
> specific node, unless that node is in its own cluster.
> 
> I think of a cassandra cluster as a shared resource where all the 
> applications have access to all the nodes in the cluster.
>  
> 
> Thanks
> 
> Jabbar Azam
>  
> 
> On 11 April 2013 14:13, Matthias Zeilinger  
> wrote:
> Hi,
>  
> I would like to create a big cluster for many applications.
> Within this cluster I would like to separate the data for each application, 
> which can be easily done via different virtual datacenters and the correct 
> replication strategy.
> What I would like to know is whether I can specify multiple values for one 
> node in the PropertyFileSnitch configuration, so that I can use one node for 
> more than one application.
> For example:
> 6 nodes:
> 3 for App A
> 3 for App B
> 4 for App C
>  
> I want to have such a configuration:
> Node 1 – DC-A & DC-C
> Node 2 – DC-B & DC-C
> Node 3 – DC-A & DC-C
> Node 4 – DC-B & DC-C
> Node 5 – DC-A
> Node 6 – DC-B
>  
> Is this possible or does anyone have another solution for this?
>  
>  
> Thx & br matthias
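A sketch of the keyspace-per-application layout suggested above, using the DC
names from Matthias's example (keyspace names are illustrative; CQL3 (1.2)
syntax):

    -- each application's keyspace is replicated only to "its" datacenter
    CREATE KEYSPACE app_a WITH replication =
      {'class': 'NetworkTopologyStrategy', 'DC-A': 3};

    -- optionally keep one replica in another DC for redundancy/HA, while
    -- clients read from the primary DC with LOCAL_QUORUM or ONE
    CREATE KEYSPACE app_c WITH replication =
      {'class': 'NetworkTopologyStrategy', 'DC-C': 3, 'DC-A': 1};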
>  
>  



Re: Exception for version 1.1.0

2013-04-14 Thread aaron morton
Always read the NEWS.txt guide: 
https://github.com/apache/cassandra/blob/cassandra-1.2/NEWS.txt

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 12/04/2013, at 8:54 PM, Winsdom Chen  wrote:

> Hi Aaron,
> Thanks for your reply! I've checked the release notes; 
> the patch was applied in 1.2.3. If upgrading from 1.1.0 to
> 1.2.3, is any data migration or other effort needed?
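A rough sketch of the usual per-node rolling-upgrade sequence (NEWS.txt lists
the version-specific requirements and any extra steps):

    nodetool drain              # flush memtables; the node stops accepting writes
    # stop Cassandra, install the new version, merge config changes, restart
    nodetool upgradesstables    # rewrite data files in the new sstable format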



Re: running cassandra on 8 GB servers

2013-04-14 Thread aaron morton
> ERROR [Thrift:641] 2013-04-11 11:25:19,563 CassandraDaemon.java (line 164) 
> Exception in thread Thread[Thrift:641,5,main]
> java.lang.OutOfMemoryError: Java heap space
It's easier for people to help if you provide the error stack. Does this happen 
at startup or after it has been running for a while?

What are the full JVM startup params? 
How many CF's do you have and how many rows per node? 
Are you using the key cache and what is it set to?
Double check you are using the serialising row cache provider (in the yaml 
file). 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 12/04/2013, at 8:53 AM, Nikolay Mihaylov  wrote:

> I am using 1.2.3, used default heap - 2 GB without JNA installed, 
> then modified heap to 4 GB / 400 MB young generation. + JNA installed.
> bloom filter on the CF's is lowered (more false positives, less disk space).
> 
>  WARN [ScheduledTasks:1] 2013-04-11 11:09:41,899 GCInspector.java (line 142) 
> Heap is 0.9885574036095974 full.  You may need to reduce memtable and/or 
> cache sizes.  Cassandra will now flush up to the two largest memtables to 
> free up memory.  Adjust flush_largest_memtables_at threshold in 
> cassandra.yaml if you don't want Cassandra to do this automatically
>  WARN [ScheduledTasks:1] 2013-04-11 11:09:41,906 StorageService.java (line 
> 3541) Flushing CFS(Keyspace='CRAWLER', ColumnFamily='counters') to relieve 
> memory pressure
>  INFO [ScheduledTasks:1] 2013-04-11 11:09:41,949 ColumnFamilyStore.java (line 
> 637) Enqueuing flush of Memtable-counters@862481781(711504/6211531 
> serialized/live bytes, 11810 ops)
> ERROR [Thrift:641] 2013-04-11 11:25:19,563 CassandraDaemon.java (line 164) 
> Exception in thread Thread[Thrift:641,5,main]
> java.lang.OutOfMemoryError: Java heap space
> 
> 
> On Thu, Apr 11, 2013 at 11:26 PM, aaron morton  
> wrote:
> > The data will be huge, I am estimating 4-6 TB per server. I know this is 
> > not ideal, but those are my resources.
> You will have a very unhappy time.
> 
> The general rule of thumb / guideline for a HDD based system with 1G 
> networking is 300GB to 500GB per node. See previous discussions on this topic 
> for reasons.
> 
> > ERROR [Thrift:641] 2013-04-11 11:25:19,563 CassandraDaemon.java (line 164) 
> > Exception in thread Thread[Thrift:641,5,main]
> > ...
> >  INFO [StorageServiceShutdownHook] 2013-04-11 11:25:39,915 
> > ThriftServer.java (line 116) Stop listening to thrift clients
> What was the error ?
> 
> What version are you using?
> If you have changed any defaults for memory in cassandra-env.sh or 
> cassandra.yaml revert them. Generally C* will do the right thing and not OOM, 
> unless you are trying to store a lot of data on a node that does not have 
> enough memory. See this thread for background 
> http://www.mail-archive.com/user@cassandra.apache.org/msg25762.html
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 12/04/2013, at 7:35 AM, Nikolay Mihaylov  wrote:
> 
> > For one project I will need to run cassandra on following dedicated servers:
> >
> > Single CPU XEON 4 cores no hyper-threading, 8 GB RAM, 12 TB locally 
> > attached HDD's in some kind of RAID, visible as single HDD.
> >
> > I can do cluster of 20-30 such servers, may be even more.
> >
> > > The data will be huge, I am estimating 4-6 TB per server. I know this is 
> > > not ideal, but those are my resources.
> >
> > Currently I am testing with one of such servers, except HDD is 300 GB. 
> > Every 15-20 hours, I get out of heap memory, e.g. something like:
> >
> > ERROR [Thrift:641] 2013-04-11 11:25:19,563 CassandraDaemon.java (line 164) 
> > Exception in thread Thread[Thrift:641,5,main]
> > ...
> >  INFO [StorageServiceShutdownHook] 2013-04-11 11:25:39,915 
> > ThriftServer.java (line 116) Stop listening to thrift clients
> >  INFO [StorageServiceShutdownHook] 2013-04-11 11:25:39,943 Gossiper.java 
> > (line 1077) Announcing shutdown
> >  INFO [StorageServiceShutdownHook] 2013-04-11 11:26:08,613 
> > MessagingService.java (line 682) Waiting for messaging service to quiesce
> >  INFO [ACCEPT-/208.94.232.37] 2013-04-11 11:26:08,655 MessagingService.java 
> > (line 888) MessagingService shutting down server thread.
> > ERROR [Thrift:721] 2013-04-11 11:26:37,709 CustomTThreadPoolServer.java 
> > (line 217) Error occurred during processing of message.
> > java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has 
> > shut down
> >
> > > Anyone have some advice about better utilization of such servers?
> >
> > Nick.
> 
> 
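For reference, the heap settings Nikolay describes live in conf/cassandra-env.sh,
and the serialising row cache Aaron refers to is selected in cassandra.yaml; a
sketch (the values are Nikolay's, not recommendations):

    # conf/cassandra-env.sh
    MAX_HEAP_SIZE="4G"
    HEAP_NEWSIZE="400M"

    # conf/cassandra.yaml (1.1/1.2-era option)
    # row_cache_provider: SerializingCacheProvider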



Re: running cassandra on 8 GB servers

2013-04-14 Thread aaron morton
> Hmmm, what is the recommendation for a 10G network if 1G was 300GB to
> 500GB… I am guessing I can't do 10 times that, correct?  But maybe I could
> squeeze out 600GB to 1TB?
Best thing to do would be to run a test on how long it takes to repair or 
bootstrap a node. The 300GB to 500GB was just a guideline.

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 13/04/2013, at 12:02 AM, "Hiller, Dean"  wrote:

> Hmmm, what is the recommendation for a 10G network if 1G was 300GB to
> 500GB… I am guessing I can't do 10 times that, correct?  But maybe I could
> squeeze out 600GB to 1TB?
> 
> Thanks,
> Dean
> 
> On 4/11/13 2:26 PM, "aaron morton"  wrote:
> 
>>> The data will be huge, I am estimating 4-6 TB per server. I know this
>>> is not ideal, but those are my resources.
>> You will have a very unhappy time.
>> 
>> The general rule of thumb / guideline for a HDD based system with 1G
>> networking is 300GB to 500GB per node. See previous discussions on this
>> topic for reasons.
>> 
>>> ERROR [Thrift:641] 2013-04-11 11:25:19,563 CassandraDaemon.java (line
>>> 164) Exception in thread Thread[Thrift:641,5,main]
>>> ...
>>> INFO [StorageServiceShutdownHook] 2013-04-11 11:25:39,915
>>> ThriftServer.java (line 116) Stop listening to thrift clients
>> What was the error ?
>> 
>> What version are you using?
>> If you have changed any defaults for memory in cassandra-env.sh or
>> cassandra.yaml revert them. Generally C* will do the right thing and not
>> OOM, unless you are trying to store a lot of data on a node that does not
>> have enough memory. See this thread for background
>> http://www.mail-archive.com/user@cassandra.apache.org/msg25762.html
>> 
>> Cheers
>> 
>> -
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 12/04/2013, at 7:35 AM, Nikolay Mihaylov  wrote:
>> 
>>> For one project I will need to run cassandra on following dedicated
>>> servers:
>>> 
>>> Single CPU XEON 4 cores no hyper-threading, 8 GB RAM, 12 TB locally
>>> attached HDD's in some kind of RAID, visible as single HDD.
>>> 
>>> I can do cluster of 20-30 such servers, may be even more.
>>> 
>>> The data will be huge, I am estimating 4-6 TB per server. I know this
>>> is not ideal, but those are my resources.
>>> 
>>> Currently I am testing with one of such servers, except HDD is 300 GB.
>>> Every 15-20 hours, I get out of heap memory, e.g. something like:
>>> 
>>> ERROR [Thrift:641] 2013-04-11 11:25:19,563 CassandraDaemon.java (line
>>> 164) Exception in thread Thread[Thrift:641,5,main]
>>> ...
>>> INFO [StorageServiceShutdownHook] 2013-04-11 11:25:39,915
>>> ThriftServer.java (line 116) Stop listening to thrift clients
>>> INFO [StorageServiceShutdownHook] 2013-04-11 11:25:39,943
>>> Gossiper.java (line 1077) Announcing shutdown
>>> INFO [StorageServiceShutdownHook] 2013-04-11 11:26:08,613
>>> MessagingService.java (line 682) Waiting for messaging service to quiesce
>>> INFO [ACCEPT-/208.94.232.37] 2013-04-11 11:26:08,655
>>> MessagingService.java (line 888) MessagingService shutting down server
>>> thread.
>>> ERROR [Thrift:721] 2013-04-11 11:26:37,709 CustomTThreadPoolServer.java
>>> (line 217) Error occurred during processing of message.
>>> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has
>>> shut down
>>> 
>>> Anyone have some advice about better utilization of such servers?
>>> 
>>> Nick.
>> 
> 
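A rough way to run the test Aaron suggests, one node at a time (flags as in
1.2-era nodetool; the keyspace name is illustrative):

    time nodetool repair -pr my_keyspace   # repair only this node's primary range
    nodetool netstats                      # watch streaming progress while it runs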



Re: Repair hangs on 1.1.4

2013-04-14 Thread aaron morton
The errors from Hints are not related to repair. Increasing the rpc_timeout 
may help with those. If it's logging about 0 hints you may be seeing this: 
https://issues.apache.org/jira/browse/CASSANDRA-5068

How did the repair hang? Check for progress with nodetool compactionstats and 
nodetool netstats. 

Cheers
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 13/04/2013, at 3:01 AM, Alexis Rodríguez  wrote:

> Adeel,
> 
> It may be a problem in the remote node, could you check the system.log?
> 
> Also you might want to check the rpc_timeout_in_ms in both nodes, maybe an 
> increase in this parameter helps.
> 
> 
> 
> 
> 
> On Fri, Apr 12, 2013 at 9:17 AM,  wrote:
> Hi,
> 
> I have started a repair on a newly added node with -pr, and this node is in 
> another data center. I have a 5MB internet connection and configured 
> setstreamthroughput 1. After some time the repair hangs and the following 
> message is found in the logs:
> 
> # /opt/apache-cassandra-1.1.4/bin/nodetool -h localhost ring
> Address      DC   Rack  Status  State   Load      Effective-Ownership  Token
>                                                                        169417178424467235000914166253263322299
> 10.0.0.3     DC1  RAC1  Up      Normal  93.26 GB  66.67%               0
> 10.0.0.4     DC1  RAC1  Up      Normal  89.1 GB   66.67%               56713727820156410577229101238628035242
> 10.0.0.15    DC1  RAC1  Up      Normal  72.87 GB  66.67%               113427455640312821154458202477256070484
> 10.40.1.103  DC2  RAC1  Up      Normal  48.59 GB  100.00%              169417178424467235000914166253263322299
> 
> 
>  INFO [HintedHandoff:1] 2013-04-12 17:05:49,411 HintedHandOffManager.java 
> (line 372) Timed out replaying hints to /10.40.1.103; aborting further 
> deliveries
>  INFO [HintedHandoff:1] 2013-04-12 17:05:49,411 HintedHandOffManager.java 
> (line 390) Finished hinted handoff of 0 rows to endpoint /10.40.1.103
> 
> Why are we getting this message, and how do I prevent the repair from hitting this error?
> 
> Regards,
> 
> Adeel Akbar
> 
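For reference, the timeout Alexis mentions is this cassandra.yaml setting; the
value below is illustrative (the default at the time was 10000):

    rpc_timeout_in_ms: 20000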



Re: unexplained hinted handoff

2013-04-14 Thread aaron morton
>> Do slow reads trigger hint storage?
No. 
But dropped read messages are often an indicator that the node is overwhelmed.

>>  If hints are being stored, doesn't that imply DOWN nodes, and why don't I 
>> see that in the logs?
Hints are stored for two reasons. First if the node is down when the write 
request starts, second if the node does not reply to the coordinator before 
rpc_timeout. If you are not seeing dropped write messages it may indicate 
network issues between the nodes. 

>> I'm seeing hinted handoff kick in on all our nodes during periods of
>> high activity,
Are you seeing log messages about hints being sent to nodes?

Cheers

  
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 13/04/2013, at 8:23 AM, Dane Miller  wrote:

> On Fri, Apr 12, 2013 at 1:12 PM, Dane Miller  wrote:
>> I'm seeing hinted handoff kick in on all our nodes during periods of
>> high activity, but all the nodes seem to be up (according to the logs
>> and nodetool status).  The pattern in the logs is something like this:
>> 
>> 18:10:45 194 READ messages dropped in last 5000ms
>> 18:11:10 Started hinted handoff for host:
>> 7668c813-41a9-4d42-b362-5420528fefa0 with IP: /10
>> 18:11:11 Finished hinted handoff of 13 rows to endpoint /10
>> 
>> This happens on all the nodes every 10 min, and with a different
>> endpoint each time.  tpstats shows thousands of dropped reads, but no
>> other types of messages are dropped.
>> 
>> Do slow reads trigger hint storage?  If hints are being stored,
>> doesn't that imply DOWN nodes, and why don't I see that in the logs?
> 
> Sorry, meant to add: Cassandra 1.2.3, Ubuntu 12.04 x64
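A quick way to see which message types a node is dropping (and so whether
writes are timing out as well as reads), as Dane did:

    nodetool tpstats   # the dropped-message section lists counts per type (READ, MUTATION, ...)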



Re: CQL3 And ReversedTypes Question

2013-04-14 Thread aaron morton
> Bad Request: Type error: 
> org.apache.cassandra.cql3.statements.Selection$SimpleSelector@1e7318 cannot 
> be passed as argument 0 of function dateof of type timeuuid
> 
> Is there something I am missing here or should I open a new ticket?
Yes please. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 13/04/2013, at 4:40 PM, Gareth Collins  wrote:

> OK, trying out 1.2.4. The previous issue seems to be fine, but I am 
> experiencing a new one:
> 
> cqlsh:location> create table test_y (message_id timeuuid, name text, PRIMARY 
> KEY (name,message_id));
> cqlsh:location> insert into test_y (message_id,name) VALUES (now(),'foo');
> cqlsh:location> insert into test_y (message_id,name) VALUES (now(),'foo');
> cqlsh:location> insert into test_y (message_id,name) VALUES (now(),'foo');
> cqlsh:location> insert into test_y (message_id,name) VALUES (now(),'foo');
> cqlsh:location> select dateOf(message_id) from test_y;
> 
>  dateOf(message_id)
> --
>  2013-04-13 00:33:42-0400
>  2013-04-13 00:33:43-0400
>  2013-04-13 00:33:43-0400
>  2013-04-13 00:33:44-0400
> 
> cqlsh:location> create table test_x (message_id timeuuid, name text, PRIMARY 
> KEY (name,message_id)) WITH CLUSTERING ORDER BY (message_id DESC);
> cqlsh:location> insert into test_x (message_id,name) VALUES (now(),'foo');
> cqlsh:location> insert into test_x (message_id,name) VALUES (now(),'foo');
> cqlsh:location> insert into test_x (message_id,name) VALUES (now(),'foo');
> cqlsh:location> insert into test_x (message_id,name) VALUES (now(),'foo');
> cqlsh:location> insert into test_x (message_id,name) VALUES (now(),'foo');
> cqlsh:location> select dateOf(message_id) from test_x;
> Bad Request: Type error: 
> org.apache.cassandra.cql3.statements.Selection$SimpleSelector@1e7318 cannot 
> be passed as argument 0 of function dateof of type timeuuid
> 
> Is there something I am missing here or should I open a new ticket?
> 
> thanks in advance,
> Gareth
> 
> 
> On Tue, Mar 26, 2013 at 3:30 PM, Gareth Collins  
> wrote:
> Added:
> 
> https://issues.apache.org/jira/browse/CASSANDRA-5386
> 
> Thanks very much for the quick answer!
> 
> regards,
> Gareth
> 
> On Tue, Mar 26, 2013 at 3:55 AM, Sylvain Lebresne  
> wrote:
> > You aren't missing anything obvious. That's a bug really. Would you mind
> > opening a ticket on https://issues.apache.org/jira/browse/CASSANDRA?
> >
> > --
> > Sylvain
> >
> >
> > On Tue, Mar 26, 2013 at 2:48 AM, Gareth Collins 
> > wrote:
> >>
> >> Hi,
> >>
> >> I created a table with the following structure in cqlsh (Cassandra
> >> 1.2.3 - cql 3):
> >>
> >> CREATE TABLE mytable ( column1 text,
> >>   column2 text,
> >>   messageId timeuuid,
> >>   message blob,
> >>   PRIMARY KEY ((column1, column2), messageId));
> >>
> >> I can quite happily add values to this table. e.g:
> >>
> >> insert into mytable (column1,column2,messageId,message) VALUES
> >> ('string1','string2',now(),'ABCCDCC123');
> >>
> >> Yet if I decide I want to set the clustering order on messageId DESC:
> >>
> >> CREATE TABLE mytable2 ( column1 text,
> >>   column2 text,
> >>   messageId timeuuid,
> >>   message blob,
> >>   PRIMARY KEY ((column1, column2), messageId)) WITH CLUSTERING
> >> ORDER BY (messageId DESC);
> >>
> >> and try to do an insert:
> >>
> >> insert into mytable2 (column1,column2,messageId,message) VALUES
> >> ('string1','string2',now(),'ABCCDCC123');
> >>
> >> I get the following error:
> >>
> >> Bad Request: Type error: cannot assign result of function now (type
> >> timeuuid) to messageid (type
> >>
> >> 'org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.TimeUUIDType)')
> >>
> >> I am sure I am missing something obvious here, but I don't understand.
> >> Why am I getting an error? What do I need
> >> to do to be able to add an entry to this table?
> >>
> >> thanks in advance,
> >> Gareth
> >
> >
> 



Re: Any experience of 20 node mini-itx cassandra cluster

2013-04-14 Thread aaron morton
That's better. 

The SSD size is a bit small, and be warned that you will want to leave 50GB to 
100GB free to allow room for compaction (using the default size-tiered strategy). 

On the RAM side you will want to run about 4GB (assuming cass 1.2) for the JVM; 
the rest can be off-heap Cassandra structures. This may not leave much free 
space for the OS page cache, but SSD may help there.

Cheers
  
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 13/04/2013, at 4:47 PM, Jabbar Azam  wrote:

> What about using a quad core Athlon X4 740 3.2 GHz with 8GB of RAM and 256GB 
> SSDs?
> 
> I know it will depend on our workload, but it will be better than a dual core 
> CPU, I think.
> 
> Jabbar Azam
> 
> On 13 Apr 2013 01:05, "Edward Capriolo"  wrote:
> Dual core is not the greatest; you might run into GC issues before you run out 
> of IO from your SSD devices. Also cassandra has other concurrency settings that 
> are tuned roughly around the number of processors/cores. It is not uncommon 
> to see 4-6 cores of CPU (600% in top) dealing with young-gen garbage, managing 
> lots of sockets, whatever.
> 
> 
> On Fri, Apr 12, 2013 at 12:02 PM, Jabbar Azam  wrote:
> That's my guess. My colleague is still looking at CPU's so I'm hoping he can 
> get quad core CPU's for the servers.
> 
> Thanks
> 
> Jabbar Azam
> 
> 
> On 12 April 2013 16:48, Colin Blower  wrote:
> If you have not seen it already, checkout the Netflix blog post on their 
> performance testing of AWS SSD instances.
> 
> http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
> 
> My guess, based on very little experience, is that you will be CPU bound.
> 
> 
> On 04/12/2013 03:05 AM, Jabbar Azam wrote:
>> Hello,
>> 
>> I'm going to be building a 20 node cassandra cluster in one datacentre. The 
>> spec of the servers will roughly be dual core Celeron CPU, 256 GB SSD, 16GB 
>> RAM and two nics.
>> 
>> 
>> Has anybody done any performance testing with this setup or have any 
>> gotcha's I should be aware of wrt to the hardware?
>> 
>>  I do realise the CPU has fairly low computational power, but I'm going to 
>> assume the system is going to be IO bound, hence the RAM and SSDs.
>> 
>> 
>> Thanks
>> 
>> Jabbar Azam
> 
> 
> -- 
> Colin Blower
> Software Engineer
> Barracuda Networks Inc.
> +1 408-342-5576 (o)
> 
> 



Re: Extracting data from SSTable files with MapReduce

2013-04-14 Thread aaron morton
> The SSTable files are in the -f- format from 0.8.10.
If you can upgrade to the latest version it will make things easier. 
Start a node and use nodetool upgradesstables. 

The org.apache.cassandra.tools.SSTableExport class provides a blueprint for 
reading rows from disk.

hope that helps. 

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 13/04/2013, at 7:58 PM, Jasper K.  wrote:

> Hi,
> 
> Does anyone have any experience with running a MapReduce directly against a 
> CF's SSTable files?
> 
> I have a use case where this seems to be an option. I want to export all data 
> from a CF to a flat file format for statistical analysis.
> 
> Some factors that make it (more) doable in my case:
> -The Cassandra instance is not 'on-line' (no writes- no reads)
> -The .db files were exported from another instance. I got them all in one 
> place now
> 
> The SSTable files are in the -f- format from 0.8.10.
> 
> Looking at this : http://wiki.apache.org/cassandra/ArchitectureSSTable it 
> should be possible to write a Hadoop RecordReader for Cassandra rowkeys.
> 
> But maybe I am not fully aware of what I am up to.
> 
> -- 
> 
> Jasper 
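Before writing a custom RecordReader, the bundled export tool (a thin wrapper
around SSTableExport) may be enough for a one-off dump, once the files are
upgraded as Aaron suggests (the path below is illustrative):

    bin/sstable2json /path/to/Keyspace/CF-hd-1-Data.db > cf.json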



Re: Any experience of 20 node mini-itx cassandra cluster

2013-04-14 Thread Jabbar Azam
Thanks Aaron.

Thanks

Jabbar Azam


On 14 April 2013 19:39, aaron morton  wrote:

> That's better.
>
> The SSD size is a bit small, and be warned that you will want to leave
> 50Gb to 100GB free to allow room for compaction (using the default size
> tiered).
>
> On the ram side you will want to run about 4GB (assuming cass 1.2) for the
> JVM the rest can be off heap Cassandra structures. This may not leave too
> much free space for the os page cache, but SSD may help there.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 13/04/2013, at 4:47 PM, Jabbar Azam  wrote:
>
> What about using a quad core Athlon X4 740 3.2 GHz with 8GB of RAM and 256GB
> SSDs?
>
> I know it will depend on our workload, but it will be better than a dual core
> CPU, I think.
>
> Jabbar Azam
> On 13 Apr 2013 01:05, "Edward Capriolo"  wrote:
>
>> Dual core is not the greatest; you might run into GC issues before you run
>> out of IO from your SSD devices. Also cassandra has other concurrency
>> settings that are tuned roughly around the number of processors/cores. It
>> is not uncommon to see 4-6 cores of CPU (600% in top) dealing with young-gen
>> garbage, managing lots of sockets, whatever.
>>
>>
>> On Fri, Apr 12, 2013 at 12:02 PM, Jabbar Azam  wrote:
>>
>>> That's my guess. My colleague is still looking at CPU's so I'm hoping he
>>> can get quad core CPU's for the servers.
>>>
>>> Thanks
>>>
>>> Jabbar Azam
>>>
>>>
>>> On 12 April 2013 16:48, Colin Blower  wrote:
>>>
  If you have not seen it already, checkout the Netflix blog post on
 their performance testing of AWS SSD instances.


 http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html

 My guess, based on very little experience, is that you will be CPU
 bound.


 On 04/12/2013 03:05 AM, Jabbar Azam wrote:

   Hello,

  I'm going to be building a 20 node cassandra cluster in one
 datacentre. The spec of the servers will roughly be dual core Celeron CPU,
 256 GB SSD, 16GB RAM and two nics.


  Has anybody done any performance testing with this setup or have any
 gotcha's I should be aware of wrt to the hardware?

  I do realise the CPU has fairly low computational power, but I'm going
 to assume the system is going to be IO bound, hence the RAM and SSDs.


  Thanks

 Jabbar Azam



 --
  Colin Blower
 Software Engineer
 Barracuda Networks Inc.
 +1 408-342-5576 (o)

>>>
>>>
>>
>


Re: Problems with shuffle

2013-04-14 Thread aaron morton
> How does Cassandra with vnodes exactly decide how many vnodes to move?
The num_tokens setting in the yaml file. What did you set this to?

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 14/04/2013, at 11:56 AM, Rustam Aliyev  wrote:

> Just a followup on this issue. Due to the cost of shuffle, we decided not to 
> do it. Recently, we added a new node and ended up with a poorly balanced cluster:
> 
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load      Tokens  Owns   Host ID                               Rack
> UN  10.0.1.8   52.28 GB  260     18.3%  d28df6a6-c888-4658-9be1-f9e286368dce  rack1
> UN  10.0.1.11  55.21 GB  256     9.4%   7b0cf3c8-0c42-4443-9b0c-68f794299443  rack1
> UN  10.0.1.2   49.03 GB  259     17.9%  2d308bc3-1fd7-4fa4-b33f-cbbbdc557b2f  rack1
> UN  10.0.1.4   48.51 GB  255     18.4%  c253dcdf-3e93-495c-baf1-e4d2a033bce3  rack1
> UN  10.0.1.1   67.14 GB  253     17.9%  4f77fd70-b134-486b-9c25-cfea96b6d412  rack1
> UN  10.0.1.3   47.65 GB  253     18.0%  4d03690d-5363-42c1-85c2-5084596e09fc  rack1
> 
> It looks like the new node took an equal number of vnodes from each other 
> node - which is good. However, it's not clear why it ended up owning half as 
> much as the other nodes.
> 
> How does Cassandra with vnodes exactly decide how many vnodes to move?
> 
> Btw, during JOINING the nodetool status command does not show any information 
> about the joining node. It appears only when the join has finished (on v1.2.3).
> 
> -- Rustam
> 
> 
> On 08/04/2013 22:33, Rustam Aliyev wrote:
>> After 2 days of endless compactions and streaming I had to stop this and 
>> cancel shuffle. One of the nodes even complained that there's no free disk 
>> space (grew from 30GB to 400GB). After all these problems the number of 
>> moved tokens was less than 40 (out of 1280!). 
>> 
>> Now, when nodes start they report duplicate ranges. I wonder how bad is that 
>> and how do I get rid of that? 
>> 
>>  INFO [GossipStage:1] 2013-04-09 02:16:37,920 StorageService.java (line 
>> 1386) Nodes /10.0.1.2 and /10.0.1.1 have the same token 
>> 99027485685976232531333625990885670910.  Ignoring /10.0.1.2 
>>  INFO [GossipStage:1] 2013-04-09 02:16:37,921 StorageService.java (line 
>> 1386) Nodes /10.0.1.2 and /10.0.1.4 have the same token 
>> 4319990986300976586937372945998718.  Ignoring /10.0.1.2 
>> 
>> Overall, I'm not sure how bad it is to leave data unshuffled (I read the 
>> DataStax blog post; it's not clear). When adding a new node wouldn't it be 
>> assigned ranges randomly from all nodes? 
>> 
>> Some other notes inline below: 
>> 
>> On 08/04/2013 15:00, Eric Evans wrote: 
>>> [ Rustam Aliyev ] 
 Hi, 
 
 After upgrading to the vnodes I created and enabled shuffle 
 operation as suggested. After running for a couple of hours I had to 
 disable it because nodes were not catching up with compactions. I 
 repeated this process 3 times (enable/disable). 
 
 I have 5 nodes and each of them had ~35GB. After shuffle operations 
 described above some nodes are now reaching ~170GB. In the log files 
 I can see same files transferred 2-4 times to the same host within 
 the same shuffle session. Worst of all, after all of these I had 
 only 20 vnodes transferred out of 1280. So if it will continue at 
 the same speed it will take about a month or two to complete 
 shuffle. 
>>> As Edward says, you'll need to issue a cleanup post-shuffle if you expect 
>>> to see disk usage match your expectations. 
>>> 
 I had few question to better understand shuffle: 
 
 1. Does disabling and re-enabling shuffle starts shuffle process from 
 scratch or it resumes from the last point? 
>>> It resumes. 
>>> 
 2. Will vnode reallocations speedup as shuffle proceeds or it will 
 remain the same? 
>>> The shuffle proceeds synchronously, 1 range at a time; It's not going to 
>>> speed up as it progresses. 
>>> 
 3. Why I see multiple transfers of the same file to the same host? e.g.: 
 
 INFO [Streaming to /10.0.1.8:6] 2013-04-07 14:27:10,038 
 StreamReplyVerbHandler.java (line 44) Successfully sent 
 /u01/cassandra/data/Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db 
 to /10.0.1.8 
 INFO [Streaming to /10.0.1.8:7] 2013-04-07 16:27:07,427 
 StreamReplyVerbHandler.java (line 44) Successfully sent 
 /u01/cassandra/data/Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db 
 to /10.0.1.8 
>>> I'm not sure, but perhaps that file contained data for two different 
>>> ranges? 
>> Does it mean that if I have a huge file (e.g. 20GB) which contains a lot of 
>> ranges (let's say 100), it will be transferred each time (20GB*100)? 
>>> 
 4. When I enable/disable shuffle I receive warning message such as below. Do I need to worry about it?
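As Eric notes above, disk usage only drops back once each node discards the
ranges it no longer owns:

    nodetool cleanup   # run on every node after the shuffle completes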

Re: Problems with shuffle

2013-04-14 Thread Rustam Aliyev

How does Cassandra with vnodes exactly decide how many vnodes to move?

The num_tokens setting in the yaml file. What did you set this to?

256, same as on all other nodes.



Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 14/04/2013, at 11:56 AM, Rustam Aliyev  wrote:


Just a followup on this issue. Due to the cost of shuffle, we decided not to do 
it. Recently, we added a new node and ended up with a poorly balanced cluster:

Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load      Tokens  Owns   Host ID                               Rack
UN  10.0.1.8   52.28 GB  260     18.3%  d28df6a6-c888-4658-9be1-f9e286368dce  rack1
UN  10.0.1.11  55.21 GB  256     9.4%   7b0cf3c8-0c42-4443-9b0c-68f794299443  rack1
UN  10.0.1.2   49.03 GB  259     17.9%  2d308bc3-1fd7-4fa4-b33f-cbbbdc557b2f  rack1
UN  10.0.1.4   48.51 GB  255     18.4%  c253dcdf-3e93-495c-baf1-e4d2a033bce3  rack1
UN  10.0.1.1   67.14 GB  253     17.9%  4f77fd70-b134-486b-9c25-cfea96b6d412  rack1
UN  10.0.1.3   47.65 GB  253     18.0%  4d03690d-5363-42c1-85c2-5084596e09fc  rack1

It looks like the new node took an equal number of vnodes from each other node - 
which is good. However, it's not clear why it ended up owning half as much as 
the other nodes.

How does Cassandra with vnodes exactly decide how many vnodes to move?

Btw, during JOINING the nodetool status command does not show any information 
about the joining node. It appears only when the join has finished (on v1.2.3).

-- Rustam


On 08/04/2013 22:33, Rustam Aliyev wrote:

After 2 days of endless compactions and streaming I had to stop this and cancel 
shuffle. One of the nodes even complained that there's no free disk space (grew 
from 30GB to 400GB). After all these problems the number of moved tokens was 
less than 40 (out of 1280!).

Now, when nodes start they report duplicate ranges. I wonder how bad is that 
and how do I get rid of that?

  INFO [GossipStage:1] 2013-04-09 02:16:37,920 StorageService.java (line 1386) 
Nodes /10.0.1.2 and /10.0.1.1 have the same token 
99027485685976232531333625990885670910.  Ignoring /10.0.1.2
  INFO [GossipStage:1] 2013-04-09 02:16:37,921 StorageService.java (line 1386) 
Nodes /10.0.1.2 and /10.0.1.4 have the same token 
4319990986300976586937372945998718.  Ignoring /10.0.1.2

Overall, I'm not sure how bad it is to leave data unshuffled (I read the DataStax 
blog post; it's not clear). When adding a new node wouldn't it be assigned ranges 
randomly from all nodes?

Some other notes inline below:

On 08/04/2013 15:00, Eric Evans wrote:

[ Rustam Aliyev ]

Hi,

After upgrading to the vnodes I created and enabled shuffle
operation as suggested. After running for a couple of hours I had to
disable it because nodes were not catching up with compactions. I
repeated this process 3 times (enable/disable).

I have 5 nodes and each of them had ~35GB. After shuffle operations
described above some nodes are now reaching ~170GB. In the log files
I can see same files transferred 2-4 times to the same host within
the same shuffle session. Worst of all, after all of these I had
only 20 vnodes transferred out of 1280. So if it will continue at
the same speed it will take about a month or two to complete
shuffle.

As Edward says, you'll need to issue a cleanup post-shuffle if you expect
to see disk usage match your expectations.


I had few question to better understand shuffle:

1. Does disabling and re-enabling shuffle starts shuffle process from
 scratch or it resumes from the last point?

It resumes.


2. Will vnode reallocations speedup as shuffle proceeds or it will
 remain the same?

The shuffle proceeds synchronously, 1 range at a time; It's not going to
speed up as it progresses.


3. Why I see multiple transfers of the same file to the same host? e.g.:

 INFO [Streaming to /10.0.1.8:6] 2013-04-07 14:27:10,038
 StreamReplyVerbHandler.java (line 44) Successfully sent
 /u01/cassandra/data/Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db
 to /10.0.1.8
 INFO [Streaming to /10.0.1.8:7] 2013-04-07 16:27:07,427
 StreamReplyVerbHandler.java (line 44) Successfully sent
 /u01/cassandra/data/Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db
 to /10.0.1.8

I'm not sure, but perhaps that file contained data for two different
ranges?

Does it mean that if I have a huge file (e.g. 20GB) which contains a lot of ranges 
(let's say 100), it will be transferred each time (20GB*100)?

4. When I enable/disable shuffle I receive warning message such as
 below. Do I need to worry about it?

 cassandra-shuffle -h localhost disable
 Failed to enable shuffling on 10.0.1.1!
 Failed to enable shuffling on 10.0.1.3!

Is that the verbatim output?  Did it report failing to enable when you
tried to disable?

Yes, this is verbatim output. It reports failure 

Re: Rename failed while cassandra is starting up

2013-04-14 Thread aaron morton
> From the log messages, it looked like the table/keyspace was
> opened before the scrubDataDirectories was executed. This created a race
> condition between two threads.
Seems odd. 
AFAIK that startup is single threaded and the scrub runs before the tables are 
opened. See AbstractCassandraDaemon.setup()

> INFO [OptionalTasks:1] 2013-04-09 02:49:39,900 SecondaryIndexManager.java
> (line 184) Creating new index :
> ColumnDefinition{name=6d6f62696c6974795a6f6e654944,
> validator=org.apache.cassandra.db.marshal.UTF8Type, index_type=KEYS,
> index_name='fmzd_ap_mobilityZoneUUID'}
> ERROR [FlushWriter:1] 2013-04-09 02:49:39,916 AbstractCassandraDaemon.java
> (line 139) Fatal exception in thread Thread[FlushWriter:1,5,main]
> java.io.IOError: java.io.IOException: rename failed of
> /test/db/data/fmzd/alarm.fmzd_alarm_alarmCode-hd-21-Data.db

Looks like a secondary index is being created at startup and there is an error 
renaming the file. 
OR
The node was shut down before the index was built and it's been rebuilt at 
startup.

Both of these are async operations and cause a race with scrubDirectories(). 

Probably not the log replaying because it looks like the sstables have not been 
opened. 

I *think* the way around this is to um…. 
* move all existing data and commit log out of the way 
* start the node with the -Dcassandra.join_ring=false JVM option in 
cassandra-env.sh
* check that all indexes are built using nodetool cfstats
* shut it down
* put the commit log and data dirs back in place. 

All we want to do is get the system KS updated, but in 1.0 that's a serialised 
object and not easy to poke. 

Hope that helps. 
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 14/04/2013, at 3:50 PM, Boris Yen  wrote:

> Hi All,
> 
> Recently, we encountered an error on 1.0.12 that prevented cassandra from
> starting up. From the log messages, it looked like the table/keyspace was
> opened before the scrubDataDirectories was executed. This created a race
> condition between two threads. One was trying to rename files while the
> other was trying to remove tmp files. I was wondering if anyone could
> provide us some information or workaround for this.
> 
> INFO [MemoryMeter:1] 2013-04-09 02:49:39,868 Memtable.java (line 186)
> CFS(Keyspace='fmzd', ColumnFamily='alarm.fmzd_alarm_category') liveRatio is
> 3.7553409423470883 (just-counted was 3.1413828689370487).  calculation took
> 2ms for 265 columns
> INFO [SSTableBatchOpen:1] 2013-04-09 02:49:39,868 SSTableReader.java (line
> 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshRole-hd-2 (83 bytes)
> INFO [SSTableBatchOpen:2] 2013-04-09 02:49:39,868 SSTableReader.java (line
> 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshRole-hd-1 (123 bytes)
> INFO [Creating index: alarm.fmzd_alarm_category] 2013-04-09 02:49:39,874
> ColumnFamilyStore.java (line 705) Enqueuing flush of
> Memtable-alarm.fmzd_alarm_category@413535513(14025/65835 serialized/live
> bytes, 275 ops)
> INFO [OptionalTasks:1] 2013-04-09 02:49:39,877 SecondaryIndexManager.java
> (line 184) Creating new index : ColumnDefinition{name=6d65736853534944,
> validator=org.apache.cassandra.db.marshal.UTF8Type, index_type=KEYS,
> index_name='fmzd_ap_meshSSID'}
> INFO [SSTableBatchOpen:1] 2013-04-09 02:49:39,895 SSTableReader.java (line
> 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshSSID-hd-1 (122 bytes)
> INFO [SSTableBatchOpen:2] 2013-04-09 02:49:39,896 SSTableReader.java (line
> 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshSSID-hd-2 (82 bytes)
> INFO [OptionalTasks:1] 2013-04-09 02:49:39,900 SecondaryIndexManager.java
> (line 184) Creating new index :
> ColumnDefinition{name=6d6f62696c6974795a6f6e654944,
> validator=org.apache.cassandra.db.marshal.UTF8Type, index_type=KEYS,
> index_name='fmzd_ap_mobilityZoneUUID'}
> ERROR [FlushWriter:1] 2013-04-09 02:49:39,916 AbstractCassandraDaemon.java
> (line 139) Fatal exception in thread Thread[FlushWriter:1,5,main]
> java.io.IOError: java.io.IOException: rename failed of
> /test/db/data/fmzd/alarm.fmzd_alarm_alarmCode-hd-21-Data.db
> at
> org.apache.cassandra.io.sstable.SSTableWriter.rename(SSTableWriter.java:375)
> at
> org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:319)
> at
> org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:302)
> at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:276)
> at org.apache.cassandra.db.Memtable.access$400(Memtable.java:49)
> at org.apache.cassandra.db.Memtable$4.runMayThrow(Memtable.java:299)
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> Caused by: java.io.IOException: rename failed of
> /test/db/data/fmzd/alarm.fmzd_alarm_alarmCode-hd-21-Data.db
> at
> org.

1.1.9 to 1.2.3 upgrade issue

2013-04-14 Thread John Watson
Started doing a rolling upgrade of nodes from 1.1.9 to 1.2.3 and nodes on
1.1.9 started flooding this error:

Exception in thread Thread[RequestResponseStage:19496,5,main]
java.io.IOError: java.io.EOFException
at
org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:71)
at
org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:155)
at
org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:45)
at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at
org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:100)
at
org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:81)
at
org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:64)
... 6 more

As I understand it, the Hints CF changed from 1.1.x to 1.2.x, so I assume that's
the cause of the 1.2.3 nodes flooding this (for various IPs still on 1.1.9):

Unable to store hint for host with missing ID, /10.37.62.71 (old node?)

Is this a known issue? Or is a rolling upgrade from 1.1.x to 1.2.x not possible?

Thanks,

John


re-execution of failed queries with rpc_timeout

2013-04-14 Thread Moty Kosharovsky
Hello,

I'm running a 12 node cluster with cassandra 1.1.5 and oracle jdk 1.6.0_35.
Our application constantly writes large updates with cql. Once in a while,
an rpc_timeout will occur.

Since a lot of the information is counters, it's impossible for me to
understand if the updates complete partially on rpc_timeout, or cassandra
somehow rolls back the change completely, and hence I can't tell if I
should re-execute the query on rpc_timeout (with double processing being a
bigger concern than missing updates).

I am thinking, but unsure of this, that if I switch to LOCAL_QUORUM,
rpc_timeout will always mean that the update was not processed as a whole.
In all other cases, the rpc_timeout might be thrown from a remote node (not
the one I'm connected to), and hence some parts of the update will be
performed and others parts will not.

Anyone solved this issue before?

Kind Regards,
Kosha


Re: re-execution of failed queries with rpc_timeout

2013-04-14 Thread Moty Kosharovsky
Sorry, not LOCAL_QUORUM, I meant the "ANY" consistency level.


On Mon, Apr 15, 2013 at 4:12 AM, Moty Kosharovsky wrote:

> Hello,
>
> I'm running a 12 node cluster with cassandra 1.1.5 and oracle jdk 1.6.0_35.
> Our application constantly writes large updates with cql. Once in a while,
> an rpc_timeout will occur.
>
> Since a lot of the information is counters, it's impossible for me to
> understand if the updates complete partially on rpc_timeout, or cassandra
> somehow rolls back the change completely, and hence I can't tell if I
> should re-execute the query on rpc_timeout (with double processing being a
> bigger concern than missing updates).
>
> I am thinking, but unsure of this, that if I switch to LOCAL_QUORUM,
> rpc_timeout will always mean that the update was not processed as a whole.
> In all other cases, the rpc_timeout might be thrown from a remote node (not
> the one I'm connected to), and hence some parts of the update will be
> performed and others parts will not.
>
> Anyone solved this issue before?
>
> Kind Regards,
> Kosha
>


Re: Rename failed while cassandra is starting up

2013-04-14 Thread Boris Yen
Hi Aaron,

"startup is single threaded and the scrub runs before the tables are opened
".

This is what I was thinking too. However, after using the debugger to trace
the code, I realized that MeteredFlusher (see the "countFlushBytes" method)
might open the sstables before the scrub is completed. I suppose this is
the cause of the exceptions I saw.

My plan is to add a boolean flag named "scrubCompleted" at
AbstractCassandraDaemon or StorageService. By default it is false; after the
scrub is completed, AbstractCassandraDaemon sets it to true. MeteredFlusher
then makes sure the scrub is completed by checking this boolean value before
it starts to do all the calculation.

Is this a good plan? or it might have side effects?

Thanks and Regards,
Boris


On Mon, Apr 15, 2013 at 4:26 AM, aaron morton wrote:

> From the log messages, it looked like the table/keyspace was
> opened before the scrubDataDirectories was executed. This created a race
> condition between two threads.
>
> Seems odd.
> AFAIK that startup is single threaded and the scrub runs before the tables
> are opened. See AbstractCassandraDaemon.setup()
>
> INFO [OptionalTasks:1] 2013-04-09 02:49:39,900 SecondaryIndexManager.java
>
> (line 184) Creating new index :
> ColumnDefinition{name=6d6f62696c6974795a6f6e654944,
> validator=org.apache.cassandra.db.marshal.UTF8Type, index_type=KEYS,
> index_name='fmzd_ap_mobilityZoneUUID'}
> ERROR [FlushWriter:1] 2013-04-09 02:49:39,916 AbstractCassandraDaemon.java
> (line 139) Fatal exception in thread Thread[FlushWriter:1,5,main]
> java.io.IOError: java.io.IOException: rename failed of
> /test/db/data/fmzd/alarm.fmzd_alarm_alarmCode-hd-21-Data.db
>
>
> Looks like a secondary index is being created at startup and there is an
> error renaming the file.
> OR
> The node was shut down before the index was built and it's been rebuilt at
> startup.
>
> Both of these are async operations and cause a race with
> scrubDirectories().
>
> Probably not the log replaying because it looks like the sstables have not
> been opened.
>
> I *think* the way around this is to um….
> * move all existing data and commit log out of the way
> * start the node with the -Dcassandra.join_ring=false JVM option in
> cassandra-env.sh
> * check that all indexes are built using nodetool cfstats
> * shut it down
> * put the commit log and data dirs back in place.
>
> All we want to do is get the system KS updated, but in 1.0 that's a
> serialised object and not easy to poke.
>
> Hope that helps.
>
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 14/04/2013, at 3:50 PM, Boris Yen  wrote:
>
> Hi All,
>
> Recently, we encountered an error on 1.0.12 that prevented cassandra from
> starting up. From the log messages, it looked like the table/keyspace was
> opened before the scrubDataDirectories was executed. This created a race
> condition between two threads. One was trying to rename files while the
> other was trying to remove tmp files. I was wondering if anyone could
> provide us some information or workaround for this.
>
> INFO [MemoryMeter:1] 2013-04-09 02:49:39,868 Memtable.java (line 186)
> CFS(Keyspace='fmzd', ColumnFamily='alarm.fmzd_alarm_category') liveRatio is
> 3.7553409423470883 (just-counted was 3.1413828689370487).  calculation took
> 2ms for 265 columns
> INFO [SSTableBatchOpen:1] 2013-04-09 02:49:39,868 SSTableReader.java (line
> 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshRole-hd-2 (83 bytes)
> INFO [SSTableBatchOpen:2] 2013-04-09 02:49:39,868 SSTableReader.java (line
> 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshRole-hd-1 (123 bytes)
> INFO [Creating index: alarm.fmzd_alarm_category] 2013-04-09 02:49:39,874
> ColumnFamilyStore.java (line 705) Enqueuing flush of
> Memtable-alarm.fmzd_alarm_category@413535513(14025/65835 serialized/live
> bytes, 275 ops)
> INFO [OptionalTasks:1] 2013-04-09 02:49:39,877 SecondaryIndexManager.java
> (line 184) Creating new index : ColumnDefinition{name=6d65736853534944,
> validator=org.apache.cassandra.db.marshal.UTF8Type, index_type=KEYS,
> index_name='fmzd_ap_meshSSID'}
> INFO [SSTableBatchOpen:1] 2013-04-09 02:49:39,895 SSTableReader.java (line
> 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshSSID-hd-1 (122 bytes)
> INFO [SSTableBatchOpen:2] 2013-04-09 02:49:39,896 SSTableReader.java (line
> 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshSSID-hd-2 (82 bytes)
> INFO [OptionalTasks:1] 2013-04-09 02:49:39,900 SecondaryIndexManager.java
> (line 184) Creating new index :
> ColumnDefinition{name=6d6f62696c6974795a6f6e654944,
> validator=org.apache.cassandra.db.marshal.UTF8Type, index_type=KEYS,
> index_name='fmzd_ap_mobilityZoneUUID'}
> ERROR [FlushWriter:1] 2013-04-09 02:49:39,916 AbstractCassandraDaemon.java
> (line 139) Fatal exception in thread Thread[FlushWriter:1,5,main]
> java.io.IOError: java.io.IOException: rename failed of
> /test/db/data/fmzd/alarm.fmzd_alarm_alarmCode-hd-21-Data.db
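A rough sketch of the guard Boris proposes (field name and check placement are
illustrative, not an actual patch):

    // in AbstractCassandraDaemon: flipped once scrubDataDirectories() finishes
    public static volatile boolean scrubCompleted = false;

    // in MeteredFlusher.run(): skip the flush-size calculation until then
    if (!AbstractCassandraDaemon.scrubCompleted)
        return;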

AUTO : Samuel CARRIERE is out of the office (back 22/04/2013)

2013-04-14 Thread Samuel CARRIERE


I am out of the office until 22/04/2013.




Note: this is an automatic reply to your message "re-execution
of failed queries with rpc_timeout" sent on 15/04/2013 3:12:45.

This is the only notification you will receive while this person is away.

Added extra column as composite key while creating counter column family

2013-04-14 Thread Kuldeep Mishra
Hi,
   While creating a counter column family, an extra column is being added.
What should I do?
Table creation script:
 CREATE TABLE counters (
  key text,
  value counter,
  PRIMARY KEY (key)
 ) WITH COMPACT STORAGE

After describing the column family I get the following:
CREATE TABLE counters (
  key text,
  column1 text,
  value counter,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};

The extra column column1 has been added.

Please help

-- 
Thanks and Regards
Kuldeep Kumar Mishra
+919540965199