Re: Consistency level 256

2013-11-19 Thread Sylvain Lebresne
256 is clearly not a valid CL code. It's of course always possible that the
client sends something perfectly valid and the server interprets it badly
for some reason, but it's a lot more likely a priori that the driver just
sends something wrong. In any case, since as far as I know no-one has seen
that with any other driver, you'd probably want to track that down with the
gocql authors.
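
For reference: in the CQL native protocol the consistency level travels as a
2-byte big-endian short, and "Unknown code N for a consistency level" appears
to be the server's complaint when that short falls outside the known codes.
Notably, 256 is 0x0100, i.e. the byte for ONE (0x01) sitting one position too
early, which would point at a frame-encoding offset bug in the driver rather
than a bad value. A minimal sketch of the decode, assuming protocol v1 codes
(ANY=0x0000 through EACH_QUORUM=0x0007):

import java.nio.ByteBuffer;

// Sketch of the server-side decode: read the 2-byte consistency short from
// the frame and reject anything outside the known code range.
public class ConsistencyCheck {
    private static final int MAX_V1_CODE = 0x0007; // EACH_QUORUM

    static int readConsistency(ByteBuffer frame) {
        int code = frame.getShort() & 0xFFFF; // big-endian unsigned short
        if (code > MAX_V1_CODE) {
            throw new IllegalArgumentException(
                    "Unknown code " + code + " for a consistency level");
        }
        return code;
    }

    public static void main(String[] args) {
        // Bytes shifted by one position yield exactly the reported value:
        ByteBuffer shifted = ByteBuffer.wrap(new byte[] { 0x01, 0x00 });
        System.out.println(shifted.getShort() & 0xFFFF); // prints 256
    }
}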

--
Sylvain


On Tue, Nov 19, 2013 at 2:13 AM, Ben Hood <0x6e6...@gmail.com> wrote:

> I'm not sure that this is entirely causal, but the error I was getting
> occurred when the batch size I was accumulating was greater than 130K,
> so by cutting the batch size down, I made the issue go away for now.
> Having such a large batch size is probably not such a good idea, but
> I'm not really sure that it's really the cause of this issue.
>
> On Tue, Nov 19, 2013 at 12:56 AM, Ben Hood <0x6e6...@gmail.com> wrote:
> > Hi,
> >
> > Using 2.0.2 with the gocql driver, I'm getting this intermittent error:
> >
> > "Unknown code 256 for a consistency level"
> >
> > Is this something that the server could be returning, or is this maybe
> > only a client side issue?
> >
> > Cheers,
> >
> > Ben
>


Re: Cassandra 2.0.2 - Frequent Read timeouts and delays in replication on 3-node cluster in AWS VPC

2013-11-19 Thread Laing, Michael
We had a similar problem when our nodes could not sync using ntp due to VPC
ACL settings. -ml
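
(A minimal illustration of why unsynced clocks can look like broken
replication: Cassandra reconciles divergent replica values per column by
write timestamp, so writes from a node whose clock lags can be silently
shadowed by chronologically older writes. This is a toy model of that rule,
not Cassandra code.)

// Last-write-wins reconciliation, the rule Cassandra applies per column.
public class LastWriteWins {
    static String reconcile(String v1, long ts1, String v2, long ts2) {
        return ts1 >= ts2 ? v1 : v2; // the highest timestamp wins
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() * 1000L; // ~microseconds
        long lagging = now - 90_000_000L;              // a clock 90s behind
        // The chronologically later write (from the lagging node) loses:
        System.out.println(reconcile("old-value", now, "new-value", lagging));
    }
}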


On Mon, Nov 18, 2013 at 8:49 PM, Steven A Robenalt wrote:

> Hi all,
>
> I am attempting to bring up our new app on a 3-node cluster and am having
> problems with frequent read timeouts and slow inter-node replication.
> Initially, these errors were mostly occurring in our app server, affecting
> 0.02%-1.0% of our queries in an otherwise unloaded cluster. No exceptions
> were logged on the servers in this case, and reads in a single node
> environment with the same code and client driver virtually never see
> exceptions like this, so I suspect problems with the inter-cluster
> communication between nodes.
>
> The 3 nodes are deployed in a single AWS VPC, and are all in a common
> subnet. The Cassandra version is 2.0.2 following an upgrade this past
> weekend due to NPEs in a secondary index that were affecting certain
> queries under 2.0.1. The servers are m1.large instances running AWS Linux
> and Oracle JDK7u40. The first 2 nodes in the cluster are the seed nodes.
> All database contents are CQL tables with replication factor of 3, and the
> application is Java-based, using the latest Datastax 2.0.0-rc1 Java Driver.
>
> In testing with the application, I noticed this afternoon that the
> contents of the 3 nodes differed in their respective copies of the same
> table for newly written data, for time periods exceeding several minutes,
> as reported by cqlsh on each node. Specifying different hosts from the same
> server using cqlsh also exhibited timeouts on multiple attempts to connect,
> and on executing some queries, though they eventually succeeded in all
> cases, and eventually the data in all nodes was fully replicated.
>
> The AWS servers have a security group with only ports 22, 7000, 9042, and
> 9160 open.
>
> At this time, it seems that either I am still missing something in my
> cluster configuration, or maybe there are other ports that are needed for
> inter-node communication.
>
> Any advice/suggestions would be appreciated.
>
>
>
> --
> Steve Robenalt
> Software Architect
> HighWire | Stanford University
> 425 Broadway St, Redwood City, CA 94063
>
> srobe...@stanford.edu
> http://highwire.stanford.edu
>
>
>
>
>
>


CassandraStorage problem in Pig

2013-11-19 Thread Michael Spertus
I am trying to use CassandraStorage in Pig, and I am getting the error
"Invalid token information returned by describe_ring: {}" whenever I try to
output to Cassandra. What could I be doing wrong?

Thanks,

Mike


Wiki popularity

2013-11-19 Thread Jonathan Ellis
We've started counting visits to the wiki pages so we can use that
information to prioritize which pages to improve.  Here's what that
looks like, for the past ~24h:

1,431 wiki.apache.org/cassandra/GettingStarted
366   wiki.apache.org/cassandra/FAQ
284   wiki.apache.org/cassandra/Operations
238   wiki.apache.org/cassandra/FrontPage
209   wiki.apache.org/cassandra/HadoopSupport
209   wiki.apache.org/cassandra/NodeTool
206   wiki.apache.org/cassandra/DebianPackaging
168   wiki.apache.org/cassandra/CassandraCli
159   wiki.apache.org/cassandra/ArchitectureOverview
149   wiki.apache.org/cassandra/ClientOptions
135   wiki.apache.org/cassandra/DataModel
117   wiki.apache.org/cassandra/API
90    wiki.apache.org/cassandra/CassandraLimitations
85    wiki.apache.org/cassandra/SecondaryIndexes
74    wiki.apache.org/cassandra/StorageConfiguration
71    wiki.apache.org/cassandra/MemtableSSTable
66    wiki.apache.org/cassandra/Administration%20Tools
61    wiki.apache.org/cassandra/RunningCassandra

(GettingStarted is by far the most viewed, which is not surprising
since it's linked from the cassandra.a.o front page.)

If you'd like to help improve any of these, and aren't already on the
wiki contributors whitelist, please contact me.  We had to add the
whitelist to stop spam.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Managing vnodes cluster and configuring rpc_address with EC2MultiRegionSnitch

2013-11-19 Thread Alain RODRIGUEZ
Hi

We recently switched to vnodes on our production cluster. I have some
operational questions about managing a vnodes cluster:

- Can we add multiple nodes at once? (The previous rule was no more than one
server per range at the same time...)
- Can we decommission multiple nodes at the same time? (I used to do it when
they were not on the same part of the ring, taking replicas into account.)
- Is there a way to distribute heavy operations such as repair or cleanup
smartly across nodes? (With an RF of 3, I previously used to launch a repair
or cleanup on one node out of 3 in token order. That way I was sure that at
least 2 nodes out of 3 weren't performing a repair and were available to
answer queries fast enough. Is there a way to reproduce this behavior using
vnodes?)

If someone could point me to a "best practices in Cassandra management with
vnodes" article / blog / tutorial, it would be very cool.

Otherwise, just some feedback on these points would be appreciated.

Another question:

How should rpc_address be configured when using EC2MultiRegionSnitch? I mean,
should it be the public IP or the private IP? (I have the private IP as
listen_address and the public IP as broadcast_address. Seeds are public IPs.)

We currently use 0.0.0.0, which seems to be incompatible with the new CQL
drivers.

Thanks for any insight on any of these points.

Alain


Re: Consistency level 256

2013-11-19 Thread Ben Hood
Thanks for the heads up - I'll take a look at the driver.

On Tue, Nov 19, 2013 at 9:57 AM, Sylvain Lebresne  wrote:
> 256 is clearly not a valid CL code. It's of course always possible that the
> client sends something perfectly valid and the server interprets it badly
> for some reason, but it's a lot more likely a priori that the driver just
> sends something wrong. In any case, since as far as I know no-one has seen
> that with any other driver, you'd probably want to track that down with the
> gocql authors.
>
> --
> Sylvain
>
>
> On Tue, Nov 19, 2013 at 2:13 AM, Ben Hood <0x6e6...@gmail.com> wrote:
>>
>> I'm not sure that this is entirely causal, but the error I was getting
>> occurred when the batch size I was accumulating was greater than 130K,
>> so by cutting the batch size down, I made the issue go away for now.
>> Having such a large batch size is probably not such a good idea, but
>> I'm not really sure that it's really the cause of this issue.
>>
>> On Tue, Nov 19, 2013 at 12:56 AM, Ben Hood <0x6e6...@gmail.com> wrote:
>> > Hi,
>> >
>> > Using 2.0.2 with the gocql driver, I'm getting this intermittent error:
>> >
>> > "Unknown code 256 for a consistency level"
>> >
>> > Is this something that the server could be returning, or is this maybe
>> > only a client side issue?
>> >
>> > Cheers,
>> >
>> > Ben
>
>


Cassandra and Pig - CQL maps denormalisation

2013-11-19 Thread Ondřej Černoš
Hi all,

I am solving an issue with Pig integration with Cassandra using CqlLoader. I
don't know exactly whether the problem is in CqlLoader, my limited
understanding of Pig (I hope this is actually the case), or some bug in the
combination of Pig and CqlLoader. Sorry if this turns out to be a Pig
question rather than a Cassandra one.

I have a table using cql maps:

CREATE TABLE test (
  name text PRIMARY KEY,
  sources map<text, text>
)

I need to denormalise the map in order to perform some sanity checks on
the rest of the DB (an outer join using values from the map with other
tables in the Cassandra keyspace). I want to create triples containing the
table key, map key and map value for further joining. The size of the map is
anything from empty to tens of entries. The table test itself is pretty
small.

This is what I do:

grunt> data = LOAD 'cql://keyspace/test' USING CqlStorage();
grunt> describe data;
data: {name: chararray,sources: ()}
grunt> data1 = filter data by sources is not null;
grunt> dump data1;
(name1,((k1,s1),(k2,s2)))
grunt> data2 = foreach data1 generate name, flatten(sources);
grunt> dump data2;
(name1,(k1,s1),(k2,s2))
grunt> describe data2;
Schema for data2 unknown.
grunt> data3 = FOREACH data2 generate $0 as name, FLATTEN(TOBAG($1..$100));
// I know there will be max tens of records in the map
grunt> dump data3;
(name1,k1,s1)
(name1,k2,s2)
(name1,)
(name1,)
... 95 more lines here ...
grunt> data4 = FILTER data3 BY $1 IS NOT null;
grunt> dump data4;
(name1,k1,s1)
(name1,k2,s2)
grunt> describe data4;
data4: {name: bytearray,bytearray}
grunt> data5 = foreach data4 generate $0, $1;
grunt> dump data5;
(name1,k1)
(name1,k2)
grunt> p = foreach data4 generate $0, $2;
Details at logfile: //pig_xxx.log
From the log file:
Pig Stack Trace
---
ERROR 1000:
 Out of bound access. Trying to access non-existent
column: 2. Schema name:bytearray,:bytearray has 2 column(s).

org.apache.pig.impl.plan.PlanValidationException: ERROR 1000:
 Out of bound access. Trying to access non-existent
column: 2. Schema name:bytearray,:bytearray has 2 column(s).
at
org.apache.pig.newplan.logical.expression.ProjectExpression.findColNum(ProjectExpression.java:197)
at
org.apache.pig.newplan.logical.expression.ProjectExpression.setColumnNumberFromAlias(ProjectExpression.java:174)

Considering the schema, this is no surprise. What is strange is that I can
see the map values in the dump (see dump data4), but I have no way to get at
them using Pig Latin.

I tried to simulate the situation using the PigStorage loader. This is the
best I got (not exactly the same, but roughly):

grunt> data = load 'test.csv' using PigStorage(',');
grunt> dump data;
(key1,mk1,mv1,mk2,mv2)
(key2)
(key3,mk1,mv3,mk2,mv4)
grunt> data1 = foreach data generate $0, TOTUPLE($1, $2), TOTUPLE($3, $4);
grunt> dump data1;
(key1,(mk1,mv1),(mk2,mv2))
(key2,(,),(,))
(key3,(mk1,mv3),(mk2,mv4))
grunt> data2 = FOREACH data1 generate $0 as name, FLATTEN(TOBAG($1..$2));
grunt> dump data2;
(key1,mk1,mv1)
(key1,mk2,mv2)
(key2,,)
(key2,,)
(key3,mk1,mv3)
(key3,mk2,mv4)
grunt> describe data2;
data2: {name: bytearray,bytearray,bytearray}

Which is exactly what I need. The only problem is that this simulation
doesn't allow me to specify an arbitrarily high value in the FLATTEN(TOBAG())
call - I need to know the size of the row in advance.
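
(One way around the fixed $1..$100 range, sketched as an untested Java Pig
UDF: turn the map-tuple into a bag of its entry tuples, so FLATTEN emits one
row per entry whatever the map size. The class name is made up, and the input
shape assumes CqlStorage hands the map over as a tuple of (key, value)
tuples, as the dumps above suggest.)

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;

// TupleToBag: re-wraps each (key, value) tuple of the input map-tuple in a
// bag, so a later FLATTEN yields one (key, value) row per map entry.
public class TupleToBag extends EvalFunc<DataBag> {
    @Override
    public DataBag exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        Tuple entries = (Tuple) input.get(0);
        DataBag bag = BagFactory.getInstance().newDefaultBag();
        for (int i = 0; i < entries.size(); i++) {
            bag.add((Tuple) entries.get(i)); // each (mapkey, mapvalue) pair
        }
        return bag;
    }
}

After a REGISTER of the jar, the pipeline would then read something like:
data2 = FOREACH data1 GENERATE name, FLATTEN(TupleToBag(sources));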

Questions:

- Is this the correct way to denormalise the data? This is a Pig question,
but maybe someone will know (I am a Pig newbie).
- Couldn't there be a problem with the internal data representation returned
by CqlStorage? See the difference between the data loaded from a file and the
data loaded from Cassandra.

Versions: cassandra 1.2.11, Pig 0.12.

Thanks in advance,

Ondrej Cernos


Re: Cassandra 2.0.2 - Frequent Read timeouts and delays in replication on 3-node cluster in AWS VPC

2013-11-19 Thread Steven A Robenalt
Thanks Michael, I will try that out.


On Tue, Nov 19, 2013 at 5:28 AM, Laing, Michael
wrote:

> We had a similar problem when our nodes could not sync using ntp due to
> VPC ACL settings. -ml
>
>
> On Mon, Nov 18, 2013 at 8:49 PM, Steven A Robenalt 
> wrote:
>
>> Hi all,
>>
>> I am attempting to bring up our new app on a 3-node cluster and am having
>> problems with frequent read timeouts and slow inter-node replication.
>> Initially, these errors were mostly occurring in our app server, affecting
>> 0.02%-1.0% of our queries in an otherwise unloaded cluster. No exceptions
>> were logged on the servers in this case, and reads in a single node
>> environment with the same code and client driver virtually never see
>> exceptions like this, so I suspect problems with the inter-cluster
>> communication between nodes.
>>
>> The 3 nodes are deployed in a single AWS VPC, and are all in a common
>> subnet. The Cassandra version is 2.0.2 following an upgrade this past
>> weekend due to NPEs in a secondary index that were affecting certain
>> queries under 2.0.1. The servers are m1.large instances running AWS Linux
>> and Oracle JDK7u40. The first 2 nodes in the cluster are the seed nodes.
>> All database contents are CQL tables with replication factor of 3, and the
>> application is Java-based, using the latest Datastax 2.0.0-rc1 Java Driver.
>>
>> In testing with the application, I noticed this afternoon that the
>> contents of the 3 nodes differed in their respective copies of the same
>> table for newly written data, for time periods exceeding several minutes,
>> as reported by cqlsh on each node. Specifying different hosts from the same
>> server using cqlsh also exhibited timeouts on multiple attempts to connect,
>> and on executing some queries, though they eventually succeeded in all
>> cases, and eventually the data in all nodes was fully replicated.
>>
>> The AWS servers have a security group with only ports 22, 7000, 9042, and
>> 9160 open.
>>
>> At this time, it seems that either I am still missing something in my
>> cluster configuration, or maybe there are other ports that are needed for
>> inter-node communication.
>>
>> Any advice/suggestions would be appreciated.
>>
>>
>>
>> --
>> Steve Robenalt
>> Software Architect
>> HighWire | Stanford University
>> 425 Broadway St, Redwood City, CA 94063
>>
>> srobe...@stanford.edu
>> http://highwire.stanford.edu
>>
>>
>>
>>
>>
>>
>


-- 
Steve Robenalt
Software Architect
HighWire | Stanford University
425 Broadway St, Redwood City, CA 94063

srobe...@stanford.edu
http://highwire.stanford.edu


Re: sstableloader does not support client encryption on Cassandra 2.0?

2013-11-19 Thread Tyler Hobbs
I think this is just an oversight; would you mind opening a ticket here?
https://issues.apache.org/jira/browse/CASSANDRA
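
(A side observation, not from the thread: the absurd frame size decodes to
the start of a TLS record, which is what you would expect if a plaintext
Thrift client reads the opening bytes sent by a TLS-speaking server as a
frame length. A quick check:)

// 352518400 == 0x15030100: 0x15 is the TLS "alert" record type and 0x0301
// is TLS 1.0, i.e. the first bytes of a TLS alert misread as a Thrift
// frame size.
public class FrameSizeDecode {
    public static void main(String[] args) {
        System.out.printf("0x%08X%n", 352518400); // prints 0x15030100
    }
}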


On Mon, Nov 18, 2013 at 12:37 PM, David Laube  wrote:

> Hi All,
>
> We have been testing backup/restore from one ring to another and we
> recently stumbled upon an issue with sstableloader. When client_enc_enable:
> true, the exception below is generated. When client_enc_enable is set to
> false, the sstableloader is able to get to the point where it discovers
> endpoints, connects to stream data, etc.
>
> ==BEGIN EXCEPTION==
>  sstableloader --debug -d x.x.x.248,x.x.x.108,x.x.x.113
> /tmp/import/keyspace_name/columnfamily_name
> Exception in thread "main" java.lang.RuntimeException: Could not retrieve
> endpoint ranges:
> at
> org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:226)
> at
> org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:149)
> at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:68)
> Caused by: org.apache.thrift.transport.TTransportException: Frame size
> (352518400) larger than max length (16384000)!
> at
> org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:137)
> at
> org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:362)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:284)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:191)
> at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> at
> org.apache.cassandra.thrift.Cassandra$Client.recv_describe_partitioner(Cassandra.java:1292)
> at
> org.apache.cassandra.thrift.Cassandra$Client.describe_partitioner(Cassandra.java:1280)
> at
> org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:199)
> ... 2 more
> ==END EXCEPTION==
>
>
> Has anyone seen this before or can someone confirm that SSL/encryption is
> not supported under the open source project and only with d-stax enterprise?
>
> Thanks,
> -David Laube




-- 
Tyler Hobbs
DataStax 


support for nulls in composite lost in CQL3

2013-11-19 Thread Hiller, Dean
We have wide rows whose column names are a composite of integer.byte-array,
where some of our columns are {empty}.byte-array (i.e. the first part of the
composite is empty, as in a 0-length string or 0-length integer: NOT 0, but
basically null).
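
(To make the distinction concrete, a small sketch, not from the thread: an
"empty" component is a zero-length byte string, which is different from the
4-byte encoding of the integer 0. CQL3 has a literal for empty text ('') but,
as far as I can tell, no literal for an empty value of a non-text type, which
is the gap described above.)

import java.nio.ByteBuffer;

// Zero-length bytes (an "empty" component) vs. the serialized integer 0.
public class EmptyVsZero {
    public static void main(String[] args) {
        ByteBuffer empty = ByteBuffer.allocate(0); // 0 bytes: "null-ish" key
        ByteBuffer zero = (ByteBuffer) ByteBuffer.allocate(4).putInt(0).flip();
        System.out.println(empty.remaining() + " bytes vs "
                + zero.remaining() + " bytes"); // 0 bytes vs 4 bytes
    }
}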

This has worked great when we look up all the entries with an empty prefix in
thrift, but from what I understand, there is no support for this in CQL3?

Or if there is support, how do I get all the values with an empty prefix in 
CQL3?

Thanks,
Dean


Re: sstableloader does not support client encryption on Cassandra 2.0?

2013-11-19 Thread David Laube
Thank you Tyler. I took your advice and I have opened 
https://issues.apache.org/jira/browse/CASSANDRA-6378

Best regards,
-David Laube

On Nov 19, 2013, at 9:51 AM, Tyler Hobbs  wrote:

> I think this is just an oversight; would you mind opening a ticket here? 
> https://issues.apache.org/jira/browse/CASSANDRA
> 
> 
> On Mon, Nov 18, 2013 at 12:37 PM, David Laube  wrote:
> Hi All,
> 
> We have been testing backup/restore from one ring to another and we recently 
> stumbled upon an issue with sstableloader. When client_enc_enable: true, the 
> exception below is generated. When client_enc_enable is set to false, the 
> sstableloader is able to get to the point where it discovers endpoints, 
> connects to stream data, etc.
> 
> ==BEGIN EXCEPTION==
>  sstableloader --debug -d x.x.x.248,x.x.x.108,x.x.x.113 
> /tmp/import/keyspace_name/columnfamily_name
> Exception in thread "main" java.lang.RuntimeException: Could not retrieve 
> endpoint ranges:
> at 
> org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:226)
> at 
> org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:149)
> at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:68)
> Caused by: org.apache.thrift.transport.TTransportException: Frame size 
> (352518400) larger than max length (16384000)!
> at 
> org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:137)
> at 
> org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:362)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:284)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:191)
> at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> at 
> org.apache.cassandra.thrift.Cassandra$Client.recv_describe_partitioner(Cassandra.java:1292)
> at 
> org.apache.cassandra.thrift.Cassandra$Client.describe_partitioner(Cassandra.java:1280)
> at 
> org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:199)
> ... 2 more
> ==END EXCEPTION==
> 
> 
> Has anyone seen this before or can someone confirm that SSL/encryption is not 
> supported under the open source project and only with d-stax enterprise?
> 
> Thanks,
> -David Laube
> 
> 
> 
> -- 
> Tyler Hobbs
> DataStax



Re: Managing vnodes cluster and configuring rpc_address with EC2MultiRegionSnitch

2013-11-19 Thread Robert Coli
On Tue, Nov 19, 2013 at 7:24 AM, Alain RODRIGUEZ wrote:

> - Can we add multiple nodes at once? (The previous rule was no more than
> one server per range at the same time...)
>

Vnodes are just like a lot of nodes. So no.


> - Can we decommission multiple nodes at the same time? (I used to do it
> when they were not on the same part of the ring, taking replicas into
> account.)
>

This becomes much more difficult/impossible with 256 virtual nodes per
hardware node.


> - Is there a way to distribute heavy operations such as repair or cleanup
> smartly across nodes? (With an RF of 3, I previously used to launch a
> repair or cleanup on one node out of 3 in token order. That way I was sure
> that at least 2 nodes out of 3 weren't performing a repair and were
> available to answer queries fast enough. Is there a way to reproduce this
> behavior using vnodes?)
>

No, but that's a design feature of vnodes: you repair fewer ranges per node
and the repair is spread among more nodes. It may or may not be a win in
practice; as I don't use vnodes, I don't know!


> If someone could point me to a "best practices in Cassandra management with
> vnodes" article / blog / tutorial, it would be very cool.
>

I'll let you know when I deploy vnodes and then write it, lol. :)


> How should rpc_address be configured when using EC2MultiRegionSnitch? I
> mean, should it be the public IP or the private IP? (I have the private IP
> as listen_address and the public IP as broadcast_address. Seeds are public
> IPs.)
>

I don't know the answer, but are you sure you want to use
EC2MultiRegionSnitch?

https://issues.apache.org/jira/browse/CASSANDRA-3810

=Rob


Re: Read inconsistency after backup and restore to different cluster

2013-11-19 Thread Aaron Morton
> we then take the snapshot archive generated FROM cluster-A_node1 and 
> copy/extract/restore TO cluster-B_node1,  then we 
sounds correct.

> Depending on what additional comments/recommendation you or another member of 
> the list may have (if any) based on the clarification I've made above,

Also, if you back up the system data it will bring along the tokens. This can
be a pain if you want to change the cluster name.

cheers

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 15/11/2013, at 10:44 am, David Laube  wrote:

> Thank you for the detailed reply Rob!  I have replied to your comments 
> in-line below;
> 
> On Nov 14, 2013, at 1:15 PM, Robert Coli  wrote:
> 
>> On Thu, Nov 14, 2013 at 12:37 PM, David Laube  wrote:
>> It is almost as if the data only exists on some of the nodes, or perhaps the 
>> token ranges are dramatically different --again, we are using vnodes so I am 
>> not exactly sure how this plays into the equation.
>> 
>> The token ranges are dramatically different, due to vnode random token 
>> selection from not setting initial_token, and setting num_tokens.
>> 
>> You can verify this by listing the tokens per physical node in nodetool 
>> gossipinfo or (iirc) nodetool status.
>>  
>> 5. Copy 1 of the 5 snapshot archives from cluster-A to each of the five 
>> nodes in the new cluster-B ring.
>> 
>> I don't understand this at all, do you mean that you are using one source 
>> node's data to load each of of the target nodes? Or are you just saying 
>> there's a 1:1 relationship between source snapshots and target nodes to load 
>> into? Unless you have RF=N, using one source for 5 target nodes won't work.
> 
> We have configured RF=3 for the keyspace in question. Also, from a client 
> perspective, we read with CL=1 and write with CL=QUORUM. Since we have 5 
> nodes total in cluster-A, we snapshot keyspace_name on each of the five nodes 
> which results in a snapshot directory on each of the five nodes that we 
> archive and ship off to s3. We then take the snapshot archive generated FROM 
> cluster-A_node1 and copy/extract/restore TO cluster-B_node1,  then we take 
> the snapshot archive FROM cluster-A_node2 and copy/extract/restore TO 
> cluster-B_node2 and so on and so forth.
> 
>> 
>> To do what I think you're attempting to do, you have basically two options.
>> 
>> 1) don't use vnodes and do a 1:1 copy of snapshots
>> 2) use vnodes and
>>a) get a list of tokens per node from the source cluster
>>b) put a comma delimited list of these in initial_token in cassandra.yaml 
>> on target nodes
>>c) probably have to un-set num_tokens (this part is unclear to me, you 
>> will have to test..)
>>d) set auto_bootstrap:false in cassandra.yaml
>>e) start target nodes, they will not-bootstrap into the same ranges as 
>> the source cluster
>>f) load schema / copy data into datadir (being careful of 
>> https://issues.apache.org/jira/browse/CASSANDRA-6245)
>>g) restart node or use nodetool refresh (I'd probably restart the node to 
>> avoid the bulk rename that refresh does) to pick up sstables
>>h) remove auto_bootstrap:false from cassandra.yaml
>>
>> I *believe* this *should* work, but have never tried it as I do not 
>> currently run with vnodes. It should work because it basically makes 
>> implicit vnode tokens explicit in the conf file. If it *does* work, I'd 
>> greatly appreciate you sharing details of your experience with the list. 
> 
> I'll start with parsing out the token ranges that our vnode config ends up 
> assigning in cluster-A, and doing some creative config work on the target 
> cluster-B we are trying to restore to as you have suggested. Depending on 
> what additional comments/recommendation you or another member of the list may 
> have (if any) based on the clarification I've made above, I will absolutely 
> report back my findings here.
> 
> 
>> 
>> General reference on tasks of this nature (does not consider vnodes, but 
>> treat vnodes as "just a lot of physical nodes" and it is mostly relevant) : 
>> http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra
>> 
>> =Rob



Re: making sense of output from Eclipse Memory Analyzer tool taken from .hprof file

2013-11-19 Thread Aaron Morton
What version of Cassandra are you using?
What are the JVM settings? (check with ps aux | grep cassandra)


OOM in Cassandra 1.2+ is rare, but there is also 
https://issues.apache.org/jira/browse/CASSANDRA-5706 and 
https://issues.apache.org/jira/browse/CASSANDRA-6087

> One instance of "org.apache.cassandra.db.ColumnFamilyStore" loaded by 
> "sun.misc.Launcher$AppClassLoader @ 0x613e1bdc8" occupies 984,094,664 
> (11.64%) bytes.
938MB is a fair bit of memory; the CFS and data tracker are dealing with the
memtable. This may indicate things are not being flushed from memory correctly.
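
(If you want to watch that directly, here is a sketch, assuming the default
JMX port 7199 and placeholder keyspace/column family names, that reads the
live memtable size of one CF over JMX, the same figure nodetool cfstats
reports; if it never shrinks after flushes, flushing is the problem.)

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Poll the MemtableDataSize attribute of a single column family.
public class MemtableSize {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
        ObjectName cfs = new ObjectName(
                "org.apache.cassandra.db:type=ColumnFamilies,"
                        + "keyspace=my_keyspace,columnfamily=my_cf");
        System.out.println("MemtableDataSize: "
                + mbs.getAttribute(cfs, "MemtableDataSize"));
        jmxc.close();
    }
}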

> •java.lang.Thread @ 0x73e1f74c8 CompactionExecutor:158 - 839,225,000 (9.92%) 
> bytes.
> •java.lang.Thread @ 0x717f08178 MutationStage:31 - 809,909,192 (9.58%) bytes.
> •java.lang.Thread @ 0x717f082c8 MutationStage:5 - 649,667,472 (7.68%) bytes.
> •java.lang.Thread @ 0x717f083a8 MutationStage:21 - 498,081,544 (5.89%) bytes.
> •java.lang.Thread @ 0x71b357e70 MutationStage:11 - 444,931,288 (5.26%) bytes.
maybe very big rows and/or very big mutations. 

hope that helps. 

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 15/11/2013, at 12:34 pm, Mike Koh  wrote:

> I am investigating Java out-of-memory heap errors. So I created an .hprof 
> file and loaded it into Eclipse Memory Analyzer Tool which gave some "Problem 
> Suspects".
> 
> First one looks like:
> 
> One instance of "org.apache.cassandra.db.ColumnFamilyStore" loaded by 
> "sun.misc.Launcher$AppClassLoader @ 0x613e1bdc8" occupies 984,094,664 
> (11.64%) bytes. The memory is accumulated in one instance of 
> "org.apache.cassandra.db.DataTracker$View" loaded by 
> "sun.misc.Launcher$AppClassLoader @ 0x613e1bdc8".
> 
> 
> If I click around into the verbiage, I believe I can pick out the name of a 
> column family but that is about it. Can someone explain what the above means 
> in more detail and if it is indicative of a problem?
> 
> 
> Next one looks like:
> -
> •java.lang.Thread @ 0x73e1f74c8 CompactionExecutor:158 - 839,225,000 (9.92%) 
> bytes.
> •java.lang.Thread @ 0x717f08178 MutationStage:31 - 809,909,192 (9.58%) bytes.
> •java.lang.Thread @ 0x717f082c8 MutationStage:5 - 649,667,472 (7.68%) bytes.
> •java.lang.Thread @ 0x717f083a8 MutationStage:21 - 498,081,544 (5.89%) bytes.
> •java.lang.Thread @ 0x71b357e70 MutationStage:11 - 444,931,288 (5.26%) bytes.
> --
> If I click into the verbiage, the above Compaction and Mutations all seem to 
> be referencing the same column family. Are the above related? Is there a way 
> I can tell more specifically what is being compacted and/or mutated than 
> just which column family?



OpsCenter CQL support

2013-11-19 Thread Techy Teck
Does OpsCenter support column families created using CQL? If yes, is there a
specific version of OpsCenter that we need to use?

Currently we have OpsCenter in production, and it doesn't show the tables
created using CQL.


Re: Cassandra 2.0.2 - Frequent Read timeouts and delays in replication on 3-node cluster in AWS VPC

2013-11-19 Thread Steven A Robenalt
It seems that with NTP properly configured, the replication is now working
as expected, but there are still a lot of read timeouts. The
troubleshooting continues...
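
(One way to narrow down where the remaining read timeouts come from, sketched
with the Java driver the app already uses; contact point, keyspace, and query
are placeholders:)

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.QueryTrace;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

// Enable tracing on a suspect read and print the server-side trace events,
// which show which replicas were contacted and where the time went.
public class TraceRead {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("10.0.0.1").build();
        Session session = cluster.connect("my_keyspace");
        Statement stmt = new SimpleStatement(
                "SELECT * FROM my_table WHERE id = 1").enableTracing();
        ResultSet rs = session.execute(stmt);
        QueryTrace trace = rs.getExecutionInfo().getQueryTrace();
        if (trace != null) {
            System.out.println("total " + trace.getDurationMicros() + " us");
            for (QueryTrace.Event e : trace.getEvents()) {
                System.out.println(e.getSourceElapsedMicros() + " us  "
                        + e.getDescription() + " @ " + e.getSource());
            }
        }
        cluster.close();
    }
}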


On Tue, Nov 19, 2013 at 8:53 AM, Steven A Robenalt wrote:

> Thanks Michael, I will try that out.
>
>
> On Tue, Nov 19, 2013 at 5:28 AM, Laing, Michael  > wrote:
>
>> We had a similar problem when our nodes could not sync using ntp due to
>> VPC ACL settings. -ml
>>
>>
>> On Mon, Nov 18, 2013 at 8:49 PM, Steven A Robenalt > > wrote:
>>
>>> Hi all,
>>>
>>> I am attempting to bring up our new app on a 3-node cluster and am
>>> having problems with frequent read timeouts and slow inter-node
>>> replication. Initially, these errors were mostly occurring in our app
>>> server, affecting 0.02%-1.0% of our queries in an otherwise unloaded
>>> cluster. No exceptions were logged on the servers in this case, and reads
>>> in a single node environment with the same code and client driver virtually
>>> never see exceptions like this, so I suspect problems with the
>>> inter-cluster communication between nodes.
>>>
>>> The 3 nodes are deployed in a single AWS VPC, and are all in a common
>>> subnet. The Cassandra version is 2.0.2 following an upgrade this past
>>> weekend due to NPEs in a secondary index that were affecting certain
>>> queries under 2.0.1. The servers are m1.large instances running AWS Linux
>>> and Oracle JDK7u40. The first 2 nodes in the cluster are the seed nodes.
>>> All database contents are CQL tables with replication factor of 3, and the
>>> application is Java-based, using the latest Datastax 2.0.0-rc1 Java Driver.
>>>
>>> In testing with the application, I noticed this afternoon that the
>>> contents of the 3 nodes differed in their respective copies of the same
>>> table for newly written data, for time periods exceeding several minutes,
>>> as reported by cqlsh on each node. Specifying different hosts from the same
>>> server using cqlsh also exhibited timeouts on multiple attempts to connect,
>>> and on executing some queries, though they eventually succeeded in all
>>> cases, and eventually the data in all nodes was fully replicated.
>>>
>>> The AWS servers have a security group with only ports 22, 7000, 9042,
>>> and 9160 open.
>>>
>>> At this time, it seems that either I am still missing something in my
>>> cluster configuration, or maybe there are other ports that are needed for
>>> inter-node communication.
>>>
>>> Any advice/suggestions would be appreciated.
>>>
>>>
>>>
>>> --
>>> Steve Robenalt
>>> Software Architect
>>> HighWire | Stanford University
>>> 425 Broadway St, Redwood City, CA 94063
>>>
>>> srobe...@stanford.edu
>>> http://highwire.stanford.edu
>>>
>>>
>>>
>>>
>>>
>>>
>>
>
>
> --
> Steve Robenalt
> Software Architect
> HighWire | Stanford University
> 425 Broadway St, Redwood City, CA 94063
>
> srobe...@stanford.edu
> http://highwire.stanford.edu
>
>
>
>
>
>


-- 
Steve Robenalt
Software Architect
HighWire | Stanford University
425 Broadway St, Redwood City, CA 94063

srobe...@stanford.edu
http://highwire.stanford.edu


Re: CassandraStorage problem in Pig

2013-11-19 Thread Michael Spertus
It looks like the problem is that the endpoints are missing:


Schema Version:b0320912-08ca-36fb-91f8-d777452a8483
TokenRange:
TokenRange(start_token:2028301535693519804,
end_token:2029382165996661429, endpoints:[], rpc_endpoints:[],
endpoint_details:[])
TokenRange(start_token:-5630621428241587195,
end_token:-5602444823060522347, endpoints:[], rpc_endpoints:[],
endpoint_details:[])
TokenRange(start_token:8306980276642571252,
end_token:8312066644979158581, endpoints:[], rpc_endpoints:[],
endpoint_details:[])
...


Any thoughts about why no endpoints might be listed?

Thanks,

Mike



On Tue, Nov 19, 2013 at 8:01 AM, Michael Spertus  wrote:

> I am trying to use CassandraStorage in Pig, and I am getting the error
> "Invalid token information returned by describe_ring: {}" whenever I try to
> output to Cassandra. What could I be doing wrong?
>
> Thanks,
>
> Mike
>