Re: Ec2Snitch to Ec2MultiRegionSnitch

2013-04-23 Thread Alain RODRIGUEZ
Hi, this advice is very welcome.

@Dane, about rack awareness: we use only one rack per DC, so I guess
using EC2MultiRegionSnitch will do just fine and doesn't need any extra
configuration. Does that seem right to you? If we are someday interested in
multiple racks I will make sure to use them properly. Thank you for this
insight anyway. You advise me to test it; what would be a good way of
testing it (I can use AWS EC2 instances if needed)?

@Aaron

"I recommend using the same number of nodes in both DC's."

Why? I mean, we have maybe only 5% of our customers in the us-east zone;
what in C* requires having the same number of nodes in each DC?

"Add the nodes (I recommend 6) with auto_bootstrap: false added to the yaml.
update the keyspace replication strategy to add rf:3 for the new DC.
Use nodetool rebuild on the new nodes to rebuild them from the us-west DC. "

What is better about adding nodes with no data and then rebuilding them,
compared to using auto_bootstrap?

"I prefer to use the offset method. Take the 6 tokens from your us-west DC
and add 100 to them for the new dc. "

Any doc on this? I am not aware of all the possibilities. Why is this the
best method according to you?
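
If I understand it correctly, the arithmetic would be something like the
sketch below (my own understanding, assuming RandomPartitioner with a token
range of 0..2^127 and 6 nodes per DC):

    import java.math.BigInteger;

    // Sketch of the token-offset method: each us-east token is the matching
    // us-west token plus a small constant (100 here), so both DCs stay
    // evenly spaced and no two nodes share a token.
    public class TokenOffsets {
        public static void main(String[] args) {
            BigInteger range = BigInteger.valueOf(2).pow(127);
            int nodes = 6;
            for (int i = 0; i < nodes; i++) {
                BigInteger usWest = range.multiply(BigInteger.valueOf(i))
                                         .divide(BigInteger.valueOf(nodes));
                BigInteger usEast = usWest.add(BigInteger.valueOf(100));
                System.out.println("node " + i + ": us-west=" + usWest
                                   + " us-east=" + usEast);
            }
        }
    }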

About seeds => "Yes. Have 3 from each."

What is the point of this?

I didn't think this change would be that tricky. Thank you guys for these
warnings and your help ;)

Alain


2013/4/23 Dane Miller 

> On Thu, Apr 18, 2013 at 7:41 AM, Alain RODRIGUEZ 
> wrote:
> > I am wondering about the process to grow from one data center to a few of
> > them. First thing is we use EC2Snitch for now. So I guess we have to
> switch
> > to Ec2MultiRegionSnitch.
> >
> > c/ I am using the SimpleStrategy. Is it worth it/mandatory to change this
> > strategy when using multiple DC ?
>
> I suggest you thoroughly read the datastax documentation on cassandra
> replication.  The change you are planning is big - make sure to try it
> in a test environment first.  Also, you might find you don't really
> need Cassandra's rack aware feature, and can operate using
> (Gossiping)PropertyFileSnitch.  The rack feature is listed as an
> "anti-pattern" here:
> http://www.datastax.com/docs/1.2/cluster_architecture/anti_patterns
>
> Here are some recent discussions on this list:
>
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/migrating-from-SimpleStrategy-to-NetworkTopologyStrategy-tp7586272.html
>
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/migrating-from-SimpleStrategy-to-NetworkTopologyStrategy-tp7481090.html
>
> Dane
>


RE: ordered partitioner

2013-04-23 Thread Desimpel, Ignace
I ran into problems starting a new database. It starts up OK, but things go 
wrong when I add a keyspace. The error came from DefsTable.mergeSchema (working 
on version 1.2.1). It starts by fetching the current keyspaces and stores them 
in a map of DecoratedKey elements. Then it applies the mutation and fetches all 
the keyspaces again. But now the fetch gets a map of the new OwnDecoratedKey 
elements. So the difference of these elements contains all the 'old' 
DecoratedKey and the 'new' OwnDecoratedKey elements, and each kind of DecoratedKey 
contains all of the old existing keyspaces. So the mergeKeyspaces routine 
thinks that it has to recreate the already existing keyspaces like system_auth, 
etc.

Currently I don't use indexes in my schema, so I don't know the effect there. 
Also, the change I made (local, of course) was done without really knowing what 
to do (kind of a blind replacement)...
But the one reference in SecondaryIndex was left as it is, since it was using a 
LocalToken object.

What I changed to make it work:
--
1) Memtable::resolve(DecoratedKey key, ColumnFamily cf, SecondaryIndexManager.Updater indexer)
   The line
     previous = columnFamilies.putIfAbsent(new DecoratedKey(key.token, allocator.clone(key.key)), empty);
   changed to
     previous = columnFamilies.putIfAbsent(StorageService.getPartitioner().decorateKey(allocator.clone(key.key)), empty);

2) SSTable::getMinimalKey(DecoratedKey key)
   The line
     new DecoratedKey(key.token, HeapAllocator.instance.clone(key.key))
   changed to
     StorageService.getPartitioner().decorateKey(HeapAllocator.instance.clone(key.key))

3) ColumnFamilyStore::getSSTablesForKey(String key)
   The line
     DecoratedKey dk = new DecoratedKey(partitioner.getToken(ByteBuffer.wrap(key.getBytes())), ByteBuffer.wrap(key.getBytes()));
   changed to
     DecoratedKey dk = partitioner.decorateKey(ByteBuffer.wrap(key.getBytes()));
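
For context, the kind of subclass I am plugging in looks roughly like the
sketch below. It is only a sketch against the 1.2-era internals (the
DecoratedKey(Token, ByteBuffer) constructor and its public token/key fields,
as used in the changes above); the class name and comparison are my own:

    import java.nio.ByteBuffer;
    import org.apache.cassandra.db.DecoratedKey;
    import org.apache.cassandra.db.RowPosition;
    import org.apache.cassandra.dht.Token;

    // Hypothetical DecoratedKey subclass that compares raw keys with
    // *signed* byte semantics (ByteBuffer.compareTo) instead of
    // ByteBufferUtil.compareUnsigned when the tokens are equal.
    public class OwnDecoratedKey extends DecoratedKey {
        public OwnDecoratedKey(Token token, ByteBuffer key) {
            super(token, key);
        }

        @Override
        public int compareTo(RowPosition position) {
            if (position instanceof DecoratedKey) {
                DecoratedKey other = (DecoratedKey) position;
                int cmp = token.compareTo(other.token);
                return cmp == 0 ? key.compareTo(other.key) : cmp;
            }
            return super.compareTo(position);
        }
    }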


Regards,

Ignace

-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Monday, 22 April 2013 20:12 
To: user
Subject: Re: ordered partitioner

Not in general, no.  There are places, like indexing, that need to use a local 
partitioner rather than the global one.

Which uses of the DK constructor looked erroneous to you?

On Mon, Apr 22, 2013 at 10:54 AM, Desimpel, Ignace  
wrote:
> Hi,
>
>
>
> I was trying to implement my own ordered partitioner and got into problems.
>
> The current DecoratedKey is using a ByteBufferUtil.compareUnsigned for 
> comparing the key. I was thinking of having a signed comparison, so I 
> thought of making my own DecoratedKey, Token and Partitioner. That way 
> I would have complete control...
>
> So I made a partitioner with a function decorateKey(...) returning 
> MyDecoratedKey instead of DecoratedKey.
>
> But when making my own MyDecoratedKey, the database gets into trouble 
> when adding a keyspace, due to the fact that some code in Cassandra is 
> using the 'new DecoratedKey(...)' statement and is not using the 
> partitioner function decorateKey(...).
>
>
>
> Would it be logical to always call the partitioner function 
> decorateKey such that the creation of an own partitioner and key decoration 
> is possible?
>
>
>
> Ignace Desimpel
>
>



--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


readable (not hex encoded) column names using sstable2json

2013-04-23 Thread Hans Melgers
Hello,

Using Cassandra 1.0.7's sstable2json on some tables, I get readable column
names. This leads to problems (java.lang.NumberFormatException: Non-hex
characters in) when importing later.

We're trying to move data over to another cluster but this prevents us
from doing so. Could it have to do with using a custom Serializer?

Here is example output:

D:\Java\apache-cassandra-1.0.7\bin>sstable2json d:\var\lib\cassandra\data2\depsi\ACCOUNT_RECEIVERS-hc-1-Data.db
{
"2369642365613932316263313835343135616161336136373337396230386339": [["dep.1205050","",1364383456519006]],
"23696423396338306562366366383365346162383863623238306638636439303432": [["dep.1057162","",1364383456664000]],
[GOES ON here]

The value "dep.1205050" is literally what we put in there. It's not hex
encoded.
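
If it helps anyone hitting the same NumberFormatException, a throwaway
workaround (our own sketch, not a Cassandra API) is to hex-encode the
readable names before feeding the JSON back to json2sstable:

    import java.nio.charset.StandardCharsets;

    // Hypothetical helper: hex-encode a readable column name, since
    // json2sstable expects hex. Assumes UTF-8 column names.
    public class HexNames {
        static String toHex(String columnName) {
            StringBuilder sb = new StringBuilder();
            for (byte b : columnName.getBytes(StandardCharsets.UTF_8))
                sb.append(String.format("%02x", b));
            return sb.toString();
        }

        public static void main(String[] args) {
            System.out.println(toHex("dep.1205050")); // 6465702e31323035303530
        }
    }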

Kind regards,
Hans Melgers





Re: Prepared Statement - cache duration (CQL3 - Cassandra 1.2.4)

2013-04-23 Thread Stuart Broad
Hi Sorin,

A PreparedQueryNotFoundException is not what is thrown from the
Cassandra.Client>>execute_prepared_cql3_query method.  I created some
prepared statements and then re-started cassandra and received the
following exception:

InvalidRequestException(why: Prepared query with ID 1124421588 not found
(either the query was not prepared on this host (maybe the host has been
restarted?) or you have prepared more than 10 queries and queries
1124421588 has been evicted from the internal cache))

The best I have been able to come up with is the following:

try {
    client.execute_prepared_cql3_query(psId, bindValues, ..);
} catch (InvalidRequestException invEx) {
    String why = invEx.getWhy();
    CLogger.logger().warning(why);
    // Cassandra masks the "not prepared" case as an InvalidRequestException,
    // so all we can do is match on the message text.
    if (why.startsWith("Prepared query with ID")) {
        // Re-prepare the statement, then retry once.
        rebuildPreparedStatement(preparedStatement);
        client.execute_prepared_cql3_query(psId, bindValues, ..);
    } else {
        throw invEx;
    }
}

Obviously this is pretty fragile and would break if the cassandra message
was changed... but at least it works for now!
Cheers,

Stuart


On Sun, Apr 21, 2013 at 11:51 AM, Sorin Manolache  wrote:

> On 2013-04-19 13:57, Stuart Broad wrote:
>
>> Hi,
>>
>> I am using Cassandra.Client
>> prepare_cql3_query/execute_prepared_cql3_query to create and run some
>> prepared statements.  It is working well but I am unclear as to how long
>> the server side 'caches' the prepared statements.  Should a prepared
>> statement be prepared for every new Cassandra.Client?  Based on my
>> limited testing it seems like I can create some prepared statements in
>> one Cassandra.Client and use in another but I am not sure how
>> reliable/lasting this is i.e.  If I called the prepared statement again
>> the next day would it still exist?  What about if cassandra was
>> re-started?
>>
>> _Background:_
>>
>> I am creating prepared statements for batch updates of pre-defined
>> lengths (e.g. 1, 1000, 500, 250, 50, 10, 1) and wanted to know if
>> these could just be set up once.  We felt that using the prepared
>> statements was easier than escaping values within a CQL statement and
>> probably more performant.
>>
>> Thanks in advance for your help.
>>
>>
> I've looked in Cassandra's code (v1.2.3). The cache of prepared statements
> has a size of 100,000. So if you prepare more than 100 thousand statements,
> the least recently used ones will vanish. You'll get the exception
> PreparedQueryNotFoundException, code 0x2500.
>
> Regards,
> Sorin
>
>
>


Re: Prepared Statement - cache duration (CQL3 - Cassandra 1.2.4)

2013-04-23 Thread Sylvain Lebresne
In thrift, a lot of exceptions (like PreparedQueryNotFoundException) are
simply returned as InvalidRequestException. The reason for that was a mix
of not wanting to change the thrift API too much and not knowing
how to return a lot of different exceptions with thrift without making it
horrible to work with. So you'll probably have to parse strings here indeed.

This will be cleaner/less fragile if you use the binary protocol, as
exceptions are more fine-grained there.

Though taking a step back (and without saying that you shouldn't handle the
case where a query is not prepared on the node you contact), if you're
really considering preparing more than 10 statements, I'd suggest that
it might be worth benchmarking whether using prepared statements in your
case is really going to be worth the trouble. Just saying.

--
Sylvain



On Tue, Apr 23, 2013 at 12:14 PM, Stuart Broad  wrote:

> Hi Sorin,
>
> The PreparedQueryNotFoundException is not thrown from
> Cassandra.Client>>execute_prepared_cql3_query method.  I created some
> prepared statements and then re-started cassandra and received the
> following exception:
>
> InvalidRequestException(why: Prepared query with ID 1124421588 not found
> (either the query was not prepared on this host (maybe the host has been
> restarted?) or you have prepared more than 10 queries and queries
> 1124421588 has been evicted from the internal cache))
>
> The best I have been able to come up with is the following:
>
> try {
> client.execute_prepared_cql3_query(psId, bindValues, ..);
> } catch (InvalidRequestException invEx) {
> String why = invEx.getWhy();
> CLogger.logger().warning(why);
> if(why.startsWith("Prepared query with ID")) {
> rebuildPreparedStatement(preparedStatement);
> client.execute_prepared_cql3_query(psId, bindValues,
> ..);
> } else {
> throw invEx;
> }
> }
>
> Obviously this is pretty fragile and would break if the cassandra message
> was changed...but it least it works for now!
>
> Cheers,
>
> Stuart
>
>
> On Sun, Apr 21, 2013 at 11:51 AM, Sorin Manolache wrote:
>
>> On 2013-04-19 13:57, Stuart Broad wrote:
>>
>>> Hi,
>>>
>>> I am using Cassandra.Client
>>> prepare_cql3_query/execute_**prepared_cql3_query to create and run some
>>> prepared statements.  It is working well but I am unclear as to how long
>>> the server side 'caches' the prepared statements.  Should a prepared
>>> statement be prepared for every new Cassandra.Client?  Based on my
>>> limited testing it seems like I can create some prepared statements in
>>> one Cassandra.Client and use in another but I am not sure how
>>> reliable/lasting this is i.e.  If I called the prepared statement again
>>> the next day would it still exist?  What about if cassandra was
>>> re-started?
>>>
>>> _Background:_
>>>
>>> I am creating prepared statements for batch updates of pre-defined
>>> lengths (e.g. 1, 1000, 500, 250, 50, 10, 1) and wanted to know if
>>> these could just be set up once.  We felt that using the prepared
>>> statements was easier than escaping values within a CQL statement and
>>> probably more performant.
>>>
>>> Thanks in advance for your help.
>>>
>>>
>> I've looked in Cassandra's code (v1.2.3). The cache of prepared
>> statements has a size of 100,000. So if you prepare more than 100 thousand
>> statements, the least recently used ones will vanish. You'll get the
>> exception PreparedQueryNotFoundException**, code 0x2500.
>>
>> Regards,
>> Sorin
>>
>>
>>
>


Re: Prepared Statement - cache duration (CQL3 - Cassandra 1.2.4)

2013-04-23 Thread Stuart Broad
Hi Sylvain,

Thanks for your response.  I am handling the
'PreparedQueryNotFoundException' more for the case of a cassandra restart
(rather than expecting to build 10 statements).

I am not familiar with the binary protocol - which class/methods should I
look at?

Regards,

Stuart



On Tue, Apr 23, 2013 at 11:29 AM, Sylvain Lebresne wrote:

> In thrift, a lot of exceptions (like PreparedQueryNotFoundException) are
> simply returned as InvalidRequestException. The reason for that was a mix
> of not wanting to change the thrift API too much and because we didn't knew
> how to return a lot of different exception with thrift without making it
> horrible to work with. So you'll probably have to parse strings here indeed.
>
> This will be cleaner/less fragile if you use the binary protocol as
> exceptions are more fined grained there.
>
> Though taking a step back (and without saying that you shouldn't handle
> the case where a query is not prepared on the node you contact), if you're
> really considering preparing more than 10 statements, I'd suggest that
> it might be worth benchmarking whether using prepared statements in your
> case is really going to be worth the trouble. Just saying.
>
> --
> Sylvain
>
>
>
> On Tue, Apr 23, 2013 at 12:14 PM, Stuart Broad wrote:
>
>> Hi Sorin,
>>
>> The PreparedQueryNotFoundException is not thrown from
>> Cassandra.Client>>execute_prepared_cql3_query method.  I created some
>> prepared statements and then re-started cassandra and received the
>> following exception:
>>
>> InvalidRequestException(why: Prepared query with ID 1124421588 not found
>> (either the query was not prepared on this host (maybe the host has been
>> restarted?) or you have prepared more than 10 queries and queries
>> 1124421588 has been evicted from the internal cache))
>>
>> The best I have been able to come up with is the following:
>>
>> try {
>> client.execute_prepared_cql3_query(psId, bindValues, ..);
>> } catch (InvalidRequestException invEx) {
>> String why = invEx.getWhy();
>> CLogger.logger().warning(why);
>> if(why.startsWith("Prepared query with ID")) {
>> rebuildPreparedStatement(preparedStatement);
>> client.execute_prepared_cql3_query(psId, bindValues,
>> ..);
>> } else {
>> throw invEx;
>> }
>> }
>>
>> Obviously this is pretty fragile and would break if the cassandra message
>> was changed...but it least it works for now!
>>
>> Cheers,
>>
>> Stuart
>>
>>
>> On Sun, Apr 21, 2013 at 11:51 AM, Sorin Manolache wrote:
>>
>>> On 2013-04-19 13:57, Stuart Broad wrote:
>>>
 Hi,

 I am using Cassandra.Client
 prepare_cql3_query/execute_**prepared_cql3_query to create and run some
 prepared statements.  It is working well but I am unclear as to how long
 the server side 'caches' the prepared statements.  Should a prepared
 statement be prepared for every new Cassandra.Client?  Based on my
 limited testing it seems like I can create some prepared statements in
 one Cassandra.Client and use in another but I am not sure how
 reliable/lasting this is i.e.  If I called the prepared statement again
 the next day would it still exist?  What about if cassandra was
 re-started?

 _Background:_

 I am creating prepared statements for batch updates of pre-defined
 lengths (e.g. 1, 1000, 500, 250, 50, 10, 1) and wanted to know if
 these could just be set up once.  We felt that using the prepared
 statements was easier than escaping values within a CQL statement and
 probably more performant.

 Thanks in advance for your help.


>>> I've looked in Cassandra's code (v1.2.3). The cache of prepared
>>> statements has a size of 100,000. So if you prepare more than 100 thousand
>>> statements, the least recently used ones will vanish. You'll get the
>>> exception PreparedQueryNotFoundException**, code 0x2500.
>>>
>>> Regards,
>>> Sorin
>>>
>>>
>>>
>>
>


Plans for CQL3 (non-compact storage) table support in Cassandra's Pig support

2013-04-23 Thread Ondřej Černoš
Hi all,

is there someone on this list knowledgeable enough about the plans for
supporting non-compact storage tables (
https://issues.apache.org/jira/browse/CASSANDRA-5234) in Cassandra's Pig
support? Currently Pig cannot be used with Cassandra 1.2 and CQL3-only
tables, and this hurts a lot (I found blog posts about this problem, a
stackoverflow question, and the related
https://issues.apache.org/jira/browse/CASSANDRA-4421 issue has quite a lot
of watchers and voters).

I need to make a decision about our future development efforts and knowing
whether this issue is on the road map or not would help.

regards,
ondřej černoš


Re: Plans for CQL3 (non-compact storage) table support in Cassandra's Pig support

2013-04-23 Thread cscetbon.ext
+1

We're also waiting for this bugfix :(
--
Cyril SCETBON

On Apr 23, 2013, at 2:42 PM, Ondřej Černoš <cern...@gmail.com> wrote:

Hi all,

is there someone on this list knowledgable enough about the plans for support 
on non-compact storage tables 
(https://issues.apache.org/jira/browse/CASSANDRA-5234) in Cassandra's Pig 
support? Currently Pig cannot be used with Cassandra 1.2 and CQL3-only tables 
and this hurts a lot (I found blog posts about this problem, a stackoverflow 
question and the related https://issues.apache.org/jira/browse/CASSANDRA-4421 
issue has quite a lot of watchers and voters).

I need to make a decision about our future development efforts and knowing 
whether this issue is on the road map or not would help.

regards,
ondřej černoš





Re: Prepared Statement - cache duration (CQL3 - Cassandra 1.2.4)

2013-04-23 Thread Stuart Broad
Hi all,

I just realised that the binary protocol is the low-level thrift api that I
was originally using (Cassandra.Client>> get / insert ...).  How can a
prepared statement be called through the thrift api (i.e. not the cql
methods)?

Cheers,

Stuart


On Tue, Apr 23, 2013 at 11:48 AM, Stuart Broad  wrote:

> Hi Sylvain,
>
> Thanks for your response.  I am handling the
> 'PreparedQueryNotFoundException' more for the case of a cassandra re-start
> (rather then expecting to build 10 statements).
>
> I am not familiar with the binary protocol - which class/methods should I
> look at?
>
> Regards,
>
> Stuart
>
>
>
> On Tue, Apr 23, 2013 at 11:29 AM, Sylvain Lebresne 
> wrote:
>
>> In thrift, a lot of exceptions (like PreparedQueryNotFoundException) are
>> simply returned as InvalidRequestException. The reason for that was a mix
>> of not wanting to change the thrift API too much and because we didn't knew
>> how to return a lot of different exception with thrift without making it
>> horrible to work with. So you'll probably have to parse strings here indeed.
>>
>> This will be cleaner/less fragile if you use the binary protocol as
>> exceptions are more fined grained there.
>>
>> Though taking a step back (and without saying that you shouldn't handle
>> the case where a query is not prepared on the node you contact), if you're
>> really considering preparing more than 10 statements, I'd suggest that
>> it might be worth benchmarking whether using prepared statements in your
>> case is really going to be worth the trouble. Just saying.
>>
>> --
>> Sylvain
>>
>>
>>
>> On Tue, Apr 23, 2013 at 12:14 PM, Stuart Broad wrote:
>>
>>> Hi Sorin,
>>>
>>> The PreparedQueryNotFoundException is not thrown from
>>> Cassandra.Client>>execute_prepared_cql3_query method.  I created some
>>> prepared statements and then re-started cassandra and received the
>>> following exception:
>>>
>>> InvalidRequestException(why: Prepared query with ID 1124421588 not found
>>> (either the query was not prepared on this host (maybe the host has been
>>> restarted?) or you have prepared more than 10 queries and queries
>>> 1124421588 has been evicted from the internal cache))
>>>
>>> The best I have been able to come up with is the following:
>>>
>>> try {
>>> client.execute_prepared_cql3_query(psId, bindValues, ..);
>>> } catch (InvalidRequestException invEx) {
>>> String why = invEx.getWhy();
>>> CLogger.logger().warning(why);
>>> if(why.startsWith("Prepared query with ID")) {
>>> rebuildPreparedStatement(preparedStatement);
>>> client.execute_prepared_cql3_query(psId, bindValues,
>>> ..);
>>> } else {
>>> throw invEx;
>>> }
>>> }
>>>
>>> Obviously this is pretty fragile and would break if the cassandra
>>> message was changed...but it least it works for now!
>>>
>>> Cheers,
>>>
>>> Stuart
>>>
>>>
>>> On Sun, Apr 21, 2013 at 11:51 AM, Sorin Manolache wrote:
>>>
 On 2013-04-19 13:57, Stuart Broad wrote:

> Hi,
>
> I am using Cassandra.Client
> prepare_cql3_query/execute_**prepared_cql3_query to create and run
> some
> prepared statements.  It is working well but I am unclear as to how
> long
> the server side 'caches' the prepared statements.  Should a prepared
> statement be prepared for every new Cassandra.Client?  Based on my
> limited testing it seems like I can create some prepared statements in
> one Cassandra.Client and use in another but I am not sure how
> reliable/lasting this is i.e.  If I called the prepared statement again
> the next day would it still exist?  What about if cassandra was
> re-started?
>
> _Background:_
>
> I am creating prepared statements for batch updates of pre-defined
> lengths (e.g. 1, 1000, 500, 250, 50, 10, 1) and wanted to know if
> these could just be set up once.  We felt that using the prepared
> statements was easier than escaping values within a CQL statement and
> probably more performant.
>
> Thanks in advance for your help.
>
>
 I've looked in Cassandra's code (v1.2.3). The cache of prepared
 statements has a size of 100,000. So if you prepare more than 100 thousand
 statements, the least recently used ones will vanish. You'll get the
 exception PreparedQueryNotFoundException**, code 0x2500.

 Regards,
 Sorin



>>>
>>
>


Re: Prepared Statement - cache duration (CQL3 - Cassandra 1.2.4)

2013-04-23 Thread Edward Capriolo
Thrift has a prepare_cql call which returns an ID. Then it has an
execute_cql call which takes the ID and a map of variable bindings.
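
For the record, the round trip with the CQL3 variants of those calls looks
roughly like this (a sketch assuming an open Cassandra.Client; the query
string and values are made up):

    import java.nio.ByteBuffer;
    import java.util.Arrays;
    import java.util.List;
    import org.apache.cassandra.thrift.*;
    import org.apache.cassandra.utils.ByteBufferUtil;

    // Prepare once; the server hands back an item ID for the statement.
    CqlPreparedResult prepared = client.prepare_cql3_query(
            ByteBufferUtil.bytes("UPDATE t SET v = ? WHERE k = ?"),
            Compression.NONE);

    // Execute with the ID plus the bind values, in marker order.
    List<ByteBuffer> bindValues = Arrays.asList(
            ByteBufferUtil.bytes("value"), ByteBufferUtil.bytes("key"));
    CqlResult result = client.execute_prepared_cql3_query(
            prepared.getItemId(), bindValues, ConsistencyLevel.ONE);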


On Tue, Apr 23, 2013 at 10:29 AM, Stuart Broad  wrote:

> Hi all,
>
> I just realised that the binary protocol is the low-level thrift api that
> I was originally using (Cassandra.Client>> get / insert ...).  How can a
> prepared statement be called through the thrift api (i.e. not the cql
> methods)?
>
> Cheers,
>
> Stuart
>
>
> On Tue, Apr 23, 2013 at 11:48 AM, Stuart Broad wrote:
>
>> Hi Sylvain,
>>
>> Thanks for your response.  I am handling the
>> 'PreparedQueryNotFoundException' more for the case of a cassandra re-start
>> (rather then expecting to build 10 statements).
>>
>> I am not familiar with the binary protocol - which class/methods should I
>> look at?
>>
>> Regards,
>>
>> Stuart
>>
>>
>>
>> On Tue, Apr 23, 2013 at 11:29 AM, Sylvain Lebresne 
>> wrote:
>>
>>> In thrift, a lot of exceptions (like PreparedQueryNotFoundException) are
>>> simply returned as InvalidRequestException. The reason for that was a mix
>>> of not wanting to change the thrift API too much and because we didn't knew
>>> how to return a lot of different exception with thrift without making it
>>> horrible to work with. So you'll probably have to parse strings here indeed.
>>>
>>> This will be cleaner/less fragile if you use the binary protocol as
>>> exceptions are more fined grained there.
>>>
>>> Though taking a step back (and without saying that you shouldn't handle
>>> the case where a query is not prepared on the node you contact), if you're
>>> really considering preparing more than 10 statements, I'd suggest that
>>> it might be worth benchmarking whether using prepared statements in your
>>> case is really going to be worth the trouble. Just saying.
>>>
>>> --
>>> Sylvain
>>>
>>>
>>>
>>> On Tue, Apr 23, 2013 at 12:14 PM, Stuart Broad wrote:
>>>
 Hi Sorin,

 The PreparedQueryNotFoundException is not thrown from
 Cassandra.Client>>execute_prepared_cql3_query method.  I created some
 prepared statements and then re-started cassandra and received the
 following exception:

 InvalidRequestException(why: Prepared query with ID 1124421588 not
 found (either the query was not prepared on this host (maybe the host has
 been restarted?) or you have prepared more than 10 queries and queries
 1124421588 has been evicted from the internal cache))

 The best I have been able to come up with is the following:

 try {
 client.execute_prepared_cql3_query(psId, bindValues,
 ..);
 } catch (InvalidRequestException invEx) {
 String why = invEx.getWhy();
 CLogger.logger().warning(why);
 if(why.startsWith("Prepared query with ID")) {
 rebuildPreparedStatement(preparedStatement);
 client.execute_prepared_cql3_query(psId,
 bindValues, ..);
 } else {
 throw invEx;
 }
 }

 Obviously this is pretty fragile and would break if the cassandra
 message was changed...but it least it works for now!

 Cheers,

 Stuart


 On Sun, Apr 21, 2013 at 11:51 AM, Sorin Manolache wrote:

> On 2013-04-19 13:57, Stuart Broad wrote:
>
>> Hi,
>>
>> I am using Cassandra.Client
>> prepare_cql3_query/execute_**prepared_cql3_query to create and run
>> some
>> prepared statements.  It is working well but I am unclear as to how
>> long
>> the server side 'caches' the prepared statements.  Should a prepared
>> statement be prepared for every new Cassandra.Client?  Based on my
>> limited testing it seems like I can create some prepared statements in
>> one Cassandra.Client and use in another but I am not sure how
>> reliable/lasting this is i.e.  If I called the prepared statement
>> again
>> the next day would it still exist?  What about if cassandra was
>> re-started?
>>
>> _Background:_
>>
>> I am creating prepared statements for batch updates of pre-defined
>> lengths (e.g. 1, 1000, 500, 250, 50, 10, 1) and wanted to know if
>> these could just be set up once.  We felt that using the prepared
>> statements was easier than escaping values within a CQL statement and
>> probably more performant.
>>
>> Thanks in advance for your help.
>>
>>
> I've looked in Cassandra's code (v1.2.3). The cache of prepared
> statements has a size of 100,000. So if you prepare more than 100 thousand
> statements, the least recently used ones will vanish. You'll get the
> exception PreparedQueryNotFoundException**, code 0x2500.
>
> Regards,
> Sorin
>
>
>

>>>
>>
>


Re: Prepared Statement - cache duration (CQL3 - Cassandra 1.2.4)

2013-04-23 Thread Stuart Broad
Hi Edward,

Thanks for your reply - I was already using the prepare/execute cql methods
that you suggested.  My problem is that these methods 'mask' the
PreparedQueryNotFoundException as an InvalidRequestException.  At present I
catch the InvalidRequestException (when cassandra has been re-started) and
check the message text to figure out if I need to rebuild the prepared
queries (rather than building each time I call).

Sylvain had suggested that I use the binary protocol as the exceptions are
more explicit so I am trying to determine how this can be done (I don't see
any obvious methods other than the cql ones for calling prepared
statements).

Regards,

Stuart


On Tue, Apr 23, 2013 at 4:05 PM, Edward Capriolo wrote:

> Thrift has a prepare_cql call which returns an ID. Then it has an
> exececute_cql call which takes the id and a map or variable bindings.
>
>
> On Tue, Apr 23, 2013 at 10:29 AM, Stuart Broad wrote:
>
>> Hi all,
>>
>> I just realised that the binary protocol is the low-level thrift api that
>> I was originally using (Cassandra.Client>> get / insert ...).  How can a
>> prepared statement be called through the thrift api (i.e. not the cql
>> methods)?
>>
>> Cheers,
>>
>> Stuart
>>
>>
>> On Tue, Apr 23, 2013 at 11:48 AM, Stuart Broad wrote:
>>
>>> Hi Sylvain,
>>>
>>> Thanks for your response.  I am handling the
>>> 'PreparedQueryNotFoundException' more for the case of a cassandra re-start
>>> (rather then expecting to build 10 statements).
>>>
>>> I am not familiar with the binary protocol - which class/methods should
>>> I look at?
>>>
>>> Regards,
>>>
>>> Stuart
>>>
>>>
>>>
>>> On Tue, Apr 23, 2013 at 11:29 AM, Sylvain Lebresne >> > wrote:
>>>
 In thrift, a lot of exceptions (like PreparedQueryNotFoundException)
 are simply returned as InvalidRequestException. The reason for that was a
 mix of not wanting to change the thrift API too much and because we didn't
 knew how to return a lot of different exception with thrift without making
 it horrible to work with. So you'll probably have to parse strings here
 indeed.

 This will be cleaner/less fragile if you use the binary protocol as
 exceptions are more fined grained there.

 Though taking a step back (and without saying that you shouldn't handle
 the case where a query is not prepared on the node you contact), if you're
 really considering preparing more than 10 statements, I'd suggest that
 it might be worth benchmarking whether using prepared statements in your
 case is really going to be worth the trouble. Just saying.

 --
 Sylvain



 On Tue, Apr 23, 2013 at 12:14 PM, Stuart Broad wrote:

> Hi Sorin,
>
> The PreparedQueryNotFoundException is not thrown from
> Cassandra.Client>>execute_prepared_cql3_query method.  I created some
> prepared statements and then re-started cassandra and received the
> following exception:
>
> InvalidRequestException(why: Prepared query with ID 1124421588 not
> found (either the query was not prepared on this host (maybe the host has
> been restarted?) or you have prepared more than 10 queries and queries
> 1124421588 has been evicted from the internal cache))
>
> The best I have been able to come up with is the following:
>
> try {
> client.execute_prepared_cql3_query(psId, bindValues,
> ..);
> } catch (InvalidRequestException invEx) {
> String why = invEx.getWhy();
> CLogger.logger().warning(why);
> if(why.startsWith("Prepared query with ID")) {
> rebuildPreparedStatement(preparedStatement);
> client.execute_prepared_cql3_query(psId,
> bindValues, ..);
> } else {
> throw invEx;
> }
> }
>
> Obviously this is pretty fragile and would break if the cassandra
> message was changed...but it least it works for now!
>
> Cheers,
>
> Stuart
>
>
> On Sun, Apr 21, 2013 at 11:51 AM, Sorin Manolache wrote:
>
>> On 2013-04-19 13:57, Stuart Broad wrote:
>>
>>> Hi,
>>>
>>> I am using Cassandra.Client
>>> prepare_cql3_query/execute_**prepared_cql3_query to create and run
>>> some
>>> prepared statements.  It is working well but I am unclear as to how
>>> long
>>> the server side 'caches' the prepared statements.  Should a prepared
>>> statement be prepared for every new Cassandra.Client?  Based on my
>>> limited testing it seems like I can create some prepared statements
>>> in
>>> one Cassandra.Client and use in another but I am not sure how
>>> reliable/lasting this is i.e.  If I called the prepared statement
>>> again
>>> the next day would it still exist?  What about if cassandra was
>>> re-started?
>>

Re: Prepared Statement - cache duration (CQL3 - Cassandra 1.2.4)

2013-04-23 Thread Edward Capriolo
Having to catch the exception and parse it is a bit ugly, however this is
close to what someone might do with an SQLException to determine whether the
error was transient, etc.  If there is an error code, it is possible that it
could be added as an optional property of the InvalidRequestException in
future versions.

Switching to the "binary protocol" is not a method in thrift; it means you're
not using thrift at all.




On Tue, Apr 23, 2013 at 11:13 AM, Stuart Broad  wrote:

> Hi Edward,
>
> Thanks for your reply - I was already using the prepare/execute cql
> methods that you suggested.  My problem is that these methods 'mask' the
> PreparedQueryNotFoundException as an InvalidRequestException.  At present I
> catch the InvalidRequestException (when cassandra has been re-started) and
> check the message text to figure out if I need to rebuild the prepared
> queries (rather than building each time I call).
>
> Sylvain had suggested that I use the binary protocol as the exceptions are
> more explicit so I am trying to determine how this can be done (I don't see
> any obvious methods other than the cql ones for calling prepared
> statements).
>
> Regards,
>
> Stuart
>
>
> On Tue, Apr 23, 2013 at 4:05 PM, Edward Capriolo wrote:
>
>> Thrift has a prepare_cql call which returns an ID. Then it has an
>> exececute_cql call which takes the id and a map or variable bindings.
>>
>>
>> On Tue, Apr 23, 2013 at 10:29 AM, Stuart Broad wrote:
>>
>>> Hi all,
>>>
>>> I just realised that the binary protocol is the low-level thrift api
>>> that I was originally using (Cassandra.Client>> get / insert ...).  How can
>>> a prepared statement be called through the thrift api (i.e. not the cql
>>> methods)?
>>>
>>> Cheers,
>>>
>>> Stuart
>>>
>>>
>>> On Tue, Apr 23, 2013 at 11:48 AM, Stuart Broad wrote:
>>>
 Hi Sylvain,

 Thanks for your response.  I am handling the
 'PreparedQueryNotFoundException' more for the case of a cassandra re-start
 (rather then expecting to build 10 statements).

 I am not familiar with the binary protocol - which class/methods should
 I look at?

 Regards,

 Stuart



 On Tue, Apr 23, 2013 at 11:29 AM, Sylvain Lebresne <
 sylv...@datastax.com> wrote:

> In thrift, a lot of exceptions (like PreparedQueryNotFoundException)
> are simply returned as InvalidRequestException. The reason for that was a
> mix of not wanting to change the thrift API too much and because we didn't
> knew how to return a lot of different exception with thrift without making
> it horrible to work with. So you'll probably have to parse strings here
> indeed.
>
> This will be cleaner/less fragile if you use the binary protocol as
> exceptions are more fined grained there.
>
> Though taking a step back (and without saying that you shouldn't
> handle the case where a query is not prepared on the node you contact), if
> you're really considering preparing more than 10 statements, I'd
> suggest that it might be worth benchmarking whether using prepared
> statements in your case is really going to be worth the trouble. Just
> saying.
>
> --
> Sylvain
>
>
>
> On Tue, Apr 23, 2013 at 12:14 PM, Stuart Broad wrote:
>
>> Hi Sorin,
>>
>> The PreparedQueryNotFoundException is not thrown from
>> Cassandra.Client>>execute_prepared_cql3_query method.  I created some
>> prepared statements and then re-started cassandra and received the
>> following exception:
>>
>> InvalidRequestException(why: Prepared query with ID 1124421588 not
>> found (either the query was not prepared on this host (maybe the host has
>> been restarted?) or you have prepared more than 10 queries and 
>> queries
>> 1124421588 has been evicted from the internal cache))
>>
>> The best I have been able to come up with is the following:
>>
>> try {
>> client.execute_prepared_cql3_query(psId, bindValues,
>> ..);
>> } catch (InvalidRequestException invEx) {
>> String why = invEx.getWhy();
>> CLogger.logger().warning(why);
>> if(why.startsWith("Prepared query with ID")) {
>> rebuildPreparedStatement(preparedStatement);
>> client.execute_prepared_cql3_query(psId,
>> bindValues, ..);
>> } else {
>> throw invEx;
>> }
>> }
>>
>> Obviously this is pretty fragile and would break if the cassandra
>> message was changed...but it least it works for now!
>>
>> Cheers,
>>
>> Stuart
>>
>>
>> On Sun, Apr 21, 2013 at 11:51 AM, Sorin Manolache 
>> wrote:
>>
>>> On 2013-04-19 13:57, Stuart Broad wrote:
>>>
 Hi,

 I am using Cassandra.Client
 p

Re: Prepared Statement - cache duration (CQL3 - Cassandra 1.2.4)

2013-04-23 Thread Stuart Broad
Hi Edward,

My understanding was that thrift supports a number of protocols (binary
being one of them).  I don't understand what switching to "binary protocol"
but not using thrift means.  Can you point me to any code examples?

Regards,

Stuart


On Tue, Apr 23, 2013 at 4:21 PM, Edward Capriolo wrote:

> Having to catch the exception and parse it is a bit ugly, however this is
> close to what someone might do with an SQLException to determine if the
> error was transient etc.  If there is an error code it is possible that it
> could be added as an optional property of the InvalidRequestException in
> future versions.
>
> Switching to the "binany protocol" is not a method in thrift, it means
> your not using thrift at all.
>
>
>
>
> On Tue, Apr 23, 2013 at 11:13 AM, Stuart Broad wrote:
>
>> Hi Edward,
>>
>> Thanks for your reply - I was already using the prepare/execute cql
>> methods that you suggested.  My problem is that these methods 'mask' the
>> PreparedQueryNotFoundException as an InvalidRequestException.  At present I
>> catch the InvalidRequestException (when cassandra has been re-started) and
>> check the message text to figure out if I need to rebuild the prepared
>> queries (rather than building each time I call).
>>
>> Sylvain had suggested that I use the binary protocol as the exceptions
>> are more explicit so I am trying to determine how this can be done (I don't
>> see any obvious methods other than the cql ones for calling prepared
>> statements).
>>
>> Regards,
>>
>> Stuart
>>
>>
>> On Tue, Apr 23, 2013 at 4:05 PM, Edward Capriolo 
>> wrote:
>>
>>> Thrift has a prepare_cql call which returns an ID. Then it has an
>>> exececute_cql call which takes the id and a map or variable bindings.
>>>
>>>
>>> On Tue, Apr 23, 2013 at 10:29 AM, Stuart Broad wrote:
>>>
 Hi all,

 I just realised that the binary protocol is the low-level thrift api
 that I was originally using (Cassandra.Client>> get / insert ...).  How can
 a prepared statement be called through the thrift api (i.e. not the cql
 methods)?

 Cheers,

 Stuart


 On Tue, Apr 23, 2013 at 11:48 AM, Stuart Broad wrote:

> Hi Sylvain,
>
> Thanks for your response.  I am handling the
> 'PreparedQueryNotFoundException' more for the case of a cassandra re-start
> (rather then expecting to build 10 statements).
>
> I am not familiar with the binary protocol - which class/methods
> should I look at?
>
> Regards,
>
> Stuart
>
>
>
> On Tue, Apr 23, 2013 at 11:29 AM, Sylvain Lebresne <
> sylv...@datastax.com> wrote:
>
>> In thrift, a lot of exceptions (like PreparedQueryNotFoundException)
>> are simply returned as InvalidRequestException. The reason for that was a
>> mix of not wanting to change the thrift API too much and because we 
>> didn't
>> knew how to return a lot of different exception with thrift without 
>> making
>> it horrible to work with. So you'll probably have to parse strings here
>> indeed.
>>
>> This will be cleaner/less fragile if you use the binary protocol as
>> exceptions are more fined grained there.
>>
>> Though taking a step back (and without saying that you shouldn't
>> handle the case where a query is not prepared on the node you contact), 
>> if
>> you're really considering preparing more than 10 statements, I'd
>> suggest that it might be worth benchmarking whether using prepared
>> statements in your case is really going to be worth the trouble. Just
>> saying.
>>
>> --
>> Sylvain
>>
>>
>>
>> On Tue, Apr 23, 2013 at 12:14 PM, Stuart Broad 
>> wrote:
>>
>>> Hi Sorin,
>>>
>>> The PreparedQueryNotFoundException is not thrown from
>>> Cassandra.Client>>execute_prepared_cql3_query method.  I created some
>>> prepared statements and then re-started cassandra and received the
>>> following exception:
>>>
>>> InvalidRequestException(why: Prepared query with ID 1124421588 not
>>> found (either the query was not prepared on this host (maybe the host 
>>> has
>>> been restarted?) or you have prepared more than 10 queries and 
>>> queries
>>> 1124421588 has been evicted from the internal cache))
>>>
>>> The best I have been able to come up with is the following:
>>>
>>> try {
>>> client.execute_prepared_cql3_query(psId, bindValues,
>>> ..);
>>> } catch (InvalidRequestException invEx) {
>>> String why = invEx.getWhy();
>>> CLogger.logger().warning(why);
>>> if(why.startsWith("Prepared query with ID")) {
>>> rebuildPreparedStatement(preparedStatement);
>>> client.execute_prepared_cql3_query(psId,
>>> bindValues, ..);
>>> } else {
>>> 

Re: Advice on memory warning

2013-04-23 Thread Ralph Goers
We are using DSE, which I believe is also 1.1.9.  We have basically had a 
non-usable cluster for months due to this error.  In our case, once it starts 
doing this it starts flushing memtables to disk and eventually fills up the disk 
to the point where it can't compact.  If we catch it soon enough and restart 
the node, it can usually recover.

In our case, the heap size is 12 GB. As I understand it, Cassandra will give 1/3 
of that to memtables. I then noticed that we have one column family that is 
using nearly 4GB in bloom filters on each node.  Since the nodes will start 
doing this when the heap reaches 9GB, we essentially have only 1GB of free 
memory, so when compactions, cleanups, etc. take place this situation starts 
happening.  We are working to change our data model to try to resolve this.
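
As a sanity check on those bloom filter numbers, the standard formula
m/n = -ln(p) / (ln 2)^2 gives bits per key for a target false-positive
chance p. A rough sketch of the math (0.000744 as the pre-1.2 default fp
chance is our assumption; check your column family settings):

    // Rough bloom filter sizing math, not a Cassandra API.
    public class BloomMath {
        public static void main(String[] args) {
            double p = 0.000744; // assumed default false-positive chance
            double bitsPerKey = -Math.log(p) / Math.pow(Math.log(2), 2); // ~15
            double keysIn4GB = 4.0 * 8 * 1024 * 1024 * 1024 / bitsPerKey;
            System.out.printf("%.1f bits/key => ~%.2e keys in a 4GB filter%n",
                              bitsPerKey, keysIn4GB);
        }
    }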

Ralph 

On Apr 19, 2013, at 8:00 AM, Michael Theroux wrote:

> Hello,
> 
> We've recently upgraded from m1.large to m1.xlarge instances on AWS to handle 
> additional load, but to also relieve memory pressure.  It appears to have 
> accomplished both, however, we are still getting a warning, 0-3 times a day, 
> on our database nodes:
> 
> WARN [ScheduledTasks:1] 2013-04-19 14:17:46,532 GCInspector.java (line 145) 
> Heap is 0.7529240824406468 full.  You may need to reduce memtable and/or 
> cache sizes.  Cassandra will now flush up to the two largest memtables to 
> free up memory.  Adjust flush_largest_memtables_at threshold in 
> cassandra.yaml if you don't want Cassandra to do this automatically
> 
> This is happening much less frequently than before the upgrade, but after 
> essentially doubling the amount of available memory, I'm curious about what I 
> can do to determine what is happening during this time.  
> 
> I am collecting all the JMX statistics.  Memtable space is elevated but not 
> extraordinarily high.  No GC messages are being output to the log.   
> 
> These warnings do seem to be occurring doing compactions of column families 
> using LCS with wide rows, but I'm not sure there is a direct correlation.
> 
> We are running Cassandra 1.1.9, with a maximum heap of 8G.  
> 
> Any advice?
> Thanks,
> -Mike



Re: Prepared Statement - cache duration (CQL3 - Cassandra 1.2.4)

2013-04-23 Thread Sylvain Lebresne
When we speak of the "binary protocol", we mean the protocol introduced
in Cassandra 1.2 that is an alternative to thrift for CQL3. It's a custom
binary protocol that has no link to thrift whatsoever.

That protocol is defined by the document here:
https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=doc/native_protocol_v1.spec;hb=HEAD

Of course, this is just a protocol, and unless you have the time and
willingness to write a proper library using that protocol, you should just
use an existing driver implementing it. If you are using Java (some of your
examples above seem to be in Java), then you could for instance pick
https://github.com/datastax/java-driver. If you're not using Java, then,
well, since said protocol is fairly recent there isn't an existing driver
for every language, but a bunch of drivers are in the works.

That being said, I'm not saying you *should* use a driver that uses the
binary protocol, just that, at least for exception handling, said binary
protocol allows slightly cleaner handling than what's available
through thrift. I'll note that even if you do want to use thrift, it's
usually advised to use a high-level client rather than raw thrift. Unless
you have no choice, or like suffering, that is.
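
For illustration, prepared statements through that driver look roughly like
the sketch below (class names assume an early 1.x java-driver; treat it as a
sketch, not gospel):

    import com.datastax.driver.core.*;

    // Connect over the binary protocol and prepare/execute a statement.
    Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
    Session session = cluster.connect("my_keyspace");
    PreparedStatement ps = session.prepare("UPDATE t SET v = ? WHERE k = ?");
    // Eviction/restart cases surface as typed exceptions rather than a
    // message string to parse.
    session.execute(ps.bind("value", "key"));
    cluster.shutdown();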

--
Sylvain


On Tue, Apr 23, 2013 at 5:38 PM, Stuart Broad  wrote:

> Hi Edward,
>
> My understanding was that thrift supports a number of protocols (binary
> being one of them).  I don't understand what switching to "binary protocol"
> but not using thrift means.  Can you point me to any code examples?
>
> Regards,
>
> Stuart
>
>
> On Tue, Apr 23, 2013 at 4:21 PM, Edward Capriolo wrote:
>
>> Having to catch the exception and parse it is a bit ugly, however this is
>> close to what someone might do with an SQLException to determine if the
>> error was transient etc.  If there is an error code it is possible that it
>> could be added as an optional property of the InvalidRequestException in
>> future versions.
>>
>> Switching to the "binany protocol" is not a method in thrift, it means
>> your not using thrift at all.
>>
>>
>>
>>
>> On Tue, Apr 23, 2013 at 11:13 AM, Stuart Broad wrote:
>>
>>> Hi Edward,
>>>
>>> Thanks for your reply - I was already using the prepare/execute cql
>>> methods that you suggested.  My problem is that these methods 'mask' the
>>> PreparedQueryNotFoundException as an InvalidRequestException.  At present I
>>> catch the InvalidRequestException (when cassandra has been re-started) and
>>> check the message text to figure out if I need to rebuild the prepared
>>> queries (rather than building each time I call).
>>>
>>> Sylvain had suggested that I use the binary protocol as the exceptions
>>> are more explicit so I am trying to determine how this can be done (I don't
>>> see any obvious methods other than the cql ones for calling prepared
>>> statements).
>>>
>>> Regards,
>>>
>>> Stuart
>>>
>>>
>>> On Tue, Apr 23, 2013 at 4:05 PM, Edward Capriolo 
>>> wrote:
>>>
 Thrift has a prepare_cql call which returns an ID. Then it has an
 exececute_cql call which takes the id and a map or variable bindings.


 On Tue, Apr 23, 2013 at 10:29 AM, Stuart Broad wrote:

> Hi all,
>
> I just realised that the binary protocol is the low-level thrift api
> that I was originally using (Cassandra.Client>> get / insert ...).  How 
> can
> a prepared statement be called through the thrift api (i.e. not the cql
> methods)?
>
> Cheers,
>
> Stuart
>
>
> On Tue, Apr 23, 2013 at 11:48 AM, Stuart Broad wrote:
>
>> Hi Sylvain,
>>
>> Thanks for your response.  I am handling the
>> 'PreparedQueryNotFoundException' more for the case of a cassandra 
>> re-start
>> (rather then expecting to build 10 statements).
>>
>> I am not familiar with the binary protocol - which class/methods
>> should I look at?
>>
>> Regards,
>>
>> Stuart
>>
>>
>>
>> On Tue, Apr 23, 2013 at 11:29 AM, Sylvain Lebresne <
>> sylv...@datastax.com> wrote:
>>
>>> In thrift, a lot of exceptions (like PreparedQueryNotFoundException)
>>> are simply returned as InvalidRequestException. The reason for that was 
>>> a
>>> mix of not wanting to change the thrift API too much and because we 
>>> didn't
>>> knew how to return a lot of different exception with thrift without 
>>> making
>>> it horrible to work with. So you'll probably have to parse strings here
>>> indeed.
>>>
>>> This will be cleaner/less fragile if you use the binary protocol as
>>> exceptions are more fined grained there.
>>>
>>> Though taking a step back (and without saying that you shouldn't
>>> handle the case where a query is not prepared on the node you contact), 
>>> if
>>> you're really considering preparing more than 10 statements, I'd
>>> suggest that it might be worth benchmarking whether using prepar

Re: Prepared Statement - cache duration (CQL3 - Cassandra 1.2.4)

2013-04-23 Thread Stuart Broad
Aha - got it.  Thanks for everyone's help.

I think I will stick with the prepare/execute CQL (with the
InvalidRequestException check) for now.  I will take a look at the driver
you mentioned, though.

Cheers,

Stuart


On Tue, Apr 23, 2013 at 4:55 PM, Sylvain Lebresne wrote:

> When we speak of "binary protocol", we talk about the protocol introduced
> in Cassandra 1.2 that is an alternative to thrift for CQL3. It's a custom,
> binary, protocol, that has not link to thrift whatsoever.
>
> That protocol is defined by the document here:
> https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=doc/native_protocol_v1.spec;hb=HEAD
>
> Of course, this is just a protocol, and unless you have the time and
> willingness to write a proper library using that protocol, you should just
> use an existing driver implementing it. If you are using Java (some of your
> example above seems to be in Java), then you could for instance pick
> https://github.com/datastax/java-driver. If you're not using java, then
> well, since said protocol is fairly recent, there isn't an existing driver
> for every languages, but a bunch of drivers are in the work.
>
> That being said, I'm not saying you *should* use a driver that uses the
> binary protocol, just that at least for exceptions handling, said binary
> protocol has a slightly cleaner handling of them than what's available
> through thrift. I'll not that even if you do want to use thrift, it's
> usually advised to use a high level client rather than raw thrift. Unless
> you have no choice or like suffering that is.
>
> --
> Sylvain
>
>
> On Tue, Apr 23, 2013 at 5:38 PM, Stuart Broad  wrote:
>
>> Hi Edward,
>>
>> My understanding was that thrift supports a number of protocols (binary
>> being one of them).  I don't understand what switching to "binary protocol"
>> but not using thrift means.  Can you point me to any code examples?
>>
>> Regards,
>>
>> Stuart
>>
>>
>> On Tue, Apr 23, 2013 at 4:21 PM, Edward Capriolo 
>> wrote:
>>
>>> Having to catch the exception and parse it is a bit ugly, however this
>>> is close to what someone might do with an SQLException to determine if the
>>> error was transient etc.  If there is an error code it is possible that it
>>> could be added as an optional property of the InvalidRequestException in
>>> future versions.
>>>
>>> Switching to the "binany protocol" is not a method in thrift, it means
>>> your not using thrift at all.
>>>
>>>
>>>
>>>
>>> On Tue, Apr 23, 2013 at 11:13 AM, Stuart Broad wrote:
>>>
 Hi Edward,

 Thanks for your reply - I was already using the prepare/execute cql
 methods that you suggested.  My problem is that these methods 'mask' the
 PreparedQueryNotFoundException as an InvalidRequestException.  At present I
 catch the InvalidRequestException (when cassandra has been re-started) and
 check the message text to figure out if I need to rebuild the prepared
 queries (rather than building each time I call).

 Sylvain had suggested that I use the binary protocol as the exceptions
 are more explicit so I am trying to determine how this can be done (I don't
 see any obvious methods other than the cql ones for calling prepared
 statements).

 Regards,

 Stuart


 On Tue, Apr 23, 2013 at 4:05 PM, Edward Capriolo >>> > wrote:

> Thrift has a prepare_cql call which returns an ID. Then it has an
> exececute_cql call which takes the id and a map or variable bindings.
>
>
> On Tue, Apr 23, 2013 at 10:29 AM, Stuart Broad wrote:
>
>> Hi all,
>>
>> I just realised that the binary protocol is the low-level thrift api
>> that I was originally using (Cassandra.Client>> get / insert ...).  How 
>> can
>> a prepared statement be called through the thrift api (i.e. not the cql
>> methods)?
>>
>> Cheers,
>>
>> Stuart
>>
>>
>> On Tue, Apr 23, 2013 at 11:48 AM, Stuart Broad 
>> wrote:
>>
>>> Hi Sylvain,
>>>
>>> Thanks for your response.  I am handling the
>>> 'PreparedQueryNotFoundException' more for the case of a cassandra 
>>> re-start
>>> (rather then expecting to build 10 statements).
>>>
>>> I am not familiar with the binary protocol - which class/methods
>>> should I look at?
>>>
>>> Regards,
>>>
>>> Stuart
>>>
>>>
>>>
>>> On Tue, Apr 23, 2013 at 11:29 AM, Sylvain Lebresne <
>>> sylv...@datastax.com> wrote:
>>>
 In thrift, a lot of exceptions (like
 PreparedQueryNotFoundException) are simply returned as
 InvalidRequestException. The reason for that was a mix of not wanting 
 to
 change the thrift API too much and because we didn't knew how to 
 return a
 lot of different exception with thrift without making it horrible to 
 work
 with. So you'll probably have to parse strings here indeed.

 T

Re: Prepared Statement - cache duration (CQL3 - Cassandra 1.2.4)

2013-04-23 Thread Hiller, Dean
Out of curiosity, why did cassandra choose to re-invent the wheel instead of 
using something like google protobuf, which spans multiple languages?  I see it 
as a step better than thrift since it really only defines the message format 
and has all sorts of goodies with it.  I think you only need to frame it, and 
that may already exist as well actually, but I can't remember.

Also, it would have made other drivers easier to create, as 
serialization/deserialization would already be there, since protobuf has all the 
generation done for you.

Lastly, does the java-driver have an async nature to it at all?  It would be 
nice to be able to call driver.put(myData, myCallbackSuccessHandler); which 
would return immediately so I can process my next batch of stuff... extremely 
good in batching, of course, where I don't need an immediate success response.
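
Something like this callback pattern is what I mean, sketched against what I
understand an early 1.x java-driver to expose (executeAsync returning a Guava
ListenableFuture; the Session and PreparedStatement are assumed to exist):

    import com.datastax.driver.core.*;
    import com.google.common.util.concurrent.FutureCallback;
    import com.google.common.util.concurrent.Futures;

    // Fire the write and register a callback instead of blocking.
    ResultSetFuture future = session.executeAsync(ps.bind("value", "key"));
    Futures.addCallback(future, new FutureCallback<ResultSet>() {
        public void onSuccess(ResultSet rs) { /* kick off the next batch */ }
        public void onFailure(Throwable t)  { /* log and retry */ }
    });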

Later,
Dean

From: Sylvain Lebresne <sylv...@datastax.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, April 23, 2013 9:55 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Prepared Statement - cache duration (CQL3 - Cassandra 1.2.4)

When we speak of "binary protocol", we talk about the protocol introduced in 
Cassandra 1.2 that is an alternative to thrift for CQL3. It's a custom, binary, 
protocol, that has not link to thrift whatsoever.

That protocol is defined by the document here: 
https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=doc/native_protocol_v1.spec;hb=HEAD

Of course, this is just a protocol, and unless you have the time and 
willingness to write a proper library using that protocol, you should just use 
an existing driver implementing it. If you are using Java (some of your example 
above seems to be in Java), then you could for instance pick 
https://github.com/datastax/java-driver. If you're not using java, then well, 
since said protocol is fairly recent, there isn't an existing driver for every 
languages, but a bunch of drivers are in the work.

That being said, I'm not saying you *should* use a driver that uses the binary 
protocol, just that at least for exceptions handling, said binary protocol has 
a slightly cleaner handling of them than what's available through thrift. I'll 
not that even if you do want to use thrift, it's usually advised to use a high 
level client rather than raw thrift. Unless you have no choice or like 
suffering that is.

--
Sylvain


On Tue, Apr 23, 2013 at 5:38 PM, Stuart Broad <stu...@moogsoft.com> wrote:
Hi Edward,

My understanding was that thrift supports a number of protocols (binary being 
one of them).  I don't understand what switching to "binary protocol" but not 
using thrift means.  Can you point me to any code examples?

Regards,

Stuart


On Tue, Apr 23, 2013 at 4:21 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
Having to catch the exception and parse it is a bit ugly, however this is close 
to what someone might do with an SQLException to determine if the error was 
transient etc.  If there is an error code it is possible that it could be added 
as an optional property of the InvalidRequestException in future versions.

Switching to the "binany protocol" is not a method in thrift, it means your not 
using thrift at all.




On Tue, Apr 23, 2013 at 11:13 AM, Stuart Broad <stu...@moogsoft.com> wrote:
Hi Edward,

Thanks for your reply - I was already using the prepare/execute cql methods 
that you suggested.  My problem is that these methods 'mask' the 
PreparedQueryNotFoundException as an InvalidRequestException.  At present I 
catch the InvalidRequestException (when cassandra has been re-started) and 
check the message text to figure out if I need to rebuild the prepared queries 
(rather than building each time I call).
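
A minimal sketch of that fallback, assuming the Cassandra 1.2 thrift API (the 
query string is illustrative, and the matched message text is an assumption - 
the exact wording is version dependent):

import java.nio.ByteBuffer;
import java.util.List;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Compression;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.CqlResult;
import org.apache.cassandra.thrift.InvalidRequestException;

public final class ReprepareOnRestart {
    static final String QUERY = "SELECT * FROM profile WHERE id = ?";

    // Execute a prepared statement; after a server restart the prepared
    // statement cache is empty, and thrift masks PreparedQueryNotFoundException
    // as InvalidRequestException, so sniff the message, re-prepare, retry once.
    static CqlResult execute(Cassandra.Client client, int stmtId, List<ByteBuffer> vars)
            throws Exception {
        try {
            return client.execute_prepared_cql3_query(stmtId, vars, ConsistencyLevel.ONE);
        } catch (InvalidRequestException e) {
            if (e.getWhy() != null && e.getWhy().contains("Prepared query")) {
                int newId = client.prepare_cql3_query(
                        ByteBuffer.wrap(QUERY.getBytes("UTF-8")),
                        Compression.NONE).getItemId();
                return client.execute_prepared_cql3_query(newId, vars, ConsistencyLevel.ONE);
            }
            throw e;
        }
    }
}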

Sylvain had suggested that I use the binary protocol as the exceptions are more 
explicit so I am trying to determine how this can be done (I don't see any 
obvious methods other than the cql ones for calling prepared statements).

Regards,

Stuart


On Tue, Apr 23, 2013 at 4:05 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
Thrift has a prepare_cql call which returns an ID. Then it has an execute_cql 
call which takes the id and a map of variable bindings.


On Tue, Apr 23, 2013 at 10:29 AM, Stuart Broad <stu...@moogsoft.com> wrote:
Hi all,

I just realised that the binary protocol is the low-level thrift api that I was 
originally using (Cassandra.Client>> get / insert ...).  How can a prepared 
statement be called through the thrift api (i.e. not the cql methods)?

Cheers,

Stuart


On Tue, Apr 23, 2013 at 11:48 AM, Stuart Broad <stu...@moogsoft.com> wrote:
Hi Sylvain,

Thanks for your response.  I am handling the 'PreparedQueryNotFoundException' 
more for the case of a cassandra re-start (rather then expecting to build 
10 statements).

I am not famili

Re: Prepared Statement - cache duration (CQL3 - Cassandra 1.2.4)

2013-04-23 Thread Edward Capriolo
Cassandra has a non thrift protocol called the "native protocol" aka "cql
binary protocol"

http://www.datastax.com/docs/1.2/cql_cli/cql_binary_protocol

It is its own port, with its own protocol, and it does not have thrift
methods.

In my opinion, switching from the thrift to the native protocol only to
catch the error code is overkill. I bet it would not be impossible to add
the error code to the thrift method somehow, since we have added in the
exception a field about how many replicas an operation succeeded on in the
past. However, it seems now that even though things could be added to
thrift, no one is interested in doing it, or getting the feature added to
thrift lags behind getting it added to cql.


On Tue, Apr 23, 2013 at 11:38 AM, Stuart Broad  wrote:

> Hi Edward,
>
> My understanding was that thrift supports a number of protocols (binary
> being one of them).  I don't understand what switching to "binary protocol"
> but not using thrift means.  Can you point me to any code examples?
>
> Regards,
>
> Stuart
>
>
> On Tue, Apr 23, 2013 at 4:21 PM, Edward Capriolo wrote:
>
>> Having to catch the exception and parse it is a bit ugly, however this is
>> close to what someone might do with an SQLException to determine if the
>> error was transient etc.  If there is an error code it is possible that it
>> could be added as an optional property of the InvalidRequestException in
>> future versions.
>>
>> Switching to the "binany protocol" is not a method in thrift, it means
>> your not using thrift at all.
>>
>>
>>
>>
>> On Tue, Apr 23, 2013 at 11:13 AM, Stuart Broad wrote:
>>
>>> Hi Edward,
>>>
>>> Thanks for your reply - I was already using the prepare/execute cql
>>> methods that you suggested.  My problem is that these methods 'mask' the
>>> PreparedQueryNotFoundException as an InvalidRequestException.  At present I
>>> catch the InvalidRequestException (when cassandra has been re-started) and
>>> check the message text to figure out if I need to rebuild the prepared
>>> queries (rather than building each time I call).
>>>
>>> Sylvain had suggested that I use the binary protocol as the exceptions
>>> are more explicit so I am trying to determine how this can be done (I don't
>>> see any obvious methods other than the cql ones for calling prepared
>>> statements).
>>>
>>> Regards,
>>>
>>> Stuart
>>>
>>>
>>> On Tue, Apr 23, 2013 at 4:05 PM, Edward Capriolo 
>>> wrote:
>>>
 Thrift has a prepare_cql call which returns an ID. Then it has an
 execute_cql call which takes the id and a map of variable bindings.


 On Tue, Apr 23, 2013 at 10:29 AM, Stuart Broad wrote:

> Hi all,
>
> I just realised that the binary protocol is the low-level thrift api
> that I was originally using (Cassandra.Client>> get / insert ...).  How 
> can
> a prepared statement be called through the thrift api (i.e. not the cql
> methods)?
>
> Cheers,
>
> Stuart
>
>
> On Tue, Apr 23, 2013 at 11:48 AM, Stuart Broad wrote:
>
>> Hi Sylvain,
>>
>> Thanks for your response.  I am handling the
>> 'PreparedQueryNotFoundException' more for the case of a cassandra 
>> re-start
>> (rather than expecting to build 10 statements).
>>
>> I am not familiar with the binary protocol - which class/methods
>> should I look at?
>>
>> Regards,
>>
>> Stuart
>>
>>
>>
>> On Tue, Apr 23, 2013 at 11:29 AM, Sylvain Lebresne <
>> sylv...@datastax.com> wrote:
>>
>>> In thrift, a lot of exceptions (like PreparedQueryNotFoundException)
>>> are simply returned as InvalidRequestException. The reason for that was a
>>> mix of not wanting to change the thrift API too much and because we didn't
>>> know how to return a lot of different exceptions with thrift without making
>>> it horrible to work with. So you'll probably have to parse strings here
>>> indeed.
>>>
>>> This will be cleaner/less fragile if you use the binary protocol as
>>> exceptions are more fined grained there.
>>>
>>> Though taking a step back (and without saying that you shouldn't
>>> handle the case where a query is not prepared on the node you contact), 
>>> if
>>> you're really considering preparing more than 10 statements, I'd
>>> suggest that it might be worth benchmarking whether using prepared
>>> statements in your case is really going to be worth the trouble. Just
>>> saying.
>>>
>>> --
>>> Sylvain
>>>
>>>
>>>
>>> On Tue, Apr 23, 2013 at 12:14 PM, Stuart Broad 
>>> wrote:
>>>
 Hi Sorin,

 The PreparedQueryNotFoundException is not thrown from
 Cassandra.Client>>execute_prepared_cql3_query method.  I created some
 prepared statements and then re-started cassandra and received the
 following exception:

 InvalidRequestException(why: Pre

Re: Advice on memory warning

2013-04-23 Thread Haithem Jarraya
We are facing a similar issue, and we are not able to keep the ring stable.
 We are using C* 1.2.3 on CentOS 6, 32GB RAM, 8GB heap, 6 nodes.
The total data is ~84GB (which is relatively small for C* to handle, with a
RF of 3).  Our application is heavy on reads; we see the GC complaints on all
nodes, and I copied and pasted the output below.
We also usually see much larger values for the pending ReadStage; not sure
what the best advice for this is.

Thanks,

Haithem

INFO [ScheduledTasks:1] 2013-04-23 16:40:02,118 GCInspector.java (line 119)
GC for ConcurrentMarkSweep: 911 ms for 1 collections, 5945542968 used; max
is 8199471104
 INFO [ScheduledTasks:1] 2013-04-23 16:40:16,051 GCInspector.java (line
119) GC for ConcurrentMarkSweep: 322 ms for 1 collections, 5639896576 used;
max is 8199471104
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,829 GCInspector.java (line
119) GC for ConcurrentMarkSweep: 2273 ms for 1 collections, 6762618136
used; max is 8199471104
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,830 StatusLogger.java (line
53) Pool Name                    Active   Pending   Blocked
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,830 StatusLogger.java (line
68) ReadStage 4 4 0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line
68) RequestResponseStage  1 6 0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line
68) ReadRepairStage   0 0 0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line
68) MutationStage 0 0 0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line
68) ReplicateOnWriteStage 0 0 0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line
68) GossipStage   0 0 0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line
68) AntiEntropyStage  0 0 0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line
68) MigrationStage0 0 0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line
68) MemtablePostFlusher   0 0 0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,833 StatusLogger.java (line
68) FlushWriter   0 0 0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,833 StatusLogger.java (line
68) MiscStage 0 0 0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,833 StatusLogger.java (line
68) commitlog_archiver0 0 0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,834 StatusLogger.java (line
68) InternalResponseStage 0 0 0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,834 StatusLogger.java (line
68) AntiEntropySessions   0 0 0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,834 StatusLogger.java (line
68) HintedHandoff 0 0 0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,843 StatusLogger.java (line
73) CompactionManager 0 0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line
85) MessagingService                n/a      15,1
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line
95) Cache Type Size Capacity
KeysToSave
Provider
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line
96) KeyCache                  251658064        251658081
   all
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line
102) RowCache  00
   all
 org.apache.cassandra.cache.SerializingCacheProvider
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line
109) ColumnFamily                Memtable ops,data
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,845 StatusLogger.java (line
112) system.local  0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,845 StatusLogger.java (line
112) system.peers  0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,845 StatusLogger.java (line
112) system.batchlog   0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,845 StatusLogger.java (line
112) system.NodeIdInfo 0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line
112) system.LocationInfo   0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line
112) system.Schema 0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line
112) system.Migrations 0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line
112) system.schema_keyspaces   0,0
 INFO [ScheduledTasks:1] 2013

Re: Prepared Statement - cache duration (CQL3 - Cassandra 1.2.4)

2013-04-23 Thread Sylvain Lebresne
On Tue, Apr 23, 2013 at 6:02 PM, Hiller, Dean  wrote:

> Out of curiosity, why did cassandra choose to re-invent the wheel instead
> of using something like google protobuf which spans multiple languages?

I see it as a step better than thrift since it is really only defining
> message format and has all sorts of goodies with it.  I think you only need
> to frame it and that may exist already as well actually but I can't
> remember.
>

The serialization/deserialization involved in the binary protocol is not a
big deal tbh, so I guess we chose to avoid the dependency. I personally
don't think using protobufs would have simplified things much in practice,
I don't think there is that much wheel reinventing, and so I'm reasonably
happy to have something tailored to your needs. I'll admit there is some
subjectivity in that opinion however, your mileage may vary.

Lastly, does the java-driver have an asynch nature to it at all?


The java driver is completely asynchronous, from the protocol to its
implementation, so yes.

 It would be nice to be able to call driver.put(myData,
> myCallbackSuccessHandler);


If you look at the driver API, its execute method returns a future,
which happens to extend guava's ListenableFuture, and so you can add a
callback/listener through that.
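
To make that concrete, a small sketch using the driver's async API (the query 
and the callback bodies are made up for illustration; 'session' is assumed to 
be an already-connected Session):

import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;

public final class AsyncWriteExample {
    static void insertAsync(Session session) {
        // executeAsync returns immediately; the ResultSetFuture is a guava
        // ListenableFuture<ResultSet>, so a callback can be attached to it.
        ResultSetFuture f = session.executeAsync(
                "INSERT INTO profile (id, info) VALUES ('12345', 'hello')");
        Futures.addCallback(f, new FutureCallback<ResultSet>() {
            public void onSuccess(ResultSet rs) {
                // the next batch can already be in flight at this point
            }
            public void onFailure(Throwable t) {
                // log and/or retry
            }
        });
    }
}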

--
Sylvain




> From: Sylvain Lebresne <sylv...@datastax.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Tuesday, April 23, 2013 9:55 AM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: Prepared Statement - cache duration (CQL3 - Cassandra 1.2.4)
>
> When we speak of "binary protocol", we talk about the protocol introduced
> in Cassandra 1.2 that is an alternative to thrift for CQL3. It's a custom
> binary protocol that has no link to thrift whatsoever.
>
> That protocol is defined by the document here:
> https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=doc/native_protocol_v1.spec;hb=HEAD
>
> Of course, this is just a protocol, and unless you have the time and
> willingness to write a proper library using that protocol, you should just
> use an existing driver implementing it. If you are using Java (some of your
> examples above seem to be in Java), then you could for instance pick
> https://github.com/datastax/java-driver. If you're not using java, then
> well, since said protocol is fairly recent, there isn't an existing driver
> for every language, but a bunch of drivers are in the works.
>
> That being said, I'm not saying you *should* use a driver that uses the
> binary protocol, just that at least for exceptions handling, said binary
> protocol has a slightly cleaner handling of them than what's available
> through thrift. I'll note that even if you do want to use thrift, it's
> usually advised to use a high level client rather than raw thrift. Unless
> you have no choice or like suffering that is.
>
> --
> Sylvain
>
>
> On Tue, Apr 23, 2013 at 5:38 PM, Stuart Broad <stu...@moogsoft.com> wrote:
> Hi Edward,
>
> My understanding was that thrift supports a number of protocols (binary
> being one of them).  I don't understand what switching to "binary protocol"
> but not using thrift means.  Can you point me to any code examples?
>
> Regards,
>
> Stuart
>
>
> On Tue, Apr 23, 2013 at 4:21 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> Having to catch the exception and parse it is a bit ugly, however this is
> close to what someone might do with an SQLException to determine if the
> error was transient etc.  If there is an error code it is possible that it
> could be added as an optional property of the InvalidRequestException in
> future versions.
>
> Switching to the "binany protocol" is not a method in thrift, it means
> your not using thrift at all.
>
>
>
>
> On Tue, Apr 23, 2013 at 11:13 AM, Stuart Broad <stu...@moogsoft.com> wrote:
> Hi Edward,
>
> Thanks for your reply - I was already using the prepare/execute cql
> methods that you suggested.  My problem is that these methods 'mask' the
> PreparedQueryNotFoundException as an InvalidRequestException.  At present I
> catch the InvalidRequestException (when cassandra has been re-started) and
> check the message text to figure out if I need to rebuild the prepared
> queries (rather than building each time I call).
>
> Sylvain had suggested that I use the binary protocol as the exceptions are
> more explicit so I am trying to determine how this can be done (I don't see
> any obvious methods other than the cql ones for calling prepared
> statements).
>
> Regards,
>
> Stuart
>
>
> On Tue, Apr 23, 2013 at 4:05 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> Thrift has a prepare_cql call which returns an ID. Then it has an
> execute_cql call which takes the id and a map of variable bindings.
>
>
> On Tue, Apr 23,

Re: Prepared Statement - cache duration (CQL3 - Cassandra 1.2.4)

2013-04-23 Thread Hiller, Dean
Nice,
Thanks,
Dean

From: Sylvain Lebresne <sylv...@datastax.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, April 23, 2013 11:31 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Prepared Statement - cache duration (CQL3 - Cassandra 1.2.4)

On Tue, Apr 23, 2013 at 6:02 PM, Hiller, Dean <dean.hil...@nrel.gov> wrote:
Out of curiosity, why did cassandra choose to re-invent the wheel instead of 
using something like google protobuf which spans multiple languages?
I see it as a step better than thrift since it is really only defining message 
format and has all sorts of goodies with it.  I think you only need to frame it 
and that may exist already as well actually but I can't remember.

The serialization/deserialization involved in the binary protocol is not a big 
deal tbh, so I guess we chose to avoid the dependency. I personally don't think 
using protobufs would have simplified things much in practice, I don't think 
there is that much wheel reinventing, and so I'm reasonably happy to have 
something tailored to your needs. I'll admit there is some subjectivity in that 
opinion however, your mileage may vary.

Lastly, does the java-driver have an asynch nature to it at all?

The java driver is completely asynchronous, from the protocol to its 
implementation, so yes.

 It would be nice to be able to call driver.put(myData, 
myCallbackSuccessHandler);

If you look at the driver API, its execute method returns a future, which 
happens to extend guava's ListenableFuture, and so you can add a 
callback/listener through that.

--
Sylvain



From: Sylvain Lebresne <sylv...@datastax.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, April 23, 2013 9:55 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Prepared Statement - cache duration (CQL3 - Cassandra 1.2.4)

When we speak of "binary protocol", we talk about the protocol introduced in 
Cassandra 1.2 that is an alternative to thrift for CQL3. It's a custom binary 
protocol that has no link to thrift whatsoever.

That protocol is defined by the document here: 
https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=doc/native_protocol_v1.spec;hb=HEAD

Of course, this is just a protocol, and unless you have the time and 
willingness to write a proper library using that protocol, you should just use 
an existing driver implementing it. If you are using Java (some of your examples 
above seem to be in Java), then you could for instance pick 
https://github.com/datastax/java-driver. If you're not using java, then well, 
since said protocol is fairly recent, there isn't an existing driver for every 
language, but a bunch of drivers are in the works.

That being said, I'm not saying you *should* use a driver that uses the binary 
protocol, just that at least for exceptions handling, said binary protocol has 
a slightly cleaner handling of them than what's available through thrift. I'll 
note that even if you do want to use thrift, it's usually advised to use a high 
level client rather than raw thrift. Unless you have no choice or like 
suffering that is.

--
Sylvain


On Tue, Apr 23, 2013 at 5:38 PM, Stuart Broad <stu...@moogsoft.com> wrote:
Hi Edward,

My understanding was that thrift supports a number of protocols (binary being 
one of them).  I don't understand what switching to "binary protocol" but not 
using thrift means.  Can you point me to any code examples?

Regards,

Stuart


On Tue, Apr 23, 2013 at 4:21 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
Having to catch the exception and parse it is a bit ugly, however this is close 
to what someone might do with an SQLException to determine if the error was 
transient etc.  If there is an error code it is possible that it could be added 
as an optional property of the InvalidRequestException in future versions.

Switching to the "binany protocol" is not a method in thrift, it means your not 
using thrift at all.




On Tue, Apr 23, 2013 at 11:13 AM, Stuart Broad <stu...@moogsoft.com> wrote:
Hi Edward,

Thanks for your reply - I was already using the prepare/execute cql methods 
that you suggested.  My problem is that these methods 'mask' the 
Prepa

move data from Cassandra 1.1.6 to 1.2.4

2013-04-23 Thread Wei Zhu
Hi,
We are trying to upgrade from 1.1.6 to 1.2.4; it's not really a live upgrade. 
We are going to retire the old hardware and bring in a set of new hardware for 
1.2.4. 
For the old cluster, we have 5 nodes with RF = 3 and a total of 1TB of data.
For the new cluster, we will have 10 nodes with RF = 3. We will use VNodes. What is 
the best way to bring the data from 1.1.6 to 1.2.4? A couple of concerns:
* We also use LCS and plan to increase the SSTable size. 

* We use RandomPartitioner; should we stick with it, so as not to mess 
things up with Murmur3?

Thanks for your feedback.

-Wei

Re: move data from Cassandra 1.1.6 to 1.2.4

2013-04-23 Thread Hiller, Dean
We went from 1.1.4 to 1.2.2; a rolling restart failed in QA, but bringing down 
the whole cluster, upgrading every node, and then bringing it back up worked fine 
in both QA and production.  We left ours at randompartitioner and had LCS as well. 
 We did not convert to Vnodes at all.  Don't know if it helps at all, but it is 
a similar case I would think.

Dean

From: Wei Zhu <wz1...@yahoo.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>, Wei Zhu <wz1...@yahoo.com>
Date: Tuesday, April 23, 2013 12:11 PM
To: Cassandra usergroup <user@cassandra.apache.org>
Subject: move data from Cassandra 1.1.6 to 1.2.4

Hi,
We are trying to upgrade from 1.1.6 to 1.2.4, it's not really a live upgrade. 
We are going to retire the old hardware and bring in a set of new hardware for 
1.2.4.
For old cluster, we have 5 nodes with RF = 3, total of 1TB data.
For new cluster, we will have 10 nodes with RF = 3. We will use VNodes. What is 
the best way to bring the data from 1.1.6 to 1.2.4? A couple of concerns:

 *   We also use LCS and plan to increase SSTable size.
 *   We use RandomPartitioner; should we stick with it, so as not to mess things 
up with Murmur3?

Thanks for your feedback.

-Wei


Re: move data from Cassandra 1.1.6 to 1.2.4

2013-04-23 Thread Wei Zhu
Hi Dean,
It's a bit of a different case for us. We will have a set of new machines to replace 
the old ones and we want to migrate that data over. I would imagine doing 
something like:
* Let the new nodes (with VNodes) join the cluster
* Decommission the old nodes (without VNodes)
Thanks.
-Wei



 From: "Hiller, Dean" 
To: "user@cassandra.apache.org" ; Wei Zhu 
 
Sent: Tuesday, April 23, 2013 11:17 AM
Subject: Re: move data from Cassandra 1.1.6 to 1.2.4
 

We went from 1.1.4 to 1.2.2; a rolling restart failed in QA, but bringing down 
the whole cluster, upgrading every node, and then bringing it back up worked fine 
in both QA and production.  We left ours at randompartitioner and had LCS as well. 
 We did not convert to Vnodes at all.  Don't know if it helps at all, but it is 
a similar case I would think.

Dean

From: Wei Zhu <wz1...@yahoo.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>, Wei Zhu <wz1...@yahoo.com>
Date: Tuesday, April 23, 2013 12:11 PM
To: Cassandra usergroup <user@cassandra.apache.org>
Subject: move data from Cassandra 1.1.6 to 1.2.4

Hi,
We are trying to upgrade from 1.1.6 to 1.2.4, it's not really a live upgrade. 
We are going to retire the old hardware and bring in a set of new hardware for 
1.2.4.
For old cluster, we have 5 nodes with RF = 3, total of 1TB data.
For new cluster, we will have 10 nodes with RF = 3. We will use VNodes. What is 
the best way to bring the data from 1.1.6 to 1.2.4? A couple of concerns:

*   We also use LCS and plan to increase SSTable size.
*   We use RandomPartitioner; should we stick with it, so as not to mess things 
up with Murmur3?

Thanks for your feedback.

-Wei

Re: move data from Cassandra 1.1.6 to 1.2.4

2013-04-23 Thread Hiller, Dean
But 1.1.4 does not have Vnodes, right?  In that case, I would baby step it: 
do the upgrade to 1.2.4 first on the old nodes; after that is done, add the 
new nodes in; and after that is done, decommission the old 
nodes…finally I would convert to vnodes….and I would try all of that in QA 
first, of course.  Some of these steps, like adding new nodes, you can do 
while the cluster is live.

Dean

From: Wei Zhu <wz1...@yahoo.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>, Wei Zhu <wz1...@yahoo.com>
Date: Tuesday, April 23, 2013 12:53 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: move data from Cassandra 1.1.6 to 1.2.4

Hi Dean,
It's a bit of a different case for us. We will have a set of new machines to replace 
the old ones and we want to migrate that data over. I would imagine doing 
something like:

 *   Let the new nodes (with VNodes) join the cluster
 *   Decommission the old nodes (without VNodes)

Thanks.
-Wei


From: "Hiller, Dean" mailto:dean.hil...@nrel.gov>>
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>; Wei Zhu 
mailto:wz1...@yahoo.com>>
Sent: Tuesday, April 23, 2013 11:17 AM
Subject: Re: move data from Cassandra 1.1.6 to 1.2.4

We went from 1.1.4 to 1.2.2; a rolling restart failed in QA, but bringing down 
the whole cluster, upgrading every node, and then bringing it back up worked fine 
in both QA and production.  We left ours at randompartitioner and had LCS as well. 
 We did not convert to Vnodes at all.  Don't know if it helps at all, but it is 
a similar case I would think.

Dean

From: Wei Zhu <wz1...@yahoo.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>, Wei Zhu <wz1...@yahoo.com>
Date: Tuesday, April 23, 2013 12:11 PM
To: Cassandra usergroup <user@cassandra.apache.org>
Subject: move data from Cassandra 1.1.6 to 1.2.4

Hi,
We are trying to upgrade from 1.1.6 to 1.2.4, it's not really a live upgrade. 
We are going to retire the old hardware and bring in a set of new hardware for 
1.2.4.
For old cluster, we have 5 nodes with RF = 3, total of 1TB data.
For new cluster, we will have 10 nodes with RF = 3. We will use VNodes. What is 
the best way to bring the data from 1.1.6 to 1.2.4? A couple of concerns:

*  We also use LCS and plan to increase SSTable size.
*  We use RandomPartitioner; should we stick with it, so as not to mess things 
up with Murmur3?

Thanks for your feedback.

-Wei



Re: How to make compaction run faster?

2013-04-23 Thread Jay Svc
Thanks Aaron,

The parameters I tried above were set one at a time. Based on what I
observed, the problem at the core is: "can compaction catch up with the
write speed?".

I have gone up to 30,000 to 35,000 writes per second. I do not see the number
of writes as much of an issue either. The issue is that compaction is not
catching up with the write speed, in spite of having more CPU and memory.
Because over time we will see a growing number of pending compactions
as writes continue, and that will degrade my read performance.

Do you think STCS is the compaction strategy to speed up compaction? What is a
good approach when we have a greater number of reads and writes, so that
compaction catches up with the write speed?

Thank you in advance.
Jay




On Sun, Apr 21, 2013 at 1:43 PM, aaron morton wrote:

> You are suggesting to go back to STCS and increase the
> compaction_throughput step by step to see if compaction catch up with write
> traffic?
>
> As a personal approach, when so many config settings are changed it
> becomes impossible to understand cause and effect. So I try to return to a
> known base line and then make changes.
>
> As I watched Disk latency on DSE Opscenter as well as on iostat the await
> is always 35 to 40 ms for longer period of time during the test.
>
> You previously said this was the await on the commit log.
> What is the queue size ?
>
> The problem sounds like IO is not keeping up; moving to STCS will reduce
> the IO. Levelled Compaction is designed to reduce the number of SSTables in
> a read, not to do compaction faster.
>
> At some point you may be writing too fast for the nodes. I'm not sure if
> you have discussed the level of writes going through the system. Get
> something that works and then make one change at a time until it does not.
> You should then be able to say "The system can handle X writes of Y size
> per second, but after that compaction cannot keep up."
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 19/04/2013, at 7:16 AM, Jay Svc  wrote:
>
> Thanks Aaron,
>
> Please find answers to your questions.
>
> 1. I started the test with default parameters and compaction was backing up,
> so I went for various options.
> 2. The data is on RAID10.
> 3. As I watched Disk latency on DSE Opscenter as well as on iostat the
> await is always 35 to 40 ms for longer period of time during the test.
> (which probably gives me high write latency on client side) Do you think
> this could contribute to slowing down the compaction? probably not..!
>
> So Aaron, I am trying to understand -
> You are suggesting to go back to STCS and increase the
> compaction_throughput step by step to see if compaction catch up with write
> traffic?
>
> Thank you for your inputs.
>
> Regards,
> Jay
>
>
> On Thu, Apr 18, 2013 at 1:52 PM, aaron morton wrote:
>
>> > Parameters used:
>> >   • SSTable size: 500MB (tried various sizes from 20MB to 1GB)
>> >   • Compaction throughput mb per sec: 250MB (tried from 16MB to
>> 640MB)
>> >   • Concurrent write: 196 (tried from 32 to 296)
>> >   • Concurrent compactors: 72 (tried disabling to making it 172)
>> >   • Multithreaded compaction: true (tried both true and false)
>> >   • Compaction strategy: LCS (tried STCS as well)
>> >   • Memtable total space in mb: 4096 MB (tried default and some
>> other params too)
>> I would restore to default settings before I did anything else.
>>
>> > Aaron, Please find the iostat below: the sdb and dm-2 are the commitlog
>> disks.
>> > Please find the iostat of some of 3 different boxes in my cluster.
>>
>> What is the data on ?
>> It's important to call iostat with a period and watch the await / queue
>> size over time, not just view a snapshot.
>>
>> I would go back to STCS with default settings, and ramp up the throughput
>> until compaction cannot keep up. Then increase the throughout and see how
>> that works. Then increase throughput again and see what happens.
>>
>> Cheers
>>
>>
>> -
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 19/04/2013, at 5:05 AM, Jay Svc  wrote:
>>
>> > Hi Aaron, Alexis,
>> >
>> > Thanks for reply, Please find some more details below.
>> >
>> > Core problems: Compaction is taking longer time to finish. So it will
>> affect my reads. I have more CPU and memory, want to utilize that to speed
>> up the compaction process.
>> > Parameters used:
>> >   • SSTable size: 500MB (tried various sizes from 20MB to 1GB)
>> >   • Compaction throughput mb per sec: 250MB (tried from 16MB to
>> 640MB)
>> >   • Concurrent write: 196 (tried from 32 to 296)
>> >   • Concurrent compactors: 72 (tried disabling to making it 172)
>> >   • Multithreaded compaction: true (tried both true and false)
>> >   • Compaction strategy: LCS (tried STCS as well)
>> >   • Memtable total space in mb: 4096 MB 

Re: How to make compaction run faster?

2013-04-23 Thread Hiller, Dean
I assume you are trying to maximize your PER node write throughput?  If you are 
not trying to determine per node throughput, just add more nodes so your nodes 
can keep up.  That is the easiest way.

Finding that sweet spot of per node write throughput will take some doing.  If 
compaction can't keep up, the real answer may be that you have too few nodes.

As a side note, you did get me curious.  Aaron, how would one determine the 
current write throughput/node that my cluster could take without compactions 
falling behind?  Would I just write like hell for 4 hours, then measure when 
compactions are completely done, and take as my theoretical limit the number of 
writes in 4 hours divided by total time?  Of course this doesn't take into 
account that as my data set size grows, compactions will take longer and longer 
for the same amount of writes.  It is a question that would be nice to answer 
with some kind of tool though….is there anything?

Thanks,
Dean

From: Jay Svc <jaytechg...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, April 23, 2013 2:00 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: How to make compaction run faster?

Thanks Aaron,

The parameters I tried above were set one at a time. Based on what I observed, 
the problem at the core is: "can compaction catch up with the write speed?".

I have gone up to 30,000 to 35,000 writes per second. I do not see the number of 
writes as much of an issue either. The issue is that compaction is not catching 
up with the write speed, in spite of having more CPU and memory. Because over 
time we will see a growing number of pending compactions as writes continue, 
and that will degrade my read performance.

Do you think STCS is the compaction strategy to speed up compaction? What is a 
good approach when we have a greater number of reads and writes, so that 
compaction catches up with the write speed?

Thank you in advance.
Jay




On Sun, Apr 21, 2013 at 1:43 PM, aaron morton <aa...@thelastpickle.com> wrote:
You are suggesting to go back to STCS and increase the compaction_throughput 
step by step to see if compaction catch up with write traffic?
As a personal approach, when so many config settings are changed it becomes 
impossible to understand cause and effect. So I try to return to a known base 
line and then make changes.

As I watched Disk latency on DSE Opscenter as well as on iostat the await is 
always 35 to 40 ms for longer period of time during the test.
You previously said this was the await on the commit log.
What is the queue size ?

The problem sounds like IO is not keeping up; moving to STCS will reduce the IO. 
Levelled Compaction is designed to reduce the number of SSTables in a read, not 
to do compaction faster.

At some point you may be writing too fast for the nodes. I'm not sure if you 
have discussed the level of writes going through the system. Get something that 
works and then make one change at a time until it does not. You should then be 
able to say "The system can handle X writes of Y size per second, but after 
that compaction cannot keep up."

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 19/04/2013, at 7:16 AM, Jay Svc <jaytechg...@gmail.com> wrote:

Thanks Aaron,

Please find answers to your questions.

1. I started the test with default parameters and compaction was backing up, so 
I went for various options.
2. The data is on RAID10.
3. As I watched Disk latency on DSE Opscenter as well as on iostat the await is 
always 35 to 40 ms for longer period of time during the test. (which probably 
gives me high write latency on client side) Do you think this could contribute 
to slowing down the compaction? probably not..!

So Aaron, I am trying to understand -
You are suggesting to go back to STCS and increase the compaction_throughput 
step by step to see if compaction catch up with write traffic?

Thank you for your inputs.

Regards,
Jay


On Thu, Apr 18, 2013 at 1:52 PM, aaron morton <aa...@thelastpickle.com> wrote:
> Parameters used:
>   • SSTable size: 500MB (tried various sizes from 20MB to 1GB)
>   • Compaction throughput mb per sec: 250MB (tried from 16MB to 640MB)
>   • Concurrent write: 196 (tried from 32 to 296)
>   • Concurrent compactors: 72 (tried disabling to making it 172)
>   • Multithreaded compaction: true (tried both true and false)
>   • Compaction strategy: LCS (tried STCS as well)
>   • Memtable total space in mb: 4096 MB (tried default and some other 
> params too)
I would restore to default settings before I did anything else.

> Aaron, Please find the iostat below: the sdb and dm-2 are the commitlog disks.
> Please find the iostat of some of 3 different boxes in my cluster.

What is the data on ?
It's important to call iostat with a period and watch the await 

Re: Building SSTables using SSTableSimpleUnsortedWriter (v. 1.2.3)

2013-04-23 Thread aaron morton
You should be able to call CompositeType.getInstance(List<AbstractType<?>> 
types) to construct a CompositeType with the appropriate components. Then call 
CompositeType.decompose() with a list of the values for the key, and that will get 
you a byte buffer. 
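
As a rough sketch, assuming Cassandra 1.2's org.apache.cassandra.db.marshal 
classes and the test_sst_load table quoted below (PRIMARY KEY (mykey1, mykey2)): 
the internal row key stays plain mykey1, while each column name is a composite 
of the mykey2 value plus the CQL3 column name. The values here are purely 
illustrative:

import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;
import org.apache.cassandra.db.marshal.AbstractType;
import org.apache.cassandra.db.marshal.AsciiType;
import org.apache.cassandra.db.marshal.CompositeType;
import org.apache.cassandra.db.marshal.UTF8Type;

public final class CompositeNameSketch {
    // Build the composite column name for a table with PRIMARY KEY (mykey1, mykey2).
    static ByteBuffer compositeName(String mykey2Value, String cql3ColumnName) {
        List<AbstractType<?>> types =
                Arrays.<AbstractType<?>>asList(AsciiType.instance, UTF8Type.instance);
        CompositeType type = CompositeType.getInstance(types);
        CompositeType.Builder builder = new CompositeType.Builder(type);
        builder.add(AsciiType.instance.decompose(mykey2Value));   // clustering part
        builder.add(UTF8Type.instance.decompose(cql3ColumnName)); // e.g. "value1"
        return builder.build(); // pass this where the writer expects a column name
    }
}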

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 22/04/2013, at 11:40 AM, David McNelis  wrote:

> I figured that the primary key and how to define it was the issue.  
> 
> What I don't get is how to structure my 
> SSTableSimpleUnsortedWriter.newRow() call to support the CQL3 style composite 
> primary keys.  It takes only a ByteBuffer as an argument... 
> 
> I guess I'm looking for some kind of newRow() through 
> addColumns() example of how to write an SSTable that can be imported to a 
> CQL3 table using the sstableloader.
> 
> For example, should I convert both to a string, concat them with a : and then 
> bytebuffer that string, as if I were inserting a composite column from 
> cassandra-cli?  
> 
> 
> On Sun, Apr 21, 2013 at 3:55 PM, aaron morton  wrote:
> The key to your problem is likely the row key. 
> 
> Take a look in at the table schema / sample data in the cassandra-cli to see 
> how CQL uses composites also 
> http://thelastpickle.com/2013/01/11/primary-keys-in-cql/
> 
> The simple thing to do is use COMPACT STORAGE but that may not suite all use 
> cases http://www.datastax.com/docs/1.2/cql_cli/cql/CREATE_TABLE
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 20/04/2013, at 4:36 PM, David McNelis  wrote:
> 
>> Was trying to do a test of writing SSTs for a CQL3 table.  So I created the 
>> following table:
>> 
>> CREATE TABLE test_sst_load (
>>   mykey1 ascii,
>>   mykey2 ascii,
>>   value1 ascii,
>>   PRIMARY KEY (mykey1, mykey2)
>> ) 
>> 
>> I then set up my writer like so: (moved to gist: 
>> https://gist.github.com/dmcnelis/5424756 )
>> 
>> This created my SST files ok and they imported without throwing any sorts of 
>> errors (had -v and --debug on) when using sstableloader.
>> 
>> When I went to query my data in cqlsh, I got an rpc error.  In my system.log 
>> I saw an exception: java.lang.RuntimeException: 
>> java.lang.IllegalArgumentException
>>  (also at the gist above).
>> 
>> I had a feeling that it wouldn't work.. but I can't see a way with the 
>> SSTableSimpleUnsortedWriter (or in the AbstractSSTableWriter) to create an 
>> sstable file that is going to work with the CQL3 tables.  I know its got to 
>> be possible, I can import SSTs with the sstableloader from one cluster to 
>> another, where the tables are CQL3.
>> 
>> What am I missing here?
>> 
>> 
>> 
>> 
> 
> 



Re: loading all rows from cassandra using multiple (python) clients in parallel

2013-04-23 Thread aaron morton
> 
>  EDIT: works after switching to testing against the latest version of the 
> cassandra database (doh!), and also updating the syntax per notes below:
http://stackoverflow.com/questions/16137944/loading-all-rows-from-cassandra-using-multiple-python-clients-in-parallel

Is this still a problem?

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/04/2013, at 12:15 AM, John R. Frank  wrote:

> Cassandra Experts,
> 
> I understand that when using Cassandra's recommended RandomPartitioner (or 
> Murmur3Partitioner), it is not possible to do meaningful range queries on 
> keys, because the rows are distributed around the cluster using the md5 hash 
> of the key.  These hashes are called "tokens."
> 
> Nonetheless, it would be very useful to split up a large table amongst many 
> compute workers by assigning each a range of tokens.  Using CQL3, it appears 
> possible to issue queries directly against the tokens, however the following 
> python does not work:
> 
> http://stackoverflow.com/questions/16137944/loading-all-rows-from-cassandra-using-multiple-python-clients-in-parallel
> 
> I would ideally like to make this work with pycassa, because I prefer its 
> more pythonic interface.
> 
> Am I just not invoking CQL3 correctly through the cql package?
> 
> Is there a better way to do this?
> 
> 
> Thanks for any pointers!
> 
> John
> 
> 
> 
> 



Re: Cassandra + Hadoop - 2 Task attempts with million of rows

2013-04-23 Thread aaron morton
>> Our cluster is evenly partitioned (Murmur3Partitioner)
Murmur3Partitioner is only available in 1.2 and changing partitioners is not 
supported. Did you change from Random Partitioner under 1.1?

Are you using virtual nodes in your 1.2 cluster ? 

>> We have roughly 97 million rows in our cluster. Why are we getting the above 
>> behavior? Do you have any suggestions or clues for troubleshooting this issue?
Can you make some of the logs from the tasks available?

Cheers
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/04/2013, at 5:50 AM, Shamim  wrote:

> We are using Hadoop 1.0.3 and pig 0.11.1 version
> 
> -- 
> Best regards
>   Shamim A.
> 
> 22.04.2013, 21:48, "Shamim" :
>> Hello all,
>>   recently we have upgrade our cluster (6 nodes) from cassandra version 
>> 1.1.6 to 1.2.1. Our cluster is evenly partitioned (Murmur3Partitioner). We 
>> are using pig for parse and compute aggregate data.
>> 
>> When we submit a job through pig, what I consistently see is that, while most 
>> of the tasks have 20-25k rows assigned each (Map input records), only 2 of 
>> them (always 2) get more than 2 million rows. These 2 tasks always 
>> reach 100% and then hang for a long time. Also, most of the time we are getting 
>> killed tasks (2%) with a TimeoutException.
>> 
>> We increased rpc_timeout to 6, and also set cassandra.input.split.size=1024, 
>> but nothing helped.
>> 
>> We have roughly 97 million rows in our cluster. Why are we getting the above 
>> behavior? Do you have any suggestions or clues for troubleshooting this issue? 
>> Any help will be highly appreciated. Thanks in advance.
>> 
>> --
>> Best regards
>>   Shamim A.



Re: com.datastax.driver.core.exceptions.InvalidQueryException using Datastax Java driver

2013-04-23 Thread aaron morton
> Can I insert into Column Family (that I created from CLI mode) using Datastax 
> Java driver or not with Cassandra 1.2.3?
No. 
Create your table using CQL 3 via the cqlsh. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/04/2013, at 6:29 AM, Techy Teck  wrote:

> I am using the correct keyspace name for that column family. I have verified that 
> as well.
> 
> Can I insert into Column Family (that I created from CLI mode) using Datastax 
> Java driver or not with Cassandra 1.2.3?
> 
> 
> On Mon, Apr 22, 2013 at 5:05 AM, Internet Group  wrote:
> It seems to me that you are not specifying the keyspace of your column family 
> 'profile'.
> 
> Regards,
> Francisco.
> 
> 
> On Apr 20, 2013, at 9:56 PM, Techy Teck  wrote:
> 
>> I created my column family like this from the CLI-
>> 
>> 
>> create column family profile
>> with key_validation_class = 'UTF8Type'
>> and comparator = 'UTF8Type'
>> and default_validation_class = 'UTF8Type'
>> and column_metadata = [
>>   {column_name : account, validation_class : 'UTF8Type'}
>>   {column_name : advertising, validation_class : 'UTF8Type'}
>>   {column_name : behavior, validation_class : 'UTF8Type'}
>>   {column_name : info, validation_class : 'UTF8Type'}
>>   ];
>> 
>> 
>> 
>> Now I was trying to insert into this column family using the Datastax Java 
>> driver-
>> 
>> 
>> public void upsertAttributes(final String userId, final Map<String, String> attributes) {
>>     String batchInsert = "INSERT INTO PROFILE(id, account, advertising, behavior, info) VALUES ( '12345', 'hello11', 'bye2234', 'bye1', 'bye2') ";
>>     CassandraDatastaxConnection.getInstance().getSession().execute(batchInsert);
>> }
>> 
>> I always get this exception-
>> 
>> 
>> 
>> com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured 
>> columnfamily profile
>> 
>> 
>> And by this way, I am trying to create connection/session initialization to 
>> Cassandra-
>> 
>> 
>> private CassandraDatastaxConnection() {
>>     try {
>>         cluster = Cluster.builder().addContactPoint("localhost").build();
>>         session = cluster.connect("my_keyspace");
>>     } catch (NoHostAvailableException e) {
>>         throw new RuntimeException(e);
>>     }
>> }
>> 
>> I am running Cassandra 1.2.3. And I am able to connect to Cassandra using 
>> the above code. The only problem I am facing is while inserting.
>> 
>> Any idea why it is happening?
>> 
> 
> 



Re: Datastax Java Driver connection issue

2013-04-23 Thread aaron morton
> Just for clarification, why is it necessary to set the server rpc address to 
> 127.0.0.1?
It's not necessary for it to be 127.0.0.1. But it is necessary for the server 
to be listening for client connections (the rpc_address) on the same interface 
/ IP you are trying to connect to. 

In your case the error message said you could not find the server at 127.0.0.1, 
so the simple thing to do is make sure the server is listening there. 

You can set rpc_address to whatever you like (see the yaml) just make sure it's 
the same address you are connecting to. 
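
For example, a minimal sketch (9042 being the default native-protocol port in 
1.2 - adjust to whatever your cassandra.yaml actually says):

import com.datastax.driver.core.Cluster;

public final class ConnectSketch {
    // The contact point must be an address the server actually listens on
    // for clients, i.e. the rpc_address configured in cassandra.yaml.
    static Cluster connect(String rpcAddress) {
        return Cluster.builder()
                .addContactPoint(rpcAddress)
                .withPort(9042)
                .build();
    }
}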

Cheers
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/04/2013, at 7:06 AM, Abhijit Chanda  wrote:

> Aaron,
> 
> Just for clarification, why is it necessary to set the server rpc address to 
> 127.0.0.1?
> 
> 
> On Mon, Apr 22, 2013 at 2:22 AM, aaron morton  wrote:
> Make sure that the server rpc_address is set to 127.0.0.1
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 20/04/2013, at 1:47 PM, Techy Teck  wrote:
> 
>> I am also running into this problem. I have already enabled 
>> start_native_transport: true
>> 
>> And by this, I am trying to make a connection-
>> 
>> private CassandraDatastaxConnection() {
>> 
>> try{
>> cluster = Cluster.builder().addContactPoint("localhost").build();
>> session = cluster.connect("my_keyspace");
>> } catch (NoHostAvailableException e) {
>> throw new RuntimeException(e);
>> }
>> }
>> 
>> And everytime it gives me the same exception-
>> 
>> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) 
>> tried for query failed (tried: [localhost/127.0.0.1])
>> 
>> Any idea how to fix this problem?
>> 
>> Thanks for the help.
>> 
>> 
>> 
>> 
>> 
>> 
>> On Fri, Apr 19, 2013 at 6:41 AM, Abhijit Chanda  
>> wrote:
>> @Gabriel, @Wright: thanks, such a silly mistake of me. 
>> 
>> 
>> On Fri, Apr 19, 2013 at 6:48 PM, Keith Wright  wrote:
>> Did you enable the binary protocol in Cassandra.yaml?
>> 
>> Abhijit Chanda  wrote:
>> 
>> Hi,
>> 
>> I have downloaded the CQL driver provided by Datastax using 
>> <dependency>
>>     <groupId>com.datastax.cassandra</groupId>
>>     <artifactId>cassandra-driver-core</artifactId>
>>     <version>1.0.0-beta2</version>
>> </dependency>
>> 
>> Then tried a sample program to connect to the cluster
>> Cluster cluster = Cluster.builder()
>> .addContactPoints(db1)
>> .withPort(9160)
>> .build();
>> 
>> But sadly its returning 
>> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) 
>> tried for query failed   
>> 
>> I am using cassandra 1.2.2
>> 
>> Can any one suggest me whats wrong with that. 
>> 
>> And I am really sorry for posting a datastax java driver related question in 
>> this forum; I can't find a better place for an instant reaction. 
>> 
>> 
>> -Abhijit
>> 
>> 
>> 
>> -- 
>> -Abhijit
>> 
> 
> 
> 
> 
> -- 
> -Abhijit



Re: Insert into column which is of DateType

2013-04-23 Thread aaron morton
Have you tried to Astyanax example and use the Date override ? 

https://github.com/Netflix/astyanax/wiki/Writing-data
http://netflix.github.io/astyanax/javadoc/com/netflix/astyanax/ColumnMutation.html#putValue(java.util.Date, java.lang.Integer)
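
A minimal sketch of that overload (CF_PROFILE and the keyspace object are 
assumptions standing in for your own Astyanax setup):

import java.util.Date;
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.serializers.StringSerializer;

public final class DateColumnSketch {
    static final ColumnFamily<String, String> CF_PROFILE =
            new ColumnFamily<String, String>("PROFILE",
                    StringSerializer.get(), StringSerializer.get());

    static void writeLmd(Keyspace keyspace, String rowKey) throws Exception {
        MutationBatch m = keyspace.prepareMutationBatch();
        // The Date overload serializes the value as the 8-byte long that
        // DateType expects, instead of the String that failed validation.
        m.withRow(CF_PROFILE, rowKey).putColumn("lmd", new Date(), null);
        m.execute();
    }
}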

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/04/2013, at 3:23 PM, Techy Teck  wrote:

> I created my column family in Cassandra database like this from the CLI-
> 
> create column family PROFILE
> with key_validation_class = 'UTF8Type'
> and comparator = 'UTF8Type'
> and default_validation_class = 'UTF8Type'
> and column_metadata = [
>   {column_name : lmd, validation_class : 'DateType'}
> ];
> 
> 
> Now I was trying to insert into the above lmd column using a few of the
> clients like Netflix/Pelops/Datastax
> 
> I am not sure how to insert into a column which is of DateType. I was using 
> the code below to insert
> into the lmd column.
> 
> 
> final long LMD = System.currentTimeMillis() / 1000L;
> 
> attrMap.put("lmd", String.valueOf(LMD));
> 
> And every time, I get an exception -
> 
> (Expected 8 or 0 byte long for date (30)) [my_keyspace][PROFILE][lmd] failed 
> validation
> 
> Is there anything I am doing wrong?
> 
> attrMap is Map<String, String> here. And then I am using this map later on to 
> retrieve the column and then populate it into the cassandra database



Re: Unable to drop secondary index

2013-04-23 Thread aaron morton
That sounds horrible. 

The log messages seem fine to me. It's handling eventually updating the 
secondary indexes. 

Good luck. 

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/04/2013, at 6:34 PM, Michal Michalski  wrote:

> A little update:
> 
> OK, after ~8 hours of GC madness and compacting, node B (the one on which the 
> keyspace had disappeared) works fine. No issues noticed so far.
> 
> Node A was started with a larger heap, and after I turned debugging on I can see 
> it does this:
> 
> DEBUG [MutationStage:110] 2013-04-23 06:23:44,893 
> AbstractSimplePerColumnSecondaryIndex.java (line 118) applying index row 1 in 
> ColumnFamily(DRDevices.DRDevices_touchscreen_idx 
> [416e64726f69645f48494c4956455f48493453:false:0@1366633655364522,])
> DEBUG [MutationStage:117] 2013-04-23 06:23:44,893 
> AbstractSimplePerColumnSecondaryIndex.java (line 118) applying index row 1 in 
> ColumnFamily(DRDevices.DRDevices_touchscreen_idx 
> [416e64726f69645f48494c4956455f48493453:false:0@1366633655364522,])
> DEBUG [MutationStage:73] 2013-04-23 06:23:44,893 
> AbstractSimplePerColumnSecondaryIndex.java (line 118) applying index row 1 in 
> ColumnFamily(DRDevices.DRDevices_touchscreen_idx 
> [416e64726f69645f48494c4956455f48493453:false:0@1366633655364522,])
> DEBUG [MutationStage:108] 2013-04-23 06:23:44,893 
> AbstractSimplePerColumnSecondaryIndex.java (line 100) removed index entry for 
> cleaned-up value DecoratedKey(1, 
> 01):ColumnFamily(DRDevices.DRDevices_touchscreen_idx 
> [416e64726f69645f48494c4956455f48493453:true:4@1366633655364522,])
> DEBUG [MutationStage:87] 2013-04-23 06:23:44,893 
> AbstractSimplePerColumnSecondaryIndex.java (line 100) removed index entry for 
> cleaned-up value DecoratedKey(1, 
> 01):ColumnFamily(DRDevices.DRDevices_touchscreen_idx 
> [416e64726f69645f48494c4956455f48493453:true:4@1366633655364522,])
> DEBUG [MutationStage:70] 2013-04-23 06:23:44,893 
> AbstractSimplePerColumnSecondaryIndex.java (line 118) applying index row 1 in 
> ColumnFamily(DRDevices.DRDevices_touchscreen_idx 
> [416e64726f69645f48494c4956455f48493453:false:0@1366633655364522,])
> DEBUG [MutationStage:86] 2013-04-23 06:23:44,892 
> AbstractSimplePerColumnSecondaryIndex.java (line 100) removed index entry for 
> cleaned-up value DecoratedKey(1, 
> 01):ColumnFamily(DRDevices.DRDevices_touchscreen_idx 
> [416e64726f69645f48494c4956455f48493453:true:4@1366633655364522,])
> DEBUG [MutationStage:86] 2013-04-23 06:23:44,898 
> AbstractSimplePerColumnSecondaryIndex.java (line 118) applying index row 1 in 
> ColumnFamily(DRDevices.DRDevices_touchscreen_idx 
> [416e64726f69645f48494c4956455f48493453:false:0@1366633655364522,])
> DEBUG [MutationStage:86] 2013-04-23 06:23:44,898 
> AbstractSimplePerColumnSecondaryIndex.java (line 100) removed index entry for 
> cleaned-up value DecoratedKey(1, 
> 01):ColumnFamily(DRDevices.DRDevices_touchscreen_idx 
> [416e64726f69645f48494c4956455f48493453:true:4@1366633655364522,])
> DEBUG [MutationStage:86] 2013-04-23 06:23:44,898 
> AbstractSimplePerColumnSecondaryIndex.java (line 118) applying index row 1 in 
> ColumnFamily(DRDevices.DRDevices_touchscreen_idx 
> [416e64726f69645f48494c4956455f48493453:false:0@1366633655364522,])
> DEBUG [MutationStage:74] 2013-04-23 06:23:44,892 
> AbstractSimplePerColumnSecondaryIndex.java (line 100) removed index entry for 
> cleaned-up value DecoratedKey(1, 
> 01):ColumnFamily(DRDevices.DRDevices_touchscreen_idx 
> [416e64726f69645f48494c4956455f48493453:true:4@1366633655364522,])
> DEBUG [MutationStage:123] 2013-04-23 06:23:44,891 RowMutationVerbHandler.java 
> (line 40) Applying mutation
> DEBUG [MutationStage:69] 2013-04-23 06:23:44,891 
> AbstractSimplePerColumnSecondaryIndex.java (line 100) removed index entry for 
> cleaned-up value DecoratedKey(1, 
> 01):ColumnFamily(DRDevices.DRDevices_touchscreen_idx 
> [416e64726f69645f48494c4956455f48493453:true:4@1366633655364522,])
> DEBUG [MutationStage:67] 2013-04-23 06:23:44,891 
> AbstractSimplePerColumnSecondaryIndex.java (line 118) applying index row 1 in 
> ColumnFamily(DRDevices.DRDevices_touchscreen_idx 
> [416e64726f69645f48494c4956455f48493453:false:0@1366633655364522,])
> DEBUG [MutationStage:84] 2013-04-23 06:23:44,891 
> AbstractSimplePerColumnSecondaryIndex.java (line 118) applying index row 1 in 
> ColumnFamily(DRDevices.DRDevices_touchscreen_idx 
> [416e64726f69645f48494c4956455f48493453:false:0@1366633655364522,])
> DEBUG [MutationStage:83] 2013-04-23 06:23:44,890 
> AbstractSimplePerColumnSecondaryIndex.java (line 100) removed index entry for 
> cleaned-up value DecoratedKey(1, 
> 01):ColumnFamily(DRDevices.DRDevices_touchscreen_idx 
> [416e64726f69645f48494c4956455f48493453:true:4@1366633655364522,])
> DEBUG [MutationStage:77] 2013-04-23 06:23:44,890 
> AbstractSimplePerColumnSecondaryIndex.java (line 100) removed index entry for 
> cleaned-up value DecoratedKey(1, 
> 01):ColumnFamily(DRDevices.DRDevices_touchscree

Re: 'sstableloader' is not recognized as an internal or external command,

2013-04-23 Thread aaron morton
> Is sstableloader supported in windows? Looking at the source it seems to be 
> a unix shell file.

Yup. 
If you would like to put together an sstableloader.bat file, use the 
sstablekeys.bat file as a template, but use 
org.apache.cassandra.tools.BulkLoader as the CASSANDRA_MAIN class.

If you can get it working please raise a ticket at 
https://issues.apache.org/jira/browse/CASSANDRA and donate it back to Apache. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/04/2013, at 6:37 PM, Viktor Jevdokimov  
wrote:

> If your Cassandra cluster is on Linux, I believe that streaming is not 
> supported in a mixed environment, i.e. Cassandra nodes can’t stream between 
> Windows and Linux, and sstableloader can’t stream from Windows to Linux.
>  
> If your Cassandra also on Windows, just try to create bat file for 
> sstableloader using other bat files for example.
> I don’t know if sstableloader will support Windows directory structure.
>  
>  
>  
> Best regards / Pagarbiai
> Viktor Jevdokimov
> Senior Developer
> 
> Email: viktor.jevdoki...@adform.com
> Phone: +370 5 212 3063, Fax +370 5 261 0453
> J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
> 
> From: Techy Teck [mailto:comptechge...@gmail.com] 
> Sent: Tuesday, April 23, 2013 09:10
> To: user
> Subject: 'sstableloader' is not recognized as an internal or external command,
>  
> I have a bunch of `SSTables` with me that I got from somebody within my team. 
> Now I was trying to push those `SSTABLES` into `Cassandra database`.
> 
> I created corresponding keyspace and column family successfully.
> 
> Now as soon as I execute `SSTableLoader` command, I always get below 
> exception?
> 
> 
> S:\Apache Cassandra\apache-cassandra-1.2.3\bin>sstableloader
> C:\CassandraClient-LnP\20130405\profileks\PROFILECF 'sstableloader' is
> not recognized as an internal or external command, operable program or
> batch file.
> 
> 
> Can anyone tell me what I am doing wrong here? I am running Cassandra 1.2.3, 
> and this is my first time working with `sstableloader`. I am working in a 
> Windows environment.
> 
> Is sstableloader supported in windows? Looking at the source, it seems to be a 
> unix shell file.




Re: How to find total number of rows in Cassandra database?

2013-04-23 Thread aaron morton
cassandra-cli has some good online help.

There is no built-in feature to count rows, as Cassandra does not keep a row 
count, but if it's only 1,000 rows try using list;

You can also see an estimate of the number of rows by using nodetool cfstats. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/04/2013, at 6:50 PM, Techy Teck  wrote:

> Is there any way to see how many rows there are using CLI mode, if I don't 
> want to use CQL mode?
> 
> 
> On Mon, Apr 22, 2013 at 12:13 AM, Nikolay Mihaylov  wrote:
> Hi
> 
> it is very important to know that counting rows is very very very expensive.
> here is my 5 cents - 
> 
> in one of my early projects we made a separate column family with just a 
> single row.
> we inserted each row key from the other CF into this row as a column name.
> 
> then once a day or so, we ran get_count().
> 
> however, because get_count() became way too slow,
> we split the keys over several rows - e.g. over 1024 rows (see the sketch 
> below this message).
> it is still way too slow, but we do not need it to be realtime.
> 
> in our "second" project we decided to use cassandra counters.
> however, in order to count distinct keys, we need to read before write.
> this degrades insert performance, so we made a special CF with hashes and other 
> stuff.
> insert performance is still slow: 2 sec or something for 500-600 counters
> (note a single insert is OK, but we need to do 500-600 per batch, and 100-200 
> batches per second).
> 
> finally we researched probabilistic counters and decided to use these.
> we also decided to make the project in Python, and we have not done proper 
> tests yet.
> 
> this is our final "take"; it uses a modified hyper log log, so we do not need 
> to read at all (a toy version appears at the end of this thread).
> 
> https://github.com/nmmmnu/CubicHyperLogLog
> 
> we tested the library itself very well, but not with real live data.
> a version for Redis is included too, for easy testing.
> 
> Nikolay.
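
A minimal sketch of the sharded key-index scheme Nikolay describes above, assuming pycassa and an index column family named "keys_index" (both are illustrative choices, not his actual code):

    # Every row key written to the data CF is also recorded as a column on one
    # of 1024 index rows; a total count is the sum of get_count() per shard.
    import pycassa

    SHARDS = 1024

    pool = pycassa.ConnectionPool("mykeyspace", ["localhost:9160"])
    index_cf = pycassa.ColumnFamily(pool, "keys_index")

    def record_key(row_key):
        shard = hash(row_key) % SHARDS          # spread keys over the index rows
        index_cf.insert("shard_%d" % shard, {row_key: ""})

    def approximate_total():
        # Expensive: one get_count() per shard row, so run it rarely (e.g. daily).
        return sum(index_cf.get_count("shard_%d" % shard)
                   for shard in range(SHARDS))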
> 
> 
> 
> 
> 
> On Mon, Apr 22, 2013 at 2:19 AM, Utkarsh Sengar  wrote:
> The difference between cqlsh and cli is documented nicely by the datastax guys 
> here: 
> http://www.datastax.com/support-forums/topic/cli-vs-cql
> 
> Thanks,
> -Utkarsh
> 
> 
> On Sun, Apr 21, 2013 at 1:39 PM, Techy Teck  wrote:
> Yeah, it helps a lot. I've always wondered about this: what is the difference 
> between CLI and CQL?
> 
> 
> 
> On Sun, Apr 21, 2013 at 1:30 PM, Utkarsh Sengar  wrote:
> Using cqlsh you can do:
> 
> SELECT COUNT(*) FROM columnfamily LIMIT 5000;
> 
> Does that help?
> 
> Read more: http://www.datastax.com/docs/1.0/references/cql/SELECT
> 
> Thanks,
> -Utkarsh
> 
> 
> 
> On Sun, Apr 21, 2013 at 1:04 PM, Techy Teck  wrote:
> I have inserted 1000 rows into a Cassandra database. Now I am trying to find out 
> how many rows have been inserted, using the CLI.
> 
> 
> In an RDBMS, I can run this SQL:
> 
>SELECT count(*) from TABLE;
> 
> And this will give me total count for that table;
> 
> How to do the same thing in Cassandra database?
> 
> I am running Cassandra 1.2.3
> 
> 
> 
> -- 
> Thanks,
> -Utkarsh
> 
> 
> 
> 
> -- 
> Thanks,
> -Utkarsh
> 
> 
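
To make the hyper-log-log point above concrete, here is a toy Python version of the update and estimate steps (the parameters and hash choice are illustrative; see the CubicHyperLogLog repository linked above for the real thing). The key property is that add() never reads the data set to check for duplicates; it only raises one small register:

    import hashlib

    P = 10                    # 2**10 = 1024 registers
    M = 1 << P
    registers = [0] * M

    def add(item):
        h = int(hashlib.md5(item.encode()).hexdigest(), 16)
        j = h & (M - 1)                           # low P bits pick a register
        w = h >> P
        rho = (w & -w).bit_length() if w else 64  # trailing zeros + 1
        registers[j] = max(registers[j], rho)

    def estimate():
        alpha = 0.7213 / (1 + 1.079 / M)          # standard HLL bias constant
        return alpha * M * M / sum(2.0 ** -r for r in registers)

    for i in range(100000):
        add("user_%d" % (i % 20000))   # 20k distinct items, each added 5 times
    print(int(estimate()))             # prints roughly 20000 (a few % error)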



Re: Ec2Snitch to Ec2MultiRegionSnitch

2013-04-23 Thread aaron morton
> You are advising me to test it, what would be a good way of testing it (I can 
> use AWS EC2 instances if needed) ?
If you are only using one Availability Zone per region then you have only one rack 
per DC and the NetworkTopologyStrategy will do the right thing. 

> Why ? I mean we have maybe only 5% of our customers in the us-east zone, what 
> in C* requires having the same number of nodes in each DC ?
Because you are going to replicate your data 3 times in each DC so that each DC 
can operate with a LOCAL_QUORUM. 
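
A tiny sketch of the quorum arithmetic behind this (standard Cassandra behaviour, not specific to this setup): LOCAL_QUORUM only counts replicas in the local DC, so each DC needs its own replication factor high enough to keep a local quorum reachable through a node failure.

    # LOCAL_QUORUM is computed per data centre from that DC's replication factor.
    def local_quorum(rf):
        return rf // 2 + 1

    for rf in (1, 2, 3):
        print("RF=%d -> LOCAL_QUORUM=%d, tolerates %d replica(s) down"
              % (rf, local_quorum(rf), rf - local_quorum(rf)))
    # RF=3 -> LOCAL_QUORUM=2, tolerates 1 replica down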

> What is better about adding nodes with no data and then rebuilding them 
> compared to using auto_bootstrap ?

nodetool rebuild is designed to handle pulling data from another dc, so you can 
use it when the local DC does not contain data. i.e. you do not want a node in 
the new DC bootstrapping from other nodes in the new DC, since they have no data yet. 
  
> Any doc on this ? I am not aware of all the possibilities. Why is this the 
> best method according to you ?
http://wiki.apache.org/cassandra/Operations?highlight=%28token%29#Token_selection
http://www.datastax.com/docs/1.2/initialize/token_generation

Because it's easier to understand than interleaving the nodes, and it works with 
2+ DCs. 
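
A minimal sketch of the offset method, assuming the RandomPartitioner ring of 0 .. 2**127 and 6 nodes per DC (as in the docs linked above): each new-DC node takes its counterpart's token plus a tiny offset, so the two DCs never collide on a token.

    RING = 2 ** 127
    NODES = 6

    # Evenly spaced tokens for the existing DC, then offset copies for the new one.
    us_west = [i * RING // NODES for i in range(NODES)]
    us_east = [(t + 100) % RING for t in us_west]

    for w, e in zip(us_west, us_east):
        print("us-west: %d   us-east: %d" % (w, e))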

> What is the point of this ?
http://wiki.apache.org/cassandra/FAQ#seed

> I didn't think this change would be that tricky, thank you guys for these 
> warnings and your help ;)
Yup, this is a lot of work. 

Cheers
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/04/2013, at 7:26 PM, Alain RODRIGUEZ  wrote:

> Hi, this advice is very welcome.
> 
> @Dane, about the rack awareness, we use only one rack per DC, so I guess 
> using EC2MultiRegionSnitch will do just fine and it doesn't need any 
> configuration. Does it seem right to you ? If we are someday interested in 
> multiple racks I will make sure to use them properly. Thank you for this insight 
> anyway. You are advising me to test it, what would be a good way of testing 
> it (I can use AWS EC2 instances if needed) ?
> 
> @Aaron
> 
> "I recommend using the same number of nodes in both DC's."
> 
> Why ? I mean we have maybe only 5% of our customers in the us-east zone, what 
> in C* requires having the same number of nodes in each DC ?
> 
> "Add the nodes (I recommend 6) with auto_bootstrap: false added to the yaml.
> update the keyspace replication strategy to add rf:3 for the new DC. 
> Use nodetool rebuild on the new nodes to rebuild them from the us-west DC. "
> 
> What is better about adding nodes with no data and then rebuilding them 
> compared to using auto_bootstrap ?
> 
> "I prefer to use the offset method. Take the 6 tokens from your us-west DC 
> and add 100 to them for the new dc. "
> 
> Any doc on this ? I am not aware of all the possibilities. Why is this the 
> best method according to you ?
> 
> About seeds => "Yes. Have 3 from each."
> 
> What is the point of this ?
> 
> I didn't think this change would be that tricky, thank you guys for these 
> warnings and your help ;)
> 
> Alain
> 
> 
> 2013/4/23 Dane Miller 
> On Thu, Apr 18, 2013 at 7:41 AM, Alain RODRIGUEZ  wrote:
> > I am wondering about the process to grow from one data center to a few of
> > them. First thing is we use EC2Snitch for now. So I guess we have to switch
> > to Ec2MultiRegionSnitch.
> >
> > c/ I am using the SimpleStrategy. Is it worth it/mandatory to change this
> > strategy when using multiple DC ?
> 
> I suggest you thoroughly read the datastax documentation on cassandra
> replication.  The change you are planning is big - make sure to try it
> in a test environment first.  Also, you might find you don't really
> need Cassandra's rack aware feature, and can operate using
> (Gossiping)PropertyFileSnitch.  The rack feature is listed as an
> "anti-pattern" here:
> http://www.datastax.com/docs/1.2/cluster_architecture/anti_patterns
> 
> Here are some recent discussions on this list:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/migrating-from-SimpleStrategy-to-NetworkTopologyStrategy-tp7586272.html
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/migrating-from-SimpleStrategy-to-NetworkTopologyStrategy-tp7481090.html
> 
> Dane
> 



Re: readable (not hex encoded) column names using sstable2json

2013-04-23 Thread aaron morton
What is the CF definition ?
What errors are you getting ?

> We're trying to move data over to another cluster but this prevents us from 
> doing so. 
Is there a reason you are converting the SSTables to JSON ? 
You could just copy the sstables. 
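
For reference, the NumberFormatException is about the names themselves: the importer expects them hex encoded, and a literal name like "dep.1205050" contains non-hex characters. A quick Python round-trip (illustrative only) shows the two forms:

    import binascii

    name = b"dep.1205050"
    hexed = binascii.hexlify(name)
    print(hexed)                      # b'6465702e31323035303530'
    print(binascii.unhexlify(hexed))  # b'dep.1205050'

    try:
        binascii.unhexlify(name)      # what the importer effectively attempts
    except binascii.Error as err:
        print("non-hex input rejected:", err)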

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/04/2013, at 9:37 PM, Hans Melgers  wrote:

> Hello,
> 
> Using Cassandra 1.0.7's sstable2json on some tables, I get readable column
> names. This leads to problems (java.lang.NumberFormatException: Non-hex
> characters in) when importing later.
> 
> We're trying to move data over to another cluster but this prevents us
> from doing so. Could it have to do with using a custom Serializer?
> 
> Here is some example output:
> 
> D:\Java\apache-cassandra-1.0.7\bin>sstable2json
> d:\var\lib\cassandra\data2\depsi\ACCOUNT_RECEIVERS-hc-1-Data.db
> {
> "236964236561393231626331383534313561616133613637333739623038633
> 9": [["dep.1205050","",1364383456519006]],
> "2369642339633830656236636638336534616238386362323830663863643930343
> 2": [["dep.1057162","",1364383456664000]],
> [GOES ON here]
> 
> The value "dep.1205050" is literally what we put in there. It's not hex
> encoded.
> 
> Kind regards,
> Hans Melgers
> 
> 
> 



Re: Ec2Snitch to Ec2MultiRegionSnitch

2013-04-23 Thread Alain RODRIGUEZ
"If you are only using one Available Zone per region then you have only one
rack per DC and the NetworkTopologyStrategy will do the right thing."

So you mean this part doesn't need more testing ? This will work for sure ?
Did you already did it yourself ?

"Because you are going to replicate your data 3 times in each DC so that
each DC can operate with a LOCAL_QUOURM"
Yet I don't get it. Tell me where I am wrong. LOCAL_QUORUM need to
read/write 2 nodes (since RF = 3) per region. So if I use 6 eu-west and 3
us-east, C* will be able to reach the LOCAL_QUORUM everywhere, won't it ?
So why should I use 6 + 6 servers ?

"nodetool rebuild is designed to handle pulling data from another dc, so
you can use it when the local DC does not contain data. i.e. you do not
want a node in the new DC bootstrapping from other nodes in the new DC,
they have no data"
Good to know, thanks about it, as about all your pointers to the doc.

"Cause it's easier to understand than interleaving the nodes and works with
2+ DC's."
Good point.

If you are interested, I'll let you know how things go when we
add the second DC.


2013/4/24 aaron morton 

> > You are advising me to test it, what would be a good way of testing it
> (I can use AWS EC2 instances if needed) ?
> If you are only using one Availability Zone per region then you have only one
> rack per DC and the NetworkTopologyStrategy will do the right thing.
>
> > Why ? I mean we have maybe only 5% of our customers in the us-east zone,
> what in C* requires having the same number of nodes in each DC ?
> Because you are going to replicate your data 3 times in each DC so that
> each DC can operate with a LOCAL_QUORUM.
>
> > What is better about adding nodes with no data and then rebuilding them
> compared to using auto_bootstrap ?
>
> nodetool rebuild is designed to handle pulling data from another dc, so
> you can use it when the local DC does not contain data. i.e. you do not
> want a node in the new DC bootstrapping from other nodes in the new DC,
> since they have no data yet.
>
> > Any doc on this ? I am not aware of all the possibilities. Why is this
> the best method according to you ?
>
> http://wiki.apache.org/cassandra/Operations?highlight=%28token%29#Token_selection
> http://www.datastax.com/docs/1.2/initialize/token_generation
>
> Because it's easier to understand than interleaving the nodes, and it works with
> 2+ DCs.
>
> > What is the point of this ?
> http://wiki.apache.org/cassandra/FAQ#seed
>
> > I didn't think this change would be that tricky, thank you guys for
> these warnings and your help ;)
> Yup, this is a lot of work.
>
> Cheers
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 23/04/2013, at 7:26 PM, Alain RODRIGUEZ  wrote:
>
> > Hi, this advice is very welcome.
> >
> > @Dane, about the rack awareness, we use only one rack per DC, so I guess
> using EC2MultiRegionSnitch will do just fine and it doesn't need any
> configuration. Does it seem right to you ? If we are someday interested in
> multiple racks I will make sure to use them properly. Thank you for this
> insight anyway. You are advising me to test it, what would be a good way of
> testing it (I can use AWS EC2 instances if needed) ?
> >
> > @Aaron
> >
> > "I recommend using the same number of nodes in both DC's."
> >
> > Why ? I mean we have maybe only 5% of our customers in the us-east zone,
> what in C* requires having the same number of nodes in each DC ?
> >
> > "Add the nodes (I recommend 6) with auto_bootstrap: false added to the
> yaml.
> > update the keyspace replication strategy to add rf:3 for the new DC.
> > Use nodetool rebuild on the new nodes to rebuild them from the us-west
> DC. "
> >
> > What is better about adding nodes with no data and then rebuilding them
> compared to using auto_bootstrap ?
> >
> > "I prefer to use the offset method. Take the 6 tokens from your us-west
> DC and add 100 to them for the new dc. "
> >
> > Any doc on this ? I am not aware of all the possibilities. Why is this
> the best method according to you ?
> >
> > About seeds => "Yes. Have 3 from each."
> >
> > What is the point of this ?
> >
> > I didn't think this change would be that tricky, thank you guys for
> these warnings and your help ;)
> >
> > Alain
> >
> >
> > 2013/4/23 Dane Miller 
> > On Thu, Apr 18, 2013 at 7:41 AM, Alain RODRIGUEZ 
> wrote:
> > > I am wondering about the process to grow from one data center to a few
> of
> > > them. First thing is we use EC2Snitch for now. So I guess we have to
> switch
> > > to Ec2MultiRegionSnitch.
> > >
> > > c/ I am using the SimpleStrategy. Is it worth it/mandatory to change
> this
> > > strategy when using multiple DC ?
> >
> > I suggest you thoroughly read the datastax documentation on cassandra
> > replication.  The change you are planning is big - make sure to try it
> > in a test environment first.  Also, you might find you don't really
> > need Cassandra's rack aware featu

Re: Cassandra + Hadoop - 2 Task attempts with million of rows

2013-04-23 Thread Shamim
Hello Aaron,
We built our new cluster from scratch with version 1.2 and the Murmur3 
partitioner. We are not using vnodes at all. 
Actually the logs are clean, nothing serious; we are still investigating and will 
post soon if we find anything suspicious.

>>> Our cluster is evenly partitioned (Murmur3Partitioner)
>
> Murmur3Partitioner is only available in 1.2 and changing partitioners is
> not supported. Did you change from Random Partitioner under 1.1?
> Are you using virtual nodes in your 1.2 cluster ?
>
>>> We have roughly 97 million rows in our cluster. Why are we getting the
>>> above behavior? Do you have any suggestion or clue to troubleshoot this
>>> issue?
>
> Can you make some of the logs from the tasks available?
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 23/04/2013, at 5:50 AM, Shamim wrote:
>
>> We are using Hadoop 1.0.3 and pig 0.11.1 version
>>
>> --
>> Best regards
>> Shamim A.
>>
>> 22.04.2013, 21:48, "Shamim":
>>> Hello all,
>>> recently we upgraded our cluster (6 nodes) from cassandra version 1.1.6
>>> to 1.2.1. Our cluster is evenly partitioned (Murmur3Partitioner). We are
>>> using pig to parse and compute aggregate data.
>>>
>>> When we submit a job through pig, what I consistently see is that, while
>>> most of the tasks have 20-25k rows assigned each (Map input records),
>>> only 2 of them (always 2) get more than 2 million rows. These 2 tasks
>>> always complete 100% and then hang for a long time. Also most of the
>>> time we are getting killed tasks (2%) with TimeoutException.
>>>
>>> We increased rpc_timeout to 6, also set cassandra.input.split.size=1024
>>> but nothing helped.
>>>
>>> We have roughly 97 million rows in our cluster. Why are we getting the
>>> above behavior? Do you have any suggestion or clue to troubleshoot this
>>> issue? Any help will be highly appreciated. Thanks in advance.
>>>
>>> --
>>> Best regards
>>> Shamim A.

--
Best regards
  Shamim A.