How is null handled in terms of storage when using static schemas?

2014-06-20 Thread Kevin Burton
Let's say we have a table with just an integer primary key named ID and a text column named VALUE… if we set value to 0, "hello world" … obviously, that's a normal value. However, what happens if we update it with 0, null … how is the 'null' stored? I couldn't find any documentation fo

Use Cassandra thrift API with collection type

2014-06-20 Thread Huiliang Zhang
Hi, I have a problem when inserting data of the map type into a Cassandra table. I tried all kinds of MapSerializer to serialize the Map data and did not succeed. My code is like this: Column column = new Column(); column.name=columnSerializer.toByteBuffer(colname); // the co
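The map value has to be serialized into the binary collection layout before it can go into a thrift `Column`. As a hedged illustration (not the Java `MapSerializer` API itself), this Python sketch packs a `map<text, text>` the way the pre-v3 native protocol encodes collections: a 2-byte big-endian pair count, then each key and value as a 2-byte length prefix followed by the UTF-8 bytes.

```python
import struct

def serialize_text_map(m):
    # Pack a map<text, text>: 2-byte big-endian pair count, then each
    # key and value as a 2-byte length prefix followed by UTF-8 bytes.
    # (Assumes the pre-v3 collection encoding with short lengths.)
    parts = [struct.pack(">H", len(m))]
    for k, v in m.items():
        for element in (k.encode("utf-8"), v.encode("utf-8")):
            parts.append(struct.pack(">H", len(element)) + element)
    return b"".join(parts)

print(serialize_text_map({"a": "b"}))  # b'\x00\x01\x00\x01a\x00\x01b'
```

A mismatch between this layout and what the serializer produces is a common cause of "insert succeeded but the map reads back garbled" problems.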

output interpretation of cassandra-stress

2014-06-20 Thread Senhua Huang
Hi all, I have a quick question on the unit of the latency in the output of cassandra-stress: is it milliseconds or seconds? I cannot find the answer in the documentation: http://www.datastax.com/documentation/cassandra/1.2/cassandra/tools/toolsCStressOutput_c.html Thanks, Senhua

Re: Using Cassandra as cache

2014-06-20 Thread Robert Coli
On Fri, Jun 20, 2014 at 3:09 PM, DuyHai Doan wrote: > @Robert: do we still need to manually clean up snapshots when truncating? I remember that on the 1.2 branch, even though the auto_snapshot param was set to false, truncating led to snapshot creation that forced us to manually remove the snap

Re: Using Cassandra as cache

2014-06-20 Thread Pavel Kogan
Thanks. Is there any programmatic way to know when the schema has finished settling? Can working with RF=2 and CL=ANY result in any problem with consistency? I am not sure I can have consistency problems if I don't do updates, only writes and reads. Can I? By the way, I am using Cassandra 2.0.8. Pavel

Re: Using Cassandra as cache

2014-06-20 Thread Pavel Kogan
Thanks Robert. Can you please explain what problems DROP/CREATE keyspace may cause? It seems truncate works per column family, and I have up to 10. What should I delete from disk in that case? I can't delete the whole folder, right? I need to delete all content under each cf folder, but not the folder

Re: Using Cassandra as cache

2014-06-20 Thread DuyHai Doan
Schema propagation takes time: https://issues.apache.org/jira/browse/CASSANDRA-5725 @Robert: do we still need to manually clean up snapshots when truncating? I remember that on the 1.2 branch, even though the auto_snapshot param was set to false, truncating led to snapshot creation that forced

Re: Using Cassandra as cache

2014-06-20 Thread Robert Coli
On Fri, Jun 20, 2014 at 2:48 PM, Pavel Kogan wrote: > So what we did is creating a new keyspace named _MM_dd_HH every hour, and > when the disk becomes full, a script running in crontab on each node drops the > keyspace with the "IF EXISTS" flag and deletes the whole keyspace folder. That way > the whole process is

Re: Using Cassandra as cache

2014-06-20 Thread Robert Stupp
On 20.06.2014 at 23:48, Pavel Kogan wrote: > 1) When a new keyspace with its columnfamilies is being created (every > round hour), sometimes other modules fail to read/write data, and we lose the > request. Can it be that creation of keyspaces and columnfamilies is an async > operation, or there

Using Cassandra as cache

2014-06-20 Thread Pavel Kogan
Hi, In our project, many distributed modules send each other binary blobs, up to 100-200 KB each on average. Small JSONs are being sent over a message queue, while Cassandra is being used as temporary storage for the blobs. We are using Cassandra instead of an in-memory distributed cache like Couch due t

Re: Batch of prepared statements exceeding specified threshold

2014-06-20 Thread Pavel Kogan
Ok, in my case it was straightforward. It is just a warning, which however says that batches with a large data size (above 5 KB) can sometimes lead to node instability (why?). This limit seems to be hard-coded; I didn't find any way to configure it externally. Anyway, removing the batch and giving up atomici
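One client-side workaround, if you want to keep batching but stay under the warning threshold, is to split the mutations by serialized size before sending. A minimal sketch (pure Python, using the ~5 KB figure from this thread; `chunk_mutations` and its payload representation are hypothetical, not a driver API):

```python
def chunk_mutations(payloads, max_batch_bytes=5 * 1024):
    """Split serialized mutations into batches under a size cap.
    Note: splitting a logged batch this way gives up atomicity
    across the resulting batches, as the thread discusses."""
    batches, current, size = [], [], 0
    for payload in payloads:
        if current and size + len(payload) > max_batch_bytes:
            batches.append(current)  # flush the batch before it overflows
            current, size = [], 0
        current.append(payload)
        size += len(payload)
    if current:
        batches.append(current)
    return batches
```

With three 100 KB blobs each batch ends up holding a single mutation, which is effectively the same as giving up the batch entirely.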

Re: Batch of prepared statements exceeding specified threshold

2014-06-20 Thread Pavel Kogan
Logged batch. On Fri, Jun 20, 2014 at 2:13 PM, DuyHai Doan wrote: > I think some figures from "nodetool tpstats" and "nodetool > compactionstats" may help seeing clearer > > And Pavel, when you said batch, did you mean LOGGED batch or UNLOGGED > batch ? > > > > > > On Fri, Jun 20, 2014 at 8:02

Re: Best way to do a multi_get using CQL

2014-06-20 Thread DuyHai Doan
"The bad design part (just my opinion, no intention to offend) is not allow the possibility of sending batches directly to the data nodes, without using a coordinator." Well it's normal that it's not possible. What is a batch ? It's a bunch of insert/update/delete statements put together. Now e

Re: Best way to do a multi_get using CQL

2014-06-20 Thread Jonathan Haddad
I forgot to add that each connection can handle multiple simultaneous queries. This was part of the original protocol as of C* 1.2: http://www.datastax.com/dev/blog/binary-protocol Asynchronous: each connection can handle more than one active request at the same time. In practice, this means that

Re: Best way to do a multi_get using CQL

2014-06-20 Thread Jeremy Jongsma
There is nothing preventing that in Cassandra, it's just a matter of how intelligent the driver API is. Submit a feature request to Astyanax or Datastax driver projects. On Fri, Jun 20, 2014 at 2:27 PM, Marcelo Elias Del Valle < marc...@s1mbi0se.com.br> wrote: > The bad design part (just my opin

Re: Best way to do a multi_get using CQL

2014-06-20 Thread Marcelo Elias Del Valle
The bad design part (just my opinion, no intention to offend) is not allowing the possibility of sending batches directly to the data nodes, without using a coordinator. I would choose that option. []s 2014-06-20 16:05 GMT-03:00 DuyHai Doan : > Well it's kind of a trade-off. > > Either you send da

Re: Best way to do a multi_get using CQL

2014-06-20 Thread DuyHai Doan
Well it's kind of a trade-off. Either you send data directly to the primary replica nodes to take advantage of data-locality using token-aware strategy and the price to pay is a high number of opened connections from client side. Or you just batch data to a random node playing the coordinator ro
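The token-aware side of this trade-off amounts to grouping statements by the node that owns each partition key before sending anything. A toy sketch of that grouping step (crc32 stands in for Murmur3 token ranges; a real driver derives ownership from cluster metadata, so this is only illustrative):

```python
import zlib

def group_by_owner(keys, nodes):
    # Toy token-aware routing: crc32 mod node-count stands in for
    # Murmur3 token ranges. Each group would then go over a dedicated
    # connection to its owner node, which is why the connection count
    # grows with cluster size.
    groups = {node: [] for node in nodes}
    for key in keys:
        owner = nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]
        groups[owner].append(key)
    return groups
```

The alternative, batching everything to one random coordinator, collapses this to a single group and a single connection, at the cost of an extra hop for every row the coordinator does not own.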

Re: Best way to do a multi_get using CQL

2014-06-20 Thread Marcelo Elias Del Valle
I am using python + CQL Driver. I wonder how they do it... These things seem minor, but they are fundamental to getting good performance out of Cassandra... I wish there were a simpler way to query in batches. Opening a large number of connections and sending 1 message at a time seems bad to me,

Re: Custom snitch classpath?

2014-06-20 Thread Marcelo Elias Del Valle
This is nice! I was looking for something like this to implement a multi-DC cluster between OVH and Amazon. Thanks for sharing! []s 2014-06-20 15:35 GMT-03:00 Jeremy Jongsma : > Sharing in case anyone else wants to use this: > > > https://github.com/barchart/cassandra-plugins/blob/master/src/mai

Re: Custom snitch classpath?

2014-06-20 Thread Jeremy Jongsma
Sharing in case anyone else wants to use this: https://github.com/barchart/cassandra-plugins/blob/master/src/main/java/com/barchart/cassandra/plugins/snitch/GossipingPropertyFileWithEC2FallbackSnitch.java Basically it is a proxy that attempts to use GossipingPropertyFileSnitch, and if that fails
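The proxy idea behind that snitch is simple enough to sketch: delegate to a primary strategy and fall back when it raises. This is a hedged illustration of the pattern in Python, not the Java plugin's actual API (the class and method names here are invented):

```python
class FallbackSnitch:
    """Sketch of a try-primary-then-fallback proxy, in the spirit of
    GossipingPropertyFileWithEC2FallbackSnitch. Names illustrative."""

    def __init__(self, primary, fallback):
        self.primary = primary    # e.g. property-file lookup
        self.fallback = fallback  # e.g. EC2 metadata lookup

    def datacenter(self, endpoint):
        try:
            return self.primary(endpoint)
        except Exception:
            # Primary strategy has no answer for this endpoint;
            # defer to the fallback source.
            return self.fallback(endpoint)
```

The real snitch composes the two Cassandra snitch implementations the same way, so nodes missing from cassandra-topology.properties still resolve via EC2 metadata.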

Re: Batch of prepared statements exceeding specified threshold

2014-06-20 Thread DuyHai Doan
I think some figures from "nodetool tpstats" and "nodetool compactionstats" may help seeing clearer And Pavel, when you said batch, did you mean LOGGED batch or UNLOGGED batch ? On Fri, Jun 20, 2014 at 8:02 PM, Marcelo Elias Del Valle < marc...@s1mbi0se.com.br> wrote: > If you have 32 Gb RAM

Re: Custom snitch classpath?

2014-06-20 Thread Tyler Hobbs
The lib directory (where all the other jars are). bin/cassandra.in.sh does this: for jar in "$CASSANDRA_HOME"/lib/*.jar; do CLASSPATH="$CLASSPATH:$jar" done On Fri, Jun 20, 2014 at 12:58 PM, Jeremy Jongsma wrote: > Where do I add my custom snitch JAR to the Cassandra classpath so I can >

Re: Batch of prepared statements exceeding specified threshold

2014-06-20 Thread Marcelo Elias Del Valle
If you have 32 GB RAM, the heap is probably 8 GB. 200 writes of 100 KB/s would be 20 MB/s in the worst case, supposing all writes of a replica go to a single node. I really don't see any reason why it should be filling up the heap. Anyone else? But did you check the logs for the GCInspector? I
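Taking the message's figures at face value, the back-of-envelope arithmetic looks like this:

```python
# Figures quoted in the message above; worst case assumes all of a
# replica's writes land on one node.
writes_per_sec = 200
write_size_bytes = 100 * 1024    # up to 100 KB per write
ingest = writes_per_sec * write_size_bytes
print(ingest / (1024 * 1024))    # ~19.5 MiB/s, i.e. roughly 20 MB/s
```

Against an 8 GB heap, ~20 MB/s of incoming mutations should drain comfortably, which is why the poster suspects something other than raw write volume.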

Custom snitch classpath?

2014-06-20 Thread Jeremy Jongsma
Where do I add my custom snitch JAR to the Cassandra classpath so I can use it?

Re: Batch of prepared statements exceeding specified threshold

2014-06-20 Thread Pavel Kogan
Hi Marcelo, No pending write tasks. I am writing a lot, about 100-200 writes each up to 100 KB every 15[s]. It is running on a decent cluster of 5 identical nodes, quad-core i7 with 32 GB RAM and 480 GB SSD. Regards, Pavel On Fri, Jun 20, 2014 at 12:31 PM, Marcelo Elias Del Valle < marc...@s1mbi0

Re: Best way to do a multi_get using CQL

2014-06-20 Thread Jeremy Jongsma
That depends on the connection pooling implementation in your driver. Astyanax will keep N connections open to each node (configurable) and route each query in a separate message over an existing connection, waiting until one becomes available if all are in use. On Fri, Jun 20, 2014 at 12:32 PM,

Re: Best way to do a multi_get using CQL

2014-06-20 Thread Marcelo Elias Del Valle
A question, not sure if you guys know the answer: Suppose I async query 1000 rows using token aware and suppose I have 10 nodes. Suppose also that each node would receive 100 row queries. How does async work in this case? Would it send each row query to each node in a different connection? Different
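The replies below explain that one connection per node suffices because the native protocol multiplexes requests via stream ids. A hedged asyncio sketch of the resulting fan-out shape (the `multi_get`/`query_node` names and the node-to-keys routing dict are illustrative, not a driver API):

```python
import asyncio

async def query_node(node, row_keys):
    # Stand-in for the per-node round trip; in the real protocol many
    # such requests share one connection, distinguished by stream ids.
    await asyncio.sleep(0)  # yield control, as real network I/O would
    return [(node, key) for key in row_keys]

async def multi_get(routing):
    # routing maps node -> row keys, as a token-aware driver would
    # build it; all nodes are queried concurrently, not sequentially.
    per_node = await asyncio.gather(
        *(query_node(node, keys) for node, keys in routing.items()))
    return [row for batch in per_node for row in batch]
```

So for 1000 rows across 10 nodes, the client holds roughly 10 connections and keeps ~100 requests in flight on each, rather than opening 1000 connections.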

Re: Bug on 2.1-rc1 with BLOBs?

2014-06-20 Thread Robert Coli
On Fri, Jun 20, 2014 at 2:39 AM, Simon Chemouil wrote: > OK, so Cassandra 2.1 now rejects writes it considers too big. It is > possible to increase the value by changing commitlog_segment_size_in_mb > in cassandra.yaml. It defaults to 32MB, and the maximum segment size for > a write is half that

Re: Batch of prepared statements exceeding specified threshold

2014-06-20 Thread Marcelo Elias Del Valle
Pavel, In my case, the heap was filling up faster than it was draining. I am still looking for the cause of it, as I could drain really fast with SSD. However, in your case you could check (AFAIK) nodetool tpstats and see if there are too many pending write tasks, for instance. Maybe you really a

Re: Best way to do a multi_get using CQL

2014-06-20 Thread Jeremy Jongsma
I've found that if you have any amount of latency between your client and nodes, and you are executing a large batch of queries, you'll usually want to send them together to one node unless execution time is of no concern. The tradeoff is resource usage on the connected node vs. time to complete al

Re: Bug on 2.1-rc1 with BLOBs?

2014-06-20 Thread DuyHai Doan
Thanks Simon for the info. I didn't know that the maximum payload size is related to commit log config, interesting ... On Fri, Jun 20, 2014 at 11:39 AM, Simon Chemouil wrote: > OK, so Cassandra 2.1 now rejects writes it considers too big. It is > possible to increase the value by changing comm

Re: Batch of prepared statements exceeding specified threshold

2014-06-20 Thread Pavel Kogan
The cluster is new, so no updates were done. Version 2.0.8. It happened when I did many writes (no reads). Writes are done in small batches of 2 inserts (writing to 2 column families). The values are big blobs (up to 100Kb). Any clues? Pavel On Thu, Jun 19, 2014 at 8:07 PM, Marcelo Elias Del Va

Re: Best way to do a multi_get using CQL

2014-06-20 Thread Laing, Michael
However, my extensive benchmarking this week of the python driver from master shows a performance *decrease* when using 'token_aware'. This is on a 12-node, 2-datacenter, RF=3 cluster in AWS. Also, why do the work the coordinator will do for you: send all the queries, wait for everything to come back

Re: Best practices for repair

2014-06-20 Thread Paolo Crosato
Thank you very much, I recompiled it with 2.0 and it works well; now I will try to figure out which granularity works better. Your example was really a boost, thanks again! Regards, Paolo On 19/06/2014 22:42, Paulo Ricardo Motta Gomes wrote: Hello Paolo, I just published an open source

Re: Bug on 2.1-rc1 with BLOBs?

2014-06-20 Thread Simon Chemouil
OK, so Cassandra 2.1 now rejects writes it considers too big. It is possible to increase the value by changing commitlog_segment_size_in_mb in cassandra.yaml. It defaults to 32MB, and the maximum segment size for a write is half that value: from CommitLog.java: // we only permit records HALF the s
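The rule described above makes the cap easy to compute from the yaml setting (assuming the default value mentioned in the message):

```python
commitlog_segment_size_in_mb = 32  # cassandra.yaml default, per the thread
# Cassandra only permits records HALF the segment size, so:
max_mutation_bytes = commitlog_segment_size_in_mb * 1024 * 1024 // 2
print(max_mutation_bytes)  # 16777216, the cap quoted in the 2.1-rc1 error
```

This matches the 16777216-byte maximum in the error reported elsewhere in the thread, and explains why a ~30 MB mutation is rejected under the defaults.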

Re: Bug on 2.1-rc1 with BLOBs?

2014-06-20 Thread Simon Chemouil
For the record, I could reproduce the problem with blobs of size below 64MB. Caused by: java.lang.IllegalArgumentException: Mutation of 32000122 bytes is too large for the maxiumum size of 16777216 32000122 is just ~30MB and fails on 2.1-rc1 while it works on 2.0.X for even larger values (up to 6

Re: Sending BLOBs to Cassandra +

2014-06-20 Thread Simon Chemouil
So it looks like I was sending more than I expected. Still, the question stands: is CQL the best way to send BLOBs? Are there any remote operations available on BLOBs? Thanks, Simon On 20/06/2014 10:03, Simon Chemouil wrote: > Hi, > > I read in Cassandra's FAQ that it is fine with BLOBs up to 64M

Re: Bug on 2.1-rc1 with BLOBs?

2014-06-20 Thread Simon Chemouil
On 20/06/2014 10:41, Duncan Sands wrote: > Hi Simon, > 122880122 bytes is a lot more than 0.6MB... How are you sending your blob? Turns out there was a mistake in my code. The blob in this case was actually 122MB! Still, the same code works fine on Cassandra 2.0.x, so there might be a bug lurkin

Re: Bug on 2.1-rc1 with BLOBs?

2014-06-20 Thread Duncan Sands
Hi Simon, On 20/06/14 10:18, Simon Chemouil wrote: Hi, When I am sending BLOBs _below_ the max query size (blob size=0.6MB), on Cassandra 2.0, it works fine, but on 2.1-rc1 I get the following error within the Cassandra server (from the logs) and the query just dies: WARN [SharedPool-Worker-2

Bug on 2.1-rc1 with BLOBs?

2014-06-20 Thread Simon Chemouil
Hi, When I am sending BLOBs _below_ the max query size (blob size=0.6MB), on Cassandra 2.0, it works fine, but on 2.1-rc1 I get the following error within the Cassandra server (from the logs) and the query just dies: WARN [SharedPool-Worker-2] 2014-06-20 10:06:00,263 AbstractTracingAwareExecutor

Sending BLOBs to Cassandra +

2014-06-20 Thread Simon Chemouil
Hi, I read in Cassandra's FAQ that it is fine with BLOBs up to 64MB. Here I am trying to send a 1.6MB BLOB using CQL, and Cassandra rejects my query with the following message: Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Request is too big: length 409600086 exceeds maximum

Re: Best way to do a multi_get using CQL

2014-06-20 Thread Marcelo Elias Del Valle
Yes, I am using the CQL datastax drivers. It was good advice, thanks a lot Jonathan. []s 2014-06-20 0:28 GMT-03:00 Jonathan Haddad : > The only case in which it might be better to use an IN clause is if > the entire query can be satisfied from that machine. Otherwise, go > async. > > The nati