Re: Using CQL to insert a column to a row dynamically

2013-05-28 Thread Tristan Seligmann
On Mon, May 27, 2013 at 11:38 PM, Matthew Hillsborough < matthew.hillsboro...@gmail.com> wrote: > Originally what I thought of doing was creating a column family in > Cassandra named `ride_events`. Each row key would be a rideID that's simply > an integer. I would then arbitrarily create columns w

Re: Running Cassandra with no open TCP ports

2013-05-28 Thread Oleg Dulin
Mark: This begs a question -- why are you using Cassandra for this? There are simpler NoSQL stores than Cassandra that are better for embedding. Oleg On 2013-05-28 02:24:48 +, Mark Mccraw said: Hi All, I'm using Cassandra as an embedded datastore for a small service that doesn't need

Re: Running Cassandra with no open TCP ports

2013-05-28 Thread Edward Capriolo
While not exactly optimized for embedded systems there is no reason it could not be done. Today's supercomputer is tomorrow's embedded watch processor. On Tue, May 28, 2013 at 9:11 AM, Oleg Dulin wrote: > Mark: > > This begs a question -- why are you using Cassandra for this ? There are > simpl

Re: Using CQL to insert a column to a row dynamically

2013-05-28 Thread Edward Capriolo
Or just: cli > create column family ride_events with comparator(Int32Type,Int32Type,Int32Type); On Tue, May 28, 2013 at 4:45 AM, Tristan Seligmann wrote: > On Mon, May 27, 2013 at 11:38 PM, Matthew Hillsborough < > matthew.hillsboro...@gmail.com> wrote: > >> Originally what I thought of doing wa

Cleanup understastanding

2013-05-28 Thread Víctor Hugo Oliveira Molinar
Hello everyone. I have a daily maintenance task at c* which does: -truncate cfs -clearsnapshots -repair -cleanup The reason I need to clean things is that I won't need most of my inserted data on the next day. It's kind of a business requirement. Well, the problem I'm running into is the misundersta

Cassandra on a single (under-powered) instance?

2013-05-28 Thread Daniel Morton
Hello All. I am new to Cassandra and I am evaluating it for a project I am working on. This project has several distribution models, ranging from a cloud distribution where we would be collecting hundreds of millions of rows per day to a single box distribution where we could be collecting as few

weird token ownerships

2013-05-28 Thread Hiller, Dean
I was assuming my node a1 would always own token 0, but we just added 5 of 6 more nodes and a1 no longer owns that token range. I have a few questions on the table at the bottom 1. Is this supposed to happen where host a1 no longer owns token range 0(but that is in his cassandra.yaml file), b

Re: Running Cassandra with no open TCP ports

2013-05-28 Thread Mark Mccraw
Oleg: The simple answer for why I'm using Cassandra thusly is laziness/fear of uncertainty. I'm using Cassandra indirectly as the back end data store for Titan(http://thinkaurelius.github.io/titan/), which is a graph interface. Titan does let you swap out the data store, and it gives you seve

Re: weird token ownerships

2013-05-28 Thread Hiller, Dean
Sorry, missed some info. The table is sorted by token and the very first column is the node with that token in the cassandra.yaml file. The second to last column of 3 hostnames is the nodes that ended up with that data according to nodetool describering so taking the first row, a1 owns token 0 an

Re: Running Cassandra with no open TCP ports

2013-05-28 Thread Sam Overton
You can configure cassandra to use an ephemeral port for the storage endpoint by setting the following in cassandra.yaml: storage_port: 0 or by setting the system property cassandra.storage_port=0 Similarly for the RPC (thrift) endpoint, using rpc_port in cassandra.yaml or the system property cas
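The "port 0" convention mentioned above is general operating-system behaviour rather than anything specific to Cassandra: binding a socket to port 0 asks the OS to choose a free ephemeral port. A minimal sketch of that behaviour follows; it does not touch Cassandra at all, and the helper name `bind_ephemeral` is made up for illustration.

```python
import socket


def bind_ephemeral(host="127.0.0.1"):
    """Bind a TCP socket to port 0 and return (socket, assigned_port).

    Hypothetical helper for illustration only; not part of Cassandra.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 asks the OS to pick any free port
    assigned_port = sock.getsockname()[1]  # the port the OS actually chose
    return sock, assigned_port


sock, port = bind_ephemeral()
print("OS-assigned port:", port)
sock.close()
```

The assigned port differs from run to run, which is why any peer that needs to reach such a process has to learn the port some other way.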

Re: Cleanup understastanding

2013-05-28 Thread Andrey Ilinykh
cleanup removes data which doesn't belong to the current node. You have to run it only if you move (or add new) nodes. In your case there is no reason to do it. On Tue, May 28, 2013 at 7:39 AM, Víctor Hugo Oliveira Molinar < vhmoli...@gmail.com> wrote: > Hello everyone. > I have a daily main

Re: Cleanup understastanding

2013-05-28 Thread Robert Coli
On Tue, May 28, 2013 at 7:39 AM, Víctor Hugo Oliveira Molinar wrote: > So I'd like to know more about what does happens in a cleanup operation. > Appreciate any help. ./src/java/org/apache/cassandra/db/compaction/CompactionManager.java" line 591 of 1175 " logger.info("Cleaning up " + sstable);

Re: Suggested Ruby client using Cassandra 1.2.5

2013-05-28 Thread aaron morton
Go with the twitter client that is more mature / under active development. If you are just starting out / experimenting Thrift is fine to use, it's not going away. As you get more experience you may start to prefer CQL and by that time there may be a ruby/C driver. Cheers - Aa

Re: how to handle join properly in this case

2013-05-28 Thread aaron morton
A common pattern is to materialise views, that is store the join at the same time you are writing to CF's A and B. In this case it sounds like the two CF's are written to at different times. If that is the case you may need to do the join client side (do two reads). Hope that helps. ---
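The client-side join described above (two reads, combined in the application) can be sketched as follows. This is an illustration only: plain dictionaries stand in for column families A and B, and `client_side_join` is a hypothetical helper, not a Cassandra client API.

```python
def client_side_join(cf_a, cf_b, keys):
    """Join rows from two stand-in column families on their row key.

    cf_a and cf_b are plain dicts mapping row key -> dict of columns;
    a real client would issue one read per column family instead.
    """
    joined = {}
    for key in keys:
        row_a = cf_a.get(key)  # first read (column family A)
        row_b = cf_b.get(key)  # second read (column family B)
        if row_a is not None and row_b is not None:
            # Merge the columns of both rows; B wins on a name clash.
            joined[key] = {**row_a, **row_b}
    return joined


# Made-up sample data for illustration.
cf_a = {1: {"name": "alice"}, 2: {"name": "bob"}}
cf_b = {1: {"city": "Oslo"}, 3: {"city": "Lima"}}
print(client_side_join(cf_a, cf_b, [1, 2, 3]))
```

Only key 1 exists in both stand-in column families, so it is the only row in the result. Materialising the view at write time avoids the second read, at the cost of writing the joined row whenever A or B changes.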

Re: Cassandra read reapair

2013-05-28 Thread aaron morton
Start using QUORUM for reads and writes and then run a nodetool repair. That should get you back to the land of the consistent. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 27/05/2013, at 10:01 PM, Kais Ahmed w
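The reason QUORUM on both reads and writes gives consistent results is simple arithmetic: if the number of replicas read plus the number written exceeds the replication factor, every read overlaps at least one replica that saw the latest write. A small sketch of that check; the function names are illustrative and not part of any driver.

```python
def quorum(replication_factor):
    """Number of replicas that make up a quorum: a strict majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas, write_replicas, replication_factor):
    """True when every read set must overlap every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)
print(q)                                  # 2
print(reads_see_latest_write(q, q, rf))   # True  (QUORUM + QUORUM)
print(reads_see_latest_write(1, 1, rf))   # False (ONE + ONE)
```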

Re: how to handle join properly in this case

2013-05-28 Thread Hiller, Dean
Another option is joins on partitions to keep the number of rows that need joining relatively small. PlayOrm actually supports joins of partition 1 of table A with partition X of table B. You then just keep the number of rows in each partition at less than millions and you can filter with the wher

data clean up problem

2013-05-28 Thread cem
Hi Experts, We have a general problem with cleaning up data from the disk. I need to free the disk space after the retention period and the customer wants to dimension the disk space based on that. After running multiple performance tests with TTL of 1 day we saw that the compaction couldn't keep up wi

Re: data clean up problem

2013-05-28 Thread Edward Capriolo
You need to change the gc_grace time of the column family. It defaults to 10 days. By default the tombstones will not go away for 10 days. On Tue, May 28, 2013 at 2:46 PM, cem wrote: > Hi Experts, > > We have general problem about cleaning up data from the disk. I need to > free the disk space

Deadline Extension: 2013 Workshop on Middleware for HPC and Big Data Systems (MHPC'13)

2013-05-28 Thread MHPC 2013
we apologize if you receive multiple copies of this message === CALL FOR PAPERS 2013 Workshop on Middleware for HPC and Big Data Systems MHPC '13 as part of Euro-Par 2013, Aachen, Germany

Re: data clean up problem

2013-05-28 Thread cem
Thanks for the answer but it is already set to 0 since I don't do any delete. Cem On Tue, May 28, 2013 at 9:03 PM, Edward Capriolo wrote: > You need to change the gc_grace time of the column family. It defaults to > 10 days. By default the tombstones will not go away for 10 days. > > > On Tue,

Re: data clean up problem

2013-05-28 Thread Hiller, Dean
Don't do any delete != "need to free the disk space after retention period" which you have in both your emails. My understanding is TTL is an expiry and, just like tombstones, the data will only really be deleted upon a compaction (i.e., you do have deletes via TTL, from the sound of it). If you have TTL of
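The timing discussed in this thread can be sketched with a little arithmetic. The model below is a deliberate simplification and an assumption, not Cassandra's actual implementation: it treats a TTL'd column as eligible for removal once its TTL and then gc_grace have both elapsed, and it ignores the fact that space is only reclaimed when a compaction actually rewrites the sstable holding the column.

```python
# 10 days, the default gc_grace mentioned earlier in this thread.
DEFAULT_GC_GRACE_SECONDS = 10 * 24 * 3600


def earliest_purge_time(write_time, ttl_seconds,
                        gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Earliest time (in seconds) at which a compaction could drop the column.

    Simplified model (assumption): expiry at write_time + ttl_seconds,
    then a further gc_grace_seconds before the column may be purged.
    """
    return write_time + ttl_seconds + gc_grace_seconds


def is_purgeable(now, write_time, ttl_seconds,
                 gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """True if a compaction running at `now` could drop the column."""
    return now >= earliest_purge_time(write_time, ttl_seconds, gc_grace_seconds)


one_day = 24 * 3600
# With gc_grace set to 0, a 1-day TTL column is eligible right at expiry.
print(is_purgeable(now=one_day, write_time=0, ttl_seconds=one_day,
                   gc_grace_seconds=0))   # True
# With the default gc_grace it is not eligible until 11 days after the write.
print(is_purgeable(now=one_day, write_time=0, ttl_seconds=one_day))   # False
```

Even when a column is eligible under this model, the disk space comes back only after a compaction processes that sstable, which is why compaction throughput matters for the retention use case.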

Re: data clean up problem

2013-05-28 Thread Hiller, Dean
You said compaction can't keep up. Are you manually running compaction all the time or just letting cassandra kick off compactions when needed? Is compaction always 100% running or are you saying your disk is growing faster than you like and would like compactions to be always 100% running? (

Re: data clean up problem

2013-05-28 Thread Hiller, Dean
Also, how many nodes are you running? From: cem <cayiro...@gmail.com> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org> Date: Tuesday, May 28, 2013 1:17 PM To: "user@cassandra.apache.org" mailto:use

Re: data clean up problem

2013-05-28 Thread cem
Thanks for the answer. Sorry for the misunderstanding. I tried to say I don't send delete requests from the client so it is safe to set gc_grace to 0. TTL is used for data clean up. I am not running a manual compaction. I tried that once but it took a lot of time to finish and I will not have this amount

Re: data clean up problem

2013-05-28 Thread Hiller, Dean
How much disk is used on each node? We run the suggested < 300G per node, since above that compactions can have trouble keeping up. PS: We run compactions during peak hours just fine because our client reroutes to the 2 of 3 nodes not running compactions based on seeing the slow node so performance s

RE: data clean up problem

2013-05-28 Thread Dwight Smith
How do you determine the slow node, client side response latency? -Original Message- From: Hiller, Dean [mailto:dean.hil...@nrel.gov] Sent: Tuesday, May 28, 2013 1:10 PM To: user@cassandra.apache.org Subject: Re: data clean up problem How much disk used on each node? We run the suggeste

Re: data clean up problem

2013-05-28 Thread Hiller, Dean
Actually, we did a huge investigation into this on astyanax and cassandra. Astyanax if I remember worked if configured correctly but cassandra did not so we patched cassandra... for some reason cassandra once on the co-ordinator who had one copy of the data would wait for both other nodes to respond

Re: data clean up problem

2013-05-28 Thread Hiller, Dean
Oh and yes, astyanax uses client side response latency and cassandra does the same as a client of the other nodes. Dean On 5/28/13 2:23 PM, "Hiller, Dean" wrote: >Actually, we did a huge investigation into this on astyanax and cassandra. > Astyanax if I remember worked if configured correctly b

Re: data clean up problem

2013-05-28 Thread Bryan Talbot
I think what you're asking for (efficient removal of TTL'd write-once data) is already in the works but not until 2.0 it seems. https://issues.apache.org/jira/browse/CASSANDRA-5228 -Bryan On Tue, May 28, 2013 at 1:26 PM, Hiller, Dean wrote: > Oh and yes, astyanax uses client side response la

Dynamic column family using CQL2, possible?

2013-05-28 Thread Matthew Hillsborough
Hi all, I started building a schema using CQL3's interface following the instructions here: http://www.datastax.com/dev/blog/thrift-to-cql3 In particular, the dynamic column family instructions did exactly what I need to model my data on that blog post. I created a schema that looks like the fol

Re: data clean up problem

2013-05-28 Thread Robert Coli
On Tue, May 28, 2013 at 2:38 PM, Bryan Talbot wrote: > I think what you're asking for (efficient removal of TTL'd write-once data) > is already in the works but not until 2.0 it seems. If your entire dataset in a keyspace or column family is deleted every [small time period], then maybe use TRUNC

Re: Cleanup understastanding

2013-05-28 Thread Takenori Sato(Cloudian)
Hi Victor, As Andrey said, running cleanup doesn't work as you expect. > The reason I need to clean things is that I won't need most of my inserted data on the next day. Deleted objects (columns/records) become deletable from the sstable file when they expire (after gc_grace_seconds). Such d

Cleanup cannot run before a node has joined the ring

2013-05-28 Thread S C
I have added two nodes to the cluster running on 1.1.9 and when I run a "nodetool cleanup" I see the following in the logs. INFO [CompactionExecutor:7] 2013-05-28 22:41:58,480 CompactionManager.java (line 531) Cleanup cannot run before a node has joined the ring However, "nodetool ring/gossip/inf