Thrift vs. Hector

2010-04-27 Thread David Boxenhorn
Hi all, I'm trying to install a Cassandra development environment for Java. It is much harder than I thought it would be. I got Cassandra up and running with the CLI. So now I'm trying to get the Java interface up. Thrift is installed, supposedly. Basic questions (thanks in advance): 1. Is

Re: How to generate 'unique' identifiers for use in Cassandra

2010-04-27 Thread Andriy Bohdan
There's no easy and efficient way to implement auto_increment keys in Cassandra, so people usually use UUIDs (http://en.wikipedia.org/wiki/UUID) for this purpose, which are considered globally unique. If you can use one of the fields from your data model as a unique key, better use it instead of ge
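Here is a minimal Python sketch of both options. It assumes the natural unique field is an email address, and `APP_NAMESPACE` is a hypothetical per-application namespace. A name-based (version 5) UUID turns the natural field into a deterministic key, so the same input always maps to the same row. A random (version 4) UUID is the fallback when no such field exists.

```python
import uuid

# Hypothetical namespace for this application's keys; any fixed UUID works.
APP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")


def natural_key(email: str) -> str:
    """Derive a deterministic row key from a natural unique field (here: an email)."""
    # uuid5 hashes (namespace, name) with SHA-1, so equal inputs give equal keys.
    return str(uuid.uuid5(APP_NAMESPACE, email.strip().lower()))


def random_key() -> str:
    """Fallback when the data has no natural key: a random version 4 UUID."""
    return str(uuid.uuid4())


if __name__ == "__main__":
    print(natural_key("alice@example.com"))
    print(random_key())
```

Normalizing the field before hashing, as done here with `strip().lower()`, is a design choice. Without it, trivially different spellings of the same value would produce different keys.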

Re: Thrift vs. Hector

2010-04-27 Thread Ran Tavory
Hi David, I have a few questions (and answers), see inline.

Re: Thrift vs. Hector

2010-04-27 Thread David Boxenhorn
Thanks, Ran. Very nice to meet you! Responses inline. Summary: I downloaded and unzipped Thrift and Hector. I included them in my project. What do I do now?

Re: Thrift vs. Hector

2010-04-27 Thread Ran Tavory
That was a lame joke - here it is (in the mail). I could use some help with it, but what we have now for Hector is here: http://wiki.github.com/rantav/hector/ And for Thrift, at least the parts I know of: http://wiki.apache.org/cassandra/ThriftExamples but you sh

Re: Thrift vs. Hector

2010-04-27 Thread Ran Tavory
On Tue, Apr 27, 2010 at 1:14 PM, David Boxenhorn wrote: > Thanks, Ran. Very nice to meet you! > :) > > Responses inline. Summary: I downloaded and unzipped Thrift and Hector. I > included them in my project. What do I do now? > On Tue, Apr 27, 2010 at 1:01 PM, Ran Tavory wrote: > >> Hi David, I

Re: Thrift vs. Hector

2010-04-27 Thread Ran Tavory
Good link, thanks. Can I get one done for Hector as well? :) There are some wiki pages and blog posts in the links mentioned above, so they are a good start for Hector; then there are some unit tests. But I would certainly appreciate help from the community here. On Tue, Apr 27, 2010 at 1:25 PM, Swaroop

Call for help - Documentation

2010-04-27 Thread Ran Tavory
Do you use Hector? Do you find it useful? Contribute back by helping others get started and learn. The Hector dev team would appreciate help writing documentation for Hector. So far we have a few blog posts, a wiki and unit tests, but proper documentation, tutorials and examples aren't there yet

Re: Thrift vs. Hector

2010-04-27 Thread David Boxenhorn
Some background: I am part of a team which is working on a rather large internet program. It is currently using an RDBMS, but we want to start using Cassandra. I'm working in Eclipse. I downloaded and unzipped hector-0.6.0-11.zip and got a whole bunch of jars, most of which don't look like Hector.

Re: Thrift vs. Hector

2010-04-27 Thread Ran Tavory
It includes Thrift, yes. You need Cassandra as well (the jar includes both client and server code).

Re: Thrift vs. Hector

2010-04-27 Thread David Boxenhorn
So I should get rid of my Thrift project?

Re: Thrift vs. Hector

2010-04-27 Thread Ran Tavory
If you're speaking of an Eclipse Thrift project, then I don't think you need one, no.

Re: Thrift vs. Hector

2010-04-27 Thread David Boxenhorn
Thanks for all your help, Ran! (I'll probably be needing more, later...)

detecting write retries

2010-04-27 Thread Maxim Grinev
Hi all, if the node that processes my write fails, I should retry my write. If my write increments some counter, the counter will be incremented several times instead of just once. Does Cassandra support any mechanism to identify write retries to avoid multiple execution? Maxim
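One application-level workaround, sketched below, is to make each increment idempotent. This is not a built-in Cassandra feature. The client generates a unique ID once per logical increment and stores it as a column name, so a retry overwrites the same column instead of adding another one. The counter is then the number of distinct columns. A plain dict stands in for one row, and the function names are hypothetical.

```python
import uuid


def record_increment(row: dict, write_id: str) -> None:
    """Store one increment as a column keyed by a client-generated write ID.

    Retrying with the same write_id overwrites the same column instead of
    adding a new one, so a retried write is counted only once.
    """
    row[write_id] = b""  # the column name carries the information; the value is unused


def counter_value(row: dict) -> int:
    """The counter is the number of distinct write IDs (columns) in the row."""
    return len(row)


if __name__ == "__main__":
    row = {}                      # in-memory stand-in for one Cassandra row
    wid = str(uuid.uuid4())       # generated once per logical increment, reused on retry
    record_increment(row, wid)
    record_increment(row, wid)    # retry after a timeout: not double counted
    record_increment(row, str(uuid.uuid4()))
    print(counter_value(row))
```

The trade-off is that reading the counter costs a column count that grows with the number of increments, so this suits low-volume counters better than hot ones.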

Re: Cassandra cluster runs into OOM when bulk loading data

2010-04-27 Thread Eric Yu
I wrote a script to record the tpstats output every 5 seconds. Here is the output just before the JVM OOM:

    Pool Name                Active   Pending   Completed
    FILEUTILS-DELETE-POOL         0         0         280
    STREAM-STAGE                  0         0           0
    RESPONSE
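Snapshots like these are easier to compare when parsed. The sketch below assumes the whitespace-separated four-column layout shown above (pool name, active, pending, completed) and pool names without spaces. It flags pools whose pending count exceeds a threshold, and the default of 1000 is a hypothetical choice, not a Cassandra setting.

```python
def parse_tpstats(text: str) -> dict:
    """Parse tpstats-style lines into {pool: (active, pending, completed)}."""
    pools = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows look like: <pool name> <active> <pending> <completed>.
        # The header ("Pool Name Active Pending Completed") and blank lines are skipped.
        if len(parts) == 4 and all(p.isdigit() for p in parts[1:]):
            active, pending, completed = (int(p) for p in parts[1:])
            pools[parts[0]] = (active, pending, completed)
    return pools


def backed_up(pools: dict, threshold: int = 1000) -> list:
    """Return the pools whose pending count exceeds a (hypothetical) threshold."""
    return [name for name, (_, pending, _) in pools.items() if pending > threshold]


if __name__ == "__main__":
    sample = """Pool Name                Active   Pending   Completed
STREAM-STAGE                  0         0           0
ROW-MUTATION-STAGE           32      3349    63897493
"""
    print(backed_up(parse_tpstats(sample)))
```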

Re: Broken pipe

2010-04-27 Thread Jonathan Ellis
get_range_slices works fine in the system tests, so something is wrong on your client side. Some possibilities:
- sending to a non-Thrift port
- using a set of Thrift bindings incompatible with the one your server supports
- mixing a framed client with a non-framed server or vice versa
[movi
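The framed/non-framed mismatch fails in a characteristic way. A framed transport prefixes every message with its length as a 4-byte big-endian integer, and an unframed binary-protocol message instead starts with a version word. The pure-Python illustration below is a sketch, not the Thrift library's code. It shows a framed reader misreading the first bytes of an unframed strict-binary CALL message (version word `0x80010001`) as a nonsensical negative length. The fix is to make the client transport match the server's framing setting in storage-conf.xml.

```python
import struct


def frame(message: bytes) -> bytes:
    """Framed transport: prefix each message with its length (4-byte big-endian int)."""
    return struct.pack(">i", len(message)) + message


def unframe(data: bytes) -> bytes:
    """Read one framed message: the length prefix, then exactly that many bytes."""
    (length,) = struct.unpack(">i", data[:4])
    if length < 0 or length > len(data) - 4:
        raise ValueError(f"bad frame length {length}")
    return data[4:4 + length]


if __name__ == "__main__":
    print(unframe(frame(b"get_range_slices ...")))
    # An unframed strict TBinaryProtocol CALL begins with 0x80 0x01 0x00 0x01.
    unframed_call = struct.pack(">I", 0x80010001) + b"\x00" * 8
    try:
        unframe(unframed_call)
    except ValueError as err:
        print("framed reader on unframed data:", err)
```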

Multiple keyspaces per application?

2010-04-27 Thread David Boxenhorn
I just saw this note from storage-conf.xml: "Except in very unusual circumstances you will have one Keyspace per application." Why is that? I was thinking of putting our "normal data" and "indexes" in separate keyspaces so they could be maintained separately. What are the disadvantages of multi

Re: error during snapshot

2010-04-27 Thread Lee Parker
Can anyone help with this? It is preventing me from getting backups of our cluster. Lee Parker On Mon, Apr 26, 2010 at 10:02 PM, Lee Parker wrote: > I was attempting to get a snapshot on our cassandra nodes. I get the > following error every time I run nodetool ... snapshot. > > Exception in t

Re: error during snapshot

2010-04-27 Thread Eric Hauser
Have you read this? http://forums.sun.com/thread.jspa?messageID=9734530 I don't think EC2 instances have any swap.

Querying by date range when using TimeUUIDType ColumnFamily?

2010-04-27 Thread Ed Anuff
Assuming a ColumnFamily with a CompareWith of TimeUUIDType, is it possible to call get_slice with an arbitrary date range? How would valid values for the start and finish attributes of the slice range be constructed? Thanks Ed

Re: error during snapshot

2010-04-27 Thread Lee Parker
Adding a swapfile fixed the error, but it doesn't look as though the process is even using the swap file at all. Lee Parker

Re: Multiple keyspaces per application?

2010-04-27 Thread banks
The only advantage is that the RF is per keyspace.

Re: Cassandra cluster runs into OOM when bulk loading data

2010-04-27 Thread Schubert Zhang
It seems that ROW-MUTATION-STAGE 32 3349 63897493 is the clue: too many mutation requests are pending. Yes, I also think Cassandra should add a mechanism to avoid having too many requests pending (in the queue). When the queue is full, just reject the request from the client. Seems https://issues.apache.
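The proposed mechanism (bounded queue, reject when full) can be illustrated in plain Python. This is a sketch of the idea only, not how Cassandra's stages are implemented. `BoundedStage` and `max_pending` are hypothetical names. A rejected submit tells the client to back off instead of letting pending mutations pile up in the heap.

```python
import queue


class BoundedStage:
    """A stage with a fixed-size queue that rejects work instead of buffering without limit."""

    def __init__(self, max_pending: int):
        self._pending = queue.Queue(maxsize=max_pending)

    def submit(self, mutation) -> bool:
        """Enqueue a mutation; return False (reject) if the queue is already full."""
        try:
            self._pending.put_nowait(mutation)
            return True
        except queue.Full:
            return False  # caller should back off and retry later

    def drain_one(self):
        """Worker side: take the oldest pending mutation (blocks until one is available)."""
        return self._pending.get()


if __name__ == "__main__":
    stage = BoundedStage(max_pending=2)
    print([stage.submit(m) for m in ("m1", "m2", "m3")])
```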

Re: Multiple keyspaces per application?

2010-04-27 Thread David Boxenhorn
Thanks! Er, what is RF?

Re: Multiple keyspaces per application?

2010-04-27 Thread Miguel Verde
Replication Factor: the number of copies (replicas) of the data that Cassandra will store, and an important number for quorum consistency calculations.
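As a worked example of how RF enters the quorum calculation: a quorum is a strict majority of the RF replicas, and a read is guaranteed to overlap the replicas that acknowledged a write when R + W > RF. The helper names below are illustrative.

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for a quorum: a strict majority of RF."""
    return replication_factor // 2 + 1


def reads_see_latest_write(rf: int, write_replicas: int, read_replicas: int) -> bool:
    """Read and write replica sets are guaranteed to overlap when R + W > RF."""
    return read_replicas + write_replicas > rf


if __name__ == "__main__":
    for rf in (1, 2, 3, 5):
        q = quorum(rf)
        print(rf, q, reads_see_latest_write(rf, q, q))
```

With RF = 3, QUORUM means 2 replicas, and QUORUM writes plus QUORUM reads give 2 + 2 > 3. Reading and writing at ONE (1 + 1) gives no such guarantee.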

Re: Multiple keyspaces per application?

2010-04-27 Thread Sylvain Lebresne
I think the idea behind this sentence in storage-conf.xml is just to say that, unless you're doing something specific (like not having the same replication factor for all your data), there is no reason you would want more than one keyspace per application. As for disadvantages, you may want to know t

Re: Cassandra reverting deletes?

2010-04-27 Thread Joost Ouwerkerk
To check that rows are gone, I check that KeySlice.columns is empty. And as I mentioned, immediately after the delete job, this returns the expected number. Unfortunately I reproduced with QUORUM this morning. No node outages. I am going to try ALL to see if that changes anything, but I am star

Re: error during snapshot

2010-04-27 Thread Jonathan Shook
The allocation of memory may have failed depending on the available virtual memory, whether or not the memory would have been subsequently accessed by the process. Some systems do the work of allocating physical pages only when they are accessed for the first time. I'm not sure if yours is one of

Re: Querying by date range when using TimeUUIDType ColumnFamily?

2010-04-27 Thread Justin Sanders
You're going to have to build TimeUUIDs based on the date range you are scanning. Problem is most UUID libraries build version 1 UUIDs based on the current time. I was able to get this working in Python by changing the library to allow me to pass in a time. This isn't safe for creating unique UU

ThriftTransportException using Ruby Gem 0.8.2 against Cassandra 0.6.1

2010-04-27 Thread Lucas Di Pentima
Hello, I'm importing some data on Cassandra, running only on my laptop, with all config values by default. After some time running the import script I've written (which includes some reads besides the import writes), I get the following error message and stack trace: /opt/local/lib/ruby/gems/1

Re: ThriftTransportException using Ruby Gem 0.8.2 against Cassandra 0.6.1

2010-04-27 Thread Ryan King
It looks like you need to update your storage-conf to bind to an IP other than loopback. -ryan

Re: Querying by date range when using TimeUUIDType ColumnFamily?

2010-04-27 Thread Lucas Di Pentima
I think what Ed asked is whether having a CF with TimeUUIDType allows him to call get_slice with *any* date range, even though those values weren't inserted in Cassandra. I'm just a newbie here, but I think that this is not possible, and to do just that you have to construct an additional inde

Re: ThriftTransportException using Ruby Gem 0.8.2 against Cassandra 0.6.1

2010-04-27 Thread Lucas Di Pentima
Thanks, Ryan, for the fast response! Can you explain to me why binding against 127.0.0.1 causes the problem? Maybe it would be useful to point this out in the documentation to keep users from deploying this kind of setup. Thanks again!

Re: ThriftTransportException using Ruby Gem 0.8.2 against Cassandra 0.6.1

2010-04-27 Thread Ryan King
On Tue, Apr 27, 2010 at 1:31 PM, Lucas Di Pentima wrote: > Thanks Ryan for the fast response! Can you explain to me why binding against > 127.0.0.1 causes the problem? Maybe it's useful to point this out in the > documentation to avoid users deploy this kind of setups. Are you trying to talk to

Re: ThriftTransportException using Ruby Gem 0.8.2 against Cassandra 0.6.1

2010-04-27 Thread Lucas Di Pentima
On 27/04/2010, at 17:34, Ryan King wrote: > On Tue, Apr 27, 2010 at 1:31 PM, Lucas Di Pentima > wrote: >> Thanks Ryan for the fast response! Can you explain to me why binding against >> 127.0.0.1 causes the problem? Maybe it's useful to point this out in the >> documentation to avoid use

Re: ThriftTransportException using Ruby Gem 0.8.2 against Cassandra 0.6.1

2010-04-27 Thread Ryan King
On Tue, Apr 27, 2010 at 1:38 PM, Lucas Di Pentima wrote: > Nope, I'm doing some tests locally on my notebook (Macbook OSX 10.6.3 w/4GB > RAM). My script insert several hundred thousand columns with stable speed, > and then it exits throwing that exception. It's possible you're running into a di

Re: Cassandra reverting deletes?

2010-04-27 Thread Joost Ouwerkerk
Hmm... Even after deleting with cl.ALL, I'm getting data back for some rows after having deleted them. Which rows return data is inconsistent from one run of the job to the next.

Re: Querying by date range when using TimeUUIDType ColumnFamily?

2010-04-27 Thread Lee Parker
I have used the solution presented by Justin and it works just fine. When you construct a TimeUUID with a specific timestamp and use that for the start or finish of the range slice, cassandra will use the timestamp embedded in the UUID even if that specific UUID doesn't exist in the index. It is

Storage Layout Questions

2010-04-27 Thread Jonathan Shook
I'm trying to model a one-to-many set of data in which both sides of the relation may grow arbitrarily large. There are arbitrarily many FOOs. For each FOO, there are arbitrarily many BARs. Both types are modeled as an object, containing multiple fields (columns) in the application. Given a key-add

batch_mutate - PHP

2010-04-27 Thread Ken McCarthy
Any examples on how to set up the mutation_map? The best I can figure out is: array(1) { ["195f224d2e8d15d215726590754e3ff3"]=> array(1) { ["time"]=> array(8) { [0]=> object(cassandra_Mutation)#18 (2) { ["column_or_supercolumn"]=> object(cassandra_ColumnOr

Re: ThriftTransportException using Ruby Gem 0.8.2 against Cassandra 0.6.1

2010-04-27 Thread Lucas Di Pentima
On 27/04/2010, at 18:11, Ryan King wrote: > On Tue, Apr 27, 2010 at 1:38 PM, Lucas Di Pentima > wrote: >> Nope, I'm doing some tests locally on my notebook (Macbook OSX 10.6.3 w/4GB >> RAM). My script insert several hundred thousand columns with stable speed, >> and then it exits thro

Re: Cassandra reverting deletes?

2010-04-27 Thread Nathan McCall
Have you confirmed that your clocks are all synced in the cluster? If they aren't, this may be the result of an unintentional read-repair. -Nate
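Clock skew matters because column versions are reconciled by timestamp: the version with the higher timestamp wins, and a delete is just a tombstone with its own timestamp. In 0.6 the timestamp is supplied by whoever issues the write (the client). The sketch below is a simplified model, not Cassandra's code. It shows how a delete issued later in real time, but stamped by a clock that runs behind, loses to the earlier insert, so the row appears to come back. The `Cell` type is hypothetical, and its tie-breaking rule is arbitrary (the example avoids ties).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    value: Optional[bytes]  # None marks a deletion (tombstone)
    timestamp: int          # writer-supplied, e.g. microseconds since the epoch


def reconcile(a: Cell, b: Cell) -> Cell:
    """Last write wins: keep the version with the higher timestamp (tie rule arbitrary here)."""
    return a if a.timestamp >= b.timestamp else b


if __name__ == "__main__":
    insert = Cell(b"v", 1_272_400_000_000_000)
    # Issued later in real time, but by a machine whose clock is 2 seconds behind.
    skewed_delete = Cell(None, insert.timestamp - 2_000_000)
    print(reconcile(insert, skewed_delete).value)  # the insert survives the "delete"
```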

Re: Cassandra reverting deletes?

2010-04-27 Thread Joost Ouwerkerk
Clocks are in sync:

    cluster04:~/cassandra$ dsh -g development "date"
    Tue Apr 27 17:36:33 EDT 2010
    Tue Apr 27 17:36:33 EDT 2010
    Tue Apr 27 17:36:33 EDT 2010
    Tue Apr 27 17:36:33 EDT 2010
    Tue Apr 27 17:36:34 EDT 2010
    Tue Apr 27 17:36:34 EDT 2010
    Tue Apr 27 17:36:34 EDT 2010
    Tue Apr 27 17:36:34 EDT 20

Is Hector a wrapper around thrift?

2010-04-27 Thread S Ahmed
Just trying to get my head wrapped around everything here, so bear with me :) So Thrift can spit out generated code for any language, be it C#, Java, Python, etc. Hector is a higher-level wrapper around the Java code generated by Thrift. Do I have this right? And Hector is probably the most wo

Re: ThriftTransportException using Ruby Gem 0.8.2 against Cassandra 0.6.1

2010-04-27 Thread Ryan King
On Tue, Apr 27, 2010 at 2:29 PM, Lucas Di Pentima wrote: > > On 27/04/2010, at 18:11, Ryan King wrote: > >> On Tue, Apr 27, 2010 at 1:38 PM, Lucas Di Pentima >> wrote: >> >>> Nope, I'm doing some tests locally on my notebook (Macbook OSX 10.6.3 w/4GB >>> RAM). My script insert several hund

Re: batch_mutate - PHP

2010-04-27 Thread Jordan Pittier
Hi, Here is a working example : $mutation_map = array("$key"=>array("Standard1" => array())); for($column_name=0; $column_name<$options['numcolumns']; $column_name++) { $column = new cassandra_Column(array('name' => "$column_name", 'value' => 'put your data here', 'timestamp' =>
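The same nesting is easier to see without PHP's `var_dump` noise: `{row key: {column family: [Mutation, ...]}}`, where each Mutation wraps a ColumnOrSuperColumn that wraps a Column. That matches the structure in the question (key, then the `time` column family, then a list of `cassandra_Mutation` objects). The Python sketch below uses plain dataclasses that mirror those Thrift structs. They are hypothetical stand-ins, not the generated classes. The microsecond timestamp is a common client convention, not a requirement stated in this thread.

```python
import time
from dataclasses import dataclass
from typing import Dict, List


# Plain stand-ins mirroring the Thrift structs named in the thread
# (cassandra_Column, cassandra_ColumnOrSuperColumn, cassandra_Mutation).
@dataclass
class Column:
    name: str
    value: str
    timestamp: int


@dataclass
class ColumnOrSuperColumn:
    column: Column


@dataclass
class Mutation:
    column_or_supercolumn: ColumnOrSuperColumn


def build_mutation_map(key: str, column_family: str,
                       columns: Dict[str, str]) -> Dict[str, Dict[str, List[Mutation]]]:
    """Return {row key: {column family: [Mutation, ...]}} for one row."""
    ts = int(time.time() * 1_000_000)  # microseconds since the epoch
    mutations = [Mutation(ColumnOrSuperColumn(Column(name, value, ts)))
                 for name, value in columns.items()]
    return {key: {column_family: mutations}}


if __name__ == "__main__":
    print(build_mutation_map("some-key", "Standard1", {"0": "put your data here"}))
```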

Re: Querying by date range when using TimeUUIDType ColumnFamily?

2010-04-27 Thread Lucas Di Pentima
On 27/04/2010, at 18:23, Lee Parker wrote: > I have used the solution presented by Justin and it works just fine. When > you construct a TimeUUID with a specific timestamp and use that for the start > or finish of the range slice, cassandra will use the timestamp embedded in > the UUID

Re: ThriftTransportException using Ruby Gem 0.8.2 against Cassandra 0.6.1

2010-04-27 Thread Lucas Di Pentima
On 27/04/2010, at 19:00, Ryan King wrote: > On Tue, Apr 27, 2010 at 2:29 PM, Lucas Di Pentima > wrote: >> >> On 27/04/2010, at 18:11, Ryan King wrote: >> >>> On Tue, Apr 27, 2010 at 1:38 PM, Lucas Di Pentima >>> wrote: >>> Nope, I'm doing some tests locally on my notebook (M

Re: Multiple keyspaces per application?

2010-04-27 Thread Mark Robson
I can't see any advantage in using multiple keyspaces. It is highly unlikely that several applications would share the same Cassandra cluster in any nontrivial deployment. Things more important than the replication factor, such as the partitioner and ring token distribution, would be compromised by several

Re: Cassandra reverting deletes?

2010-04-27 Thread Joost Ouwerkerk
Update: I ran a test whereby I deleted ALL the rows in a column family, using a consistency level of ALL. To do this, I mapped the ColumnFamily and called remove on each row id. There were 1.5 million rows, so 1.5 million rows were deleted. I ran a counter job immediately after. This job maps t

Re: How to generate 'unique' identifiers for use in Cassandra

2010-04-27 Thread Mark Robson
> > 2010/4/26 Roland Hänel : > > Typically, in the SQL world we use things like AUTO_INCREMENT columns > that > > let us create a unique key automatically if a row is inserted into a > table. > auto_increment is an antipattern; it adds an extra key which you don't need (usually). If your data has

Re: Querying by date range when using TimeUUIDType ColumnFamily?

2010-04-27 Thread Ed Anuff
Yes, Lucas was correct about the nature of my original question. I'm glad to hear that Justin's solution works; it makes for a much simpler schema. Ed

Re: How do you construct an index and use it, especially in Ruby

2010-04-27 Thread Bob Hutchison
embedded response, way down below... On 2010-04-26, at 12:56 PM, Ryan King wrote: > On Sun, Apr 25, 2010 at 11:14 AM, Bob Hutchison > wrote: >> >> Hi, >> >> I'm new to Cassandra and trying to work out how to do something that I've >> implemented any number of times (e.g. TokyoCabinet, Perst

Re: detecting write retries

2010-04-27 Thread Jonathan Ellis
You'll want to follow https://issues.apache.org/jira/browse/CASSANDRA-580

Re: Problem with JVM? concurrent mode failure

2010-04-27 Thread Jonathan Ellis
We're working on this in https://issues.apache.org/jira/browse/CASSANDRA-1014 On Tue, Apr 27, 2010 at 12:28 PM, Daniel Gimenez wrote: > Hi everyone, > several days ago I was doing some tests in a Cassandra installation and > everything was right (few inserts, few deletions, few reads). > > Yester

Re: how to get apache cassandra version with thrift client ?

2010-04-27 Thread Jonathan Ellis
You'll want to create a ticket at https://issues.apache.org/jira/browse/CASSANDRA to add that, then. On Mon, Apr 26, 2010 at 9:46 PM, Shuge Lee wrote: > I know I can get thrift API version. > However, I writing a CLI for Cassandra in Python with readline support, > and it will supports one-key de

Re: Problem with JVM? concurrent mode failure

2010-04-27 Thread Brandon Williams
On Tue, Apr 27, 2010 at 7:05 PM, Jonathan Ellis wrote: > We're working on this in > https://issues.apache.org/jira/browse/CASSANDRA-1014 There's an easy workaround noted in the ticket if you're willing to sacrifice a bit of performance: use batch mode instead of periodic for your commit log syn

Re: error during snapshot

2010-04-27 Thread Lee Parker
The system is an Ubuntu server running 8.04 LTS. Now I'm getting the problem again this evening, even with the addition of the swap space. Lee Parker

Re: How to generate 'unique' identifiers for use in Cassandra

2010-04-27 Thread Shuge Lee
If you're using Python:

    import uuid
    unique_key = uuid.uuid4()

How to permanently delete one key ?

2010-04-27 Thread Jeff Zhang
Hi all, I use the Thrift API to remove one key, and then use get_range_slices to get all the keys, and I find that the key I deleted is still there. According to the Thrift API doc, get_range_slices applies to all the keys, including deleted ones. So my question is how can I d
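Until a deleted row's tombstones are purged (after GCGraceSeconds has passed and a compaction runs), the key can still come back from a range scan with an empty column list. The usual client-side answer is the check Joost describes earlier in this thread: treat a KeySlice with no columns as deleted. Below is a minimal sketch. `KeySlice` here is a hypothetical stand-in with just the two fields used, not the generated Thrift class.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KeySlice:
    """Stand-in for a get_range_slices result: a row key and its returned columns."""
    key: str
    columns: List[object] = field(default_factory=list)


def live_keys(slices: List[KeySlice]) -> List[str]:
    """Drop 'range ghosts': deleted rows that still come back, but with no columns."""
    return [s.key for s in slices if s.columns]


if __name__ == "__main__":
    result = [KeySlice("a", ["col1"]), KeySlice("deleted-key"), KeySlice("b", ["col2"])]
    print(live_keys(result))
```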

Re: Is Hector a wrapper around thrift?

2010-04-27 Thread Jeff Zhang
Yes, Hector is a higher-level wrapper around the Java Thrift API, with other features such as connection pooling. Not sure whether there's something similar in Python.

Re: error during snapshot

2010-04-27 Thread Lee Parker
So, after reading the thread which Eric posted earlier, I have created a workaround for the issue. In my backup script, I add a swapfile with swapon, tell cassandra to create the snapshots, then remove the swapfile with swapoff. Then I continue with the rest of the work the backup script needs to
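The workaround's ordering (swap on, snapshot, swap off) is worth guarding so that swap is turned off even if the snapshot fails. Here is a minimal Python sketch of that step. It assumes the swap file was already created with mkswap, that the script runs as root, and that `nodetool` is on the PATH and accepts `-host <host> snapshot`. `SWAPFILE` is a hypothetical path. The command runner is injected so the sequence can be checked without touching the system.

```python
import subprocess
from typing import Callable, Sequence

SWAPFILE = "/mnt/swapfile"  # hypothetical path to a pre-created swap file


def run_snapshot_with_swap(run: Callable[[Sequence[str]], None],
                           host: str = "localhost") -> None:
    """Enable swap, take a snapshot, and always disable swap afterwards."""
    run(["swapon", SWAPFILE])
    try:
        run(["nodetool", "-host", host, "snapshot"])
    finally:
        run(["swapoff", SWAPFILE])  # executes even if the snapshot command fails


def real_run(cmd: Sequence[str]) -> None:
    """Run a command for real, raising CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_snapshot_with_swap(real_run)
```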

Re: How to permanently delete one key ?

2010-04-27 Thread Greg Lu
Hey Jeff, I think this article addresses your question: http://spyced.blogspot.com/2010/02/distributed-deletes-in-cassandra.html -Greg

Re: How to permanently delete one key ?

2010-04-27 Thread Jeff Zhang
Thanks Lu, it's helpful.

Re: how to store file in the cassandra?

2010-04-27 Thread Jeff Zhang
Mark, thanks for your suggestion. It's really not a good idea to store one file in multiple columns in one row; the heap space problem will still exist. I took your advice to store it in multiple rows, and it works: I can even store a 2 GB file. On Mon, Apr 26, 2010 at 6:12 PM, Mark Robso
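A minimal sketch of the one-row-per-chunk layout follows. The file is read incrementally so that neither the client nor any single row has to hold the whole file. The `<file_id>:<index>` key scheme and the 1 MB chunk size are hypothetical choices, not something prescribed in the thread. A plain dict stands in for the column family.

```python
import io
from typing import BinaryIO, Iterator, Tuple

CHUNK_SIZE = 1024 * 1024  # hypothetical: 1 MB of file data per row


def chunk_rows(file_id: str, stream: BinaryIO,
               chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[str, bytes]]:
    """Yield (row_key, chunk) pairs, one row per chunk, keyed '<file_id>:<index>'."""
    index = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield f"{file_id}:{index}", chunk
        index += 1


def reassemble(rows: dict, file_id: str) -> bytes:
    """Concatenate the chunks in index order until the next key is missing."""
    parts = []
    index = 0
    while f"{file_id}:{index}" in rows:
        parts.append(rows[f"{file_id}:{index}"])
        index += 1
    return b"".join(parts)


if __name__ == "__main__":
    rows = dict(chunk_rows("report.pdf", io.BytesIO(b"x" * 2500), chunk_size=1000))
    print(sorted(rows), len(reassemble(rows, "report.pdf")))
```

For a real 2 GB file, `stream` would be a file opened with `open(path, "rb")`, and each pair would be written as its own row instead of being collected into a dict.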

Re: Problem with JVM? concurrent mode failure

2010-04-27 Thread Peter Schuller
>     -XX:+CMSIncrementalMode \ >     -XX:+CMSIncrementalPacing \ This may not be an issue given your other VM opts, but just FYI I have had some difficulty making the incremental CMS mode perform GC work sufficiently aggressively to avoid concurrent mode failures during significant garba