Re: ORM in Cassandra?

2010-04-23 Thread Benoit Perroud
I understand the question more like: is there already a lib which
helps to get rid of writing hardcoded, hard-to-maintain lines like:

MyClass data;
String[] myFields = {"name", "label", ...};
List<Column> columns = new ArrayList<Column>();
for (String field : myFields) {
    if (field.equals("name")) {
        columns.add(new Column(field, data.getName()));
    } else if (field.equals("label")) {
        columns.add(new Column(field, data.getLabel()));
    } else ...
}
(and the same for loading (instantiating) the object automagically).
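
A generic mapper removes that per-field branching entirely. A minimal sketch in Python (illustrative only, not an existing library; Column here is a stand-in for the Thrift struct):

```python
from collections import namedtuple

# Stand-in for the Thrift Column struct (name/value pair).
Column = namedtuple("Column", ["name", "value"])

def to_columns(obj, fields):
    # One Column per field, looked up by attribute name -- no if/else chain.
    return [Column(f, getattr(obj, f)) for f in fields]

def from_columns(cls, columns):
    # The reverse direction: rebuild ("instantiate") the object from columns.
    obj = cls.__new__(cls)
    for col in columns:
        setattr(obj, col.name, col.value)
    return obj

class MyClass:
    def __init__(self, name, label):
        self.name = name
        self.label = label

cols = to_columns(MyClass("a name", "a label"), ["name", "label"])
copy = from_columns(MyClass, cols)
```

This is roughly the shape such a lib would implement under the hood, regardless of the storage backend.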

Kind regards,

Benoit.

2010/4/23 dir dir :
>>So maybe it's weird to combine ORM and Cassandra, right? Is there
>>anything we can take from ORM?
>
> Honestly I do not understand what your question is. It is clear that
> you cannot combine an ORM such as Hibernate or iBATIS with Cassandra.
> Cassandra itself is not an RDBMS, so you will not map tables onto
> objects.
>
> Dir.
>
> On Fri, Apr 23, 2010 at 12:12 PM, aXqd  wrote:
>>
>> Hi, all:
>>
>> I know many people regard O/R mapping as rubbish. However, it is
>> undeniable that ORM is quite easy to use in most simple cases.
>> Meanwhile, Cassandra is well known as a NoSQL solution, a.k.a. a
>> non-relational solution.
>> So maybe it's weird to combine ORM and Cassandra, right? Is there
>> anything we can take from ORM?
>> I just hate writing CRUD functions / a data layer for each object,
>> even in a disposable prototype program.
>>
>> Regards.
>> -Tian
>
>


Re: Row deletion and get_range_slices (cassandra 0.6.1)

2010-04-23 Thread Ryan King
On Thu, Apr 22, 2010 at 8:24 PM, David Harrison
 wrote:
> Do those tombstoned keys ever get purged completely?  I've tried
> shortening GCGraceSeconds right down, but they still don't get
> cleaned up.

The GCGraceSeconds will only apply when you compact data.

-ryan


Re: ORM in Cassandra?

2010-04-23 Thread aXqd
On Fri, Apr 23, 2010 at 1:25 PM, Jeremy Dunck  wrote:
> See what you think of tragedy:
> http://github.com/enki/tragedy

This one is feasible. I love the idea of 'build your data model from
Model and Index'. Even better, I am INDEED working with Python, and
those indexes can be lazily resolved. Thanks.

>
>
> On Fri, Apr 23, 2010 at 12:12 AM, aXqd  wrote:
>> Hi, all:
>>
>> I know many people regard O/R mapping as rubbish. However, it is
>> undeniable that ORM is quite easy to use in most simple cases.
>> Meanwhile, Cassandra is well known as a NoSQL solution, a.k.a. a
>> non-relational solution.
>> So maybe it's weird to combine ORM and Cassandra, right? Is there
>> anything we can take from ORM?
>> I just hate writing CRUD functions / a data layer for each object,
>> even in a disposable prototype program.
>>
>> Regards.
>> -Tian
>>
>


Re: ORM in Cassandra?

2010-04-23 Thread aXqd
On Fri, Apr 23, 2010 at 3:03 PM, Benoit Perroud  wrote:
> I understand the question more like: is there already a lib which
> helps to get rid of writing hardcoded, hard-to-maintain lines like:
>
> MyClass data;
> String[] myFields = {"name", "label", ...};
> List<Column> columns = new ArrayList<Column>();
> for (String field : myFields) {
>     if (field.equals("name")) {
>         columns.add(new Column(field, data.getName()));
>     } else if (field.equals("label")) {
>         columns.add(new Column(field, data.getLabel()));
>     } else ...
> }
> (and the same for loading (instantiating) the object automagically).

Yes, that is exactly the question I am asking.

>
> Kind regards,
>
> Benoit.
>
> 2010/4/23 dir dir :
>>>So maybe it's weird to combine ORM and Cassandra, right? Is there
>>>anything we can take from ORM?
>>
>> Honestly I do not understand what your question is. It is clear that
>> you cannot combine an ORM such as Hibernate or iBATIS with Cassandra.
>> Cassandra itself is not an RDBMS, so you will not map tables onto
>> objects.
>>
>> Dir.

Sorry, English is not my mother tongue.

I do understand that I cannot combine ORM with Cassandra, because they
take totally different approaches to building a data model. But I think
there is still something that can be learned from ORM to make Cassandra
easier to use, just as ORM did for RDBMSs before.

IMHO, the domain model is still intact when we design our software,
hence we need another way to map it onto Cassandra's entity model.
Relations do not simply go away in this case, hence we need another way
to express those relations, plus a tool to set up the Keyspace /
ColumnFamily automatically, as django's syncdb does.
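
For illustration, a syncdb-style step could start from nothing more than declarative model classes. Everything below (the Model base class, the fields attribute, collect_column_families) is a hypothetical sketch, not an existing tool:

```python
class Model:
    """Hypothetical declarative base; each subclass describes one ColumnFamily."""

class User(Model):
    column_family = "Users"
    fields = ("name", "email")

class Post(Model):
    column_family = "Posts"
    fields = ("title", "body", "author")

def collect_column_families(models):
    # What a SYNCDB-like tool could emit into the storage configuration:
    # one ColumnFamily definition per declared model.
    return {m.column_family: list(m.fields) for m in models}

schema = collect_column_families([User, Post])
```

The point is only that the schema is derived from the domain model in one place, instead of being maintained by hand alongside it.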

In my limited experience with Cassandra, we now do more work when we
write and less when we read/query. Hence I think the problem lies
exactly in how we duplicate our data to serve queries.

Please correct me if I got these all wrong.

>>
>> On Fri, Apr 23, 2010 at 12:12 PM, aXqd  wrote:
>>>
>>> Hi, all:
>>>
>>> I know many people regard O/R mapping as rubbish. However, it is
>>> undeniable that ORM is quite easy to use in most simple cases.
>>> Meanwhile, Cassandra is well known as a NoSQL solution, a.k.a. a
>>> non-relational solution.
>>> So maybe it's weird to combine ORM and Cassandra, right? Is there
>>> anything we can take from ORM?
>>> I just hate writing CRUD functions / a data layer for each object,
>>> even in a disposable prototype program.
>>>
>>> Regards.
>>> -Tian
>>
>>
>


Re: Row deletion and get_range_slices (cassandra 0.6.1)

2010-04-23 Thread David Harrison
So I'm guessing that means compaction doesn't include purging of
tombstoned keys?  Is there any situation or maintenance process that
does? (Or are keys forever?)

On 23 April 2010 17:44, Ryan King  wrote:
> On Thu, Apr 22, 2010 at 8:24 PM, David Harrison
>  wrote:
>> Do those tombstoned keys ever get purged completely?  I've tried
>> shortening GCGraceSeconds right down, but they still don't get
>> cleaned up.
>
> The GCGraceSeconds will only apply when you compact data.
>
> -ryan
>


Re: MapReduce, Timeouts and Range Batch Size

2010-04-23 Thread Johan Oskarsson
I have written some code to avoid thrift reconnection, it just keeps the 
connection open between get_range_slices calls. 
I can extract that and put it up but not until early next week.

/Johan

On 23 apr 2010, at 05.09, Jonathan Ellis wrote:

> That would be an easy win, sure.
> 
> On Thu, Apr 22, 2010 at 9:27 PM, Joost Ouwerkerk  wrote:
>> I was getting client timeouts in ColumnFamilyRecordReader.maybeInit() when
>> MapReducing.  So I've reduced the Range Batch Size to 256 (from 4096) and
>> this seems to have fixed my problem, although it has slowed things down a
>> bit -- presumably because there are 16x more calls to get_range_slices.
>> While I was in that code I noticed that a new client was being created for
>> each batch get.  By decreasing the batch size, I've increased this
>> overhead.  I'm thinking of re-writing ColumnFamilyRecordReader to do some
>> connection pooling.  Anyone have any thoughts on that?
>> joost.
>> 



Re: 0.6.1 insert 1B rows, crashed when using py_stress

2010-04-23 Thread richard yao
I ran into the same problem, and after that Cassandra can't be started.
I want to know how to restart Cassandra after it has crashed.
Thanks for any reply.


How to insert a row with a TimeUUIDType column in C++

2010-04-23 Thread Olivier Rosello
Here is my test code :

ColumnPath new_col;
new_col.__isset.column = true; /* this is required! */
new_col.column_family.assign("Incoming");
new_col.column.assign("1968ec4a-2a73-11df-9aca-00012e27a270");
client.insert("MyKeyspace", "somekey", new_col, "Random Value", time(NULL), 
ONE);

I couldn't find in the C++ Cassandra/Thrift API how to pass the 16
TimeUUID bytes as the column name. The ColumnPath type has only a string
field for the column name.

With a string like the one this example shows, the TimeUUID is a
36-character string, and this code throws an exception: "UUIDs must be
exactly 16 bytes".

I couldn't find a function like "client.insert_timeuuid_column" which
converts the column name to a uint8_t[16]... or anything else which
could help me.

Cheers,

Olivier



-- 
Olivier




Re: Cassandra Ruby Library's batch method example?

2010-04-23 Thread Lucas Di Pentima
So basically the idea behind the batch processing is some performance gain via 
network usage optimization? Thanks Jonathan!

El 22/04/2010, a las 21:32, Jonathan Ellis escribió:

> nope, there is no guarantee of that.  if the server fails
> mid-operation you have to retry it.
> 
> On Thu, Apr 22, 2010 at 7:23 PM, Lucas Di Pentima
>  wrote:
>> 
>> El 22/04/2010, a las 19:57, Ryan King escribió:
>> 
>>> The batch method in the cassandra gem is still a little crippled (it
>>> doesn't actually batch together everything it can), but you can use it
>>> like this:
>>> 
>>> http://github.com/fauna/cassandra/blob/master/test/cassandra_test.rb#L299
>> 
>> Thanks Ryan! One question about this feature: ideally it should execute all
>> batched operations or none, is that right? In case one batched operation
>> raises an exception, are the previous ops rolled back?
>> 
>> --
>> Lucas Di Pentima - Santa Fe, Argentina
>> Jabber: lu...@di-pentima.com.ar
>> MSN: ldipent...@hotmail.com
>> 
>> 
>> 
>> 
>> 

--
Lucas Di Pentima - Santa Fe, Argentina
Jabber: lu...@di-pentima.com.ar
MSN: ldipent...@hotmail.com






org.apache.cassandra.dht.OrderPreservingPartitioner Initial Token

2010-04-23 Thread Mark Jones
How is this specified?
Is it a large hex #?
A string of bytes in hex?

http://wiki.apache.org/cassandra/StorageConfiguration doesn't say.


Re: How to insert a row with a TimeUUIDType column in C++

2010-04-23 Thread Jonathan Ellis
I would assume that you'd want to look for a C++ library that deals
with UUIDs.  Cassandra or Thrift aren't in the business of doing that
conversion.

On Fri, Apr 23, 2010 at 4:59 AM, Olivier Rosello  wrote:
> Here is my test code :
>
> ColumnPath new_col;
> new_col.__isset.column = true; /* this is required! */
> new_col.column_family.assign("Incoming");
> new_col.column.assign("1968ec4a-2a73-11df-9aca-00012e27a270");
> client.insert("MyKeyspace", "somekey", new_col, "Random Value", time(NULL), 
> ONE);
>
> I couldn't find in the C++ Cassandra/Thrift API how to pass the 16
> TimeUUID bytes as the column name. The ColumnPath type has only a string
> field for the column name.
>
> With a string like the one this example shows, the TimeUUID is a
> 36-character string, and this code throws an exception: "UUIDs must be
> exactly 16 bytes".
>
> I couldn't find a function like "client.insert_timeuuid_column" which
> converts the column name to a uint8_t[16]... or anything else which
> could help me.
>
> Cheers,
>
> Olivier
>
>
>
> --
> Olivier
>
>
>


Re: Row deletion and get_range_slices (cassandra 0.6.1)

2010-04-23 Thread Jonathan Ellis
On Fri, Apr 23, 2010 at 3:53 AM, David Harrison
 wrote:
> So I'm guessing that means compaction doesn't include purging of
> tombstoned keys?

Incorrect.

http://wiki.apache.org/cassandra/DistributedDeletes
http://wiki.apache.org/cassandra/MemtableSSTable

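
In rough terms (a simplification of what those two wiki pages describe), the purge rule is:

```python
def tombstone_purgeable(deleted_at, now, gc_grace_seconds, compacting):
    # A tombstone is only dropped during compaction, and only once it is
    # older than GCGraceSeconds. Shortening GCGraceSeconds alone is not
    # enough: a compaction still has to run afterwards.
    return compacting and (now - deleted_at) > gc_grace_seconds

assert not tombstone_purgeable(100, 200, 3600, compacting=True)      # too young
assert not tombstone_purgeable(100, 10000, 3600, compacting=False)   # no compaction ran
assert tombstone_purgeable(100, 10000, 3600, compacting=True)        # old enough + compacted
```

That is why shortening GCGraceSeconds in the earlier thread had no visible effect until a compaction happened.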

Re: MapReduce, Timeouts and Range Batch Size

2010-04-23 Thread Jonathan Ellis
Great!  Created https://issues.apache.org/jira/browse/CASSANDRA-1017
to track this.

On Fri, Apr 23, 2010 at 4:12 AM, Johan Oskarsson  wrote:
> I have written some code to avoid thrift reconnection, it just keeps the 
> connection open between get_range_slices calls.
> I can extract that and put it up but not until early next week.
>
> /Johan
>
> On 23 apr 2010, at 05.09, Jonathan Ellis wrote:
>
>> That would be an easy win, sure.
>>
>> On Thu, Apr 22, 2010 at 9:27 PM, Joost Ouwerkerk  
>> wrote:
>>> I was getting client timeouts in ColumnFamilyRecordReader.maybeInit() when
>>> MapReducing.  So I've reduced the Range Batch Size to 256 (from 4096) and
>>> this seems to have fixed my problem, although it has slowed things down a
>>> bit -- presumably because there are 16x more calls to get_range_slices.
>>> While I was in that code I noticed that a new client was being created for
>>> each batch get.  By decreasing the batch size, I've increased this
>>> overhead.  I'm thinking of re-writing ColumnFamilyRecordReader to do some
>>> connection pooling.  Anyone have any thoughts on that?
>>> joost.
>>>
>
>


lazyboy - batch insert

2010-04-23 Thread Lubos Pusty
Hello,

Is there a possibility to execute batch mutation (e.g. insert) over
different rows with supercolumns in lazyboy?

To be more concrete:

Keyspace1: {
  ColumnFamily1: {
    rowid1: {
      SuperColumn1: {
        key1_1: val1,
        key1_2: val2,
        ...
      },
      SuperColumn2: {
        key2_1: val1,
        key2_2: val2,
      }
    },
    rowid2: {
      SuperColumn3: {
        ...
      }
    }
  }
}

The records are part of the same column family, with the following keys:

Key(keyspace='Keyspace',column_family='ColumnFamily1',super_column='SuperColumn1',
key='rowid1')
Key(keyspace='Keyspace',column_family='ColumnFamily1',super_column='SuperColumn2',
key='rowid1')
Key(keyspace='Keyspace',column_family='ColumnFamily1',super_column='SuperColumn3',
key='rowid2')

Having those records filled with data and keys, I'd like to insert them in
batch instead of doing it one by one (as it is currently implemented in
lazyboy):

recordset.py:

def save(self, consistency=None):
    consistency = consistency or self.consistency
    records = modified(self.itervalues())
    if not valid(records):
        raise ErrorMissingField("Missing required field(s):",
                                missing(records))
    for record in records:
        record.save(consistency)
    return self
I've seen the batch_mutate method with its mutation_map in the Thrift
interface specification; is it relevant to this type of operation?

Thanks,

Lubos
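
For reference, the nested shape batch_mutate expects is roughly row key, then column family, then a list of mutations. The plain dicts below stand in for the generated Thrift structs (Mutation, ColumnOrSuperColumn), so this is a sketch of the shape, not working Thrift code:

```python
def super_column_mutation(name, columns):
    # Stand-in for Mutation(ColumnOrSuperColumn(super_column=...)).
    return {"super_column": name, "columns": dict(columns)}

mutation_map = {
    "rowid1": {
        "ColumnFamily1": [
            super_column_mutation("SuperColumn1", {"key1_1": "val1", "key1_2": "val2"}),
            super_column_mutation("SuperColumn2", {"key2_1": "val1", "key2_2": "val2"}),
        ],
    },
    "rowid2": {
        "ColumnFamily1": [super_column_mutation("SuperColumn3", {})],
    },
}
```

A single batch_mutate round trip built from a map like this would then replace the per-record save() calls in the loop above.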


Question about a potential configuration scenario

2010-04-23 Thread Campbell, Joseph
Question:
Is it possible to set up Cassandra such that two independent
Cassandra rings/clusters replicate to one another, ensuring that each
ring/cluster has at least one copy of all the data from the other?

The setup is like this:
Two data centers, one in Philadelphia and another in Denver.  In
each data center there exists a Cassandra ring/cluster.  Each data
center is being used as a live-live origin (meaning both data centers
are in use at any point in time).  I would like to be able to guarantee
that in the event that one of the data centers goes down,
'ALL' the data available in that failed data center is also
available in the other data center, such that traffic to the origin
website that depends on the data can simply be switched over to the
other site (using Akamai or other tools).  Is this type of
configuration possible/available in Cassandra?  If so, how would you
set it up, and what might some of the drawbacks be?

Thanks,
Joe Campbell



--
Anyone can get hit by a MOVING car, 
but it takes skill to get hit by a PARKED car.
 -- Random Tee-shirt on Dysfunction

Joe Campbell | one comcast center | philadelphia, pa 19103 |
215.286.5073


RE: lazyboy - batch insert

2010-04-23 Thread Dop Sun
http://code.google.com/p/jassandra/source/browse/trunk/org.softao.jassandra/src/org/softao/jassandra/thrift/ThriftColumnFamily.java

 

The insert and delete methods of this class use batch_mutation.

Cheers.

Dop

 

From: Lubos Pusty [mailto:lubospu...@gmail.com] 
Sent: Friday, April 23, 2010 9:40 PM
To: user@cassandra.apache.org
Subject: lazyboy - batch insert

 

Hello,

Is there a possibility to execute batch mutation (e.g. insert) over
different rows with supercolumns in lazyboy?

To be more concrete:

Keyspace1: {
  ColumnFamily1: {
    rowid1: {
      SuperColumn1: {
        key1_1: val1,
        key1_2: val2,
        ...
      },
      SuperColumn2: {
        key2_1: val1,
        key2_2: val2,
      }
    },
    rowid2: {
      SuperColumn3: {
        ...
      }
    }
  }
}

The records are part of the same column family, with the following keys:

Key(keyspace='Keyspace', column_family='ColumnFamily1', super_column='SuperColumn1', key='rowid1')
Key(keyspace='Keyspace', column_family='ColumnFamily1', super_column='SuperColumn2', key='rowid1')
Key(keyspace='Keyspace', column_family='ColumnFamily1', super_column='SuperColumn3', key='rowid2')

Having those records filled with data and keys, I'd like to insert them in
batch instead of doing it one by one (as it is currently implemented in
lazyboy):

recordset.py:

def save(self, consistency=None):
    consistency = consistency or self.consistency
    records = modified(self.itervalues())
    if not valid(records):
        raise ErrorMissingField("Missing required field(s):",
                                missing(records))
    for record in records:
        record.save(consistency)
    return self

I've seen the batch_mutate method with its mutation_map in the Thrift
interface specification; is it relevant to this type of operation?

Thanks,

Lubos


Re: MapReduce, Timeouts and Range Batch Size

2010-04-23 Thread Joost Ouwerkerk
Awesome.  In the meantime, I hacked something similar myself.  The
performance difference does not appear to be material.  I think the real
killer is the get_range_slices call.  Relative to that, the cost of getting
the connection appears to be more or less trivial.  What can I do to
alleviate that cost?  CASSANDRA-821 looks interesting -- can I apply that to
0.6.1 ?
joost.

On Fri, Apr 23, 2010 at 9:39 AM, Jonathan Ellis  wrote:

> Great!  Created https://issues.apache.org/jira/browse/CASSANDRA-1017
> to track this.
>
> On Fri, Apr 23, 2010 at 4:12 AM, Johan Oskarsson 
> wrote:
> > I have written some code to avoid thrift reconnection, it just keeps the
> connection open between get_range_slices calls.
> > I can extract that and put it up but not until early next week.
> >
> > /Johan
> >
> > On 23 apr 2010, at 05.09, Jonathan Ellis wrote:
> >
> >> That would be an easy win, sure.
> >>
> >> On Thu, Apr 22, 2010 at 9:27 PM, Joost Ouwerkerk 
> wrote:
> >>> I was getting client timeouts in ColumnFamilyRecordReader.maybeInit()
> when
> >>> MapReducing.  So I've reduced the Range Batch Size to 256 (from 4096)
> and
> >>> this seems to have fixed my problem, although it has slowed things down
> a
> >>> bit -- presumably because there are 16x more calls to get_range_slices.
> >>> While I was in that code I noticed that a new client was being created
> for
> >>> each batch get.  By decreasing the batch size, I've increased this
> >>> overhead.  I'm thinking of re-writing ColumnFamilyRecordReader to do
> some
> >>> connection pooling.  Anyone have any thoughts on that?
> >>> joost.
> >>>
> >
> >
>


Re: MapReduce, Timeouts and Range Batch Size

2010-04-23 Thread Jonathan Ellis
You could look into it, but it's not going to be an easy backport
since SSTableReader and SSTableScanner got split into two classes in
trunk.

On Fri, Apr 23, 2010 at 9:39 AM, Joost Ouwerkerk  wrote:
> Awesome.  In the meantime, I hacked something similar myself.  The
> performance difference does not appear to be material.  I think the real
> killer is the get_range_slices call.  Relative to that, the cost of getting
> the connection appears to be more or less trivial.  What can I do to
> alleviate that cost?  CASSANDRA-821 looks interesting -- can I apply that to
> 0.6.1 ?
> joost.
> On Fri, Apr 23, 2010 at 9:39 AM, Jonathan Ellis  wrote:
>>
>> Great!  Created https://issues.apache.org/jira/browse/CASSANDRA-1017
>> to track this.
>>
>> On Fri, Apr 23, 2010 at 4:12 AM, Johan Oskarsson 
>> wrote:
>> > I have written some code to avoid thrift reconnection, it just keeps the
>> > connection open between get_range_slices calls.
>> > I can extract that and put it up but not until early next week.
>> >
>> > /Johan
>> >
>> > On 23 apr 2010, at 05.09, Jonathan Ellis wrote:
>> >
>> >> That would be an easy win, sure.
>> >>
>> >> On Thu, Apr 22, 2010 at 9:27 PM, Joost Ouwerkerk 
>> >> wrote:
>> >>> I was getting client timeouts in ColumnFamilyRecordReader.maybeInit()
>> >>> when
>> >>> MapReducing.  So I've reduced the Range Batch Size to 256 (from 4096)
>> >>> and
>> >>> this seems to have fixed my problem, although it has slowed things
>> >>> down a
>> >>> bit -- presumably because there are 16x more calls to
>> >>> get_range_slices.
>> >>> While I was in that code I noticed that a new client was being created
>> >>> for
>> >>> each batch get.  By decreasing the batch size, I've increased this
>> >>> overhead.  I'm thinking of re-writing ColumnFamilyRecordReader to do
>> >>> some
>> >>> connection pooling.  Anyone have any thoughts on that?
>> >>> joost.
>> >>>
>> >
>> >
>
>


Re: Clarification on Ring operations in Cassandra 0.5.1

2010-04-23 Thread Jonathan Ellis
On Wed, Apr 21, 2010 at 1:48 PM, Anthony Molinaro
 wrote:
> So why is Token - 1 better?  Doesn't that result in more data movement
> than PreviousTokenInRing + 1?

No, because a node is responsible for (previous token, own token].  So
if you introduce token T-1 before token T then the only keys the old
node will be responsible for would be one corresponding exactly to T.
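
The (previous token, own token] rule can be checked with a toy model of the ring. This is a sketch only; real tokens are hashes or key strings depending on the partitioner:

```python
import bisect

def owner(tokens, key_token):
    # tokens: sorted node tokens; each node owns the range (previous, own].
    i = bisect.bisect_left(tokens, key_token)
    return tokens[i % len(tokens)]   # keys past the last token wrap around

tokens = [100, 200, 300]
assert owner(tokens, 150) == 200     # falls in (100, 200]
assert owner(tokens, 200) == 200     # inclusive at the node's own token
assert owner(tokens, 350) == 100     # wraps around the ring

# Bootstrap a new node at T-1 = 199 just before the old node's T = 200:
tokens = sorted(tokens + [199])
assert owner(tokens, 150) == 199     # new node takes over (100, 199]
assert owner(tokens, 200) == 200     # old node keeps only keys at exactly T
```

This is why T-1 moves almost all of the old node's range in one step, while PreviousTokenInRing + 1 would leave the old node owning nearly everything.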

>> You could use scp-then-repair if you can tolerate slightly out of date
>> data being served by the new machine until the repair finishes.
>
> So with scp-then-repair, what would my config look like?  Would I specify
> the InitialToken as the same as the old token, but have AutoBootstrap
> set to false?

Right.


Re: Will cassandra block client ?

2010-04-23 Thread Todd Burruss
Ran, under very heavy load (more than 50 threads with a 20k payload size), I
have seen Hector close connections and then reopen them, such that TIME_WAIT
builds up and the client can no longer connect.



-Original Message-
From: Ran Tavory [ran...@gmail.com]
Received: 4/22/10 1:29 AM
To: user@cassandra.apache.org [u...@cassandra.apache.org]
Subject: Re: Will cassandra block client ?

it reuses connections, yes. but wouldn't hurt to check as well ;)
you may want to check the haproxy connections as well.

On Thu, Apr 22, 2010 at 11:26 AM, Jeff Zhang 
mailto:zjf...@gmail.com>> wrote:
I use the hector java client, I think it reuse the connection, or
maybe I should check the source code.


On Thu, Apr 22, 2010 at 4:10 PM, Ran Tavory 
mailto:ran...@gmail.com>> wrote:
> are you reusing your connections? If not, you may be running out of tcp
> ports on the bombing client. check netstat -na | grep TIME_WAIT
>
> On Thu, Apr 22, 2010 at 10:52 AM, Jeff Zhang 
> mailto:zjf...@gmail.com>> wrote:
>>
>> Hi all,
>>
>> I made too many requests to Cassandra, and then after a while I could
>> not connect to it. But I can still connect to it from another machine.
>> So does it mean Cassandra will block a client in some situations?
>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>
>



--
Best Regards

Jeff Zhang



Re: How to insert a row with a TimeUUIDType column in C++

2010-04-23 Thread Olivier Rosello
Le vendredi 23 avril 2010 à 08:30 -0500, Jonathan Ellis a écrit :
> want to look for a C++ library that deals
> with UUIDs.  Cassandra or Thrift aren't 

Thank you for the response.

That's not the problem for me.

The problem is that new_col.column type is string.

uint8_t uuid[17];
uuid_generate_time(uuid); // returns a 16 bytes uuid v1
uuid[16] = 0;
ColumnPath new_col;
new_col.__isset.column = true; /* this is required! */
new_col.column_family.assign("Incoming");
new_col.column.assign((char *)uuid);
client.insert("MyKeyspace", "somekey", new_col, "Random Value",
time(NULL), ONE);

This works, except that sometimes there are \0 bytes in the uuid, so
column.assign() gets a string shorter than 16 bytes...

My question is: how can I assign the 16 bytes of the uuid to the column
name without going through a NUL-terminated string? :)


Cheers,

Olivier




-- 
Olivier Rosello -+- Free Mobile -+- 
orose...@corp.free.fr -+- 1231, avenue du mondial 98 - 34000 Montpellier
Tel : +33 4 34 67 89 08 -+- 



Re: Concurrent SuperColumn update question

2010-04-23 Thread Jonathan Ellis
On Thu, Apr 22, 2010 at 11:34 AM, tsuraan  wrote:
> Suppose I have a SuperColumn CF where one of the SuperColumns in each
> row is being treated as a list (e.g. keys only, values are just
> empty).  In this list, values will only ever be added; deletion never
> occurs.  If I have two processes simultaneously add values to this
> list (on different nodes, whatever), is that guaranteed to be safe
> from race conditions?

As long as you use column names that do not collide, such as uuids.

> Also, in a scheme like this, is there a limit on the number of entries
> I can have in my "list"?  I know that compaction normally needs to
> read an entire row into RAM in order to compact it.  Does this also
> apply to SuperColumn columns?

#3 here: http://wiki.apache.org/cassandra/CassandraLimitations


Re: org.apache.cassandra.dht.OrderPreservingPartitioner Initial Token

2010-04-23 Thread Jonathan Ellis
a normal String from the same universe as your keys.

On Fri, Apr 23, 2010 at 7:23 AM, Mark Jones  wrote:
> How is this specified?
>
> Is it a large hex #?
>
> A string of bytes in hex?
>
>
>
> http://wiki.apache.org/cassandra/StorageConfiguration doesn’t say.


Re: MapReduce, Timeouts and Range Batch Size

2010-04-23 Thread Joost Ouwerkerk
In that case I should probably wait for 0.7.  Is there any fundamental
performance difference in get_range_slices between the Random and
Order-Preserving partitioners?  If so, by what factor?
joost.

On Fri, Apr 23, 2010 at 10:47 AM, Jonathan Ellis  wrote:

> You could look into it, but it's not going to be an easy backport
> since SSTableReader and SSTableScanner got split into two classes in
> trunk.
>
> On Fri, Apr 23, 2010 at 9:39 AM, Joost Ouwerkerk 
> wrote:
> > Awesome.  In the meantime, I hacked something similar myself.  The
> > performance difference does not appear to be material.  I think the real
> > killer is the get_range_slices call.  Relative to that, the cost of
> getting
> > the connection appears to be more or less trivial.  What can I do to
> > alleviate that cost?  CASSANDRA-821 looks interesting -- can I apply that
> to
> > 0.6.1 ?
> > joost.
> > On Fri, Apr 23, 2010 at 9:39 AM, Jonathan Ellis 
> wrote:
> >>
> >> Great!  Created https://issues.apache.org/jira/browse/CASSANDRA-1017
> >> to track this.
> >>
> >> On Fri, Apr 23, 2010 at 4:12 AM, Johan Oskarsson 
> >> wrote:
> >> > I have written some code to avoid thrift reconnection, it just keeps
> the
> >> > connection open between get_range_slices calls.
> >> > I can extract that and put it up but not until early next week.
> >> >
> >> > /Johan
> >> >
> >> > On 23 apr 2010, at 05.09, Jonathan Ellis wrote:
> >> >
> >> >> That would be an easy win, sure.
> >> >>
> >> >> On Thu, Apr 22, 2010 at 9:27 PM, Joost Ouwerkerk <
> jo...@openplaces.org>
> >> >> wrote:
> >> >>> I was getting client timeouts in
> ColumnFamilyRecordReader.maybeInit()
> >> >>> when
> >> >>> MapReducing.  So I've reduced the Range Batch Size to 256 (from
> 4096)
> >> >>> and
> >> >>> this seems to have fixed my problem, although it has slowed things
> >> >>> down a
> >> >>> bit -- presumably because there are 16x more calls to
> >> >>> get_range_slices.
> >> >>> While I was in that code I noticed that a new client was being
> created
> >> >>> for
> >> >>> each batch get.  By decreasing the batch size, I've increased this
> >> >>> overhead.  I'm thinking of re-writing ColumnFamilyRecordReader to do
> >> >>> some
> >> >>> connection pooling.  Anyone have any thoughts on that?
> >> >>> joost.
> >> >>>
> >> >
> >> >
> >
> >
>


RE: How to insert a row with a TimeUUIDType column in C++

2010-04-23 Thread Mark Jones
std::string strUUID((char *)uuid, 16) will do the right thing for you.

-Original Message-
From: Olivier Rosello [mailto:orose...@corp.free.fr]
Sent: Friday, April 23, 2010 9:59 AM
To: user@cassandra.apache.org
Subject: Re: How to insert a row with a TimeUUIDType column in C++

Le vendredi 23 avril 2010 à 08:30 -0500, Jonathan Ellis a écrit :
> want to look for a C++ library that deals
> with UUIDs.  Cassandra or Thrift aren't

Thank you for the response.

That's not the problem for me.

The problem is that new_col.column type is string.

uint8_t uuid[17];
uuid_generate_time(uuid); // returns a 16 bytes uuid v1
uuid[16] = 0;
ColumnPath new_col;
new_col.__isset.column = true; /* this is required! */
new_col.column_family.assign("Incoming");
new_col.column.assign((char *)uuid);
client.insert("MyKeyspace", "somekey", new_col, "Random Value",
time(NULL), ONE);

This works, except that sometimes there are \0 bytes in the uuid, so
column.assign() gets a string shorter than 16 bytes...

My question is: how can I assign the 16 bytes of the uuid to the column
name without going through a NUL-terminated string? :)


Cheers,

Olivier




--
Olivier Rosello -+- Free Mobile -+-
orose...@corp.free.fr -+- 1231, avenue du mondial 98 - 34000 Montpellier
Tel : +33 4 34 67 89 08 -+-



RE: org.apache.cassandra.dht.OrderPreservingPartitioner Initial Token

2010-04-23 Thread Mark Jones
So if my keys are binary, is there any way to escape the key sequence in?

I have 20 bytes (any value 0x00-0xff is possible) as the key.

Are they compared as an array of bytes, so that I can use truncation?

4 nodes, broken up by 0x00, 0x40, 0x80, 0xC0?


-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Friday, April 23, 2010 10:22 AM
To: user@cassandra.apache.org
Subject: Re: org.apache.cassandra.dht.OrderPreservingPartitioner Initial Token

a normal String from the same universe as your keys.

On Fri, Apr 23, 2010 at 7:23 AM, Mark Jones  wrote:
> How is this specified?
>
> Is it a large hex #?
>
> A string of bytes in hex?
>
>
>
> http://wiki.apache.org/cassandra/StorageConfiguration doesn't say.


RE: How to insert a row with a TimeUUIDType column in C++

2010-04-23 Thread Mark Jones
It turns out assign can be called with the length as well.

So mod your code to:

new_col.column.assign((char *)uuid, 16);

and you are fixed.

-Original Message-
From: Mark Jones [mailto:mjo...@imagehawk.com]
Sent: Friday, April 23, 2010 10:52 AM
To: user@cassandra.apache.org
Subject: RE: How to insert a row with a TimeUUIDType column in C++

std::string strUUID(uuid, 16) will do the right thing for you.

-Original Message-
From: Olivier Rosello [mailto:orose...@corp.free.fr]
Sent: Friday, April 23, 2010 9:59 AM
To: user@cassandra.apache.org
Subject: Re: How to insert a row with a TimeUUIDType column in C++

Le vendredi 23 avril 2010 à 08:30 -0500, Jonathan Ellis a écrit :
> want to look for a C++ library that deals
> with UUIDs.  Cassandra or Thrift aren't

Thank you for the response.

That's not the problem for me.

The problem is that new_col.column type is string.

uint8_t uuid[17];
uuid_generate_time(uuid); // returns a 16 bytes uuid v1
uuid[16] = 0;
ColumnPath new_col;
new_col.__isset.column = true; /* this is required! */
new_col.column_family.assign("Incoming");
new_col.column.assign((char *)uuid);
client.insert("MyKeyspace", "somekey", new_col, "Random Value",
time(NULL), ONE);

This works, except that sometimes there are \0 bytes in the uuid, so
column.assign() gets a string shorter than 16 bytes...

My question is: how can I assign the 16 bytes of the uuid to the column
name without going through a NUL-terminated string? :)


Cheers,

Olivier




--
Olivier Rosello -+- Free Mobile -+-
orose...@corp.free.fr -+- 1231, avenue du mondial 98 - 34000 Montpellier
Tel : +33 4 34 67 89 08 -+-


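
The same pitfall shown in Python for comparison: TimeUUIDType wants the raw 16 bytes, which may legitimately contain NUL bytes, not the 36-character text form. This is illustrative only:

```python
import uuid

u = uuid.uuid1()                 # a version-1 (time-based) UUID

text_form = str(u)               # "1968ec4a-2a73-11df-..." style, 36 chars
raw_form = u.bytes               # the raw 16 bytes TimeUUIDType expects

assert len(text_form) == 36
assert len(raw_form) == 16
# raw_form may contain b"\x00", which would truncate a C string --
# hence the explicit-length new_col.column.assign((char *)uuid, 16)
# in the C++ fix above.
```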

Re: Concurrent SuperColumn update question

2010-04-23 Thread tsuraan
> On Thu, Apr 22, 2010 at 11:34 AM, tsuraan  wrote:
>> Suppose I have a SuperColumn CF where one of the SuperColumns in each
>> row is being treated as a list (e.g. keys only, values are just
>> empty).  In this list, values will only ever be added; deletion never
>> occurs.  If I have two processes simultaneously add values to this
>> list (on different nodes, whatever), is that guaranteed to be safe
>> from race conditions?
>
> As long as you use column names that do not collide, such as uuids.

Ok, thanks.

>> Also, in a scheme like this, is there a limit on the number of entries
>> I can have in my "list"?  I know that compaction normally needs to
>> read an entire row into RAM in order to compact it.  Does this also
>> apply to SuperColumn columns?
>
> #3 here: http://wiki.apache.org/cassandra/CassandraLimitations

Ouch, that's unfortunate.  Thanks for pointing me to that one; I had
somehow missed it, I guess.


Re: 0.6.1 insert 1B rows, crashed when using py_stress

2010-04-23 Thread Brandon Williams
On Fri, Apr 23, 2010 at 4:59 AM, richard yao wrote:

> I ran into the same problem, and after that Cassandra can't be started.
> I want to know how to restart Cassandra after it has crashed.
> Thanks for any reply.
>

Perhaps supply the error when you restart it?

-Brandon


Odd ring problems with 0.5.1

2010-04-23 Thread Anthony Molinaro
So I've been trying to migrate off of old ec2 m1.large nodes onto xlarge
nodes so I can get enough breathing room to then do an upgrade to 0.6.x
(I can't keep the large nodes up long enough; I spend all my time
restarting and trying to move data, so I can't get all the packages I would
need for 0.6.x updated).

Anyway, I've been bootstrapping in new nodes in between old nodes then
running decommission.  Sometimes it seems to work, but I've been noticing
some oddness.

Some nodes appear in the ring from some nodes, but not others.  Right
now I have 14 nodes, 10 of those nodes have the same output of a
nodeprobe ring, the other 4 are missing one node.  Also, I have a
couple nodes that when I try to bootstrap them with an InitialToken
they get put into yet another ring with only a few nodes including
nodes that I called removetoken on.  They all have the same seed node
and it has not gone down.  The seed node has all nodes.

Anyone seen this?  How can I get those 4 nodes to see the missing node?
If a known issue has it been fixed in 0.6 or newer?

Thanks,

-Anthony

-- 

Anthony Molinaro   


Re: ORM in Cassandra?

2010-04-23 Thread Ned Wolpert
There is nothing wrong with what you are asking. Some work has been done to
get an ORM layer on top of cassandra, for example with a RubyOnRails
project. I'm trying to simplify cassandra integration with grails via the
plugin I'm writing.

The problem is that ORM solutions to date wrap a relational database
(the 'R' in ORM). Cassandra isn't a relational database, so it does not map
cleanly.

On Fri, Apr 23, 2010 at 1:29 AM, aXqd  wrote:

> On Fri, Apr 23, 2010 at 3:03 PM, Benoit Perroud 
> wrote:
> > I understand the question more like : Is there already a lib which
> > help to get rid of writing hardcoded and hard to maintain lines like :
> >
> > MyClass data;
> > String[] myFields = {"name", "label", ...};
> > List<Column> columns = new ArrayList<Column>();
> > for (String field : myFields) {
> >    if (field.equals("name")) {
> >       columns.add(new Column(field, data.getName()));
> >    } else if (field.equals("label")) {
> >      columns.add(new Column(field, data.getLabel()));
> >    } else ...
> > }
> > (same for loading (instantiating) the object automagically).
>
> Yes, I am talking about this question.
>
> >
> > Kind regards,
> >
> > Benoit.
> >
> > 2010/4/23 dir dir :
> >>>So maybe it's weird to combine ORM and Cassandra, right? Is there
> >>>anything we can take from ORM?
> >>
> >> Honestly I do not understand what is your question. It is clear that
> >> you can not combine ORM such as Hibernate or iBATIS with Cassandra.
> >> Cassandra it self is not a RDBMS, so you will not map the table into
> >> the object.
> >>
> >> Dir.
>
> Sorry, English is not my mother tongue.
>
> I do understand I cannot combine ORM with Cassandra, because they are
> totally different ways of building our data model. But I think there
> is still something that can be learned from ORM to make Cassandra
> easier to use, just as ORM did for RDBMSs before.
>
> IMHO, the domain model is still intact when we design our software, so
> we need another way to map it to Cassandra's entity model. Relations do
> not just go away in this case, so we need another way to
> express those relations, and a tool to set up the Keyspace /
> ColumnFamily automatically, as django's SYNCDB does.
>
> According to my limited experience with Cassandra, we now do more work
> when we write, and less when we read/query. Hence I think the problem
> lies exactly in how we duplicate our data to serve queries.
>
> Please correct me if I got these all wrong.
>
> >>
> >> On Fri, Apr 23, 2010 at 12:12 PM, aXqd  wrote:
> >>>
> >>> Hi, all:
> >>>
> >>> I know many people regard O/R Mapping as rubbish. However it is
> >>> undeniable that ORM is quite easy to use in most simple cases,
> >>> Meanwhile Cassandra is well known as No-SQL solution, a.k.a.
> >>> No-Relational solution.
> >>> So maybe it's weird to combine ORM and Cassandra, right? Is there
> >>> anything we can take from ORM?
> >>> I just hate to write CRUD functions/Data layer for each object in even
> >>> a disposable prototype program.
> >>>
> >>> Regards.
> >>> -Tian
> >>
> >>
> >
>



-- 
Virtually, Ned Wolpert

"Settle thy studies, Faustus, and begin..."   --Marlowe
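The kind of helper Benoit's snippet is crying out for can be sketched with bean introspection instead of a per-field if/else chain. Everything below (class and method names) is hypothetical, and plain strings stand in for Cassandra Column values:

```java
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.LinkedHashMap;
import java.util.Map;

// Turn any getter-bearing object into a name -> value map that can then
// be written out as Cassandra Columns, with no hardcoded field branches.
public class ColumnMapper {
    public static Map<String, String> toColumns(Object bean) throws Exception {
        Map<String, String> columns = new LinkedHashMap<String, String>();
        for (PropertyDescriptor pd :
                Introspector.getBeanInfo(bean.getClass()).getPropertyDescriptors()) {
            // Skip the synthetic "class" property and write-only properties.
            if ("class".equals(pd.getName()) || pd.getReadMethod() == null) continue;
            Object value = pd.getReadMethod().invoke(bean);
            if (value != null) columns.put(pd.getName(), value.toString());
        }
        return columns;
    }

    // Example bean standing in for "MyClass" from the thread.
    public static class MyClass {
        public String getName() { return "widget"; }
        public String getLabel() { return "blue"; }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toColumns(new MyClass()));
    }
}
```

Loading could run the same loop in reverse via the write methods; a real library would also need to handle types, keys, and relations, which is where the hard design questions in this thread start.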


Re: Odd ring problems with 0.5.1

2010-04-23 Thread Jonathan Ellis
On Fri, Apr 23, 2010 at 12:30 PM, Anthony Molinaro
 wrote:
> Some nodes appear in the ring from some nodes, but not others.  Right
> now I have 14 nodes, 10 of those nodes have the same output of a
> nodeprobe ring, the other 4 are missing one node.

What's the history of the missing node?  Is it a newly bootstrapped one?

> Also, I have a
> couple nodes that when I try to bootstrap them with an InitialToken
> they get put into yet another ring with only a few nodes including
> nodes that I called removetoken on.  They all have the same seed node
> and it has not gone down.  The seed node has all nodes.
>
> Anyone seen this?

The only time I have seen multiple rings is when some nodes have been
configured with a different seed than others.

-Jonathan


Re: getting cassandra setup on windows 7

2010-04-23 Thread S Ahmed
Any insights?

Much appreciated!

On Thu, Apr 22, 2010 at 11:13 PM, S Ahmed  wrote:

> I was just reading that thanks.
>
> What does he mean when he says:
>
> "This appears to be related to data storage paths I set, because if I
> switch the paths back to the default UNIX paths. Everything runs fine"
>
>
> On Thu, Apr 22, 2010 at 11:07 PM, Jonathan Ellis wrote:
>
>> https://issues.apache.org/jira/browse/CASSANDRA-948
>>
>> On Thu, Apr 22, 2010 at 10:03 PM, S Ahmed  wrote:
>> > Ok so I found the config section:
>> >
>> E:\java\cassandra\apache-cassandra-0.6.1-bin\apache-cassandra-0.6.1\commitlog
>> >   
>> >
>> >
>>  
>> E:\java\cassandra\apache-cassandra-0.6.1-bin\apache-cassandra-0.6.1\data
>> >   
>> >
>> > Now when I run:
>> > bin/cassandra
>> > I get:
>> > Starting cassandra server
>> > listening for transport dt_socket at address:
>> > exception in thread main java.lang.NoClassDefFoundError:
>> > org/apache/cassandra/thrift/CassandraDaemon
>> > could not find the main class:
>> > org.apache.cassandra.thrift.CassandraDaemon...
>> >
>> >
>> >
>> >
>> >
>> > On Thu, Apr 22, 2010 at 10:53 PM, S Ahmed  wrote:
>> >>
>> >> So I uncompressed the .tar, in the readme it says:
>> >> * tar -zxvf cassandra-$VERSION.tgz
>> >>   * cd cassandra-$VERSION
>> >>   * sudo mkdir -p /var/log/cassandra
>> >>   * sudo chown -R `whoami` /var/log/cassandra
>> >>   * sudo mkdir -p /var/lib/cassandra
>> >>   * sudo chown -R `whoami` /var/lib/cassandra
>> >>
>> >> My cassandra is at:
>> >> c:\java\cassandra\apache-cassandra-0.6.1/
>> >> So I have to create 2 folders log and lib?
>> >> Is there a setting in a config file that I edit?
>> >
>>
>
>


YCSB - Yahoo Cloud Serving Benchmark - now available for download

2010-04-23 Thread Brian Frank Cooper
Yahoo! Research is pleased to announce the release of the Yahoo! Cloud Serving 
Benchmark, YCSB v. 0.1.0, as an open source package. YCSB is a common 
benchmarking framework for cloud database, storage and serving systems. Results 
for benchmarking HBase, Cassandra, PNUTS and MySQL will be presented at the 
upcoming ACM Symposium on Cloud Computing on June 11. The toolkit is extensible 
to support benchmarking other systems, and defining new workloads.

Source code and documentation is available at:

http://wiki.github.com/brianfrankcooper/YCSB/



Re: Odd ring problems with 0.5.1

2010-04-23 Thread Anthony Molinaro

On Fri, Apr 23, 2010 at 12:41:17PM -0500, Jonathan Ellis wrote:
> On Fri, Apr 23, 2010 at 12:30 PM, Anthony Molinaro
>  wrote:
> > Some nodes appear in the ring from some nodes, but not others.  Right
> > now I have 14 nodes, 10 of those nodes have the same output of a
> > nodeprobe ring, the other 4 are missing one node.
> 
> What's the history of the missing node?  Is it a newly bootstrapped one?

Yes, newly bootstrapped with an initial token.

> > Also, I have a
> > couple nodes that when I try to bootstrap them with an InitialToken
> > they get put into yet another ring with only a few nodes including
> > nodes that I called removetoken on.  They all have the same seed node
> > and it has not gone down.  The seed node has all nodes.
> >
> > Anyone seen this?
> 
> The only time I have seen multiple rings is when some nodes have been
> configured with a different seed than others.

Yeah, that's the first thing I checked, but the seed is the same.  The
odd thing is that when I bootstrap a new node, it still finds hosts
which are no longer part of the cluster and puts them in the cluster.

I'm not sure how it would get this, maybe I need to restart my seed node?
When I run nodeprobe ring on the seed I don't see any of the hosts I
decommissioned, but maybe they are still listed there somewhere?

-Anthony

-- 

Anthony Molinaro   


Re: Odd ring problems with 0.5.1

2010-04-23 Thread Jonathan Ellis
On Fri, Apr 23, 2010 at 1:12 PM, Anthony Molinaro
 wrote:
> I'm not sure how it would get this, maybe I need to restart my seed node?

It's worth a try.  Sounds like you found an unusual bug in gossip.

> When I run nodeprobe ring on the seed I don't see any of the hosts I
> decommissioned, but maybe they are still listed there somewhere?

0.5 does leave decommissioned host information in gossip, but I'm not
sure how that applies to this problem.


Re: getting cassandra setup on windows 7

2010-04-23 Thread Mark Greene
Try the cassandra-with-fixes.bat file attached to the issue. I had the
same issue, and that bat file got cassandra to start. It still throws
another error complaining about the log4j.properties.

On Fri, Apr 23, 2010 at 1:59 PM, S Ahmed  wrote:

> Any insights?
>
> Much appreciated!
>
>
> On Thu, Apr 22, 2010 at 11:13 PM, S Ahmed  wrote:
>
>> I was just reading that thanks.
>>
>> What does he mean when he says:
>>
>> "This appears to be related to data storage paths I set, because if I
>> switch the paths back to the default UNIX paths. Everything runs fine"
>>
>>
>> On Thu, Apr 22, 2010 at 11:07 PM, Jonathan Ellis wrote:
>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-948
>>>
>>> On Thu, Apr 22, 2010 at 10:03 PM, S Ahmed  wrote:
>>> > Ok so I found the config section:
>>> >
>>> E:\java\cassandra\apache-cassandra-0.6.1-bin\apache-cassandra-0.6.1\commitlog
>>> >   
>>> >
>>> >
>>>  
>>> E:\java\cassandra\apache-cassandra-0.6.1-bin\apache-cassandra-0.6.1\data
>>> >   
>>> >
>>> > Now when I run:
>>> > bin/cassandra
>>> > I get:
>>> > Starting cassandra server
>>> > listening for transport dt_socket at address:
>>> > exception in thread main java.lang.NoClassDefFoundError:
>>> > org/apache/cassandra/thrift/CassandraDaemon
>>> > could not find the main class:
>>> > org.apache.cassandra.thrift.CassandraDaemon...
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > On Thu, Apr 22, 2010 at 10:53 PM, S Ahmed 
>>> wrote:
>>> >>
>>> >> So I uncompressed the .tar, in the readme it says:
>>> >> * tar -zxvf cassandra-$VERSION.tgz
>>> >>   * cd cassandra-$VERSION
>>> >>   * sudo mkdir -p /var/log/cassandra
>>> >>   * sudo chown -R `whoami` /var/log/cassandra
>>> >>   * sudo mkdir -p /var/lib/cassandra
>>> >>   * sudo chown -R `whoami` /var/lib/cassandra
>>> >>
>>> >> My cassandra is at:
>>> >> c:\java\cassandra\apache-cassandra-0.6.1/
>>> >> So I have to create 2 folders log and lib?
>>> >> Is there a setting in a config file that I edit?
>>> >
>>>
>>
>>
>


running cassandra as a service on windows

2010-04-23 Thread S Ahmed
Is it possible to have Cassandra run in the background on a windows server?

i.e. as a service so if the server reboots, cassandra will automatically
run?

I really hate how windows handles services


Re: running cassandra as a service on windows

2010-04-23 Thread Jonathan Ellis
you could do it with standard techniques to run java apps as windows
services.  i understand it's a bit painful.

On Fri, Apr 23, 2010 at 2:05 PM, S Ahmed  wrote:
> Is it possible to have Cassandra run in the background on a windows server?
> i.e. as a service so if the server reboots, cassandra will automatically
> run?
> I really hate how windows handles services


Re: running cassandra as a service on windows

2010-04-23 Thread Miguel Verde
https://issues.apache.org/jira/browse/CASSANDRA-292 points to
http://commons.apache.org/daemon/procrun.html which is used by other Apache
software to implement Windows services in Java.  CassandraDaemon conforms to
the Commons Daemon spec.
On Fri, Apr 23, 2010 at 2:20 PM, Jonathan Ellis  wrote:

> you could do it with standard techniques to run java apps as windows
> services.  i understand it's a bit painful.
>
> On Fri, Apr 23, 2010 at 2:05 PM, S Ahmed  wrote:
> > Is it possible to have Cassandra run in the background on a windows
> server?
> > i.e. as a service so if the server reboots, cassandra will automatically
> > run?
> > I really hate how windows handles services
>
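For reference, the lifecycle that procrun and jsvc drive looks roughly like this. The interface below mirrors org.apache.commons.daemon.Daemon but is redeclared here so the sketch compiles without the commons-daemon jar (and with String[] standing in for the real DaemonContext); ExampleService is a hypothetical stand-in for CassandraDaemon:

```java
// Sketch of the Commons Daemon service lifecycle.
public class ServiceSketch {
    interface Daemon {
        void init(String[] args) throws Exception; // real API takes a DaemonContext
        void start() throws Exception;
        void stop() throws Exception;
        void destroy();
    }

    static class ExampleService implements Daemon {
        private boolean running;
        public void init(String[] args) { /* parse config, open logs */ }
        public void start()   { running = true;  } // called by the service manager
        public void stop()    { running = false; } // called on service shutdown
        public void destroy() { /* release resources */ }
        boolean isRunning() { return running; }
    }

    public static void main(String[] args) throws Exception {
        ExampleService svc = new ExampleService();
        svc.init(args);
        svc.start();
        System.out.println(svc.isRunning()); // true
        svc.stop();
        svc.destroy();
        System.out.println(svc.isRunning()); // false
    }
}
```

A service wrapper like procrun simply invokes these four methods at the right points of the Windows service lifecycle, which is why a class conforming to this spec can be installed as a service without extra glue code.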


Re: New User: OSX vs. Debian on Cassandra 0.5.0 with Thrift

2010-04-23 Thread Heath Oderman
Really interesting find.

After Jonathan E. suggested py_stress and it seemed clear the problem was in
my .net client I spent a few days debugging the client in detail.

I ended up changing my CassandraContext instantiation to use a

  TBufferedTransport(TSocket) instead of a
  TSocket directly.

The difference was *dramatic*.

The calls to debian suddenly behaved as expected, eclipsing the write speeds
of the calls to the OSX box under load by a factor of 2!

The change caused a performance increase in the client communicating with
OSX as well, but the improvement was smaller.

I don't understand exactly, but clearly there's a difference in the way that
Debian and OSX handle socket level communications that has a big effect on a
.net client calling in from windows.

It's been a really interesting experiment and I thoroughly appreciate all the
help and pointers I've gotten from this list.

Cassandra is so fast, and so impressive it strains credibility.  I'm totally
amazed by what these guys have put together.

Thanks,
Stu


Re: New User: OSX vs. Debian on Cassandra 0.5.0 with Thrift

2010-04-23 Thread Jonathan Ellis
Nice detective work!  Good to know what was causing that.  Thanks!

On Fri, Apr 23, 2010 at 2:29 PM, Heath Oderman  wrote:
> Really interesting find.
> After Jonathan E. suggested py_stress and it seemed clear the problem was in
> my .net client I spent a few days debugging the client in detail.
> I ended up changing my CassandraContext instantiation to use a
>           TBufferedTransport(TSocket) instead of a
>           TSocket directly.
> The difference was *dramatic*.
> The calls to debian suddenly behaved as expected, eclipsing the write speeds
> under load of the calls to the OSX box by a factor of 2!
> The change caused a performance increase in the client communicating with
> OSX as well, but the improvement was smaller.
> I don't understand exactly, but clearly there's a difference in the way that
> Debian and OSX handle socket level communications that has a big effect on a
> .net client calling in from windows.
> It's been a really interesting experiment and I thoroughly appreciate all the
> help and pointers I've gotten from this list.
> Cassandra is so fast, and so impressive it strains credibility.  I'm totally
> amazed by what these guys have put together.
> Thanks,
> Stu


Trove maps

2010-04-23 Thread Carlos Sanchez
Jonathan,

Have you thought of using Trove collections instead of regular java collections 
(HashMap / HashSet) in Cassandra? Trove maps are faster and require less memory

Carlos

This email message and any attachments are for the sole use of the intended 
recipients and may contain proprietary and/or confidential information which 
may be privileged or otherwise protected from disclosure. Any unauthorized 
review, use, disclosure or distribution is prohibited. If you are not an 
intended recipient, please contact the sender by reply email and destroy the 
original message and any copies of the message as well as any attachments to 
the original message.


Re: Trove maps

2010-04-23 Thread Jonathan Ellis
From what I have seen, Trove is only a win when you are doing Maps of
primitives, which is mostly not what we use in Cassandra.  (The one
exception I can think of is a map of int -> columnfamilies in
CommitLogHeader.  You're welcome to experiment and see if using Trove
there or elsewhere makes a measurable difference with stress.py.)

-Jonathan

On Fri, Apr 23, 2010 at 2:50 PM, Carlos Sanchez
 wrote:
> Jonathan,
>
> Have you thought of using Trove collections instead of regular java 
> collections (HashMap / HashSet) in Cassandra? Trove maps are faster and 
> require less memory
>
> Carlos
>


Re: Trove maps

2010-04-23 Thread Carlos Sanchez
I will try to modify the code... what I like about Trove is that even for 
regular maps (non-primitive) there are no Entry objects created, so there are 
far fewer references to be GCed

On Apr 23, 2010, at 2:55 PM, Jonathan Ellis wrote:

> From what I have seen Trove is only a win when you are doing Maps of
> primitives, which is mostly not what we use in Cassandra.  (The one
> exception I can think of is a map of int -> columnfamilies in
> CommitLogHeader.  You're welcome to experiment and see if using Trove
> there or elsewhere makes a measurable difference with stress.py.)
>
> -Jonathan
>
> On Fri, Apr 23, 2010 at 2:50 PM, Carlos Sanchez
>  wrote:
>> Jonathan,
>>
>> Have you thought of using Trove collections instead of regular java 
>> collections (HashMap / HashSet) in Cassandra? Trove maps are faster and 
>> require less memory
>>
>> Carlos
>>




Re: Will cassandra block client ?

2010-04-23 Thread Ran Tavory
This used to be the case but was fixed a couple of weeks ago. Which version
are you using?

On Apr 23, 2010 5:56 PM, "Todd Burruss"  wrote:

 Ran, Under very heavy load (more than 50 threads with a 20k payload
size), I have seen Hector close connections and then reopen them, such that
TIME_WAIT builds up and it can no longer connect.



-Original Message-
From: Ran Tavory [ran...@gmail.com]
Received: 4/22/10 1:29 AM
To: user...


Re: Trove maps

2010-04-23 Thread Avinash Lakshman
I think the GPL license of Trove prevents us from using it in Cassandra. But
yes, for all its maps it uses open addressing, which is much more
memory-efficient than the separate chaining employed in the JDK.

Avinash

On Fri, Apr 23, 2010 at 1:22 PM, Carlos Sanchez <
carlos.sanc...@riskmetrics.com> wrote:

> I will try to modify the code... what I like about Trove is that even for
> regular maps (non primitive) there are no Entry objects created so there are
> much less references to be gced
>
> On Apr 23, 2010, at 2:55 PM, Jonathan Ellis wrote:
>
> > From what I have seen Trove is only a win when you are doing Maps of
> > primitives, which is mostly not what we use in Cassandra.  (The one
> > exception I can think of is a map of int -> columnfamilies in
> > CommitLogHeader.  You're welcome to experiment and see if using Trove
> > there or elsewhere makes a measurable difference with stress.py.)
> >
> > -Jonathan
> >
> > On Fri, Apr 23, 2010 at 2:50 PM, Carlos Sanchez
> >  wrote:
> >> Jonathan,
> >>
> >> Have you thought of using Trove collections instead of regular java
> collections (HashMap / HashSet) in Cassandra? Trove maps are faster and
> require less memory
> >>
> >> Carlos
> >>
>
>
>
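To illustrate the open-addressing point, here is a toy sketch (not Trove's actual implementation): keys probe linearly through flat arrays instead of allocating an Entry node per mapping, which is where the memory and GC savings come from. It has no resizing or deletion, so it is illustration only:

```java
// Toy open-addressed int -> String map with linear probing. Unlike
// java.util.HashMap, there is no Entry object per mapping: keys and
// values live in flat parallel arrays.
public class OpenAddressedMap {
    private final int[] keys;
    private final String[] values;
    private final boolean[] used;

    OpenAddressedMap(int capacity) {
        keys = new int[capacity];
        values = new String[capacity];
        used = new boolean[capacity];
    }

    void put(int key, String value) {
        int i = Math.floorMod(key, keys.length);
        while (used[i] && keys[i] != key)    // linear probe past collisions
            i = (i + 1) % keys.length;
        used[i] = true;
        keys[i] = key;
        values[i] = value;
    }

    String get(int key) {
        int i = Math.floorMod(key, keys.length);
        while (used[i]) {
            if (keys[i] == key) return values[i];
            i = (i + 1) % keys.length;       // keep probing until an empty slot
        }
        return null;
    }

    public static void main(String[] args) {
        OpenAddressedMap m = new OpenAddressedMap(8);
        m.put(1, "one");
        m.put(9, "nine");              // collides with 1 mod 8, probes to next slot
        System.out.println(m.get(9));  // nine
        System.out.println(m.get(17)); // null (collides but not present)
    }
}
```

The trade-off is that probe sequences degrade as the table fills, which is why real open-addressed maps such as Trove's resize aggressively.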


Super and Regular Columns

2010-04-23 Thread Robert
I am starting out with Cassandra and I had a couple of questions, I read a
lot of the documentation including:

http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model

First I wanted to make sure I understand this bug:
http://issues.apache.org/jira/browse/CASSANDRA-598

Borrowing from the example provided in that article, would an example
subcolumn be 'friend1' or 'street'?

AddressBook = { // this is a ColumnFamily of type Super
phatduckk: {// this is the key to this row inside the Super CF
friend1: {street: "8th street", zip: "90210", city: "Beverley
Hills", state: "CA"},
}, // end row
ieure: { // this is the key to another row in the Super CF
// all the address book entries for ieure
joey: {street: "A ave", zip: "55485", city: "Hell", state: "NV"},
William: {street: "Armpit Dr", zip: "93301", city:
"Bakersfield", state: "CA"},
},
}

Second, for a one-to-many map where ordering is not important, what are
the tradeoffs between these two options?

A. Use a ColumnFamily where the key maps to an item id, and in each
row each column is one of the items it is mapped to?

B. Use SuperColumnFamily where each key is an item id, and each column
(are these the right terms?) is one of the items it is mapped to, and
the value is essentially empty?

Thanks!
Robert Scott


Trying To Understand get_range_slices Results When Using RandomPartitioner

2010-04-23 Thread Larry Root
I'm trying to better understand how using the RandomPartitioner will affect my
ability to select ranges of keys. Consider my simple example where we have
many online games across different game genres (GameType). These games need
to store data for each one of their users. With that in mind consider the
following data model:

enum GameType {'RPG', 'FPS', 'ARCADE'}

{
"GameData": { // Super Column Family

*GameType+"1234"*: {// Row (concat gametype with a
game id for example)
*"user-data:5678"*:{// Super column (user data)
*"user_prop_name"*: "value",// Subcolumn (arbitrary user
properties and values)
*"another_prop_name"*: "value",
 ...
},
*"user-data:9012"*:{
*"**user_prop_name**"*: "value",
 ...
}
},

* GameType+"3456"*: {...},
*GameType+"7890"*: {...},
...
}
}

Assume we have a multi node cluster running Cassandra 0.6.1. In that
scenario could some one help me understand what the result would be in the
following cases:

   1. We use a range slice to grab keys for all 'RPG' games (range slice at
   the ROW level). Would we be able to get all games back in a single query or
   would that not be guaranteed?

   2. For a given game we use a range slice to grab all user-data keys in
   which the ID starts with '5' (range slice at the COLUMN level). Again, would
   we be able to get all keys in one call (assuming number of keys in the
   result was not an issue)?

   3. Finally for a given game and a given user we do a range slice to grab
   all user properties that start with 'a' (range slice at the SUBCOLUMN level
   of a SUPERCOLUMN). Is that possible in one call?

I'm trying to understand at what level the RandomPartitioner affects my
example data model. Is it at a fixed level like just ROWS (the sub data is
fixed to the same node), or is all data at every level *randomized* across
all nodes?

Are there any tricks to doing these sort of range slices using RP? For
example if I set my consistency level to 'ALL' when doing a range slice
would that effectively compile a complete result set for me?

Thanks for the help!

larry


MESSAGE-STREAMING-POOL exception

2010-04-23 Thread B. Todd Burruss
i see these exceptions on 4 out of the 7 nodes in my cluster.  in 
addition those same four nodes all show AE-SERVICE-STAGE with pending 
work, and have been showing this for several hours now.  each node in the 
cluster has less than 2gb, so it should be finished by now.


when i do nodetool streams on these nodes i see streams with byte counts 
that are never increasing.



2010-04-23 10:08:43,416 ERROR [MESSAGE-STREAMING-POOL:1] 
[DebuggableThreadPoolExecutor.java:101] Error in ThreadPoolExecutor

java.lang.RuntimeException: java.net.ConnectException: Connection timed out
   at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

   at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.ConnectException: Connection timed out
   at sun.nio.ch.Net.connect(Native Method)
   at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
   at 
org.apache.cassandra.net.FileStreamTask.runMayThrow(FileStreamTask.java:60)
   at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)

   ... 3 more
2010-04-23 10:08:43,417 ERROR [MESSAGE-STREAMING-POOL:1] 
[CassandraDaemon.java:78] Fatal exception in thread 
Thread[MESSAGE-STREAMING-POOL:1,5,main]

java.lang.RuntimeException: java.net.ConnectException: Connection timed out
   at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

   at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.ConnectException: Connection timed out
   at sun.nio.ch.Net.connect(Native Method)
   at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
   at 
org.apache.cassandra.net.FileStreamTask.runMayThrow(FileStreamTask.java:60)
   at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)

   ... 3 more



Re: Trove maps

2010-04-23 Thread Eric Hauser
According to their license page, it is LGPL.


On Fri, Apr 23, 2010 at 4:25 PM, Avinash Lakshman <
avinash.laksh...@gmail.com> wrote:

> I think the GPL license of Trove prevents us from using it in Cassadra. But
> yes for all its maps it uses Open Addressing which is much more memory
> efficient than linear chaining that is employed in the JDK.
>
> Avinash
>
> On Fri, Apr 23, 2010 at 1:22 PM, Carlos Sanchez <
> carlos.sanc...@riskmetrics.com> wrote:
>
>> I will try to modify the code... what I like about Trove is that even for
>> regular maps (non primitive) there are no Entry objects created so there are
>> much less references to be gced
>>
>> On Apr 23, 2010, at 2:55 PM, Jonathan Ellis wrote:
>>
>> > From what I have seen Trove is only a win when you are doing Maps of
>> > primitives, which is mostly not what we use in Cassandra.  (The one
>> > exception I can think of is a map of int -> columnfamilies in
>> > CommitLogHeader.  You're welcome to experiment and see if using Trove
>> > there or elsewhere makes a measurable difference with stress.py.)
>> >
>> > -Jonathan
>> >
>> > On Fri, Apr 23, 2010 at 2:50 PM, Carlos Sanchez
>> >  wrote:
>> >> Jonathan,
>> >>
>> >> Have you thought of using Trove collections instead of regular java
>> collections (HashMap / HashSet) in Cassandra? Trove maps are faster and
>> require less memory
>> >>
>> >> Carlos
>> >>
>>
>>
>>
>
>


Re: MESSAGE-STREAMING-POOL exception

2010-04-23 Thread Jonathan Ellis
java.net.ConnectException: Connection timed out at
sun.nio.ch.Net.connect is an os-level connection problem.

On Fri, Apr 23, 2010 at 3:34 PM, B. Todd Burruss  wrote:
> i see these exceptions on 4 out of the 7 nodes in my cluster.  in addition
> those same four nodes all show AE-SERVICE-STAGE with pending work, and been
> showing this for several hours now.  each node in the cluster has less than
> 2gb, so it should be finished by now.
>
> when i do nodetool streams on these nodes i see streams with byte counts
> that are never increasing.
>
>
> 2010-04-23 10:08:43,416 ERROR [MESSAGE-STREAMING-POOL:1]
> [DebuggableThreadPoolExecutor.java:101] Error in ThreadPoolExecutor
> java.lang.RuntimeException: java.net.ConnectException: Connection timed out
>       at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>       at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>       at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>       at java.lang.Thread.run(Thread.java:619)
> Caused by: java.net.ConnectException: Connection timed out
>       at sun.nio.ch.Net.connect(Native Method)
>       at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
>       at
> org.apache.cassandra.net.FileStreamTask.runMayThrow(FileStreamTask.java:60)
>       at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>       ... 3 more
> 2010-04-23 10:08:43,417 ERROR [MESSAGE-STREAMING-POOL:1]
> [CassandraDaemon.java:78] Fatal exception in thread
> Thread[MESSAGE-STREAMING-POOL:1,5,main]
> java.lang.RuntimeException: java.net.ConnectException: Connection timed out
>       at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>       at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>       at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>       at java.lang.Thread.run(Thread.java:619)
> Caused by: java.net.ConnectException: Connection timed out
>       at sun.nio.ch.Net.connect(Native Method)
>       at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
>       at
> org.apache.cassandra.net.FileStreamTask.runMayThrow(FileStreamTask.java:60)
>       at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>       ... 3 more
>
>


Re: MESSAGE-STREAMING-POOL exception

2010-04-23 Thread B. Todd Burruss

i agree, but it seems to have implications on the streaming service.

Jonathan Ellis wrote:

java.net.ConnectException: Connection timed out at
sun.nio.ch.Net.connect is an os-level connection problem.






Re: MESSAGE-STREAMING-POOL exception

2010-04-23 Thread Jonathan Ellis
Can you create a ticket?

On Fri, Apr 23, 2010 at 3:50 PM, B. Todd Burruss  wrote:
> i agree, but it seems to have implications on the streaming service.
>
> Jonathan Ellis wrote:
>>
>> java.net.ConnectException: Connection timed out at
>> sun.nio.ch.Net.connect is an os-level connection problem.
>>


RE: Trove maps

2010-04-23 Thread Mark Jones
Eliminating GC hell would probably do a lot to help Cassandra maintain speed vs 
periods of superfast/superslow performance.  I look forward to hearing how this 
experiment goes.

From: Eric Hauser [mailto:ewhau...@gmail.com]
Sent: Friday, April 23, 2010 3:37 PM
To: user@cassandra.apache.org
Subject: Re: Trove maps

According to their license page, it is LGPL.

On Fri, Apr 23, 2010 at 4:25 PM, Avinash Lakshman  wrote:
I think the GPL license of Trove prevents us from using it in Cassandra. But yes 
for all its maps it uses Open Addressing which is much more memory efficient 
than linear chaining that is employed in the JDK.

Avinash
On Fri, Apr 23, 2010 at 1:22 PM, Carlos Sanchez  wrote:
I will try to modify the code... what I like about Trove is that even for 
regular maps (non primitive) there are no Entry objects created so there are 
much less references to be gced
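For context on why open addressing is easier on the GC: keys and values live in flat arrays and lookups probe along the array, so there are no per-entry node objects at all. A toy linear-probing map in Python — a sketch of the technique only, not Trove's actual code:

```python
class OpenAddressingMap:
    """Toy hash map using linear probing (open addressing).

    Unlike a chained map, there are no per-entry node objects: keys and
    values live in two flat arrays, so the GC has almost nothing extra
    to trace.  Sketch of the idea, not Trove's implementation.
    """
    _EMPTY = object()  # sentinel marking an unused slot

    def __init__(self, capacity=16):
        self._keys = [self._EMPTY] * capacity
        self._vals = [None] * capacity
        self._size = 0

    def _probe(self, key):
        # Walk forward from the hash slot until we find the key or a hole.
        i = hash(key) % len(self._keys)
        while self._keys[i] is not self._EMPTY and self._keys[i] != key:
            i = (i + 1) % len(self._keys)
        return i

    def put(self, key, value):
        if (self._size + 1) * 2 > len(self._keys):  # keep load factor <= 0.5
            self._resize()
        i = self._probe(key)
        if self._keys[i] is self._EMPTY:
            self._size += 1
        self._keys[i] = key
        self._vals[i] = value

    def get(self, key, default=None):
        i = self._probe(key)
        return self._vals[i] if self._keys[i] == key else default

    def _resize(self):
        old = [(k, v) for k, v in zip(self._keys, self._vals)
               if k is not self._EMPTY]
        self._keys = [self._EMPTY] * (len(self._keys) * 2)
        self._vals = [None] * len(self._keys)
        self._size = 0
        for k, v in old:
            self.put(k, v)

m = OpenAddressingMap()
for i in range(100):
    m.put(i, i * i)
assert m.get(7) == 49 and m.get(1000) is None
```

The whole table is two list references regardless of entry count, which is the property being discussed: fewer objects for the collector to visit.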

On Apr 23, 2010, at 2:55 PM, Jonathan Ellis wrote:

> From what I have seen Trove is only a win when you are doing Maps of
> primitives, which is mostly not what we use in Cassandra.  (The one
> exception I can think of is a map of int -> columnfamilies in
> CommitLogHeader.  You're welcome to experiment and see if using Trove
> there or elsewhere makes a measurable difference with stress.py.)
>
> -Jonathan
>
> On Fri, Apr 23, 2010 at 2:50 PM, Carlos Sanchez  wrote:
>> Jonathan,
>>
>> Have you thought of using Trove collections instead of regular java 
>> collections (HashMap / HashSet) in Cassandra? Trove maps are faster and 
>> require less memory
>>
>> Carlos
>>






Re: YCSB - Yahoo Cloud Serving Benchmark - now available for download

2010-04-23 Thread Jeff Hodges
Hell yeah!
--
Jeff

On Fri, Apr 23, 2010 at 10:59 AM, Brian Frank Cooper
 wrote:
> Yahoo! Research is pleased to announce the release of the Yahoo! Cloud
> Serving Benchmark, YCSB v. 0.1.0, as an open source package. YCSB is a
> common benchmarking framework for cloud database, storage and serving
> systems. Results for benchmarking HBase, Cassandra, PNUTS and MySQL will be
> presented at the upcoming ACM Symposium on Cloud Computing on June 11. The
> toolkit is extensible to support benchmarking other systems, and defining
> new workloads.
>
> Source code and documentation is available at:
>
> http://wiki.github.com/brianfrankcooper/YCSB/
>
>


Re: MESSAGE-STREAMING-POOL exception

2010-04-23 Thread B. Todd Burruss

https://issues.apache.org/jira/browse/CASSANDRA-1019

Jonathan Ellis wrote:

Can you create a ticket?

On Fri, Apr 23, 2010 at 3:50 PM, B. Todd Burruss  wrote:
  

i agree, but it seems to have implications on the streaming service.

Jonathan Ellis wrote:


java.net.ConnectException: Connection timed out at
sun.nio.ch.Net.connect is an os-level connection problem.

On Fri, Apr 23, 2010 at 3:34 PM, B. Todd Burruss 
wrote:

  

i see these exceptions on 4 out of the 7 nodes in my cluster.  in
addition
those same four nodes all show AE-SERVICE-STAGE with pending work, and
been
showing this for several hours now.  each node in the cluster has less
than
2gb, so it should be finished by now.

when i do nodetool streams on these nodes i see streams with byte counts
that are never increasing.


2010-04-23 10:08:43,416 ERROR [MESSAGE-STREAMING-POOL:1]
[DebuggableThreadPoolExecutor.java:101] Error in ThreadPoolExecutor
java.lang.RuntimeException: java.net.ConnectException: Connection timed
out
 at
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
 at

java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.ConnectException: Connection timed out
 at sun.nio.ch.Net.connect(Native Method)
 at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
 at

org.apache.cassandra.net.FileStreamTask.runMayThrow(FileStreamTask.java:60)
 at
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
 ... 3 more
2010-04-23 10:08:43,417 ERROR [MESSAGE-STREAMING-POOL:1]
[CassandraDaemon.java:78] Fatal exception in thread
Thread[MESSAGE-STREAMING-POOL:1,5,main]
java.lang.RuntimeException: java.net.ConnectException: Connection timed
out
 at
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
 at

java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.ConnectException: Connection timed out
 at sun.nio.ch.Net.connect(Native Method)
 at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
 at

org.apache.cassandra.net.FileStreamTask.runMayThrow(FileStreamTask.java:60)
 at
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
 ... 3 more






Re: Odd ring problems with 0.5.1

2010-04-23 Thread Anthony Molinaro

On Fri, Apr 23, 2010 at 01:17:21PM -0500, Jonathan Ellis wrote:
> On Fri, Apr 23, 2010 at 1:12 PM, Anthony Molinaro
>  wrote:
> > I'm not sure how it would get this, maybe I need to restart my seed node?
> 
> It's worth a try.  Sounds like you found an unusual bug in gossip.

Damn, restarting the seed, resulted in the seed coming up in a new ring
with 3 nodes which have been decommissioned.  Seems like restarting other
nodes brings them into that ring (or at least the first few seem to be in
the new ring).  I'll restart them all to see if I can't get to a consistent
ring.  You know what might have happened, I changed the ip of the seed host
in my /etc/hosts before starting to decommission, I bet I should have then
restarted everything.  Oh well, hopefully most of my data is still viable.

I do still have all the old sstables lying around, can I just sstable2json
then json2sstable and have it reload them?  Or do the sstables need to be
keyed to the keyrange?  I guess I can sstable2json then create an import
script to insert them via thrift?

> > When I run nodeprobe ring on the seed I don't see any of the hosts I
> > decommissioned, but maybe they are still listed there somewhere?
> 
> 0.5 does leave decommissioned host information in gossip, but I'm not
> sure how that applies to this problem.

I bet that was a red herring, I'm pretty convinced now this was all a
result of me not restarting all the nodes after making a change to the
seed.

-Anthony

-- 

Anthony Molinaro   


Internal error processing describe_keyspace

2010-04-23 Thread Amol Deshpande
Hi,

 

I'm new to Cassandra, trying to set up a single node to play with.  I
set one up in a VM (0.6.1 off the website) , running fedora 12.  Things
seem peachy in that I can connect to it with a modified hector
ExampleClient, and insert data into it.

 

However, when I decided to view this data from Cassandra-cli,  I got the
above error with a further stacktrace of

 

java.lang.AssertionError at
org.apache.cassandra.config.DatabaseDescriptor.getTableMetaData(DatabaseDescriptor.java:924)

 

 

I can't find anything similar by searching for getTableMetaData or
get_keyspace in the list archives.

 

 

Can someone point me in the right direction to troubleshoot this ?

 

 

Thanks,

-amol



Re: Internal error processing describe_keyspace

2010-04-23 Thread Jonathan Ellis
can you attach the full stacktrace?

On Fri, Apr 23, 2010 at 4:50 PM, Amol Deshpande
 wrote:
> Hi,
>
>
>
> I'm new to Cassandra, trying to set up a single node to play with.  I set
> one up in a VM (0.6.1 off the website) , running fedora 12.  Things seem
> peachy in that I can connect to it with a modified hector ExampleClient, and
> insert data into it.
>
>
>
> However, when I decided to view this data from Cassandra-cli,  I got the
> above error with a further stacktrace of
>
>
>
> java.lang.AssertionError at
> org.apache.cassandra.config.DatabaseDescriptor.getTableMetaData(DatabaseDescriptor.java:924)
>
>
>
>
>
> I can’t find anything similar by searching for getTableMetaData or
> get_keyspace in the list archives.
>
>
>
>
>
> Can someone point me in the right direction to troubleshoot this ?
>
>
>
>
>
> Thanks,
>
> -amol


RE: Internal error processing describe_keyspace

2010-04-23 Thread Amol Deshpande
Sure,
INFO [COMPACTION-POOL:1] 2010-04-23 14:21:48,973 CompactionManager.java (line 326) Compacted to /home/amol/apache-cassandra-0.6.1/var/lib/cassandra/data/system/LocationInfo-9-Data.db.  1776/495 bytes for 2 keys.  Time: 970ms.
ERROR [pool-1-thread-6] 2010-04-23 14:28:46,917 Cassandra.java (line 1812) Internal error processing describe_keyspace
java.lang.AssertionError
        at org.apache.cassandra.config.DatabaseDescriptor.getTableMetaData(DatabaseDescriptor.java:924)
        at org.apache.cassandra.thrift.CassandraServer.describe_keyspace(CassandraServer.java:519)
        at org.apache.cassandra.thrift.Cassandra$Processor$describe_keyspace.process(Cassandra.java:1808)
        at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1125)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)
INFO [FLUSH-TIMER] 2010-04-23 15:22:06,484 ColumnFamilyStore.java (line 357) LocationInfo has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='var/lib/cassandra/commitlog/CommitLog-1272057707351.log', position=1225)

-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Friday, April 23, 2010 3:09 PM
To: user@cassandra.apache.org
Subject: Re: Internal error processing describe_keyspace

can you attach the full stacktrace?

On Fri, Apr 23, 2010 at 4:50 PM, Amol Deshpande
 wrote:
> Hi,
>
>
>
> I'm new to Cassandra, trying to set up a single node to play with.  I set
> one up in a VM (0.6.1 off the website) , running fedora 12.  Things seem
> peachy in that I can connect to it with a modified hector ExampleClient, and
> insert data into it.
>
>
>
> However, when I decided to view this data from Cassandra-cli,  I got the
> above error with a further stacktrace of
>
>
>
> java.lang.AssertionError at
> org.apache.cassandra.config.DatabaseDescriptor.getTableMetaData(DatabaseDescriptor.java:924)
>
>
>
>
>
> I can't find anything similar by searching for getTableMetaData or
> get_keyspace in the list archives.
>
>
>
>
>
> Can someone point me in the right direction to troubleshoot this ?
>
>
>
>
>
> Thanks,
>
> -amol


Question about TimeUUIDType

2010-04-23 Thread Lucas Di Pentima
Hello,

I'm using Cassandra 0.6.1 with ruby library

I want to log events on a CF like this:

Events = { // CF CompareWith: TimeUUIDType
SomeEventID : { // Row
uuid_from_unix_timestamp : event_data,
...
}
}

I receive event data with a UNIX timestamp (nr of seconds passed since some 
date on 1970), so I would do something like:

db = Cassandra.new('Keyspace')
db.insert('Events', SomeEventID, {SimpleUUID::UUID.new(Time.at(unix_timestamp)) => event_data})

My first question was: What happens if I have more than one event at the same 
second? I tried this on irb console and checked that TimeUUIDs are different.

So, my second question is: How different TimeUUIDs generated from the same UNIX 
timestamp are going to be ordered in the ColumnFamily?

Thanks in advance!!
--
Lucas Di Pentima - Santa Fe, Argentina
Jabber: lu...@di-pentima.com.ar
MSN: ldipent...@hotmail.com






Re: Question about TimeUUIDType

2010-04-23 Thread Jesse McConnell
try LexicalUUIDType, that will distinguish the secs correctly

imo based on the existing impl (last I checked at least) TimeUUIDType
was equivalent to LongType
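To make the ordering concrete: the version-1 time field counts 100-nanosecond intervals, so two UUIDs minted in the same second still carry distinct fine-grained timestamps. A quick sketch with Python's uuid module standing in for the Ruby library, under the assumption (per the above) that the comparator effectively orders by the time field first:

```python
import uuid

# Two version-1 (time-based) UUIDs minted back to back -- almost
# certainly within the same wall-clock second.
u1 = uuid.uuid1()
u2 = uuid.uuid1()

# The 60-bit time field counts 100-nanosecond intervals since
# 1582-10-15, so "same second" UUIDs still carry distinct,
# fine-grained timestamps.
assert u1.version == 1 and u2.version == 1
assert u1 != u2
assert u1.time <= u2.time

# If the comparator orders by the time field first (raw bytes as a
# tiebreaker is an assumption here), mint order is preserved.
ordered = sorted([u2, u1], key=lambda u: (u.time, u.bytes))
```

So "same second" is not the granularity that matters; ties would only occur at identical 100-ns timestamps.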

cheers,
jesse

--
jesse mcconnell
jesse.mcconn...@gmail.com



On Fri, Apr 23, 2010 at 17:51, Lucas Di Pentima  wrote:
> Hello,
>
> I'm using Cassandra 0.6.1 with ruby library
>
> I want to log events on a CF like this:
>
> Events = { // CF CompareWith: TimeUUIDType
>    SomeEventID : { // Row
>        uuid_from_unix_timestamp : event_data,
>        ...
>    }
> }
>
> I receive event data with a UNIX timestamp (nr of seconds passed since some 
> date on 1970), so I would do something like:
>
> db = Cassandra.new('Keyspace')
> db.insert('Events', SomeEventID, 
> {SimpleUUID:UUID.new(Time.at(unix_timestamp)} => event_data)
>
> My first question was: What happens if I have more than one event at the same 
> second? I tried this on irb console and checked that TimeUUIDs are different.
>
> So, my second question is: How different TimeUUIDs generated from the same 
> UNIX timestamp are going to be ordered in the ColumnFamily?
>
> Thanks in advance!!
> --
> Lucas Di Pentima - Santa Fe, Argentina
> Jabber: lu...@di-pentima.com.ar
> MSN: ldipent...@hotmail.com
>
>
>
>
>


Re: Question about a potential configuration scenario

2010-04-23 Thread banks
just make them one cluster, and use the rackAware logic...

On Fri, Apr 23, 2010 at 7:21 AM, Campbell, Joseph <
joseph_campb...@comcast.com> wrote:

> Question:
>    Is it possible to set up Cassandra such that 2 independent
> Cassandra rings/clusters replicate to one another, ensuring that each
> ring/cluster has at least 1 copy of all the data from the other?
>
> The setup is like this:
>2 Data centers, one in Philadelphia and another in Denver.  In
> each data center there exists a Cassandra ring/cluster.  Each data
> center is being used as a live-live origin (meaning both data centers
> are in use at any point in time).  I would like to be able to guarantee
> that in the event that one or the other of the data centers goes down
> that 'ALL' the available data in that failed data center is also
> available in the other data center such that traffic to the origin
> website that depends on the data can simply be switched over to the
> other site (Using Akamai, or other tools).  Is this type of
> configuration possible/available in Cassandra?  If so how would you set
> it up, and what might some of the draw backs be?
>
> Thanks,
>Joe Campbell
>
>
>
> --
> Anyone can get hit by a MOVING car,
> but it takes skill to get hit by a PARKED car.
> -- Random Tee-shirt on Dysfunction
>
> Joe Campbell | one comcast center | philadelphia, pa 19103 |
> 215.286.5073
>


Re: Question about a potential configuration scenario

2010-04-23 Thread Paul Prescod
http://wiki.apache.org/cassandra/Operations

===

A Cassandra cluster always divides up the key space into ranges
delimited by Tokens as described above, but additional replica
placement is customizable via !IReplicaPlacementStrategy in the
configuration file. The standard strategies are

RackUnawareStrategy: replicas are always placed on the next (in
increasing Token order) N-1 nodes along the ring
RackAwareStrategy: replica 2 is placed on the first node along the
ring that belongs in another data center than the first; the remaining
N-2 replicas, if any, are placed on the first nodes along the ring in
the same rack as the first
Note that with RackAwareStrategy, succeeding nodes along the ring
should alternate data centers to avoid hot spots. For instance, if you
have nodes A, B, C, and D in increasing Token order, and instead of
alternating you place A and B in DC1, and C and D in DC2, then nodes C
and A will have disproportionately more data on them because they will
be the replica destination for every Token range in the other data
center.

The corollary to this is, if you want to start with a single DC and
add another later, when you add the second DC you should add as many
nodes as you have in the first rather than adding a node or two at a
time gradually.
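The hot-spot corollary is easy to check with a toy count. Under the simplified rule that replica 2 goes to the first node along the ring in the other data center, alternating DCs spreads the replica-2 load evenly, while grouping the DCs piles it all onto two nodes (node names follow the A/B/C/D example above; the placement rule is a simplification of the real strategy):

```python
def replica2_counts(ring):
    """ring: list of (node, dc) in increasing Token order.

    Replica 2 for the range owned by node i goes to the first node
    after i (wrapping around) that sits in the *other* data center.
    Returns how many ranges each node backs up.
    """
    counts = {node: 0 for node, _ in ring}
    n = len(ring)
    for i, (_, dc) in enumerate(ring):
        for j in range(1, n + 1):
            node_j, dc_j = ring[(i + j) % n]
            if dc_j != dc:
                counts[node_j] += 1
                break
    return counts

# Alternating data centers: replica-2 load is even.
even = replica2_counts([("A", "DC1"), ("C", "DC2"),
                        ("B", "DC1"), ("D", "DC2")])
assert set(even.values()) == {1}

# Grouped data centers (A,B in DC1 then C,D in DC2): C and A absorb
# every range from the other DC, exactly the skew described above.
skewed = replica2_counts([("A", "DC1"), ("B", "DC1"),
                          ("C", "DC2"), ("D", "DC2")])
assert skewed == {"A": 2, "B": 0, "C": 2, "D": 0}
```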


On Fri, Apr 23, 2010 at 4:17 PM, banks  wrote:
> just make them one cluster, and use the rackAware logic...
>
> On Fri, Apr 23, 2010 at 7:21 AM, Campbell, Joseph
>  wrote:
>>
>> Question:
>>        It is possible to setup Cassandra such that 2 independent
>> Cassandra rings/clusters replicate to one another, ensuring that each
>> ring/cluster has at least 1 copy of all the data on each ring/cluster?
>>
>> The setup is like this:
>>        2 Data centers, one in Philadelphia and another in Denver.  In
>> each data center there exists a Cassandra ring/cluster.  Each data
>> center is being used as a live-live origin (meaning both data centers
>> are in use at any point in time).  I would like to be able to guarantee
>> that in the event that one or the other of the data centers goes down
>> that 'ALL' the available data in that failed data center is also
>> available in the other data center such that traffic to the origin
>> website that depends on the data can simply be switched over to the
>> other site (Using Akamai, or other tools).  Is this type of
>> configuration possible/available in Cassandra?  If so how would you set
>> it up, and what might some of the draw backs be?
>>
>> Thanks,
>>        Joe Campbell
>>
>>
>>
>> --
>> Anyone can get hit by a MOVING car,
>> but it takes skill to get hit by a PARKED car.
>>                     -- Random Tee-shirt on Dysfunction
>>
>> Joe Campbell | one comcast center | philadelphia, pa 19103 |
>> 215.286.5073
>
>


Best way to store millisecond-accurate data

2010-04-23 Thread Andrew Nguyen
Hello,

I am looking to store patient physiologic data in Cassandra - it's being 
collected at rates of 1 to 125 Hz.  I'm thinking of storing the timestamps as 
the column names and the patient/parameter combo as the row key.  For example, 
Bob is in the ICU and is currently having his blood pressure, intracranial 
pressure, and heart rate monitored.  I'd like to collect this with the 
following row keys:

Bob-bloodpressure
Bob-intracranialpressure
Bob-heartrate

The column names would be timestamps but that's where my questions start:

I'm not sure what the best data type and CompareWith would be.  From my 
searching, it sounds like the TimeUUID may be suitable but isn't really 
designed for millisecond accuracy.  My other thought is just to store them as 
strings (2010-04-23 10:23:45.016).  While space isn't the foremost concern, 
we will be collecting this data 24/7 so we'll be creating many columns over the 
long-term.  

I found https://issues.apache.org/jira/browse/CASSANDRA-16 which states that 
the entire row must fit in memory.  Does this include the values as well as the 
column names?

In considering the limits of cassandra and the best way to model this, we would 
be adding 3.9 billion rows per year (assuming 125 Hz @ 24/7).  However, I can't 
really think of a better way to model this...  So, am I thinking about this all 
wrong or am I on the right track?

Thanks,
Andrew

Re: Best way to store millisecond-accurate data

2010-04-23 Thread Miguel Verde
TimeUUID's time component is measured in 100-nanosecond intervals. The
library you use might calculate it with poorer accuracy or precision, but
from a storage/comparison standpoint in Cassandra millisecond data is
easily captured by it.

One typical way of dealing with the data explosion of sampled time series
data is to bucket/shard rows (i.e. Bob-20100423-bloodpressure) so that
you put an upper bound on the row length.
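A minimal sketch of that bucketing scheme, with the 'Bob-20100423-bloodpressure' key format treated purely as an illustration:

```python
import datetime

def bucketed_row_key(patient, parameter, ts_ms):
    """Shard a patient/parameter series into one row per UTC day.

    ts_ms: sample time in milliseconds since the Unix epoch.  The
    'Bob-20100423-bloodpressure' format is illustrative -- any stable
    patient/bucket/parameter encoding works, and the bucket width
    (day, hour, ...) is chosen to cap the row length.
    """
    day = datetime.datetime.fromtimestamp(
        ts_ms / 1000.0, tz=datetime.timezone.utc).strftime("%Y%m%d")
    return "%s-%s-%s" % (patient, day, parameter)

# A sample taken 2010-04-23 10:03:45.016 UTC lands in that day's row.
key = bucketed_row_key("Bob", "bloodpressure", 1272017025016)
```

At 125 Hz a one-day bucket holds about 10.8 million columns per row, so a smaller bucket (e.g. hourly) may be appropriate for the highest-rate signals.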


On Apr 23, 2010, at 7:01 PM, Andrew Nguyen  wrote:



Hello,

I am looking to store patient physiologic data in Cassandra - it's  
being collected at rates of 1 to 125 Hz.  I'm thinking of storing  
the timestamps as the column names and the patient/parameter combo  
as the row key.  For example, Bob is in the ICU and is currently  
having his blood pressure, intracranial pressure, and heart rate  
monitored.  I'd like to collect this with the following row keys:


Bob-bloodpressure
Bob-intracranialpressure
Bob-heartrate

The column names would be timestamps but that's where my questions  
start:


I'm not sure what the best data type and CompareWith would be.  From  
my searching, it sounds like the TimeUUID may be suitable but isn't  
really designed for millisecond accuracy.  My other thought is just  
to store them as strings (2010-04-23 10:23:45.016).  While I space  
isn't the foremost concern, we will be collecting this data 24/7 so  
we'll be creating many columns over the long-term.


I found https://issues.apache.org/jira/browse/CASSANDRA-16 which  
states that the entire row must fit in memory.  Does this include  
the values as well as the column names?


In considering the limits of cassandra and the best way to model  
this, we would be adding 3.9 billion rows per year (assuming 125 Hz  
@ 24/7).  However, I can't really think of a better way to model  
this...  So, am I thinking about this all wrong or am I on the right  
track?


Thanks,
Andrew


Re: Best way to store millisecond-accurate data

2010-04-23 Thread Erik Holstad
On Fri, Apr 23, 2010 at 5:54 PM, Miguel Verde wrote:

> TimeUUID's time component is measured in 100-nanosecond intervals. The
> library you use might calculate it with poorer accuracy or precision, but
> from a storage/comparison standpoint in Cassandra millisecond data is easily
> captured by it.
>
> One typical way of dealing with the data explosion of sampled time series
> data is to bucket/shard rows (i.e. Bob-20100423-bloodpressure) so that you
> put an upper bound on the row length.
>
>
> On Apr 23, 2010, at 7:01 PM, Andrew Nguyen <
> andrew-lists-cassan...@ucsfcti.org> wrote:
>
>  Hello,
>>
>> I am looking to store patient physiologic data in Cassandra - it's being
>> collected at rates of 1 to 125 Hz.  I'm thinking of storing the timestamps
>> as the column names and the patient/parameter combo as the row key.  For
>> example, Bob is in the ICU and is currently having his blood pressure,
>> intracranial pressure, and heart rate monitored.  I'd like to collect this
>> with the following row keys:
>>
>> Bob-bloodpressure
>> Bob-intracranialpressure
>> Bob-heartrate
>>
>> The column names would be timestamps but that's where my questions start:
>>
>> I'm not sure what the best data type and CompareWith would be.  From my
>> searching, it sounds like the TimeUUID may be suitable but isn't really
>> designed for millisecond accuracy.  My other thought is just to store them
>> as strings (2010-04-23 10:23:45.016).  While I space isn't the foremost
>> concern, we will be collecting this data 24/7 so we'll be creating many
>> columns over the long-term.
>>
> You could just get an 8 byte millisecond timestamp and store that as a part
of the key
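That works because a fixed-width big-endian encoding makes byte order agree with numeric order, so a raw-bytes comparator keeps the columns chronological. A sketch:

```python
import struct

def ms_key(ts_ms):
    """Encode a millisecond Unix timestamp as 8 big-endian bytes.

    Big-endian is the important part: the encoded byte strings sort in
    the same order as the numbers, so a comparator that compares raw
    bytes keeps the columns in chronological order.
    """
    return struct.pack(">Q", ts_ms)

a = ms_key(1272017025016)   # a sample time, in ms since the epoch
b = ms_key(1272017025017)   # one millisecond later

assert len(a) == 8
assert a < b                # byte order == chronological order
assert struct.unpack(">Q", a)[0] == 1272017025016  # round-trips
```

Note that with only millisecond resolution, two samples in the same millisecond would collide on the same column name, which TimeUUID avoids.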

>
>> I found https://issues.apache.org/jira/browse/CASSANDRA-16 which states
>> that the entire row must fit in memory.  Does this include the values as
>> well as the column names?
>>
Yes. The option is to store one insert per row; you are not going to be
able to do backwards slices this way without an extra index, but you can
scale much better.

>
>> In considering the limits of cassandra and the best way to model this, we
>> would be adding 3.9 billion rows per year (assuming 125 Hz @ 24/7).
>>  However, I can't really think of a better way to model this...  So, am I
>> thinking about this all wrong or am I on the right track?
>>
>> Thanks,
>> Andrew
>>
>


-- 
Regards Erik


Re: Odd ring problems with 0.5.1

2010-04-23 Thread Anthony Molinaro
Turns out I needed to shut everything down completely, then start it all up;
a rolling restart was still resulting in some nodes being confused about
what ring they were in.

I think the moral of all this, is any changes to the seed node must result
in a full restart of your cluster.  Also any use of removetoken is perilous.

Good news is I'm off of the old nodes, I'll need to figure out a way to
bulk load the data from some of the old sstables, but I think sstable2json
and a quick perl script to load might work out.

Then after that upgrade to 0.6.x

-Anthony

On Fri, Apr 23, 2010 at 02:22:11PM -0700, Anthony Molinaro wrote:
> 
> On Fri, Apr 23, 2010 at 01:17:21PM -0500, Jonathan Ellis wrote:
> > On Fri, Apr 23, 2010 at 1:12 PM, Anthony Molinaro
> >  wrote:
> > > I'm not sure how it would get this, maybe I need to restart my seed node?
> > 
> > It's worth a try.  Sounds like you found an unusual bug in gossip.
> 
> Damn, restarting the seed, resulted in the seed coming up in a new ring
> with 3 nodes which have been decommissioned.  Seems like restarting other
> nodes brings them into that ring (or at least the first few seem to be in
> the new ring).  I'll restart them all to see if I can't get to a consistent
> ring.  You know what might have happened, I changed the ip of the seed host
> in my /etc/hosts before starting to decommission, I bet I should have then
> restarted everything.  Oh well, hopefully most of my data is still viable.
> 
> I do still have all the old sstables lying around, can I just sstable2json
> then json2sstable and have it reload them?  Or do the sstables need to be
> keyed to the keyrange?  I guess I can sstable2json then create an import
> script to insert them via thrift?
> 
> > > When I run nodeprobe ring on the seed I don't see any of the hosts I
> > > decommissioned, but maybe they are still listed there somewhere?
> > 
> > 0.5 does leave decommissioned host information in gossip, but I'm not
> > sure how that applies to this problem.
> 
> I bet that was a red herring.  I'm pretty convinced now this was all a
> result of me not restarting all the nodes after making a change to the
> seed.
> 
> -Anthony
> 
> -- 
> 
> Anthony Molinaro   

-- 

Anthony Molinaro   


Re: ORM in Cassandra?

2010-04-23 Thread aXqd
On Sat, Apr 24, 2010 at 1:36 AM, Ned Wolpert  wrote:
> There is nothing wrong with what you are asking. Some work has been done to
> get an ORM layer ontop of cassandra, for example, with a RubyOnRails
> project. I'm trying to simplify cassandra integration with grails with the
> plugin I'm writing.
> The problem is ORM solutions to date are wrapping a relational database.
> (The 'R' in ORM) Cassandra isn't a relational database so it does not map
> cleanly.

Thanks. I noticed this problem before. I just want to know, in the
first place, what exactly the right way is to model relations in
Cassandra (a non-relational database).
So far, I still have those entities, and, without foreign keys, I use
relation entities, which contain the IDs of both sides of the
relation.
In some other cases, I just duplicate data and maintain the relations
manually by updating all the copies at the same time.

Is this the right way to go? Or is what I am doing still trying to
convert Cassandra into an RDBMS?
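
The two write patterns described here — a relation entity holding the IDs
of both sides, and duplicated data kept in sync by updating every copy at
the same time — can be sketched with plain maps standing in for column
families. All names are hypothetical; this only illustrates the double
write, not any Cassandra API:

```java
import java.util.*;

// Sketch only: HashMaps stand in for Cassandra column families.
public class RelationSketch {
    // One "column family" per query direction: a row per user listing
    // its groups, and a denormalized row per group listing its users.
    static Map<String, Set<String>> userToGroups = new HashMap<>();
    static Map<String, Set<String>> groupToUsers = new HashMap<>();

    // Without foreign keys, the application writes both directions
    // itself -- the "update all the copies at the same time" step.
    static void relate(String userId, String groupId) {
        userToGroups.computeIfAbsent(userId, k -> new HashSet<>()).add(groupId);
        groupToUsers.computeIfAbsent(groupId, k -> new HashSet<>()).add(userId);
    }

    // Reads then become single-row lookups with no join.
    static Set<String> groupsOf(String userId) {
        return userToGroups.getOrDefault(userId, Collections.emptySet());
    }

    static Set<String> usersIn(String groupId) {
        return groupToUsers.getOrDefault(groupId, Collections.emptySet());
    }
}
```

The cost of the extra write buys query-time reads from a single row in
either direction, which matches the "do more when we write, less when we
read" observation below.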

>
> On Fri, Apr 23, 2010 at 1:29 AM, aXqd  wrote:
>>
>> On Fri, Apr 23, 2010 at 3:03 PM, Benoit Perroud 
>> wrote:
>> > I understand the question more like : Is there already a lib which
>> > help to get rid of writing hardcoded and hard to maintain lines like :
>> >
>> > MyClass data;
>> > String[] myFields = {"name", "label", ...};
>> > List<Column> columns = new ArrayList<Column>();
>> > for (String field : myFields) {
>> >    if (field.equals("name")) {
>> >       columns.add(new Column(field, data.getName()));
>> >    } else if (field.equals("label")) {
>> >       columns.add(new Column(field, data.getLabel()));
>> >    } else ...
>> > }
>> > (same for loading (instantiating) the object automagically).
>>
>> Yes, I am talking about this question.
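
What such a lib could do instead of the hardcoded chain quoted above is
derive the column name/value pairs via reflection. A minimal sketch — the
getter-scanning convention and the name-to-value map standing in for a
list of Thrift Columns are my assumptions, not any existing library's API:

```java
import java.lang.reflect.Method;
import java.util.*;

// Sketch: derive column name/value pairs from an object's getters,
// replacing a per-field if/else chain.
public class GetterMapper {
    // Returns column name -> value; a real lib would build Columns here.
    public static Map<String, Object> toColumns(Object bean) {
        Map<String, Object> columns = new TreeMap<>();
        try {
            for (Method m : bean.getClass().getMethods()) {
                String name = m.getName();
                if (name.startsWith("get") && !name.equals("getClass")
                        && m.getParameterTypes().length == 0) {
                    // "getName" -> column "name"
                    String column =
                        Character.toLowerCase(name.charAt(3)) + name.substring(4);
                    columns.put(column, m.invoke(bean));
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
        return columns;
    }

    // Example bean, mirroring the MyClass in the quoted snippet.
    public static class MyClass {
        public String getName() { return "n"; }
        public String getLabel() { return "l"; }
    }
}
```

Loading works the same way in reverse, matching setters against returned
column names.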
>>
>> >
>> > Kind regards,
>> >
>> > Benoit.
>> >
>> > 2010/4/23 dir dir :
>> >>>So maybe it's weird to combine ORM and Cassandra, right? Is there
>> >>>anything we can take from ORM?
>> >>
>> >> Honestly I do not understand what is your question. It is clear that
>> >> you can not combine ORM such as Hibernate or iBATIS with Cassandra.
>> >> Cassandra it self is not a RDBMS, so you will not map the table into
>> >> the object.
>> >>
>> >> Dir.
>>
>> Sorry, English is not my mother tongue.
>>
>> I do understand I cannot combine ORM with Cassandra, because they are
>> totally different ways of building our data model. But I think there
>> are still things that can be learnt from ORM to make Cassandra easier
>> to use, just as ORM did for RDBMSs before.
>>
>> IMHO, the domain model is still intact when we design our software,
>> hence we need another way to map it to Cassandra's entity model.
>> Relations do not just go away in this case, hence we need another way
>> to express those relations, and a tool to set up Keyspaces /
>> ColumnFamilies automatically, as django's SYNCDB does.
>>
>> According to my limited experience with Cassandra, we now do more
>> work when we write and less when we read/query. Hence I think the
>> problem lies exactly in how we duplicate our data to serve queries.
>>
>> Please correct me if I got these all wrong.
>>
>> >>
>> >> On Fri, Apr 23, 2010 at 12:12 PM, aXqd  wrote:
>> >>>
>> >>> Hi, all:
>> >>>
>> >>> I know many people regard O/R Mapping as rubbish. However it is
>> >>> undeniable that ORM is quite easy to use in most simple cases,
>> >>> Meanwhile Cassandra is well known as No-SQL solution, a.k.a.
>> >>> No-Relational solution.
>> >>> So maybe it's weird to combine ORM and Cassandra, right? Is there
>> >>> anything we can take from ORM?
>> >>> I just hate to write CRUD functions/Data layer for each object in even
>> >>> a disposable prototype program.
>> >>>
>> >>> Regards.
>> >>> -Tian
>> >>
>> >>
>> >
>
>
>
> --
> Virtually, Ned Wolpert
>
> "Settle thy studies, Faustus, and begin..."   --Marlowe
>


Re: Trove maps

2010-04-23 Thread Tatu Saloranta
On Fri, Apr 23, 2010 at 1:22 PM, Carlos Sanchez
 wrote:
> I will try to modify the code... What I like about Trove is that even for
> regular (non-primitive) maps there are no Entry objects created, so there are
> far fewer references to be GCed.

This could help, but how is iteration then handled? Are Map.Entry
instances created (and discarded) during iteration? That could be a
net loss in some cases -- or maybe not: it's short-lived garbage versus
the long-lived garbage of entries held by a long-living Map.
Just curious,

-+ Tatu +-
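
To make the question concrete, here is a plain-JDK sketch of the two
iteration styles in play. With java.util.HashMap, entrySet() iteration
hands out the map's own long-lived node objects (so no per-step garbage
there, though other Map implementations may allocate an Entry per step);
a Trove-style per-pair callback avoids exposing Entry objects across the
API boundary at all. The claim about Trove's procedure-based API is
recalled from memory, so treat it as an assumption; the JDK forEach below
merely stands in for that style:

```java
import java.util.*;

// Sketch: two ways to walk a map's key/value pairs.
public class IterationStyles {
    // Style 1: entrySet() -- the iterator hands out Map.Entry views;
    // whether these are fresh allocations depends on the Map class.
    public static int sumViaEntries(Map<String, Integer> m) {
        int sum = 0;
        for (Map.Entry<String, Integer> e : m.entrySet()) {
            sum += e.getValue();
        }
        return sum;
    }

    // Style 2: a callback per pair, Trove-procedure style -- no Entry
    // object ever crosses the API boundary, so the map is free to
    // iterate its internal storage directly.
    public static int sumViaCallback(Map<String, Integer> m) {
        int[] sum = {0};
        m.forEach((k, v) -> sum[0] += v);
        return sum[0];
    }
}
```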


RE: org.apache.cassandra.dht.OrderPreservingPartitioner Initial Token

2010-04-23 Thread Stu Hood
Your keys cannot be encoded as binary for OPP, since Cassandra will attempt 
to decode them as UTF-8, meaning that they may not come back in the same format.

0.7 supports byte keys using the ByteOrderedPartitioner, and tokens are 
specified using hex.

-Original Message-
From: "Mark Jones" 
Sent: Friday, April 23, 2010 10:55am
To: "user@cassandra.apache.org" 
Subject: RE: org.apache.cassandra.dht.OrderPreservingPartitioner Initial Token

So if my keys are binary, is there any way to escape the key sequence to pass it in?

I have 20 bytes (any value 0x0-0xff is possible) as the key.

Are they compared as an array of bytes?  So that I can use truncation?

4 nodes, broken up by 0x00, 0x40, 0x80, 0xC0?
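
Assuming keys are compared as unsigned bytes lexicographically (which is
what a byte-ordered partitioner implies; a one-byte prefix sorts before
any longer key extending it, so single-byte tokens like 0x00/0x40/0x80/0xC0
do split the space by the first byte), routing can be sketched as below.
Note the sketch uses half-open ranges [token, nextToken) for simplicity,
whereas Cassandra's ring ranges are (previousToken, thisToken]:

```java
// Sketch: route a binary key to one of four nodes whose tokens are the
// single bytes 0x00, 0x40, 0x80, 0xC0, under unsigned lexicographic
// byte comparison.
public class ByteRanges {
    static final int[] TOKENS = {0x00, 0x40, 0x80, 0xC0};

    // Unsigned lexicographic compare of two byte arrays.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;  // the shorter prefix sorts first
    }

    // Node owning a key: the last token <= the key's first byte
    // (half-open [token, nextToken) ranges for illustration).
    static int nodeFor(byte[] key) {
        int first = key[0] & 0xFF;
        int node = 0;
        for (int i = 0; i < TOKENS.length; i++) {
            if (first >= TOKENS[i]) node = i;
        }
        return node;
    }
}
```

Under this comparison, truncation works as the question hopes: any
20-byte key starting with 0x3F sorts below the token 0x40.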


-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Friday, April 23, 2010 10:22 AM
To: user@cassandra.apache.org
Subject: Re: org.apache.cassandra.dht.OrderPreservingPartitioner Initial Token

a normal String from the same universe as your keys.

On Fri, Apr 23, 2010 at 7:23 AM, Mark Jones  wrote:
> How is this specified?
>
> Is it a large hex #?
>
> A string of bytes in hex?
>
>
>
> http://wiki.apache.org/cassandra/StorageConfiguration doesn't say.