Re: Non-latin implementation

2011-02-25 Thread Sasha Dolgy
Hi AJ,

I am storing Simplified Chinese data in columns without any issues at
the moment:

萨莎

I can retrieve the data, but I haven't tried secondary indexes or
anything more advanced yet.

-sd


On Thu, Feb 24, 2011 at 5:21 PM, A J  wrote:
> Hello,
> Have there been Cassandra implementations in non-Latin languages? In
> particular: Mandarin (China), Devanagari (India), Korean (Korea).
> I am interested in finding out whether there are storage, sorting, or
> other types of issues one should be aware of with these languages.
>
> Thanks.
>



-- 
Sasha Dolgy
sasha.do...@gmail.com


Re: New Chain for : Does Cassandra use vector clocks

2011-02-25 Thread Oleg Anastasyev
Sylvain Lebresne  datastax.com> writes:

> However, if that simple conflict detection/resolution mechanism is not good
> enough for some of your use cases and you need to keep two concurrent
> updates, it is easy enough. Just make sure that the updates don't end up in
> the same column. This is easily achieved by appending some unique identifier
> to the column name, for instance. And when reading, do a slice and reconcile
> whatever you get back with whatever logic makes sense. If you do that,
> congrats, you've roughly emulated what vector clocks would do. Btw, no
> locking or anything needed.

This solution is (much?) worse than having vector clocks. It multiplies the
amount of data and load on your system, forcing you to throw more nodes at the
cluster, because:
* The number of columns at least doubles. It is even worse if you cannot
predict the number of processes simultaneously accessing the same column,
because you then need to append a unique suffix to the column name on each
update, making them effectively not updates but inserts. If you have a dataset
that updates often, you'll multiply the number of columns, and so the data
size, by the number of updates to your dataset.
* These columns with unique suffixes need to be merged somehow. Cassandra has
a nice background merge facility, named compaction, but it cannot work on such
a dataset: there is nothing to compact, since every column is unique and has
no overwritten generation.
* So the merge must be done anyway, because logically this is still a single
column. The only way is to read all columns with a given prefix using a
get_slice call, resolve the conflicts manually, return the freshest copy to
the client, and delete the obsolete data. This makes the application code
complex, puts additional load on the Cassandra cluster (it must now
read-repair several columns instead of 1), and triggers additional operations
(deletes of obsolete values).
* And finally, deleting obsolete data doesn't actually free space until
GCGraceSeconds has passed, so your disks fill up, storing obsolete data for a
prolonged time.

In contrast, vector clocks are a more effective solution. They do not
duplicate column names and values several times; they duplicate only the
timestamp, by a factor of your RF. And your logically single column is handled
as a single column.
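For illustration, here is a minimal sketch of the suffix-per-update scheme the
quoted message describes and the read-side merge Oleg objects to. It models a
row as an in-memory dict with client-supplied timestamps; the function names
and layout are assumptions for the sketch, not a real Cassandra client API:

```python
import uuid

# A "row" is modeled as: column name -> (timestamp, value).
# Real code would use insert/get_slice against a cluster; the reconcile
# logic is the same.

def write(row, column, value, ts):
    """Append a unique suffix so concurrent updates never collide."""
    row[f"{column}:{uuid.uuid4()}"] = (ts, value)

def read_reconciled(row, column):
    """Slice every version of `column`, keep the freshest, delete the rest."""
    versions = [(ts, name, val) for name, (ts, val) in row.items()
                if name.startswith(column + ":")]
    ts, winner, val = max(versions)      # freshest client timestamp wins
    for _, name, _ in versions:          # app-level "compaction": delete
        if name != winner:               # obsolete copies (which would still
            del row[name]                # linger until GC grace expires)
    return val

row = {}
write(row, "status", "draft", ts=1)      # two concurrent updates become
write(row, "status", "published", ts=2)  # two physical columns
print(read_reconciled(row, "status"))    # -> published
```

Note how every logical update becomes a new physical column, and the reconcile
only tombstones the losers after the fact, which is exactly the extra data and
load described above.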







Running multiple compactions concurrently

2011-02-25 Thread Daniel Josefsson
We experienced the java.lang.NegativeArraySizeException when upgrading to
0.7.2 in staging. The proposed solution (running compaction) seems to have
solved it. However, it took a long time to run.

Is it safe to invoke a major compaction on all of the machines concurrently?

I can't see a reason why it wouldn't, but I want to be sure :)

Thanks,
Daniel


Re: Fill disks more than 50%

2011-02-25 Thread Terje Marthinussen
> I am suggesting that you probably want to rethink your schema design,
> since partitioning by year is going to give bad performance: the old
> servers are going to be nothing more than expensive tape drives.
>

You fail to see the obvious.

It is just the fact that most of the data is stale that makes the question
interesting in the first place, and I would obviously not have asked if
there were an I/O throughput problem in doing this.

Now, that said, we tested a repair on a set of nodes that were 70-80%
full, and no luck. Ran out of disk :(

Terje


Re: Fill disks more than 50%

2011-02-25 Thread Terje Marthinussen
>
>
> @Thibaut Britz
> Caveat:Using simple strategy.
> This works because cassandra scans data at startup and then serves
> what it finds. For a join for example you can rsync all the data from
> the node below/to the right of where the new node is joining. Then
> join without bootstrap then cleanup both nodes. (also you have to
> shutdown the first node so you do not have a lost write scenario in
> the time between rsync and new node startup)
>
>
rsync all the data from the node to the left/right...
Wouldn't that mean that you need 2x the data to recover?

Terje


Re: Running multiple compactions concurrently

2011-02-25 Thread Gary Dusbabek
If your cluster has the overall IO capacity to perform a simultaneous
compaction on every node and still adequately service reads and
writes, then yes.  If you're concerned about availability, your best
bet will be to stagger the compactions.

Gary.


On Fri, Feb 25, 2011 at 04:24, Daniel Josefsson  wrote:
> We experienced the java.lang.NegativeArraySizeException when upgrading to
> 0.7.2 in staging. The proposed solution (running compaction) seems to have
> solved this. However it took a lot of time to run.
>
> Is it safe to invoke a major compaction on all of the machines concurrently?
>
> I can't see a reason why it wouldn't, but I want to be sure :)
>
> Thanks,
> Daniel
>
>


Re: Running multiple compactions concurrently

2011-02-25 Thread Daniel Josefsson
The compaction will be part of a Cassandra upgrade (where all nodes will
have to be taken down), so no clients will be hitting the cluster until
the upgrade is complete. I just want to minimize the downtime.

Thanks, this is basically what I wanted to hear.

Daniel

On Fri, 2011-02-25 at 13:24 +, Gary Dusbabek wrote:

> If your cluster has the overall IO capacity to perform a simultaneous
> compaction on every node and still adequately service reads and
> writes, then yes.  If you're concerned about availability, your best
> bet will be to stagger the compactions.
> 
> Gary.
> 
> 
> On Fri, Feb 25, 2011 at 04:24, Daniel Josefsson  wrote:
> > We experienced the java.lang.NegativeArraySizeException when upgrading to
> > 0.7.2 in staging. The proposed solution (running compaction) seems to have
> > solved this. However it took a lot of time to run.
> >
> > Is it safe to invoke a major compaction on all of the machines concurrently?
> >
> > I can't see a reason why it wouldn't, but I want to be sure :)
> >
> > Thanks,
> > Daniel
> >
> >
> 
> __
> This email has been scanned by the MessageLabs Email Security System.
> For more information please visit http://www.messagelabs.com/email 
> __


-- 


Daniel Josefsson

Software Engineer

Shazam Entertainment Ltd 
26-28 Hammersmith Grove, London W6 7HA
w: www.shazam.com 






Re: New Chain for : Does Cassandra use vector clocks

2011-02-25 Thread A J
He has a product to sell, so you can expect some advertising. But in
general, Stonebraker's articles are very deep (another one that
challenges common conceptions is
http://voltdb.com/voltdb-webinar-sql-urban-myths). He is the creator
of Postgres and is considered a database guru by many.
And actually, if you cannot let go of ACID and are not satisfied with
traditional DBMS solutions, VoltDB is worth considering. It of course
solves a different problem (OLTP) than Cassandra does.


On Thu, Feb 24, 2011 at 5:20 PM, Edward Capriolo  wrote:
> On Thu, Feb 24, 2011 at 3:56 PM, A J  wrote:
>> While we are at it, there's more to consider than just CAP in distributed :)
>> http://voltdb.com/blog/clarifications-cap-theorem-and-data-related-errors

Re: Changing comparators

2011-02-25 Thread Jonathan Ellis
Compaction assumes that the sstables it has as input are ordered
correctly (otherwise it would have to read the full row into memory to
re-sort).  So it would have to be a new operation, and one that is not
feasible in general for larger-than-memory rows.  I don't think we'll
ever add this.
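The ordering assumption can be made concrete with a tiny sketch (the two
comparators are modeled here as Python sort keys; this is an illustration, not
Cassandra code):

```python
# Model column names that carry a time component, e.g. TimeUUID-like
# (time, raw-bytes) pairs. A TimeUUIDType-style comparator orders by the
# time component; a BytesType-style comparator orders lexically by the
# raw bytes.

columns = [(3, b"a"), (1, b"c"), (2, b"b")]

time_order = sorted(columns, key=lambda c: c[0])   # on-disk order today
bytes_order = sorted(columns, key=lambda c: c[1])  # order the new
                                                   # comparator expects

# An sstable written under the old comparator is generally not sorted
# under the new one, so a streaming merge (what compaction does) would
# emit columns out of order; fixing that needs a full in-memory re-sort.
print(time_order != bytes_order)  # -> True
```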

On Wed, Feb 23, 2011 at 6:23 PM, Narendra Sharma
 wrote:
> Today it is not possible to change the comparators (compare_with and
> compare_subcolumns_with). I went through the discussion on thread
> http://comments.gmane.org/gmane.comp.db.cassandra.user/12466.
>
> Does it make sense to atleast allow one way change i.e. from specific types
> to generic type? For eg change from TimeUUIDType or UTF8 to BytesType. This
> could be a manual process where users will do the schema change and then run
> major compaction on all the nodes to fix the ordering.
>
> Thanks,
> Naren
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: A simple script that creates multi node clusters on a single machine.

2011-02-25 Thread Jonathan Ellis
Nice!

On Wed, Feb 23, 2011 at 9:06 PM, Edward Capriolo  wrote:
> On the mailing list and IRC there are many questions about Cassandra
> internals. I understand where the questions are coming from, because it
> took me a while to get a grip on it.
>
> However, if you have a laptop with a decent amount of RAM (2 GB is
> enough for 3-5 nodes; 4 GB is better), you can kick up a multi-node
> cluster right on your laptop. Then you can test failure and
> eventual-consistency scenarios such as (insert to node A, kill node B,
> join node C) to your heart's content.
>
> http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/lauching_5_node_cassandra_clusters
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: losing connection to Cassandra

2011-02-25 Thread Jonathan Ellis
You should upgrade before wasting time troubleshooting such an old install.

On Thu, Feb 24, 2011 at 8:45 AM, Tomer B  wrote:
> Hi,
> I'm using a 3-node cluster of Cassandra 0.6.1, together with Hector as the
> Java client API.
> Every few days I get a situation where I cannot connect to Cassandra; on top
> of that, the data dir fills up the whole disk and synchronization stops at
> those times. The exceptions I get are as follows:
> Happened 3386 times in 24h, POOL EXHAUSTED: 02:00:30.225 [MyThread[5]]:
> Unable to connect to cassandra node xxx.xx.xx.32:9160 will try the next node,
> Pool exhausted
> Happened 6848 times in 24h, CONNECTION REFUSED: 06:14:48.598 [MyThread[4]]:
> Unable to connect to cassandra node xxx.xx.xx.30:9160 will try the next node,
> Unable to open transport to xxx.xx.xx.30:9160 , java.net.ConnectException:
> Connection refused: connect
> Happened 84 times in 24h, NULL OUTPUTSTREAM: 06:14:48.504 [MyThread[2]]:
> async execution fail, Cannot write to null outputStream
> Happened 14 times in 24h, CONNECTION TIMED OUT: 06:15:08.019 [MyThread[0]]:
> Unable to connect to cassandra node xxx.xx.xx.31:9160 will try the next
> node, Unable to open transport to xxx.xx.xx.31:9160 ,
> java.net.ConnectException: Connection timed out: connect
> Can anyone assist or suggest what the problem could be? Note that the node
> is otherwise functioning well, and this happens only once every few days.
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Exception in thread "main" java.lang.NoClassDefFoundError

2011-02-25 Thread Jonathan Ellis
http://wiki.apache.org/cassandra/RunningCassandra may be useful, but
really you should be using the debian package:
http://wiki.apache.org/cassandra/DebianPackaging
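For what it's worth, the ClassNotFoundException in the quoted message suggests
CASSANDRA_MAIN was pointed at a javadoc HTML file rather than at the daemon
class itself. A sketch of what those exports would normally look like (the
/usr/local/cassandra prefix is taken from the quoted setup; treat the exact
paths as assumptions for your install):

```shell
# CASSANDRA_MAIN must name a Java class, not a javadoc .html path.
# The class name below matches the javadoc path shown in the quoted
# error; the install prefix is assumed from the quoted message.
export CASSANDRA_HOME="/usr/local/cassandra"
export CASSANDRA_CONF="$CASSANDRA_HOME/conf"
export CASSANDRA_INCLUDE="$CASSANDRA_HOME/bin/cassandra.in.sh"
export CASSANDRA_MAIN="org.apache.cassandra.thrift.CassandraDaemon"
export PATH="$PATH:$CASSANDRA_HOME/bin"
```

In practice the bundled startup scripts normally set CASSANDRA_MAIN
themselves, so not overriding it at all may be the simplest fix.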

2011/2/24 ko...@vivinavi.com :
> Hi everyone
>
> I am new to JAVA and Cassandra.
> I just get started to install Cassandra.
> My Machine is Debian 5.0.6.
> I installed jdk1.6.0_24 to /usr/local
> java -version is as following.
> java version "1.6.0_24"
> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
> Java HotSpot(TM) Server VM (build 19.1-b02, mixed mode)
> javac -J-version is as following.
> java version "1.6.0_24"
> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
> Java HotSpot(TM) Client VM (build 19.1-b02, mixed mode, sharing)
>
> and then I installed apache-cassandra-0.6.12 to /usr/local
>
> I added the following to /etc/profile:
> #for Java
> export JAVA_HOME="/usr/local/java"
> export CLASSPATH=".:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar;"
> export PATH="$JAVA_HOME/bin:$PATH"
>
> #for Java VM
> export JVM_OPTS="-Xmx1G -Xms512M -Xss256K"
>
> #for Cassandra
> export CASSANDRA_HOME="/usr/local/cassandra/bin"
> export CASSANDRA_CONF="/usr/local/cassandra/conf"
> export
> CASSANDRA_MAIN="/usr/local/cassandra/javadoc/org/apache/cassandra/thrift/CassandraDaemon.html"
> export CASSANDRA_INCLUDE="/usr/local/cassandra/bin/cassandra.in.sh"
> export PATH="$PATH:/usr/local/cassandra/bin"
>
> I did source /etc/profile.
> And checked $JAVA_HOME,$CLASS_PATH,$CASSANDRA_HOME etc.
>
> And then I started /usr/local/cassandra/bin/cassandra -f
> However, I got the following error message.
>
> Exception in thread "main" java.lang.NoClassDefFoundError:
> /usr/local/cassandra/javadoc/org/apache/cassandra/thrift/CassandraDaemon
> Caused by: java.lang.ClassNotFoundException:
> .usr.local.cassandra.javadoc.org.apache.cassandra.thrift.CassandraDaemon
> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
> Could not find the main class:
> .usr.local.cassandra.javadoc.org.apache.cassandra.thrift.CassandraDaemon. 
> Program
> will exit.
>
> I don't know what's wrong or what to do to solve this problem.
> I searched for this error message and found results, but mostly for
> Windows, not Linux.
> Is my classpath wrong? I can find only many HTML files (including
> CassandraDaemon.html)
> at /usr/local/cassandra/javadoc/org/apache/cassandra/thrift/.
> Is this OK?
> If my classpath is wrong, what is the correct path? (I can't find
> CassandraDaemon.java.)
>
> Please advise me on how to solve this problem.
> Thank you for your help in advance.
>
> Best Regards
> Mac Kondo
>
> --
> *
> Mamoru Kondo
> Vivid Navigation,Inc.
> http://www.vivinavi.com
> ko...@vivinavi.com
> *
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: New Chain for : Does Cassandra use vector clocks

2011-02-25 Thread Jonathan Ellis
That article is heavily biased by "I am selling a competitor to Cassandra."

First, read Coda's original piece if you haven't:
http://codahale.com/you-cant-sacrifice-partition-tolerance/

Then, Jeff Darcy's response: http://pl.atyp.us/wordpress/?p=3110

On Thu, Feb 24, 2011 at 2:56 PM, A J  wrote:
> While we are at it, there's more to consider than just CAP in distributed :)
> http://voltdb.com/blog/clarifications-cap-theorem-and-data-related-errors
>
> On Thu, Feb 24, 2011 at 3:31 PM, Edward Capriolo  
> wrote:
>> On Thu, Feb 24, 2011 at 3:03 PM, A J  wrote:
>>> yes, that is difficult to digest and one has to be sure if the use
>>> case can afford it.
>>>
>>> Some other NoSQL databases deal with it differently (though I don't
>>> think any of them use atomic two-phase commit). MongoDB, for example, will
>>> ask you to read from the node you wrote to first (the primary node) unless
>>> you are OK with eventual consistency. If the write did not make it to a
>>> majority of the other nodes, it will be rolled back from the original
>>> primary when it comes up again as a secondary.
>>> In some cases, you could still be served either the new value (that was
>>> returned as failed) or the old one. But it is different from Cassandra
>>> in the sense that Cassandra will never roll back.
>>>
>>>
>>>
>>> On Thu, Feb 24, 2011 at 2:47 PM, Anthony John  wrote:
 The leap of faith here is that an error does not mean a clean backing out 
 to
 prior state - as we are used to with databases. It means that the operation
 in error could have gone through partially

 Again, this is not an absolutely unfamiliar territory and can be dealt 
 with.
 -JA
 On Thu, Feb 24, 2011 at 1:16 PM, A J  wrote:
>
> >>but could be broken in case of a failed write<<
> You can think of a scenario where R + W >N still leads to
> inconsistency even for successful writes. Say you keep W=1 and R=N .
> Lets say the one node where a write happened with success goes down
> before it made to the other N-1 nodes. Lets say it goes down for good
> and is unrecoverable. The only option is to build a new node from
> scratch from other active nodes. This will lead to a write that was
> lost and you will end up serving stale copy of it.
>
> It is better to talk in terms of use cases and if cassandra will be a
> fit for it. Otherwise unless you have W=R=N and fsync before each
> write commit, there will be scope for inconsistency.
>
>
> On Thu, Feb 24, 2011 at 1:25 PM, Anthony John 
> wrote:
> > I see the point - apologies for putting everyone through this!
> > It was just militating against my mental model.
> > In summary, here is my take away - simple stuff but - IMO - important to
> > conclude this thread (I hope):-
> > 1. I was splitting hairs over a failed (partial) Q Write. Such an event
> > should be immediately followed by the same write going to a connection
> > on to
> > another node ( potentially using connection caches of client
> > implementations
> > ) or a Read at CL of All. Because a write could have partially gone
> > through.
> > 2. Timestamps are used in determining the latest version ( correcting
> > the
> > false impression I was propagating)
> > Finally, wrt "W + R > N for Q CL statement" holds, but could be broken
> > in
> > case of a failed write as it is unsure whether the new value got written
> > on
> >  any server or not. Is that a fair characterization ?
> > Bottom line - unlike traditional DBMS, errors do not ensure automatic
> > cleanup and revert back, app code has to follow up if  immediate - and
> > not
> > eventual -  consistency is desired. I made that leap in almost all cases
> > - I
> > think - but the case of a failed write.
> > My bad and I can live with this!
> > Regards,
> > -JA
> >
> > On Thu, Feb 24, 2011 at 11:50 AM, Sylvain Lebresne
> > 
> > wrote:
> >>
> >> On Thu, Feb 24, 2011 at 6:33 PM, Anthony John 
> >> wrote:
> >>>
> >>> Completely understand!
> >>> All that I am quibbling over is whether a CL of quorum guarantees
> >>> consistency or not. That is what the documentation says - right. IF
> >>> for a CL
> >>> of Q read - it depends on which node returns read first to determine
> >>> the
> >>> actual returned result or other more convoluted conditions , then a
> >>> Quorum
> >>> read/write is not consistent, by any definition.
> >>
> >> But that's the point. The definition of consistency we are talking about
> >> has no meaning if you consider only a quorum read. The definition (which
> >> is the de facto definition of consistency in 'eventually consistent')
> >> makes sense if we talk about a write followed by a read. And it is
> >> considering a succeeding write followed by a succeeding read.
> >> And that is the sta

Re: Fill disks more than 50%

2011-02-25 Thread Edward Capriolo
On Fri, Feb 25, 2011 at 7:38 AM, Terje Marthinussen
 wrote:
>>
>> @Thibaut Britz
>> Caveat:Using simple strategy.
>> This works because cassandra scans data at startup and then serves
>> what it finds. For a join for example you can rsync all the data from
>> the node below/to the right of where the new node is joining. Then
>> join without bootstrap then cleanup both nodes. (also you have to
>> shutdown the first node so you do not have a lost write scenario in
>> the time between rsync and new node startup)
>>
>
> rsync all data from node to left/right..
> Wouldn't that mean that you need 2x the data to recover...?
> Terje

Terje,

In your scenario, where you are never updating, running repair becomes
less important. I have an alternative for you. I have a program I call
the "RescueRanger"; we use it to range-scan all our data, find old
entries, and then delete them. However, if we set that program to
read-only mode and tell it to read at CL.ALL, it becomes a program that
read-repairs data!

This is a tradeoff. Range-scanning through all your data is not fast,
but it does not require the extra disk space. Kinda like merge sort vs
bubble sort.
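The read-only, CL.ALL trick can be modeled in a few lines (replicas as dicts
with client-supplied timestamps; a toy model of the read-repair side effect,
not the RescueRanger code or a real client API):

```python
# Toy model: each replica stores key -> (timestamp, value). A read at
# "ALL" consults every replica, picks the newest version, and writes it
# back to any replica that is stale or missing it -- which is exactly
# the read-repair side effect a read-only scan at CL.ALL relies on.

def read_all_with_repair(replicas, key):
    versions = [r[key] for r in replicas if key in r]
    newest = max(versions)               # (timestamp, value): newest wins
    for r in replicas:                   # repair stale/missing copies
        if r.get(key) != newest:
            r[key] = newest
    return newest[1]

# One replica missed an update, another never saw the row at all
# (e.g. they were down during the write).
r1 = {"row1": (2, "new")}
r2 = {"row1": (1, "old")}
r3 = {}

print(read_all_with_repair([r1, r2, r3], "row1"))  # -> new
print(r2["row1"] == r3["row1"] == (2, "new"))      # -> True
```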


Re: New Chain for : Does Cassandra use vector clocks

2011-02-25 Thread A J
Though you are not really implying that, I am not selling anything; I
don't work for VoltDB. I had other issues with the software for my use
case when I was evaluating it (their claim of durability is weak, in my
opinion; though it doesn't matter, I'd rather they called themselves
NoSQL, since they just give lip service to SQL).
I'd rather not drink any sort of kool-aid, but get all sides (whatever
the motives of those sides may be) and judge for myself what I want to do.

The thread was started by someone who seems to be having difficulty
wrapping their head around the give and take of Cassandra; maybe
something else is better for their use case.

Peace :)


On Fri, Feb 25, 2011 at 10:39 AM, Jonathan Ellis  wrote:
> That article is heavily biased by "I am selling a competitor to Cassandra."
>
> First, read Coda's original piece if you haven't:
> http://codahale.com/you-cant-sacrifice-partition-tolerance/
>
> Then, Jeff Darcy's response: http://pl.atyp.us/wordpress/?p=3110

Re: New Chain for : Does Cassandra use vector clocks

2011-02-25 Thread Jeremy Hanna
Yeah - no worries - I don't think anyone was thinking you were drinking the
kool-aid or selling anything.  Jonathan was just pointing out thoughtful
replies to Stonebraker's claims.

This past year, Michael Stonebraker, with VoltDB and other things, seems to
have tried to take advantage of the momentum behind systems like Cassandra (as
well as the backlash against NoSQL) to make pretty bold claims, especially
considering that Volt is an in-memory database.  So 1) he's been using his
pedigree as credibility in selling a new product, and 2) the VoltDB marketing
department makes heavy use of buzzwords and hyperbole.

Nothing wrong with VoltDB necessarily; it probably has its uses.  However, the
way it's been pitched by the company, and by Stonebraker in particular, seems
disingenuous and self-serving, and to me has very much tarnished his
reputation as an objective luminary in the field of computer science.

Maybe I'm taking that too far, but now every time I hear a statement from him,
I have a grain of salt at the ready.

On Feb 25, 2011, at 10:21 AM, A J wrote:

> Though you are not really implying that, I am not selling anything. I
> don't work for VoltDB. I had other issues for my use case with the
> software when I was evaluating it (their claim of durability is weak
> according to me. Though it does not matter I'd rather they call
> themselves NOSQL. they just give lip-service to SQL)
> I'd rather not drink any sort of kool-aid, get all sides (whatever the
> motive of the sides be) and be the judge myself for what I want to do.
> 
> The thread was by someone who seems to be having difficulty wrapping
> head around the gives and takes of cassandra. maybe something else is
> better for their use case.
> 
> Peace :)
> 
> 
> On Fri, Feb 25, 2011 at 10:39 AM, Jonathan Ellis  wrote:
>> That article is heavily biased by "I am selling a competitor to Cassandra."
>> 
>> First, read Coda's original piece if you haven't:
>> http://codahale.com/you-cant-sacrifice-partition-tolerance/
>> 
>> Then, Jeff Darcy's response: http://pl.atyp.us/wordpress/?p=3110
>> 
>> On Thu, Feb 24, 2011 at 2:56 PM, A J  wrote:
>>> While we are at it, there's more to consider than just CAP in distributed :)
>>> http://voltdb.com/blog/clarifications-cap-theorem-and-data-related-errors
>>> 
>>> On Thu, Feb 24, 2011 at 3:31 PM, Edward Capriolo  
>>> wrote:
 On Thu, Feb 24, 2011 at 3:03 PM, A J  wrote:
> yes, that is difficult to digest and one has to be sure if the use
> case can afford it.
> 
> Some other NOSQL databases deals with it differently (though I don't
> think any of them use atomic 2-phase commit). MongoDB for example will
> ask you to read from the node you wrote first (primary node) unless
> you are ok with eventual consistency. If the write did not make to
> majority of other nodes, it will be rolled-back from the original
> primary when it comes up again as a secondary.
> In some cases, you still could server either new value (that was
> returned as failed) or the old one. But it is different from Cassandra
> in the sense that Cassandra will never rollback.
> 
> 
> 
> On Thu, Feb 24, 2011 at 2:47 PM, Anthony John  
> wrote:
>> The leap of faith here is that an error does not mean a clean backing 
>> out to
>> prior state - as we are used to with databases. It means that the 
>> operation
>> in error could have gone through partially
>> 
>> Again, this is not an absolutely unfamiliar territory and can be dealt 
>> with.
>> -JA
>> On Thu, Feb 24, 2011 at 1:16 PM, A J  wrote:
>>> 
> but could be broken in case of a failed write<<
>>> You can think of a scenario where R + W >N still leads to
>>> inconsistency even for successful writes. Say you keep W=1 and R=N .
>>> Lets say the one node where a write happened with success goes down
>>> before it made to the other N-1 nodes. Lets say it goes down for good
>>> and is unrecoverable. The only option is to build a new node from
>>> scratch from other active nodes. This will lead to a write that was
>>> lost and you will end up serving stale copy of it.
>>> 
>>> It is better to talk in terms of use cases and if cassandra will be a
>>> fit for it. Otherwise unless you have W=R=N and fsync before each
>>> write commit, there will be scope for inconsistency.
>>> 
>>> 
>>> On Thu, Feb 24, 2011 at 1:25 PM, Anthony John 
>>> wrote:
 I see the point - apologies for putting everyone through this!
 It was just militating against my mental model.
 In summary, here is my take away - simple stuff but - IMO - important 
 to
 conclude this thread (I hope):-
 1. I was splitting hair over a failed ( partial ) Q Write. Such an 
 event
 should be immediately followed by the same write going to a connection
 on to
 another node ( potentia

Re: New Chain for : Does Cassandra use vector clocks

2011-02-25 Thread Jeremy Hanna
And everyone has a bias - and I think most people working with any of these 
solutions realizes that.

I think it's interesting how many organizations use multiple data storage 
solutions versus just using one as they have different capabilities - like the 
recent Netflix news about using different data stores for different reasons.

On Feb 25, 2011, at 10:21 AM, A J wrote:

> Though you are not really implying that, I am not selling anything. I
> don't work for VoltDB. I had other issues for my use case with the
> software when I was evaluating it (their claim of durability is weak
> according to me. Though it does not matter I'd rather they call
> themselves NOSQL. they just give lip-service to SQL)
> I'd rather not drink any sort of kool-aid, get all sides (whatever the
> motive of the sides be) and be the judge myself for what I want to do.
> 
> The thread was by someone who seems to be having difficulty wrapping
> head around the gives and takes of cassandra. maybe something else is
> better for their use case.
> 
> Peace :)
> 
> 
> On Fri, Feb 25, 2011 at 10:39 AM, Jonathan Ellis  wrote:
>> That article is heavily biased by "I am selling a competitor to Cassandra."
>> 
>> First, read Coda's original piece if you haven't:
>> http://codahale.com/you-cant-sacrifice-partition-tolerance/
>> 
>> Then, Jeff Darcy's response: http://pl.atyp.us/wordpress/?p=3110
>> 
>> On Thu, Feb 24, 2011 at 2:56 PM, A J  wrote:
>>> While we are at it, there's more to consider than just CAP in distributed :)
>>> http://voltdb.com/blog/clarifications-cap-theorem-and-data-related-errors
>>> 
>>> On Thu, Feb 24, 2011 at 3:31 PM, Edward Capriolo  
>>> wrote:
 On Thu, Feb 24, 2011 at 3:03 PM, A J  wrote:
> yes, that is difficult to digest and one has to be sure if the use
> case can afford it.
> 
> Some other NOSQL databases deals with it differently (though I don't
> think any of them use atomic 2-phase commit). MongoDB for example will
> ask you to read from the node you wrote first (primary node) unless
> you are ok with eventual consistency. If the write did not make to
> majority of other nodes, it will be rolled-back from the original
> primary when it comes up again as a secondary.
> In some cases, you still could server either new value (that was
> returned as failed) or the old one. But it is different from Cassandra
> in the sense that Cassandra will never rollback.
> 
> 
> 
> On Thu, Feb 24, 2011 at 2:47 PM, Anthony John  
> wrote:
>> The leap of faith here is that an error does not mean a clean backing 
>> out to
>> prior state - as we are used to with databases. It means that the 
>> operation
>> in error could have gone through partially
>> 
>> Again, this is not an absolutely unfamiliar territory and can be dealt 
>> with.
>> -JA
>> On Thu, Feb 24, 2011 at 1:16 PM, A J  wrote:
>>> 
> but could be broken in case of a failed write<<
>>> You can think of a scenario where R + W >N still leads to
>>> inconsistency even for successful writes. Say you keep W=1 and R=N .
>>> Lets say the one node where a write happened with success goes down
>>> before it made to the other N-1 nodes. Lets say it goes down for good
>>> and is unrecoverable. The only option is to build a new node from
>>> scratch from other active nodes. This will lead to a write that was
>>> lost and you will end up serving stale copy of it.
>>> 
>>> It is better to talk in terms of use cases and if cassandra will be a
>>> fit for it. Otherwise unless you have W=R=N and fsync before each
>>> write commit, there will be scope for inconsistency.
>>> 
>>> 
>>> On Thu, Feb 24, 2011 at 1:25 PM, Anthony John 
>>> wrote:
 I see the point - apologies for putting everyone through this!
 It was just militating against my mental model.
 In summary, here is my take away - simple stuff but - IMO - important 
 to
 conclude this thread (I hope):-
 1. I was splitting hair over a failed ( partial ) Q Write. Such an 
 event
 should be immediately followed by the same write going to a connection
 on to
 another node ( potentially using connection caches of client
 implementations
 ) or a Read at CL of All. Because a write could have partially gone
 through.
 2. Timestamps are used in determining the latest version ( correcting
 the
 false impression I was propagating)
 Finally, wrt "W + R > N for Q CL statement" holds, but could be broken
 in
 case of a failed write as it is unsure whether the new value got 
 written
 on
  any server or not. Is that a fair characterization ?
 Bottom line - unlike traditional DBMS, errors do not ensure automatic
 cleanup and revert back, app code has to follow up if  immedia

2x storage

2011-02-25 Thread A J
I read in some cassandra notes that each node should be allocated
twice the storage capacity you wish it to contain. I think the reason
was during compaction another copy of SSTables have to be made before
the original ones are discarded.

Can someone confirm whether that is actually true? During compaction,
aren't only a few SSTables involved? Why should the headroom be twice
the full storage? If I keep some buffer on top of that, it really means
I can use only 40% or so of the space.


Many thanks.


Re: 2x storage

2011-02-25 Thread Robert Coli
On Fri, Feb 25, 2011 at 9:22 AM, A J  wrote:
> I read in some cassandra notes that each node should be allocated
> twice the storage capacity you wish it to contain. I think the reason
> was during compaction another copy of SSTables have to be made before
> the original ones are discarded.

This rule of thumb only exactly applies when you have a single CF. It
is better stated as "your node needs to have enough room to
successfully compact your largest columnfamily."

=Rob
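Rob's restated rule of thumb can be sketched as a quick headroom check. This is a hypothetical helper (nothing Cassandra ships): since a major compaction rewrites a column family's SSTables before removing the old ones, the free space a node needs is roughly the on-disk size of its largest column family, not of the whole dataset.

```python
def required_free_space(cf_sizes_gb):
    """Rough compaction headroom: enough room to rewrite the
    largest column family's SSTables in full before the old
    copies are discarded."""
    return max(cf_sizes_gb.values())

# Three CFs on a node; only the largest one drives the headroom.
sizes = {"Users": 120.0, "Events": 45.0, "Counters": 5.0}
print(required_free_space(sizes))  # -> 120.0
```

So with several column families of similar size, the usable fraction of the disk is considerably better than the 50% the naive "2x" reading suggests.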


Re: cassandra as user-profile data store

2011-02-25 Thread Tyler Hobbs
>
> I'm wondering if anyone has used cassandra as a datastore for a
> user-profile service.  I'm thinking of applications like behavioral
> targeting, where there are lots & lots of users (10s to 100s of millions),
> and lots & lots of data about them intermixed in, say, weblogs (probably TBs
> worth).  The idea would be to use Cassandra as a datastore for distributed
> parallel processing of the TBs of files (say on hadoop).  Then the resulting
> user-profiles would be query-able quickly.
>

Just to be clear, you're primarily interested in storing the processed data
(which you give examples of below) in Cassandra?


> Anyone know of that sort of application of Cassandra?  I'm trying to puzzle
> out just what the column family might look like.  Seems like a mix of
> time-oriented information (user x visits site y at time z), location
> information (user x appeared from ip x.y.z.a which is geo-location 31.20309,
> 120.10923), and derived information (because user x visited site y 15 times
> within a 10 day window, user x must be interested in buying a car).
>

For the time-oriented data, you generally want to dedicate one row  as a
timeline per user, using timestamps as column names.  I wouldn't expect any
of these to create extremely large rows, but if that's a possibility, you
should consider splitting the timelines into one row per year (or a smaller
time period) if needed.  If you have any need for an aggregate timeline with
a higher volume of data, different strategies apply.
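The per-user timeline pattern above can be simulated in plain Python. This is a sketch of the data model only, with made-up event values; real code would go through a client such as pycassa. Each user gets one wide row, column names are timestamps, and because columns are kept sorted, a slice over a column-name range returns events in time order.

```python
import bisect

class TimelineRow:
    """Toy wide row: sorted timestamp column names -> values,
    mimicking how Cassandra keeps columns sorted within a row."""
    def __init__(self):
        self.names = []   # sorted timestamps (column names)
        self.values = {}

    def insert(self, ts, value):
        if ts not in self.values:
            bisect.insort(self.names, ts)  # keep column names sorted
        self.values[ts] = value

    def slice(self, start, finish):
        """Emulate a get_slice over the column-name range [start, finish]."""
        lo = bisect.bisect_left(self.names, start)
        hi = bisect.bisect_right(self.names, finish)
        return [(t, self.values[t]) for t in self.names[lo:hi]]

row = TimelineRow()  # the timeline row for one user
row.insert(1298600000, "visited site y")
row.insert(1298600050, "clicked ad")
row.insert(1298600200, "visited site z")
print(row.slice(1298600000, 1298600100))
```

Splitting a hot user's timeline into one row per year then just means composing the row key from the user id plus the year, and slicing each year-row the same way.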

How you store the location data depends on what you want to do with it.  If
you're only interested in going from user -> locations, not from location ->
users, then a couple of possibilities come to mind.  You might want a
timeline of locations that a user has appeared from, or you might want a
counter for each location a user has appeared from.  What would you like to
do with these?

As for the derived information, I think you would need to decide a little
more concretely exactly what data you'll have and and what you want to be
able to do with it.


> I don't have specifics as yet... just some general thoughts.
>

Let me know what specifics you can come up with and I'll try to give you
some more specific answers.  The devil is in the details when it comes to
data modeling in Cassandra!

-- 
Tyler Hobbs
Software Engineer, DataStax 
Maintainer of the pycassa  Cassandra
Python client library


Acunu beta

2011-02-25 Thread Tim Moreton
I wanted to let everyone know that we're expanding our beta for the
Acunu Storage Platform, which comprises a modified version of
Cassandra that interfaces directly on to a storage stack reengineered
for "Big Data" workloads.

Acunu runs Cassandra applications unmodified, but provides (as we'll
be talking about on our blog in the coming weeks) higher, more
predictable performance, automated tiering over SSDs, very fast
rebuild from disk failure, and space-efficient clones of
ColumnFamilies, among many other things. We also have a suite of
Cassandra cluster deployment, management and monitoring tools.

Cassandra's a great project, and we have a lot of patches we've
started contributing back to the community. We'll be open sourcing our
storage stack, too.

Please get in touch if you're interested in putting it through its paces!

Tim

--
http://www.acunu.com | @acunu


Re: 2x storage

2011-02-25 Thread A J
OK. Is it also driven by type of compaction ? Does a minor compaction
require less working space than major compaction ?

On Fri, Feb 25, 2011 at 12:40 PM, Robert Coli  wrote:
> On Fri, Feb 25, 2011 at 9:22 AM, A J  wrote:
>> I read in some cassandra notes that each node should be allocated
>> twice the storage capacity you wish it to contain. I think the reason
>> was during compaction another copy of SSTables have to be made before
>> the original ones are discarded.
>
> This rule of thumb only exactly applies when you have a single CF. It
> is better stated as "your node needs to have enough room to
> successfully compact your largest columnfamily."
>
> =Rob
>


Re: Homebrew CF-indexing vs secondary indexing

2011-02-25 Thread Ron Siemens

I updated the Cassandra version in the hector package from 0.7.0 to 0.7.2.  The 
occasional slow-down in the CF-index went away.  I then upped the heap to 
512MB, and the secondary indexing then works.  Seems awfully memory hungry for 
my small dataset.  Even the CF-index was faster with more heap.  These are the 
times with Cassandra-0.7.2 and a 512M heap.  Slightly different testing: I'm 
varying the index used, which gives different data-size results.  It still 
surprises me that the CF index does substantially better.

Secondary Index

DEBUG Retrieved THS / 7293 rows, in 1051 ms
DEBUG Retrieved TRS / 7289 rows, in 1448 ms
DEBUG Retrieved BCS / 7788 rows, in 1553 ms
DEBUG Retrieved ARS / 7426 rows, in 1479 ms
DEBUG Retrieved CHS / 7290 rows, in 1575 ms
DEBUG Retrieved MS / 4523 rows, in 766 ms
DEBUG Retrieved PRS / 562 rows, in 40 ms
DEBUG Retrieved GGF / 1162 rows, in 122 ms
DEBUG Retrieved VET / 7313 rows, in 1193 ms
DEBUG Retrieved AUT / 7287 rows, in 1746 ms
DEBUG Retrieved LIT / 7291 rows, in 1331 ms

CF Index

DEBUG Retrieved THS / 7293 rows, in 17 + 759 ms
DEBUG Retrieved TRS / 7289 rows, in 19 + 734 ms
DEBUG Retrieved BCS / 7788 rows, in 23 + 736 ms
DEBUG Retrieved ARS / 7426 rows, in 23 + 1448 ms
DEBUG Retrieved CHS / 7290 rows, in 18 + 638 ms
DEBUG Retrieved MS / 4523 rows, in 32 + 622 ms
DEBUG Retrieved PRS / 562 rows, in 2 + 50 ms
DEBUG Retrieved GGF / 1162 rows, in 3 + 79 ms
DEBUG Retrieved VET / 7313 rows, in 17 + 686 ms
DEBUG Retrieved AUT / 7287 rows, in 17 + 758 ms
DEBUG Retrieved LIT / 7291 rows, in 17 + 745 ms

On Feb 24, 2011, at 3:39 PM, Ron Siemens wrote:

> 
> I failed to mention: this is just doing repeated data retrievals using the 
> index.
> 
>> ...
>> 
>> Sample run: Secondary index.
>> 
>> DEBUG Retrieved THS / 7293 rows, in 2012 ms
>> DEBUG Retrieved THS / 7293 rows, in 1956 ms
>> DEBUG Retrieved THS / 7293 rows, in 1843 ms
> ...
> 



Re: 2x storage

2011-02-25 Thread Robert Coli
On Fri, Feb 25, 2011 at 10:14 AM, A J  wrote:
> OK. Is it also driven by type of compaction ? Does a minor compaction
> require less working space than major compaction ?

Yes, unless that minor compaction happens to involve all SStables due
to compaction thresholds, at which time it is a major compaction.

=Rob


Re: 2x storage

2011-02-25 Thread Tyler Hobbs
On Fri, Feb 25, 2011 at 12:14 PM, A J  wrote:

> OK. Is it also driven by type of compaction ? Does a minor compaction
> require less working space than major compaction ?
>

No, every so often a minor compaction ends up compacting all SSTables, so
it's effectively the same as a major compaction.

-- 
Tyler Hobbs
Software Engineer, DataStax 
Maintainer of the pycassa  Cassandra
Python client library


Re: 2x storage

2011-02-25 Thread Tyler Hobbs
Ok, we are both correct here:

Generally, a minor compaction takes less space than a major, but
occasionally it does not.

-- 
Tyler Hobbs
Software Engineer, DataStax 
Maintainer of the pycassa  Cassandra
Python client library


Re: 2x storage

2011-02-25 Thread A J
Thanks. What happens when my compaction fails for space reasons ?
Is no compaction possible till I add more space ?
I would assume writes are not impacted though the latency of reads
would increase, right ?

Also though writes are not seek-intensive, compactions are seek-intensive, no ?

On Fri, Feb 25, 2011 at 1:44 PM, Tyler Hobbs  wrote:
> Ok, we are both correct here:
>
> Generally, a minor compaction takes less space than a major, but
> occasionally it does not.
>
> --
> Tyler Hobbs
> Software Engineer, DataStax
> Maintainer of the pycassa Cassandra Python client library
>
>


Re: Homebrew CF-indexing vs secondary indexing

2011-02-25 Thread Ed Anuff
It's nice to see some testing in this regard, however, it's worth pointing
out something that gets lost in CF index vs secondary index discussions.
What you're really proving is that get_slice (across columns) is faster than
get_indexed_slices (across keys).  For up to a certain size (and it would be
nice if there were some empirical testing to determine what that size is),
get_slice should be one of the most performant operations Cassandra can do.
CF index approaches are basically all about getting your data into a
situation where you can use get_slice to quickly perform the search.  The
reasons for using Cassandra's built in secondary index support, IMHO, is
that (1) it's easy to use whereas CF indexes are managed by the client  and
(2) there's concern about how large an index you'd be able to effectively
store in a CF index row.  The first point is more about Cassandra being
easier for newcomers, the latter point is something I'd like to see some
more data around.  Maybe you want to run your tests up to much larger sizes
and see if there's a point where the results change?  FWIW, I recently
switched back to CF-based indexes from secondary indexes, largely for the
flexibility in the types of queries that became possible, but it's nice to
see there's some performance benefit.  The other thing would be good to look
at is timing the overhead of what it takes to update your index as you
change the values that are being indexed.
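The CF-index pattern Ed describes — one index row per lookup key, whose column names are the row keys of the matching items, answered with a slice — can be sketched as a toy in-memory stand-in (the index keys and row keys below are invented for illustration; a real implementation would issue get_slice against an index column family):

```python
index = {}  # index CF: index row key -> set of item row keys (column names)

def index_insert(index_key, item_key):
    """Write side: add the item's row key as a column name
    in the index row. The client maintains this on every update."""
    index.setdefault(index_key, set()).add(item_key)

def index_slice(index_key, count=100):
    """Read side: emulate get_slice on the index row, returning
    up to `count` column names (item row keys) in sorted order."""
    return sorted(index.get(index_key, set()))[:count]

# Index items by category, then "query" with a slice.
index_insert("category:THS", "row-0007")
index_insert("category:THS", "row-0001")
index_insert("category:TRS", "row-0002")
print(index_slice("category:THS"))  # -> ['row-0001', 'row-0007']
```

The update-overhead question Ed raises falls out of the write side: when an indexed value changes, the client must delete the column from the old index row and insert it into the new one, and that read-delete-insert cycle is exactly what the built-in secondary indexes do for you.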



On Fri, Feb 25, 2011 at 10:23 AM, Ron Siemens wrote:

>
> I updated the cassandra version in the hector package from 7.0 to 7.2.  The
> occasional slow-down in the CF-index went away.  I then upped the heap to
> 512MB, and the secondary-indexing then works.  Seems awfully memory hungry
> for my small dataset.  Even the CF-index was faster with more heap.  These
> are the times with Cassandra-0.7.2 and 512M heap.  Slightly different
> testing: I'm varying the index used which give different data size results.
>  It still surprises me that the CF index does substantially better.
>
> Secondary Index
>
> DEBUG Retrieved THS / 7293 rows, in 1051 ms
> DEBUG Retrieved TRS / 7289 rows, in 1448 ms
> DEBUG Retrieved BCS / 7788 rows, in 1553 ms
> DEBUG Retrieved ARS / 7426 rows, in 1479 ms
> DEBUG Retrieved CHS / 7290 rows, in 1575 ms
> DEBUG Retrieved MS / 4523 rows, in 766 ms
> DEBUG Retrieved PRS / 562 rows, in 40 ms
> DEBUG Retrieved GGF / 1162 rows, in 122 ms
> DEBUG Retrieved VET / 7313 rows, in 1193 ms
> DEBUG Retrieved AUT / 7287 rows, in 1746 ms
> DEBUG Retrieved LIT / 7291 rows, in 1331 ms
>
> CF Index
>
> DEBUG Retrieved THS / 7293 rows, in 17 + 759 ms
> DEBUG Retrieved TRS / 7289 rows, in 19 + 734 ms
> DEBUG Retrieved BCS / 7788 rows, in 23 + 736 ms
> DEBUG Retrieved ARS / 7426 rows, in 23 + 1448 ms
> DEBUG Retrieved CHS / 7290 rows, in 18 + 638 ms
> DEBUG Retrieved MS / 4523 rows, in 32 + 622 ms
> DEBUG Retrieved PRS / 562 rows, in 2 + 50 ms
> DEBUG Retrieved GGF / 1162 rows, in 3 + 79 ms
> DEBUG Retrieved VET / 7313 rows, in 17 + 686 ms
> DEBUG Retrieved AUT / 7287 rows, in 17 + 758 ms
> DEBUG Retrieved LIT / 7291 rows, in 17 + 745 ms
>
> On Feb 24, 2011, at 3:39 PM, Ron Siemens wrote:
>
> >
> > I failed to mention: this is just doing repeated data retrievals using
> the index.
> >
> >> ...
> >>
> >> Sample run: Secondary index.
> >>
> >> DEBUG Retrieved THS / 7293 rows, in 2012 ms
> >> DEBUG Retrieved THS / 7293 rows, in 1956 ms
> >> DEBUG Retrieved THS / 7293 rows, in 1843 ms
> > ...
> >
>
>


Re: Homebrew CF-indexing vs secondary indexing

2011-02-25 Thread Mohit Anchlia
Does it mean that we should design the data model such that row keys
actually become columns (and create a secondary index) so that data
retrieval is faster? I am soon setting up big test instances to test
all this.

On Fri, Feb 25, 2011 at 11:18 AM, Ed Anuff  wrote:
> It's nice to see some testing in this regard, however, it's worth pointing
> out something that gets lost in CF index vs secondary index discussions.
> What you're really proving is that get_slice (across columns) is faster than
> get_indexed_slices (across keys).  For up to a certain size (and it would be
> nice if there were some emperical testing to determine what that size is),
> get_slice should be one of the most performant operations Cassandra can do.
> CF index approaches are basically all about getting your data into a
> situation where you can use get_slice to quickly perform the search.  The
> reasons for using Cassandra's built in secondary index support, IMHO, is
> that (1) it's easy to use whereas CF indexes are managed by the client  and
> (2) there's concern about how large an index you'd be able to effectively
> store in a CF index row.  The first point is more about Cassandra being
> easier for newcomers, the latter point is something I'd like to see some
> more data around.  Maybe you want to run your tests up to much larger sizes
> and see if there's a point where the results change?  FWIW, I recently
> switched back to CF-based indexes from secondary indexes, largely for the
> flexibility in the types of queries that became possible, but it's nice to
> see there's some performance benefit.  The other thing would be good to look
> at is timing the overhead of what it takes to update your index as you
> change the values that are being indexed.
>
>
>
> On Fri, Feb 25, 2011 at 10:23 AM, Ron Siemens 
> wrote:
>>
>> I updated the cassandra version in the hector package from 7.0 to 7.2.
>>  The occasional slow-down in the CF-index went away.  I then upped the heap
>> to 512MB, and the secondary-indexing then works.  Seems awfully memory
>> hungry for my small dataset.  Even the CF-index was faster with more heap.
>>  These are the times with Cassandra-0.7.2 and 512M heap.  Slightly different
>> testing: I'm varying the index used which give different data size results.
>>  It still surprises me that the CF index does substantially better.
>>
>> Secondary Index
>>
>> DEBUG Retrieved THS / 7293 rows, in 1051 ms
>> DEBUG Retrieved TRS / 7289 rows, in 1448 ms
>> DEBUG Retrieved BCS / 7788 rows, in 1553 ms
>> DEBUG Retrieved ARS / 7426 rows, in 1479 ms
>> DEBUG Retrieved CHS / 7290 rows, in 1575 ms
>> DEBUG Retrieved MS / 4523 rows, in 766 ms
>> DEBUG Retrieved PRS / 562 rows, in 40 ms
>> DEBUG Retrieved GGF / 1162 rows, in 122 ms
>> DEBUG Retrieved VET / 7313 rows, in 1193 ms
>> DEBUG Retrieved AUT / 7287 rows, in 1746 ms
>> DEBUG Retrieved LIT / 7291 rows, in 1331 ms
>>
>> CF Index
>>
>> DEBUG Retrieved THS / 7293 rows, in 17 + 759 ms
>> DEBUG Retrieved TRS / 7289 rows, in 19 + 734 ms
>> DEBUG Retrieved BCS / 7788 rows, in 23 + 736 ms
>> DEBUG Retrieved ARS / 7426 rows, in 23 + 1448 ms
>> DEBUG Retrieved CHS / 7290 rows, in 18 + 638 ms
>> DEBUG Retrieved MS / 4523 rows, in 32 + 622 ms
>> DEBUG Retrieved PRS / 562 rows, in 2 + 50 ms
>> DEBUG Retrieved GGF / 1162 rows, in 3 + 79 ms
>> DEBUG Retrieved VET / 7313 rows, in 17 + 686 ms
>> DEBUG Retrieved AUT / 7287 rows, in 17 + 758 ms
>> DEBUG Retrieved LIT / 7291 rows, in 17 + 745 ms
>>
>> On Feb 24, 2011, at 3:39 PM, Ron Siemens wrote:
>>
>> >
>> > I failed to mention: this is just doing repeated data retrievals using
>> > the index.
>> >
>> >> ...
>> >>
>> >> Sample run: Secondary index.
>> >>
>> >> DEBUG Retrieved THS / 7293 rows, in 2012 ms
>> >> DEBUG Retrieved THS / 7293 rows, in 1956 ms
>> >> DEBUG Retrieved THS / 7293 rows, in 1843 ms
>> > ...
>> >
>>
>
>


Re: Homebrew CF-indexing vs secondary indexing

2011-02-25 Thread Ed Anuff
At the risk of recapitulating a conversation that seems to happen with some
frequency on this list, the answer is going to boil down to "depends on your
data model", but using rows as indexes is one of the core usage patterns of
Cassandra, whether to store the list of keys to rows in another column
family as column names or to build inverted indexes.  That's why columns are
sorted and can be easily retrieved by sort range, so you can do things like
that.  If you're building test instances, then you're going to find out the
answer of what's best for your particular application pretty quickly.  I
think the best advice I've ever seen on this list about how to do something
with Cassandra has been "do a test with both and see what happens", and of
course, share what you find with the rest of us :)


On Fri, Feb 25, 2011 at 12:10 PM, Mohit Anchlia wrote:

> Does it mean that we should design data model such that row keys
> actually become columns (and create secondary index) so that the data
> retrieval is faster. I am soon setting up big test instances to test
> all this.
>
> On Fri, Feb 25, 2011 at 11:18 AM, Ed Anuff  wrote:
> > It's nice to see some testing in this regard, however, it's worth
> pointing
> > out something that gets lost in CF index vs secondary index discussions.
> > What you're really proving is that get_slice (across columns) is faster
> than
> > get_indexed_slices (across keys).  For up to a certain size (and it would
> be
> > nice if there were some emperical testing to determine what that size
> is),
> > get_slice should be one of the most performant operations Cassandra can
> do.
> > CF index approaches are basically all about getting your data into a
> > situation where you can use get_slice to quickly perform the search.  The
> > reasons for using Cassandra's built in secondary index support, IMHO, is
> > that (1) it's easy to use whereas CF indexes are managed by the client
> and
> > (2) there's concern about how large an index you'd be able to effectively
> > store in a CF index row.  The first point is more about Cassandra being
> > easier for newcomers, the latter point is something I'd like to see some
> > more data around.  Maybe you want to run your tests up to much larger
> sizes
> > and see if there's a point where the results change?  FWIW, I recently
> > switched back to CF-based indexes from secondary indexes, largely for the
> > flexibility in the types of queries that became possible, but it's nice
> to
> > see there's some performance benefit.  The other thing would be good to
> look
> > at is timing the overhead of what it takes to update your index as you
> > change the values that are being indexed.
> >
> >
> >
> > On Fri, Feb 25, 2011 at 10:23 AM, Ron Siemens 
> > wrote:
> >>
> >> I updated the cassandra version in the hector package from 7.0 to 7.2.
> >>  The occasional slow-down in the CF-index went away.  I then upped the
> heap
> >> to 512MB, and the secondary-indexing then works.  Seems awfully memory
> >> hungry for my small dataset.  Even the CF-index was faster with more
> heap.
> >>  These are the times with Cassandra-0.7.2 and 512M heap.  Slightly
> different
> >> testing: I'm varying the index used which give different data size
> results.
> >>  It still surprises me that the CF index does substantially better.
> >>
> >> Secondary Index
> >>
> >> DEBUG Retrieved THS / 7293 rows, in 1051 ms
> >> DEBUG Retrieved TRS / 7289 rows, in 1448 ms
> >> DEBUG Retrieved BCS / 7788 rows, in 1553 ms
> >> DEBUG Retrieved ARS / 7426 rows, in 1479 ms
> >> DEBUG Retrieved CHS / 7290 rows, in 1575 ms
> >> DEBUG Retrieved MS / 4523 rows, in 766 ms
> >> DEBUG Retrieved PRS / 562 rows, in 40 ms
> >> DEBUG Retrieved GGF / 1162 rows, in 122 ms
> >> DEBUG Retrieved VET / 7313 rows, in 1193 ms
> >> DEBUG Retrieved AUT / 7287 rows, in 1746 ms
> >> DEBUG Retrieved LIT / 7291 rows, in 1331 ms
> >>
> >> CF Index
> >>
> >> DEBUG Retrieved THS / 7293 rows, in 17 + 759 ms
> >> DEBUG Retrieved TRS / 7289 rows, in 19 + 734 ms
> >> DEBUG Retrieved BCS / 7788 rows, in 23 + 736 ms
> >> DEBUG Retrieved ARS / 7426 rows, in 23 + 1448 ms
> >> DEBUG Retrieved CHS / 7290 rows, in 18 + 638 ms
> >> DEBUG Retrieved MS / 4523 rows, in 32 + 622 ms
> >> DEBUG Retrieved PRS / 562 rows, in 2 + 50 ms
> >> DEBUG Retrieved GGF / 1162 rows, in 3 + 79 ms
> >> DEBUG Retrieved VET / 7313 rows, in 17 + 686 ms
> >> DEBUG Retrieved AUT / 7287 rows, in 17 + 758 ms
> >> DEBUG Retrieved LIT / 7291 rows, in 17 + 745 ms
> >>
> >> On Feb 24, 2011, at 3:39 PM, Ron Siemens wrote:
> >>
> >> >
> >> > I failed to mention: this is just doing repeated data retrievals using
> >> > the index.
> >> >
> >> >> ...
> >> >>
> >> >> Sample run: Secondary index.
> >> >>
> >> >> DEBUG Retrieved THS / 7293 rows, in 2012 ms
> >> >> DEBUG Retrieved THS / 7293 rows, in 1956 ms
> >> >> DEBUG Retrieved THS / 7293 rows, in 1843 ms
> >> > ...
> >> >
> >>
> >
> >
>


Re: My responses to this mailing list interpreted as SPAM

2011-02-25 Thread Aaron Morton
If you search the list there is some discussion about this. Best advice is to 
send in plain text. https://issues.apache.org/jira/browse/INFRA-3356

Personally I prefer the emails to have the whole discussion.

Aaron

On 25/02/2011, at 4:55 AM, Anthony John  wrote:

> Do not copy the entire thread, only hit reply!
> 
> It seems as the thread grows in responses, the spam word count somehow kicks 
> in.
> 
> Thx,
> 
> -JA
> 
> On Thu, Feb 24, 2011 at 9:44 AM, Sasha Dolgy  wrote:
> have you tried replying without copying in the entire conversation
> thread to the message?
> 
> On Thu, Feb 24, 2011 at 1:40 PM, Anthony John  wrote:
> > To the list owners - the error text that gmail comes back with is below
> > Now I understand that much of what I write is spam quality, so the mail
> > filter might actually be smart ;).
> > New posts works, as this one hopefully will. If is on reply that I have a
> > problem. Any pointers to avoid this situation will be super useful.
> 


RE: memtable_flush_after_mins setting not working

2011-02-25 Thread Jeffrey Wang
I just noticed this thread. Does this mean that (assuming the same setup of an 
empty keyspace and CFs added later) if I have a CF that I write to for some 
time, but not enough to hit the flush limits, it will never get flushed until 
the server is restarted? I believe this is causing commit logs to not be 
deleted, which is taking up a ton of disk space (in addition to a bunch of 
small memtables being stuck in memory).

-Jeffrey

From: Ching-Cheng Chen [mailto:cc...@evidentsoftware.com]
Sent: Thursday, February 17, 2011 8:52 AM
To: user@cassandra.apache.org
Cc: Jonathan Ellis
Subject: Re: memtable_flush_after_mins setting not working

https://issues.apache.org/jira/browse/CASSANDRA-2183

Regards,

Chen

www.evidentsoftware.com
On Thu, Feb 17, 2011 at 11:47 AM, Ching-Cheng Chen  wrote:
Certainly, I'll open a ticket to track this issue.

Regards,

Chen

www.evidentsoftware.com

On Thu, Feb 17, 2011 at 11:42 AM, Jonathan Ellis  wrote:
Your analysis sounds correct to me.  Can you open a ticket on
https://issues.apache.org/jira/browse/CASSANDRA ?

On Thu, Feb 17, 2011 at 10:17 AM, Ching-Cheng Chen  wrote:
> We have observed the behavior that memtable_flush_after_mins setting not
> working occasionally.   After some testing and code digging, we finally
> figured out what going on.
> The memtable_flush_after_mins won't work on certain condition with current
> implementation in Cassandra.
>
> In org.apache.cassandra.db.Table,  the scheduled flush task is setup by the
> following code during construction.
>
> int minCheckMs = Integer.MAX_VALUE;
>
> for (ColumnFamilyStore cfs : columnFamilyStores.values())
> {
> minCheckMs = Math.min(minCheckMs, cfs.getMemtableFlushAfterMins() * 60 *
> 1000);
> }
> Runnable runnable = new Runnable()
> {
>public void run()
>{
>for (ColumnFamilyStore cfs : columnFamilyStores.values())
>{
>cfs.forceFlushIfExpired();
>}
>}
> };
> flushTask = StorageService.scheduledTasks.scheduleWithFixedDelay(runnable,
> minCheckMs, minCheckMs, TimeUnit.MILLISECONDS);
>
> Now for our application, we create a keyspace without any columnfamily
> first, and only add the needed columnfamilies later, on demand.
> However, when the keyspace gets created (without any columnfamily), the above
> code will actually schedule a fixed-delay flush check task with an interval of
> Integer.MAX_VALUE ms, since there is no columnfamily yet.
> Later, when you add a columnfamily to this empty keyspace, the initCf() method
> in Table.java doesn't check whether the scheduled flush check task interval
> needs to be updated.  As things stand, you have to restart Cassandra after a
> columnfamily is added to the keyspace.
> I would suggest adding logic to the initCf() method to recreate the
> scheduled flush check task if needed.
> Regards,
> Chen
> www.evidentsoftware.com


--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


--
www.evidentsoftware.com
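
Chen's suggested fix — having initCf() recompute the minimum check interval and reschedule the flush task when a column family is added — can be illustrated with a self-contained sketch. `FlushScheduler` and the method bodies here are illustrative, not Cassandra code; this is only an outline of the approach, not a patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Sketch of the proposed fix: whenever a column family is registered,
// recompute the minimum flush-check interval and reschedule the periodic
// task if the new interval is shorter. Illustrative names, not Cassandra code.
public class FlushScheduler {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final Map<String, Integer> flushAfterMinsByCf = new ConcurrentHashMap<>();
    private ScheduledFuture<?> flushTask;
    private long currentIntervalMs = Long.MAX_VALUE;  // "never", as with an empty keyspace

    // Analogue of initCf(): register the CF, then reschedule if this CF's
    // interval is shorter than the one the task was created with.
    public synchronized void initCf(String name, int flushAfterMins) {
        flushAfterMinsByCf.put(name, flushAfterMins);
        long intervalMs = flushAfterMins * 60L * 1000L;
        if (intervalMs < currentIntervalMs) {
            if (flushTask != null)
                flushTask.cancel(false);  // drop the stale schedule
            currentIntervalMs = intervalMs;
            flushTask = scheduler.scheduleWithFixedDelay(
                    this::flushExpired, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
        }
    }

    private void flushExpired() {
        // In Cassandra this would call cfs.forceFlushIfExpired() for each CF.
    }

    public synchronized long getCurrentIntervalMs() { return currentIntervalMs; }

    public void shutdown() { scheduler.shutdownNow(); }

    public static void main(String[] args) {
        FlushScheduler table = new FlushScheduler();
        // Keyspace created empty: no CF, so the check interval is effectively "never".
        System.out.println("before: " + table.getCurrentIntervalMs());  // Long.MAX_VALUE
        table.initCf("cf1", 60);  // adding a CF now tightens the schedule
        System.out.println("after:  " + table.getCurrentIntervalMs());  // 3600000
        table.shutdown();
    }
}
```

The key point is the cancel-and-reschedule in initCf(): the original code computes minCheckMs once at construction, so a keyspace created empty is stuck at Integer.MAX_VALUE until restart.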



Re: 2x storage

2011-02-25 Thread A J
Another related question:
Can the minor compactions across nodes be staggered so that I can
control how many nodes are compacting at any given point?

On Fri, Feb 25, 2011 at 2:01 PM, A J  wrote:
> Thanks. What happens when my compaction fails for space reasons?
> Is no compaction possible till I add more space?
> I would assume writes are not impacted, though the latency of reads
> would increase, right?
>
> Also, though writes are not seek-intensive, compactions are seek-intensive, no?
>
> On Fri, Feb 25, 2011 at 1:44 PM, Tyler Hobbs  wrote:
>> Ok, we are both correct here:
>>
>> Generally, a minor compaction takes less space than a major, but
>> occasionally it does not.
>>
>> --
>> Tyler Hobbs
>> Software Engineer, DataStax
>> Maintainer of the pycassa Cassandra Python client library
>>
>>
>


CassandraForums.com

2011-02-25 Thread kh jo
Hi Guys,
for all of those who prefer forums over mailing lists, I setup a forum for 
cassandra, please have a look

http://www.cassandraforums.com/

thanks
Jo




Re: memtable_flush_after_mins setting not working

2011-02-25 Thread Jonathan Ellis
Yes.

On Fri, Feb 25, 2011 at 4:29 PM, Jeffrey Wang  wrote:
> I just noticed this thread. Does this mean that (assuming the same setup of
> an empty keyspace and CFs added later) if I have a CF that I write to for
> some time, but not enough to hit the flush limits, it will never get flushed
> until the server is restarted? I believe this is causing commit logs to not
> be deleted, which is taking up a ton of disk space (in addition to a bunch
> of small memtables being stuck in memory).
>
>
>
> -Jeffrey



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


How does node failure detection work in Cassandra?

2011-02-25 Thread tijoriwala.ritesh

Hi,
I would like to know the internals of how node failure detection works in
Cassandra. And in the absence of any network partition, do all nodes see the
same view of live nodes? Is there a concept of a coordinator/election? If yes,
how is the merge handled after a network partition heals?

thanks,
Ritesh
-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-does-node-failure-detection-work-in-Cassandra-tp6066415p6066415.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: How does node failure detection work in Cassandra?

2011-02-25 Thread Brandon Williams
On Fri, Feb 25, 2011 at 5:32 PM, tijoriwala.ritesh <tijoriwala.rit...@gmail.com> wrote:

>
> Hi,
> I would like to know the internals of how node failure detection works in
> Cassandra?


http://bit.ly/phi_accrual


> Is there a concept of Coordinator/Election?


No.

-Brandon
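
The linked paper's idea can be sketched in a few lines: track heartbeat inter-arrival times and express suspicion as phi = -log10(P(a heartbeat arrives later than now)). The toy below assumes exponentially distributed intervals, which gives phi = t / (mean * ln 10); the paper itself uses a normal distribution, and Cassandra's implementation differs in details, so treat this as an illustration of the concept only.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy phi accrual failure detector: suspicion grows continuously with
// silence instead of flipping at a fixed timeout. Assumes exponentially
// distributed heartbeat intervals (a simplification of the paper).
public class PhiAccrual {
    private final Deque<Double> intervals = new ArrayDeque<>();
    private static final int WINDOW = 1000;  // sliding window of samples
    private double lastHeartbeatMs = -1;

    public void heartbeat(double nowMs) {
        if (lastHeartbeatMs >= 0) {
            intervals.addLast(nowMs - lastHeartbeatMs);
            if (intervals.size() > WINDOW)
                intervals.removeFirst();
        }
        lastHeartbeatMs = nowMs;
    }

    // phi = -log10(exp(-t/mean)) = t / (mean * ln 10)
    public double phi(double nowMs) {
        if (intervals.isEmpty())
            return 0.0;
        double mean = intervals.stream().mapToDouble(Double::doubleValue)
                               .average().getAsDouble();
        double sinceLast = nowMs - lastHeartbeatMs;
        return sinceLast / (mean * Math.log(10));
    }

    public static void main(String[] args) {
        PhiAccrual fd = new PhiAccrual();
        for (int i = 0; i <= 10; i++)
            fd.heartbeat(i * 1000.0);          // steady 1s heartbeats
        System.out.println(fd.phi(11_000));    // ~0.43: node looks alive
        System.out.println(fd.phi(30_000));    // ~8.7: past Cassandra's default
                                               // phi_convict_threshold of 8
    }
}
```

Each node runs this locally against the gossip heartbeats it receives, which is why no coordinator or election is needed: every node forms its own view of who is up.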


Re: 2x storage

2011-02-25 Thread Robert Coli
On Fri, Feb 25, 2011 at 2:41 PM, A J  wrote:
> Can the minor compactions across nodes be staggered so that I can
> control how many nodes are compacting at any given point?

Not without some crazy scheme where you control the compaction
thresholds dynamically via some external mechanism. You probably don't
actually want to do that; you generally want a system which can
tolerate minor compaction.

=Rob


Re: 2x storage

2011-02-25 Thread Terje Marthinussen
Cassandra never compacts more than one column family at a time?

Regards,
Terje

On 26 Feb 2011, at 02:40, Robert Coli  wrote:

> On Fri, Feb 25, 2011 at 9:22 AM, A J  wrote:
>> I read in some cassandra notes that each node should be allocated
>> twice the storage capacity you wish it to contain. I think the reason
>> was during compaction another copy of SSTables have to be made before
>> the original ones are discarded.
> 
> This rule of thumb only exactly applies when you have a single CF. It
> is better stated as "your node needs to have enough room to
> successfully compact your largest columnfamily."
> 
> =Rob
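
Rob's rule of thumb quoted above ("enough room to successfully compact your largest columnfamily") reduces to simple arithmetic: in the worst case a major compaction rewrites a whole CF before its old SSTables are deleted. The sketch below illustrates this; all sizes and names are made-up examples, not measurements.

```java
import java.util.Map;

// Worst-case disk headroom for compaction: the largest column family may
// be rewritten in full before its old SSTables are deleted, so reserve at
// least that much free space. Sizes here are hypothetical.
public class CompactionHeadroom {
    static long requiredHeadroomBytes(Map<String, Long> cfSizesBytes) {
        return cfSizesBytes.values().stream()
                .mapToLong(Long::longValue).max().orElse(0L);
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        Map<String, Long> cfSizes = Map.of(
                "users",  40 * gb,
                "events", 120 * gb,   // the largest CF dominates the requirement
                "index",  10 * gb);
        long needed = requiredHeadroomBytes(cfSizes);
        long free = 100 * gb;
        System.out.println("headroom needed: " + needed / gb + " GB");     // 120 GB
        System.out.println("safe to major-compact: " + (free >= needed));  // false
    }
}
```

Minor compactions usually need less than this, as Tyler notes earlier in the thread, but occasionally they do not, so the largest-CF bound is the safe one to provision for.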


Re: 2x storage

2011-02-25 Thread Robert Coli
On Fri, Feb 25, 2011 at 4:55 PM, Terje Marthinussen
 wrote:
> Cassandra never compacts more than one column family at a time?

Nope, compaction is currently single-threaded.

https://issues.apache.org/jira/browse/CASSANDRA-2191

=Rob