Re: Node Inconsistency

2011-01-11 Thread Peter Schuller
> I have a few more questions: > > 1. If we change the write/delete consistency level to ALL, do we > eliminate the data inconsistency among nodes (since the delete > operations will apply to ALL replicas)? > > 2. My understanding is that "Read Repair" doesn't handle tombstones. > How about "Node Too
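The reconciliation the question turns on can be sketched with a toy model (my illustration, not Cassandra's implementation): replicas hold (value, timestamp, tombstone) cells, and reads reconcile by last-write-wins, so a delete only "wins" on replicas that have seen its tombstone.

```python
# Toy model of last-write-wins reconciliation with tombstones
# (illustration only, not Cassandra's code).

def apply_write(replica, key, value, ts, tombstone=False):
    # A replica keeps the cell with the highest timestamp.
    old = replica.get(key)
    if old is None or ts >= old[1]:
        replica[key] = (value, ts, tombstone)

def reconcile_read(replicas, key):
    # A read gathers cells from replicas and the newest one wins;
    # a winning tombstone means "deleted".
    cells = [r[key] for r in replicas if key in r]
    if not cells:
        return None
    value, ts, tombstone = max(cells, key=lambda c: c[1])
    return None if tombstone else value

replicas = [{}, {}, {}]
for r in replicas:                 # the original write reached all replicas
    apply_write(r, "k", "v1", ts=1)
for r in replicas[:2]:             # the delete reached only 2 of 3 replicas
    apply_write(r, "k", None, ts=2, tombstone=True)

# A read that touches a replica holding the tombstone still sees the delete:
print(reconcile_read(replicas, "k"))   # prints None
```

Writing the delete at ConsistencyLevel.ALL means the tombstone lands on every replica up front, which removes the window where a replica that missed the delete could later "resurrect" the value.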

map-reduce failure

2011-01-11 Thread Or Yanay
Hi all, I am using 0.6.8 across 5 machines with ~30G of data on each machine. I am trying to run a map-reduce query (both with my own Java code and Pig) and failing after about 30 minutes (see stack trace and details below). I have followed this wiki page

Re: Cassandra 0.7.0 Release in Riptano public repository?

2011-01-11 Thread Michael Fortin
This is my understanding of 0.* releases. - They're not considered production ready by the maintainers - They're subject to changes that break backwards compatibility - Generally poorly documented because the API is so volatile - Previous releases are unsupported for 1.* releases - The maintainer is say

Confused about CASSANDRA-1417; saving row cache

2011-01-11 Thread Chris Burroughs
https://issues.apache.org/jira/browse/CASSANDRA-1417 http://www.riptano.com/blog/whats-new-cassandra-066 My naive reading of CASSANDRA-1417 was that it could be used to save the row cache to disk. Empirically it appears to only save the row keys, and then reads each row. In my case I set the row

Re: Confused about CASSANDRA-1417; saving row cache

2011-01-11 Thread Edward Capriolo
On Tue, Jan 11, 2011 at 9:54 AM, Chris Burroughs wrote: > https://issues.apache.org/jira/browse/CASSANDRA-1417 > http://www.riptano.com/blog/whats-new-cassandra-066 > > My naive reading of CASSANDRA-1417 was that it could be used to save the > row cache to disk.  Empirically it appears to only sav

Re: Cassandra 0.7.0 Release in Riptano public repository?

2011-01-11 Thread Eric Evans
On Tue, 2011-01-11 at 09:23 -0500, Michael Fortin wrote: > This my understanding of 0.* releases. > - They're not considered production ready by the maintainers > - They subject to changes that break backwards compatibility > - Generally poorly documented because the api is so volatile > - Previous

[RELEASE] 0.7.0 (and 0.6.9)

2011-01-11 Thread Eric Evans
As some of you may already be aware, 0.7.0 has been officially released. You are free to start your upgrades, though not all at once, you'll spoil your supper! I apologize to anyone who might have noticed artifacts published as early as Sunday and was confused by the lack of announcement; I was

Re: Confused about CASSANDRA-1417; saving row cache

2011-01-11 Thread Peter Schuller
> https://issues.apache.org/jira/browse/CASSANDRA-1417 [snip, row cache saving only keys] > Is this the intentional implementation?  Is there any reason not to > just write the entire row to disk to allow for faster startup? Intentional (in the sense of "not a mistake"), but see: https://issues.a

Re: [RELEASE] 0.7.0 (and 0.6.9)

2011-01-11 Thread Joseph Stein
Many thanks to those that put in all the hard work, time, dedication, etc for another awesome release !!! /* Joe Stein http://www.linkedin.com/in/charmalloc Twitter: @allthingshadoop */ On Tue, Jan 11, 2011 at 12:23 PM, Eric Evans wrote: > > As some of you may already be aware, 0.7.0 has been o

upgrading to 0.7 from 0.6.x

2011-01-11 Thread Dave Viner
Hi all, I'm reading the upgrading notes in the NEWS.txt file, but I don't see how/where the data from the 0.6 cluster is actually migrated. Currently, the file says: The process to upgrade is: 1) run "nodetool drain" on _each_ 0.6 node. When drain finishes (log message "Node is

Re: upgrading to 0.7 from 0.6.x

2011-01-11 Thread Peter Schuller
> The process to upgrade is: > 1) run "nodetool drain" on _each_ 0.6 node. When drain finishes (log >message "Node is drained" appears), stop the process. > 2) Convert your storage-conf.xml to the new cassandra.yaml using >"bin/config-converter". > 3) Rename any of your
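The quoted steps translate roughly into the following commands (an illustrative sketch; the authoritative procedure, including the file renames, is in NEWS.txt):

```shell
# On each 0.6 node, one at a time:
bin/nodetool -h <host> drain    # flush memtables and stop accepting writes
# wait for "Node is drained" in the log, then stop the Cassandra process

# Convert the old XML configuration to the new YAML format:
bin/config-converter            # reads storage-conf.xml, writes cassandra.yaml
```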

Advice wanted on modeling

2011-01-11 Thread Steven Mac
Hi, I've been experimenting quite a bit with Cassandra and think I'm getting to understand it, but I would like some advice on modeling my data in Cassandra for an application I'm developing. The application will have a large number of records, with the records consisting of a fixed part and

Ruby database migrations for Cassandra - ActiveColumn

2011-01-11 Thread Mike Wynholds
Happy new year all- I just wanted to mention that I have released a new Cassandra data management gem called ActiveColumn. The first major feature is ActiveRecord-like database migrations. The gem is young but it works and is well documented, and I'm very interested in feedback. Blog post: http

Re: Ruby database migrations for Cassandra - ActiveColumn

2011-01-11 Thread Jonathan Ellis
Nice work, Mike! I tweeted your project a few days ago. :) On Tue, Jan 11, 2011 at 12:18 PM, Mike Wynholds wrote: > Happy new year all- > I just wanted to mention that I have released a new Cassandra data > management gem called ActiveColumn.  The first major feature is > ActiveRecord-like datab

Re: Ruby database migrations for Cassandra - ActiveColumn

2011-01-11 Thread Ryan King
Awesome and great to see you're using our fauna cassandra gem. :) -ryan On Tue, Jan 11, 2011 at 10:18 AM, Mike Wynholds wrote: > Happy new year all- > I just wanted to mention that I have released a new Cassandra data > management gem called ActiveColumn.  The first major feature is > ActiveReco

Re: Cassandra 0.7.0 Release in Riptano public repository?

2011-01-11 Thread Michael Fortin
Thanks for your thoughtful and detailed replies Eric, it's much appreciated. Mike On Jan 11, 2011, at 11:23 AM, Eric Evans wrote: > On Tue, 2011-01-11 at 09:23 -0500, Michael Fortin wrote: >> This my understanding of 0.* releases. >> - They're not considered production ready by the maintainers >

Re: Confused about CASSANDRA-1417; saving row cache

2011-01-11 Thread Chris Burroughs
On 01/11/2011 10:11 AM, Edward Capriolo wrote: > I think because the RowCache is only saved periodically it could be > out of sync. I.e., saved at 12:00, changed at 12:01: then the row cache > would consistently return the wrong results since it never looks at > the disk again. I guess saving the row ca

Re: Confused about CASSANDRA-1417; saving row cache

2011-01-11 Thread Chris Burroughs
On 01/11/2011 12:23 PM, Peter Schuller wrote: >> Is this the intentional implementation? Are there any reason not to >> just the entire row to disk to allow for faster startup? > > Intentional (in the sense of "not a mistake"), but see: > >https://issues.apache.org/jira/browse/CASSANDRA-1625

RE: Need some beginner help with Eclipse+Hector with Cassandra 0.7

2011-01-11 Thread tamara.alexander
What about this logger error? I'm getting it too, and I am also running simple code with Hector and Eclipse: log4j:WARN No appenders could be found for logger (me.prettyprint.cassandra.connection.CassandraHostRetryService). log4j:WARN Please initialize the log4j system properly. log4j:WARN See ht

Re: Need some beginner help with Eclipse+Hector with Cassandra 0.7

2011-01-11 Thread Nate McCall
Add "-verbose" to the command-line options for the launch configuration so you can see the classpath. It sounds like log4j.properties is not being found. (Depending on your project setup, you may need to add this file to the classpath explicitly). On Tue, Jan 11, 2011 at 1:36 PM, wrote: > What a
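A minimal log4j.properties (an illustrative example, not a file Hector ships) placed on the classpath, e.g. in the project's source or resources root, makes the "No appenders could be found" warnings go away:

```properties
# Minimal log4j configuration: log INFO and above to the console
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{HH:mm:ss} %-5p %c - %m%n
```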

Re: Confused about CASSANDRA-1417; saving row cache

2011-01-11 Thread Peter Schuller
> But now I need two knobs:  "Max size of row cache" (best optimal steady > state hit rate) and "number of row cache items to read in on startup" > (so that the ROW-READ-STAGE does not need to drop packets and node can > be restarted in a reasonable amount of time). Good idea IMO. File a jira tick

Re: Confused about CASSANDRA-1417; saving row cache

2011-01-11 Thread Peter Schuller
> This makes total sense and is obvious in hindsight.  But wouldn't such a > hypothetical "stale" row cache be corrected by read repair (in other > words useless for write-heavy workloads, not a problem for read-heavy)? It's not quite that simple. For example, suppose you write to the cluster a
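The staleness scenario Edward and Peter describe can be shown with a toy model (illustration only): a row cache snapshot saved at time T serves pre-snapshot values if it is reloaded verbatim after a restart.

```python
# Toy illustration of row-cache staleness across a restart
# (not Cassandra's implementation).
store = {}   # authoritative on-disk data: key -> (value, timestamp)
cache = {}   # in-memory row cache

def write(key, value, ts):
    store[key] = (value, ts)
    cache[key] = (value, ts)    # cache stays in sync while the node is up

write("row", "old", ts=1)
saved = dict(cache)             # periodic cache snapshot written to disk

write("row", "new", ts=2)       # a later write, after the snapshot

# The node restarts and naively reloads the snapshot as-is:
cache = saved
# The cache now serves "old" even though disk has "new" -- which is why
# saving only the keys and re-reading each row at startup is safer.
print(cache["row"][0])   # prints old
print(store["row"][0])   # prints new
```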

Re: Confused about CASSANDRA-1417; saving row cache

2011-01-11 Thread Chris Burroughs
On 01/11/2011 02:56 PM, Peter Schuller wrote: >> But now I need two knobs: "Max size of row cache" (best optimal steady >> state hit rate) and "number of row cache items to read in on startup" >> (so that the ROW-READ-STAGE does not need to drop packets and node can >> be restarted in a reasonable

Why my posts are marked as spam?

2011-01-11 Thread Oleg Tsvinev
Whatever I do, it happens :(

about the insert data

2011-01-11 Thread raoyixuan (Shandy)
I am confused about which node the data is inserted on. I connect to one node via cassandra-cli and insert some data; is this node the coordinator node? As I understand it, with the RandomPartitioner the coordinator node is the one whose hash value is bigger than the key's hash value. Huawei Technologies Co., Ltd.

Re: Node Inconsistency

2011-01-11 Thread Vram Kouramajian
Thanks, Peter, for the reply. We are currently "fixing" our inconsistent data (since we have the master data saved). We will follow your suggestion and run the node repair tool more often in the future. However, what happens to data inserted/deleted after the node repair tool runs (i.e., between Node R

how to do a get_range_slices where all keys start with same string

2011-01-11 Thread Koert Kuipers
I would like to do a get_range_slices for all keys (which are strings) that start with the same substring x (for example "com.google"). How do I do that? start_key = x and end_key = x doesn't seem to do the job... thanks koert

Re: about the insert data

2011-01-11 Thread Tyler Hobbs
The "coordinator node" (as it is referred to in the documentation) is the node which you connect to and make the request to. It has nothing to do with which node(s) are replicas for the data. - Tyler 2011/1/11 raoyixuan (Shandy) > I get confuse about the node which the data insert.I connect t

Re: how to do a get_range_slices where all keys start with same string

2011-01-11 Thread Tyler Hobbs
That type of operation only works (directly) when using an OrderPreservingPartitioner. There are a lot of downsides to OPP: http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/ You can instead order your keys alphabetically as column names in a row (o

RE: about the insert data

2011-01-11 Thread raoyixuan (Shandy)
Thanks, Tyler. So the node I connect to is the coordinator node, right? But what is the process for replication? First, the data is received by the coordinator node. Second, it finds the first replica node based on the partitioner, such as the RandomPartitioner. Third, it will replicate th
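The placement step being asked about can be sketched as follows (illustration only; this mirrors RandomPartitioner plus SimpleStrategy-style placement, not Cassandra's actual code): the key is hashed to a token, the first replica is the node with the smallest token >= the key's token (wrapping around the ring), and the next RF-1 nodes clockwise hold the remaining replicas.

```python
# Sketch of replica placement on the token ring (illustrative toy model).
import hashlib
from bisect import bisect_left

def key_token(key):
    # RandomPartitioner derives the token from an MD5 hash of the key.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def replicas_for(key, ring, rf):
    # ring: list of (token, node) pairs, sorted by token.
    tokens = [t for t, _ in ring]
    # First replica: node with the smallest token >= key's token (wraps).
    first = bisect_left(tokens, key_token(key)) % len(ring)
    # Remaining replicas: the next rf-1 nodes clockwise on the ring.
    return [ring[(first + n) % len(ring)][1] for n in range(rf)]

ring = sorted([(0, "A"), (2 ** 125, "B"), (2 ** 126, "C")])
print(replicas_for("some-key", ring, rf=2))  # two consecutive nodes on the ring
```

The coordinator (whatever node the client connected to) computes this placement and forwards the write to those replicas; it need not itself be one of them.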

Re: how to do a get_range_slices where all keys start with same string

2011-01-11 Thread Jonathan Ellis
http://wiki.apache.org/cassandra/FAQ#range_rp also, start==end==x means "give me back exactly row x, if it exists." IF you were using OPP you'd need end=y. On Tue, Jan 11, 2011 at 7:45 PM, Koert Kuipers wrote: > I would like to do a get_range_slices for all keys (which are strings) that > start

RE: how to do a get_range_slices where all keys start with same string

2011-01-11 Thread Koert Kuipers
Ok I see get_range_slice is really only useful for paging with RP... So if I were using OPP (which I am not) and I wanted all keys starting with "com.google", what should my start_key and end_key be? -Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Tuesday, Janua

Re: how to do a get_range_slices where all keys start with same string

2011-01-11 Thread Roshan Dawrani
On Wed, Jan 12, 2011 at 7:41 AM, Koert Kuipers < koert.kuip...@diamondnotch.com> wrote: > Ok I see get_range_slice is really only useful for paging with RP... > > So if I were using OPP (which I am not) and I wanted all keys starting with > "com.google", what should my start_key and end_key be? >

Re: Confused about CASSANDRA-1417; saving row cache

2011-01-11 Thread Chris Burroughs
On 2011-01-11 15:41, Chris Burroughs wrote: > On 01/11/2011 02:56 PM, Peter Schuller wrote: >>> But now I need two knobs: "Max size of row cache" (best optimal steady >>> state hit rate) and "number of row cache items to read in on startup" >>> (so that the ROW-READ-STAGE does not need to drop pac

Re: how to do a get_range_slices where all keys start with same string

2011-01-11 Thread Aaron Morton
If you were using OPP and get_range_slices then set the start_key to be "com.google" and the end_key to be "". Get slices of, say, 1,000 (use the last key read as the next start_key) and when you see the first key that does not start with "com.google", stop making calls. If you move the data from rows
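Aaron's paging loop can be sketched over a sorted in-memory stand-in for an OPP-ordered key space (illustration only; a real client would call get_range_slices against Cassandra, and the start key is inclusive, hence the duplicate check):

```python
# Toy simulation of prefix paging over OPP-ordered row keys.
from bisect import bisect_left

def get_range_slice(keys, start_key, count):
    # keys: sorted list standing in for the OPP key space.
    # Returns up to `count` keys starting at start_key, inclusive.
    i = bisect_left(keys, start_key)
    return keys[i:i + count]

def keys_with_prefix(keys, prefix, page_size=1000):
    # page_size must be > 1, since the inclusive start key means the
    # last key of one page is re-read as the first key of the next.
    result, start = [], prefix
    while True:
        page = get_range_slice(keys, start, page_size)
        for k in page:
            if not k.startswith(prefix):
                return result          # first non-matching key: stop calling
            if not result or k != result[-1]:
                result.append(k)       # skip the re-read boundary key
        if len(page) < page_size:
            return result              # short page: key range exhausted
        start = page[-1]               # last key read becomes next start_key
```

With keys ["com.google", "com.google.mail", "com.google.maps", "com.ibm", "org.apache"], keys_with_prefix(keys, "com.google") returns the first three.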

unsubscribe

2011-01-11 Thread Nichole Kulobone

RE: [RELEASE] 0.7.0 (and 0.6.9)

2011-01-11 Thread Viktor Jevdokimov
Congratulations!!! Best regards/ Pagarbiai Viktor Jevdokimov Senior Developer Email: viktor.jevdoki...@adform.com Phone: +370 5 212 3063 Fax: +370 5 261 0453 Konstitucijos pr. 23, LT-08105 Vilnius, Lithuania Disclaimer: The information contained in this message and attachments is intended

Re: Why my posts are marked as spam?

2011-01-11 Thread Arijit Mukherjee
I think this happens for RTF. Some of the mails in the post are RTF, and the reply button creates an RTF reply - that's when it happens. Wonder how the mail to which I replied was in RTF... Arijit On 12 January 2011 05:28, Oleg Tsvinev wrote: > Whatever I do, it happens :( -- "And when the n

Re: [RELEASE] 0.7.0 (and 0.6.9)

2011-01-11 Thread Shinpei Ohtani
Congratulations for 0.7 and also 0.6.9!!! On Wed, Jan 12, 2011 at 3:29 PM, Viktor Jevdokimov wrote: > Congratulations!!! > > > Best regards/ Pagarbiai > > Viktor Jevdokimov > Senior Developer > > Email: viktor.jevdoki...@adform.com > Phone: +370 5 212 3063 > Fax: +370 5 261 0453 > > Konstitucijos

Re: how to do a get_range_slices where all keys start with same string

2011-01-11 Thread Arijit Mukherjee
I have a follow on question on this. I have a super column family like this: I store some events keyed by a subscriber id, and for each such "row", I have a number of super columns which are keyed by an event time stamp. For example: subscriber1 { ts11 { some columns} ts12 { some col