unsubscribe

2011-02-08 Thread Lubos Pusty
unsubscribe

Re: Best Approaches for Developer Integration

2011-02-08 Thread Stephen Connolly
On 8 February 2011 06:40, Paul Brown wrote: > > On Feb 7, 2011, at 10:28 PM, Paul Querna wrote: >> So, I guess this is coming down to: >>  1) Has anyone built any easy to install packages of Cassandra? > > I didn't find it necessary.  I implemented a simple embedding wrapper for > Cassandra so th

Re: Do supercolumns have a purpose?

2011-02-08 Thread David Boxenhorn
Shaun, I agree with you, but marking them as deprecated is not good enough for me. I can't easily stop using supercolumns. I need an upgrade path. On Tue, Feb 8, 2011 at 3:53 AM, Shaun Cutts wrote: > > I'm a newbie here, but, with apologies for my presumptuousness, I think you > should deprecate

cassandra-cli (output) broken for super columns

2011-02-08 Thread Timo Nentwig
This is not what it's supposed to be like, is it? [default@foo] get foo[page-field]; => (super_column=20110208, (column=82f4c650-2d53-11e0-a08b-58b035f3f60d, value=msg1, timestamp=1297159430471000) (column=82f4c650-2d53-1

Re: OOM during batch_mutate

2011-02-08 Thread Patrik Modesto
On Tue, Feb 8, 2011 at 00:05, Jonathan Ellis wrote: > Sounds like the keyspace was created on the 32GB machine, so it > guessed memtable sizes that are too large when run on the 16GB one. > Use "update column family" from the cli to cut the throughput and > operations thresholds in half, or to 1/4
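
A quick sketch for working out the halved or quartered thresholds and the matching CLI command. The attribute names `memtable_throughput` (MB) and `memtable_operations` (millions of operations) are my assumption about the 0.7 cassandra-cli, so check them against the CLI's built-in help for `update column family` before running anything. `MyCF` and the starting numbers are hypothetical.

```python
def scaled_thresholds(throughput_mb, operations_millions, factor):
    """Scale memtable flush thresholds by factor (e.g. 0.5 or 0.25)."""
    if not 0 < factor <= 1:
        raise ValueError("factor must be in (0, 1]")
    # Throughput is kept as a whole number of MB, never below 1.
    return max(1, int(throughput_mb * factor)), operations_millions * factor


def update_statement(cf_name, throughput_mb, operations_millions):
    """Build a cassandra-cli statement; the attribute names are assumed, not verified."""
    return (
        f"update column family {cf_name} with memtable_throughput={throughput_mb}"
        f" and memtable_operations={operations_millions:g};"
    )


# Hypothetical starting values for a column family sized on the 32GB machine.
throughput, operations = scaled_thresholds(256, 2.0, 0.25)
print(update_statement("MyCF", throughput, operations))
```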

Can serialized objects in columns serve as ersatz superCFs?

2011-02-08 Thread buddhasystem
Seeing that discussion here about indexes not supported in superCFs, and less than clear future of superCFs altogether, I was thinking about getting a modicum of same functionality with serialized objects inside columns. This way the column key becomes sort of analog of supercolumn key, and I hand

Re: cassandra-cli (output) broken for super columns

2011-02-08 Thread Stephen Connolly
On 8 February 2011 10:38, Timo Nentwig wrote: > This is not what it's supposed to be like, is it? > > [default@foo] get foo[page-field]; > => (super_column=20110208, >     (column=82f4c650-2d53-11e0-a08b-58b035f3f60d, value=msg1, > timestamp=1297159430471000) >  

Re: cassandra-cli (output) broken for super columns

2011-02-08 Thread Timo Nentwig
On Feb 8, 2011, at 13:41, Stephen Connolly wrote: > On 8 February 2011 10:38, Timo Nentwig wrote: >> This is not what it's supposed to be like, is it? Looks alright: >> [default@foo] get foo[page-field]; >> => (super_column=20110208, >> (column=82f4c650

Re: Best way to detect/fix bitrot today?

2011-02-08 Thread Anand Somani
I should have clarified we have 3 copies, so in that case as long as 2 match we should be ok? Even if there were checksumming at the SStable level, I assume it has to check and report these errors on compaction (or node repair)? I have seen some JIRA open on these issues ( 47 and 1717), but if I

Re: Best way to detect/fix bitrot today?

2011-02-08 Thread Shaun Cutts
One thing that we're doing for (guaranteed) immutable data is to use MD5 signatures as keys... this will also prevent duplication, and it will allow detection (if not correction) of bitrot at the app level easy. On Feb 8, 2011, at 9:23 AM, Anand Somani wrote: > I should have clarified we have 3
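
A minimal sketch of the MD5-as-key idea for immutable data, with a plain dict standing in for the column family (the helper names are hypothetical). Identical content always lands on the same key, and a value that no longer hashes to its key is flagged on read. This detects bitrot but cannot repair it.

```python
import hashlib

store = {}  # in-memory stand-in for a column family: row key -> value


def content_key(value: bytes) -> str:
    """Row key derived from immutable content: the hex MD5 digest of the bytes."""
    return hashlib.md5(value).hexdigest()


def put(value: bytes) -> str:
    """Store value under its own digest; identical content lands on the same key."""
    key = content_key(value)
    store[key] = value
    return key


def get_verified(key: str) -> bytes:
    """Read a value back and raise if it no longer hashes to its key (bitrot)."""
    value = store[key]
    if content_key(value) != key:
        raise ValueError(f"checksum mismatch for key {key}")
    return value
```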

Re: time to live rows

2011-02-08 Thread Kallin Nagelberg
So the empty row will be ultimately removed then? Is there a way to for the GC to verify this? Thanks, -Kal On Tue, Feb 8, 2011 at 2:21 AM, Stu Hood wrote: > The expired columns were converted into tombstones, which will live for the > GC timeout. The "empty" row will be cleaned up when those to

Re: time to live rows

2011-02-08 Thread David Boxenhorn
I hope you don't consider this a hijack of the thread... What I'd like to know is the following: The GC removes TTL rows some time after they expire, at its convenience. But will they stop being returned as soon as they expire? (This is the expected behavior...) On Tue, Feb 8, 2011 at 5:11 PM, K

Subcolumn Indexing

2011-02-08 Thread Jeremy.Truelove
I had a question on a sentence about the data model and how things are stored and retrieved that I came across in the O'Reilly book in the Data Model chapter. "Cassandra does not index subcolumns, so when you load a super column into memory, all of its columns are loaded as well." Does this jus

Re: Cassandra memory consumption

2011-02-08 Thread Victor Kabdebon
It is really weird that I am the only one to have this issue. I restarted Cassandra today and already the memory consumption is over the limit: root 1739 4.0 24.5 664968 *494996* pts/4 SLl 15:51 0:12 /usr/bin/java -ea -Xms128M -Xmx256M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSPar

Re: time to live rows

2011-02-08 Thread Sylvain Lebresne
> So the empty row will be ultimately removed then? Is there a way to > for the GC to verify this? > Set a GcGraceSecond very low and force a major compaction. > > Thanks, > -Kal > > On Tue, Feb 8, 2011 at 2:21 AM, Stu Hood wrote: > > The expired columns were converted into tombstones, which wi

Re: time to live rows

2011-02-08 Thread Sylvain Lebresne
> > I hope you don't consider this a hijack of the thread... > > What I'd like to know is the following: > > The GC removes TTL rows some time after they expire, at its convenience. > But will they stop being returned as soon as they expire? (This is the > expected behavior...) > It is the individ
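
A simplified model of the lifecycle discussed in this thread, assuming an expired column is hidden from reads as soon as its TTL runs out, then lingers as a tombstone until a compaction runs after gc_grace has also passed (per Stu Hood's description quoted above and the "low gc_grace plus major compaction" advice). The `Column` class and function names are hypothetical; real compaction involves more conditions than this.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Column:
    name: str
    value: str
    written_at: int            # seconds since some epoch
    ttl: Optional[int] = None  # seconds; None means the column never expires

    def expires_at(self) -> Optional[int]:
        return None if self.ttl is None else self.written_at + self.ttl


def visible(columns: List[Column], now: int) -> List[Column]:
    """What a read returns: an expired column is treated as already deleted."""
    return [c for c in columns if c.expires_at() is None or now < c.expires_at()]


def after_major_compaction(columns: List[Column], now: int, gc_grace: int) -> List[Column]:
    """What survives compaction: live columns plus tombstones younger than gc_grace."""
    return [c for c in columns
            if c.expires_at() is None or now < c.expires_at() + gc_grace]
```

In this model a row whose only column has expired is invisible from the moment of expiry, but its tombstone is only dropped by a compaction that runs once `expires_at + gc_grace` has passed.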

Re: Finding the intersection results of column sets of two rows

2011-02-08 Thread Aklin_81
Amongst two rows, where I need to find the common columns. I will not have more than 200 columns(in 99% cases) for the 1st row. But the 2nd row where I need to find these columns may have even around a million valueless columns. A point to note is:- These calculations are all done for **writing th

Re: time to live rows

2011-02-08 Thread Kallin Nagelberg
I'm trying to set the gc_grace_seconds column family parameter but no luck.. I got the name of it from the comment in cassandra.yaml: # - gc_grace_seconds: specifies the time to wait before garbage #collecting tombstones (deletion markers). defaults to 864000 (10 #days). See ht

Re: Best Approaches for Developer Integration

2011-02-08 Thread Eric Evans
On Mon, 2011-02-07 at 22:28 -0800, Paul Querna wrote: > For example, for CouchDB has CouchDBX > which at least on OSX present a very easy to use installer, data > browser, and GUI. You just run CouchDBX.app, and then your > application can build out the rest of your d

Re: time to live rows

2011-02-08 Thread Sylvain Lebresne
Not very logically, it's actually gc_grace, not gc_grace_seconds, in the CLI. On Tue, Feb 8, 2011 at 5:34 PM, Kallin Nagelberg wrote: > I'm trying to set the gc_grace_seconds column family parameter but no > luck.. I got the name of it from the comment in cassandra.yaml: > > # - gc_grace_sec

Re: OOM during batch_mutate

2011-02-08 Thread Chris Burroughs
On 02/07/2011 06:05 PM, Jonathan Ellis wrote: > Sounds like the keyspace was created on the 32GB machine, so it > guessed memtable sizes that are too large when run on the 16GB one. > Use "update column family" from the cli to cut the throughput and > operations thresholds in half, or to 1/4 to be

Re: time to live rows

2011-02-08 Thread Kallin Nagelberg
Thanks, gc_grace works in the CLI. However, I'm not observing the desired effect. I am setting TTL on a single column in my column family, and I see the columns disappear when using 'list Session' (my columnfamily) in the CLI. I created the column family with gc_grace = 60, and after observing for

Re: Cassandra memory consumption

2011-02-08 Thread Jonathan Ellis
I missed the part where you explained where you're getting your numbers from. On Tue, Feb 8, 2011 at 9:32 AM, Victor Kabdebon wrote: > It is really weird that I am the only one to have this issue. > I restarted Cassandra today and already the memory consumption is over the > limit : > > root  1

Re: Best Approaches for Developer Integration

2011-02-08 Thread Jonathan Ellis
On Tue, Feb 8, 2011 at 10:38 AM, Eric Evans wrote: > I'm sure you already know this, but for the benefit of others, users of > Debian-based systems (yes, some of us do develop on Linux :) can apt-get > a package from the projects repository[1]. > > Installing the package is enough to get a complet

Re: Cassandra memory consumption

2011-02-08 Thread Ryan King
Which JVM and version are you using? -ryan On Tue, Feb 8, 2011 at 7:32 AM, Victor Kabdebon wrote: > It is really weird that I am the only one to have this issue. > I restarted Cassandra today and already the memory consumption is over the > limit : > > root  1739  4.0 24.5 664968 494996 pts/4 

Re: OOM during batch_mutate

2011-02-08 Thread Jonathan Ellis
No, on 0.6 copying settings for a 32GB machine to a 16GB machine would also be a great way to OOM. The difference is that you had to set memtable thresholds globally in the xml file in 0.6, instead of being able to do it per-columnfamily from the cli. On Tue, Feb 8, 2011 at 10:40 AM, Chris Burrou

Re: Cassandra memory consumption

2011-02-08 Thread Victor Kabdebon
Sorry Jonathan: most of this information was taken using the command: sudo ps aux | grep cassandra For the nodetool information it is: /bin/nodetool --host localhost --port 8081 info Regards, Victor K. 2011/2/8 Jonathan Ellis > I missed the part where you explained where you're g

Re: Cassandra memory consumption

2011-02-08 Thread Victor Kabdebon
Information on the system: *Debian 5* *JVM:* victor@testhost:~/database/apache-cassandra-0.6.6$ java -version java version "1.6.0_22" Java(TM) SE Runtime Environment (build 1.6.0_22-b04) Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode) *RAM:* 2 GB 2011/2/8 Victor Kabdebon > So

Re: time to live rows

2011-02-08 Thread Sylvain Lebresne
Did you force a major compaction (with jconsole or nodetool) after gc_grace has elapsed ? On Tue, Feb 8, 2011 at 5:46 PM, Kallin Nagelberg wrote: > Thanks, gc_grace works in the CLI. > > However, I'm not observing the desired effect. I am setting TTL on a > single column in my column family, and

Re: time to live rows

2011-02-08 Thread Kallin Nagelberg
Yes I did, on the org.apache.cassandra.db.ColumnFamilies.Main.Session object. -Kal On Tue, Feb 8, 2011 at 12:00 PM, Sylvain Lebresne wrote: > Did you force a major compaction (with jconsole or nodetool) after gc_grace > has elapsed ? > On Tue, Feb 8, 2011 at 5:46 PM, Kallin Nagelberg > wrote: >

Re: Can serialized objects in columns serve as ersatz superCFs?

2011-02-08 Thread Dave Revell
Yes, this works well for me. I have no SCFs but many columns contain JSON. Depending on your time/space/compatibility tradeoffs you can obviously pick your own serialization method. Best, Dave On Feb 8, 2011 4:33 AM, "buddhasystem" wrote: > > Seeing that discussion here about indexes not supporte

Re: Can serialized objects in columns serve as ersatz superCFs?

2011-02-08 Thread buddhasystem
Thanks for the comment! In my case, I want to store various time slices as indexes, so the content can be serialized as comma-separated concatenation of unique object IDs. Example: on 20101204, multiple clouds experienced a variety of errors in job execution. In addition, multiple users ran (or fa
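
A sketch of the pattern Dave and buddhasystem describe, with a dict standing in for a row: the column name plays the part of a super column name, and the value holds either JSON or a comma-separated list of object IDs for a time slice. The column names, IDs and fields are hypothetical.

```python
import json


def pack(subcolumns: dict) -> str:
    """Serialize "subcolumns" as compact JSON with a stable key order."""
    return json.dumps(subcolumns, sort_keys=True, separators=(",", ":"))


def unpack(blob: str) -> dict:
    return json.loads(blob)


def pack_ids(ids) -> str:
    """Comma-separated object IDs; assumes no ID contains a comma."""
    return ",".join(ids)


def unpack_ids(blob: str) -> list:
    return blob.split(",") if blob else []


row = {}  # column name -> serialized value
row["20101204"] = pack_ids(["job17", "job42"])      # hypothetical IDs
row["summary"] = pack({"errors": 2, "cloud": "A"})  # hypothetical fields

# Changing one "subcolumn" means reading, modifying and rewriting the whole value.
summary = unpack(row["summary"])
summary["errors"] += 1
row["summary"] = pack(summary)
```

The trade-off is that Cassandra treats each value as opaque bytes. It cannot index or partially update fields inside it, so every change is a client-side read-modify-write, and two concurrent writers to the same column can overwrite each other's changes.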

Re: Best Approaches for Developer Integration

2011-02-08 Thread Sal Fuentes
Perhaps some of you may already be aware, but for the benefit of others: 1) https://github.com/fauna/cassandra does have a cassandra_helper script which will download and install Cassandra for development/testing purposes (although the cassandra_script might need to be updated to use 0.7) 2) For

Re: Subcolumn Indexing

2011-02-08 Thread Benjamin Coverston
Does this just mean the exhaustive list of the column names not all the values? No, this means the entire supercolumn, names and values. When the client tries to access any subcolumn in the supercolumn it has to read the entire supercolumn. So if I have a super column that has a map of key

Re: time to live rows

2011-02-08 Thread Kallin Nagelberg
I'm thinking if this row expiry notion doesn't pan out then I might create a 'lastAccessed' column with a secondary index (I think that's right) on it. Then I can periodically run a query to find all lastAccessed columns less than a certain value and manually delete them. Sounds reasonable? -Kal O

Re: Does variation in no of columns in rows over the column family has any performance impact ?

2011-02-08 Thread Aaron Morton
For completeness, there are a couple of things in the config file that may be interesting if you run into issues. - column_index_size_in_kb defines how big a row has to get before an index is written for the row. Without an index the entire row must be read to find a column. - in_memory_compaction_li
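
A back-of-the-envelope model of the column_index_size_in_kb behaviour Aaron describes. Rows below the threshold have no column index and are read whole to find one column. Larger rows get roughly one index entry per threshold-sized block, so only about one block is read. The formula is a simplification I am assuming for illustration, not Cassandra's exact indexing code, and 64 below is just an example value.

```python
import math


def index_blocks(row_bytes: int, column_index_size_in_kb: int) -> int:
    """Approximate number of column-index entries written for a row.

    Simplified model: rows below the threshold get no index at all; larger rows
    get roughly one entry per column_index_size_in_kb of serialized columns.
    """
    block = column_index_size_in_kb * 1024
    if row_bytes < block:
        return 0
    return math.ceil(row_bytes / block)


def bytes_scanned_for_one_column(row_bytes: int, column_index_size_in_kb: int) -> int:
    """Rough amount of row data read to find one column by name.

    Without an index the whole row is read; with one, roughly a single block.
    Ignores caches, bloom filters, the index itself and on-disk overhead.
    """
    block = column_index_size_in_kb * 1024
    return row_bytes if row_bytes < block else block
```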

RE: time to live rows

2011-02-08 Thread Jeremiah Jordan
You will have the same problem. You just have to learn to ignore empty rows when you query data. See articles on delete mentioned earlier. >>> >> >>> > http://wiki.apache.org/cassandra/FAQ#i_deleted_what_gives >>> >> >>> > http://wiki.apache.org/cassandra/FAQ#range_ghosts -Original Message
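
Ignoring empty rows on the client is a one-line filter. This is a sketch that assumes the range query returns (key, columns) pairs. That shape is hypothetical here, so adapt it to whatever your client library returns.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, Dict[str, str]]


def drop_range_ghosts(rows: Iterable[Row]) -> List[Row]:
    """Skip rows that come back from a range scan with no columns.

    A deleted (or fully expired) row can keep showing up as a key with an empty
    column list until its tombstones are purged, so the client filters it out.
    """
    return [(key, columns) for key, columns in rows if columns]
```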

Re: Merging the rows of two column families(with similar attributes) into one ??

2011-02-08 Thread Benjamin Coverston
On 2/4/11 11:58 PM, Ertio Lew wrote: Yes, a disadvantage of more no. of CF in terms of memory utilization which I see is: - if some CF is written less often as compared to other CFs, then the memtable would consume space in the memory until it is flushed, this memory space could have been much

Re: Merging the rows of two column families(with similar attributes) into one ??

2011-02-08 Thread Ertio Lew
Thanks for adding up Benjamin! On Wed, Feb 9, 2011 at 1:40 AM, Benjamin Coverston wrote: > > > On 2/4/11 11:58 PM, Ertio Lew wrote: >> >> Yes, a disadvantage of more no. of CF in terms of memory utilization >> which I see is: - >> >> if some CF is written less often as compared to other CFs, then

Re: How do secondary indices work

2011-02-08 Thread Aaron Morton
Moving to the user group. On 08 Feb, 2011, at 11:39 PM, alta...@ceid.upatras.gr wrote: Hello, I'd like some information about how secondary indices work under the hood. 1) Is data stored in some external data structure, or is it stored in an actual Cassandra table, as columns within column families?

Re: cassandra-cli (output) broken for super columns

2011-02-08 Thread Aaron Morton
> This is not what it's supposed to be like, is it? > > Looks alright: > >>> [default@foo] get foo[page-field]; >>> => (super_column=20110208, >>>(column=82f4c650-2d53-11e0-a08b-58b035f3f60d, value=msg1, >>> timestamp=1297159430471000) >

Re: How do secondary indices work

2011-02-08 Thread Aaron Morton
AFAIK this was the ticket the original work was done under https://issues.apache.org/jira/browse/CASSANDRA-1415 also http://www.datastax.com/docs/0.7/data_model/secondary_indexes and http://pycassa.github.com/pycassa/tutorial.html#indexes may help (sorry, on reflection the email prob did not need to
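
To make the question concrete, here is a toy in-memory model of an equality index: a map from each indexed value to the set of row keys that hold it, kept up to date on every write. My understanding, which should be checked against CASSANDRA-1415 and the DataStax page above, is that 0.7 stores such an index as a hidden, node-local column family keyed by the indexed value, with the matching row keys as column names. The class below is hypothetical and only mirrors that idea.

```python
from collections import defaultdict
from typing import Dict, Set


class ToyIndexedCF:
    """Toy column family with one indexed column; not Cassandra's actual code."""

    def __init__(self, indexed_column: str):
        self.indexed_column = indexed_column
        self.rows: Dict[str, Dict[str, str]] = {}           # row key -> columns
        self.index: Dict[str, Set[str]] = defaultdict(set)  # value -> row keys

    def insert(self, key: str, columns: Dict[str, str]) -> None:
        row = self.rows.setdefault(key, {})
        if self.indexed_column in columns:
            old = row.get(self.indexed_column)
            if old is not None:
                self.index[old].discard(key)  # remove the stale index entry first
            self.index[columns[self.indexed_column]].add(key)
        row.update(columns)

    def get_indexed(self, value: str) -> Dict[str, Dict[str, str]]:
        """Equality lookup: read the index entry for value, then each base row."""
        return {k: self.rows[k] for k in sorted(self.index.get(value, set()))}
```

Keeping the index correct requires knowing the old value on every update, which is why `insert` looks at the existing row before writing.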

RE: Subcolumn Indexing

2011-02-08 Thread Jeremy.Truelove
Thanks, I just wanted to make sure I understand how it worked. Sounds like the additional mapping vs super column method will work better for my purposes. From: Benjamin Coverston [mailto:ben.covers...@datastax.com] Sent: Tuesday, February 08, 2011 2:22 PM To: user@cassandra.apache.org Subject: R

Re: time to live rows

2011-02-08 Thread Kallin Nagelberg
I did read those articles, but I didn't know that deleting all the columns on a row was equivalent to deleting the row. Like I mentioned, I did delete all the columns from all my rows and then forced compaction before and after gc_grace had passed, but all the rows still exist. If they never d

Re: Finding the intersection results of column sets of two rows

2011-02-08 Thread Aaron Morton
Makes sense, use a get_slice() against the second row and pass in the column names. Should be fine. If you run into performance issues, look at slice_buffer_size and column_index_size in the config. Aaron On 9/02/2011, at 5:16 AM, Aklin_81 wrote: > Amongst two rows, where I need to find the c
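
Aaron's suggestion amounts to sending the ~200 names from the first row as an explicit column list when reading the second row. The server then looks up only those names instead of returning up to a million columns. Below is a plain-Python model with dicts standing in for rows; the function names are hypothetical. With pycassa the read would look roughly like `cf.get(second_key, columns=names)`, but check that against the pycassa documentation for your version.

```python
def slice_by_names(row: dict, names) -> dict:
    """Mimic a slice predicate with explicit column names: only names present
    in the row come back (missing names are simply absent), in sorted order
    as a stand-in for the comparator's order."""
    return {name: row[name] for name in sorted(set(names)) if name in row}


def common_columns(small_row: dict, big_row: dict) -> list:
    """Intersect by asking the big row only for the small row's column names."""
    return list(slice_by_names(big_row, small_row.keys()))
```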

Re: time to live rows

2011-02-08 Thread Benjamin Coverston
On 2/8/11 1:23 PM, Kallin Nagelberg wrote: I did read those articles, but I didn't know that deleting all the columns on a row was equivalent to deleting the row. Like I mentioned, I did delete all the columns from all my rows and then forced compaction before and after gc_grace had passed,

Re: Finding the intersection results of column sets of two rows

2011-02-08 Thread Aklin_81
Thank you so much Aaron!! On Wed, Feb 9, 2011 at 2:11 AM, Aaron Morton wrote: > Makes sense, use a get_slice() against the second row and pass in the column > names. Should be fine. > > If you run into performance issues look at slice_buffer_size and > column_index_size in the config. > > Aaron

Re: time to live rows

2011-02-08 Thread Kallin Nagelberg
What's the secret recipe that I'm missing? I tried forcing compaction on my column family's JMX bean (org.apache.cassandra.db.ColumnFamilies.Main.Session) in jconsole, after gc_grace had passed (i set it to 60). Thanks, -Kal On Tue, Feb 8, 2011 at 3:46 PM, Benjamin Coverston wrote: > > On 2/8/11

error casting Column to SuperColumn during compaction. ? CASSANDRA-1992 ?

2011-02-08 Thread Aaron Morton
I got the error below on a newish 0.7.0 cluster with the following... - no schema changes. - original RF at 1, changed to 3 via cassandra-cli and repair run - stable node membership, i.e. no nodes added. Was thinking it may have to do with CASSANDRA-1992 (see http://www.mail-archive.com/user@cassand

Re: Cassandra memory consumption

2011-02-08 Thread Aaron Morton
When you attach to the JVM with JConsole, how much non-heap memory and how much heap memory is reported on the memory tab? Xmx controls the total size of the heap memory, which excludes the permanent generation. See http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#generation_sizin
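
Here is a rough way to compare the two numbers quoted in this thread. `ps` reports the resident size (RSS) of the whole process, while -Xmx only caps the Java heap. Memory outside the heap also shows up in RSS: the permanent generation (as Aaron notes), thread stacks, JIT-compiled code and other native allocations. This is a sketch with hypothetical helper names; JConsole's heap/non-heap breakdown remains the better tool.

```python
import re

_UNITS_KB = {"k": 1, "m": 1024, "g": 1024 * 1024}


def parse_ps_aux_line(line: str):
    """Return (rss_kb, command) from one `ps aux` line.

    ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND,
    with RSS reported in KiB.
    """
    fields = line.split(None, 10)
    return int(fields[5]), fields[10]


def xmx_kb(command: str):
    """Maximum heap from a java command line's -Xmx flag, in KiB (None if absent)."""
    match = re.search(r"-Xmx(\d+)([kKmMgG])\b", command)
    if match is None:
        return None
    return int(match.group(1)) * _UNITS_KB[match.group(2).lower()]


# First part of the ps line quoted earlier in this thread (the rest was cut off).
line = ("root 1739 4.0 24.5 664968 494996 pts/4 SLl 15:51 0:12 "
        "/usr/bin/java -ea -Xms128M -Xmx256M -XX:+UseParNewGC")
rss, command = parse_ps_aux_line(line)
outside_heap_kb = rss - xmx_kb(command)
print(f"RSS {rss} KiB, max heap {xmx_kb(command)} KiB, beyond heap {outside_heap_kb} KiB")
```

For Victor's numbers, that leaves about 227 MiB of resident memory that -Xmx does not account for.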

Re: Cassandra memory consumption

2011-02-08 Thread Victor Kabdebon
I will do that in the future and I will post my results here (I upgraded the server to Debian 6 to see if there is any change, so memory is back to normal). I will report in a few days. In the meantime I am open to any suggestions... 2011/2/8 Aaron Morton > When you attach to the JVM with JConso

Re: error casting Column to SuperColumn during compaction. ? CASSANDRA-1992 ?

2011-02-08 Thread Jonathan Ellis
Looks like https://issues.apache.org/jira/browse/CASSANDRA-1992. On Tue, Feb 8, 2011 at 3:40 PM, Aaron Morton wrote: > I got the error below on a newish 0.7.0 cluster with the following... > - no schema changes. > - original RF at 1, changed to 3 via cassandra-cli and repair run > - stable node

cassandra memory is huge

2011-02-08 Thread Blaze Dowis
Why is it that when I start Cassandra, it takes up to 1G of memory? And how can I lessen this? Here is a small portion of the startup dump. INFO 12:33:45,539 Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1297208025539.log INFO 12:34:00,034 switching in a fresh Memt

Re: cassandra memory is huge

2011-02-08 Thread Joshua Partogi
Do you have loads of data? 1GB is quite reasonable knowing that 8GB is the recommended RAM size http://wiki.apache.org/cassandra/CassandraHardware Kind regards, Joshua. On Wed, Feb 9, 2011 at 10:48 AM, Blaze Dowis wrote: > Why is it that when I start cassandra, it is taking up to 1G of memory? a

Re: cassandra memory is huge

2011-02-08 Thread Aaron Morton
The JVM heap size is set in conf/cassandra-env.sh. If not set, it will use half the system memory. Aaron On 09 Feb, 2011, at 01:33 PM, Joshua Partogi wrote: Do you have loads of data? 1GB is quite reasonable knowing that 8GB is the recommended RAM size http://wiki.apache.org/cassandra/CassandraHardware

Re: Cassandra memory consumption

2011-02-08 Thread Edward Capriolo
On Tue, Feb 8, 2011 at 4:56 PM, Victor Kabdebon wrote: > I will do that in the future and I will post my results here ( I upgraded > the server to debian 6 to see if there is any change, so memory is back to > normal). I will report in a few days. > In the meantime I am open to any suggestion... >

Re: Cassandra memory consumption

2011-02-08 Thread Victor Kabdebon
Yes I have, but I have to add that this is a server with so little data (2.0 MB of text, roughly a book) that even if there were an overhead due to those things it would be minimal. I don't understand what's eating up all that memory; is it because of Linux that has difficulty getting rid

Re: Best way to detect/fix bitrot today?

2011-02-08 Thread Peter Schuller
> I should have clarified we have 3 copies, so in that case as long as 2 match > we should be ok? As far as I can think of, no. Whatever the reconciliation of two columns results in, is what the cluster is expected to converge to. So in the case of identical keys and mismatched values, tie breakin
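
Peter's answer can be illustrated with a simplified model of column reconciliation. Each pair of versions is resolved by a fixed rule, so two matching replicas do not outvote a third. The tie-break used below for equal timestamps (the greater value wins) is my assumption about the rule his message goes on to describe. The `Version` type and function names are hypothetical, and deletions are ignored.

```python
from typing import NamedTuple, Sequence


class Version(NamedTuple):
    timestamp: int
    value: bytes


def reconcile(a: Version, b: Version) -> Version:
    """Winner between two versions of one column (simplified: no tombstones).

    The higher timestamp wins; on a timestamp tie the greater value wins
    (this tie-break rule is an assumption of the sketch).
    """
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    return a if a.value >= b.value else b


def converge(replicas: Sequence[Version]) -> Version:
    """Fold pairwise reconciliation over every replica's copy; nothing counts votes."""
    winner = replicas[0]
    for version in replicas[1:]:
        winner = reconcile(winner, version)
    return winner


good = Version(1297159430471000, b"msg1")
rotted = Version(1297159430471000, b"mso1")  # same timestamp, one bit flipped (g -> o)
print(converge([good, good, rotted]))
```

With three copies where two match, the rotted copy still wins here because it compares higher, not because of any majority.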

ApplicationState Schema has drifted from DatabaseDescriptor

2011-02-08 Thread Aaron Morton
I noticed this after I upgraded one node in a 0.7 cluster of 5 to the latest stable 0.7 build "2011-02-08_20-41-25" (upgraded node was jb-cass1 below). This is a long email, you can jump to the end and help me out by checking something on your 0.7 cluster. This is the value from o.a.c.gms.FailureD

Re: Best way to detect/fix bitrot today?

2011-02-08 Thread Peter Schuller
> One thing that we're doing for (guaranteed) immutable data is to use MD5 > signatures as keys... this will also prevent duplication, and it will allow > detection (if not correction) of bitrot at the app level easy. Yes. Another option is to checksum keys and/or values themselves by effectively
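
For mutable data, one way to checksum values is to store a checksum alongside each value and verify it on every read. This minimal sketch assumes a hypothetical layout: a 4-byte big-endian CRC32 prefixed to the value. CRC32 catches accidental corruption such as bitrot, not deliberate tampering. Like the MD5-key approach, it detects problems but does not correct them.

```python
import struct
import zlib


def wrap(value: bytes) -> bytes:
    """Prefix the value with a 4-byte big-endian CRC32 (hypothetical layout)."""
    return struct.pack(">I", zlib.crc32(value) & 0xFFFFFFFF) + value


def unwrap(blob: bytes) -> bytes:
    """Strip and check the checksum; raise ValueError if the value was corrupted."""
    if len(blob) < 4:
        raise ValueError("value too short to carry a checksum")
    (expected,) = struct.unpack(">I", blob[:4])
    value = blob[4:]
    if zlib.crc32(value) & 0xFFFFFFFF != expected:
        raise ValueError("checksum mismatch: value is corrupted")
    return value
```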

regarding space taken by different column families in Cassandra

2011-02-08 Thread abhinav prakash rai
I am using 4 column families in my application. The results of cfstats for the space taken by the different CFs are as below: CF1 - Space used (live): 7196159547 Space used (total): 14214373706 CF2 - Space used (live): 2456495851 Space used (total): 906
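
To see how much of a CF's footprint is not live data, subtract the two figures. Below is a small parsing sketch using the CF1 numbers quoted above; only CF1 is complete in this excerpt. My understanding, which should be verified for your version, is that "total" also counts SSTables that compaction has already replaced but that have not yet been deleted from disk. On that reading, roughly half of CF1's space would be awaiting cleanup rather than holding live data.

```python
import re


def space_used(cfstats_text: str) -> dict:
    """Extract the 'Space used (live)' and 'Space used (total)' byte counts."""
    found = {}
    for kind in ("live", "total"):
        match = re.search(r"Space used \(%s\)\s*:\s*(\d+)" % kind, cfstats_text)
        if match:
            found[kind] = int(match.group(1))
    return found


cf1 = "Space used (live) :7196159547 Space used (total): 14214373706"
stats = space_used(cf1)
gap = stats["total"] - stats["live"]
print(f"not live: {gap} bytes (~{gap / 2**30:.1f} GiB)")
```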