Fetch all rows with entry in one specific column

2011-08-16 Thread Jens Hartung
Hi, is there a way to fetch all rows where one specific column has an entry? And if so, is this supported by CQL? In normal SQL the statement would look like "SELECT key FROM table WHERE column IS NOT NULL;" I searched the CQL pages and CLI pages on Datastax.com and found nothin

Cassandra London: failure modes and HBase

2011-08-16 Thread Dave Gardner
Hi all, I'm pleased to announce our next Cassandra meetup on 5th September in London. http://www.meetup.com/Cassandra-London/events/29668191/ We will be looking at failure modes in Cassandra (how it deals with nodes failing and returning etc..) as well as a comparison with HBase. It's a great o

Re: node restart taking too long

2011-08-16 Thread Yan Chunlu
but it seems the row cache is cluster wide, how will the change of row cache affect the read speed? On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis wrote: > Or leave row cache enabled but disable cache saving (and remove the > one already on disk). > > On Sun, Aug 14, 2011 at 5:05 PM, aaron mor

Re: node restart taking too long

2011-08-16 Thread Yan Chunlu
I saw a lot of SliceQueryFilter entries after changing the log level to DEBUG. I suspect that even bringing up a new node would be faster than starting the old one. It is weird: DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:225@13130688454743

Re: Reg loading of hdfs file to Cassandra

2011-08-16 Thread Thamizh
Hi, I am using Cassandra-0.8.4 and Hadoop-0.20.2. Is this "bulk-loading" supported from the Hadoop API? Regards, Thamizhannal P --- On Wed, 10/8/11, Jonathan Ellis wrote: From: Jonathan Ellis Subject: Re: Reg loading of hdfs file to Cassandra To: user@cassandra.apache.org Date: Wednesday, 10

What causes dropped messages?

2011-08-16 Thread David Boxenhorn
How can I tell what's causing dropped messages? Is it just too much activity? I'm not getting any other, more specific messages, just these: WARN [ScheduledTasks:1] 2011-08-15 11:33:26,136 MessagingService.java (line 504) Dropped 1534 MUTATION messages in the last 5000ms WARN [ScheduledTasks:1] 2
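As a rough aid for reading WARN lines like the ones quoted above, the following sketch totals dropped messages per message type and converts one line into a per-second rate. It assumes the log format is exactly the one shown in the message; the function names `summarize_dropped` and `dropped_per_second` are hypothetical helpers, not part of Cassandra.

```python
import re
from collections import Counter

# Matches the tail of the MessagingService WARN line quoted in the message,
# e.g. "Dropped 1534 MUTATION messages in the last 5000ms".
DROPPED_RE = re.compile(r"Dropped (\d+) (\w+) messages in the last (\d+)ms")


def summarize_dropped(lines):
    """Return a Counter mapping message type to total dropped count."""
    totals = Counter()
    for line in lines:
        match = DROPPED_RE.search(line)
        if match:
            count, verb, _window_ms = match.groups()
            totals[verb] += int(count)
    return totals


def dropped_per_second(line):
    """Return the drop rate (messages per second) for one WARN line, or None."""
    match = DROPPED_RE.search(line)
    if not match:
        return None
    count, _verb, window_ms = match.groups()
    return int(count) * 1000 / int(window_ms)


if __name__ == "__main__":
    sample = (
        "WARN [ScheduledTasks:1] 2011-08-15 11:33:26,136 MessagingService.java "
        "(line 504) Dropped 1534 MUTATION messages in the last 5000ms"
    )
    print(summarize_dropped([sample]))
    print(dropped_per_second(sample))
```

Totals per type only show which kind of request is being shed; they do not by themselves say which thread pool is backed up.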

Cassandra adding 500K + Super Column Family

2011-08-16 Thread Renato Bacelar da Silveira
I am wondering about a certain volume situation. I currently load a Keyspace with a certain amount of SCFs. Each SCF (Super Column Family) represents an entity. Each Entity may have up to 6000 values. I am planning to have 500,000 Entities (SCF) with 6000 Columns (within Super Columns - numbe

Re: ColumnFamilyOutputFormat problem

2011-08-16 Thread Jian Fang
If you look at the source code, you will find there are no log messages in the ColumnFamilyOutputFormat class and the related classes. How can the problem be traced then? Has no one actually got this working? On Thu, Aug 11, 2011 at 6:10 PM, aaron morton wrote: > Turn the logging up in cassandra or your

Truncate column families

2011-08-16 Thread Philippe
Hello, what are the guarantees regarding truncates issued through the CLI ? I have a 3 node ring at RF=3. No writes going to the keyspace at issue here. I go to the CLI on one of the nodes and issue a truncate on all CF of the keyspace. I run a list [CF] and make sure there is no data. When I run

dropping secondary indexes

2011-08-16 Thread Dan Kuebrich
I think I've dropped all the indexes on a CF, but I see traces of them in the CLI output of show keyspaces. I see a few validators left behind, and one "built index". (output below) 1. Is there a better way to check schema for indexes? 2. I can't drop the "built" one so I assume they're all gone

Re: Scalability question

2011-08-16 Thread Philippe
Teijo, Unfortunately my data set really does grow because it's a time series. I'm going to add a trick to aggregate old data but it will still grow. How often do you repair per day (or is it really continuous ?) I've been running experiments and I wonder if your decision to perform continuous rep

Re: Truncate column families

2011-08-16 Thread Jonathan Ellis
https://issues.apache.org/jira/browse/CASSANDRA-2950 On Tue, Aug 16, 2011 at 8:45 AM, Philippe wrote: > Hello, what are the guarantees regarding truncates issued through the CLI ? > I have a 3 node ring at RF=3. No writes going to the keyspace at issue here. > I go to the CLI on one of the nodes

Re: Truncate column families

2011-08-16 Thread Philippe
The title and the comments describe a node restarting. This is not my case. could it still be the same thing ? As always, I appreciate the quick answers you guys provide ! 2011/8/16 Jonathan Ellis > https://issues.apache.org/jira/browse/CASSANDRA-2950 > > On Tue, Aug 16, 2011 at 8:45 AM, Philip

Re: What causes dropped messages?

2011-08-16 Thread Jeremy Hanna
http://wiki.apache.org/cassandra/FAQ#dropped_messages As to what's causing them - look in the logs and it will do the equivalent of a nodetool tpstats right after the dropped messages messages. That should give you a clue as to why there are dropped messages - which thread pools are backed up

Re: Truncate column families

2011-08-16 Thread Jonathan Ellis
Probably not. In fact, I can't think of a scenario where truncated data could reappear w/o a restart, assuming the truncate completes successfully on all nodes. Is this 0.8.4? Can you reproduce with a toy cluster using https://github.com/pcmanus/ccm ? On Tue, Aug 16, 2011 at 9:44 AM, Philippe

Re: Truncate column families

2011-08-16 Thread Philippe
I'll try and report in September (I'm "on vacation" but trying to get the cluster to run !). Thanks 2011/8/16 Jonathan Ellis > Probably not. In fact, I can't think of a scenario where truncated > data could reappear w/o a restart, assuming the truncate completes > successfully on all nodes. I

Counter Column Family Inconsistent Node

2011-08-16 Thread Ryan Lowe
[default@Race] list CounterCF; Using default limit of 100 --- RowKey: Stats => (counter=APP, value=7503) => (counter=FILEUPLOAD, value=155) => (counter=MQUPLOAD, value=4726775) => (counter=PAGES, value=131948) => (counter=REST, value=3) => (counter=SOAP, value=44) => (counter=WS, va

Re: Counter Column Family Inconsistent Node

2011-08-16 Thread Ryan Lowe
yeah, sorry about that... pushed click before I added my comments. I have a cluster of 5 nodes using 0.8.4 where I am using counters. On one of my nodes, every time I do a list command I get different results. The counters jump all over the place. Any ideas? I have run nodetool repair on all

Partitioning, tokens, and sequential keys

2011-08-16 Thread David McNelis
We are currently running a three node cluster where we assigned the initial tokens using the Python script that is in the Wiki, and we're currently using the Random Partitioner, RF=1, Cassandra 0.8 from the Riptano RPM; however, we're seeing one node taking on over 60% of the data as we load data.

Re: Counter Column Family Inconsistent Node

2011-08-16 Thread Jonathan Ellis
May be the same as https://issues.apache.org/jira/browse/CASSANDRA-3006 ? On Tue, Aug 16, 2011 at 12:20 PM, Ryan Lowe wrote: > yeah, sorry about that... pushed click before I added my comments. > I have a cluster of 5 nodes using 0.8.4 where I am using counters. On one > of my nodes, every time

Re: Partitioning, tokens, and sequential keys

2011-08-16 Thread Jonathan Ellis
what tokens did you end up using? are you sure it's actually due to different amounts of rows? have you run cleanup and compact to make sure it's not unused data / obsolete replicas taking up the space? On Tue, Aug 16, 2011 at 1:41 PM, David McNelis wrote: > We are currently running a three nod

Re: Partitioning, tokens, and sequential keys

2011-08-16 Thread David McNelis
Currently we have the initial_token for the seed node blank, and then the three tokens we ended up with are: 56713727820156410577229101238628035242 61396109050359754194262152792166260437 113427455640312821154458202477256070485 I would assume that we'd want to take the node that is 613961090503597

Re: Counter Column Family Inconsistent Node

2011-08-16 Thread Ryan Lowe
Actually I think it was more related to our servers getting their time out of sync... after finding this article: http://ria101.wordpress.com/2011/02/08/cassandra-the-importance-of-system-clocks-avoiding-oom-and-how-to-escape-oom-meltdown/ I checked our servers, and sure enough, 2 of them were out

RE: CQL query using 'OR' in WHERE clause

2011-08-16 Thread Deeter, Derek
Thanks, Jonathan! -Derek -- Derek Deeter, Sr. Software Engineer Intuit Financial Services (818) 597-5932 (x76932)5601 Lindero Canyon Rd. derek.dee...@digitalinsight.com Westlake, CA 91362   -Original Message- F

Re: Partitioning, tokens, and sequential keys

2011-08-16 Thread Jonathan Ellis
Yes, that looks about right. Totally baffled how the wiki script could spit out those tokens for a 3-node cluster. On Tue, Aug 16, 2011 at 2:04 PM, David McNelis wrote: > Currently we have the initial_token for the seed node blank, and then the > three tokens we ended  up with are: > 56713727820
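To make the token discussion in this thread concrete, here is a minimal sketch of how evenly spaced RandomPartitioner tokens can be computed and how much of the ring a given set of tokens assigns to each node. It assumes the RandomPartitioner token space of 0 to 2**127 and that a node owns the range from the previous token up to its own; `balanced_tokens` and `ownership` are hypothetical helper names, not the wiki script itself.

```python
RING_SIZE = 2 ** 127  # assumed RandomPartitioner token space: 0 .. 2**127


def balanced_tokens(node_count):
    """Return evenly spaced initial tokens for node_count nodes."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def ownership(tokens):
    """Return {token: fraction of the ring owned}.

    Each node is assumed to own the range from the previous token
    (exclusive) up to its own token (inclusive), wrapping around the ring.
    """
    ordered = sorted(tokens)
    if len(ordered) == 1:
        return {ordered[0]: 1.0}
    shares = {}
    for index, token in enumerate(ordered):
        previous = ordered[index - 1]  # index 0 wraps to the last token
        shares[token] = ((token - previous) % RING_SIZE) / RING_SIZE
    return shares


if __name__ == "__main__":
    # The three tokens reported in the thread.
    reported = [
        56713727820156410577229101238628035242,
        61396109050359754194262152792166260437,
        113427455640312821154458202477256070485,
    ]
    for token, share in ownership(reported).items():
        print(token, round(share * 100, 1), "%")
    print(balanced_tokens(3))
```

Under these assumptions the reported tokens give one node roughly two thirds of the ring, which is consistent with the imbalance described in the original question; a balanced three-node layout would use 0 plus the first and third reported values.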

Re: Unable to repair a node

2011-08-16 Thread Philippe
I'm still trying different stuff. Here are my latest findings, maybe someone will find them useful: - I have been able to repair some small column families by issuing a repair [KS] [CF]. When testing on the ring with no writes at all, it still takes about 2 repairs to get "consistent" log

Re: HOW TO select a column or all columns that start with X

2011-08-16 Thread Alvin UW
Hello, can anyone give an explanation of start.addComponent("abc", StringSerializer.get()) ; end.addComponent("abc", StringSerializer.get(), "UTF8Type", AbstractComposite.ComponentEquality.GREATER_THAN_EQUAL) ; Suppose my composite column names are like ("bob", 1982), ("bob", 1976). There are m

Re: Unable to repair a node

2011-08-16 Thread Jonathan Ellis
On Tue, Aug 16, 2011 at 3:48 PM, Philippe wrote: > I have been able to repair some small column families by issuing a repair > [KS] [CF]. When testing on the ring with no writes at all, it still takes > about 2 repairs to get "consistent" logs for all AES requests. I think I linked these in anoth

Re: Unable to repair a node

2011-08-16 Thread Philippe
Even more interesting behavior : a repair on a CF has consequences on other CFs. I didn't expect that. There are no writes being issued to the cluster yet the logs indicate that - SSTableReader has opened dozens and dozens of files, most of them unrelated to the CF being repaired - compa

Re: Cassandra in Multiple Datacenters Active - Standby configuration

2011-08-16 Thread Oleg Tsvinev
Hi all, I followed instructions here: http://wiki.apache.org/cassandra/Operations#Token_selection to create a Cassandra cluster spanning two datacenters. Now I see that nodes belonging to DC2 datacenter own 0% of the ring. I would expect them to own 50%. Does anyone have an idea what's going on h

Re: Unable to repair a node

2011-08-16 Thread Philippe
Thanks for the pointers, responses inline. On Tue, Aug 16, 2011 at 3:48 PM, Philippe wrote: > > I have been able to repair some small column families by issuing a repair > > [KS] [CF]. When testing on the ring with no writes at all, it still takes > > about 2 repairs to get "consistent" logs for

Re: Cassandra for numerical data set

2011-08-16 Thread aaron morton
> > 2) > I'm doing batch writes to the database (pulling data from multiple resources > and put them together). I wish to know if there's some better methods to > improve the writing efficiency since it's just about the same speed as MySQL, > when writing sequentially. Seems like the commit

Re: Max heap not sticking?

2011-08-16 Thread aaron morton
Ian did you sort this out ? What values do you see passed in when you run ps aux | grep cassandra ? Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 16/08/2011, at 1:08 PM, Ian Danforth wrote: > All, > > When I connect to a no
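When checking the output of ps aux | grep cassandra as suggested above, the interesting parts are the JVM heap flags. The sketch below pulls the standard -Xms and -Xmx values out of a command line string. The sample command line is hypothetical and only illustrates the shape of the input; `heap_flags` is an illustrative helper name.

```python
import re

# Standard JVM heap flags: -Xms (initial heap) and -Xmx (maximum heap).
HEAP_FLAG_RE = re.compile(r"-X(ms|mx)(\d+[kKmMgG]?)")


def heap_flags(command_line):
    """Return a dict such as {"Xms": "4G", "Xmx": "4G"} from a JVM command line."""
    return {"X" + kind: size for kind, size in HEAP_FLAG_RE.findall(command_line)}


if __name__ == "__main__":
    # Hypothetical process line, not taken from the thread.
    sample = "java -ea -Xms4G -Xmx4G -Xmn800M org.apache.cassandra.thrift.CassandraDaemon"
    print(heap_flags(sample))
```

If the same flag appears more than once on the command line, this sketch keeps the last occurrence.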

Re: upgrade from 0.7.6 to 0.8.4

2011-08-16 Thread aaron morton
Yes, you can get there with a rolling restart. Check the NEWS.txt file for info on how to upgrade https://github.com/apache/cassandra/blob/cassandra-0.8.4/NEWS.txt One thing to be aware of is that repairs can fail until all the data files are at the same version. This ticket is for better logging a

Re: Fetch all rows with entry in one specific column

2011-08-16 Thread aaron morton
None that I am aware of. Secondary indexes require an equality operator. You would need to build a custom secondary index if you needed to do this. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 16/08/2011, at 9:33 PM, Jens Har

Re: node restart taking too long

2011-08-16 Thread aaron morton
the logs say it took a long time to read a saved row cache. Try removing the files from the saved_caches dir as Jonathan suggested. The collecting log lines with the INT max count are indicative of the IdentityQueryFilter. One of the places it is used is when adding rows to the cache. Cheers

Re: node restart taking too long

2011-08-16 Thread Teijo Holzer
Hi, yes, we saw exactly the same messages. We got rid of these by doing the following: * Set all row & key caches in your CFs to 0 via cassandra-cli * Kill Cassandra * Remove all files in the saved_caches directory * Start Cassandra * Slowly bring back row & key caches (if desired, we left them

Re: Unable to repair a node

2011-08-16 Thread Philippe
One last thought : what happens when you ctrl-c a nodetool repair ? Does it stop the repair on the server ? If not, then I think I have multiple repairs still running. Is there any way to check this ? Thanks 2011/8/16 Philippe > Even more interesting behavior : a repair on a CF has consequences

Re: Scalability question

2011-08-16 Thread Teijo Holzer
Hi, > Unfortunately my data set really does grow because it's a time series. I'm going to add a trick to aggregate old > data but it will still grow. That's fine, then you need to scale horizontally. Simply add a new node when the load on a node exceeds a threshold (ballpark figure here is a max

Re: Cassandra for numerical data set

2011-08-16 Thread Yi Yang
Thanks Aaron. >> 2) >> I'm doing batch writes to the database (pulling data from multiple resources >> and put them together). I wish to know if there's some better methods to >> improve the writing efficiency since it's just about the same speed as >> MySQL, when writing sequentially. See

Re: Cassandra adding 500K + Super Column Family

2011-08-16 Thread aaron morton
Are you planning to create 500,000 Super Column Families or 500,000 rows in a single Super Column Family ? The former is somewhat crazy. Cassandra schemas typically have up to a few tens of Column Families. Each column family involves a certain amount of memory overhead; this is now automati

Re: ColumnFamilyOutputFormat problem

2011-08-16 Thread aaron morton
I suggested turning up the logging to see if the server processed a batch_mutate call. This is done from the CassandraServer class (https://github.com/apache/cassandra/blob/cassandra-0.8.4/src/java/org/apache/cassandra/thrift/CassandraServer.java#L531) , not the CFOF. The first step will be to

Re: dropping secondary indexes

2011-08-16 Thread aaron morton
> I think I've dropped all the indexes on a CF, but I see traces of them in the > CLI output of show keyspaces. I see a few validators left behind, and one > "built index". (output below) What process did you use to drop the indexes ? You need to use update column family and not include the c

Re: Cassandra adding 500K + Super Column Family

2011-08-16 Thread Yi Yang
Sounds like a similar case to mine. The files are definitely extremely big; 10x space overhead should be a good case if you are just putting values into it. I'm currently testing CASSANDRA-674 and hope the better SSTable can solve the space overhead problem. Please follow my e-mail t

Re: Unable to repair a node

2011-08-16 Thread aaron morton
ctrl-c will not stop the repair. You can kind of check things by looking at nodetool compactionstats; that will just tell you if there are compactions backing up, not necessarily whether they are validation compactions used during repairs. You can trawl the logs to look for messages from the AntiEntropy

Re: Cassandra for numerical data set

2011-08-16 Thread Yi Yang
BTW, If I'm going to insert a SCF row with ~400 columns and ~50 subcolumns under each column, how often should I do a mutation? per column or per row? On Aug 16, 2011, at 3:24 PM, Yi Yang wrote: > > Thanks Aaron. > >>> 2) >>> I'm doing batch writes to the database (pulling data from multiple

Re: Cassandra for numerical data set

2011-08-16 Thread aaron morton
> Is that because cassandra really cost a huge disk space? The general design approach is / has been that storage space is cheap and plentiful. > Well my target is to simply get the 1.3T compressed to 700 Gig so that I can > fit it into a single server, while keeping the same level of performa

Re: node restart taking too long

2011-08-16 Thread Yan Chunlu
does this need to be cluster wide? or I could just modify the caches on one node? since I could not connect to the node with cassandra-cli, it says "connection refused" [default@unknown] connect node2/9160; Exception connecting to node2/9160. Reason: Connection refused. so if I change the cac

apply deserializer to "list" cmd in cli?

2011-08-16 Thread Yang
[default@mykeyspace] list myCF; Using default limit of 100 --- RowKey: 49505f4652 => (counter=0004c77581888c00, value=161326) => (counter=00040001c77581888c00, value=161326) here the Rowkey is actually ascii encoding for "IP_FR", for "get" command we could specify "as
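Until the CLI output is decoded, a hex row key like the one above can be turned back into text by hand. This is a small sketch using only the Python standard library; `decode_hex_key` is a hypothetical helper name, and the ASCII encoding is taken from the message's own description of the key.

```python
def decode_hex_key(hex_key, encoding="ascii"):
    """Decode a hex-encoded row key or column name into a string."""
    return bytes.fromhex(hex_key).decode(encoding)


if __name__ == "__main__":
    # Row key from the list output quoted in the message.
    print(decode_hex_key("49505f4652"))
```

Keys that are not valid text in the chosen encoding raise UnicodeDecodeError, which is a reasonable signal that the value is binary rather than a string.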

Re: Unable to repair a node

2011-08-16 Thread Philippe
> > ctrl-c will not stop the repair. > Ok, so that's why I've been seeing logs of repairs on other CFs That's probably the 2280 issue. Data from all CF's is streamed over > Ah, I get it now. Thanks > > Cheers > > - > Aaron Morton > Freelance Cassandra Developer > @aaronmorto

Bulk loading into live data

2011-08-16 Thread Philippe
http://www.datastax.com/dev/blog/bulk-loading indicates that "it is perfectly reasonable to load data into a live, active cluster." So let's say my cluster has a single KS & CF and it contains a key "test" with a SC named "Cass" and a normal subcolumn named "Data" that has value 1. If I SSTLoad da