Re: too many open files - maybe a fd leak in indexslicequeries

2011-03-31 Thread Jonathan Ellis
Index queries (ColumnFamilyStore.scan) don't do any low-level i/o themselves, they go through CFS.getColumnFamily, which is what normal row fetches also go through. So if there is a leak there it's unlikely to be specific to indexes. What is your open-file limit (remember that sockets count towar
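To answer the open-file-limit question, the limit and the current descriptor count can be read directly. The following is a minimal Python sketch, not part of the original thread. It assumes a Unix-like host, and the descriptor count relies on /proc, so that part is Linux-only. The helper names `open_file_limits` and `count_open_fds` are hypothetical. Sockets show up in the count alongside regular files.

```python
import os
import resource


def open_file_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def count_open_fds(pid="self"):
    """Count descriptors (files and sockets) held by a process.

    Relies on /proc, so it only works on Linux; returns None elsewhere
    or when the process's fd directory cannot be read.
    """
    fd_dir = "/proc/{}/fd".format(pid)
    try:
        return len(os.listdir(fd_dir))
    except OSError:
        return None


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print("soft limit:", soft, "hard limit:", hard)
    print("open descriptors:", count_open_fds())
```

The limits reported are those of the process running the script, so it should be run as the same user that runs Cassandra; passing the Cassandra process id to `count_open_fds` gives that process's descriptor count, provided the caller has permission to read it.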

Re: Requests stuck on production cluster

2011-03-31 Thread Jonathan Ellis
What's going on in the logs? CPU? i/o? On Thu, Mar 31, 2011 at 4:20 AM, Or Yanay wrote: > Hi all, > > > > My production cluster reads got stuck. > > The ring gives: > > > > Address Status State LoadOwns > Token > > >

Endless minor compactions after heavy inserts

2011-03-31 Thread Sheng Chen
I've got a single node of cassandra 0.7.4, and I used the java stress tool to insert about 100 million records. The inserts took about 6 hours (45k inserts/sec) but the following minor compactions last for 2 days and the pending compaction jobs are still increasing. From jconsole I can read the M

Re: Node added, no performance boost -- are the tokens correct?

2011-03-31 Thread buddhasystem
Yup, I screwed up the token setting, my bad. Now, I moved the tokens. I still observe that read latency deteriorated with 3 machines vs original one. Replication factor is 1, Cassandra version 0.7.2 (didn't have time to upgrade as I need results by this weekend). Key and row caching was disabled

Re: nodetool cfstathistogram error

2011-03-31 Thread Edward Capriolo
On Thu, Mar 31, 2011 at 8:25 PM, mcasandra wrote: > It looks like if I use system schema it fails. Is it because of > LocalPartitioner? > > I ran with other keyspace and got following output. > > Offset SSTables Write Latency Read Latency Row Size Column Count > 1 0 0 0 0 0 > 2 0 0 0 0 0 > 179 0 0

A Simple scenario, Help needed

2011-03-31 Thread Prasanna Rajaperumal
Hi All, I am trying out a very simple scenario and I don't seem to get it working. It would be great if someone could point me to some things here. I have set up a 2-node cluster; cassandra.yaml is the default and the same on each node, except that each node's seed: is the other node, and I have set the Thrift RPC addres

Re: Ditching Cassandra

2011-03-31 Thread Edward Capriolo
Gregori, Congrats on writing the fud-liest post of the month award. Firstly if you don't like updates give up on computers and software. Especially give up on anything that has to do with nosql because it is fast evolving. If you think you have a problem with the cassandra api, then what you really

Re: Does anyone build 0.7.4 on IDEA?

2011-03-31 Thread Maki Watanabe
ant on my command line had completed without error. Next I tried to build cassandra 0.7.4 in eclipse, and had luck. So I'll explore cassandra code with eclipse, rather than IDEA. maki 2011/3/31 Maki Watanabe : > Not yet. I'll try. > > maki > > 2011/3/31 Tommy Tynjä : >> Have you assured you are a

Re: nodetool cfstathistogram error

2011-03-31 Thread mcasandra
It looks like if I use system schema it fails. Is it because of LocalPartitioner? I ran with other keyspace and got following output. Offset SSTables Write Latency Read Latency Row Size Column Count 1 0 0 0 0 0 2 0 0 0 0 0 179 0 0 0 320 320 Can someone please help me understand the output in fi

nodetool cfstathistogram error

2011-03-31 Thread mcasandra
Cassandra 7.4: nodetool -h `hostname` cfhistograms system schema Exception in thread "main" java.lang.reflect.UndeclaredThrowableException at $Proxy5.getRecentReadLatencyHistogramMicros(Unknown Source) at org.apache.cassandra.tools.NodeCmd.printCfHistograms(NodeCmd.java:452)

Re: RTG/MRTG/Cricket replacement using Cassandra?

2011-03-31 Thread Aaron Turner
On Thu, Mar 31, 2011 at 4:19 PM, Ryan King wrote: > We have a solution for time series data on cassandra at Twitter that > we'd like to open source, but it requires 0.8/trunk so we're not going > to release it until that's stable. > > See > http://www.slideshare.net/kevinweil/rainbird-realtime-an

Re: Node added, no performance boost -- are the tokens correct?

2011-03-31 Thread Edward Capriolo
On Thu, Mar 31, 2011 at 6:15 PM, Eric Gilmore wrote: > A script that I have says the following: > > $ python ctokens.py > How many nodes are in your cluster? 2 > node 0: 0 > node 1: 85070591730234615865843651857942052864 > > The first token should be zero, for the reasons discussed here: > http://

Re: RTG/MRTG/Cricket replacement using Cassandra?

2011-03-31 Thread Paul Choi
Just finished looking at the slides. It looks awesome! On 3/31/11 4:19 PM, "Ryan King" wrote: >We have a solution for time series data on cassandra at Twitter that >we'd like to open source, but it requires 0.8/trunk so we're not going >to release it until that's stable. > >See >http://www.slid

Re: newbie question: how do I know the total number of rows of a cf?

2011-03-31 Thread aaron morton
It iterates over all the SSTables on disk and estimates the number of keys by looking at how big the index is. It does not count the actual keys. aaron On 31 Mar 2011, at 17:46, Sheng Chen wrote: > I just found an estimateKeys() method of the ColumnFamilyStoreMBean. > Is there any indication
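The estimate described here can be sketched roughly as follows. This is an illustration only, not Cassandra's code: it assumes each SSTable keeps a sampled index with one entry per `index_interval` keys, and that 128 is the interval in use. The function name `estimate_keys` and its input shape are hypothetical.

```python
INDEX_INTERVAL = 128  # assumed: one sampled index entry per 128 keys


def estimate_keys(sampled_entries_per_sstable, index_interval=INDEX_INTERVAL):
    """Estimate the key count from the size of each SSTable's sampled index.

    Each element of `sampled_entries_per_sstable` is the number of sampled
    index entries for one SSTable. No key is actually read or counted.
    """
    return sum(entries * index_interval for entries in sampled_entries_per_sstable)


if __name__ == "__main__":
    # Two SSTables with 10 and 5 sampled index entries.
    print(estimate_keys([10, 5]))
```

Because a row that lives in several SSTables contributes to each SSTable's index, an estimate of this kind tends to overcount until compaction merges the files.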

Re: RTG/MRTG/Cricket replacement using Cassandra?

2011-03-31 Thread Ryan King
We have a solution for time series data on cassandra at Twitter that we'd like to open source, but it requires 0.8/trunk so we're not going to release it until that's stable. See http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011 -ryan On Thu, Mar 31, 2011 at

Re: Cassandra error Insufficient space to compact

2011-03-31 Thread aaron morton
Where are the connection refused messages? Are they client side? Can you connect to the cluster with nodetool and run the ring command? Aaron On 31 Mar 2011, at 11:44, Anurag Gujral wrote: > I restarted the cassandra node with more disks when I try to connect to > cassandra i get connection

Re: RTG/MRTG/Cricket replacement using Cassandra?

2011-03-31 Thread David Hawthorne
I know cloudkick is doing something like this, and we're developing our own in-house method, but it would be nice for there to be a generically-available package that would do this. Lately I've been wishing that someone would take graphite (written in python) and put the frontend on top of cass

Re: Attempt to assign id to existing column family.

2011-03-31 Thread aaron morton
There is no reason to change the RF on the system keyspace, it should probably not be allowed. The system keyspace uses a LocalPartitioner and its data is not replicated through the same mechanism as a user keyspace. Aaron On 31 Mar 2011, at 10:22, Jeremy Stribling wrote: > On 03/30/2011

RTG/MRTG/Cricket replacement using Cassandra?

2011-03-31 Thread Aaron Turner
I've been looking at replacing our PostgreSQL backend for RTG (a SNMP based polling and graphing solution for network traffic/ports) with something using Cassandra in order to solve our scalability and redundancy requirements. Based on a lot of what I've read, Cassandra is an ideal data store for

Re: Revised: Data Modeling advise for Cassandra 0.8 (added #8)

2011-03-31 Thread aaron morton
It does not have a yaml file, so I am assuming it's the default Random Partitioner. Aaron On 1 Apr 2011, at 04:51, Drew Kutcharian wrote: > Thanks Aaron, > > I have already checked out Twissandra. I was mainly looking to see how > Secondary Indexes can be used and how they affect Data Modeling

Re: How to determine if repair need to be run

2011-03-31 Thread Eric Gilmore
Peter, I want to join everyone else thanking you for helping out so much with this thread, and especially for pointing out the problems with the DS docs on this topic. We have some corrections posted today, and will keep looking to improve the information. On Thu, Mar 31, 2011 at 3:11 PM, Peter S

Re: Node added, no performance boost -- are the tokens correct?

2011-03-31 Thread Eric Gilmore
A script that I have says the following: $ python ctokens.py How many nodes are in your cluster? 2 node 0: 0 node 1: 85070591730234615865843651857942052864 The first token should be zero, for the reasons discussed here: http://www.datastax.com/dev/tutorials/getting_started_0_7/configuring#initial
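The ctokens.py script itself is not shown in the thread. A minimal sketch that reproduces the same output, assuming RandomPartitioner (token range 0 to 2**127) and evenly spaced tokens, could look like this; the function name `initial_tokens` is hypothetical.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner tokens range from 0 to 2**127


def initial_tokens(node_count):
    """Return evenly spaced initial tokens, one per node, starting at zero."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    for i, token in enumerate(initial_tokens(2)):
        print("node {}: {}".format(i, token))
```

For two nodes this gives 0 and 85070591730234615865843651857942052864, matching the values quoted above.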

too many open files - maybe a fd leak in indexslicequeries

2011-03-31 Thread Roland Gude
I experience something that looks exactly like https://issues.apache.org/jira/browse/CASSANDRA-1178 On cassandra 0.7.3 when using index slice queries (lots of them) Crashing multiple nodes and rendering the cluster useless. But I have no clue where to look if index queries still leak fd Does any

Re: How to determine if repair need to be run

2011-03-31 Thread Peter Schuller
> Thanks a lot for elaborating on repairs. Still, it's a bit fuzzy to me why > it is so important to run a repair before the GCGraceSeconds kicks in. Does > this mean a delete does not get "replicated" ? In other words when I delete > something on a node, doesn't cassandra set tombstones

Node added, no performance boost -- are the tokens correct?

2011-03-31 Thread buddhasystem
I just configured a cluster of two nodes -- do these token values make sense? The reason I'm asking is that so far I don't see load balancing happening, judging from performance. Address Status State Load Owns Token

Re: How to determine if repair need to be run

2011-03-31 Thread Jonathan Colby
Peter - Thanks a lot for elaborating on repairs. Still, it's a bit fuzzy to me why it is so important to run a repair before the GCGraceSeconds kicks in. Does this mean a delete does not get "replicated" ? In other words when I delete something on a node, doesn't cassandra set tombstones

Re: Not able to set ZERO consistency level

2011-03-31 Thread Edward Capriolo
On Thu, Mar 31, 2011 at 2:53 PM, Peter Schuller wrote: >> Only the following Levels are provided, I am wondering if the ZERO >> consistency level is removed in Cassandra 0.7.X ? > > Yes, it's gone. > >> If so, Could you please explain why was it removed and what is the best >> option I have given

Re: Not able to set ZERO consistency level

2011-03-31 Thread Peter Schuller
> Only the following Levels are provided, I am wondering if the ZERO > consistency level is removed in Cassandra 0.7.X ? Yes, it's gone. > If so, Could you please explain why was it removed and what is the best > option I have given my context. https://issues.apache.org/jira/browse/CASSANDRA-160

Not able to set ZERO consistency level

2011-03-31 Thread Prasanna Rajaperumal
Hi, I am dealing with reporting with not so important data and I am okay with data being lost. I would like to minimize the time taken for the actual data insert. I am using Cassandra 0.7.4. If it matters, I am using Hector to connect to Cassandra cZERO consistency level in Thrift Generated code org.ap

Re: Revised: Data Modeling advise for Cassandra 0.8 (added #8)

2011-03-31 Thread Drew Kutcharian
Thanks Aaron, I have already checked out Twissandra. I was mainly looking to see how Secondary Indexes can be used and how they affect Data Modeling. There doesn't seem to be a lot of coverage on them. In addition, I couldn't tell what kind of Partitioner Twissandra is using and why. cheers,

Re: pycassa refresh server_list

2011-03-31 Thread Tyler Hobbs
ConnectionPool has a set_server_list() method that you can use to update the list of servers. (It appears this method did not make it into the docs; I'll make sure it gets in there.) Pycassa doesn't make any attempt to update the server list automatically right now. By the way, there is a pycass
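Since the pool does not refresh itself, the caller has to push the new node list in. A minimal Python sketch of that step follows. It relies only on the set_server_list() method named above; the helper name `refresh_server_list` and the "host:port" string format are assumptions, and how the list of live nodes is obtained is left to the caller.

```python
def refresh_server_list(pool, live_servers):
    """Push the current node list into an existing connection pool.

    `pool` is expected to be a pycassa ConnectionPool (or any object with a
    set_server_list() method); `live_servers` is an iterable of server
    address strings gathered by the caller.
    """
    servers = sorted(set(live_servers))  # de-duplicate and keep a stable order
    if not servers:
        raise ValueError("refusing to set an empty server list")
    pool.set_server_list(servers)
    return servers
```

Calling this from a periodic job, or whenever a node is added or decommissioned, keeps the pool pointed at nodes that still exist.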

Netstats out of sync?

2011-03-31 Thread buddhasystem
I'm rebalancing a cluster of 2 nodes at this point. Netstats on the "source" node reports progress of the stream, whereas on the receiving end netstats states that progress = 0. Did anyone see that? Do I need both nodes listed as seeds in cassandra.yaml? TIA/ -- View this message in context: ht

pycassa refresh server_list

2011-03-31 Thread A J
In the pycassa.pool.ConnectionPool class, I can specify all the nodes in server_list parameter. But over time, when nodes get decommissioned and new nodes with new IPs get added, how can the server_list parameter be refreshed? Do I have to modify it manually, or is there a way to update the list au

Re: How to determine if repair need to be run

2011-03-31 Thread mcasandra
If I am not wrong, node repair needs to be run on all the nodes in a staggered manner. It is required to take care of tombstones. Please correct me team if I am wrong :) See Distributed Deletes: http://wiki.apache.org/cassandra/Operations -- View this message in context: http://cassandra-user-in
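One way to keep track of this is to record when each node was last repaired and flag any node whose last repair is older than GCGraceSeconds. The sketch below is a hedged illustration, not an official tool: it assumes the default of 864000 seconds (10 days), and the function names and the shape of the `last_repairs` mapping are hypothetical.

```python
from datetime import datetime, timedelta

DEFAULT_GC_GRACE_SECONDS = 864000  # Cassandra's default, 10 days


def repair_overdue(last_repair, now, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Return True if the time since the last repair has reached GCGraceSeconds."""
    return now - last_repair >= timedelta(seconds=gc_grace_seconds)


def stale_nodes(last_repairs, now, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Return the sorted node names whose last repair is too old.

    `last_repairs` maps a node name to the datetime of its last repair.
    """
    return sorted(
        node
        for node, finished in last_repairs.items()
        if repair_overdue(finished, now, gc_grace_seconds)
    )


if __name__ == "__main__":
    now = datetime(2011, 3, 31)
    history = {"10.0.0.1": datetime(2011, 3, 30), "10.0.0.2": datetime(2011, 3, 1)}
    print(stale_nodes(history, now))
```

In practice the repair schedule should leave a margin below GCGraceSeconds, since a repair takes time to finish and nodes are repaired one after another.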

Re: How to determine if repair need to be run

2011-03-31 Thread Peter Schuller
> silly question, would every cassandra installation need to have manual > repairs done on it? > > It would seem cassandra's "read repair" and regular compaction would take > care of keeping the data clean. > > Am I missing something? See my previous posts in this thread for the distinct reasons

Re: Working backwards from production to staging/dev

2011-03-31 Thread ian douglas
Thanks Edward, Anyone able to provide some answers for the other questions? On 03/26/2011 07:25 AM, Edward Capriolo wrote: On Fri, Mar 25, 2011 at 2:11 PM, ian douglas wrote: On 03/25/2011 10:12 AM, Jonathan Ellis wrote: On Fri, Mar 25, 2011 at 11:59 AM, ian douglas wrote: (we're runnin

Re: How to determine if repair need to be run

2011-03-31 Thread Jonathan Colby
silly question, would every cassandra installation need to have manual repairs done on it? It would seem cassandra's "read repair" and regular compaction would take care of keeping the data clean. Am I missing something? On Mar 30, 2011, at 7:46 PM, Peter Schuller wrote: >> I just wanted t

Re: Two column families or One super column family?

2011-03-31 Thread Edward Capriolo
On Thu, Mar 31, 2011 at 3:52 AM, T Akhayo wrote: > Hi Aaron, > > Thank you for your reply, i appreciate the suggestions you made. > > Yesterday i managed to get everything (our main read) in one CF, with the > use of a structure in a value like you suggested. > > Designing a new data model is diff

changing replication strategy and effects on replica nodes

2011-03-31 Thread Jonathan Colby
From my understanding of replica copies, cassandra picks which nodes to replicate the data based on replication strategy, and those same "replica partner" nodes are always used according to token ring distribution. If you change the replication strategy, does cassandra pick new nodes to repl

Re: Cassandra take a snapshot after a column family update

2011-03-31 Thread Roberto Bentivoglio
Ok, we'll do it for sure! Thanks, Roberto On 31 March 2011 14:56, aaron morton wrote: > Next time it happens take a note of the snapshot folder, different > processes name the folder differently. It may help track down what created > the snapshot. > > Cheers > Aaron > > On 31 Mar 2011, at 01:13

unsuscribe

2011-03-31 Thread Dario Bravo
-- Darío Bravo

Re: add new data directory to cassandra

2011-03-31 Thread aaron morton
AFAIK Cassandra will just pick the directory with the most space. Also AFAIK using multiple directories should only be considered a safety valve to fix problems such as the one you describe see http://www.mail-archive.com/user@cassandra.apache.org/msg07874.html Aaron On 31 Mar 2011, at 15:1
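The selection rule described here ("pick the directory with the most space") can be illustrated with a short Python sketch. This is not Cassandra's implementation, only an instance of the rule as stated; the function name `pick_data_directory` and the injectable `free_bytes` callable are hypothetical.

```python
import shutil


def pick_data_directory(directories, free_bytes=None):
    """Return the directory with the most free space.

    `free_bytes` maps a path to its free space in bytes; by default it
    asks the filesystem through shutil.disk_usage().
    """
    if free_bytes is None:
        def free_bytes(path):
            return shutil.disk_usage(path).free
    if not directories:
        raise ValueError("no data directories configured")
    return max(directories, key=free_bytes)


if __name__ == "__main__":
    print(pick_data_directory(["."]))
```

Passing a custom `free_bytes` makes the rule easy to check without touching real disks.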

Re: Using RowMutations with super columns

2011-03-31 Thread aaron morton
The CassandraBulkLoader example is written to use Super Columns, so seems odd. Do you have the rest of the error stack ? Aaron On 31 Mar 2011, at 04:54, George Ciubotaru wrote: > Hello, > > I’m using CassandraBulkLoader.java > (https://svn.apache.org/repos/asf/cassandra/trunk/contrib/bmt

Re: Cassandra take a snapshot after a column family update

2011-03-31 Thread aaron morton
Next time it happens take a note of the snapshot folder, different processes name the folder differently. It may help track down what created the snapshot. Cheers Aaron On 31 Mar 2011, at 01:13, Roberto Bentivoglio wrote: > Hi Aaron, > I already deleted the snapshot folder unfortunately. > We

Re: Revised: Data Modeling advise for Cassandra 0.8 (added #8)

2011-03-31 Thread aaron morton
Drew, The Twissandra project is a twitter clone in cassandra, it may give you some insight into how things can be modelled https://github.com/thobbs/twissandra If you are just starting then consider something like... - CF to hold the user, their data and their network l

Re: Does anyone build 0.7.4 on IDEA?

2011-03-31 Thread Maki Watanabe
Not yet. I'll try. maki 2011/3/31 Tommy Tynjä : > Have you assured you are able to build Cassandra outside > of IDEA, e.g. on command line? > > Best regards, > Tommy > @tommysdk > > On Thu, Mar 31, 2011 at 3:56 AM, Maki Watanabe > wrote: >> Hello, >> >> I'm trying to build and run cassandra 0.7

Re: Does anyone build 0.7.4 on IDEA?

2011-03-31 Thread Tommy Tynjä
I had troubles setting up my Cassandra IDE on IntelliJ IDEA 10 as well. The problems were related to IDEA not finding all the libraries necessary so I had to make sure all necessary libraries were downloaded and that hadoop directories etc were marked as source-folders in the project. I don't recog

Re: memtable_threshold

2011-03-31 Thread ruslan usifov
To all who replied on this topic, thanks for your patience and explanations

Inconsistent results in queries with secondary index and index expression

2011-03-31 Thread Muga Nishizawa
Hi, When I iteratively get data with a secondary index and index clause, the result acquired with consistency level "one" is different from the one acquired with consistency level "quorum". The one with consistency level "one" is the correct result. But the one with consistency level "quorum" is incorrect and some d

Re: Any way to get different unique time UUIDs for the same time value?

2011-03-31 Thread Roshan Dawrani
Thanks a lot for sharing your inputs, guys... On Thu, Mar 31, 2011 at 6:47 AM, Drew Kutcharian wrote: > Hi Ed, > > Cool, I guess we both read/interpreted his post differently and gave two > valid answers ;) > > - Drew > > On Mar 30, 2011, at 5:40 PM, Ed Anuff wrote: > > > Hey Drew, I'm somewhat

RE: Requests stuck on production cluster

2011-03-31 Thread Or Yanay
I am using Cassandra 0.7.0 and Random Partitioner. From: Or Yanay [mailto:o...@peer39.com] Sent: Thursday, March 31, 2011 12:20 PM To: user@cassandra.apache.org Subject: Requests stuck on production cluster Hi all, My production cluster reads got stuck. The ring gives: Address Status St

Requests stuck on production cluster

2011-03-31 Thread Or Yanay
Hi all, My production cluster reads got stuck. The ring gives: Address Status State Load Owns Token 146231632500721020374621781629360107476 10.39.21.7 Up Normal 118.86 GB 18.15% 696879268146680791533

Re: Two column families or One super column family?

2011-03-31 Thread T Akhayo
Hi Aaron, Thank you for your reply, i appreciate the suggestions you made. Yesterday i managed to get everything (our main read) in one CF, with the use of a structure in a value like you suggested. Designing a new data model is different from what i'm used to, but if you keep in mind that you d

Re: Naming "issue" on nodetool repair command

2011-03-31 Thread Peter Schuller
> Would you cassandra team think to add an alias name for nodetool > "repair" command? That thought has crossed my mind lately too; particularly in one of the recent threads. The problem seems analogous to 'fsck', and the distinction between fully expected by-design behavior needing fsck/repair is