Reading all rows in a column family in parallel

2010-07-08 Thread Brent N. Chun
Hello, I'm running Cassandra 0.6.0 on a cluster and have an application that needs to read all rows from a column family using the Cassandra Thrift API. Ideally, I'd like to be able to do this by having all nodes in the cluster read in parallel (i.e., each node reads a disjoint set of rows th
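One way to give each worker a disjoint set of rows is to cut the token ring into equal slices before scanning. This is a minimal sketch, not Cassandra's own code. It assumes that RandomPartitioner tokens lie in [0, 2^127] and that each worker passes a start-exclusive, end-inclusive token range (start, end] to get_range_slices. TokenRangeSplitter is a hypothetical helper, and the Thrift calls themselves are not shown.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

class TokenRangeSplitter {
    // Assumption: RandomPartitioner tokens lie in [0, 2^127].
    static final BigInteger MAX_TOKEN = BigInteger.valueOf(2).pow(127);

    /**
     * Returns k + 1 boundaries 0 = b0 < b1 < ... < bk = 2^127.
     * Worker i scans the token range (b_i, b_{i+1}], so the ranges are disjoint
     * and together cover (0, 2^127]. (Token 0 itself is ignored here; an MD5
     * digest of all zero bytes is not a practical concern.)
     */
    static List<BigInteger> boundaries(int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        List<BigInteger> result = new ArrayList<>();
        for (int i = 0; i <= k; i++) {
            // Multiply before dividing so no precision is lost.
            result.add(MAX_TOKEN.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(k)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<BigInteger> b = boundaries(4);
        for (int i = 0; i < 4; i++) {
            System.out.println("worker " + i + ": (" + b.get(i) + ", " + b.get(i + 1) + "]");
        }
    }
}
```

Each worker then pages through its own slice independently, so no coordination is needed beyond agreeing on k.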

Question about hinted handoff

2010-07-08 Thread ChingShen
Hi all, Please consider this case: (RF=1, CL=ONE) 1. I have nodes A, B and C. 2. Node A is the coordinator; it sends a write request to node B. 3. Node B is down during the write, so A returns a failure message to the client and writes a hint to node C. 4. Node B comes b

Re: Query on delete a column inside a super column

2010-07-08 Thread Moses Dinakaran
As far as I know, phpCassa has no option to remove a single column from a super column; the remove method removes the whole super column from the key. I will check the Thrift API. Inserts and updates work through a mutation object, but removing a column doesn't. Thank you all. Regards Mo

How to stop Cassandra running in embedded mode

2010-07-08 Thread Andriy Kopachevsky
Hi, we are trying to set up integration testing for Cassandra, so we need to start and stop it as an embedded service. We don't have any problem starting Cassandra: import org.apache.cassandra.contrib.utils.service.CassandraServiceDataCleaner; class SomeTestClass { @Before public void setup() thr

Re: Why so many commitlogs ?

2010-07-08 Thread Anty
Hi Jonathan, I have found out what was going wrong. I changed the configuration to 1440, which prevents the memtables of LocationInfo and HintsColumnFamily from flushing when there are only a few hint records, written across many commitlog segments. On Thu, Jul 8, 2010 at 9:43 AM, Anty wrote: > > > On Thu, Jul 8, 2010 at 9:21 A

Re: Question about hinted handoff

2010-07-08 Thread Anty
On Thu, Jul 8, 2010 at 4:11 PM, ChingShen wrote: > Hi all, > > Please consider this case: (RF=1, CL=ONE) > > 1. I have A, B and C nodes. > 2. A node is a coordinator node, it sends a request to B node to do write > operation. > No, it will not choose B; it writes the data locally on node A. If RF

Gossip round time

2010-07-08 Thread ChingShen
Hi, I found the http://www.slideshare.net/adorepump/cassandra-nosql slides, which say about gossip on page 11: "State disseminated in *O(logN)* rounds where N is the number of nodes in the cluster." Is the drawing on page 15 wrong? Does it need a round 4? Thanks. Shen
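For intuition about the O(log N) claim, consider an idealized push model (an assumption, not necessarily the model the slides use) in which every informed node informs exactly one uninformed node per round, so the informed set I_r doubles each round:

```latex
I_0 = 1,\qquad I_r = 2^{r},\qquad
I_r \ge N \iff r \ge \log_2 N
\;\Longrightarrow\; r_{\min} = \lceil \log_2 N \rceil .
```

For example, N = 8 needs log2(8) = 3 rounds, while N = 9 needs 4, so whether a diagram needs a round 4 depends on how many nodes it shows. With random peer selection some messages land on already-informed nodes, so real gossip takes more rounds than this lower bound, although the count still grows logarithmically in N.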

Re: Question about hinted handoff

2010-07-08 Thread Anty
Sorry, I was wrong. I missed the CL=ONE. On Thu, Jul 8, 2010 at 5:27 PM, Anty wrote: > > > On Thu, Jul 8, 2010 at 4:11 PM, ChingShen wrote: > >> Hi all, >> >> Please consider this case: (RF=1, CL=ONE) >> >> 1. I have A, B and C nodes. >> 2. A node is a coordinator node, it sends a request to B

Re: Question about hinted handoff

2010-07-08 Thread ChingShen
So, am I correct? Shen On Thu, Jul 8, 2010 at 5:33 PM, Anty wrote: > Sorry I am wrong .Miss the CF=one. > > > On Thu, Jul 8, 2010 at 5:27 PM, Anty wrote: > >> >> >> On Thu, Jul 8, 2010 at 4:11 PM, ChingShen wrote: >> >>> Hi all, >>> >>> Please consider this case: (RF=1, CL=ONE) >>> >>> 1

Re: Cassandra disk space utilization WAY higher than I would expect

2010-07-08 Thread Julie
Jonathan Ellis gmail.com> writes: > "SSTables that are obsoleted by a compaction are deleted > asynchronously when the JVM performs a GC. You can force a GC from > jconsole if necessary, but Cassandra will force one itself if it > detects that it is low on space. A compaction marker is also added

Re: High CPU usage on all nodes without any read or write

2010-07-08 Thread Olivier Rosello
Hi, Thank you for your help. I don't know if data is being written to the cluster too fast, but I don't think so (the nodes are powerful: big CPUs, 12GB RAM...) and there is not much data (2000 inserts/sec, about 300 KB/sec of raw data). I trashed all the data yesterday at 6pm (GMT+2) and launched everything again.

Re: Question about hinted handoff

2010-07-08 Thread Anty
On Thu, Jul 8, 2010 at 4:11 PM, ChingShen wrote: > Hi all, > > Please consider this case: (RF=1, CL=ONE) > > 1. I have A, B and C nodes. > 2. A node is a coordinator node, it sends a request to B node to do write > operation. > 3. B node is down during write operation, so return failure m

Re: Question about hinted handoff

2010-07-08 Thread ChingShen
If so, when does hinted handoff work? On Thu, Jul 8, 2010 at 9:55 PM, Anty wrote: > > > On Thu, Jul 8, 2010 at 4:11 PM, ChingShen wrote: > >> Hi all, >> >> Please consider this case: (RF=1, CL=ONE) >> >> 1. I have A, B and C nodes. >> 2. A node is a coordinator node, it sends a request to

Re: Some questions about the write operation and hinted handoff

2010-07-08 Thread Jonathan Ellis
On Thu, Jul 8, 2010 at 12:45 AM, ChingShen wrote: > hmm... I'm really confused. > The http://wiki.apache.org/cassandra/API document mentioned that if write > ConsistencyLevel=ANY that "Ensure the write has been written to at least 1 > node, including hinted recipients.", I couldn't imagine this ca

Re: Reading all rows in a column family in parallel

2010-07-08 Thread Jonathan Ellis
"CFRR does this. Is this possible?" I guess I don't understand the question. :) On Thu, Jul 8, 2010 at 2:21 AM, Brent N. Chun wrote: > Hello, > > I'm running Cassandra 0.6.0 on a cluster and have an application that needs > to read all rows from a column family using the Cassandra Thrift API. >

Re: Some questions about the write operation and hinted handoff

2010-07-08 Thread ChingShen
Thanks Jonathan Ellis, I want to make sure: after A returns a failure message to the client at CL.ONE, *does A write a hint to C?* If so, even though the write operation failed, is the data still stored in C? And if B comes back up, does C then forward it to B? Shen On Thu, Jul 8, 2010 at 10:08 PM, Jona

Re: Some questions about the write operation and hinted handoff

2010-07-08 Thread Jonathan Ellis
On Thu, Jul 8, 2010 at 10:23 AM, ChingShen wrote: > Thanks Jonathan Ellis, > >   I want to make sure that after A return failure message to client at > CL.ONE, does A write a hint to C? No. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cass

Use of multiple Keyspaces

2010-07-08 Thread Dwight Smith
Hi, I am new to Cassandra and am preparing a data model for use in a production environment, and I need to decide whether using multiple keyspaces has any benefit. There are basically two types of data. The first is a large number (~1750K) of entries which are written, read very rarely, and then rem

Re: Backing up the data stored in cassandra

2010-07-08 Thread Jonathan Ellis
see http://wiki.apache.org/cassandra/Operations On Thu, Jul 8, 2010 at 12:50 AM, Dave Viner wrote: > Hi all, > What is the recommended strategy for backing up the data stored inside > cassandra? > I realized that Cass. is a distributed database, and with a decent > replication factor, backups are

Re: Reading all rows in a column family in parallel

2010-07-08 Thread Thomas Heller
Hey, > Is > this possible in 0.6.0? (Note: for the next startToken, I was just planning > on computing the MD5 digest of the last key directly since I'm accessing > Cassandra through Thrift.) Can't speak for 0.6.0 but it works for 0.6.3. Just implemented this in ruby (minus the parallel par

Re: Some questions about the write operation and hinted handoff

2010-07-08 Thread ChingShen
Hmm... as you mentioned, it will *write a hint* and report success at CL.ANY. Does hinted handoff only work at CL.ANY? Thanks. On Thu, Jul 8, 2010 at 11:29 PM, Jonathan Ellis wrote: > On Thu, Jul 8, 2010 at 10:23 AM, ChingShen > wrote: > > Thanks Jonathan Ellis, > > > > I want to make

Understanding atomicity in Cassandra

2010-07-08 Thread Stuart Langridge
Hi, Cassandra people! We're looking at Cassandra as a possible replacement for some parts of our database structures, and on an early look I'm a bit confused about atomicity guarantees and rollbacks and such, so I wanted to ask what standard practice is for dealing with the sorts of situation I ou

Re: Some questions about the write operation and hinted handoff

2010-07-08 Thread Benjamin Black
On Thu, Jul 8, 2010 at 9:02 AM, ChingShen wrote: > Hmm.. as you mentioned that it will write a hint and report success at > CL.ANY, does the hinted handoff only work at CL.ANY? > Still no. Hints are written when nodes are down, regardless of CL, unless HH is disabled. CL does not influence whet

Re: Use of multiple Keyspaces

2010-07-08 Thread Benjamin Black
There is a memtable per CF, regardless of how many keyspaces you have.  I'd pay more attention to the delete/compaction side of things if you are going to be doing that many deletions. Also, your mail client's formatting is broken. b On Thu, Jul 8, 2010 at 8:45 AM, Dwight Smith wrote: > Hi > >

Re: Use of multiple Keyspaces

2010-07-08 Thread Benjamin Black
(and I'm sure someone will correct me if I am wrong on that) On Thu, Jul 8, 2010 at 11:24 AM, Benjamin Black wrote: > There is a memtable per CF, regardless of how many keyspaces you have.

Re: Some questions about the write operation and hinted handoff

2010-07-08 Thread Jonathan Ellis
On Thu, Jul 8, 2010 at 1:19 PM, Benjamin Black wrote: > On Thu, Jul 8, 2010 at 9:02 AM, ChingShen wrote: >> Hmm.. as you mentioned that it will write a hint and report success at >> CL.ANY, does the hinted handoff only work at CL.ANY? >> > > Still no.  Hints are written when nodes are down, regar

RE: Use of multiple Keyspaces

2010-07-08 Thread Dwight Smith
Thanks - I found on Wiki that the memtables and sstables are on a per CF basis. Sorry about the mail client formatting - I have no choice - corporate controlled:) Now I am concerned about the deletions - what areas should I investigate to understand the concerns you raise? Thanks again -Or

Re: Use of multiple Keyspaces

2010-07-08 Thread Benjamin Black
as rcoli just reminded me, i should be more clear that it is 1 _active_ memtable per CF, but there may be several pending flush. space from deletions is only reclaimed after GCGraceSeconds has elapsed AND a major compaction is run. default for the former is 10 days. the latter is not automatic.
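Benjamin's two conditions (GCGraceSeconds has elapsed AND a major compaction runs) can be combined into a single check. This is a minimal sketch, assuming times in seconds and that a tombstone counts as expired only once its deletion time plus GCGraceSeconds is strictly earlier than the moment the major compaction runs. TombstonePurge and its method are hypothetical names, not Cassandra APIs.

```java
class TombstonePurge {
    // Default GCGraceSeconds per the thread: 10 days.
    static final int DEFAULT_GC_GRACE_SECONDS = 10 * 24 * 60 * 60;

    /**
     * Hypothetical check: can a major compaction running at compactionTimeSeconds
     * drop a tombstone written at deletionTimeSeconds (both seconds since the epoch)?
     * Minor compactions are assumed not to reclaim the space, per the thread.
     */
    static boolean reclaimedByMajorCompaction(long deletionTimeSeconds,
                                              long compactionTimeSeconds,
                                              int gcGraceSeconds) {
        // The grace period must have fully elapsed before the major compaction runs.
        return deletionTimeSeconds + gcGraceSeconds < compactionTimeSeconds;
    }

    public static void main(String[] args) {
        long deletedAt = 1278547200L; // 2010-07-08 00:00:00 UTC, chosen for illustration
        long nineDaysLater = deletedAt + 9L * 24 * 60 * 60;
        long elevenDaysLater = deletedAt + 11L * 24 * 60 * 60;
        System.out.println(reclaimedByMajorCompaction(deletedAt, nineDaysLater, DEFAULT_GC_GRACE_SECONDS));   // false
        System.out.println(reclaimedByMajorCompaction(deletedAt, elevenDaysLater, DEFAULT_GC_GRACE_SECONDS)); // true
    }
}
```

Because the major compaction is not automatic, a large delete workload keeps its disk space until someone both waits out the grace period and triggers that compaction.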

Re: Some questions about the write operation and hinted handoff

2010-07-08 Thread Benjamin Black
Important safety tip, I did not know that. On Thu, Jul 8, 2010 at 11:31 AM, Jonathan Ellis wrote: > On Thu, Jul 8, 2010 at 1:19 PM, Benjamin Black wrote: >> On Thu, Jul 8, 2010 at 9:02 AM, ChingShen wrote: >>> Hmm.. as you mentioned that it will write a hint and report success at >>> CL.ANY, do

Re: Some questions about the write operation and hinted handoff

2010-07-08 Thread Benjamin Black
To clarify, this requires the coordinator know nodes are down. If the nodes are marked UP, but do not confirm the writes, this behavior does not seem possible. On Thu, Jul 8, 2010 at 11:31 AM, Jonathan Ellis wrote: > On Thu, Jul 8, 2010 at 1:19 PM, Benjamin Black wrote: >> On Thu, Jul 8, 2010 a

Re: Coke Products at Digg?

2010-07-08 Thread malcolm smith
I thought it was NoCola solutions or NotOnlyCola rather than UnCola. On Wed, Jul 7, 2010 at 11:55 AM, Miguel Verde wrote: > Dr. Pepper has recently been picked up by Coca Cola as well. I wonder if > the UnCola solutions like 7Up and Fanta are just a fad? > > > On Wed, Jul 7, 2010 at 10:50 AM, M

Re: Reading all rows in a column family in parallel

2010-07-08 Thread Brent N. Chun
Hi Jonathan, The code snippet below was from the repository. I mentioned 0.6.0 specifically just to confirm that reading a CF using token-based range queries with the RandomPartitioner should (or shouldn't) also work in that version. I've seen discussions about whether range queries are now s

http://scale.metaoptimize.com/

2010-07-08 Thread Ran Tavory
Just found this site and thought it might be interesting to folks on this list. http://scale.metaoptimize.com/ It's a Stack Overflow-style Q&A site; in their words: > A community interested in scalability, high availability, data stores, > NoSQL, distributed computing, parallel computing, cloud co

Visual Tools for Cassandra

2010-07-08 Thread Torla, William
Does anybody know of any recently developed UI based tools for Cassandra? Ideally a tool capable of seeing nodes across a cluster would be preferred.

Re: Some questions about the write operation and hinted handoff

2010-07-08 Thread Jonathan Ellis
Right, if the nodes are marked up but do not confirm the writes, it will result in a TimedOutException. (It still won't generate hinted writes). To summarize: hinted writes are only generated when Cassandra (a) knows a target is down ahead of time and (b) still has enough UP targets to satisfy th
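Jonathan's summary can be read as a small decision rule on the coordinator. The sketch below is a hypothetical model, not Cassandra's code. It only covers replicas already known to be down: replicas marked UP that fail to answer lead to TimedOutException and no hints, which it does not model. It also ignores the option to disable hinted handoff, and it assumes that QUORUM means RF/2 + 1 and that a hint alone satisfies CL.ANY.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HintedWritePlanner {
    enum ConsistencyLevel { ANY, ONE, QUORUM, ALL }

    // Live replicas a level needs; ANY needs none because a hint alone satisfies it.
    static int requiredLiveReplicas(ConsistencyLevel cl, int replicationFactor) {
        switch (cl) {
            case ONE: return 1;
            case QUORUM: return replicationFactor / 2 + 1;
            case ALL: return replicationFactor;
            default: return 0; // ANY
        }
    }

    /**
     * Given each replica's liveness as the coordinator sees it before writing,
     * returns the replicas that would receive hints. Throws (standing in for
     * Thrift's UnavailableException) when too few replicas are known to be up,
     * in which case nothing is written, hinted or otherwise.
     */
    static List<String> hintedReplicas(Map<String, Boolean> replicaIsUp, ConsistencyLevel cl) {
        List<String> down = new ArrayList<>();
        int up = 0;
        for (Map.Entry<String, Boolean> e : replicaIsUp.entrySet()) {
            if (e.getValue()) {
                up++;
            } else {
                down.add(e.getKey());
            }
        }
        if (up < requiredLiveReplicas(cl, replicaIsUp.size())) {
            throw new IllegalStateException("UnavailableException: " + up + " live replica(s) for " + cl);
        }
        return down;
    }

    public static void main(String[] args) {
        Map<String, Boolean> replicas = new LinkedHashMap<>();
        replicas.put("B", true);
        replicas.put("C", true);
        replicas.put("D", false);
        System.out.println(hintedReplicas(replicas, ConsistencyLevel.QUORUM)); // [D]
    }
}
```

With RF=1, CL.ONE and B already known to be down, the model throws instead of hinting, which is consistent with Jonathan's "No." earlier in the thread. With RF=3 at QUORUM and only D down, which is the case ChingShen raises in a later message, the rule yields a hint for D.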

Re: Reading all rows in a column family in parallel

2010-07-08 Thread Jonathan Ellis
There have been a number of bug fixes to this since 0.6.0 -- as Thomas said, it works in 0.6.3. (Although there is one related bug scheduled to be fixed in 0.6.4, https://issues.apache.org/jira/browse/CASSANDRA-1042) On Thu, Jul 8, 2010 at 2:06 PM, Brent N. Chun wrote: > Hi Jonathan, > > The cod

Re: Visual Tools for Cassandra

2010-07-08 Thread Eben Hewitt
Suguru Namura's Web Console may have some of what you need: http://github.com/suguru/cassandra-webconsole Eben On Thu, Jul 8, 2010 at 1:00 PM, Torla, William wrote: > Does anybody know of any recently developed UI based tools for Cassandra? > Idea

Re: Reading all rows in a column family in parallel

2010-07-08 Thread Brent N. Chun
Jonathan Ellis wrote: There have been a number of bug fixes to this since 0.6.0 -- as Thomas said, it works in 0.6.3. (Although there is one related bug scheduled to be fixed in 0.6.4, https://issues.apache.org/jira/browse/CASSANDRA-1042) Ah, this is exactly one of the cases I've been seeing!

Re: Coke Products at Digg?

2010-07-08 Thread Daniel Jue
We've developed a beverage API called Koozie which allows drinkers to remain soda-agnostic. It supports all popular canned liquids and Drink Injection through its integrated Inversion Of Can container. On Thu, Jul 8, 2010 at 2:55 PM, malcolm smith wrote: > I thought it was NoCola solutions or Not

Re: Reading all rows in a column family in parallel

2010-07-08 Thread Brent N. Chun
Thomas Heller wrote: Hey, Is this possible in 0.6.0? (Note: for the next startToken, I was just planning on computing the MD5 digest of the last key directly since I'm accessing Cassandra through Thrift.) Can't speak for 0.6.0 but it works for 0.6.3. Just implemented this in ruby (minus
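Brent's plan of computing the next startToken from the MD5 digest of the last key can be sketched as follows. This assumes RandomPartitioner derives a token as the absolute value of the key's MD5 digest read as a signed big-endian integer, and that keys are UTF-8 strings; check both against the partitioner source for your version. RandomPartitionerToken is a hypothetical helper, not part of any Cassandra client API.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class RandomPartitionerToken {
    /**
     * Assumed RandomPartitioner token: |MD5(key)| with the 16-byte digest read as a
     * signed two's-complement integer, which puts tokens in [0, 2^127].
     */
    static BigInteger tokenFor(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // The last key of the current page; its token becomes the next start token.
        System.out.println(tokenFor("row42"));
    }
}
```

Passing tokenFor(lastKey) as the next start token continues the scan just after the last row returned, assuming the start of a token range is exclusive.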

get_range_slices

2010-07-08 Thread Jonathan Shook
Should I ever expect multiples of the same key (with non-empty column sets) from the same get_range_slices call? I've also verified that the column data, including column timestamps, is identical byte-for-byte.

Why is cassandra named cassandra?

2010-07-08 Thread ChingShen
Hi, Why is cassandra named cassandra? Thanks. Shen

Re: Some questions about the write operation and hinted handoff

2010-07-08 Thread ChingShen
On Fri, Jul 9, 2010 at 4:32 AM, Jonathan Ellis wrote: > If the coordinator knows it can't achieve the requested CL it won't do > any writes, hinted or otherwise, and will immediately report > UnavailableException to the client. > To summarize: hinted writes are only generated when Cassandra (a) >

Re: get_range_slices

2010-07-08 Thread Mike Malone
I think the answer to your question is no, you shouldn't. I'm feeling far too lazy to do even light research on the topic, but I remember there being a bug where replicas weren't consolidated and you'd get a result set that included data from each replica that was consulted for a query. That could
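Duplicate keys shouldn't appear, but on a version affected by the bug Mike remembers, a client could collapse them defensively. The sketch keeps the first row seen for each key and preserves the order of the rest. Row is a hypothetical stand-in for the Thrift KeySlice, and this only masks the symptom on the client; it fixes nothing server-side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RangeSliceDedup {
    // Hypothetical stand-in for a Thrift KeySlice: a row key plus its columns.
    static final class Row {
        final String key;
        final List<String> columns;

        Row(String key, List<String> columns) {
            this.key = key;
            this.columns = columns;
        }
    }

    /** Keeps the first occurrence of each key, preserving result order. */
    static List<Row> dedupByKey(List<Row> rows) {
        Set<String> seen = new HashSet<>();
        List<Row> unique = new ArrayList<>();
        for (Row row : rows) {
            // Set.add returns false when the key was already present.
            if (seen.add(row.key)) {
                unique.add(row);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("a", Arrays.asList("c1")),
                new Row("b", Arrays.asList("c1")),
                new Row("b", Arrays.asList("c1")),
                new Row("c", Arrays.asList("c2")));
        for (Row r : dedupByKey(rows)) {
            System.out.println(r.key);
        }
    }
}
```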

Re: Digg 4 Preview on TWiT

2010-07-08 Thread Jeremy Davis
That is an interesting statistic. 1 TB per node? Care to share any more info on the specs of this cluster? Drive types/Cores per node/etc... -JD On Tue, Jul 6, 2010 at 12:01 PM, Prashant Malik wrote: > This is a ridiculous statement by some newbie I guess , We today have a 150 > node Cassandra

Re: Some questions about the write operation and hinted handoff

2010-07-08 Thread Jonathan Ellis
On Thu, Jul 8, 2010 at 10:45 PM, ChingShen wrote: > Ok, If so, I suppose that A sends requests to B, C and D nodes(RF=3) at > CL.QUORUM, but D is down, then return success message to the client, and A > write a hint to E node? until D comes back up then E forwards the data to D? If it knows that

Re: High CPU usage on all nodes without any read or write

2010-07-08 Thread Peter Schuller
> But in Cassandra output log : > r...@cassandra-2:~#  tail -f /var/log/cassandra/output.log >  INFO 15:32:05,390 GC for ConcurrentMarkSweep: 1359 ms, 4295787600 reclaimed > leaving 1684169392 used; max is 6563430400 >  INFO 15:32:09,875 GC for ConcurrentMarkSweep: 1363 ms, 4296991416 reclaimed >
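The quoted output.log lines can be parsed to track how often ConcurrentMarkSweep runs and how much heap stays in use after each collection. This is a minimal sketch, assuming every GC line follows exactly the format quoted above; GcLogLine is a hypothetical helper.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GcLogLine {
    // Matches e.g. "GC for ConcurrentMarkSweep: 1359 ms, 4295787600 reclaimed leaving 1684169392 used; max is 6563430400"
    private static final Pattern GC_LINE = Pattern.compile(
            "GC for (\\w+): (\\d+) ms, (\\d+) reclaimed leaving (\\d+) used; max is (\\d+)");

    final String collector;
    final long pauseMs;
    final long reclaimedBytes;
    final long usedBytes;
    final long maxBytes;

    private GcLogLine(String collector, long pauseMs, long reclaimedBytes, long usedBytes, long maxBytes) {
        this.collector = collector;
        this.pauseMs = pauseMs;
        this.reclaimedBytes = reclaimedBytes;
        this.usedBytes = usedBytes;
        this.maxBytes = maxBytes;
    }

    /** Parses one log line; returns empty if it is not a GC summary line. */
    static Optional<GcLogLine> parse(String line) {
        Matcher m = GC_LINE.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        // Byte counts exceed Integer.MAX_VALUE, so parse them as longs.
        return Optional.of(new GcLogLine(m.group(1),
                Long.parseLong(m.group(2)), Long.parseLong(m.group(3)),
                Long.parseLong(m.group(4)), Long.parseLong(m.group(5))));
    }

    /** Fraction of the maximum heap still in use after this collection. */
    double usedFraction() {
        return (double) usedBytes / maxBytes;
    }

    public static void main(String[] args) {
        String line = " INFO 15:32:05,390 GC for ConcurrentMarkSweep: 1359 ms, 4295787600 reclaimed leaving 1684169392 used; max is 6563430400";
        parse(line).ifPresent(gc -> System.out.printf("%s paused %d ms, heap %.1f%% used%n",
                gc.collector, gc.pauseMs, 100 * gc.usedFraction()));
    }
}
```

For the first line quoted above, usedFraction() comes out at roughly 0.257, so about a quarter of the 6563430400-byte maximum heap is still in use after the collection. Comparing timestamps of consecutive lines then shows how frequently CMS is running.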