Re: hinted handoff disabling trade-offs

2013-03-19 Thread aaron morton
> I think I understand what it means for application-level data, but the part I'm not entirely sure about is what it could mean for Cassandra internals. Internally it means the write will not be retried to nodes that were either down or did not ack before rpc_timeout. That's all. If you are

Re: Secondary Indexes

2013-03-19 Thread aaron morton
> - Will that result in Cassandra creating 18 new column families, one for each index? Inserts will be slower, as each insert will potentially result in 18 additional inserts. This is just the same as an RDBMS: more indexes == more insert work. > - If a given column is not specified in any rows

Re: Reading a counter column value

2013-03-19 Thread aaron morton
it's just a standard get_slice call, the ColumnOrSuperColumn result has either Column, SuperColumn, CounterColumn or SuperCounterColumn set. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 18/03/2013, at 2:09 AM, C
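A minimal sketch of that get_slice call in Java over Thrift follows; the host, keyspace, column family and column names are placeholders, not details from the thread:

import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class CounterRead {
    public static void main(String[] args) throws Exception {
        // Open a framed Thrift connection (host, port and keyspace are assumptions).
        TTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        client.set_keyspace("my_keyspace");

        // Ask for the single counter column "hits" in the "page_counters" CF.
        ColumnParent parent = new ColumnParent("page_counters");
        SlicePredicate predicate = new SlicePredicate();
        predicate.setColumn_names(Arrays.asList(ByteBuffer.wrap("hits".getBytes("UTF-8"))));

        List<ColumnOrSuperColumn> result = client.get_slice(
                ByteBuffer.wrap("row-key".getBytes("UTF-8")), parent, predicate, ConsistencyLevel.ONE);

        // For a counter CF the wrapper has counter_column set rather than column.
        for (ColumnOrSuperColumn cosc : result) {
            if (cosc.isSetCounter_column()) {
                System.out.println("counter value = " + cosc.getCounter_column().getValue());
            }
        }
        transport.close();
    }
}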

Re: MultiInput/MultiGet CF in MapReduce

2013-03-19 Thread aaron morton
I would be looking at Hive or Pig, rather than writing the MapReduce. There is an example in the source Cassandra distribution, or you can look at DataStax Enterprise to start playing with Hive. Typically with Hadoop queries you want to query a lot of data; if you are only querying a few row
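If you do end up writing the MapReduce job yourself, the configuration loosely follows the word_count example shipped in the Cassandra source; a minimal Java sketch, with the keyspace, column family, contact host and partitioner as placeholder assumptions:

import java.nio.ByteBuffer;
import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CassandraJobSetup {
    public static Job configure(Configuration conf) throws Exception {
        Job job = new Job(conf, "cassandra-scan");
        job.setInputFormatClass(ColumnFamilyInputFormat.class);

        // Keyspace, column family, contact host and partitioner are assumptions for this sketch.
        ConfigHelper.setInputColumnFamily(job.getConfiguration(), "my_keyspace", "my_cf");
        ConfigHelper.setInputInitialAddress(job.getConfiguration(), "127.0.0.1");
        ConfigHelper.setInputRpcPort(job.getConfiguration(), "9160");
        ConfigHelper.setInputPartitioner(job.getConfiguration(), "org.apache.cassandra.dht.RandomPartitioner");

        // Read every column of every row in the column family.
        SlicePredicate predicate = new SlicePredicate().setSlice_range(
                new SliceRange(ByteBuffer.wrap(new byte[0]), ByteBuffer.wrap(new byte[0]), false, Integer.MAX_VALUE));
        ConfigHelper.setInputSlicePredicate(job.getConfiguration(), predicate);
        return job;
    }
}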

Re: CQL in MapReduce

2013-03-19 Thread aaron morton
> Is it possible to execute CQL statements in MapReduce? Not as far as I know. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 18/03/2013, at 1:30 PM, Alicia Leong wrote: > Hi All, > > Is it possible to execute CQL

Re: Unable to fetch large amount of rows

2013-03-19 Thread aaron morton
> I have 1000 timestamps, and for each timestamp, I have 500K different MACAddresses. So you are trying to read about 2 million columns? 500K MACAddresses, each with 3 other columns? > When I run the following query, I get RPC Timeout exceptions: What is the exception? Is it a client side soc

Re: Backup solution

2013-03-19 Thread aaron morton
IMHO this is a bad idea. * Your secondary DC will have no redundancy; when you restart it you will be relying on HH and nodetool repair. * If your secondary DC machine fails, so does the single copy of your backup. * There will be additional management overhead for managing an unbalanced DC. * R

Re: Errors on replica nodes halt repair

2013-03-19 Thread aaron morton
It's easier to understand what's happening if you provide the full error message. It looks like out of order data in the files, nodetool scrub can fix that error. Try repairing a single CF at a time so you can work out which one is failing. Cheers - Aaron Morton Freelance C

Re: secondary index problem

2013-03-19 Thread aaron morton
> Seems like when we have updates that are large (10k rows in one mutate) the problem is more likely to occur. 10K rows in one mutate is a very bad idea. It will take the nodes a long time to process them, risking timeouts, and it will essentially starve other requests. You should also sp
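If the big mutations do get split up, the shape is roughly the following; a minimal Java sketch against the Thrift batch_mutate call, with an assumed chunk size and an already-opened client:

import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.Mutation;

public class ChunkedMutate {
    private static final int ROWS_PER_BATCH = 100; // assumed chunk size, tune for your cluster

    // Send mutations for at most ROWS_PER_BATCH row keys per batch_mutate call
    // instead of one call covering 10K rows.
    public static void mutateInChunks(Cassandra.Client client,
                                      Map<ByteBuffer, Map<String, List<Mutation>>> mutations) throws Exception {
        Map<ByteBuffer, Map<String, List<Mutation>>> chunk = new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
        for (Map.Entry<ByteBuffer, Map<String, List<Mutation>>> row : mutations.entrySet()) {
            chunk.put(row.getKey(), row.getValue());
            if (chunk.size() == ROWS_PER_BATCH) {
                client.batch_mutate(chunk, ConsistencyLevel.ONE);
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            client.batch_mutate(chunk, ConsistencyLevel.ONE);
        }
    }
}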

Re: Waiting on read repair?

2013-03-19 Thread aaron morton
> I checked the flushwriter thread pool stats and saw this: > Pool Name   Active   Pending   Completed   Blocked   All time blocked > FlushWriter   1   586183   1   17582   That's not good. Is the IO system over utili

Re: Cassandra 1.2.0H startup exception

2013-03-19 Thread aaron morton
There is a write in the commit log that has an invalid row key. The logs will say which file it was replaying, try removing that and restarting it. What client were you using to write to the cluster? What are you using for the keys ? Note: this could result in data loss on the one node becaus

Re: Cassandra Compression and Wide Rows

2013-03-19 Thread Sylvain Lebresne
That's just describing what compression is about. Compression (not in C*, in general) is based on recognizing repeated patterns. So yes, in that sense, static column families are more likely to yield a better compression ratio because it is more likely to have repeated patterns in the compressed bloc

RE: Unable to fetch large amount of rows

2013-03-19 Thread Pushkar Prasad
Aaron, Thanks for your reply. Here are the answers to the questions you had asked: I am trying to read all the rows which have a particular TimeStamp. In my database, there are 500K entries for a particular TimeStamp. That means about 40 MB of data. The query returns fine if I request for lesser n

Re: Cassandra 1.2.0H startup exception

2013-03-19 Thread 杨辉强
Hi, Aaron Morton: I thought it would be OK when I deleted all the commit logs. After deleting and restarting, it still failed. So I stopped all the write/read clients and restarted it; it still throws an exception: DEBUG 16:32:16,265 Resetting pool for /xx.xx.xx.xx DEBUG 16:32:16,265 No bootstrapping,

Speaking in Bangalore

2013-03-19 Thread aaron morton
I'm going to be speaking at the Bangalore Cassandra meetup on Friday April 5, hosted at Apigee. For details see http://www.meetup.com/Apache-Cassandra/events/108524582/ Aaron - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

RE: Lots of Deleted Rows Came back after upgrade 1.1.6 to 1.1.10

2013-03-19 Thread moshe.kranc
This obscure feature of Cassandra is called "haunted handoff". Happy (early) April Fools :) From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Monday, March 18, 2013 7:45 PM To: user@cassandra.apache.org Subject: Re: Lots of Deleted Rows Came back after upgrade 1.1.6 to 1.1.10 As you see,

recv_describe_keyspace bug in org.apache.cassandra.thrift.Cassandra ?

2013-03-19 Thread cscetbon.ext
Hi, I'm testing Pig (0.11) with Cassandra (1.2.2). I've noticed that when the column family is created without the WITH COMPACT STORAGE clause, Pig can't find it :( After searching in the code, I've found that the issue comes from the function recv_describe_keyspace. This function returns a KsDef w

Re: How to configure linux service for Cassandra

2013-03-19 Thread Andrew Cobley
Does this help you? https://github.com/acobley/CassandraStartup It was built for Raspbian, but might help you. Andy On 19 Mar 2013, at 11:10, Roshan wrote: Hi I want to start the cassandra as a service. At the moment it is starting as a background task. Cassand

Debian repository update pb

2013-03-19 Thread bastien vigneron
Hello, it seems the 1.1.x Debian repository has not been updated since the 1.1.9 release: root@cnode01:~# apt-cache show cassandra Package: cassandra Version: 1.1.9 Architecture: all Maintainer: Eric Evans Installed-Size: 11539 Depends: openjdk-6-jre-headless (>= 6b11) | java6-runtime, jsvc (>= 1.0)

Bootstrapping a node in 1.2.2

2013-03-19 Thread Andrew Bialecki
I've got a 3-node cluster running 1.2.2 and just bootstrapped a new node into it. For each of the existing nodes I had num_tokens set to 256, and for the new node I also had it set to 256; however, after bootstrapping into the cluster, "nodetool status " for my main keyspace, which has RF=2, now reports:

RE: How to configure linux service for Cassandra

2013-03-19 Thread Jason Kushmaul | WDA
I'm not sure about the CentOS version, but you could utilize the hard work that DataStax has done with their community edition RPMs; an init script is installed for you. -Original Message- From: Roshan [mailto:codeva...@gmail.com] Sent: Tuesday, March 19, 2013 7:10 AM To: cassand

Recovering from a faulty cassandra node

2013-03-19 Thread Jabbar Azam
Hello, I am using Cassandra 1.2.2 on a 4 node test cluster with vnodes. I waited for over a week to insert lots of data into the cluster. During the end of the process one of the nodes had a hardware fault. I have fixed the hardware fault but the filing system on that node is corrupt so I'll have

Re: Recovering from a faulty cassandra node

2013-03-19 Thread Hiller, Dean
I have not done this as of yet, but from all that I have read your best option is to follow the replace node documentation, which I believe means you need to: 1. Have the token be the same BUT add 1 to it so it doesn't think it's the same computer. 2. Have the bootstrap option set or something so stre

Re: Recovering from a faulty cassandra node

2013-03-19 Thread Jabbar Azam
Hello Dean. I'm using vnodes so can't specify a token. In addition I can't follow the replace node docs because I don't have a replacement node. On 19 March 2013 15:25, Hiller, Dean wrote: > I have not done this as of yet but from all that I have read your best > option is to follow the replac

Re: Recovering from a faulty cassandra node

2013-03-19 Thread Hiller, Dean
Since you "cleared" out that node, it IS the replacement node. Dean From: Jabbar Azam Reply-To: "user@cassandra.apache.org" Date: Tuesday, March 19, 2013 9:29 AM To: "user@cassandra.apache.org

Re: Recovering from a faulty cassandra node

2013-03-19 Thread Jabbar Azam
Yes you're probably right. I don't really understand the token generation so was reluctant to do that. I'll install linux on the faulty node now and let you know what happens. On 19 March 2013 15:38, Hiller, Dean wrote: > Since you "cleared" out that node, it IS the replacement node. > > Dean >

Re: Recovering from a faulty cassandra node

2013-03-19 Thread Alain RODRIGUEZ
In 1.2, you may want to use nodetool removenode if your server is broken or unreachable; otherwise I guess nodetool decommission remains the good way to remove a node. (http://www.datastax.com/docs/1.2/references/nodetool) When this node is out, rm -rf /yourpath/cassandra/* on this server, change t

Re: Recovering from a faulty cassandra node

2013-03-19 Thread Jabbar Azam
Do I use removenode before adding the reinstalled node or after? On 19 March 2013 15:45, Alain RODRIGUEZ wrote: > In 1.2, you may want to use nodetool removenode if your server is broken or unreachable, else I guess nodetool decommission remains the good way to remove a node. (http://

Re: Recovering from a faulty cassandra node

2013-03-19 Thread Marco Matarazzo
Is nodetool removenode / decommission actually needed with an RF > 1? What does it do, exactly? On 19 Mar 2013, at 16:45, Alain RODRIGUEZ wrote: > In 1.2, you may want to use nodetool removenode if your server is broken > or unreachable, else I guess nodetool decommiss

Re: Recovering from a faulty cassandra node

2013-03-19 Thread Alain RODRIGUEZ
Decommission doesn't need an RF > 1, since it is run from the node being removed from the cluster. It gives its data to the next node in the ring, which will be responsible for it, before leaving. Removenode (at least if it is like the old removetoken) uses replicas to dispatch the data to their new nod

Re: Debian repository update pb

2013-03-19 Thread Sylvain Lebresne
> It seems the 1.1.x Debian repository has not been updated since the 1.1.9 > release: I believe what happened is that I reverted it to 1.1.9 when releasing 1.2.3 due to a bad rsync. Anyway, it should be fixed now. Thanks for reporting it. -- Sylvain > > root@cnode01:~# apt-cache show cassandra

Re: Recovering from a faulty cassandra node

2013-03-19 Thread Marco Matarazzo
I'm still missing something, please excuse me. Let's say, for example, that I have a 4-node cluster with a replication factor of 2. One node goes down and I have to reinstall it. In the meantime the cluster still works and data is read and written. After a while the node is reinstalled, same IP is

Re: Secondary Indexes

2013-03-19 Thread Mayank
Thanks guys. I am working with Andy on this project. Further questions on the secondary indexes: Assuming we have 1000 columns in 1 row of the column family and about 900 of them have NamedColumn1=1 and of those 900 only 10 of them also have NamedColumn2=1. If I query for columns which have Named

Re: Waiting on read repair?

2013-03-19 Thread Jasdeep Hundal
No secondary indexes, and I don't think it's IO. Commit log/data are on separate SSDs (and C* is the only user of these disks), and the total amount of data being written to the cluster is much less than the SSDs are capable of writing (including replication I think we're at about 10% of what the

Re: Cassandra Compression and Wide Rows

2013-03-19 Thread Drew Kutcharian
Thanks Sylvain. So C* compression is block-based and has nothing to do with the format of the rows. On Mar 19, 2013, at 1:31 AM, Sylvain Lebresne wrote: > That's just describing what compression is about. Compression (not in C*, in > general) is based on recognizing repeated patterns. > > So yes,

java.io.IOException: FAILED_TO_UNCOMPRESS(5) exception when running nodetool rebuild

2013-03-19 Thread Ondřej Černoš
Hi all, I am running into a strange error when bootstrapping a Cassandra cluster in a multiple datacenter setup. The setup is as follows: 3 nodes in AWS east, 3 nodes somewhere on Rackspace/OpenStack. I use my own snitch based on EC2MultiRegionSnitch (it just adds some EC2 availability zone parsing capa

Re: secondary index problem

2013-03-19 Thread Brett Tinling
We are using CL ONE for mutates. As for the large batches, yes, our use pattern has exceeded the initial understanding. We plan to rewrite this bit, but it has not been a problem so far (or maybe this index thing is the problem that forces the rewrite?). On the rare timeout, we retry... I h

Re: How to configure linux service for Cassandra

2013-03-19 Thread Wei Zhu
Are you looking for something like this http://www.centos.org/docs/5/html/Deployment_Guide-en-US/s1-services-chkconfig.html Thanks. -Wei - Original Message - From: "Jason Kushmaul | WDA" To: "user@cassandra.apache.org" Sent: Tuesday, March 19, 2013 5:58:37 AM Subject: RE: How to config

Re: Recovering from a faulty cassandra node

2013-03-19 Thread Wei Zhu
Hi Dean, If you are not using vnodes and are trying to replace the node, set the new token to the old token - 1, not +1. The reason is that tokens are assigned clockwise along the ring. If you set your new token to be the old token - 1, the new node will take over all the data of the old node except for

Cassandra 1.2.2 | Unexpected Connection Pool Shutdown

2013-03-19 Thread radu.manolescu
We have recently upgraded to C* 1.2.2 from 1.0.2, and we have started seeing errors such as the one below. Our app collects changes and then flushes them out to C* in a batch. Sometimes (at high volume) we see the following error: The log shows this error repeated for each host in the ring (total

Truncate behaviour

2013-03-19 Thread Víctor Hugo Oliveira Molinar
Hello guys! I'm researching the behaviour of truncate operations in Cassandra. Reading the official wiki page (http://wiki.apache.org/cassandra/API) we can understand it as: "Removes all the rows from the given column family." And reading the DataStax page (http://www.datastax.com/docs/1.0/refer

Java client options for C* v1.2

2013-03-19 Thread Marko Asplund
Hi, I'm about to start my first Cassandra project and am a bit puzzled by the multitude of different client options available for Java. Are there any good comparisons of the different options that have been done recently? I'd like to choose a client that - is feature complete (provides access to all C

Re: Java client options for C* v1.2

2013-03-19 Thread Víctor Hugo Oliveira Molinar
I guess Hector fits your requirements. The last release is pretty new. But I'd suggest you take a look at Astyanax too. On Tue, Mar 19, 2013 at 6:34 PM, Marko Asplund wrote: > Hi, > > I'm about to start my first Cassandra project and am a bit puzzled by > the multitude of different client opti
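For reference, connecting and writing a single column with Hector looks roughly like this; the cluster name, host, keyspace and column family are placeholders, and this is only a sketch, not an endorsement of one client over another:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class HectorExample {
    public static void main(String[] args) {
        // The cluster name is only a local identifier; host:port can be any node in the ring.
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "localhost:9160");
        Keyspace keyspace = HFactory.createKeyspace("my_keyspace", cluster);

        // Write one string column into the "users" column family.
        Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
        mutator.insert("user-42", "users", HFactory.createStringColumn("email", "someone@example.com"));

        HFactory.shutdownCluster(cluster);
    }
}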

Re: Truncate behaviour

2013-03-19 Thread Wei Zhu
There is a setting in the cassandra.yaml file which controls that. # Whether or not a snapshot is taken of the data before keyspace truncation # or dropping of column families. The STRONGLY advised default of true # should be used to provide data safety. If you set this flag to false, you will # l

Re: Truncate behaviour

2013-03-19 Thread Víctor Hugo Oliveira Molinar
Hum, my bad. Thank you! On Tue, Mar 19, 2013 at 11:55 PM, Wei Zhu wrote: > There is setting in the cassandra.yaml file which controls that. > > > # Whether or not a snapshot is taken of the data before keyspace truncation > # or dropping of column families. The STRONGLY advised default of true

Continuing high CPU usage (98%) after cassandra1.2.0 startup.

2013-03-19 Thread 杨辉强
Hi, Every time I restart the Cassandra server, the CPU usage continues to be very high (98%) for days. But there is no reading or writing to this server. I have tried the following cmd: date; date `date +"%m%d%H%M%C%y.%S"`; date; It doesn't work. The tail of system.log: DEBUG [Thrift:1701] 2013-0

Re: recv_describe_keyspace bug in org.apache.cassandra.thrift.Cassandra ?

2013-03-19 Thread aaron morton
By design. There may be a plan to change it in the future; I'm not aware of one though. CQL 3 tables created without COMPACT STORAGE store all keys and columns using Composite Types. They also store some additional columns you may not expect. If you want to interop with Thrift-based APIs like

Re: Secondary Indexes

2013-03-19 Thread aaron morton
> Assuming we have 1000 columns in 1 row of the column family and about 900 of them have NamedColumn1=1 and of those 900 only 10 of them also have NamedColumn2=1. I am assuming you mean 1,000 rows, not columns. > does Cassandra optimize this in any way by fetching only the 10 versus the 9

Overheads in fetching many (500K) rows for a partitionID

2013-03-19 Thread Pushkar Prasad
With the following schema: - TimeStamp - Device ID - Device Name - Device Owner - Device Color PKEY (TimeStamp, DeviceID) Each record is 40 bytes. I'm trying to fetch all the rows for a particular TimeStamp (partitionID). Select * from schema where TimeStamp = '.' There ar
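A common workaround for the timeout is to page through the partition by the clustering column instead of asking for all 500K rows at once. A rough Java sketch over the Thrift execute_cql3_query call follows; the table and column names (sensor_data, ts, deviceid, devicename), the text types and the page size are all assumptions for illustration:

import java.nio.ByteBuffer;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Compression;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.CqlResult;
import org.apache.cassandra.thrift.CqlRow;

public class PartitionPager {
    private static final int PAGE_SIZE = 10000; // assumed page size

    // Reads one wide partition in pages, restarting each page after the last deviceid seen.
    public static void readPartition(Cassandra.Client client, String ts) throws Exception {
        String lastDeviceId = null;
        while (true) {
            String cql = (lastDeviceId == null)
                ? String.format("SELECT deviceid, devicename FROM sensor_data WHERE ts = '%s' LIMIT %d", ts, PAGE_SIZE)
                : String.format("SELECT deviceid, devicename FROM sensor_data WHERE ts = '%s' AND deviceid > '%s' LIMIT %d",
                                ts, lastDeviceId, PAGE_SIZE);

            CqlResult result = client.execute_cql3_query(
                    ByteBuffer.wrap(cql.getBytes("UTF-8")), Compression.NONE, ConsistencyLevel.ONE);
            if (result.getRows().isEmpty()) {
                break; // no more rows in this partition
            }
            for (CqlRow row : result.getRows()) {
                // First selected column is deviceid (assumed to be a text column here).
                lastDeviceId = new String(row.getColumns().get(0).getValue(), "UTF-8");
                // ... process the row ...
            }
            if (result.getRows().size() < PAGE_SIZE) {
                break; // last page
            }
        }
    }
}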