cassandra performance

2013-03-24 Thread 张刚
Hello, I am new to Cassandra. I did some tests on a single machine. I installed Cassandra from a binary tarball distribution. I created a CF to store the data that I get from MySQL. The CF has the same fields as the table in MySQL, so it looks like a table. I do the same select from the CF in Cassandra and

Re: create secondary index on column family

2013-03-24 Thread Xu Renjie
Thanks, aaron. On Sun, Mar 24, 2013 at 1:45 AM, aaron morton wrote: > But an error is thrown saying "can not parse name as hex bytes". > > If the comparator is Bytes then the column names need to be a hex string. > > The easiest thing to do is create a CF where the comparator is UTF8Type so > you

Re: create secondary index on column family

2013-03-24 Thread Xu Renjie
On Sun, Mar 24, 2013 at 1:45 AM, aaron morton wrote: > But an error is thrown saying "can not parse name as hex bytes". > > If the comparator is Bytes then the column names need to be a hex string. > > The easiest thing to do is create a CF where the comparator is UTF8Type so > you can use string c

Re: cassandra performance

2013-03-24 Thread dong.yajun
Hello, I'd suggest you take a look at the difference between NoSQL and RDBMS. Best, On Sun, Mar 24, 2013 at 5:15 PM, 张刚 wrote: > Hello, > I am new to Cassandra.I do some test on a single machine. I install > Cassandra with a binary tarball distribution. > I create a CF to store the data that

Re: cassandra performance

2013-03-24 Thread cem
Hi, Could you provide some more details about your schema design and queries? It is very hard to tell anything otherwise. Regards, Cem On Sun, Mar 24, 2013 at 12:40 PM, dong.yajun wrote: > Hello, > > I'd suggest you to take look at the difference between Nosql and RDMS. > > Best, > > On Sun, Mar 24, 2

TimeUUID Order Partitioner

2013-03-24 Thread Carlos Pérez Miguel
Hi, I store rows in my system where the key is a version 1 UUID (TimeUUID). I would like to keep the rows ordered by time. I know that in this case it is recommended to use an external CF where column names are UUIDs ordered by time. But in my use case this is not possible, so I would like to use a c

Re: cassandra performance

2013-03-24 Thread 张刚
For example, each row represents a job record; it has fields like "user", "site", "CPUTime", "datasize", "JobType"... The fields in the CF are fixed, just like a table. The query looks like this: "select CPUTime,User,site from CF(or tablename) where user=xxx and Jobtype=xxx" Best regards 2013/3/24 cem > Hi, > > Co

Re: cassandra performance

2013-03-24 Thread Derek Williams
The biggest advantage of Cassandra is its ability to scale linearly as more nodes are added and its ability to handle node failures. Also, to get the maximum performance from Cassandra you need to be making multiple requests in parallel. On Sun, Mar 24, 2013 at 3:15 AM, 张刚 wrote: > Hello, > I am
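The point about issuing multiple requests in parallel can be illustrated with a minimal sketch. It does not talk to Cassandra at all: `fetch_job` is a hypothetical stand-in that only sleeps to simulate one round trip, and the worker count is an arbitrary assumption. It only shows why overlapping requests reduces wall-clock time compared with a serial loop.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_job(job_id: int) -> dict:
    """Hypothetical stand-in for one read request; sleeps to mimic latency."""
    time.sleep(0.05)
    return {"job_id": job_id, "CPUTime": job_id * 10}


def fetch_serial(job_ids):
    """Issue the requests one after another."""
    return [fetch_job(j) for j in job_ids]


def fetch_parallel(job_ids, workers=8):
    """Issue the requests concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_job, job_ids))


if __name__ == "__main__":
    ids = list(range(16))

    start = time.perf_counter()
    fetch_serial(ids)
    print(f"serial:   {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    fetch_parallel(ids)
    print(f"parallel: {time.perf_counter() - start:.2f}s")
```

A single-threaded benchmark against one node mostly measures per-request latency, which is why a test like the one described in this thread can make Cassandra look slow.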

Re: Many to one type of replication.

2013-03-24 Thread aaron morton
> From this mailing list I found this Github project that is doing something > similar by looking at the commit logs: > https://github.com/carloscm/cassandra-commitlog-extract IMHO tailing the logs is fragile, and you may be better off handling it at the application level. > But is there other

Re: Lots of Deleted Rows Came back after upgrade 1.1.6 to 1.1.10

2013-03-24 Thread aaron morton
> I could imagine a scenario where a hint was replayed to a replica after all > replicas had purged their tombstones Scratch that, the hints are TTL'd with the lowest gc_grace. Ticket closed https://issues.apache.org/jira/browse/CASSANDRA-5379 Cheers - Aaron Morton Freelance Ca

Re: Stream fails during repair, two nodes out-of-memory

2013-03-24 Thread aaron morton
> compaction needs some disk I/O. Slowing down our compaction will improve > overall system performance. Of course, you don't want to go too slow and fall > behind too much. In this case I was thinking of the memory use. Compaction tasks are a bit like a storm of reads. If you are having problem

Re: Observation on shuffling vs adding/removing nodes

2013-03-24 Thread aaron morton
> We initially tried to run a shuffle, however it seemed to be going really > slowly (very little progress by watching "cassandra-shuffle ls | wc -l" after > 5-6 hours and no errors in logs), My guess is that shuffle is not designed to be as efficient as possible, as it is only used once. Was it con

Re: High disk I/O during reads

2013-03-24 Thread aaron morton
> Device:      tps   kB_read/s   kB_wrtn/s   kB_read   kB_wrtn
> xvdap1      0.13        0.00        1.07         0        16
> xvdb      474.20    13524.53       25.33    202868       380
> xvdc      469.87    13455.73       30.40    201836       456
Perchan

Re: Cassandra - conflict resolution for column updates with identical timestamp

2013-03-24 Thread aaron morton
It's always been like that, see https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/Column.java#L231 Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 24/03/2013, at 4:18 PM, dong.yajun wrote:
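For readers who do not want to follow the link, here is a rough Python sketch of how two versions of a column might be reconciled. It reflects my reading of the Cassandra 1.x `Column.reconcile` logic and is an assumption, not a copy of the Java source: a tombstone wins a timestamp tie, and between two live values with the same timestamp the larger value wins. The `Cell` class is hypothetical and exists only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    """Hypothetical column version; value=None marks a tombstone."""
    value: Optional[bytes]
    timestamp: int


def reconcile(a: Cell, b: Cell) -> Cell:
    """Pick the surviving version of a column (assumed rules, see lead-in)."""
    # Tombstones take precedence when timestamps tie.
    if a.value is None:
        return b if a.timestamp < b.timestamp else a
    if b.value is None:
        return a if a.timestamp > b.timestamp else b
    # Both live with identical timestamps: break the tie by comparing values.
    # Note: Python compares bytes as unsigned; the Java comparison may differ
    # for bytes >= 0x80, so this sketch sticks to ASCII values.
    if a.timestamp == b.timestamp:
        return b if a.value < b.value else a
    # Otherwise the newer write wins.
    return b if a.timestamp < b.timestamp else a


if __name__ == "__main__":
    print(reconcile(Cell(b"apple", 100), Cell(b"banana", 100)))
    print(reconcile(Cell(b"apple", 100), Cell(None, 100)))
```

The practical consequence is that two clients writing different values with exactly the same timestamp do not get last-write-wins semantics; the outcome is deterministic but depends on the values themselves.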

Re: create secondary index on column family

2013-03-24 Thread aaron morton
> > I tried to wrap 'name' to bytes('name'), but it would throw "can not parse > FUNCTION_CALL as hex bytes", seems this does not work. What was the statement you used and what was the error? > So the stored bytes are the same, right? Yes. - Aaron Morton Freelance Cassandra

Re: TimeUUID Order Partitioner

2013-03-24 Thread aaron morton
The best thing to do is start with a look at ByteOrderedPartitioner and AbstractByteOrderedPartitioner. You'll want to create a new TimeUUIDToken that extends Token and a new UUIDPartitioner that extends AbstractPartitioner<> Usual disclaimer that ordered partitioners cause problems with load balanc
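To see why a dedicated token type would be needed rather than plain byte ordering, here is a small Python sketch. It is not Cassandra code: `make_time_uuid` is a hypothetical helper that builds a version 1 UUID from a raw 60-bit timestamp, with the clock sequence and node set to arbitrary fixed values. It shows that sorting version 1 UUIDs by their raw bytes does not sort them by time, because the low 32 bits of the timestamp come first in the byte layout.

```python
import uuid


def make_time_uuid(ts_100ns: int) -> uuid.UUID:
    """Hypothetical helper: build a version 1 UUID from a 60-bit timestamp."""
    time_low = ts_100ns & 0xFFFFFFFF
    time_mid = (ts_100ns >> 32) & 0xFFFF
    time_hi_version = ((ts_100ns >> 48) & 0x0FFF) | 0x1000  # version 1
    # clock_seq_hi_variant=0x80 selects the RFC 4122 variant; node is arbitrary.
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0x00, 1))


def sort_by_bytes(ids):
    """Order a byte-ordered partitioner would see."""
    return sorted(ids, key=lambda u: u.bytes)


def sort_by_time(ids):
    """Order a time-aware token comparison would need to produce."""
    return sorted(ids, key=lambda u: u.time)


if __name__ == "__main__":
    earlier = make_time_uuid(0xFFFFFFFF)      # time_low is all ones
    later = make_time_uuid(0x1_0000_0000)     # time_low wraps to zero
    print([u.time for u in sort_by_bytes([earlier, later])])
    print([u.time for u in sort_by_time([earlier, later])])
```

A custom token along the lines suggested above would therefore have to compare on the reassembled timestamp (what `uuid.UUID.time` exposes in Python), not on the serialized bytes.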

Re: cassandra performance

2013-03-24 Thread aaron morton
> "select CPUTime,User,site from CF(or tablename) where user=xxx and > Jobtype=xxx" Even though Cassandra has tables and looks like an RDBMS, it's not. Queries with multiple secondary index clauses will not perform as well as those with none. There is plenty of documentation here http://www.da

Re: Backup strategies in a multi DC cluster

2013-03-24 Thread aaron morton
> There are advantages and disadvantages in both approaches. What are people > doing in their production systems? Generally a mix of snapshots+rsync or https://github.com/synack/tablesnap to get things off node. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand

Re: Backup strategies in a multi DC cluster

2013-03-24 Thread Jabbar Azam
Thanks Aaron. I have a hypothetical question. Assume you have four nodes and a snapshot is taken. The following day, if a node goes down and data is corrupt through user error, then how do you use the previous night's snapshots? Would you replace the faulty node first and then restore last night's

Re: create secondary index on column family

2013-03-24 Thread Xu Renjie
On Mon, Mar 25, 2013 at 1:35 AM, aaron morton wrote: > I tried to wrap 'name' to bytes('name'), but it would throw "can not parse >> FUNCTION_CALL as hex bytes", seems this does not work. >> >>> What was the statement you used and what was the error. > OK, I have tried using ascii code 6e616d65(n
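The hex form mentioned above can be produced and checked with a couple of lines of Python. This is only a sketch of the encoding step, on the assumption that the column name is ASCII/UTF-8 text; the helper names are made up for illustration and nothing here calls Cassandra.

```python
def to_hex_name(name: str) -> str:
    """Encode a text column name as the hex string a Bytes comparator expects."""
    return name.encode("utf-8").hex()


def from_hex_name(hex_name: str) -> str:
    """Decode a hex column name back to text (assumes it was UTF-8)."""
    return bytes.fromhex(hex_name).decode("utf-8")


if __name__ == "__main__":
    print(to_hex_name("name"))           # 6e616d65
    print(from_hex_name("6e616d65"))     # name
```

The stored bytes are identical either way; the difference is only in how the column name has to be written when the comparator is Bytes rather than UTF8Type.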

Re: Observation on shuffling vs adding/removing nodes

2013-03-24 Thread Andrew Bialecki
Wouldn't shock me if shuffle wasn't all that performant (and no knock on shuffle... our case is somewhat specific). We added 3 nodes with num_tokens=256 and it worked great; the load was evenly spread. On Sun, Mar 24, 2013 at 1:14 PM, aaron morton wrote: > We initially tried to run a shuffle, howev