cassandra performance
Hello, I am new to Cassandra. I ran some tests on a single machine, installing Cassandra from a binary tarball distribution. I created a CF to store data taken from MySQL; the CF has the same fields as the table in MySQL, so it looks like a table. I ran the same select against the CF in Cassandra and against the table in MySQL, and I found that MySQL's processing time is better than Cassandra's. So I wonder what the advantages of Cassandra are compared to MySQL, and how to improve the performance of Cassandra. Is this the right way to use Cassandra?
Re: create secondary index on column family
Thanks, Aaron. On Sun, Mar 24, 2013 at 1:45 AM, aaron morton wrote: > But an error is thrown saying "can not parse name as hex bytes". > > If the comparator is Bytes then the column names need to be a hex string. > > The easiest thing to do is create a CF where the comparator is UTF8Type so > you can use string column names. I wonder how I can change a name to a hex string. I tried "ab" and it's OK, but if I enter "abcd" it also throws "can not parse as hex bytes". So is there a way to convert a string to its hex form in cassandra-cli, or some other feasible way? > just that the UTF8Type needs to be validated before storing the data into > the database and BytesType does not? > > It takes *very* little additional effort. So the stored bytes are the same, right? > Cheers > - > Aaron Morton > Freelance Cassandra Consultant > New Zealand > > @aaronmorton > http://www.thelastpickle.com > > On 23/03/2013, at 12:10 AM, Xu Renjie wrote: > > Sorry, continued: > I have created a column family User with no parameters specified, just > create column family User. > Then I checked that the default comparator is BytesType. > > Then I wanted to create a secondary index on one column like below: > update column family User with column_metadata=[{column_name:name, > validation_class:BytesType, index_type:0}]; > But an error is thrown saying "can not parse name as hex bytes". > > So I wonder, in this situation, is it possible to create the index using > cassandra-cli, and if so, how? > > Furthermore, I wonder what the difference is between BytesType, UTF8Type, > and the other types underneath. > If I store the string 'name' into the database, do they have the same internal > bytes stored in Cassandra, > just that UTF8Type needs to be validated before storing the data into the > database and BytesType does not? > > On Fri, Mar 22, 2013 at 7:00 PM, Xu Renjie wrote: >> Hello, guys: >> I am new to Cassandra. I am currently using cassandra-cli (version >> 1.1.6). >
Re: create secondary index on column family
On Sun, Mar 24, 2013 at 1:45 AM, aaron morton wrote: > But an error is thrown saying "can not parse name as hex bytes". > > If the comparator is Bytes then the column names need to be a hex string. > > The easiest thing to do is create a CF where the comparator is UTF8Type so > you can use string column names. > > And currently our column families all use BytesType; since Cassandra cannot update the comparator, it's not easy to change to UTF8Type. I tried to wrap 'name' as bytes('name'), but it would throw "can not parse FUNCTION_CALL as hex bytes", so this does not seem to work. > just that the UTF8Type needs to be validated before storing the data into > the database and BytesType does not? > > It takes *very* little additional effort. > > Cheers > > - > Aaron Morton > Freelance Cassandra Consultant > New Zealand > > @aaronmorton > http://www.thelastpickle.com > > On 23/03/2013, at 12:10 AM, Xu Renjie wrote: > > Sorry, continued: > I have created a column family User with no parameters specified, just > create column family User. > Then I checked that the default comparator is BytesType. > > Then I wanted to create a secondary index on one column like below: > update column family User with column_metadata=[{column_name:name, > validation_class:BytesType, index_type:0}]; > But an error is thrown saying "can not parse name as hex bytes". > > So I wonder, in this situation, is it possible to create the index using > cassandra-cli, and if so, how? > > Furthermore, I wonder what the difference is between BytesType, UTF8Type, > and the other types underneath. > If I store the string 'name' into the database, do they have the same internal > bytes stored in Cassandra, > just that UTF8Type needs to be validated before storing the data into the > database and BytesType does not? > > On Fri, Mar 22, 2013 at 7:00 PM, Xu Renjie wrote: >> Hello, guys: >> I am new to Cassandra. I am currently using cassandra-cli (version >> 1.1.6). >
Re: cassandra performance
Hello, I'd suggest you take a look at the differences between NoSQL and an RDBMS. Best, On Sun, Mar 24, 2013 at 5:15 PM, 张刚 wrote: > Hello, > I am new to Cassandra. I ran some tests on a single machine, installing > Cassandra from a binary tarball distribution. > I created a CF to store data taken from MySQL. The CF has the same > fields as the table in MySQL, so it looks like a table. > I ran the same select against the CF in Cassandra and the table in MySQL, and I > found that MySQL's processing time is better than Cassandra's. > So I wonder what the advantages of Cassandra are compared to MySQL and how to > improve the performance of Cassandra. > Is this the right way to use Cassandra? > > -- *Ric Dong*
Re: cassandra performance
Hi, Could you provide some more details about your schema design and queries? It is very hard to say anything without them. Regards, Cem On Sun, Mar 24, 2013 at 12:40 PM, dong.yajun wrote: > Hello, > > I'd suggest you take a look at the differences between NoSQL and an RDBMS. > > Best, > > On Sun, Mar 24, 2013 at 5:15 PM, 张刚 wrote: > >> Hello, >> I am new to Cassandra. I ran some tests on a single machine, installing >> Cassandra from a binary tarball distribution. >> I created a CF to store data taken from MySQL. The CF has the same >> fields as the table in MySQL, so it looks like a table. >> I ran the same select against the CF in Cassandra and the table in MySQL, and >> I found that MySQL's processing time is better than Cassandra's. >> So I wonder what the advantages of Cassandra are compared to MySQL and how to >> improve the performance of Cassandra. >> Is this the right way to use Cassandra? >> >> > > > -- > *Ric Dong* > >
TimeUUID Order Partitioner
Hi, I store in my system rows where the key is a version-1 UUID (TimeUUID), and I would like to keep the rows ordered by time. I know that in this case it is recommended to use an external CF whose column names are UUIDs ordered by time, but in my use case this is not possible, so I would like to use a custom Partitioner instead. If I use the ByteOrderedPartitioner, rows are not correctly ordered because of the way a UUID stores its timestamp. What is needed in order to implement my own Partitioner? Thank you. Carlos Pérez Miguel
Re: cassandra performance
For example, each row represents a job record; it has fields like "user", "site", "CPUTime", "datasize", "JobType"... The fields in the CF are fixed, just like a table. The query looks like this: "select CPUTime,User,site from CF(or tablename) where user=xxx and Jobtype=xxx" Best regards 2013/3/24 cem > Hi, > > Could you provide some more details about your schema design and queries? > It is very hard to say anything without them. > > Regards, > Cem > > > On Sun, Mar 24, 2013 at 12:40 PM, dong.yajun wrote: > >> Hello, >> >> I'd suggest you take a look at the differences between NoSQL and an RDBMS. >> >> Best, >> >> On Sun, Mar 24, 2013 at 5:15 PM, 张刚 wrote: >> >>> Hello, >>> I am new to Cassandra. I ran some tests on a single machine, installing >>> Cassandra from a binary tarball distribution. >>> I created a CF to store data taken from MySQL. The CF has the same >>> fields as the table in MySQL, so it looks like a table. >>> I ran the same select against the CF in Cassandra and the table in MySQL, >>> and I found that MySQL's processing time is better than Cassandra's. >>> So I wonder what the advantages of Cassandra are compared to MySQL and how >>> to improve the performance of Cassandra. >>> Is this the right way to use Cassandra? >>> >>> >> >> >> -- >> *Ric Dong* >> >> >
Re: cassandra performance
The biggest advantages of Cassandra are its ability to scale linearly as more nodes are added and its ability to handle node failures. Also, to get maximum performance from Cassandra you need to be making multiple requests in parallel. On Sun, Mar 24, 2013 at 3:15 AM, 张刚 wrote: > Hello, > I am new to Cassandra. I ran some tests on a single machine, installing > Cassandra from a binary tarball distribution. > I created a CF to store data taken from MySQL. The CF has the same > fields as the table in MySQL, so it looks like a table. > I ran the same select against the CF in Cassandra and the table in MySQL, and I > found that MySQL's processing time is better than Cassandra's. > So I wonder what the advantages of Cassandra are compared to MySQL and how to > improve the performance of Cassandra. > Is this the right way to use Cassandra? > > -- Derek Williams
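[To make the parallel-request point concrete, here is a minimal Java sketch that fans independent reads out over a thread pool and collects the results. The fetch() call is a hypothetical stand-in for whatever client library is in use (Hector, Astyanax, raw Thrift, ...); only the fan-out pattern is the point.]

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ParallelReads {
        // Hypothetical stand-in for a real Cassandra client read.
        static String fetch(String key) {
            return "row-for-" + key;
        }

        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(16);
            List<Future<String>> futures = new ArrayList<>();
            // Submit all reads first so they run concurrently...
            for (String key : Arrays.asList("k1", "k2", "k3")) {
                futures.add(pool.submit(() -> fetch(key)));
            }
            // ...then block for the results.
            for (Future<String> f : futures) {
                System.out.println(f.get());
            }
            pool.shutdown();
        }
    }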
Re: Many to one type of replication.
> From this mailing list I found this GitHub project that is doing something > similar by looking at the commit logs: > https://github.com/carloscm/cassandra-commitlog-extract IMHO tailing the logs is fragile, and you may be better off handling it at the application level. > But are there other options around using a custom replication strategy? There is no such thing as "one directional" replication, e.g. replicating everything from DC 1 to DC 2 but never from DC 2 back to DC 1. You may be better off reducing the number of clusters and then running one transactional and one analytical DC. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 24/03/2013, at 3:42 AM, Francois Richard wrote: > Hi, > > We currently run our Cassandra deployment with multiple independent clusters. > The clusters are totally self-contained in terms of redundancy and independent > from each other. We have a "sharding" layer higher in our stack to dispatch > the requests to the right application stack, and this stack connects to its > associated Cassandra cluster. All the Cassandra clusters are identical in > terms of hosted keyspaces, column families, replication factor ... > > At this point I am investigating ways to build a central Cassandra cluster > that could contain all the data from all the other Cassandra clusters, and I > am wondering how best to do it. The goal is to have a global view of our > data and to be able to do some massive crunching on it. > > For sure we can build some ETL type of job that would figure out the data > that was updated, extract it, and load it into the central Cassandra cluster. > From this mailing list I found this GitHub project that is doing something > similar by looking at the commit logs: > https://github.com/carloscm/cassandra-commitlog-extract > > But are there other options around using a custom replication strategy? Any > other general suggestions? > > Thanks, > > FR > > -- > _ > > Francois Richard > > >
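[To illustrate the transactional-plus-analytical layout Aaron describes: replication between DCs is declared on the keyspace, and both DCs receive every write, which is exactly why it cannot be one-directional. A hedged cassandra-cli sketch; keyspace and data-centre names are purely illustrative.]

    create keyspace MyApp
      with placement_strategy = 'NetworkTopologyStrategy'
      and strategy_options = {Transactional : 3, Analytics : 1};

The Analytics DC then holds a full copy of the data for heavy crunching, while clients reading and writing at LOCAL_QUORUM in the transactional DC are insulated from that load.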
Re: Lots of Deleted Rows Came back after upgrade 1.1.6 to 1.1.10
> I could imagine a scenario where a hint was replayed to a replica after all > replicas had purged their tombstones Scratch that, the hints are TTL'd with the lowest gc_grace. Ticket closed: https://issues.apache.org/jira/browse/CASSANDRA-5379 Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 24/03/2013, at 6:24 AM, aaron morton wrote: >> Beside the joke, would hinted handoff really have any role in this issue? > I could imagine a scenario where a hint was replayed to a replica after all > replicas had purged their tombstones. That seems like a long shot; it would > need one node to be down for the write and all up for the delete, and for all > of them to have purged the tombstone. But maybe we should have a max age on > hints so it cannot happen. > > Created https://issues.apache.org/jira/browse/CASSANDRA-5379 > > Ensuring no hints are in place during an upgrade would work around it. I tend to > make sure hints and the commit log are clear during an upgrade. > > Cheers > > - > Aaron Morton > Freelance Cassandra Consultant > New Zealand > > @aaronmorton > http://www.thelastpickle.com > > On 22/03/2013, at 7:54 AM, Arya Goudarzi wrote: > >> Beside the joke, would hinted handoff really have any role in this issue? I >> have been struggling to reproduce this issue using the snapshot data taken >> from our cluster and following the same upgrade process from 1.1.6 to >> 1.1.10. I know snapshots only link to active SSTables. What if these >> returned rows belong to some inactive SSTables and some bug exposed itself >> and marked them as active? What are the possibilities that could lead to >> this? I am eager to find out, as this is blocking our upgrade. >> >> On Tue, Mar 19, 2013 at 2:11 AM, wrote: >> This obscure feature of Cassandra is called "haunted handoff". >> >> >> >> Happy (early) April Fools :) >> >> >> >> From: aaron morton [mailto:aa...@thelastpickle.com] >> Sent: Monday, March 18, 2013 7:45 PM >> To: user@cassandra.apache.org >> Subject: Re: Lots of Deleted Rows Came back after upgrade 1.1.6 to 1.1.10 >> >> >> >> As you see, this node thinks lots of ranges are out of sync, which shouldn't >> be the case as successful repairs were done every night prior to the >> upgrade. >> >> Could this be explained by writes occurring during the upgrade process? >> >> >> >> I found this bug which touches timestamps and tombstones and was fixed in >> 1.1.10, but I am not 100% sure if it could be related to this issue: >> https://issues.apache.org/jira/browse/CASSANDRA-5153 >> >> Me neither, but the issue was fixed in 1.1.10 >> >> >> >> It appears that the repair task that I executed after the upgrade brought back >> lots of deleted rows to life. >> >> Was it entire rows or columns in a row? >> >> Do you know if row level or column level deletes were used? >> >> >> >> Can you look at the data in cassandra-cli and confirm the timestamps on the >> columns make sense? >> >> >> >> Cheers >> >> >> >> - >> >> Aaron Morton >> >> Freelance Cassandra Consultant >> >> New Zealand >> >> >> >> @aaronmorton >> >> http://www.thelastpickle.com >> >> >> >> On 16/03/2013, at 2:31 PM, Arya Goudarzi wrote: >> >> >> >> >> Hi, >> >> >> >> I have upgraded our test cluster from 1.1.6 to 1.1.10, followed by running >> repairs. It appears that the repair task that I executed after the upgrade >> brought back lots of deleted rows to life.
Here are some logistics: >> >> >> >> - The upgraded cluster started from 1.1.1 -> 1.1.2 -> 1.1.5 -> 1.1.6 >> >> - Old cluster: 4 nodes, C* 1.1.6 with RF3 using NetworkTopology; >> >> - Upgraded to: 1.1.10 with all other settings the same; >> >> - Successful repairs were being done on this cluster every night; >> >> - Our clients use nanosecond-precision timestamps for cassandra calls; >> >> - After the upgrade, while running repair I saw some log messages like this on >> one node: >> >> >> >> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:54,847 >> AntiEntropyService.java (line 1022) [repair >> #0990f320-8da9-11e2--e9b2bd8ea1bd] Endpoints /XX.194.60 and >> /23.20.207.56 have 2223 range(s) out of sync for App >> >> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:54,877 >> AntiEntropyService.java (line 1022) [repair >> #0990f320-8da9-11e2--e9b2bd8ea1bd] Endpoints /XX.250.43 and >> /23.20.207.56 have 161 range(s) out of sync for App >> >> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:55,097 >> AntiEntropyService.java (line 1022) [repair >> #0990f320-8da9-11e2--e9b2bd8ea1bd] Endpoints /XX.194.60 and >> /23.20.250.43 have 2294 range(s) out of sync for App >> >> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:59,190 >> AntiEntropyService.java (line 789) [repair >> #0990f320-8da9-11e2--e9b2bd8ea1bd] App is fully synced (13 remaining colum
Re: Stream fails during repair, two nodes out-of-memory
> compaction needs some disk I/O. Slowing down our compaction will improve > overall system performance. Of course, you don't want to go too slow and fall > behind too much. In this case I was thinking of the memory use. Compaction tasks are a bit like a storm of reads; if you are having problems with memory management, all those reads can result in increased GC. > It looks like we hit OOM when repair starts streaming > multiple cfs simultaneously. Odd. It's not very memory intensive. > I'm wondering if I should throttle streaming, and/or repair only one > CF at a time. Decreasing stream_throughput_outbound_megabits_per_sec may help, if the goal is just to get repair working. You may also want to increase phi_convict_threshold to 12; this will make it harder for a node to get marked as down, which can be handy when GC is causing problems and you have under-powered nodes. If the node is marked as down, the repair session will fail instantly. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 24/03/2013, at 9:12 AM, Dane Miller wrote: > On Fri, Mar 22, 2013 at 5:58 PM, Wei Zhu wrote: >> compaction needs some disk I/O. Slowing down our compaction will improve >> overall system performance. Of course, you don't want to go too slow and fall behind >> too much. > > Hmm. Even after making the suggested configuration changes, repair > still fails with OOM (but only one node died this time, which is an > improvement). It looks like we hit OOM when repair starts streaming > multiple cfs simultaneously. Just prior to OOM, the node loses > contact with another node in the cluster and starts storing hints. > > I'm wondering if I should throttle streaming, and/or repair only one > CF at a time. > >> From: "Dane Miller" >> Subject: Re: Stream fails during repair, two nodes out-of-memory >> >> On Thu, Mar 21, 2013 at 10:28 AM, aaron morton >> wrote: >>> heap of 1867M is kind of small. According to the discussion on this list, >>> it's advisable to have m1.xlarge. >>> >>> +1 >>> >>> In cassandra-env.sh set the MAX_HEAP_SIZE to 4GB, and the NEW_HEAP_SIZE to >>> 400M >>> >>> In the yaml file set >>> >>> in_memory_compaction_limit_in_mb to 32 >>> compaction_throughput_mb_per_sec to 8 >>> concurrent_compactors to 2 >>> >>> This will slow down compaction a lot. You may want to restore some of these >>> settings once you have things stable. >>> >>> You have an under-powered box for what you are trying to do. >> >> Thanks very much for the info. Have made the changes and am retrying. >> I'd like to understand, why does it help to slow compaction? >> >> It does seem like the cluster is under-powered to handle our >> application's full write load plus repairs, but it operates fine >> otherwise. >> >> On Wed, Mar 20, 2013 at 8:47 PM, Wei Zhu wrote: >>> It's clear you are out of memory. How big is your data size? >> >> 120 GB per node, of which 50% is actively written/updated, and 50% is >> read-mostly. >> >> Dane >>
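[Collected as a cassandra.yaml excerpt, the knobs discussed in this thread look like the following. The values are the ones suggested above for an under-powered node, not general recommendations, and the stream throughput figure is illustrative, something to lower only while getting repair through.]

    # cassandra.yaml (1.1.x) -- illustrative values from this thread
    in_memory_compaction_limit_in_mb: 32
    compaction_throughput_mb_per_sec: 8
    concurrent_compactors: 2
    # lower to throttle repair streaming (default is 400)
    stream_throughput_outbound_megabits_per_sec: 200
    # harder to mark a GC-pausing node as down
    phi_convict_threshold: 12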
Re: Observation on shuffling vs adding/removing nodes
> We initially tried to run a shuffle, however it seemed to be going really > slowly (very little progress by watching "cassandra-shuffle ls | wc -l" after > 5-6 hours and no errors in logs), My guess is that shuffle is not designed to be as efficient as possible, since it is only used once. Was it continuing to make progress? > so we cancelled it and instead added 3 nodes to the cluster, waited for them > to bootstrap, and then decommissioned the first 3 nodes. You added the 3 nodes with num_tokens set in the yaml file? What does nodetool status say? Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 24/03/2013, at 9:41 AM, Andrew Bialecki wrote: > Just curious if anyone has any thoughts on something we've observed in a > small test cluster. > > We had around 100 GB of data on a 3 node cluster (RF=2) and wanted to start > using vnodes. We upgraded the cluster to 1.2.2 and then followed the > instructions for using vnodes. We initially tried to run a shuffle, however > it seemed to be going really slowly (very little progress by watching > "cassandra-shuffle ls | wc -l" after 5-6 hours and no errors in logs), so we > cancelled it and instead added 3 nodes to the cluster, waited for them to > bootstrap, and then decommissioned the first 3 nodes. The total process took > about 3 hours. My assumption is that the final result is the same in terms of > data distributed somewhat randomly across nodes (assuming no bias in the > token ranges selected when bootstrapping a node). > > If that assumption is correct, the observation would be that, if possible, adding > nodes and then removing nodes appears to be a faster way to shuffle data for > small clusters. Obviously not always possible, but I thought I'd just throw > this out there in case anyone runs into a similar situation. This cluster is > unsurprisingly on EC2 instances, which made provisioning and shutting down > nodes extremely easy. > > Cheers, > Andrew
Re: High disk I/O during reads
> Device:    tps     kB_read/s   kB_wrtn/s   kB_read     kB_wrtn
> xvdap1     0.13    0.00        1.07        0           16
> xvdb       474.20  13524.53    25.33       202868380
> xvdc       469.87  13455.73    30.40       201836456
Perchance, are you running on m1.large instances? You may be seeing the "moderate" IO performance they offer. > The query is looking for a maximum of 50 messages between two dates, in > reverse order. This is probably the slowest query you can do. Removing the reverse would make it faster, as would using a reversed comparator. You mentioned that most queries in cfhistograms were hitting one SSTable. If you are comfortable with the output from that, I would work out the distribution for read latency; the read_latency in cfstats is the most recent only. A lot of deletes on a row can slow down reads as well. It does not sound like the case here, but I wanted to mention it. The next thing to look at is the proxyhistograms; that will give you the end-to-end request latency in Cassandra, which will help identify whether the issue is the disk read or network / something else. The something else may be DigestMismatch errors if you have been dropping writes and not running repair / have disabled hints. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 24/03/2013, at 2:59 PM, Matt Kap wrote: > Having battled similar issues with read latency recently, here are some > general things to look out for. > > - At 118ms, something is definitely broken. You should be looking at > under 10ms or lower, depending on hardware. > - Do "nodetool info" on all 5 nodes. Is the load distributed evenly? > Is it reasonable (under 500GB)? > - Make sure you aren't running low on heap space. You can see that > from "nodetool info" also. If you are running low, very bad things > begin to happen (lots of GC, constant flushing of Memtables, reduction > of Key Cache, etc). Generally, once there, the node doesn't recover, > and read latency goes to sh*t. > - Which compaction strategy are you using? Leveled compaction or > size-tiered? There are different issues with both. > - Is your Key Cache turned on? What's the Key Cache hit rate? > - Is the Read Latency the same on all nodes? Or just one in particular? > - Are pending compactions building up? > - What's %util on disk? Same on all nodes? > > I would go through "nodetool cfstats, info, compactionstats, tpstats" > and see if things are roughly the same across all the nodes. You could > also just be under capacity, but more likely there's an actual > problem looming somewhere. > > Cheers! > -Matt > > On Sat, Mar 23, 2013 at 3:18 AM, wrote: >> You can try to disable readahead on the cassandra data disk. >> >> Jon Scarborough wrote: >>> >>> Checked tpstats; there are very few dropped messages. >>> >>> Checked histograms. Mostly nothing surprising. The vast majority of rows >>> are small, and most reads only access one or two SSTables. >>> >>> What I did discover is that of our 5 nodes, one is performing well, with >>> disk I/O in the ballpark that seems reasonable. The other 4 nodes are doing >>> roughly 4x the disk I/O per second. Interestingly, the node that is >>> performing well also seems to be servicing about twice the number of reads >>> that the other nodes are. >>> >>> I compared configuration between the node performing well and those that >>> aren't, and so far haven't found any discrepancies. >>> >>> On Fri, Mar 22, 2013 at 10:43 AM, Wei Zhu wrote: According to your cfstats, read latency is over 100 ms, which is really really slow.
I am seeing less than 3ms reads for my cluster, which is on SSD. Can you also check nodetool cfhistograms? It tells you more about the number of SSTables involved and the read/write latency; sometimes the average doesn't tell you the whole story. Also check your nodetool tpstats: are there a lot of dropped reads? -Wei - Original Message - From: "Jon Scarborough" To: user@cassandra.apache.org Sent: Friday, March 22, 2013 9:42:34 AM Subject: Re: High disk I/O during reads Key distribution across SSTables probably varies a lot from row to row in our case. Most reads would probably only need to look at a few SSTables; a few might need to look at more. I don't yet have a deep understanding of C* internals, but I would imagine even the more expensive use cases would involve something like this: 1) Check the index for each SSTable to determine if part of the row is there. 2) Look at the endpoints of the slice to determine if the data in a particular SSTable is relevant to the query. 3) Read the chunks of those SSTables, working backwards from the end of the slice until enough columns have been read to satisfy the limit clause
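[Summarizing the diagnostics suggested in this thread as concrete invocations; host, keyspace, and CF names are placeholders.]

    # per-CF latency distributions and SSTables-per-read
    nodetool -h localhost cfhistograms MyKeyspace MyCF
    # end-to-end coordinator latency, to separate disk cost
    # from network / digest-mismatch cost
    nodetool -h localhost proxyhistograms
    # pending compactions, dropped messages, per-CF stats
    nodetool -h localhost compactionstats
    nodetool -h localhost tpstats
    nodetool -h localhost cfstats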
Re: Cassandra - conflict resolution for column updates with identical timestamp
It's always been like that, see https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/Column.java#L231 Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 24/03/2013, at 4:18 PM, dong.yajun wrote: > Thanks, Capriolo. > > Umm.. so is there any background or history about this issue? > > On Sun, Mar 24, 2013 at 2:32 AM, Edward Capriolo > wrote: > The value that sorts higher wins; this way it is deterministic. > > > On Sat, Mar 23, 2013 at 12:12 PM, dong.yajun wrote: > Hello, > > I would like to know which write wins in the case of two updates with the > same client timestamp in Cassandra. > > Initial data: KeyA: { col1:"val AA", col2:"val BB", col3:"val CC"} > > Client 1 sends update: KeyA: { col1:"val C1", col2:"val B1"} on Sx > > Client 2 sends update: KeyA: { col1:"val C2", col2:"val B2"} on Sy > > Both updates have the same timestamp. > > -- > Ric Dong > > > > > > -- > Ric Dong > Newegg Ecommerce, MIS department >
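[For readers who don't want to chase the link: the tie-break in that reconcile logic is that the higher timestamp wins, and on a timestamp tie the lexically greater value wins, so every replica resolves the conflict identically. A rough Java paraphrase follows; it is not the actual Cassandra source, which also has to handle tombstones.]

    import java.nio.ByteBuffer;

    final class Cell {
        final long timestamp;
        final ByteBuffer value;

        Cell(long timestamp, ByteBuffer value) {
            this.timestamp = timestamp;
            this.value = value;
        }

        // Deterministic merge: latest timestamp wins; ties go to the
        // lexically greater value, so all replicas pick the same winner.
        Cell reconcile(Cell other) {
            if (timestamp < other.timestamp) return other;
            if (timestamp > other.timestamp) return this;
            return value.compareTo(other.value) < 0 ? other : this;
        }
    }

So in the example above, whichever of "val C1" / "val C2" compares greater wins for col1, regardless of whether Sx or Sy handled the write.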
Re: create secondary index on column family
> > I tried to wrap 'name' to bytes('name'), but it would throw "can not parse > FUNCTION_CALL as hex bytes", seems this does not work. What was the statement you used and what was the error? > So the stored bytes are the same, right? Yes. - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 24/03/2013, at 11:53 PM, Xu Renjie wrote: > > > > On Sun, Mar 24, 2013 at 1:45 AM, aaron morton wrote: > >> But an error is thrown saying "can not parse name as hex bytes". >> >> If the comparator is Bytes then the column names need to be a hex string. >> >> The easiest thing to do is create a CF where the comparator is UTF8Type >> so you can use string column names. >> >> And currently our column families all use BytesType; since Cassandra > cannot update the comparator, it's not easy to change to UTF8Type. > I tried to wrap 'name' as bytes('name'), but it would throw "can not parse > FUNCTION_CALL as hex bytes", so this does not seem to work. > >> just that the UTF8Type needs to be validated before storing the data into >> the database and BytesType does not? >> >> It takes *very* little additional effort. >> >> Cheers >> >> - >> Aaron Morton >> Freelance Cassandra Consultant >> New Zealand >> >> @aaronmorton >> http://www.thelastpickle.com >> >> On 23/03/2013, at 12:10 AM, Xu Renjie wrote: >> >> Sorry, continued: >> I have created a column family User with no parameters specified, just >> create column family User. >> Then I checked that the default comparator is BytesType. >> >> Then I wanted to create a secondary index on one column like below: >> update column family User with column_metadata=[{column_name:name, >> validation_class:BytesType, index_type:0}]; >> But an error is thrown saying "can not parse name as hex bytes". >> >> So I wonder, in this situation, is it possible to create the index using >> cassandra-cli, and if so, how? >> >> Furthermore, I wonder what the difference is between BytesType, UTF8Type, >> and the other types underneath. >> If I store the string 'name' into the database, do they have the same >> internal bytes stored in Cassandra, >> just that UTF8Type needs to be validated before storing the data into the >> database and BytesType does not? >> >> On Fri, Mar 22, 2013 at 7:00 PM, Xu Renjie wrote: >> >>> Hello, guys: >>> I am new to Cassandra. I am currently using cassandra-cli (version >>> 1.1.6). >>> >> >> > >
Re: TimeUUID Order Partitioner
The best thing to do is start with a look at ByteOrderedPartitioner and AbstractByteOrderedPartitioner. You'll want to create a new TimeUUIDToken that extends Token and a new UUIDPartitioner that extends AbstractPartitioner<>. Usual disclaimer: ordered partitioners cause problems with load balancing. Hope that helps. - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 25/03/2013, at 1:12 AM, Carlos Pérez Miguel wrote: > Hi, > > I store in my system rows where the key is a version-1 UUID (TimeUUID), and I would > like to keep the rows ordered by time. I know that in this case it is > recommended to use an external CF whose column names are UUIDs ordered by time, > but in my use case this is not possible, so I would like to use a custom > Partitioner instead. If I use the ByteOrderedPartitioner, rows are not > correctly ordered because of the way a UUID stores its timestamp. What is > needed in order to implement my own Partitioner? > > Thank you. > > Carlos Pérez Miguel
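[The core of the problem is that a version-1 UUID's byte order is not its time order: the low bits of the timestamp are stored first. Whatever Token type the custom partitioner wraps, it needs to compare by the extracted 60-bit timestamp. A minimal Java sketch of that ordering; java.util.UUID.timestamp() is only defined for version-1 UUIDs, and the class and field names here are illustrative, not Cassandra API.]

    import java.util.Comparator;
    import java.util.UUID;

    public class TimeUUIDOrder {
        // Order version-1 UUIDs by their embedded 60-bit timestamp,
        // falling back to the full UUID comparison to break ties.
        static final Comparator<UUID> BY_TIME =
            Comparator.comparingLong(UUID::timestamp)
                      .thenComparing(Comparator.naturalOrder());

        public static void main(String[] args) {
            UUID earlier = UUID.fromString("6b3ce700-94db-11e2-9e96-0800200c9a66");
            UUID later   = UUID.fromString("7f1a2d00-94db-11e2-9e96-0800200c9a66");
            // Raw byte order happens to agree with time order here, but it
            // diverges once time_mid/time_hi change; BY_TIME is always correct.
            System.out.println(BY_TIME.compare(earlier, later) < 0); // true
        }
    }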
Re: cassandra performance
> "select CPUTime,User,site from CF(or tablename) where user=xxx and > Jobtype=xxx" Even thought cassandra has tables and looks like a RDBMS it's not. Queries with multiple secondary index clauses will not perform as well as those with none. There is plenty of documentation here http://www.datastax.com/docs , start with the help on data modelling to get an idea of how Cassandra is different. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 25/03/2013, at 4:32 AM, Derek Williams wrote: > Biggest advantage of Cassandra is it's ability to scale linearly as more > nodes are added and it's ability to handle node failures. > > Also to get the maximum performance from Cassandra you need to be making > multiple requests in parallel. > > > On Sun, Mar 24, 2013 at 3:15 AM, 张刚 wrote: > Hello, > I am new to Cassandra.I do some test on a single machine. I install Cassandra > with a binary tarball distribution. > I create a CF to store the data that get from MySQL. The CF has the same > fields as the table in MySQL. So it looks like a table. > I do the same select from the CF in Cassandra and the table in MySQL,and I > find the processing time of MySQL is better than Cassandra. > So,I wander what are the advantages of Cassandra compare MySQL and how to > improve the performance of Cassandra. > Is this the right way to use Cassandra. > > > > > -- > Derek Williams
Re: Backup strategies in a multi DC cluster
> There are advantages and disadvantages in both approaches. What are people > doing in their production systems? Generally a mix of snapshots+rsync or https://github.com/synack/tablesnap to get things off node. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 23/03/2013, at 4:37 AM, Jabbar Azam wrote: > Hello, > > I've been experimenting with cassandra for quite a while now. > > It's time for me to look at backups but I'm not sure what the best practice > is. I want to be able to recover the data to a point in time before any user > or software errors. > > We will have two datacentres with 4 servers and RF=3. > > Each datacentre will have at most 1.6 TB(includes replication, LZ4 > compression, using test data) of data. That is ten years of data after which > we will start purging. This amounts to about 400MB of data generation per day. > > I've read about users doing snapshots of individual nodes to S3(Netflix) and > I've read about creating virtual datacentres > (http://www.datastax.com/dev/blog/multi-datacenter-replication) where each > virtual datacentre contains a backup node. > > There are advantages and disadvantages in both approaches. What are people > doing in their production systems? > > > > > -- > Thanks > > Jabbar Azam
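[As a sketch of the snapshot+rsync variant mentioned above; paths, host names, and the snapshot tag are illustrative, and tablesnap replaces the rsync step with incremental uploads to S3.]

    # On each node: snapshot a keyspace under a dated tag
    nodetool -h localhost snapshot -t nightly-2013-03-24 MyKeyspace

    # Snapshots are hard links under each CF's data directory;
    # ship them off-node
    rsync -a /var/lib/cassandra/data/MyKeyspace/*/snapshots/nightly-2013-03-24/ \
          backuphost:/backups/$(hostname)/2013-03-24/

Because a snapshot is just hard links to the live SSTables, it is cheap to take; the rsync (or tablesnap upload) is what actually gets the data off the node.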
Re: Backup strategies in a multi DC cluster
Thanks Aaron. I have a hypothetical question. Assume you have four nodes and a snapshot is taken. The following day, if a node goes down and the data is corrupted through user error, how do you use the previous night's snapshots? Would you replace the faulty node first and then restore last night's snapshot? What happens if you don't have a replacement node? You won't be able to restore last night's snapshot. However, if a virtual datacentre consisting of a backup node is used, then the backup node could be used regardless of the number of nodes in the datacentre. Would there be any disadvantages to this approach? Sorry for the questions; I want to understand all the options. On 24 Mar 2013 17:45, "aaron morton" wrote: > There are advantages and disadvantages in both approaches. What are people > doing in their production systems? > > Generally a mix of snapshots+rsync or https://github.com/synack/tablesnap to > get things off node. > > Cheers > > > - > Aaron Morton > Freelance Cassandra Consultant > New Zealand > > @aaronmorton > http://www.thelastpickle.com > > On 23/03/2013, at 4:37 AM, Jabbar Azam wrote: > > Hello, > > I've been experimenting with cassandra for quite a while now. > > It's time for me to look at backups but I'm not sure what the best > practice is. I want to be able to recover the data to a point in time > before any user or software errors. > > We will have two datacentres with 4 servers and RF=3. > > Each datacentre will have at most 1.6 TB (includes replication, LZ4 > compression, using test data) of data. That is ten years of data, after > which we will start purging. This amounts to about 400MB of data generation > per day. > > I've read about users doing snapshots of individual nodes to S3 (Netflix) > and I've read about creating virtual datacentres ( > http://www.datastax.com/dev/blog/multi-datacenter-replication) where each > virtual datacentre contains a backup node. > > There are advantages and disadvantages in both approaches. What are people > doing in their production systems? > > > > > -- > Thanks > > Jabbar Azam > > >
Re: create secondary index on column family
On Mon, Mar 25, 2013 at 1:35 AM, aaron morton wrote: >> I tried to wrap 'name' to bytes('name'), but it would throw "can not parse >> FUNCTION_CALL as hex bytes", seems this does not work. > What was the statement you used and what was the error? OK, I have tried using the hex form 6e616d65 ("name") like below, and it works now: update column family User with column_metadata=[{column_name:6e616d65, validation_class:BytesType, index_type:0}]; However, when I tried using utf8('name') like below: update column family User with column_metadata=[{column_name:utf8('name'), validation_class:BytesType, index_type:0}]; the error "can not parse FUNCTION_CALL as hex bytes" is thrown in the CLI, with no log printed on the server. Is the conversion function not allowed in the "update column family" statement? > So the stored bytes are the same, right? > > Yes. > > - > Aaron Morton > Freelance Cassandra Consultant > New Zealand > > @aaronmorton > http://www.thelastpickle.com > > On 24/03/2013, at 11:53 PM, Xu Renjie wrote: > > On Sun, Mar 24, 2013 at 1:45 AM, aaron morton wrote: > >> But an error is thrown saying "can not parse name as hex bytes". >> >> If the comparator is Bytes then the column names need to be a hex string. >> >> The easiest thing to do is create a CF where the comparator is UTF8Type >> so you can use string column names. >> > And currently our column families all use BytesType; since Cassandra > cannot update the comparator, it's not easy to change to UTF8Type. > I tried to wrap 'name' as bytes('name'), but it would throw "can not parse > FUNCTION_CALL as hex bytes", so this does not seem to work. > >> just that the UTF8Type needs to be validated before storing the data into >> the database and BytesType does not? >> >> It takes *very* little additional effort. >> >> Cheers >> >> - >> Aaron Morton >> Freelance Cassandra Consultant >> New Zealand >> >> @aaronmorton >> http://www.thelastpickle.com >> >> On 23/03/2013, at 12:10 AM, Xu Renjie wrote: >> >> Sorry, continued: >> I have created a column family User with no parameters specified, just >> create column family User. >> Then I checked that the default comparator is BytesType. >> >> Then I wanted to create a secondary index on one column like below: >> update column family User with column_metadata=[{column_name:name, >> validation_class:BytesType, index_type:0}]; >> But an error is thrown saying "can not parse name as hex bytes". >> >> So I wonder, in this situation, is it possible to create the index using >> cassandra-cli, and if so, how? >> >> Furthermore, I wonder what the difference is between BytesType, UTF8Type, >> and the other types underneath. >> If I store the string 'name' into the database, do they have the same internal >> bytes stored in Cassandra, >> just that UTF8Type needs to be validated before storing the data into the >> database and BytesType does not? >> >> On Fri, Mar 22, 2013 at 7:00 PM, Xu Renjie wrote: >> >>> Hello, guys: >>> I am new to Cassandra. I am currently using cassandra-cli (version >>> 1.1.6). >>> >> >> > >
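[For completeness: the hex form the CLI wants is just the UTF-8 bytes of the column name printed as hex digits, which is how "name" becomes 6e616d65. A small Java sketch of that conversion:]

    import java.nio.charset.StandardCharsets;

    public class ToHex {
        // Print the hex form of a string's UTF-8 bytes,
        // e.g. "name" -> 6e616d65.
        public static void main(String[] args) {
            String s = args.length > 0 ? args[0] : "name";
            StringBuilder hex = new StringBuilder();
            for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
                hex.append(String.format("%02x", b));
            }
            System.out.println(hex); // 6e616d65 for "name"
        }
    }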
Re: Observation on shuffling vs adding/removing nodes
It wouldn't shock me if shuffle wasn't all that performant (and no knock on shuffle... our case is somewhat specific). We added the 3 nodes with num_tokens=256 and it worked great; the load was evenly spread. On Sun, Mar 24, 2013 at 1:14 PM, aaron morton wrote: > We initially tried to run a shuffle, however it seemed to be going really > slowly (very little progress by watching "cassandra-shuffle ls | wc -l" > after 5-6 hours and no errors in logs), > > My guess is that shuffle is not designed to be as efficient as possible, since it > is only used once. Was it continuing to make progress? > > so we cancelled it and instead added 3 nodes to the cluster, waited for > them to bootstrap, and then decommissioned the first 3 nodes. > > You added the 3 nodes with num_tokens set in the yaml file? > What does nodetool status say? > > Cheers > - > Aaron Morton > Freelance Cassandra Consultant > New Zealand > > @aaronmorton > http://www.thelastpickle.com > > On 24/03/2013, at 9:41 AM, Andrew Bialecki > wrote: > > Just curious if anyone has any thoughts on something we've observed in a > small test cluster. > > We had around 100 GB of data on a 3 node cluster (RF=2) and wanted to > start using vnodes. We upgraded the cluster to 1.2.2 and then followed the > instructions for using vnodes. We initially tried to run a shuffle, however > it seemed to be going really slowly (very little progress by watching > "cassandra-shuffle ls | wc -l" after 5-6 hours and no errors in logs), so > we cancelled it and instead added 3 nodes to the cluster, waited for them > to bootstrap, and then decommissioned the first 3 nodes. The total process took > about 3 hours. My assumption is that the final result is the same in terms > of data distributed somewhat randomly across nodes (assuming no bias in > the token ranges selected when bootstrapping a node). > > If that assumption is correct, the observation would be that, if possible, > adding nodes and then removing nodes appears to be a faster way to shuffle > data for small clusters. Obviously not always possible, but I thought I'd > just throw this out there in case anyone runs into a similar situation. > This cluster is unsurprisingly on EC2 instances, which made provisioning > and shutting down nodes extremely easy. > > Cheers, > Andrew > > >
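[For reference, enabling vnodes on the new nodes is a single cassandra.yaml setting, applied before each node bootstraps; 256 is the value used here.]

    # cassandra.yaml on each new node, set before it bootstraps
    num_tokens: 256
    # leave initial_token unset when using num_tokens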