Write performance expectations...

2013-02-13 Thread kadey
Hello, 
New member here, and I have (yet another) question on write performance. 

I'm using Apache Cassandra version 1.1, Python 2.7 and Pycassa 1.7. 

I have a cluster of 2 datacenters, each with 3 nodes, on AWS EC2 using EBS and 
the RandomPartitioner. I'm writing to a column family in a keyspace that's 
replicated to all nodes in both datacenters, with a consistency level of 
LOCAL_QUORUM. 

I'm seeing write performance of around 2500 rows per second. 

Is this in the ballpark for this kind of configuration? 

Thanks in advance. 




Ken 


Re: Write performance expectations...

2013-02-13 Thread kadey
I'm not using multiple threads or processes. I'll try multithreading to see if I 
get a boost. 

Thanks. 


Ken 


- Original Message -
From: "Tyler Hobbs"  
To: user@cassandra.apache.org 
Sent: Wednesday, February 13, 2013 11:06:30 AM 
Subject: Re: Write performance expectations... 


2500 inserts per second is about what a single Python thread using pycassa can 
do against a local node. Are you using multiple threads for the inserts? 
Multiple processes? 
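A minimal sketch of what "multiple threads for the inserts" can look like in plain Python (it runs on 2.7 or 3). It deliberately avoids pycassa: `insert_row` is a hypothetical stand-in that sleeps about 1 ms to imitate a network round trip, so the rates it prints describe the stub, not a cluster. In a real client you would replace the stub with your actual insert call.

```python
import threading
import time


def insert_row(key, columns):
    """Hypothetical stand-in for a real client insert call.

    Sleeps about 1 ms to imitate one network round trip; replace it with
    your actual insert.
    """
    time.sleep(0.001)


def worker(rows, counter, lock):
    # Each thread inserts its own share of the rows, one after another.
    for key, columns in rows:
        insert_row(key, columns)
        with lock:
            counter[0] += 1


def threaded_insert(rows, num_threads):
    """Split rows round-robin over num_threads threads.

    Returns (rows inserted, aggregate rows per second of wall-clock time).
    """
    chunks = [rows[i::num_threads] for i in range(num_threads)]
    counter = [0]
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(chunk, counter, lock))
               for chunk in chunks]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.time() - start
    rate = (counter[0] / elapsed) if elapsed > 0 else 0.0
    return counter[0], rate


if __name__ == "__main__":
    rows = [("key%d" % i, {"batchID": str(i)}) for i in range(1000)]
    for n in (1, 4, 16):
        inserted, rate = threaded_insert(rows, n)
        print("%2d threads: %d rows, %.0f rows/s" % (n, inserted, rate))
```

Threads help here because each insert spends most of its time waiting on the network. In CPython the GIL still serializes the client-side Python work (building and serializing requests), which is one reason multiple processes can scale further than threads.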




On Wed, Feb 13, 2013 at 8:21 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote: 



Is there a particular reason you are using EBS? Instance store volumes are 
recommended because they improve performance by reducing I/O throttling. 


Another thing you should be aware of is that replicating the data to all nodes 
reduces your performance: performance-wise, it is more or less as if you had 
only one node. 
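A back-of-the-envelope sketch of why replicating to every node behaves like a single node for write throughput. Cassandra sends every write to all replicas (the consistency level only decides how many acknowledgements the coordinator waits for), so with evenly balanced data each node applies roughly W * RF_total / N writes per second. The function below assumes balanced tokens and ignores hints, compaction and other background load.

```python
def writes_per_node(cluster_writes_per_sec, replicas_per_dc, nodes_per_dc):
    """Approximate replica writes each node applies per second.

    Assumes every write reaches all replicas in every datacenter and that
    data is spread evenly over the nodes (balanced tokens).
    """
    total_replicas = sum(replicas_per_dc)
    total_nodes = sum(nodes_per_dc)
    return cluster_writes_per_sec * total_replicas / float(total_nodes)


# This thread's setup: 2 DCs x 3 nodes, every row replicated to every node.
print(writes_per_node(2500, [3, 3], [3, 3]))  # 2500.0: each node applies every write
# For comparison, a hypothetical RF of 2 per DC on the same six nodes:
print(writes_per_node(2500, [2, 2], [3, 3]))
```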


Also, writing to different datacenters probably induces some network latency. 


If you want some feedback about the 2500 w/s, you should give the EC2 instance 
type (m1.xlarge / m1.large / ...) and also the mean size of your rows. 


Alain 



-- 
Tyler Hobbs 
DataStax 


Re: Write performance expectations...

2013-02-14 Thread kadey

> Is there a particular reason you are using EBS? Instance store volumes are 
> recommended because they improve performance by reducing I/O throttling. 


> If you want some feedback about the 2500 w/s, you should give the EC2 
> instance type (m1.xlarge / m1.large / ...) and also the mean size of your 
> rows. 

The cluster was set up before I came onto the project, so I'm trying to get 
answers to these questions. 


> Another thing you should be aware of is that replicating the data to all 
> nodes reduces your performance: performance-wise, it is more or less as if 
> you had only one node. 


> Also, writing to different datacenters probably induces some network latency. 

My understanding of how LOCAL_QUORUM works is that the insert request only 
waits for (in my case) 2 nodes in the local datacenter to report a successful 
write. 
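The "2 nodes" follows from the quorum formula, floor(RF / 2) + 1, which LOCAL_QUORUM applies to the replication factor of the coordinator's own datacenter only. A small sketch using the settings from this thread (3 replicas in each of 2 datacenters; the DC names are hypothetical):

```python
def quorum(replication_factor):
    """Number of replica acknowledgements needed for a quorum."""
    return replication_factor // 2 + 1


def local_quorum(replicas_per_dc, local_dc):
    """LOCAL_QUORUM: quorum over the coordinator's own datacenter only."""
    return quorum(replicas_per_dc[local_dc])


# Hypothetical DC names; 3 replicas in each of the 2 datacenters.
rf = {"dc1": 3, "dc2": 3}
print(local_quorum(rf, "dc1"))   # 2
print(quorum(sum(rf.values())))  # 4, what plain QUORUM waits for across both DCs
```

The write is still sent to the remote datacenter's replicas; LOCAL_QUORUM just doesn't wait for their acknowledgements, so the cross-DC copy costs network and disk work on the cluster but should not add to the latency the client waits for.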


Ken 


- Original Message -
From: "Alain RODRIGUEZ"  
To: user@cassandra.apache.org 
Sent: Wednesday, February 13, 2013 9:21:18 AM 
Subject: Re: Write performance expectations... 




Re: Write performance expectations...

2013-02-14 Thread kadey
Using multithreading, with each thread inserting 2000 rows, resulted in no 
throughput increase. Each thread takes about 4 seconds to finish, indicating a 
bottleneck elsewhere. 
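Working through these numbers: 2000 rows in about 4 seconds is 500 rows/s per thread, a fifth of the ~2500 rows/s seen with a single thread. If the aggregate really stayed near 2500 rows/s, that is what roughly five threads sharing a fixed ceiling would produce. The message doesn't say how many threads were used, so the thread count below is inferred, not reported:

```python
def per_thread_rate(rows, seconds):
    """Rows per second achieved by one thread."""
    return rows / float(seconds)


def implied_threads(aggregate_rate, thread_rate):
    """How many threads at thread_rate add up to aggregate_rate."""
    return aggregate_rate / float(thread_rate)


r = per_thread_rate(2000, 4)
print(r)                          # 500.0 rows/s per thread
print(implied_threads(2500, r))   # 5.0 threads, if the aggregate is still ~2500 rows/s
```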




Ken 


Re: Write performance expectations...

2013-02-14 Thread kadey

Alain, 
I found out that the client node is an m1.small, and the Cassandra nodes are 
m1.large. 

This is what is contained in each row: {dev1-dc1r-redir-0.unica.net/B9tk: 
{batchID: 2486272}}. Not a whole lot of data. 
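For the row-size question, a rough sketch of the raw payload of the example row. Assumptions: only the key, column name and value bytes are counted, the value is taken to be the 7-character string "2486272" (it might instead be stored as an 8-byte integer), and Cassandra's per-column storage overhead such as the timestamp is ignored.

```python
row_key = "dev1-dc1r-redir-0.unica.net/B9tk"
columns = {"batchID": "2486272"}  # assumption: value stored as a string


def raw_payload_bytes(key, cols):
    """Bytes of key + column names + values, UTF-8 encoded; no storage overhead."""
    total = len(key.encode("utf-8"))
    for name, value in cols.items():
        total += len(name.encode("utf-8")) + len(value.encode("utf-8"))
    return total


print(raw_payload_bytes(row_key, columns))  # 46 (32 + 7 + 7)
```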



If you don't use EBS, how is data persistence maintained in the event that an 
instance goes down for whatever reason? 

Ken 
- Original Message -
From: "Alain RODRIGUEZ"  
To: user@cassandra.apache.org 
Sent: Thursday, February 14, 2013 8:34:06 AM 
Subject: Re: Write performance expectations... 


Hi Ken, 


You really should take a look at my first answer and give us more 
information: at least the size of your inserts and the EC2 instance type you 
are using. You should also consider using instance store rather than EBS. 
Please go back over all the points I already raised. 


Alain 



2013/2/14 Peter Lin <wool...@gmail.com> 


It could be that the instances are I/O-limited. 

I've been running benchmarks with Cassandra 1.1.9 for the last 2 weeks on 
an AMD FX 8-core machine with 32 GB of RAM. 

With 24 threads I get roughly 20K inserts per second. Each insert is 
only about 100-150 bytes. 
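Putting Peter's figures in the same units as the rest of the thread (a sketch of the arithmetic only; the 24 threads, 20K inserts/s and 100-150 byte range are his numbers):

```python
threads = 24
inserts_per_sec = 20000

per_thread = inserts_per_sec / float(threads)
low_mb_s = inserts_per_sec * 100 / 1e6   # payload at 100 bytes per insert
high_mb_s = inserts_per_sec * 150 / 1e6  # payload at 150 bytes per insert

print("%.0f inserts/s per thread" % per_thread)             # 833 inserts/s per thread
print("%.1f-%.1f MB/s of payload" % (low_mb_s, high_mb_s))  # 2.0-3.0 MB/s of payload
```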



On Thu, Feb 14, 2013 at 8:07 AM, <ka...@comcast.net> wrote: 
> Using multithreading, with each thread inserting 2000 rows, resulted in no 
> throughput increase. Each thread takes about 4 seconds to finish, indicating a 
> bottleneck elsewhere. 
> 
> Ken 





Re: data model advice needed

2013-02-27 Thread kadey
One possibility would be to use dynamic columns, with each column name being a 
composite built from a timestamp, and the value of each column containing the 
serialized JSON of the details. The host could be the row key. You could then 
slice the data by column name, that is, by time range. 
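A minimal in-memory sketch of this layout, in plain Python so it runs without a cluster (it is not pycassa code). Assumptions: timestamps are integers such as epoch seconds, and a hypothetical sequence number is added to the composite column name so two events in the same second don't overwrite each other; with a real composite comparator you would choose your own second component. The per-host time slice comes from the sorted column names; filters on other fields, such as source IP or severity from the SQL example, run client-side on the JSON values.

```python
import bisect
import itertools
import json


class WideRowStore(object):
    """In-memory toy of the layout: one row per host, columns sorted by name.

    Column names are (timestamp, seq) composites; seq is a hypothetical
    tie-breaker so that two events with the same timestamp don't overwrite
    each other. Column values are the event details serialized as JSON.
    """

    def __init__(self):
        self._rows = {}                 # host -> (sorted names, values)
        self._seq = itertools.count()

    def insert(self, host, ts, details):
        names, values = self._rows.setdefault(host, ([], []))
        name = (ts, next(self._seq))
        i = bisect.bisect_right(names, name)  # keep columns sorted by name
        names.insert(i, name)
        values.insert(i, json.dumps(details))

    def slice(self, host, ts_start, ts_end, **filters):
        """Events for one host with ts_start <= ts <= ts_end.

        The range is found by binary search over the sorted column names,
        roughly what a column slice returns; extra equality filters such as
        severity="high" are applied client-side to the decoded JSON.
        """
        names, values = self._rows.get(host, ([], []))
        lo = bisect.bisect_left(names, (ts_start,))
        hi = bisect.bisect_left(names, (ts_end, float("inf")))
        result = []
        for name, value in zip(names[lo:hi], values[lo:hi]):
            event = json.loads(value)
            if all(event.get(k) == v for k, v in filters.items()):
                result.append((name[0], event))
        return result


store = WideRowStore()
store.insert("fw1", 200, {"sourceip": "10.0.0.2", "severity": "low"})
store.insert("fw1", 100, {"sourceip": "10.0.0.1", "severity": "high"})
store.insert("fw1", 200, {"sourceip": "10.0.0.1", "severity": "low"})
store.insert("fw2", 150, {"sourceip": "10.0.0.1", "severity": "high"})
print(store.slice("fw1", 100, 200, sourceip="10.0.0.1"))
```

The trade-off in this sketch is that only the host plus time-range part of the query is served by the layout itself; anything else is filtered after reading the slice, so a very selective filter still reads the whole time range for that host.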




Ken 
- Original Message -
From: "Hans-Peter Sloot"  
To: user@cassandra.apache.org 
Sent: Wednesday, February 27, 2013 1:01:24 PM 
Subject: data model advice needed 

Hi, 

I would like to get some advice on how to model column families for storing 
firewall logs. 
The columns are listed further below. 
All the possibilities (super columns, secondary indexes, etc.) confuse me a bit. 

My main question is how I can create the column family so that I can 
get slices of data by the timestamp column, combined with 
getting this data for a specific host or some other column. 

In SQL this would be: select * from traffic where ts between ... and .. and host 
= 'xxx' and source_ip = 'xx.xx.xx.xx' and severity = 'xx' 

Probably any combination can be useful (the slice/between and host are 
probably the most important). 

Hopefully someone can shed some light. 

Regards Hans-Peter 

CREATE COLUMNFAMILY traffic 
(key uuid primary key, 
host varchar, 
facility varchar, 
priority varchar, 
severity varchar, 
tag varchar, 
ts timestamp, 
program varchar, 
msg varchar, 
protocol varchar, 
policy varchar, 
sourcezone varchar, 
sourceip varchar, 
destzone varchar, 
destip varchar, 
destport varchar 
); 





This e-mail and the documents attached are confidential and intended solely for 
the addressee; it may also be privileged. If you receive this e-mail in error, 
please notify the sender immediately and destroy it. As its integrity cannot be 
secured on the Internet, the Atos Nederland B.V. group liability cannot be 
triggered for the message content. Although the sender endeavours to maintain a 
computer virus-free network, the sender does not warrant that this transmission 
is virus-free and will not be liable for any damages resulting from any virus 
transmitted. On all offers and agreements under which Atos Nederland B.V. 
supplies goods and/or services of whatever nature, the Terms of Delivery from 
Atos Nederland B.V. exclusively apply. The Terms of Delivery shall be promptly 
submitted to you on your request. 

Atos Nederland B.V. / Utrecht 
KvK Utrecht 30132762