Re: EC2 storage options for C*

2016-01-31 Thread Jeff Jirsa
Hi John,

We run using 4TB GP2 volumes, which guarantee 10k IOPS. Even at 1M writes per 
second on 60 nodes, we didn't come close to hitting even 50% utilization (10k 
is more than enough for most workloads). PIOPS is not necessary.



From:  John Wong
Reply-To:  "user@cassandra.apache.org"
Date:  Saturday, January 30, 2016 at 3:07 PM
To:  "user@cassandra.apache.org"
Subject:  Re: EC2 storage options for C*

For production I'd stick with ephemeral disks (aka instance storage) if you 
are running a lot of transactions. 
However, for a regular small testing/QA cluster, or something you know you want 
to reload often, EBS is definitely good enough, and we haven't had issues 99% 
of the time. The 1% is an anomaly where we had flushes blocked.

But Jeff, kudos that you are able to use EBS. I didn't go through the video; do 
you actually use PIOPS or just standard GP2 in your production cluster?

On Sat, Jan 30, 2016 at 1:28 PM, Bryan Cheng  wrote:
Yep, that motivated my question "Do you have any idea what kind of disk 
performance you need?". If you need the performance, it's hard to beat ephemeral 
SSD in RAID 0 on EC2, and it's a solid, battle-tested configuration. If you 
don't, though, EBS GP2 will save a _lot_ of headache.

Personally, on small clusters like ours (12 nodes), we've found our choice of 
instance dictated much more by the balance of price, CPU, and memory. We're 
using GP2 SSD and we find that for our patterns the disk is rarely the 
bottleneck. YMMV, of course.

On Fri, Jan 29, 2016 at 7:32 PM, Jeff Jirsa  wrote:
If you have to ask that question, I strongly recommend m4 or c4 instances with 
GP2 EBS.  When you don’t care about replacing a node because of an instance 
failure, go with i2+ephemerals. Until then, GP2 EBS is capable of amazing 
things, and greatly simplifies life.

We gave a talk on this topic at both Cassandra Summit and AWS re:Invent: 
https://www.youtube.com/watch?v=1R-mgOcOSd4 It’s very much a viable option, 
despite any old documents online that say otherwise.



From: Eric Plowe
Reply-To: "user@cassandra.apache.org"
Date: Friday, January 29, 2016 at 4:33 PM
To: "user@cassandra.apache.org"
Subject: EC2 storage options for C*

My company is planning on rolling out a C* cluster in EC2. We are thinking 
about going with ephemeral SSDs. The question is this: should we put two in 
RAID 0 or just go with one? We currently run a cluster in our data center with 
two 250 GB Samsung 850 EVOs in RAID 0, and we are happy with the performance we 
are seeing thus far.

Thanks!

Eric
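[Editorial note] A RAID 0 array over two ephemeral SSDs is typically assembled with mdadm. The sketch below is illustrative only: the device names (/dev/xvdb, /dev/xvdc), filesystem, and mount point are assumptions, not anything stated in this thread; check `lsblk` for the actual device names on your instance type.

```shell
# Sketch: stripe two ephemeral SSDs into RAID 0 and mount them for Cassandra.
# Ephemeral volumes lose their contents on instance stop/start, so this belongs
# in first-boot provisioning (e.g. cloud-init), not a one-off manual step.
sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/xvdb /dev/xvdc
sudo mkfs.ext4 -L cassandra /dev/md0          # XFS is another common choice
sudo mkdir -p /var/lib/cassandra
sudo mount LABEL=cassandra /var/lib/cassandra
echo 'LABEL=cassandra /var/lib/cassandra ext4 defaults,noatime 0 2' \
  | sudo tee -a /etc/fstab
```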







Re: EC2 storage options for C*

2016-01-31 Thread Jack Krupansky
How about reads? Any differences between read-intensive and write-intensive
workloads?

-- Jack Krupansky



Re: EC2 storage options for C*

2016-01-31 Thread Jeff Jirsa
Also in that video (it's long, but worth watching): we tested up to 1M 
reads/second as well, blowing out the page cache to ensure we weren't "just" 
reading from memory.



-- 
Jeff Jirsa


> On Jan 31, 2016, at 9:52 AM, Jack Krupansky  wrote:
> 
> How about reads? Any differences between read-intensive and write-intensive 
> workloads?
> 
> -- Jack Krupansky




Re: EC2 storage options for C*

2016-01-31 Thread Eric Plowe
Thank you all for the suggestions. I'm torn between GP2 and ephemeral. After
testing, GP2 is a viable contender for our workload. The only worry I have
is EBS outages, which have happened.

On Sunday, January 31, 2016, Jeff Jirsa  wrote:

> Also in that video - it's long but worth watching
>
> We tested up to 1M reads/second as well, blowing out page cache to ensure
> we weren't "just" reading from memory
>
>
>
> --
> Jeff Jirsa


Re: EC2 storage options for C*

2016-01-31 Thread Jeff Jirsa
You're free to choose what you'd like, but EBS outages were also addressed in 
that video (second half, discussion by Dennis Opacki). 2016 EBS isn't the same 
as 2011 EBS.

-- 
Jeff Jirsa


> On Jan 31, 2016, at 8:27 PM, Eric Plowe  wrote:
> 
> Thank you all for the suggestions. I'm torn between GP2 vs Ephemeral. GP2 
> after testing is a viable contender for our workload. The only worry I have 
> is EBS outages, which have happened. 




Re: EC2 storage options for C*

2016-01-31 Thread Eric Plowe
Jeff,

If EBS goes down, then EBS GP2 will go down as well, no? I'm not
discounting EBS, but prior outages are worrisome.

On Sunday, January 31, 2016, Jeff Jirsa  wrote:

> Free to choose what you'd like, but EBS outages were also addressed in
> that video (second half, discussion by Dennis Opacki). 2016 EBS isn't the
> same as 2011 EBS.
>
> --
> Jeff Jirsa


Re: EC2 storage options for C*

2016-01-31 Thread Jeff Jirsa
Yes, but the real point is why you think EBS is going down. New GM in 2011. 
Very different product. 35:40 in the video.


-- 
Jeff Jirsa


> On Jan 31, 2016, at 9:57 PM, Eric Plowe  wrote:
> 
> Jeff,
> 
> If EBS goes down, then EBS Gp2 will go down as well, no? I'm not discounting 
> EBS, but prior outages are worrisome.




Re: EC2 storage options for C*

2016-01-31 Thread Eric Plowe
http://m.theregister.co.uk/2013/08/26/amazon_ebs_cloud_problems/

That's what I'm worried about. Granted, that's an article from 2013, and while
general-purpose EBS volumes are performant for a production C* workload, I'm
worried about EBS outages. If EBS is down, my cluster is down.

On Monday, February 1, 2016, Jeff Jirsa  wrote:

> Yes, but getting at why you think EBS is going down is the real point. New
> GM in 2011. Very different product. 35:40 in the video
>
>
> --
> Jeff Jirsa


Can't select count(*)

2016-01-31 Thread Ivan Zelensky
Hi all! I have a table with a simple primary key (a single-column primary key)
and ~1 million records. The table is stored on a single-node C* 2.2.4.
Problem: when I try to execute "SELECT count(*) FROM my_table;", the
operation times out.
As I understand it, 1 million rows is not such a big dataset that MapReduce is
needed to count it, so I think something is wrong with the configuration.
I also can't do "COPY TO/FROM" when the dataset is over 30,000 rows.
The hardware may also be too weak (AWS EC2 t2.small), but even on a t2.large
I hit a timeout on COPY with just 300,000 rows.

My configuration is the default config from the deb package. Does anybody know
what I should tweak?

Thank you.
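
[Editorial note] A full-table `SELECT count(*)` scans every row, so on a small node it often exceeds the client-side cqlsh timeout long before the server finishes. One thing worth trying is raising that timeout in cqlshrc. This is a sketch only: the option name has varied across cqlsh versions (client_timeout in older releases, request_timeout in newer ones), so verify against your version, and the keyspace name is a placeholder.

```shell
# Sketch: raise the cqlsh client-side timeout (in seconds) so a long-running
# count can finish. Both option spellings are shown because the name changed
# between cqlsh versions; keep whichever one your version recognizes.
cat >> ~/.cassandra/cqlshrc <<'EOF'
[connection]
client_timeout = 3600
request_timeout = 3600
EOF
cqlsh -e "SELECT count(*) FROM my_keyspace.my_table;"
```

For large tables, counting from a driver by paging through token ranges scales better than a single count(*) query.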