Oops... that was supposed to be "not a fan of video"! I have no problem with the guys in the video!
-- Jack Krupansky

On Mon, Feb 1, 2016 at 8:51 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote:

> I'm not a fan of guy - this appears to be the slideshare corresponding to
> the video:
>
> http://www.slideshare.net/AmazonWebServices/bdt323-amazon-ebs-cassandra-1-million-writes-per-second
>
> My apologies if my questions are actually answered in the video or slides;
> I only did a quick scan of the slide text.
>
> I'm curious where the EBS physical devices actually reside - are they in
> the same rack, the same data center, or the same availability zone? People
> try to minimize network latency between nodes, so how exactly does EBS
> avoid network latency?
>
> Did your test use Amazon EBS-optimized instances?
>
> SSD or magnetic, and does it make any difference?
>
> What information is available on EBS performance at peak times, when
> multiple AWS customers have spikes of demand?
>
> Is RAID much of a factor, or any help at all, when using EBS?
>
> How exactly is EBS provisioned in terms of its own HA? With a properly
> configured Cassandra cluster, RF provides HA, so what is the equivalent
> for EBS? If I have RF=3, what assurance is there that those three EBS
> volumes aren't all in the same physical rack?
>
> For multi-data-center operation, what configuration options ensure that
> the EBS volumes for each DC are truly physically separated?
>
> In terms of syncing data for the commit log: if the OS call to sync an
> EBS volume returns, is the commit log data absolutely 100% synced at the
> hardware level on the EBS end, such that a power failure of the systems
> on which the EBS volumes reside still guarantees availability of the
> fsynced data? As well, is return from fsync an absolute guarantee of
> sstable durability when Cassandra is about to delete the commit log,
> including when the two are on different volumes? In practice we would
> like some significant degree of pipelining of data, such as during the
> full processing of flushing memtables, but for the fsync at the end a
> solid guarantee is needed.
>
>
> -- Jack Krupansky
>
> On Mon, Feb 1, 2016 at 12:56 AM, Eric Plowe <eric.pl...@gmail.com> wrote:
>
>> Jeff,
>>
>> If EBS goes down, then EBS GP2 will go down as well, no? I'm not
>> discounting EBS, but prior outages are worrisome.
>>
>>
>> On Sunday, January 31, 2016, Jeff Jirsa <jeff.ji...@crowdstrike.com>
>> wrote:
>>
>>> Feel free to choose what you'd like, but EBS outages were also
>>> addressed in that video (second half, discussion by Dennis Opacki).
>>> 2016 EBS isn't the same as 2011 EBS.
>>>
>>> --
>>> Jeff Jirsa
>>>
>>>
>>> On Jan 31, 2016, at 8:27 PM, Eric Plowe <eric.pl...@gmail.com> wrote:
>>>
>>> Thank you all for the suggestions. I'm torn between GP2 and ephemeral.
>>> After testing, GP2 is a viable contender for our workload. The only
>>> worry I have is EBS outages, which have happened.
>>>
>>> On Sunday, January 31, 2016, Jeff Jirsa <jeff.ji...@crowdstrike.com>
>>> wrote:
>>>
>>>> Also in that video - it's long but worth watching.
>>>>
>>>> We tested up to 1M reads/second as well, blowing out the page cache
>>>> to ensure we weren't "just" reading from memory.
>>>>
>>>>
>>>>
>>>> --
>>>> Jeff Jirsa
>>>>
>>>>
>>>> On Jan 31, 2016, at 9:52 AM, Jack Krupansky <jack.krupan...@gmail.com>
>>>> wrote:
>>>>
>>>> How about reads? Any differences between read-intensive and
>>>> write-intensive workloads?
>>>>
>>>> -- Jack Krupansky
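For concreteness, the commit-log durability question above comes down to
whether the usual append-then-fsync sequence is flushed all the way through
the EBS back end before fsync returns, rather than merely acknowledged by the
guest kernel. Below is a minimal Python sketch of that sequence; the path and
payload are illustrative only, and nothing in it is taken from the talk or
this thread.

# Sketch of the append-then-fsync pattern a commit log relies on. Whether
# a successful fsync() on an EBS-backed filesystem implies the bytes are
# power-safe on the EBS side is exactly the open question asked above.
import os

def durable_append(path: str, payload: bytes) -> None:
    """Append payload and return only after fsync reports success."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, payload)
        os.fsync(fd)  # flush file data and metadata to the device
    finally:
        os.close(fd)
    # If the file was newly created, the directory entry needs its own
    # fsync before the file itself is guaranteed to survive a crash.
    dfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)

if __name__ == "__main__":
    # Illustrative path; Cassandra's real commit log location is set by
    # commitlog_directory in cassandra.yaml.
    durable_append("/tmp/commitlog-segment.log", b"mutation bytes\n")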
>>>> On Sun, Jan 31, 2016 at 3:13 AM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
>>>> wrote:
>>>>
>>>>> Hi John,
>>>>>
>>>>> We run on 4T GP2 volumes, which guarantee 10k IOPS. Even at 1M writes
>>>>> per second on 60 nodes, we didn't come close to hitting even 50%
>>>>> utilization (10k is more than enough for most workloads). PIOPS is
>>>>> not necessary.
>>>>>
>>>>>
>>>>>
>>>>> From: John Wong
>>>>> Reply-To: "user@cassandra.apache.org"
>>>>> Date: Saturday, January 30, 2016 at 3:07 PM
>>>>> To: "user@cassandra.apache.org"
>>>>> Subject: Re: EC2 storage options for C*
>>>>>
>>>>> For production I'd stick with ephemeral disks (aka instance storage)
>>>>> if you are running a lot of transactions.
>>>>> However, for a regular small testing/QA cluster, or something you
>>>>> know you will want to reload often, EBS is definitely good enough,
>>>>> and we haven't had issues 99% of the time. The 1% was an anomaly
>>>>> where we had flushes blocked.
>>>>>
>>>>> But Jeff, kudos on being able to use EBS. I didn't go through the
>>>>> video; do you actually use PIOPS or just standard GP2 in your
>>>>> production cluster?
>>>>>
>>>>> On Sat, Jan 30, 2016 at 1:28 PM, Bryan Cheng <br...@blockcypher.com>
>>>>> wrote:
>>>>>
>>>>>> Yep, that motivated my question "Do you have any idea what kind of
>>>>>> disk performance you need?". If you need the performance, it's hard
>>>>>> to beat ephemeral SSD in RAID 0 on EC2, and it's a solid,
>>>>>> battle-tested configuration. If you don't, though, EBS GP2 will save
>>>>>> a _lot_ of headache.
>>>>>>
>>>>>> Personally, on small clusters like ours (12 nodes), we've found our
>>>>>> choice of instance dictated much more by the balance of price, CPU,
>>>>>> and memory. We're using GP2 SSD and we find that for our patterns
>>>>>> the disk is rarely the bottleneck. YMMV, of course.
>>>>>>
>>>>>> On Fri, Jan 29, 2016 at 7:32 PM, Jeff Jirsa
>>>>>> <jeff.ji...@crowdstrike.com> wrote:
>>>>>>
>>>>>>> If you have to ask that question, I strongly recommend m4 or c4
>>>>>>> instances with GP2 EBS. When you don't care about replacing a node
>>>>>>> because of an instance failure, go with i2 + ephemerals. Until
>>>>>>> then, GP2 EBS is capable of amazing things, and greatly simplifies
>>>>>>> life.
>>>>>>>
>>>>>>> We gave a talk on this topic at both Cassandra Summit and AWS
>>>>>>> re:Invent: https://www.youtube.com/watch?v=1R-mgOcOSd4 It's very
>>>>>>> much a viable option, despite any old documents online that say
>>>>>>> otherwise.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> From: Eric Plowe
>>>>>>> Reply-To: "user@cassandra.apache.org"
>>>>>>> Date: Friday, January 29, 2016 at 4:33 PM
>>>>>>> To: "user@cassandra.apache.org"
>>>>>>> Subject: EC2 storage options for C*
>>>>>>>
>>>>>>> My company is planning on rolling out a C* cluster in EC2. We are
>>>>>>> thinking about going with ephemeral SSDs. The question is this:
>>>>>>> should we put two in RAID 0 or just go with one? We currently run a
>>>>>>> cluster in our data center with two 250 GB Samsung 850 EVOs in
>>>>>>> RAID 0, and we are happy with the performance we are seeing thus
>>>>>>> far.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Eric
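As a back-of-the-envelope check on the "4T GP2 guarantees 10k IOPS" figure
above: the gp2 baseline formula AWS published around the time of this thread
was 3 IOPS per GiB, with a 100 IOPS floor, a 10,000 IOPS per-volume cap, and
burst to 3,000 IOPS for smaller volumes. The Python helper below only encodes
that arithmetic as a sketch; treat the constants as a 2016-era snapshot of
the documentation rather than a current reference.

# gp2 sizing arithmetic, assuming the 2016-era published limits.
IOPS_PER_GIB = 3
BASELINE_FLOOR = 100   # smallest baseline any gp2 volume gets
BASELINE_CAP = 10_000  # per-volume cap at the time of this thread
BURST_IOPS = 3_000     # burst ceiling for volumes below the burst threshold

def gp2_baseline_iops(size_gib: int) -> int:
    """IOPS a gp2 volume of this size can sustain indefinitely."""
    return min(max(BASELINE_FLOOR, IOPS_PER_GIB * size_gib), BASELINE_CAP)

def gp2_peak_iops(size_gib: int) -> int:
    """Peak IOPS, counting burst credits for volumes under ~1 TiB."""
    baseline = gp2_baseline_iops(size_gib)
    return max(baseline, BURST_IOPS) if size_gib < 1_000 else baseline

if __name__ == "__main__":
    print(gp2_baseline_iops(4_096))                    # 4 TiB volume -> 10000 (hits the cap)
    print(gp2_baseline_iops(100), gp2_peak_iops(100))  # 100 GiB -> 300 baseline, 3000 burst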