Jeff: the gp2 drives are expensive, especially if you have to make them unnecessarily large to get the IOPS, and I want to get as cheap per node as possible so I can run as many nodes as possible. An i3 + a cheap rust backup beats an m5 or similar + EBS gp2 on cost when I did the numbers.
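For the gp2 sizing point, a quick back-of-envelope sketch: gp2 earns a baseline of 3 IOPS per GiB (floor of 100, cap of 16,000), so an IOPS target sets a minimum volume size no matter how small the data is. The ~$0.10/GiB-month price below is my assumption (roughly us-east-1 at the time), not a figure from this thread:

    # gp2 baseline performance: 3 IOPS per GiB, min 100 IOPS, max 16,000.
    # The price per GiB-month is an assumption, not quoted in the thread.
    GP2_PRICE_PER_GIB_MONTH = 0.10

    def gp2_gib_for_iops(target_iops: int) -> int:
        """Smallest gp2 size (GiB) whose baseline IOPS meets the target."""
        capped = min(max(target_iops, 100), 16_000)
        return -(-capped // 3)  # ceiling division

    for iops in (1_000, 3_000, 9_000):
        gib = gp2_gib_for_iops(iops)
        cost = gib * GP2_PRICE_PER_GIB_MONTH
        print(f"{iops:>6} IOPS -> {gib:>5} GiB gp2 (~${cost:,.0f}/month/node)")

At 9,000 IOPS that is ~3 TB of gp2 per node even if the node only holds 100 GB of data, which is the "unnecessarily large" problem in a nutshell.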
Ben: Going to S3 would be even cheaper and probably about the same speed. I think I was avoiding it because of the network cost and throttling/not throttling, but if it is cheap enough vs the rust EBS then I'll do that. I think I came across your page when doing earlier research.

Jon: I have my own thing that is very similar to Medusa but supports our wonky various modes of access (bastions, ipv6, etc.). Very similar, with comparable incremental backups and the like. The backups run at scheduled times, but my rewrite would enable a more local strategy by watching the sstable dirs (roughly the idea sketched below). The restore modes of Medusa are better in some respects, but I can do more complicated things too. I'm trying to abstract the access mode (k8s/ssh/etc.), the cloud, and even the tech (Kafka/Cassandra) in a rewrite, and it is damn hard to avoid leaky abstractions.
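A minimal sketch of the "watch the sstable dirs" idea Jon describes, not his actual tool: it uses the watchdog library, assumes the default Cassandra data directory, and treats a new -Data.db component as the signal that a flush or compaction finished. The uploader is a placeholder for the real copy/upload step:

    # Sketch only, assuming the default data dir layout; not Jon's tool.
    # pip install watchdog
    import queue
    import threading
    from watchdog.events import FileSystemEventHandler
    from watchdog.observers import Observer

    DATA_DIR = "/var/lib/cassandra/data"  # assumption: default layout
    work = queue.Queue()

    class SSTableWatcher(FileSystemEventHandler):
        def on_created(self, event):
            # sstable components share a generation prefix; the -Data.db
            # file appearing is a reasonable (not bulletproof) signal
            # that a flush or compaction has completed
            if not event.is_directory and event.src_path.endswith("-Data.db"):
                work.put(event.src_path)

    def uploader():
        while True:
            path = work.get()
            print("would back up:", path)  # placeholder for the real copy
            work.task_done()

    threading.Thread(target=uploader, daemon=True).start()
    observer = Observer()
    observer.schedule(SSTableWatcher(), DATA_DIR, recursive=True)
    observer.start()
    observer.join()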
Reid: possibly we could, but the EBS snapshot needs to do the 100 GBs every time, while the various sstable copies/incremental backups just do the new files, so the raw amount of bits being saved is just faster and more resilient (though for completeness, a minimal snapshot call is sketched at the very end of this thread).

Thank you everyone; at least with all you bigwigs giving advice I can argue from appeal to authority to management :-) (which is always more effective than arguing from reason or evidence)

On Fri, Dec 6, 2019 at 9:18 AM Reid Pinchback <[email protected]> wrote:

> Correction: "most of your database will be in chunk cache, or buffer
> cache anyways."
>
> *From:* Reid Pinchback <[email protected]>
> *Date:* Friday, December 6, 2019 at 10:16 AM
> *To:* "[email protected]" <[email protected]>
> *Subject:* Re: AWS ephemeral instances + backup
>
> If you're only going to have a small storage footprint per node, like
> 100 GB, another option comes to mind. Use an instance type with large
> RAM. Use an EBS storage volume on an EBS-optimized instance type, and
> take EBS snapshots. Most of your database will be in chunk cache
> anyways, so you only need to make sure that the dirty background
> writer is keeping up. I'd take a look at iowait during a snapshot and
> see if the results are acceptable for a running node. Even if it is
> marginal, if you're only snapshotting one node at a time, then
> speculative retry would just skip over the temporary slowpoke.
>
> *From:* Carl Mueller <[email protected]>
> *Date:* Thursday, December 5, 2019 at 3:21 PM
> *To:* "[email protected]" <[email protected]>
> *Subject:* AWS ephemeral instances + backup
>
> Does anyone have experience with tooling written to support this
> strategy?
>
> Use case: run Cassandra on i3 instances on ephemerals, but synchronize
> the sstables and commitlog files to the cheapest EBS volume type
> (those have bad IOPS but decent enough throughput).
>
> On node replace, the startup script for the node back-copies the
> sstables and commitlog state from the EBS to the ephemeral.
>
> As can be seen here:
> https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html
>
> the (presumably) spinning rust tops out at 2375 MB/sec (using multiple
> EBS volumes, presumably). That would incur about a ten-minute delay
> for node replacement for a 1 TB node, but I imagine this would only be
> used on higher-IOPS r/w nodes with smaller densities, so 100 GB would
> be only about a minute of delay, already within the timeframe of an
> AWS node replacement/instance restart.
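For what it's worth, the startup-script half of that strategy is small. A hedged sketch, assuming the EBS mirror and ephemeral mount points below (both placeholders), with rsync doing the actual copying:

    # Sketch of the node-replace back-copy, not production tooling.
    # Mount points are assumptions; run this before starting Cassandra.
    import os
    import subprocess

    EBS_MIRROR = "/mnt/ebs-backup"      # cheap st1/sc1 volume (placeholder)
    EPHEMERAL = "/var/lib/cassandra"    # i3 NVMe ephemeral (placeholder)

    def restore_if_fresh() -> None:
        data_dir = os.path.join(EPHEMERAL, "data")
        if os.path.isdir(data_dir) and os.listdir(data_dir):
            return  # ephemeral already populated: a normal restart
        for sub in ("data", "commitlog"):
            # trailing slashes: copy directory contents, not the dir itself
            subprocess.run(
                ["rsync", "-a", f"{EBS_MIRROR}/{sub}/", f"{EPHEMERAL}/{sub}/"],
                check=True,
            )

    if __name__ == "__main__":
        restore_if_fresh()

The same rsync pair, reversed, is the ongoing ephemeral-to-EBS sync; since sstables are immutable, rsync only moves new files, which is the incremental property being argued for above.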

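And, for completeness, the snapshot call behind Reid's EBS suggestion: a minimal boto3 sketch in which the region and volume id are placeholders. Per Reid's note, you'd want to watch iowait while the snapshot is pending:

    # Minimal EBS snapshot via boto3; region and volume id are placeholders.
    # pip install boto3
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")  # region assumed
    resp = ec2.create_snapshot(
        VolumeId="vol-0123456789abcdef0",  # placeholder: the data volume
        Description="cassandra data volume snapshot",
    )
    print("snapshot started:", resp["SnapshotId"], resp["State"])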