We're using containers for other reasons, not just Cassandra. Tightly constraining resources means we don't have to worry about Cassandra, the JVM, or Linux doing something silly, using too many resources, and taking down the whole box.

On Sat, Jun 7, 2014 at 8:25 PM, Colin Clark <co...@clark.ws> wrote:

> You won't need containers - running one instance of Cassandra in that
> configuration will hum along quite nicely and will make use of the cores
> and memory.
>
> I'd forget the raid anyway and just mount the disks separately (jbod)
>
> --
> Colin
> 320-221-9531
>
>
> On Jun 7, 2014, at 10:02 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>
> Right now I'm just putting everything together as a proof of concept… so
> just two cheap replicas for now. And it's at 1/10000th of the load.
>
> If we lose data it's ok :)
>
> I think our config will be 2-3x 400GB SSDs in RAID0, 3 replicas, 16
> cores, probably 48-64GB of RAM each box.
>
> Just one datacenter for now…
>
> We're probably going to be migrating to using linux containers at some
> point. This way we can have like 16GB, one 400GB SSD, 4 cores for each
> image. And we can ditch the RAID which is nice. :)
>
>
> On Sat, Jun 7, 2014 at 7:51 PM, Colin <colpcl...@gmail.com> wrote:
>
>> To have any redundancy in the system, start with at least 3 nodes and a
>> replication factor of 3.
>>
>> Try to have at least 8 cores, 32 gig ram, and separate disks for log and
>> data.
>>
>> Will you be replicating data across data centers?
>>
>> --
>> Colin
>> 320-221-9531
>>
>>
>> On Jun 7, 2014, at 9:40 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>>
>> Oh.. To start with we're going to use from 2-10 nodes..
>>
>> I think we're going to take the original strategy and just to use 100
>> buckets .. 0-99… then the timestamp under that.. I think it should be fine
>> and won't require an ordered partitioner. :)
>>
>> Thanks!
>>
>>
>> On Sat, Jun 7, 2014 at 7:38 PM, Colin Clark <co...@clark.ws> wrote:
>>
>>> With 100 nodes, that ingestion rate is actually quite low and I don't
>>> think you'd need another column in the partition key.
>>>
>>> You seem to be set in your current direction. Let us know how it works
>>> out.
>>>
>>> --
>>> Colin
>>> 320-221-9531
>>>
>>>
>>> On Jun 7, 2014, at 9:18 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>>>
>>> What's 'source' ? You mean like the URL?
>>>
>>> If source too random it's going to yield too many buckets.
>>>
>>> Ingestion rates are fairly high but not insane. About 4M inserts per
>>> hour.. from 5-10GB…
>>>
>>>
>>> On Sat, Jun 7, 2014 at 7:13 PM, Colin Clark <co...@clark.ws> wrote:
>>>
>>>> Not if you add another column to the partition key; source for example.
>>>>
>>>>
>>>> I would really try to stay away from the ordered partitioner if at all
>>>> possible.
>>>>
>>>> What ingestion rates are you expecting, in size and speed.
>>>>
>>>> --
>>>> Colin
>>>> 320-221-9531
>>>>
>>>>
>>>> On Jun 7, 2014, at 9:05 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>>>>
>>>>
>>>> Thanks for the feedback on this btw.. .it's helpful. My notes below.
>>>>
>>>> On Sat, Jun 7, 2014 at 5:14 PM, Colin Clark <co...@clark.ws> wrote:
>>>>
>>>>> No, you're not-the partition key will get distributed across the
>>>>> cluster if you're using random or murmur.
>>>>>
>>>>
>>>> Yes… I'm aware. But in practice this is how it will work…
>>>>
>>>> If we create bucket b0, that will get hashed to h0…
>>>>
>>>> So say I have 50 machines performing writes, they are all on the same
>>>> time thanks to ntpd, so they all compute b0 for the current bucket based on
>>>> the time.
>>>>
>>>> That gets hashed to h0…
>>>>
>>>> If h0 is hosted on node0 … then all writes go to node zero for that 1
>>>> second interval.
>>>>
>>>> So all my writes are bottlenecking on one node. That node is
>>>> *changing* over time… but they're not being dispatched in parallel over N
>>>> nodes. At most writes will only ever reach 1 node a time.
>>>>
>>>>
>>>>
>>>>> You could also ensure that by adding another column, like source to
>>>>> ensure distribution. (Add the seconds to the partition key, not the
>>>>> clustering columns)
>>>>>
>>>>> I can almost guarantee that if you put too much thought into working
>>>>> against what Cassandra offers out of the box, that it will bite you later.
>>>>>
>>>>>
>>>> Sure.. I'm trying to avoid the 'bite you later' issues. More so because
>>>> I'm sure there are Cassandra gotchas to worry about. Everything has them.
>>>> Just trying to avoid the land mines :-P
>>>>
>>>>
>>>>> In fact, the use case that you're describing may best be served by a
>>>>> queuing mechanism, and using Cassandra only for the underlying store.
>>>>>
>>>>
>>>> Yes… that's what I'm doing. We're using apollo to fan out the queue,
>>>> but the writes go back into cassandra and needs to be read out
>>>> sequentially.
>>>>
>>>>
>>>>>
>>>>> I used this exact same approach in a use case that involved writing
>>>>> over a million events/second to a cluster with no problems. Initially, I
>>>>> thought ordered partitioner was the way to go too. And I used separate
>>>>> processes to aggregate, conflate, and handle distribution to clients.
>>>>>
>>>>
>>>>
>>>> Yes. I think using 100 buckets will work for now. Plus I don't have to
>>>> change the partitioner on our existing cluster and I'm lazy :)
>>>>
>>>>
>>>>>
>>>>> Just my two cents, but I also spend the majority of my days helping
>>>>> people utilize Cassandra correctly, and rescuing those that haven't.
>>>>>
>>>>>
>>>> Definitely appreciate the feedback! Thanks!
>>>>
>>>> --
>>>>
>>>> Founder/CEO Spinn3r.com
>>>> Location: *San Francisco, CA*
>>>> Skype: *burtonator*
>>>> blog: http://burtonator.wordpress.com
>>>> … or check out my Google+ profile
>>>> <https://plus.google.com/102718274791889610666/posts>
>>>> <http://spinn3r.com>
>>>> War is peace. Freedom is slavery. Ignorance is strength. Corporations
>>>> are people.
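
For what it's worth, here is a rough sketch of the hot-spot concern and the 100-bucket idea from the thread above. Everything in it is an assumption for illustration: `node_for` uses MD5 and a plain modulo as a stand-in for the partitioner and token ring, `NUM_NODES` is a made-up cluster size, and picking the bucket from a writer id is just one hypothetical way to spread writers across buckets 0-99. It is not how Cassandra actually places data, only the shape of the argument.

```python
import hashlib
import heapq

NUM_BUCKETS = 100  # buckets 0-99, as proposed in the thread
NUM_NODES = 10     # hypothetical cluster size, for illustration only


def node_for(partition_key: str, num_nodes: int = NUM_NODES) -> int:
    """Map a partition key to a node. MD5 plus modulo is a simplified
    stand-in for a hashing partitioner and its token ring."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def time_only_key(epoch_second: int) -> str:
    """Key derived only from the clock: every writer computes the same key."""
    return f"t{epoch_second}"


def bucketed_key(bucket: int, epoch_second: int) -> str:
    """Key with a bucket prefix: writers in different buckets get different keys."""
    return f"b{bucket}:t{epoch_second}"


def nodes_hit(keys) -> set:
    """Set of nodes that receive at least one write for the given keys."""
    return {node_for(k) for k in keys}


def merge_buckets(rows_per_bucket):
    """Merge per-bucket rows (each list already sorted by timestamp)
    back into one time-ordered stream for sequential reads."""
    return list(heapq.merge(*rows_per_bucket))


now = 1402197900   # arbitrary fixed second, so the run is repeatable
writers = 50       # the 50 writer machines from the example in the thread

# All writers share a clock, so a time-only key sends the whole second to one node.
time_only = [time_only_key(now) for _ in range(writers)]

# Hypothetical bucket choice: writer id modulo the bucket count.
bucketed = [bucketed_key(w % NUM_BUCKETS, now) for w in range(writers)]

print("nodes hit, time-only key:", len(nodes_hit(time_only)))
print("nodes hit, bucketed key: ", len(nodes_hit(bucketed)))
```

The trade-off is on the read side: to read a time range back in order, a reader has to query each of the 100 buckets for that range and merge the results by timestamp, which is what `merge_buckets` hints at.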

--

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
Skype: *burtonator*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>
War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.