Oh.. to start with we're going to use 2-10 nodes. I think we're going to take the original strategy and just use 100 buckets (0-99), with the timestamp under each bucket. I think that should be fine and won't require an ordered partitioner. :)
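For anyone following along, here is a rough Python sketch of the 100-bucket idea. It is only a toy model, not our actual code and not the Cassandra API: `node_for`, `write`, `read_in_order`, and `NUM_NODES` are made-up names, and node placement is faked with an MD5 hash modulo the node count, whereas real Cassandra maps tokens onto a ring. It also assumes "timestamp under that" means the bucket is the partition key and the timestamp orders the rows inside it. It compares a key derived from time alone with a bucketed key, then merges the buckets back into one time-ordered stream.

```python
import hashlib
import heapq
import random
from collections import defaultdict

NUM_BUCKETS = 100  # buckets 0-99, as described in the message above
NUM_NODES = 10     # hypothetical; upper end of the 2-10 node range


def node_for(partition_key):
    """Toy placement: hash the key and take it modulo the node count.
    Real Cassandra maps tokens onto a ring; this only mimics the spread."""
    digest = hashlib.md5(str(partition_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES


def write(partitions, ts, payload, rng):
    """Append an event to a randomly chosen bucket; return the bucket used."""
    bucket = rng.randrange(NUM_BUCKETS)
    partitions[bucket].append((ts, payload))
    return bucket


def read_in_order(partitions):
    """Merge the per-bucket streams (each sorted by timestamp) into one list."""
    streams = (sorted(rows) for rows in partitions.values())
    return list(heapq.merge(*streams))


rng = random.Random(42)
partitions = defaultdict(list)

# 1000 writes that all fall inside the same one-second interval (ts=0).
buckets_used = [write(partitions, 0, f"event-{i}", rng) for i in range(1000)]

nodes_time_only = {node_for(0)}                       # key derived from time alone
nodes_bucketed = {node_for(b) for b in buckets_used}  # key is the bucket

print(len(nodes_time_only), "node(s) hit with a time-only key")
print(len(nodes_bucketed), "node(s) hit with bucketed keys")
```

The trade-off is on the read side: getting events back sequentially means reading all 100 buckets and merging them, which is what `read_in_order` stands in for.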
Thanks!

On Sat, Jun 7, 2014 at 7:38 PM, Colin Clark <co...@clark.ws> wrote:

> With 100 nodes, that ingestion rate is actually quite low and I don't
> think you'd need another column in the partition key.
>
> You seem to be set in your current direction. Let us know how it works
> out.
>
> --
> Colin
> 320-221-9531
>
> On Jun 7, 2014, at 9:18 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>
> What's 'source' ? You mean like the URL?
>
> If source too random it's going to yield too many buckets.
>
> Ingestion rates are fairly high but not insane. About 4M inserts per
> hour.. from 5-10GB…
>
> On Sat, Jun 7, 2014 at 7:13 PM, Colin Clark <co...@clark.ws> wrote:
>
>> Not if you add another column to the partition key; source for example.
>>
>> I would really try to stay away from the ordered partitioner if at all
>> possible.
>>
>> What ingestion rates are you expecting, in size and speed.
>>
>> --
>> Colin
>> 320-221-9531
>>
>> On Jun 7, 2014, at 9:05 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>>
>> Thanks for the feedback on this btw.. .it's helpful. My notes below.
>>
>> On Sat, Jun 7, 2014 at 5:14 PM, Colin Clark <co...@clark.ws> wrote:
>>
>>> No, you're not-the partition key will get distributed across the cluster
>>> if you're using random or murmur.
>>>
>>
>> Yes… I'm aware. But in practice this is how it will work…
>>
>> If we create bucket b0, that will get hashed to h0…
>>
>> So say I have 50 machines performing writes, they are all on the same
>> time thanks to ntpd, so they all compute b0 for the current bucket based on
>> the time.
>>
>> That gets hashed to h0…
>>
>> If h0 is hosted on node0 … then all writes go to node zero for that 1
>> second interval.
>>
>> So all my writes are bottlenecking on one node. That node is *changing*
>> over time… but they're not being dispatched in parallel over N nodes. At
>> most writes will only ever reach 1 node a time.
>>
>>> You could also ensure that by adding another column, like source to
>>> ensure distribution. (Add the seconds to the partition key, not the
>>> clustering columns)
>>>
>>> I can almost guarantee that if you put too much thought into working
>>> against what Cassandra offers out of the box, that it will bite you later.
>>>
>>
>> Sure.. I'm trying to avoid the 'bite you later' issues. More so because
>> I'm sure there are Cassandra gotchas to worry about. Everything has them.
>> Just trying to avoid the land mines :-P
>>
>>> In fact, the use case that you're describing may best be served by a
>>> queuing mechanism, and using Cassandra only for the underlying store.
>>>
>>
>> Yes… that's what I'm doing. We're using apollo to fan out the queue, but
>> the writes go back into cassandra and needs to be read out sequentially.
>>
>>> I used this exact same approach in a use case that involved writing over
>>> a million events/second to a cluster with no problems. Initially, I
>>> thought ordered partitioner was the way to go too. And I used separate
>>> processes to aggregate, conflate, and handle distribution to clients.
>>>
>>
>> Yes. I think using 100 buckets will work for now. Plus I don't have to
>> change the partitioner on our existing cluster and I'm lazy :)
>>
>>> Just my two cents, but I also spend the majority of my days helping
>>> people utilize Cassandra correctly, and rescuing those that haven't.
>>>
>>
>> Definitely appreciate the feedback! Thanks!
>>
>> --
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> Skype: *burtonator*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> <https://plus.google.com/102718274791889610666/posts>
>> <http://spinn3r.com>
>> War is peace. Freedom is slavery. Ignorance is strength. Corporations are
>> people.