Well, you could add milliseconds, but at best you're still bottlenecking most of your writes on one box… maybe 2-3 if some are lagging.

Anyway… I think using 100 buckets is probably fine.

Kevin
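[For concreteness, here's a minimal sketch of the write path for the 100-bucket scheme discussed above, using the DataStax Python driver. The keyspace, table, and column names are illustrative assumptions, not anything agreed on in the thread; in practice you'd likely also fold a coarse time component (day or hour, per Colin's suggestion) into the partition key to keep partitions bounded:]

    # Hypothetical sketch of the hash-bucket scheme; names and bucket
    # count are assumptions, not anything from the thread itself.
    import uuid
    import zlib

    from cassandra.cluster import Cluster  # DataStax Python driver

    NUM_BUCKETS = 100

    # Schema (run once). Putting bucket first in the partition key spreads
    # writes across the whole cluster instead of the "current minute" node:
    #
    #   CREATE TABLE crawler.updates (
    #       bucket  int,
    #       ts      timeuuid,
    #       payload blob,
    #       PRIMARY KEY ((bucket), ts)
    #   );

    session = Cluster(['127.0.0.1']).connect('crawler')
    insert = session.prepare(
        "INSERT INTO updates (bucket, ts, payload) VALUES (?, ?, ?)")

    def write_update(doc_id: str, payload: bytes) -> None:
        # Hash the document id into one of 100 buckets, so each write
        # lands on a partition chosen independently of wall-clock time.
        bucket = zlib.crc32(doc_id.encode('utf-8')) % NUM_BUCKETS
        session.execute(insert, (bucket, uuid.uuid1(), payload))

[With ~100 partitions live at any moment, write load spreads across up to 100 nodes, which matches the "not more than 100 nodes in the next year" sizing below.]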
On Sat, Jun 7, 2014 at 2:45 PM, Colin <colpcl...@gmail.com> wrote:

> Then add seconds to the bucket. Also, the data will get cached; it's not
> going to hit disk on every read.
>
> Look at the key cache settings on the table. Also, in 2.1 you have even
> more control over caching.
>
> --
> Colin
> 320-221-9531
>
>
> On Jun 7, 2014, at 4:30 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>
> On Sat, Jun 7, 2014 at 1:34 PM, Colin <colpcl...@gmail.com> wrote:
>
>> Maybe it makes sense to describe what you're trying to accomplish in
>> more detail.
>>
> Essentially, I'm appending writes of recent data from our crawler and
> sending that data to our customers.
>
> They need to sync up to the latest writes; we need to get them writes
> within seconds.
>
>> A common bucketing approach is along the lines of year, month, day,
>> hour, minute, etc., and then use a timeuuid as a clustering column.
>>
> That's acceptable, but it means that for a given one-minute interval,
> all writes are going to that one node (and its replicas).
>
> So the total cluster throughput is bottlenecked on the max disk
> throughput of a single box.
>
> Same thing for reads: unless our customers are lagged, they are all
> going to stampede, and ALL of them are going to read data from one node
> in a one-minute timeframe.
>
> That's no fun… that will easily DoS our cluster.
>
>> Depending upon the semantics of the transport protocol you plan on
>> utilizing, either the client code keeps track of pagination, or the app
>> server could, if you utilized some type of request/reply/ack flow. You
>> could keep sequence numbers for each client and begin streaming data to
>> them, or allow queries upon reconnect, etc.
>>
>> But again, more details of the use case might prove useful.
>>
> I think if we were to just use 100 buckets it would probably work fine.
> We're probably not going to be more than 100 nodes in the next year,
> and if we are, that's still reasonable performance.
>
> I mean, if each box has a 400GB SSD, that's 40TB of VERY fast data.
>
> Kevin
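[And a matching sketch of the catch-up read path Colin alludes to: each client keeps a "sequence number" in the form of the last timeuuid it has seen, and on reconnect the server fans the query out across all buckets and merges the results back into time order. As above, this assumes the illustrative schema from the previous sketch, and every name is hypothetical:]

    # Hypothetical catch-up reader for the schema sketched above; assumes
    # each client tracks its last-seen timeuuid as its sequence number.
    import heapq
    import uuid

    from cassandra.cluster import Cluster

    NUM_BUCKETS = 100

    session = Cluster(['127.0.0.1']).connect('crawler')
    select = session.prepare(
        "SELECT ts, payload FROM updates "
        "WHERE bucket = ? AND ts > ? ORDER BY ts ASC LIMIT 1000")

    def catch_up(last_seen: uuid.UUID):
        """Yield (ts, payload) rows newer than last_seen, oldest first."""
        per_bucket = []
        for bucket in range(NUM_BUCKETS):
            rows = session.execute(select, (bucket, last_seen))
            per_bucket.append([(row.ts, row.payload) for row in rows])
        # Merge the per-bucket streams on the timeuuid's embedded
        # timestamp; raw UUID comparison is not chronological in Python.
        yield from heapq.merge(*per_bucket, key=lambda r: r[0].time)

[A client that is only seconds behind touches all 100 partitions but reads just a handful of rows from each, so the stampede of readers is spread across the cluster rather than landing on the single node that owns the current time window.]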