On Tue, Apr 3, 2012 at 4:18 AM, Ben Coverston <ben.covers...@datastax.com> wrote:
> This is a difficult question to answer for a variety of reasons, but I'll
> give it a try; maybe it will be helpful, maybe not.
>
> The most obvious problem with this is that Thrift is buffer based, not
> streaming. That means that whatever the size of your chunk, it needs to
> be received, deserialized, and processed by Cassandra within a timeframe
> that we call the rpc_timeout (by default this is 10 seconds).
>

Thanks. I suspect that 'not streaming' is the key, and not just from the
Cassandra side - our use case has a subtle assumption of streaming on the
client side. We could chop it up into buckets and put each one in a
time-ordered column, but that defeats the purpose of why I was considering
Cassandra - to avoid the latency of seeks in HDFS.

cheers

>
> Bigger buffers mean larger allocations, larger allocations mean that the
> JVM is working harder, and is more prone to fragmentation on the heap.
>
> With mixed workloads (lots of high-latency, large requests and many very
> small, low-latency requests) larger buffers can also, over time, clog up the
> thread pool in a way that can cause your shorter queries to have to wait
> for your longer-running queries to complete (to free up worker threads),
> making everything slow. This isn't a problem unique to Cassandra;
> everything that uses worker queues runs into some variant of this problem.
>
> As with everything else, you'll probably need to test your specific use
> case to see what 'too big' is for you.
>
> On Mon, Apr 2, 2012 at 9:23 AM, Franc Carter <franc.car...@sirca.org.au> wrote:
>
>>
>> Hi,
>>
>> We are in the early stages of thinking about a project that needs to
>> store data that will be accessed by Hadoop. One of the concerns we have is
>> around the latency of HDFS, as our use case is not for reading all the
>> data and hence we will need custom RecordReaders etc.
>>
>> I've seen a couple of comments that you shouldn't put large chunks into
>> a value - however 'large' is not well defined for the range of people using
>> these solutions ;-)
>>
>> Does anyone have a rough rule of thumb for how big a single value can be
>> before we are outside sanity?
>>
>> thanks
>>
>> --
>>
>> *Franc Carter* | Systems architect | Sirca Ltd
>> <marc.zianideferra...@sirca.org.au>
>>
>> franc.car...@sirca.org.au | www.sirca.org.au
>>
>> Tel: +61 2 9236 9118
>>
>> Level 9, 80 Clarence St, Sydney NSW 2000
>>
>> PO Box H58, Australia Square, Sydney NSW 1215
>>
>
>
> --
> Ben Coverston
> DataStax -- The Apache Cassandra Company
>

--

*Franc Carter* | Systems architect | Sirca Ltd
<marc.zianideferra...@sirca.org.au>

franc.car...@sirca.org.au | www.sirca.org.au

Tel: +61 2 9236 9118

Level 9, 80 Clarence St, Sydney NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215
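The bucketing alternative mentioned in the reply (chopping one large value into pieces and storing each piece in its own ordered column) might look roughly like the sketch below. It is a hypothetical, client-side illustration only: a plain Python dict stands in for a Cassandra row, chunks are ordered by a zero-padded sequence number rather than by a timestamp, and `CHUNK_SIZE`, `split_into_columns` and `join_columns` are made-up names, not part of any Cassandra or Thrift API.

```python
from typing import Dict

# Hypothetical chunk size: the thread gives no number, so this would need to be
# tuned against your own rpc_timeout and heap behaviour.
CHUNK_SIZE = 1024 * 1024  # 1 MiB


def split_into_columns(value: bytes, chunk_size: int = CHUNK_SIZE) -> Dict[str, bytes]:
    """Split one large value into (column_name -> chunk) pairs.

    Zero-padded names sort as strings in the same order as the chunk index,
    so a comparator that orders column names as bytes/UTF-8 returns the
    chunks in the order they were written.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    columns: Dict[str, bytes] = {}
    for index, start in enumerate(range(0, len(value), chunk_size)):
        columns[f"chunk-{index:08d}"] = value[start:start + chunk_size]
    return columns


def join_columns(columns: Dict[str, bytes]) -> bytes:
    """Reassemble the original value by concatenating chunks in name order."""
    return b"".join(columns[name] for name in sorted(columns))


if __name__ == "__main__":
    data = bytes(range(256)) * 10          # 2560 bytes of sample data
    cols = split_into_columns(data, 1000)  # three chunks: 1000, 1000, 560 bytes
    print(sorted(cols))
    print(join_columns(cols) == data)
```

Each chunk would still be a separate read in practice, which is the extra per-request latency the reply was hoping to avoid; the sketch only shows why padded names keep the pieces in order, assuming a comparator that sorts column names as bytes or UTF-8 strings.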