I wasn't aware of CompositeColumns, thanks for the tip. However, I think it still doesn't allow me to do the query I need: a timestamp range query that returns only certain file names at each timestamp. With BOP and a separate row for each timestamp, prefixed by a random UUID, and file names as column names, I can do this query. With CompositeColumns, I can only query one contiguous range, so I'd have to know the timestamps beforehand to limit the file names. I can resolve this using indexes, but on paper it looks like this would be significantly slower (it would take me 5 round trips instead of 3 to complete each query, and the query is made multiple times on every single client request).
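To make the row layout above concrete, here is a minimal in-memory sketch in Python. It is not Cassandra client code. It assumes a hypothetical key encoding (the 16 UUID bytes followed by the timestamp as an 8-byte big-endian integer) and models a byte-ordered store as a dict whose keys are sorted as raw bytes. The names row_key and range_query are illustrative only.

```python
import bisect
import struct
import uuid


def row_key(dir_id: uuid.UUID, timestamp: int) -> bytes:
    """Build a (uuid)-(timestamp) row key (hypothetical encoding).

    16 UUID bytes followed by the timestamp as an 8-byte big-endian
    unsigned integer, so that byte order matches (uuid, timestamp) order.
    """
    return dir_id.bytes + struct.pack(">Q", timestamp)


def range_query(rows, dir_id, start_ts, end_ts, file_names):
    """Return {timestamp: {file_name: value}} for start_ts <= ts <= end_ts.

    `rows` maps row key -> {column name: value} and stands in for a
    standard column family under a byte-ordered partitioner.
    """
    keys = sorted(rows)  # byte-wise ordering, as a byte-ordered store would keep it
    lo = bisect.bisect_left(keys, row_key(dir_id, start_ts))
    hi = bisect.bisect_right(keys, row_key(dir_id, end_ts))
    result = {}
    for key in keys[lo:hi]:
        (ts,) = struct.unpack(">Q", key[16:])
        columns = rows[key]
        # Select only the requested column names (file names) in each row.
        result[ts] = {name: columns[name] for name in file_names if name in columns}
    return result
```

Because every key for one UUID shares the same 16-byte prefix, one contiguous row range covers all of that UUID's timestamps, and the column selection is applied per row. Rows in range that contain none of the requested file names come back as empty dicts in this sketch.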
The two downsides I've seen listed for BOP are balancing issues and hotspots. I can understand why RP is recommended, from the balancing issues alone. However, these aren't problems for my application. Is there anything else I am missing? Does the Cassandra team plan on continuing to support BOP? I haven't completely ruled out RP, but I like having BOP as an option; it opens up interesting modeling alternatives that I think have real advantages for some (if uncommon) applications.

Thanks,
Bryce

On Wed, 21 Dec 2011 08:08:16 +1300
aaron morton <aa...@thelastpickle.com> wrote:

> Bryce,
> Have you considered using CompositeColumns and a standard CF?
> Row key is the UUID column name is (timestamp : dir_entry) you can
> then slice all columns with a particular time stamp.
>
> Even if you have a random key, I would use the RP unless you
> have an extreme use case.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 21/12/2011, at 3:06 AM, Bryce Allen wrote:
>
> > I think it comes down to how much you benefit from row range scans,
> > and how confident you are that going forward all data will continue
> > to use random row keys.
> >
> > I'm considering using BOP as a way of working around the non indexes
> > super column limitation. In my current schema, row keys are random
> > UUIDs, super column names are timestamps, and columns contain a
> > snapshot in time of directory contents, and could be quite large. If
> > instead I use row keys that are (uuid)-(timestamp), and use a
> > standard column family, I can do a row range query and select only
> > specific columns. I'm still evaluating if I can do this with BOP -
> > ideally the token would just use the first 128 bits of the key, and
> > I haven't found any documentation on how it compares keys of
> > different length.
> >
> > Another trick with BOP is to use MD5(rowkey)-rowkey for data that
> > has non uniform row keys. I think it's reasonable to use if most
> > data is uniform and benefits from range scans, but a few things are
> > added that aren't/don't. This trick does make the keys larger,
> > which increases storage cost and IO load, so it's probably a bad
> > idea if a significant subset of the data requires it.
> >
> > Disclaimer - I wrote that wiki article to fill in a documentation
> > gap, since there were no examples of BOP and I wasted a lot of time
> > before I noticed the hex byte array vs decimal distinction for
> > specifying the initial tokens (which to be fair is documented, just
> > easy to miss on a skim). I'm also new to cassandra, I'm just
> > describing what makes sense to me "on paper". FWIW I confirmed that
> > random UUIDs (type 4) row keys really do evenly distribute when
> > using BOP.
> >
> > -Bryce
> >
> > On Mon, 19 Dec 2011 19:01:00 -0800
> > Drew Kutcharian <d...@venarc.com> wrote:
> >> Hey Guys,
> >>
> >> I just came across
> >> http://wiki.apache.org/cassandra/ByteOrderedPartitioner and it got
> >> me thinking. If the row keys are java.util.UUID which are generated
> >> randomly (and securely), then what type of partitioner would be the
> >> best? Since the key values are already random, would it make a
> >> difference to use RandomPartitioner or one can use
> >> ByteOrderedPartitioner or OrderPreservingPartitioning as well and
> >> get the same result?
> >>
> >> -- Drew
> >>
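
The MD5(rowkey)-rowkey trick and the hex versus decimal token distinction mentioned in the quoted message can be sketched in Python as follows. This is an illustration under stated assumptions, not Cassandra code: the literal "-" separator byte, the helper names, and the even spacing of tokens over a 128-bit space (BOP, hex) or the 0 to 2**127 range (RP, decimal) are choices made for the example.

```python
import hashlib


def md5_prefixed_key(row_key: bytes) -> bytes:
    """Prefix a non-uniform row key with its 16-byte MD5 digest.

    The digest spreads keys evenly in byte order; the original key is
    kept after an assumed "-" separator so it can be recovered.
    """
    return hashlib.md5(row_key).digest() + b"-" + row_key


def original_key(prefixed: bytes) -> bytes:
    """Strip the 16-byte digest and the 1-byte separator."""
    return prefixed[17:]


def bop_initial_tokens(node_count: int) -> list:
    """Evenly spaced tokens as 16-byte hex strings (byte-ordered style)."""
    return [format(i * 2**128 // node_count, "032x") for i in range(node_count)]


def rp_initial_tokens(node_count: int) -> list:
    """Evenly spaced tokens as decimal strings over 0..2**127 (random-partitioner style)."""
    return [str(i * 2**127 // node_count) for i in range(node_count)]
```

Each prefixed key is 17 bytes longer than the original, which is the storage and IO cost the quoted message warns about.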