Hi Jonathan, I've never seen a paper that discusses it as a primary topic, it is always in some other context. IIRC, the most recent discussions of it I have seen have been in join algorithm literature from somewhere in Asia. MPP analytical databases often implement some form of skew adaptivity but there is no standard way because the design tradeoffs are context dependent. DB2 also has a non-adaptive technique for dealing with skew that should be simple to implement on Cassandra and might provide an 80/20 option (more on that a little further on).
Skew adaptivity is generally implemented with a mix of data structures along the lines of an adaptive quad-tree. The reason you only see this in analytical databases is that the data skew is unlikely to change much and/or have too much concurrent updating. If the distribution radically changes all of the time under high concurrency, it will create some mix of resource contention, lost selectivity, or runaway space consumption depending on implementation detail. The optimal mix of pain tends to be a compile-time option, so it isn't very flexible. Definitely not optimal for concurrent OLTP-ish workloads. Alternatively: IBM's DB2 has a couple different data organization options that essentially define partitionable skew invariants. The closer the real data distribution is to the skew invariant, the more access performance is like a distributed hash table. For many data models, the data distribution skew can be roughly predicted ahead of time and for those applications it is relatively efficient. You can take this pretty far; DB2 has libraries of skew invariants based on irregular Voronoi tesselation and I believe the most recent version of SQL Server has something similar. I think they even have tools for sampling representative data for the purposes of finding a good skew invariant. It is not much but I hope that helps. Data distribution skew handling only partly mitigates data access skew issues (e.g. temporal data) but it is better than nothing. -- J. Andrew Rogers On Tue, Aug 24, 2010 at 10:30 AM, Jonathan Ellis <jbel...@gmail.com> wrote: > What are some good papers to read for background? > > On Tue, Aug 24, 2010 at 12:26 PM, J. Andrew Rogers > <jar.mail...@gmail.com> wrote: >> On Mon, Aug 23, 2010 at 8:36 PM, Hien. To Trong <hie...@vng.com.vn> wrote: >>> OrderPreservingPartitioner is efficient range queries but can cause >>> unevently distributed data. Does anyone has an idea of a >>> HybridPartitioner which takes advantages of both RandomPartitioner >>> and OPP, or at least a partitioner trade off between them. >> >> >> What you are looking for is skew adaptive partitioning i.e. like a >> B+Tree except distributable. >> >> A couple different methods for doing something like this exist, but >> you rarely see them and they have their own (different) tradeoffs. To >> the best of my knowledge, implementation requires a fairly deep >> architectural commitment; it is more involved than simply defining a >> partitioning function and the "adaptive" aspect must be distribution >> friendly. It is an active area of research in the literature with no >> obvious and simple solutions that can be lashed onto a database engine >> "as is". >> >> -- >> J. Andrew Rogers >> > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of Riptano, the source for professional Cassandra support > http://riptano.com >