On Wed, Mar 7, 2018 at 7:13 AM, Carlos Rolo <r...@pythian.com> wrote:
> Hi Jeff,
>
> Could you expand: "Tables without clustering keys are often deceptively
> expensive to compact, as a lot of work (relative to the other cell
> boundaries) happens on partition boundaries." This is something I didn't
> know and highly interesting to know more about!
>

We do a lot "by partition". We build column indexes by partition. We update the partition index on each partition. We invalidate the key cache by partition. These steps aren't super expensive, but they take time, and tables with tiny partitions can actually be slower to compact.

There's no magic cutoff where it does/doesn't make sense; my comment is mostly a warning that the edges of the "normal" use cases tend to be less optimized than the common case. A table with a hundred billion records, where the key is numeric and the value is a single byte, will be close to the worst-case example of this behavior (let's say you're keeping track of whether or not a specific sensor has ever detected some magic event, and you have 100B sensors).
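
To make the "per-partition work" point concrete, here is a rough toy model, not a measurement of Cassandra. The function names and both unit costs (`per_partition_cost`, `per_cell_cost`) are made-up assumptions chosen only to show the shape of the effect: with the same total number of cells, a table of single-cell partitions spends most of its compaction effort on partition boundaries.

```python
# Toy cost model: every number here is a hypothetical, arbitrary unit,
# not something measured from Cassandra.

def compaction_cost(partitions, cells_per_partition,
                    per_partition_cost=5.0, per_cell_cost=1.0):
    """Total work = fixed work per partition + work per cell."""
    partition_work = partitions * per_partition_cost
    cell_work = partitions * cells_per_partition * per_cell_cost
    return partition_work + cell_work


def partition_overhead_share(partitions, cells_per_partition,
                             per_partition_cost=5.0, per_cell_cost=1.0):
    """Fraction of the total work spent on partition boundaries."""
    total = compaction_cost(partitions, cells_per_partition,
                            per_partition_cost, per_cell_cost)
    return (partitions * per_partition_cost) / total


# Same 100 billion cells, laid out two ways.
tiny = compaction_cost(100_000_000_000, 1)       # one cell per partition
wide = compaction_cost(1_000_000, 100_000)       # 100k cells per partition

print(f"tiny partitions: {tiny:.3e} units, "
      f"{partition_overhead_share(100_000_000_000, 1):.1%} on partition work")
print(f"wide partitions: {wide:.3e} units, "
      f"{partition_overhead_share(1_000_000, 100_000):.4%} on partition work")
```

The real ratio between per-partition and per-cell work depends on the Cassandra version and schema, so only the direction of the comparison is meaningful here, not the magnitudes.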