I've been working on making Catalyst aware of Cassandra partitioning.
There seem to be two major blockers.

The first is that DataSourceScanExec has no way to learn the underlying
partitioning from the BaseRelation it comes from. I'm currently able to
get around this by planning with DataSourceStrategy and then transforming
the resulting DataSourceScanExec.

The second is that the Partitioning trait is sealed. I want to define a new
partitioning that is clustered on certain columns but is not hash-based.
It would look almost identical to the HashPartitioning class, except that
the expression that computes a valid partition ID from the given
expressions would be different.
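
To make the second point concrete, here is a rough standalone sketch of the
shape I have in mind. It does not use any Spark classes: SketchPartitioning,
SketchHashPartitioning and SketchTokenRangePartitioning are hypothetical
stand-ins, and the token-range lookup is only an assumption about how a
non-hash partition ID might be computed. The two implementations differ only
in how partitionId is derived.

```scala
// Standalone model; none of these types are Spark's own classes.
// Hypothetical stand-in for a sealed partitioning hierarchy.
sealed trait SketchPartitioning {
  def numPartitions: Int
  def partitionId(key: Seq[Any]): Int
}

// Hash-style behaviour: non-negative hash of the key modulo partition count.
final case class SketchHashPartitioning(numPartitions: Int)
    extends SketchPartitioning {
  def partitionId(key: Seq[Any]): Int = {
    val mod = key.hashCode % numPartitions
    if (mod < 0) mod + numPartitions else mod
  }
}

// Clustered on the key but not hashed: the partition ID is found by looking
// up the key's token in a sorted list of range upper bounds (an assumption
// made for illustration). Tokens above the last bound go to the last partition.
final case class SketchTokenRangePartitioning(
    upperBounds: Vector[Long],
    token: Seq[Any] => Long)
    extends SketchPartitioning {
  val numPartitions: Int = upperBounds.length
  def partitionId(key: Seq[Any]): Int = {
    val t = token(key)
    val idx = upperBounds.indexWhere(t <= _)
    if (idx >= 0) idx else numPartitions - 1
  }
}
```

Because the trait in the sketch is sealed, the second case class has to live
in the same source file as the trait, which is exactly the restriction I'm
running into with the real Partitioning trait.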

Does anyone have ideas on how to get around the second issue? Would it be
worthwhile to make changes that allow a BaseRelation to advertise a
particular Partitioner?
