I've been working on making Catalyst aware of Cassandra partitioning. There seem to be two major blockers.
The first is that DataSourceScanExec has no way to learn the underlying partitioning from the BaseRelation it comes from. I'm currently able to get around this by planning with DataSourceStrategy and then transforming the resulting DataSourceScanExec. The second is that the Partitioning trait is sealed. I want to define a new partitioning that is clustered on certain columns but is not hash-based. It would look almost identical to the HashPartitioning class, except that the expression which returns a valid partition ID for the given expressions would be different. Does anyone have ideas on how to get around the second issue? Would it be worthwhile to make changes that allow BaseRelations to advertise a particular Partitioner?
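
To make the second point concrete, here is a toy sketch in plain Scala. None of these types are Spark's: `Partitioning`, `HashLike`, and `TokenRangeLike` are hypothetical stand-ins that only mimic the shape of the problem, and the token function is a placeholder for whatever the data source would really use. The new case only compiles because it lives in the same file as the sealed trait, which is exactly what a connector outside Spark's source tree cannot do.

```scala
// Toy model only: these are NOT Spark's classes, just stand-ins with the same shape.
object PartitioningSketch {

  // Stand-in for a sealed partitioning trait: subclasses must live in this file.
  sealed trait Partitioning {
    def numPartitions: Int
    // Maps the clustering-key values of a row to a partition id.
    def partitionId(key: Seq[Any]): Int
  }

  // Analogue of hash partitioning: id is a non-negative hash modulo numPartitions.
  final case class HashLike(numPartitions: Int) extends Partitioning {
    def partitionId(key: Seq[Any]): Int =
      Math.floorMod(key.hashCode(), numPartitions)
  }

  // Hypothetical alternative: still clustered on the same key columns, but the
  // partition id comes from token ranges instead of a hash.
  // `upperBounds` must be sorted ascending; `token` is an assumed key-to-token function.
  final case class TokenRangeLike(
      upperBounds: Vector[Long],
      token: Seq[Any] => Long
  ) extends Partitioning {
    def numPartitions: Int = upperBounds.length + 1

    def partitionId(key: Seq[Any]): Int = {
      val t = token(key)
      // First range whose upper bound covers the token; otherwise the last partition.
      val idx = upperBounds.indexWhere(t <= _)
      if (idx >= 0) idx else upperBounds.length
    }
  }
}
```

The only real difference between the two cases is the body of `partitionId`, which is why I'd like to reuse everything else from HashPartitioning if the trait could be extended.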