Hi All,

I've run into a problem with empty partitions when the number of elements in a 
DataSet is less than the Degree of Parallelism.  I've created a gist here to 
describe it:


https://gist.github.com/andrewpalumbo/1768dac6d2f5fdf963abacabd859aaf3


I have two 2x2 matrices, Matrix A and Matrix B and an execution environment 
where the degree of parallelism is 4. Both matrices   are blockified in  2 
different DataSet s . In this case (the case of a 2x2 matrices with 4 
partitions) this means that each row goes into a partition leaving 2 empty 
partitions. In Matrix A, the rows go into partitions 0, 1. However the rows of 
Matrix B end up in partitions 1, 2. I assign the ordinal index of the 
blockified matrix's partition to its block, and then join on that index.


However in this case, with differently partitioned matrices of the same 
geometry, the intersection of the blockified matrices' indices is 1, and 
partitions 0 and 2 are dropped.


I've tried explicitly defining the dop for Matrix B using the count of 
non-empty partitions in Matrix A, however this changes the order of the 
DataSet, placing partition 2 into partition 0.


Is there a way to make sure that these datasets are partitioned in the same way?


Thank you,


Andy


Reply via email to