Re: Partition problem

2016-05-18 Thread Andrew Palumbo
51 AM To: dev@flink.apache.org Subject: Re: Partition problem Hi Andrew, I think in the end it boils down to counting the number of rows/finding the maximum index in the set of rows if you want to partition your matrix into blocks where the row indices are monotonically increasing. Without
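A minimal sketch of the idea in the excerpt above, written as plain Python rather than against the Flink API: one pass finds the maximum row index, after which each row can be assigned to a block so that row indices increase monotonically from block to block. The names `assign_blocks` and `num_blocks` are hypothetical and do not come from the thread.

```python
import math

def assign_blocks(row_indices, num_blocks):
    """Map each row index to a block id that is non-decreasing in the row index."""
    if not row_indices:
        return {}
    # First pass: the maximum index tells us how many rows the matrix spans.
    num_rows = max(row_indices) + 1
    # Rows per block, rounded up so every row falls into one of num_blocks blocks.
    block_size = math.ceil(num_rows / num_blocks)
    return {i: i // block_size for i in row_indices}

# Rows may arrive in any order; the block id depends only on the index.
blocks = assign_blocks([3, 0, 2, 1, 5, 4], num_blocks=3)
print(sorted(blocks.items()))  # [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2)]
```

Because the block id is derived from the row index alone, two matrices blocked with the same `num_rows` and `num_blocks` get matching block ids regardless of how their rows were distributed.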

Re: Partition problem

2016-05-17 Thread Till Rohrmann
ion 0 will have data and partition 1 will have data. @till, I see what you did in ALS with a custom partitioner. Is there a way that I can write a custom partitioner to make sure that we have data in the 0th and 1st partition?
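As a rough illustration of the question in the excerpt above, a partitioner can be thought of as a function from a key and the number of partitions to a target partition. The sketch below is plain Python, not Flink code; `identity_partitioner` is a hypothetical name, and integer row keys starting at 0 are an assumption.

```python
def identity_partitioner(key, num_partitions):
    """Send integer row key k to partition k, wrapping when k >= num_partitions."""
    return key % num_partitions

# With row keys 0 and 1 and four partitions, the data lands in partitions 0 and 1.
row_keys = [0, 1]
targets = {k: identity_partitioner(k, 4) for k in row_keys}
print(targets)  # {0: 0, 1: 1}
```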

Re: Partition problem

2016-05-17 Thread Fabian Hueske
ere are 2 of 4 partitions being used. From: Andrew Palumbo Sent: Saturday, May 14, 2016 3:25:38 PM To: dev@flink.apache.org Subject: Re: Partition problem Hi Till and Fabian, I had to come back to this proble

Re: Partition problem

2016-05-14 Thread Andrew Palumbo
, 2016 3:25:38 PM To: dev@flink.apache.org Subject: Re: Partition problem Hi Till and Fabian, I had to come back to this problem because we're putting out a maintenance release soon. I think I overcomplicated the issue here. I don't need equal partitions. All that I need is to ensure th

Re: Partition problem

2016-05-14 Thread Andrew Palumbo
't see much documentation for custom partitioners. Thanks. Andy From: Till Rohrmann Sent: Tuesday, April 26, 2016 9:39:41 AM To: dev@flink.apache.org Subject: Re: Partition problem If you don’t know the size of your matrix, then you cannot partition it

Re: Partition problem

2016-04-26 Thread Till Rohrmann
From: Andrew Palumbo Sent: Monday, April 25, 2016 1:58 PM To: dev@flink.apache.org Subject: Re: Partition problem Thank you Fabian and Till for answering, I think that my explanation of the problem was a bit ov

Re: Partition problem

2016-04-25 Thread Andrew Palumbo
ange(0) also: The Blockified representation is a `DataSet[(Array(K), Matrix)]`. Thanks From: Andrew Palumbo Sent: Monday, April 25, 2016 1:58 PM To: dev@flink.apache.org Subject: Re: Partition problem Thank you Fabian and Till for answering, I think t

Re: Partition problem

2016-04-25 Thread Andrew Palumbo
e able to join the two Blockified DataSets (of any size) in the correct order, so maybe there is another way to do this? Thanks again for your time. Andy From: Fabian Hueske Sent: Monday, April 25, 2016 6:09 AM To: dev@flink.apache.org Subject: Re: P

Re: Partition problem

2016-04-25 Thread Fabian Hueske
Hi Andrew, I might be wrong, but I think this problem is caused by an assumption of how Flink reads input data. In Flink, each InputSplit is not read by a new task and a split does not correspond to a partition. This is different from how Hadoop MR and Spark handle InputSplits. Instead, Flink cre

Re: Partition problem

2016-04-25 Thread Till Rohrmann
Hi Andrew, I think the problem is that you assume that both matrices have the same partitioning. If you guarantee that this is the case, then you can use the subtask index as the block index. But in the general case this is not true, and then you have to calculate the blocks by first assigning a b

Partition problem

2016-04-24 Thread Andrew Palumbo
Hi All, I've run into a problem with empty partitions when the number of elements in a DataSet is less than the Degree of Parallelism. I've created a gist here to describe it: https://gist.github.com/andrewpalumbo/1768dac6d2f5fdf963abacabd859aaf3 I have two 2x2 matrices, Matrix A and Matri
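The situation described above can be reproduced in a small plain-Python simulation. It assumes each matrix row is one element and that elements are dealt out round-robin; both are assumptions for illustration, not a description of how Flink actually assigns data to subtasks. Whatever the assignment strategy, n elements spread over p > n partitions must leave at least p - n partitions empty.

```python
def round_robin(elements, parallelism):
    """Deal elements out to `parallelism` partitions in turn."""
    partitions = [[] for _ in range(parallelism)]
    for pos, element in enumerate(elements):
        partitions[pos % parallelism].append(element)
    return partitions

rows = ["row0", "row1"]  # a 2x2 matrix stored as two row elements (assumption)
partitions = round_robin(rows, parallelism=4)

# Indices of partitions that received no data.
empty = [i for i, p in enumerate(partitions) if not p]
print(empty)  # [2, 3]
```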