Re: Flink Iterations vs. While loop

2016-09-09 Thread Drewes, Dan Benedikt
Hi Till, you're right, my implementation wouldn't scale well for a very large number of features. Thank you for that hint! However, I'm not using that many features, so this shouldn't be the cause of the strange behaviour. Yes, the 30 minutes is the time for all jobs together. It's the time di

Re: Flink Iterations vs. While loop

2016-09-07 Thread Till Rohrmann
Hi Dan, first a general remark: I fear that your L-BFGS implementation is not well suited for large-scale problems. You might want to take a look at [1]. In the case of the while loop solution you're actually executing n jobs, with n being the number of iterations. Thus, you have to add the executio

Re: Flink Iterations vs. While loop

2016-09-07 Thread Till Rohrmann
Usually, the while loop solution should perform much worse, since with each new iteration it re-executes all previous iteration steps without persisting the intermediate results. Thus, it should have quadratic complexity in terms of iteration step operations instead of linear complexity. Addit
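
The quadratic-versus-linear argument above can be made concrete by counting step executions. The following is only a minimal sketch in Python, not Flink code: the function names are hypothetical, and it assumes that every job submitted by the while loop recomputes all earlier steps from scratch because nothing is persisted.

```python
def steps_with_native_iteration(n_iterations: int) -> int:
    """Each iteration step runs exactly once; the intermediate result is kept."""
    return n_iterations


def steps_with_unpersisted_loop(n_iterations: int) -> int:
    """Assumption: job i re-executes steps 1..i, since no result is persisted."""
    return sum(range(1, n_iterations + 1))  # equals n * (n + 1) / 2


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, steps_with_native_iteration(n), steps_with_unpersisted_loop(n))
```

Under this assumption the while loop performs n(n+1)/2 step executions against n for a native iteration, so the gap widens as the iteration count grows. The count ignores per-job scheduling overhead and data reading, which the model does not try to capture.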

Re: Flink Iterations vs. While loop

2016-09-06 Thread Theodore Vasiloudis
Have you tried profiling the application to see where most of the time is spent during the runs? If most of the time is spent reading in the data, maybe any difference between the two methods is being obscured.

Re: Flink Iterations vs. While loop

2016-09-06 Thread Greg Hogan
Hi Dan, Flink currently allocates each task slot an equal portion of managed memory. I don't know the best way to count task slots. https://ci.apache.org/projects/flink/flink-docs-master/concepts/index.html#workers-slots-resources If you assign TaskManagers less memory then Linux will use the me
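
The equal split of managed memory across task slots mentioned above amounts to a simple division. This is a rough sketch in Python, assuming a hypothetical helper name and made-up example figures; it does not model how Flink derives the managed memory size itself.

```python
def managed_memory_per_slot(managed_memory_mb: float, slots: int) -> float:
    """Equal share of a TaskManager's managed memory for each task slot."""
    if slots <= 0:
        raise ValueError("a TaskManager needs at least one task slot")
    return managed_memory_mb / slots


if __name__ == "__main__":
    # Hypothetical figures: 32768 MB of managed memory shared by 8 slots.
    print(managed_memory_per_slot(32768, 8))
```

Adding slots to a TaskManager without adding memory shrinks each slot's share, which makes spilling to disk more likely for memory-hungry operators.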

Re: Flink Iterations vs. While loop

2016-09-06 Thread Dan Drewes
Hi, I am not broadcasting the data but the model, i.e. the weight vector contained in the "State". You are right, it would be better for the implementation with the while loop to have the data on HDFS. But that's exactly the point of my question: Why are the Flink Iterations not faster if you do

Re: Flink Iterations vs. While loop

2016-09-05 Thread Theodore Vasiloudis
Hello Dan, are you broadcasting the 85GB of data then? I don't get why you wouldn't store that file on HDFS so it's accessible by your workers. If you have the full code available somewhere we might be able to help better. For L-BFGS you should only be broadcasting the model (i.e. the weight ve

Re: Flink Iterations vs. While loop

2016-09-02 Thread Dan Drewes
Hi Greg, thanks for your response! I just had a look and realized that it's only about 85 GB of data. Sorry about the wrong information. It's read from a CSV file on the master node's local file system. The 8 nodes have more than 40 GB available memory each and since the data is equally di

Re: Flink Iterations vs. While loop

2016-09-02 Thread Greg Hogan
Hi Dan, where are you reading the 200 GB "data" from? How much memory per node? If the DataSet is read from a distributed filesystem and if, with iterations, Flink must spill to disk, then I wouldn't expect much difference. About how many iterations are run in the 30 minutes? I don't know that this i