Hello Chesnay,
Thanks for the advice. I've begun adding multiple jobs per Python plan file
here: https://issues.apache.org/jira/browse/FLINK-5183 and
https://github.com/GEOFBOT/flink/tree/FLINK-5183
The functionality of the patch works. I am able to run multiple jobs per
file successfully, but th
Hello,
implementing collect() in Python is not that trivial and the gain is
questionable. There is an inherent size limit (think 10 MB), and it is
a bit at odds with the deployment model of the Python API.
Something easier would be to execute each iteration of the for-loop as a
separate job an
Hello,
I know that the reuse of the data set in my plan is causing the problem
(after one dictionary atom is learned using the data set "S", "S" is
updated for use with the next dictionary atom). When I comment out the line
updating the data set "S", I have no problem and the plan processing phase
Hi Ufuk,
The master instance of the cluster was also a m3.xlarge instance with 15 GB
RAM, which I would've expected to be enough. I have gotten the program to
run successfully on a personal virtual cluster where each node has 8 GB RAM
and where the master node was also a worker node, so the proble
The Python API is currently in alpha, so we would have to check if it is
related specifically to that. Looping in Chesnay who worked on that.
The JVM GC error happens on the client side as that's where the optimizer runs.
How much memory does the client submitting the job have?
How do you
Hello all,
I have a pretty complicated plan file using the Flink Python API running on
an AWS EMR cluster of m3.xlarge instances using YARN. The plan is for a
dictionary learning algorithm and has to run a sequence of operations many
times; each sequence involves bulk iterations with join operation