Hi!
HDFS is mentioned in the docs but not explicitly listed as a requirement:
https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/python.html#project-setup
I suppose the Python API could also distribute its libraries through
Flink's BlobServer.
Cheers,
Max
On Tue, Jul 19, 2016 at
Glad to hear it! The HDFS requirement should most definitely be
documented; i assumed it already was actually...
On 19.07.2016 03:42, Geoffrey Mon wrote:
Hello Chesnay,
Thank you very much! With your help I've managed to set up a Flink
cluster that can run Python jobs successfully. I solved m
Hello Chesnay,
Thank you very much! With your help I've managed to set up a Flink cluster
that can run Python jobs successfully. I solved my issue by removing
local=True and installing HDFS in a separate cluster.
I don't think it was clearly mentioned in the documentation that HDFS was
required f
well now i know what the problem could be.
You are trying to execute a job on a cluster (== not local), but have
set the local flag to true.
env.execute(local=True)
Due to this flag the files are only copied into the tmp directory of the
node where you execute the plan, and are thus not a
I haven't yet figured out how to write a Java job to test DistributedCache
functionality between machines; I've only gotten worker nodes to create
caches from local files (on the same worker nodes), rather than on files
from the master node. The DistributedCache test I've been using (based on
the D
Please also post the job you're trying to run.
On 17.07.2016 08:43, Geoffrey Mon wrote:
The Java program I used to test DistributedCache was faulty since it
actually created the cache from files on the machine on which the
program was running (i.e. the worker node).
I tried implementing a clu
Does this mean the revised DistributedCache job run successfully?
On 17.07.2016 08:43, Geoffrey Mon wrote:
The Java program I used to test DistributedCache was faulty since it
actually created the cache from files on the machine on which the
program was running (i.e. the worker node).
I tried
The Java program I used to test DistributedCache was faulty since it
actually created the cache from files on the machine on which the program
was running (i.e. the worker node).
I tried implementing a cluster again, this time using two actual machines
instead of virtual machines. I found the same
I wrote a simple Java plan that reads a file in the distributed cache and
uses the first line from that file in a map operation. Sure enough, it
works locally, but fails when the job is sent to a taskmanager on a worker
node. Since DistributedCache seems to work for everyone else, I'm thinking
that
Could you write a java job that uses the Distributed cache to distribute
files?
If this fails then the DC is faulty, if it doesn't something in the
Python API is wrong.
On 15.07.2016 08:06, Geoffrey Mon wrote:
I've come across similar issues when trying to set up Flink on Amazon
EC2 instance
I've come across similar issues when trying to set up Flink on Amazon EC2
instances. Presumably there is something wrong with my setup? Here is the
flink-conf.yaml I am using:
https://gist.githubusercontent.com/GEOFBOT/3ffc9b21214174ae750cc3fdb2625b71/raw/flink-conf.yaml
Thanks,
Geoffrey
On Wed,
Hello,
Here is the TaskManager log on pastebin:
http://pastebin.com/XAJ56gn4
I will look into whether the files were created.
By the way, the cluster is made with virtual machines running on BlueData
EPIC. I don't know if that might be related to the problem.
Thanks,
Geoffrey
On Wed, Jul 13, 2
Hello Geoffrey,
How often does this occur?
Flink distributes the user-code and the python library using the
Distributed Cache.
Either the file is deleted right after being created for some reason, or
the DC returns a file name before the file was created (which shouldn't
happen, it should b
Hello all,
I've set up Flink on a very small cluster of one master node and five
worker nodes, following the instructions in the documentation (
https://ci.apache.org/projects/flink/flink-docs-master/setup/cluster_setup.html).
I can run the included examples like WordCount and PageRank across the
14 matches
Mail list logo