Intermediate Data Caching

2016-07-17 Thread Saliya Ekanayake
Hi, I am trying to understand what's the intermediate caching support in Flink. For example, when there's an iterative dataset what's being cached between iterations. Is there some documentation on this? Thank you, Saliya -- Saliya Ekanayake Ph.D. Candidate | Research Assistant School of Inform

Re:

2016-07-17 Thread Chesnay Schepler
Hello Chen, you can access the set configuration in your rich function like this: |public static final class Tokenizer extends RichFlatMapFunctionTuple2> { @Override public void flatMap(String value, Collector> out) { ParameterTool parameters = (ParameterTool) getRuntimeContext().getExecution

[no subject]

2016-07-17 Thread Chen Bekor
Hi, I Need some assistance - I’m trying to globally register arguments from my main function for further extraction on stream processing nodes. My code base is Scala: val env = StreamExecutionEnvironment.getExecutionEnvironment val parameterTool = ParameterTool.fromArgs(args) env.getConfig.set

Re: Issue with running Flink Python jobs on cluster

2016-07-17 Thread Chesnay Schepler
well now i know what the problem could be. You are trying to execute a job on a cluster (== not local), but have set the local flag to true. env.execute(local=True) Due to this flag the files are only copied into the tmp directory of the node where you execute the plan, and are thus not a

Re: Issue with running Flink Python jobs on cluster

2016-07-17 Thread Geoffrey Mon
I haven't yet figured out how to write a Java job to test DistributedCache functionality between machines; I've only gotten worker nodes to create caches from local files (on the same worker nodes), rather than on files from the master node. The DistributedCache test I've been using (based on the D

Re: Issue with running Flink Python jobs on cluster

2016-07-17 Thread Chesnay Schepler
Please also post the job you're trying to run. On 17.07.2016 08:43, Geoffrey Mon wrote: The Java program I used to test DistributedCache was faulty since it actually created the cache from files on the machine on which the program was running (i.e. the worker node). I tried implementing a clu

Re: Issue with running Flink Python jobs on cluster

2016-07-17 Thread Chesnay Schepler
Does this mean the revised DistributedCache job run successfully? On 17.07.2016 08:43, Geoffrey Mon wrote: The Java program I used to test DistributedCache was faulty since it actually created the cache from files on the machine on which the program was running (i.e. the worker node). I tried