Hi Hanan,

you're right: currently, every time you submit a job to the Flink cluster, all user code jars are uploaded and overwrite any files that may already exist. This is unnecessary if the jars haven't changed. We could add a check so that the JobClient does not upload files that already exist on the JobManager. This should improve performance for your use case.
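Such a check could, for example, compare content hashes instead of file names, so that a rebuilt jar with the same name is still re-uploaded. The following is only a minimal sketch of that idea under stated assumptions. It is not Flink's API and not necessarily how FLINK-2760 would be resolved. `JarUploadFilter`, `needsUpload` and `jarsToUpload` are hypothetical names, and the in-memory `knownHashes` set stands in for whatever the JobManager would actually report as already stored.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical client-side filter: skip jars whose exact contents the JobManager already holds. */
public class JarUploadFilter {

    // Assumption: stand-in for the set of jar content hashes already stored on the JobManager.
    private final Set<String> knownHashes = new HashSet<>();

    /** Hex-encoded SHA-256 digest of the given bytes. */
    static String sha256Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    /** True if this content is new; records it as known (assumes the subsequent upload succeeds). */
    public boolean needsUpload(byte[] content) {
        // Set.add returns false when the hash is already present
        return knownHashes.add(sha256Hex(content));
    }

    /** Returns only the jars that actually have to be sent for this submission. */
    public List<Path> jarsToUpload(List<Path> jars) {
        List<Path> pending = new ArrayList<>();
        for (Path jar : jars) {
            try {
                if (needsUpload(Files.readAllBytes(jar))) {
                    pending.add(jar);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return pending;
    }

    public static void main(String[] args) throws IOException {
        Path a = Files.createTempFile("job-a", ".jar");
        Path b = Files.createTempFile("job-b", ".jar");
        Files.write(a, new byte[] {1, 2, 3});
        Files.write(b, new byte[] {4, 5, 6});

        JarUploadFilter filter = new JarUploadFilter();
        System.out.println(filter.jarsToUpload(List.of(a, b)).size()); // 2: first submission sends both
        System.out.println(filter.jarsToUpload(List.of(a, b)).size()); // 0: unchanged jars are skipped
        Files.write(b, new byte[] {7, 8, 9});
        System.out.println(filter.jarsToUpload(List.of(a, b)).size()); // 1: only the modified jar

        Files.delete(a);
        Files.delete(b);
    }
}
```

Hashing the contents costs a full read of each jar on the client, but that is usually much cheaper than transferring it over the network on every execution.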
The corresponding JIRA issue is https://issues.apache.org/jira/browse/FLINK-2760.

Cheers,
Till

On Thu, Sep 24, 2015 at 1:31 PM, Hanan Meyer <ha...@scalabill.it> wrote:
> Hello All,
>
> I use Flink to filter data from HDFS and write it back as CSV.
>
> I keep getting the "Checking and uploading JAR files" message on every
> DataSet filtering action or ExecutionEnvironment execution.
>
> I use ExecutionEnvironment.createRemoteEnvironment(ip+jars..) because I
> launch Flink from a J2EE application server.
>
> Serializing and transferring the jars takes up a large part of the
> execution time. Is there a way to force Flink to pass the jars only once?
>
> Please advise.
>
> Thanks,
>
> Hanan Meyer