Re: Best way to trigger dataset sampling

2016-09-28 Thread Flavio Pompermaier
I think I'll probably end up submitting the job through YARN in order to have a more standard approach :) Thanks, Flavio On Wed, Sep 28, 2016 at 5:19 PM, Maximilian Michels wrote: > I meant that you simply keep the sampling jar on the machine where you > want to sample. However, you mentioned

Re: Best way to trigger dataset sampling

2016-09-28 Thread Maximilian Michels
I meant that you simply keep the sampling jar on the machine where you want to sample. However, you mentioned that it is a requirement for it to be on the cluster. Cheers, Max On Tue, Sep 27, 2016 at 3:18 PM, Flavio Pompermaier wrote: > Hi max, > that's exactly what I was looking for. What do yo

Re: Best way to trigger dataset sampling

2016-09-27 Thread Flavio Pompermaier
Hi Max, that's exactly what I was looking for. What do you mean by 'the best thing is if you keep a local copy of your sampling jars and work directly with them'? Best, Flavio On Tue, Sep 27, 2016 at 2:35 PM, Maximilian Michels wrote: > Hi Flavio, > > This is not really possible at the moment.

Re: Best way to trigger dataset sampling

2016-09-27 Thread Maximilian Michels
Hi Flavio, This is not really possible at the moment. Though there is a workaround. You can create a dummy jar file (may be empty). Then you can use ./flink run -C hdfs:///path/to/cluster.jar -c org.package.SampleClass /path/to/dummy.jar That way Flink will include your cluster jar and you can l

Re: Best way to trigger dataset sampling

2016-09-27 Thread Flavio Pompermaier
Hi Max, actually I have a jar containing sampling jobs and I need to collect results from a client. I've tried to use ExecutionEnvironment.createRemoteEnvironment but I fear that it's not the right way to do that because I just need to tell the cluster the main class and the parameters to run the j

Re: Best way to trigger dataset sampling

2016-09-27 Thread Maximilian Michels
Hi Flavio, Do you want to sample from a running batch job? That would be like Queryable State in streaming jobs but it is not supported in batch mode. Cheers, Max On Mon, Sep 26, 2016 at 6:13 PM, Flavio Pompermaier wrote: > Hi to all, > > I have a use case where I need to tell a Flink cluster

Best way to trigger dataset sampling

2016-09-26 Thread Flavio Pompermaier
Hi to all, I have a use case where I need to tell a Flink cluster to give me a sample of X records using parametrizable sampling functions. Is there any best practice or advice for doing that? Should I create a remote ExecutionEnvironment or should I use the Flink client (I don't know if it uses REST
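For illustration only, here is a minimal sketch of what "parametrizable sampling functions" could look like, assuming plain Java over an in-memory iterator rather than any Flink API. The class Samplers and its methods reservoir, bernoulli and sample are hypothetical names, not part of Flink. It offers two modes: a fixed number of records (reservoir sampling) and a fraction of records (Bernoulli sampling), selected by a mode string such as one passed as a program argument to the main class given with `-c`. Inside an actual DataSet job, Flink's own DataSetUtils.sampleWithSize and DataSetUtils.sample (in org.apache.flink.api.java.utils) are likely the better fit, since they sample the distributed DataSet directly.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical samplers (not a Flink API). Each one takes the sampling
 * parameters explicitly so a job's main class can choose them from its arguments.
 */
final class Samplers {

    private Samplers() {}

    /** Reservoir sampling (Algorithm R): up to k records, uniformly, without replacement, in one pass. */
    public static <T> List<T> reservoir(Iterator<T> input, int k, long seed) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be >= 0");
        }
        Random rnd = new Random(seed);
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0;
        while (input.hasNext()) {
            T record = input.next();
            seen++;
            if (reservoir.size() < k) {
                // Fill the reservoir with the first k records.
                reservoir.add(record);
            } else {
                // Keep the new record with probability k / seen, evicting a random slot.
                long j = (long) (rnd.nextDouble() * seen);
                if (j < k) {
                    reservoir.set((int) j, record);
                }
            }
        }
        return reservoir;
    }

    /** Bernoulli sampling: keeps each record independently with the given probability. */
    public static <T> List<T> bernoulli(Iterator<T> input, double fraction, long seed) {
        if (!(fraction >= 0.0 && fraction <= 1.0)) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        Random rnd = new Random(seed);
        List<T> out = new ArrayList<>();
        while (input.hasNext()) {
            T record = input.next();
            if (rnd.nextDouble() < fraction) {
                out.add(record);
            }
        }
        return out;
    }

    /** Chooses a sampler by name: "size" (param = record count) or "fraction" (param = probability). */
    public static <T> List<T> sample(String mode, double param, Iterator<T> input, long seed) {
        switch (mode) {
            case "size":
                return reservoir(input, (int) param, seed);
            case "fraction":
                return bernoulli(input, param, seed);
            default:
                throw new IllegalArgumentException("unknown sampling mode: " + mode);
        }
    }
}
```

For example, `Samplers.sample("size", 10, records.iterator(), 42L)` returns at most 10 records, and reusing the same seed on the same input returns the same sample, which makes sampling runs reproducible. Reservoir sampling is used for the fixed-size mode because it needs only one pass and does not need to know the input size in advance.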