I meant that you simply keep the sampling jar on the machine where you want to sample. However, you mentioned that it is a requirement for it to be on the cluster.
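
To make that concrete, here is a minimal sketch of loading a class from a jar on the local file system and calling a static method on it via reflection. It uses only the JDK. `LocalJarInvoker` and `invokeStatic` are hypothetical names chosen for illustration, not Flink API. With Flink you would presumably pass the `file://` URL of your sampling jar, the name of your sample class, `sampleMethod`, and an `ExecutionEnvironment` as the argument.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;

public class LocalJarInvoker {

    /**
     * Loads className from the given jar URLs and calls a public static
     * one-argument method on it, returning whatever the method returns.
     * (Hypothetical helper, not part of Flink.)
     */
    public static Object invokeStatic(URL[] jars, String className, String methodName,
                                      Class<?> paramType, Object arg) throws Exception {
        // The loader is left open on purpose: the returned object may still
        // need classes from the jar after this method returns.
        URLClassLoader loader =
                new URLClassLoader(jars, LocalJarInvoker.class.getClassLoader());
        Class<?> clazz = loader.loadClass(className);
        Method method = clazz.getMethod(methodName, paramType);
        // A null receiver is valid because the target method is static.
        return method.invoke(null, arg);
    }

    public static void main(String[] args) throws Exception {
        // JDK-only demo so the sketch runs without any jar: with an empty URL
        // list the loader delegates to its parent and finds java.lang.Integer.
        Object result = invokeStatic(
                new URL[0], "java.lang.Integer", "parseInt", String.class, "42");
        System.out.println(result);
    }
}
```

For a real sampling jar, the first argument would be something like `new URL[]{ new URL("file:///path/to/sample.jar") }`, and the jar's own dependencies have to be reachable through the same class loader.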

Cheers,
Max

On Tue, Sep 27, 2016 at 3:18 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
> Hi max,
> that's exactly what I was looking for. What do you mean for 'the best thing
> is if you keep a local copy of your sampling jars and work directly with
> them'?
>
> Best,
> Flavio
>
> On Tue, Sep 27, 2016 at 2:35 PM, Maximilian Michels <m...@apache.org> wrote:
>>
>> Hi Flavio,
>>
>> This is not really possible at the moment. Though there is a workaround.
>> You can create a dummy jar file (may be empty). Then you can use
>>
>> ./flink run -C hdfs:///path/to/cluster.jar -c org.package.SampleClass /path/to/dummy.jar
>>
>> That way Flink will include your cluster jar and you can load all classes
>> necessary.
>>
>> Alternatively, using the Remote Environment, this looks like this:
>>
>> public static void main(String[] args) throws Exception {
>>
>>     final RemoteEnvironment env = new RemoteEnvironment(
>>         "remoteHost",
>>         6123,
>>         new Configuration(),
>>         new String[0],
>>         new URL[]{
>>             new URL("file:///path/to/sample.jar"),
>>             new URL("file:///Users/max/Dev/flink/build-target/lib/flink-dist_2.10-1.2-SNAPSHOT.jar")});
>>
>>     URLClassLoader classLoader = new URLClassLoader(env.globalClasspaths.toArray(new URL[0]));
>>
>>     Class<?> clazz = classLoader.loadClass("org.package.sample.SampleClass");
>>
>>     Method main = clazz.getDeclaredMethod("sampleMethod", ExecutionEnvironment.class);
>>
>>     // pass environment as an argument to your sample method
>>     // the method should return the results of the execution
>>     Object sampleResult = main.invoke(null, env);
>> }
>>
>>
>> Beware, this is extremely hacky. We should have a better way to invoke jar
>> files remotely. Honestly, the best thing is if you keep a local copy of your
>> sampling jars and work directly with them.
>>
>> Cheers,
>> Max
>>
>> On Tue, Sep 27, 2016 at 12:25 PM, Flavio Pompermaier
>> <pomperma...@okkam.it> wrote:
>>>
>>> Hi Max,
>>> actually I have a jar containing sampling jobs and I need to collect
>>> results from a client.
>>> I've tried to use ExecutionEnvironment.createRemoteEnvironment but I fear
>>> that it's not the right way to do that because
>>> I just need to tell the cluster the main class and the parameters to run
>>> the job (and where the jar file is on HDFS).
>>>
>>> Best,
>>> Flavio
>>>
>>> On Tue, Sep 27, 2016 at 12:06 PM, Maximilian Michels <m...@apache.org>
>>> wrote:
>>>>
>>>> Hi Flavio,
>>>>
>>>> Do you want to sample from a running batch job? That would be like
>>>> Queryable State in streaming jobs but it is not supported in batch
>>>> mode.
>>>>
>>>> Cheers,
>>>> Max
>>>>
>>>>
>>>> On Mon, Sep 26, 2016 at 6:13 PM, Flavio Pompermaier
>>>> <pomperma...@okkam.it> wrote:
>>>> > Hi to all,
>>>> >
>>>> > I have a use case where I need to tell a Flink cluster to give me a
>>>> > sample
>>>> > of X records using parametrizable sampling functions. Is there any
>>>> > best
>>>> > practice or advice to do that?
>>>> >
>>>> > Should I create a Remote ExecutionEnvironment or should I use the
>>>> > Flink
>>>> > client (I don't know if it uses REST services or RPC or whatever)?
>>>> > Is there any java snippet for that?
>>>> >
>>>> > Best,
>>>> > Flavio