I think I'll probably end up submitting the job through YARN in order to have a more standard approach :)
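Something along these lines is what I have in mind (just a sketch -- the main class and jar path are placeholders taken from the examples below, and I still have to double-check the exact YARN options):

./bin/flink run -m yarn-cluster -c org.package.sample.SampleClass /path/to/sample.jar <sampling params>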
Thanks,
Flavio

On Wed, Sep 28, 2016 at 5:19 PM, Maximilian Michels <m...@apache.org> wrote:
> I meant that you simply keep the sampling jar on the machine where you
> want to sample. However, you mentioned that it is a requirement for it
> to be on the cluster.
>
> Cheers,
> Max
>
> On Tue, Sep 27, 2016 at 3:18 PM, Flavio Pompermaier
> <pomperma...@okkam.it> wrote:
> > Hi Max,
> > that's exactly what I was looking for. What do you mean by 'the best
> > thing is if you keep a local copy of your sampling jars and work
> > directly with them'?
> >
> > Best,
> > Flavio
> >
> > On Tue, Sep 27, 2016 at 2:35 PM, Maximilian Michels <m...@apache.org> wrote:
> >>
> >> Hi Flavio,
> >>
> >> This is not really possible at the moment, though there is a workaround.
> >> You can create a dummy jar file (it may be empty). Then you can use
> >>
> >> ./flink run -C hdfs:///path/to/cluster.jar -c org.package.SampleClass /path/to/dummy.jar
> >>
> >> That way Flink will include your cluster jar and you can load all the
> >> classes you need.
> >>
> >> Alternatively, using the RemoteEnvironment, it looks like this:
> >>
> >> public static void main(String[] args) throws Exception {
> >>
> >>     final RemoteEnvironment env = new RemoteEnvironment(
> >>         "remoteHost",
> >>         6123,
> >>         new Configuration(),
> >>         new String[0],
> >>         new URL[]{
> >>             new URL("file:///path/to/sample.jar"),
> >>             new URL("file:///Users/max/Dev/flink/build-target/lib/flink-dist_2.10-1.2-SNAPSHOT.jar")});
> >>
> >>     URLClassLoader classLoader =
> >>         new URLClassLoader(env.globalClasspaths.toArray(new URL[0]));
> >>
> >>     Class<?> clazz = classLoader.loadClass("org.package.sample.SampleClass");
> >>
> >>     Method sampleMethod = clazz.getDeclaredMethod("sampleMethod", ExecutionEnvironment.class);
> >>
> >>     // pass the environment as an argument to your sample method;
> >>     // the method should return the results of the execution
> >>     Object sampleResult = sampleMethod.invoke(null, env);
> >> }
> >>
> >> Beware, this is extremely hacky. We should have a better way to invoke
> >> jar files remotely. Honestly, the best thing is if you keep a local copy
> >> of your sampling jars and work directly with them.
> >>
> >> Cheers,
> >> Max
> >>
> >> On Tue, Sep 27, 2016 at 12:25 PM, Flavio Pompermaier
> >> <pomperma...@okkam.it> wrote:
> >>>
> >>> Hi Max,
> >>> actually I have a jar containing sampling jobs and I need to collect the
> >>> results from a client.
> >>> I've tried to use ExecutionEnvironment.createRemoteEnvironment but I fear
> >>> that it's not the right way to do that, because I just need to tell the
> >>> cluster the main class and the parameters to run the job (and where the
> >>> jar file is on HDFS).
> >>>
> >>> Best,
> >>> Flavio
> >>>
> >>> On Tue, Sep 27, 2016 at 12:06 PM, Maximilian Michels <m...@apache.org> wrote:
> >>>>
> >>>> Hi Flavio,
> >>>>
> >>>> Do you want to sample from a running batch job? That would be like
> >>>> Queryable State in streaming jobs, but it is not supported in batch
> >>>> mode.
> >>>>
> >>>> Cheers,
> >>>> Max
> >>>>
> >>>> On Mon, Sep 26, 2016 at 6:13 PM, Flavio Pompermaier
> >>>> <pomperma...@okkam.it> wrote:
> >>>> > Hi to all,
> >>>> >
> >>>> > I have a use case where I need to tell a Flink cluster to give me a
> >>>> > sample of X records using parametrizable sampling functions. Is there
> >>>> > any best practice or advice for doing that?
> >>>> >
> >>>> > Should I create a remote ExecutionEnvironment or should I use the
> >>>> > Flink client (I don't know if it uses REST services or RPC or
> >>>> > whatever)? Is there any Java snippet for that?
> >>>> >
> >>>> > Best,
> >>>> > Flavio
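(For anyone finding this thread later: a minimal sketch of what a sampling job class matching Max's reflection snippet could look like. The class and method names just mirror the snippet above; the input path and sample size are placeholders, not anything from the actual job.)

import java.util.List;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.utils.DataSetUtils;

public class SampleClass {

    // Called via reflection in the snippet above, with the (remote)
    // environment passed in as the only argument.
    public static List<String> sampleMethod(ExecutionEnvironment env) throws Exception {
        DataSet<String> input = env.readTextFile("hdfs:///path/to/data");

        // Fixed-size sample without replacement; the sample size could be
        // made a parameter of the job.
        DataSet<String> sample = DataSetUtils.sampleWithSize(input, false, 100);

        // collect() triggers execution and ships the sampled records back
        // to the client that owns the environment.
        return sample.collect();
    }
}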