I think I'll probably end up submitting the job through YARN in order to
have a more standard approach :)
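
For the record, a submission along these lines should work once the job jar
is available locally (the paths, the number of YARN containers and the main
class below are just placeholders):

./flink run -m yarn-cluster -yn 2 -c org.package.SampleClass /path/to/sample.jar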

Thanks,
Flavio

On Wed, Sep 28, 2016 at 5:19 PM, Maximilian Michels <m...@apache.org> wrote:

> I meant that you simply keep the sampling jar on the machine where you
> want to sample. However, you mentioned that it is a requirement for it
> to be on the cluster.
>
> Cheers,
> Max
>
> On Tue, Sep 27, 2016 at 3:18 PM, Flavio Pompermaier
> <pomperma...@okkam.it> wrote:
> > Hi Max,
> > that's exactly what I was looking for. What do you mean by 'the best thing
> > is if you keep a local copy of your sampling jars and work directly with
> > them'?
> >
> > Best,
> > Flavio
> >
> > On Tue, Sep 27, 2016 at 2:35 PM, Maximilian Michels <m...@apache.org> wrote:
> >>
> >> Hi Flavio,
> >>
> >> This is not really possible at the moment, though there is a workaround.
> >> You can create a dummy jar file (it may be empty). Then you can use
> >>
> >> ./flink run -C hdfs:///path/to/cluster.jar -c org.package.SampleClass
> >> /path/to/dummy.jar
> >>
> >> That way Flink will include your cluster jar and you can load all the
> >> necessary classes.
> >>
> >> Alternatively, using the RemoteEnvironment directly, it looks like this:
> >>
> >> public static void main(String[] args) throws Exception {
> >>
> >>    final RemoteEnvironment env = new RemoteEnvironment(
> >>       "remoteHost",
> >>       6123,
> >>       new Configuration(),
> >>       new String[0],
> >>       new URL[]{
> >>          new URL("file:///path/to/sample.jar"),
> >>          new URL("file:///Users/max/Dev/flink/build-target/lib/flink-dist_2.10-1.2-SNAPSHOT.jar")});
> >>
> >>    URLClassLoader classLoader =
> >>       new URLClassLoader(env.globalClasspaths.toArray(new URL[0]));
> >>
> >>    Class<?> clazz = classLoader.loadClass("org.package.sample.SampleClass");
> >>
> >>    Method sampleMethod =
> >>       clazz.getDeclaredMethod("sampleMethod", ExecutionEnvironment.class);
> >>
> >>    // pass the environment as an argument to your sample method;
> >>    // the method should return the results of the execution
> >>    Object sampleResult = sampleMethod.invoke(null, env);
> >> }
> >>
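> >> For completeness, the sample method inside your sampling jar could look
> >> roughly like this (class, method and path names are only placeholders;
> >> the sampling itself could, for example, use DataSetUtils):
> >>
> >> public static List<String> sampleMethod(ExecutionEnvironment env) throws Exception {
> >>    // read the data to sample from; the path is only an example
> >>    DataSet<String> input = env.readTextFile("hdfs:///path/to/data");
> >>    // take a 1% sample without replacement and ship it back to the caller
> >>    return DataSetUtils.sample(input, false, 0.01).collect();
> >> }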
> >>
> >> Beware, this is extremely hacky. We should have a better way to invoke
> >> jar files remotely. Honestly, the best thing is if you keep a local copy
> >> of your sampling jars and work directly with them.
> >>
> >> Cheers,
> >> Max
> >>
> >> On Tue, Sep 27, 2016 at 12:25 PM, Flavio Pompermaier
> >> <pomperma...@okkam.it> wrote:
> >>>
> >>> Hi Max,
> >>> Actually, I have a jar containing sampling jobs and I need to collect the
> >>> results from a client.
> >>> I've tried to use ExecutionEnvironment.createRemoteEnvironment, but I fear
> >>> that it's not the right way to do that, because I just need to tell the
> >>> cluster the main class and the parameters to run the job (and where the
> >>> jar file is on HDFS).
> >>>
> >>> Best,
> >>> Flavio
> >>>
> >>> On Tue, Sep 27, 2016 at 12:06 PM, Maximilian Michels <m...@apache.org>
> >>> wrote:
> >>>>
> >>>> Hi Flavio,
> >>>>
> >>>> Do you want to sample from a running batch job? That would be like
> >>>> Queryable State in streaming jobs but it is not supported in batch
> >>>> mode.
> >>>>
> >>>> Cheers,
> >>>> Max
> >>>>
> >>>>
> >>>> On Mon, Sep 26, 2016 at 6:13 PM, Flavio Pompermaier
> >>>> <pomperma...@okkam.it> wrote:
> >>>> > Hi to all,
> >>>> >
> >>>> > I have a use case where I need to tell a Flink cluster to give me a
> >>>> > sample of X records using parametrizable sampling functions. Is there
> >>>> > any best practice or advice on how to do that?
> >>>> >
> >>>> > Should I create a Remote ExecutionEnvironment or should I use the
> >>>> > Flink client (I don't know if it uses REST services or RPC or
> >>>> > whatever)?
> >>>> > Is there any java snippet for that?
> >>>> >
> >>>> > Best,
> >>>> > Flavio
> >>>> >
> >>>
> >>>
> >>>
> >>>
> >>
> >
> >
>
