I meant that you simply keep the sampling jar on the machine where you
want to sample. However, you mentioned that it is a requirement for it
to be on the cluster.
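
For the dummy jar used in the `flink run -C` workaround quoted below, here is a minimal sketch of how such a placeholder could be generated with the plain JDK. The class name `DummyJar` and the default file name `dummy.jar` are only illustrative assumptions, not anything Flink prescribes. The jar contains nothing but a manifest.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical helper: writes a jar that holds only META-INF/MANIFEST.MF.
public class DummyJar {

    /** Creates the placeholder jar at the given path and returns that path. */
    public static Path create(Path target) throws IOException {
        Manifest manifest = new Manifest();
        // Manifest-Version must be set, otherwise the manifest is written out empty.
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");

        try (OutputStream out = Files.newOutputStream(target);
             JarOutputStream jar = new JarOutputStream(out, manifest)) {
            // Intentionally no further entries: the job classes are expected
            // to come from the jar passed via -C, not from this file.
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default name; pass a different path as the first argument.
        Path target = Paths.get(args.length > 0 ? args[0] : "dummy.jar");
        System.out.println("Wrote " + create(target).toAbsolutePath());
    }
}
```

The resulting file would take the place of /path/to/dummy.jar in the command quoted below.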

Cheers,
Max

On Tue, Sep 27, 2016 at 3:18 PM, Flavio Pompermaier
<pomperma...@okkam.it> wrote:
> Hi Max,
> that's exactly what I was looking for. What do you mean by 'the best thing
> is if you keep a local copy of your sampling jars and work directly with
> them'?
>
> Best,
> Flavio
>
> On Tue, Sep 27, 2016 at 2:35 PM, Maximilian Michels <m...@apache.org> wrote:
>>
>> Hi Flavio,
>>
>> This is not really possible at the moment, though there is a workaround.
>> You can create a dummy jar file (it may be empty) and then use
>>
>> ./flink run -C hdfs:///path/to/cluster.jar -c org.package.SampleClass \
>>     /path/to/dummy.jar
>>
>> That way Flink will include your cluster jar and you can load all classes
>> necessary.
>>
>> Alternatively, using a RemoteEnvironment, it looks like this:
>>
>> public static void main(String[] args) throws Exception {
>>
>>    final RemoteEnvironment env = new RemoteEnvironment(
>>       "remoteHost",
>>       6123,
>>       new Configuration(),
>>       new String[0],
>>       new URL[]{
>>          new URL("file:///path/to/sample.jar"),
>>          new URL("file:///Users/max/Dev/flink/build-target/lib/flink-dist_2.10-1.2-SNAPSHOT.jar")});
>>    URLClassLoader classLoader =
>>       new URLClassLoader(env.globalClasspaths.toArray(new URL[0]));
>>
>>    Class<?> clazz = classLoader.loadClass("org.package.sample.SampleClass");
>>
>>    Method main = clazz.getDeclaredMethod("sampleMethod", ExecutionEnvironment.class);
>>
>>    // pass environment as an argument to your sample method
>>    // the method should return the results of the execution
>>    Object sampleResult = main.invoke(null, env);
>> }
>>
>>
>> Beware, this is extremely hacky. We should have a better way to invoke jar
>> files remotely. Honestly, the best thing is if you keep a local copy of your
>> sampling jars and work directly with them.
>>
>> Cheers,
>> Max
>>
>> On Tue, Sep 27, 2016 at 12:25 PM, Flavio Pompermaier
>> <pomperma...@okkam.it> wrote:
>>>
>>> Hi Max,
>>> actually, I have a jar containing sampling jobs, and I need to collect
>>> their results from a client.
>>> I've tried to use ExecutionEnvironment.createRemoteEnvironment, but I fear
>>> that it's not the right way to do that, because
>>> I just need to tell the cluster the main class and the parameters to run
>>> the job (and where the jar file is on HDFS).
>>>
>>> Best,
>>> Flavio
>>>
>>> On Tue, Sep 27, 2016 at 12:06 PM, Maximilian Michels <m...@apache.org>
>>> wrote:
>>>>
>>>> Hi Flavio,
>>>>
>>>> Do you want to sample from a running batch job? That would be like
>>>> Queryable State in streaming jobs but it is not supported in batch
>>>> mode.
>>>>
>>>> Cheers,
>>>> Max
>>>>
>>>>
>>>> On Mon, Sep 26, 2016 at 6:13 PM, Flavio Pompermaier
>>>> <pomperma...@okkam.it> wrote:
>>>> > Hi to all,
>>>> >
>>>> > I have a use case where I need to ask a Flink cluster for a sample
>>>> > of X records, using parametrizable sampling functions. Is there any
>>>> > best practice or advice for doing that?
>>>> >
>>>> > Should I create a remote ExecutionEnvironment, or should I use the
>>>> > Flink client (I don't know whether it uses REST services, RPC, or
>>>> > something else)? Is there a Java snippet for that?
>>>> >
>>>> > Best,
>>>> > Flavio
>>>> >
>>>
>>>
>>>
>>>
>>
>
>
