Re: Flink on EMR Question

Ufuk Celebi Wed, 06 Jan 2016 03:08:47 -0800

@Stephan: It was added to the dependency management section in order to enforce 
a higher version for the S3 client, because the older version was causing 
problems earlier.


> On 06 Jan 2016, at 11:14, Chiwan Park <[email protected]> wrote:
> 
> Great! Thanks for addressing this!
> 
>> On Jan 6, 2016, at 5:51 PM, Stephan Ewen <[email protected]> wrote:
>> 
>> At a first look, I think that "flink-runtime" does not need Apache 
>> Httpclient at all. I'll try to simply remove that dependency...
>> 
>> On Wed, Jan 6, 2016 at 7:14 AM, Chiwan Park <[email protected]> wrote:
>> Hi,
>> 
>> Thanks for answering me!
>> 
>> I am happy to hear the problem will be addressed. :)
>> 
>> About question 2, flink-runtime uses Apache Httpclient 4.2.6 and the S3 file 
>> system API implemented by Amazon uses 4.3.x. There are some API changes 
>> between the two, so a NoSuchMethodError occurs.
>> 
>>> On Jan 5, 2016, at 11:59 PM, Stephan Ewen <[email protected]> wrote:
>>> 
>>> Hi!
>>> 
>>> Concerning (1) We have seen that a few times. The JVMs / threads sometimes 
>>> do not exit gracefully, and YARN is not always able to kill the process 
>>> (YARN bug). I am currently working on a refactoring of the YARN resource 
>>> manager (to allow easy addition of other frameworks) and have addressed 
>>> this as part of that. Will be in the master in a bit.
>>> 
>>> Concerning (2) Do you know which component in Flink uses the HTTP client?
>>> 
>>> Greetings,
>>> Stephan
>>> 
>>> 
>>> On Tue, Jan 5, 2016 at 2:49 PM, Maximilian Bode 
>>> <[email protected]> wrote:
>>> Hi everyone,
>>> 
>>> Regarding Q1, I believe I have witnessed a comparable phenomenon in a 
>>> (3-node, non-EMR) YARN cluster. After shutting down the yarn session via 
>>> `stop`, one container seems to linger around. `yarn application -list` is 
>>> empty, whereas `bin/yarn-session.sh -q` lists the left-over container. 
>>> Also, there is still one application shown as ‘running’ in Ambari’s YARN 
>>> pane under current applications. Then, after some time (order of a few 
>>> minutes) it disappears and the resources are available again.
>>> 
>>> I have not tested this behavior extensively so far. Notably, I was not 
>>> able to reproduce it by just starting a session and then ending it again 
>>> right away without looking at the JobManager web interface. Maybe this 
>>> produces some kind of lag as far as YARN containers are concerned?
>>> 
>>> Cheers,
>>> Max
>>> 
>>>> On 04.01.2016, at 12:52, Chiwan Park <[email protected]> wrote:
>>>> 
>>>> Hi All,
>>>> 
>>>> I have some problems using Flink on an Amazon EMR cluster.
>>>> 
>>>> Q1. Sometimes the JobManager container still exists after I destroy the 
>>>> YARN session by pressing Ctrl+C. In that case, the Flink YARN app seems 
>>>> to have exited correctly according to the YARN RM dashboard, but the 
>>>> dashboard still shows a running container. From the logs of that 
>>>> container, I can tell that it is the JobManager.
>>>> 
>>>> I cannot kill the container because I do not have permission to restart 
>>>> the YARN RM on Amazon EMR. In my small Hadoop cluster (with 3 nodes), the 
>>>> problem doesn’t appear.
>>>> 
>>>> Q2. I tried to use the S3 file system in Flink on EMR, but I can’t use it 
>>>> because of a version conflict with Apache Httpclient. By default, the 
>>>> implementation of the S3 file system in EMR is 
>>>> `com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem`, which is linked 
>>>> against a different version of Apache Httpclient.
>>>> 
>>>> As I wrote above, I cannot restart the Hadoop cluster after modifying 
>>>> conf-site.xml because I lack the permission. How can I solve this problem?
>>>> 
>>>> Regards,
>>>> Chiwan Park
>>>> 
>>>> 
>> 
>> Regards,
>> Chiwan Park
> 
> Regards,
> Chiwan Park
> 
>
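
For anyone who hits a similar NoSuchMethodError, a common first step is to check which jar a class is actually loaded from at runtime. The following is only a minimal sketch and not part of Flink or EMR: the class name `WhichJar` and the helper `locate` are made up for illustration, and `org.apache.http.client.HttpClient` is used as the default lookup only because the thread concerns Apache Httpclient.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of Flink): reports where a class is loaded from.
final class WhichJar {

    /** Returns the location a class was loaded from, or a short description if unknown. */
    static String locate(String className) {
        try {
            // 'false' avoids running static initializers of the inspected class.
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                // JDK classes loaded by the bootstrap loader have no code source.
                return "bootstrap/JDK (no code source)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        // Default to the HttpClient interface discussed in this thread.
        String name = args.length > 0 ? args[0] : "org.apache.http.client.HttpClient";
        System.out.println(name + " -> " + locate(name));
    }
}
```

Running this with the same classpath as the failing job should print the jar that wins for the given class. If it is not the version the S3 file system expects, that would be consistent with the conflict described above.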
