Re: `env.java.opts` not persisting after job canceled or failed and then restarted

2019-01-29 Thread Ufuk Celebi
Hey Aaron, I'm glad to hear that you resolved the issue. I think a docs contribution for this would be very helpful and could update this page: https://github.com/apache/flink/blob/master/docs/monitoring/debugging_classloading.md. If you want to create a separate JIRA ticket for this, ping me wi

Re: `env.java.opts` not persisting after job canceled or failed and then restarted

2019-01-29 Thread Aaron Levin
Hi Ufuk, I'll answer your question, but first I'll give you an update on how we resolved the issue: * adding `org.apache.hadoop.io.compress.SnappyCodec` to `classloader.parent-first-patterns.additional` in `flink-conf.yaml` (though, putting `org.apache.hadoop.util.NativeCodeLoader` also worked) *
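For reference, the fix described in this message could look roughly like the following `flink-conf.yaml` fragment. This is a minimal sketch rather than Aaron's actual configuration: it assumes no other additional parent-first patterns are already set, and it shows only the first of the two class names he mentions. The option takes a semicolon-separated list of class-name prefixes, so further entries would be appended with `;`.

```yaml
# Sketch only: have the parent (application) classloader load the Hadoop Snappy
# codec, so the class that triggers loading libhadoop.so is not re-loaded by
# each job's user-code classloader after a restart.
classloader.parent-first-patterns.additional: org.apache.hadoop.io.compress.SnappyCodec
```

Per the message, using `org.apache.hadoop.util.NativeCodeLoader` as the pattern also worked.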

Re: `env.java.opts` not persisting after job canceled or failed and then restarted

2019-01-28 Thread Ufuk Celebi
Hey Aaron, sorry for the late reply (again). (1) I think that your final result is in line with what I have reproduced in https://issues.apache.org/jira/browse/FLINK-11402. (2) I think renaming the file would not help as it will still be loaded multiple times when the job restarts (as it happen

Re: `env.java.opts` not persisting after job canceled or failed and then restarted

2019-01-25 Thread Aaron Levin
I don't control the code calling `System.loadLibrary("hadoop")`, so that's not an option for me, unfortunately. On Thu, Jan 24, 2019 at 7:47 PM Guowei Ma wrote: > This may be because a JVM process can only load a given .so once. So a tricky > way is to rename it. > > Sent from my iPhone > > On Jan 25, 2019, at 7:12 AM

Re: `env.java.opts` not persisting after job canceled or failed and then restarted

2019-01-24 Thread Guowei Ma
This may be because a JVM process can only load a given .so once. So a tricky way is to rename it. Sent from my iPhone > On Jan 25, 2019, at 7:12 AM, Aaron Levin wrote: > > Hi Ufuk, > > Update: I've pinned down the issue. It's multiple classloaders loading > `libhadoop.so`: > > ``` > failed to load native hadoop wit

Re: `env.java.opts` not persisting after job canceled or failed and then restarted

2019-01-24 Thread Aaron Levin
Hi Ufuk, Update: I've pinned down the issue. It's multiple classloaders loading `libhadoop.so`: ``` failed to load native hadoop with error: java.lang.UnsatisfiedLinkError: Native Library /usr/lib/libhadoop.so already loaded in another classloader ``` I'm not quite sure what the solution is. Ide

Re: `env.java.opts` not persisting after job canceled or failed and then restarted

2019-01-24 Thread Aaron Levin
Hi Ufuk, I'm starting to believe the bug is much deeper than the originally reported error because putting the libraries in `/usr/lib` or `/lib` does not work. This morning I dug into why putting `libhadoop.so` into `/usr/lib` didn't work, despite that being in the `java.library.path` at the call

Re: `env.java.opts` not persisting after job canceled or failed and then restarted

2019-01-23 Thread Aaron Levin
Hi Ufuk, One more update: I tried copying all the hadoop native `.so` files (mainly `libhadoop.so`) into `/lib` and I am still experiencing the issue I reported. I also tried naively adding the `.so` files to the jar with the flink application and am still experiencing the issue I reported (howeve

Re: `env.java.opts` not persisting after job canceled or failed and then restarted

2019-01-23 Thread Aaron Levin
Hi Ufuk, Two updates: 1. As suggested in the ticket, I naively copied every `.so` in `hadoop-3.0.0/lib/native/` into `/lib/` and this did not seem to help. My knowledge of how shared libs get picked up is hazy, so I'm not sure if blindly copying them like that should work. I did check what `S

Re: `env.java.opts` not persisting after job canceled or failed and then restarted

2019-01-22 Thread Aaron Levin
Hey Ufuk, So, I looked into this a little bit: 1. clarification: my issues are with the hadoop-related snappy libraries and not libsnappy itself (this is my bad for not being clearer, sorry!). I already have `libsnappy` on my classpath, but I am looking into including the hadoop snappy libraries.

Re: `env.java.opts` not persisting after job canceled or failed and then restarted

2019-01-22 Thread Aaron Levin
Hey, Thanks so much for the help! This is awesome. I'll start looking into all of this right away and report back. Best, Aaron Levin On Mon, Jan 21, 2019 at 5:16 PM Ufuk Celebi wrote: > Hey Aaron, > > sorry for the late reply. > > (1) I think I was able to reproduce this issue using snappy-ja

Re: `env.java.opts` not persisting after job canceled or failed and then restarted

2019-01-21 Thread Ufuk Celebi
Hey Aaron, sorry for the late reply. (1) I think I was able to reproduce this issue using snappy-java. I've filed a ticket here: https://issues.apache.org/jira/browse/FLINK-11402. Can you check the ticket description whether it's in line with what you are experiencing? Most importantly, do you se

`env.java.opts` not persisting after job canceled or failed and then restarted

2019-01-17 Thread Aaron Levin
Hello! *tl;dr*: settings in `env.java.opts` seem to stop having impact when a job is canceled or fails and then is restarted (with or without savepoint/checkpoints). If I restart the task-managers, the `env.java.opts` seem to start having impact again and our job will run without failure. More bel