Hi Marcelo,

Thanks for looking into it. I have opened a JIRA for this:
https://issues.apache.org/jira/browse/SPARK-21494

And yes, it works fine with the internal shuffle service. But on our system, the external shuffle service and dynamic allocation are configured by default. We wanted to try switching from the standard SASL/3DES to the new AES-based authentication.

Thanks!

On Thu, Jul 20, 2017 at 4:32 PM, Marcelo Vanzin <van...@cloudera.com> wrote:

> Also, things seem to work with all your settings if you disable use of
> the shuffle service (which also means no dynamic allocation), if that
> helps you make progress in what you wanted to do.
>
> On Thu, Jul 20, 2017 at 4:25 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
> > Hmm... I tried this with the new shuffle service (I generally have an
> > old one running) and also see failures. I also noticed some odd things
> > in your logs that I'm also seeing in mine, but it's better to track
> > these in a bug instead of e-mail.
> >
> > Please file a bug and attach your logs there, I'll take a look at this.
> >
> > On Thu, Jul 20, 2017 at 2:06 PM, Udit Mehrotra
> > <udit.mehrotr...@gmail.com> wrote:
> >> Hi Marcelo,
> >>
> >> I ran with setting DEBUG level logging for 'org.apache.spark.network.crypto'
> >> for both Spark and Yarn.
> >>
> >> However, the DEBUG logs still do not convey anything meaningful. Please find
> >> it attached. Can you please take a quick look, and let me know if you see
> >> anything suspicious ?
> >>
> >> If not, do you think I should open a JIRA for this ?
> >>
> >> Thanks !
> >>
> >> On Wed, Jul 19, 2017 at 3:14 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
> >>>
> >>> Hmm... that's not enough info and logs are intentionally kept silent
> >>> to avoid flooding, but if you enable DEBUG level logging for
> >>> org.apache.spark.network.crypto in both YARN and the Spark app, that
> >>> might provide more info.
> >>>
> >>> On Wed, Jul 19, 2017 at 2:58 PM, Udit Mehrotra
> >>> <udit.mehrotr...@gmail.com> wrote:
> >>> > So I added these settings in yarn-site.xml as well.
> >>> > Now I get a completely different error, but at least it seems like
> >>> > it is using the crypto library:
> >>> >
> >>> > ExecutorLostFailure (executor 1 exited caused by one of the running tasks)
> >>> > Reason: Unable to create executor due to Unable to register with external
> >>> > shuffle server due to : java.lang.IllegalArgumentException: Authentication failed.
> >>> >   at org.apache.spark.network.crypto.AuthRpcHandler.receive(AuthRpcHandler.java:125)
> >>> >   at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:157)
> >>> >   at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:105)
> >>> >   at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:118)
> >>> >
> >>> > Any clue about this ?
> >>> >
> >>> > On Wed, Jul 19, 2017 at 1:13 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
> >>> >>
> >>> >> On Wed, Jul 19, 2017 at 1:10 PM, Udit Mehrotra
> >>> >> <udit.mehrotr...@gmail.com> wrote:
> >>> >> > Is there any additional configuration I need for external shuffle
> >>> >> > besides setting the following:
> >>> >> > spark.network.crypto.enabled true
> >>> >> > spark.network.crypto.saslFallback false
> >>> >> > spark.authenticate true
> >>> >>
> >>> >> Have you set these options on the shuffle service configuration too
> >>> >> (which is the YARN xml config file, not spark-defaults.conf)?
> >>> >>
> >>> >> If you have there might be an issue, and you should probably file a
> >>> >> bug and include your NM's log file.
> >>> >>
> >>> >> --
> >>> >> Marcelo
> >>>
> >>> --
> >>> Marcelo
> >
> > --
> > Marcelo
>
> --
> Marcelo
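
For reference, the quoted suggestion to mirror the three settings into the YARN XML config can be sketched as below. This is only an illustration of where the settings go, assuming they are written as standard Hadoop-style <property> entries in yarn-site.xml; the helper names are hypothetical, and per the thread the authentication failure persisted with these settings in place (tracked in SPARK-21494).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The three settings quoted in the thread (from spark-defaults.conf).
CRYPTO_SETTINGS = {
    "spark.authenticate": "true",
    "spark.network.crypto.enabled": "true",
    "spark.network.crypto.saslFallback": "false",
}


def to_spark_defaults(settings):
    """Render settings as 'key value' lines, as used in spark-defaults.conf."""
    return "\n".join(f"{name} {value}" for name, value in settings.items())


def to_hadoop_properties(settings):
    """Render settings as Hadoop-style <property> entries.

    Assumption: the shuffle service reads these keys from the YARN XML
    config in the usual <configuration>/<property> layout.
    """
    lines = ["<configuration>"]
    for name, value in settings.items():
        lines += [
            "  <property>",
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(value)}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    rendered = to_hadoop_properties(CRYPTO_SETTINGS)
    ET.fromstring(rendered)  # raises ParseError if the fragment is malformed
    print(to_spark_defaults(CRYPTO_SETTINGS))
    print(rendered)
```

In a real yarn-site.xml the <property> entries would be merged into the existing <configuration> element rather than replacing it, and the NodeManagers would need a restart to pick them up.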