Hi Marcelo,

Thanks for looking into it. I have opened a JIRA for this:

https://issues.apache.org/jira/browse/SPARK-21494

And yes, it works fine with the internal shuffle service. But on our system
we have the external shuffle service and dynamic allocation configured by
default. We wanted to try switching from the standard SASL/3DES to the new
AES-based authentication.
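
For reference, here is a minimal sketch of the three settings quoted further
down in this thread, rendered both as spark-defaults.conf lines and as
Hadoop-style <property> entries for the YARN XML config that the shuffle
service reads. The helper names (to_spark_defaults, to_yarn_site_properties)
are hypothetical and only illustrate the two file formats; this is not part
of Spark itself.

```python
import xml.etree.ElementTree as ET

# The three settings quoted in this thread (values exactly as given there).
CRYPTO_SETTINGS = {
    "spark.authenticate": "true",
    "spark.network.crypto.enabled": "true",
    "spark.network.crypto.saslFallback": "false",
}


def to_spark_defaults(settings):
    """Render settings as spark-defaults.conf lines: key, a space, value."""
    return "\n".join(f"{key} {value}" for key, value in settings.items())


def to_yarn_site_properties(settings):
    """Render settings as <property> elements to place inside the
    <configuration> element of yarn-site.xml (hypothetical helper)."""
    rendered = []
    for key, value in settings.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = key
        ET.SubElement(prop, "value").text = value
        rendered.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(rendered)


if __name__ == "__main__":
    print(to_spark_defaults(CRYPTO_SETTINGS))
    print(to_yarn_site_properties(CRYPTO_SETTINGS))
```

The same keys and values go into both files, so that the application side
and the shuffle service side are configured identically.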

Thanks !

On Thu, Jul 20, 2017 at 4:32 PM, Marcelo Vanzin <van...@cloudera.com> wrote:

> Also, things seem to work with all your settings if you disable use of
> the shuffle service (which also means no dynamic allocation), if that
> helps you make progress in what you wanted to do.
>
> On Thu, Jul 20, 2017 at 4:25 PM, Marcelo Vanzin <van...@cloudera.com>
> wrote:
> > Hmm... I tried this with the new shuffle service (I generally have an
> > old one running) and also see failures. I also noticed some odd things
> > in your logs that I'm also seeing in mine, but it's better to track
> > these in a bug instead of e-mail.
> >
> > Please file a bug and attach your logs there, I'll take a look at this.
> >
> > On Thu, Jul 20, 2017 at 2:06 PM, Udit Mehrotra
> > <udit.mehrotr...@gmail.com> wrote:
> >> Hi Marcelo,
> >>
> >> I ran with DEBUG level logging enabled for
> >> 'org.apache.spark.network.crypto' in both Spark and YARN.
> >>
> >> However, the DEBUG logs still do not convey anything meaningful. Please
> >> find them attached. Can you please take a quick look and let me know if
> >> you see anything suspicious?
> >>
> >> If not, do you think I should open a JIRA for this ?
> >>
> >> Thanks !
> >>
> >> On Wed, Jul 19, 2017 at 3:14 PM, Marcelo Vanzin <van...@cloudera.com>
> >> wrote:
> >>>
> >>> Hmm... that's not enough info and logs are intentionally kept silent
> >>> to avoid flooding, but if you enable DEBUG level logging for
> >>> org.apache.spark.network.crypto in both YARN and the Spark app, that
> >>> might provide more info.
> >>>
> >>> On Wed, Jul 19, 2017 at 2:58 PM, Udit Mehrotra
> >>> <udit.mehrotr...@gmail.com> wrote:
> >>> > So I added these settings in yarn-site.xml as well. Now I get a
> >>> > completely different error, but at least it seems like it is using
> >>> > the crypto library:
> >>> >
> >>> > ExecutorLostFailure (executor 1 exited caused by one of the running tasks)
> >>> > Reason: Unable to create executor due to Unable to register with external
> >>> > shuffle server due to : java.lang.IllegalArgumentException: Authentication failed.
> >>> >     at org.apache.spark.network.crypto.AuthRpcHandler.receive(AuthRpcHandler.java:125)
> >>> >     at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:157)
> >>> >     at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:105)
> >>> >     at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:118)
> >>> >
> >>> > Any clue about this ?
> >>> >
> >>> >
> >>> > On Wed, Jul 19, 2017 at 1:13 PM, Marcelo Vanzin <van...@cloudera.com>
> >>> > wrote:
> >>> >>
> >>> >> On Wed, Jul 19, 2017 at 1:10 PM, Udit Mehrotra
> >>> >> <udit.mehrotr...@gmail.com> wrote:
> >>> >> > Is there any additional configuration I need for external shuffle
> >>> >> > besides setting the following:
> >>> >> > spark.network.crypto.enabled true
> >>> >> > spark.network.crypto.saslFallback false
> >>> >> > spark.authenticate               true
> >>> >>
> >>> >> Have you set these options on the shuffle service configuration too
> >>> >> (which is the YARN xml config file, not spark-defaults.conf)?
> >>> >>
> >>> >> If you have, there might be an issue, and you should probably file a
> >>> >> bug and include your NM's log file.
> >>> >>
> >>> >> --
> >>> >> Marcelo
> >>> >
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> Marcelo
> >>
> >>
> >
> >
> >
> > --
> > Marcelo
>
>
>
> --
> Marcelo
>
