Could this be related to https://issues.apache.org/jira/browse/SPARK-17733 ?
-- Original --
From: "Cheng Lian-3 [via Apache Spark Developers List]"
Send time: Thursday, Feb 23, 2017 9:43 AM
To: "Stan Zhai"
Subject: Re: The driver hangs at DataFrame.rdd in
Hi Spark developers,
Currently my team at Microsoft is extending Spark's machine learning
functionalities to include new learners and transformers. We would like users
to use these within Spark pipelines so that they can mix and match with
existing Spark learners/transformers, and overall have
Hi all,
The driver hangs at DataFrame.rdd in Spark 2.1.0 when the DataFrame (SQL) is
complex. The following is a thread dump of my driver:
org.apache.spark.sql.catalyst.expressions.AttributeReference.equals(namedExpressions.scala:230)
org.apache.spark.sql.catalyst.expressions.IsNotNull.equals(nullExpr
Well, the issue I'm trying to solve is slow writing due to S3's
implementation of move as copy/delete. It seems like your S3 committers and
S3Guard both ameliorate that somewhat by parallelizing the copy. I assume
there's no better way to solve this issue without sacrificing safety. Even
if ther
Hi, the example that I provided was not very clear, so I have added a clearer
example in JIRA.
Thanks
Cheers
Gen
On Wed, Feb 22, 2017 at 3:47 PM, gen tang wrote:
> Hi Kazuaki Ishizaki
>
> Thanks a lot for your help. It works. However, a more strange bug appears
> as follows:
>
> import org.apache.