Re: [ANNOUNCE] Announcing Apache Spark 2.4.1

2019-04-05 Thread Jungtaek Lim
Thanks Andrew for reporting this. I just submitted the fix.
https://github.com/apache/spark/pull/24304
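
For anyone wanting to preview the rendered HTML locally, Spark's docs are generated with Jekyll. A minimal sketch, assuming Ruby and Jekyll are set up as described in docs/README.md in the Spark repo (SKIP_API=1 skips the slow API-doc generation and just rebuilds pages such as configuration.md):

    cd docs && SKIP_API=1 jekyll serve --watch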

On Fri, Apr 5, 2019 at 3:21 PM Andrew Melo  wrote:

> Hello,
>
> I'm not sure if this is the proper place to report it, but the 2.4.1
> version of the config docs apparently didn't render to HTML correctly
> (scroll down to "Compression and Serialization")
>
> https://spark.apache.org/docs/2.4.1/configuration.html#available-properties
>
> By comparison, the 2.4.0 version of the docs renders correctly.
>
> Cheers
> Andrew
>
> On Fri, Apr 5, 2019 at 7:59 AM DB Tsai  wrote:
> >
> > +user list
> >
> > We are happy to announce the availability of Spark 2.4.1!
> >
> > Apache Spark 2.4.1 is a maintenance release based on Spark's branch-2.4
> > maintenance branch. We strongly recommend that all 2.4.0 users upgrade to
> > this stable release.
> >
> > In Apache Spark 2.4.1, Scala 2.12 support is GA and no longer experimental.
> > We will drop Scala 2.11 support in Spark 3.0, so please send us feedback.
> >
> > To download Spark 2.4.1, head over to the download page:
> > http://spark.apache.org/downloads.html
> >
> > To view the release notes:
> > https://spark.apache.org/releases/spark-release-2-4-1.html
> >
> > One more thing: to add a little color to this release, it went through
> > more release candidates than any previous release (RC9)!
> > We tried to incorporate many critical fixes at the last minute, and we
> > hope you all enjoy it.
> >
> > We would like to acknowledge all community members for contributing to
> > this release. This release would not have been possible without you.
> >
>

-- 
Name : Jungtaek Lim
Blog : http://medium.com/@heartsavior
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior


Re: [ANNOUNCE] Announcing Apache Spark 2.4.1

2019-04-05 Thread Andrew Melo
On Fri, Apr 5, 2019 at 9:41 AM Jungtaek Lim  wrote:
>
> Thanks Andrew for reporting this. I just submitted the fix. 
> https://github.com/apache/spark/pull/24304

Thanks!

>
> On Fri, Apr 5, 2019 at 3:21 PM Andrew Melo  wrote:
>>
>> Hello,
>>
>> I'm not sure if this is the proper place to report it, but the 2.4.1
>> version of the config docs apparently didn't render to HTML correctly
>> (scroll down to "Compression and Serialization")
>>
>> https://spark.apache.org/docs/2.4.1/configuration.html#available-properties
>>
>> By comparison, the 2.4.0 version of the docs renders correctly.
>>
>> Cheers
>> Andrew
>>
>> On Fri, Apr 5, 2019 at 7:59 AM DB Tsai  wrote:
>> >
>> > +user list
>> >
>> > We are happy to announce the availability of Spark 2.4.1!
>> >
>> > Apache Spark 2.4.1 is a maintenance release based on Spark's branch-2.4
>> > maintenance branch. We strongly recommend that all 2.4.0 users upgrade to
>> > this stable release.
>> >
>> > In Apache Spark 2.4.1, Scala 2.12 support is GA and no longer experimental.
>> > We will drop Scala 2.11 support in Spark 3.0, so please send us feedback.
>> >
>> > To download Spark 2.4.1, head over to the download page:
>> > http://spark.apache.org/downloads.html
>> >
>> > To view the release notes:
>> > https://spark.apache.org/releases/spark-release-2-4-1.html
>> >
>> > One more thing: to add a little color to this release, it went through
>> > more release candidates than any previous release (RC9)!
>> > We tried to incorporate many critical fixes at the last minute, and we
>> > hope you all enjoy it.
>> >
>> > We would like to acknowledge all community members for contributing to
>> > this release. This release would not have been possible without you.
>> >
>>
>
>
> --
> Name : Jungtaek Lim
> Blog : http://medium.com/@heartsavior
> Twitter : http://twitter.com/heartsavior
> LinkedIn : http://www.linkedin.com/in/heartsavior




Spark 2.4.0 tests fail with hadoop-3.1 profile: NoClassDefFoundError org.apache.hadoop.hive.conf.HiveConf

2019-04-05 Thread akirillov
Hi there! I'm trying to run Spark unit tests with the following profiles: 

And 'core' module fails with the following test failing with
NoClassDefFoundError: 

In the meantime building a distribution works fine when running: 

Also, there are no problems with running tests using Hadoop 2.7 profile.
Does this issue look familiar? Any help appreciated!






Re: Spark 2.4.0 tests fail with hadoop-3.1 profile: NoClassDefFoundError org.apache.hadoop.hive.conf.HiveConf

2019-04-05 Thread Anton Kirillov
Really sorry for the formatting. Here's the original message:

Hi there! I'm trying to run Spark unit tests with the following profiles:

./build/mvn test -Pmesos "-Phadoop-3.1" -Pnetlib-lgpl -Psparkr -Phive -Phive-thriftserver

And the 'core' module fails, with the following suite aborting on a
NoClassDefFoundError:

HadoopDelegationTokenManagerSuite:
- Correctly load default credential providers
- disable hive credential provider
- using deprecated configurations
- verify no credentials are obtained
*** RUN ABORTED ***
  java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hive.conf.HiveConf
  at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:250)
  at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:173)
  at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:166)
  at org.apache.spark.deploy.security.HiveDelegationTokenProvider$$anonfun$obtainDelegationTokens$2.apply$mcV$sp(HiveDelegationTokenProvider.scala:114)
  at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
  at org.apache.spark.deploy.security.HiveDelegationTokenProvider.obtainDelegationTokens(HiveDelegationTokenProvider.scala:113)
  at org.apache.spark.deploy.security.HadoopDelegationTokenManagerSuite$$anonfun$5.apply(HadoopDelegationTokenManagerSuite.scala:98)
  at org.apache.spark.deploy.security.HadoopDelegationTokenManagerSuite$$anonfun$5.apply(HadoopDelegationTokenManagerSuite.scala:90)
  at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)

In the meantime, building a distribution works fine when running:

./dev/make-distribution.sh --tgz -Pmesos "-Phadoop-3.1" -Pnetlib-lgpl -Psparkr -Phive -Phive-thriftserver -DskipTests

Also, there are no problems with running tests using the Hadoop 2.7 profile.
Does this issue look familiar? Any help appreciated!
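
A diagnostic sketch, not from the original thread: Maven's dependency:tree can show which Hive artifacts the hadoop-3.1 profile actually resolves for the core module, which is where a NoClassDefFoundError for HiveConf would originate:

    ./build/mvn -Pmesos "-Phadoop-3.1" -Pnetlib-lgpl -Psparkr -Phive \
      -Phive-thriftserver dependency:tree -pl core | grep -i hive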

On Fri, Apr 5, 2019 at 10:53 AM akirillov wrote:

> Hi there! I'm trying to run Spark unit tests with the following profiles:
>
> And 'core' module fails with the following test failing with
> NoClassDefFoundError:
>
> In the meantime building a distribution works fine when running:
>
> Also, there are no problems with running tests using Hadoop 2.7 profile.
> Does this issue look familiar? Any help appreciated!
>
>
>
>
>


Re: Spark 2.4.0 tests fail with hadoop-3.1 profile: NoClassDefFoundError org.apache.hadoop.hive.conf.HiveConf

2019-04-05 Thread Marcelo Vanzin
The hadoop-3 profile doesn't really work yet, not even on master.
That's being worked on still.

On Fri, Apr 5, 2019 at 10:53 AM akirillov  wrote:
>
> Hi there! I'm trying to run Spark unit tests with the following profiles:
>
> And 'core' module fails with the following test failing with
> NoClassDefFoundError:
>
> In the meantime building a distribution works fine when running:
>
> Also, there are no problems with running tests using Hadoop 2.7 profile.
> Does this issue look familiar? Any help appreciated!
>
>
>
>


-- 
Marcelo




Re: Spark 2.4.0 tests fail with hadoop-3.1 profile: NoClassDefFoundError org.apache.hadoop.hive.conf.HiveConf

2019-04-05 Thread Anton Kirillov
Marcelo, Sean, thanks for the clarification. So in order to support Hadoop
3+ the preferred way would be to use Hadoop-free builds and provide Hadoop
dependencies in the classpath, is that correct?
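
For reference, a minimal sketch of how a "Hadoop free" build picks up an existing Hadoop installation, per Spark's hadoop-provided documentation (this assumes a hadoop launcher script on the PATH); in conf/spark-env.sh:

    export SPARK_DIST_CLASSPATH=$(hadoop classpath)

The design choice here is to defer the Hadoop version to deploy time rather than build time; whether that combination actually works with Hadoop 3 is what the replies below weigh in on.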

On Fri, Apr 5, 2019 at 10:57 AM Marcelo Vanzin  wrote:

> The hadoop-3 profile doesn't really work yet, not even on master.
> That's being worked on still.
>
> On Fri, Apr 5, 2019 at 10:53 AM akirillov wrote:
> >
> > Hi there! I'm trying to run Spark unit tests with the following profiles:
> >
> > And 'core' module fails with the following test failing with
> > NoClassDefFoundError:
> >
> > In the meantime building a distribution works fine when running:
> >
> > Also, there are no problems with running tests using Hadoop 2.7 profile.
> > Does this issue look familiar? Any help appreciated!
> >
> >
> >
> >
>
>
> --
> Marcelo
>


Re: Spark 2.4.0 tests fail with hadoop-3.1 profile: NoClassDefFoundError org.apache.hadoop.hive.conf.HiveConf

2019-04-05 Thread Sean Owen
Yes, you can try it, though I doubt that will 100% work. Have a look
at the "hadoop 3" JIRAs and PRs still in progress on master.

On Fri, Apr 5, 2019 at 1:14 PM Anton Kirillov wrote:
>
> Marcelo, Sean, thanks for the clarification. So in order to support Hadoop 3+ 
> the preferred way would be to use Hadoop-free builds and provide Hadoop 
> dependencies in the classpath, is that correct?
>




Re: Spark 2.4.0 tests fail with hadoop-3.1 profile: NoClassDefFoundError org.apache.hadoop.hive.conf.HiveConf

2019-04-05 Thread Marcelo Vanzin
You can always try. But Hadoop 3 is not yet supported by Spark.

On Fri, Apr 5, 2019 at 11:13 AM Anton Kirillov wrote:
>
> Marcelo, Sean, thanks for the clarification. So in order to support Hadoop 3+ 
> the preferred way would be to use Hadoop-free builds and provide Hadoop 
> dependencies in the classpath, is that correct?
>
> On Fri, Apr 5, 2019 at 10:57 AM Marcelo Vanzin  wrote:
>>
>> The hadoop-3 profile doesn't really work yet, not even on master.
>> That's being worked on still.
>>
>> On Fri, Apr 5, 2019 at 10:53 AM akirillov wrote:
>> >
>> > Hi there! I'm trying to run Spark unit tests with the following profiles:
>> >
>> > And 'core' module fails with the following test failing with
>> > NoClassDefFoundError:
>> >
>> > In the meantime building a distribution works fine when running:
>> >
>> > Also, there are no problems with running tests using Hadoop 2.7 profile.
>> > Does this issue look familiar? Any help appreciated!
>> >
>> >
>> >
>> >
>>
>>
>> --
>> Marcelo



-- 
Marcelo




Re: Spark 2.4.0 tests fail with hadoop-3.1 profile: NoClassDefFoundError org.apache.hadoop.hive.conf.HiveConf

2019-04-05 Thread Sean Owen
Hadoop 3 isn't supported yet, not quite even in master. I think the
profile there exists for testing at the moment.
Others may know a way that it can work, but I don't think it would work
out of the box.

On Fri, Apr 5, 2019 at 12:53 PM akirillov  wrote:
>
> Hi there! I'm trying to run Spark unit tests with the following profiles:
>
> And 'core' module fails with the following test failing with
> NoClassDefFoundError:
>
> In the meantime building a distribution works fine when running:
>
> Also, there are no problems with running tests using Hadoop 2.7 profile.
> Does this issue look familiar? Any help appreciated!
>
>
>
>




Re: [DISCUSS] Spark Columnar Processing

2019-04-05 Thread Bobby Evans
I just filed SPARK-27396 as the SPIP for this proposal.  Please use that
JIRA for further discussions.

Thanks for all of the feedback,

Bobby

On Wed, Apr 3, 2019 at 7:15 PM Bobby Evans  wrote:

> I am still working on the SPIP and should get it up in the next few days.
> I have the basic text more or less ready, but I want to get a high-level
> API concept ready too, just to have something more concrete.  I have not
> really done much with contributing new features to Spark, so I am not sure
> where a design document fits in here: neither
> http://spark.apache.org/improvement-proposals.html nor
> http://spark.apache.org/contributing.html mentions a design document
> anywhere.  I am happy to put one up, but I was hoping the API concept would
> cover most of that.
>
> Thanks,
>
> Bobby
>
> On Tue, Apr 2, 2019 at 9:16 PM Renjie Liu  wrote:
>
>> Hi, Bobby:
>> Do you have a design doc? I'm also interested in this topic and want to
>> help contribute.
>>
>> On Tue, Apr 2, 2019 at 10:00 PM Bobby Evans  wrote:
>>
>>> Thanks to everyone for the feedback.
>>>
>>> Overall the feedback has been really positive for exposing columnar as a
>>> processing option to users.  I'll write up a SPIP on the proposed changes
>>> to support columnar processing (not necessarily implement it) and then ping
>>> the list again for more feedback and discussion.
>>>
>>> Thanks again,
>>>
>>> Bobby
>>>
>>> On Mon, Apr 1, 2019 at 5:09 PM Reynold Xin  wrote:
>>>
 I just realized I didn't make it very clear my stance here ... here's
 another try:

 I think it's a no-brainer to have a good columnar UDF interface. This
 would facilitate a lot of high performance applications, e.g. GPU-based
 accelerations for machine learning algorithms.

 On rewriting the entire internals of Spark SQL to leverage columnar
 processing, I don't see enough evidence to suggest that's a good idea yet.
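
To make the first point concrete, a hypothetical sketch only: Spark 2.4 exposes no public columnar UDF interface, so the trait name and shape below are invented for illustration, built on the public org.apache.spark.sql.vectorized types.

    import org.apache.spark.sql.vectorized.{ColumnarBatch, ColumnVector}

    // Hypothetical, not a real Spark API: a UDF that consumes a whole batch
    // of columns at once, so an implementation can hand the buffers to a
    // SIMD kernel or a GPU instead of being invoked once per row.
    trait ColumnarUdf {
      def evalBatch(input: ColumnarBatch): ColumnVector
    }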




 On Wed, Mar 27, 2019 at 8:10 AM, Bobby Evans  wrote:

> Kazuaki Ishizaki,
>
> Yes, ColumnarBatchScan does provide a framework for doing code
> generation for the processing of columnar data.  I have to admit that I
> don't have a deep understanding of the code generation piece, so if I get
> something wrong please correct me.  From what I have seen, only input
> formats currently inherit from ColumnarBatchScan, and from the comments in
> the trait:
>
>   /**
>    * Generate [[ColumnVector]] expressions for our parent to consume as rows.
>    * This is called once per [[ColumnarBatch]].
>    */
>
> https://github.com/apache/spark/blob/956b52b1670985a67e49b938ac1499ae65c79f6e/sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarBatchScan.scala#L42-L43
>
> It appears that ColumnarBatchScan is really only intended to pull the
> data out of the batch, and not to process that data in a columnar
> fashion; that is, the Loading stage that you mentioned.
>
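
An illustrative sketch of that row-wise vs. column-wise distinction, using Spark 2.4's public org.apache.spark.sql.vectorized types plus the internal OnHeapColumnVector to build a batch (an editorial example, not code from the thread):

    import org.apache.spark.sql.execution.vectorized.OnHeapColumnVector
    import org.apache.spark.sql.types.IntegerType
    import org.apache.spark.sql.vectorized.{ColumnarBatch, ColumnVector}

    object ColumnarAccessDemo extends App {
      // One int column with a null in the middle, wrapped in a batch.
      val col = new OnHeapColumnVector(3, IntegerType)
      col.putInt(0, 1); col.putNull(1); col.putInt(2, 3)
      val batch = new ColumnarBatch(Array[ColumnVector](col))
      batch.setNumRows(3)

      // Row-at-a-time: what ColumnarBatchScan's generated code feeds upward.
      val rows = batch.rowIterator()
      while (rows.hasNext) {
        val row = rows.next()
        if (!row.isNullAt(0)) println(row.getInt(0))
      }

      // Column-at-a-time: the tight per-column loop the proposal wants
      // operators to be able to run instead.
      var sum = 0L
      var i = 0
      while (i < batch.numRows()) {
        if (!col.isNullAt(i)) sum += col.getInt(i)
        i += 1
      }
      println(s"sum = $sum")
    }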
> > The SIMDzation or GPUization capability depends on a compiler that
> > translates native code from the code generated by the whole-stage codegen.
>
> To be able to support vectorized processing, Hive stayed with pure Java and
> let the JVM detect and do the SIMDization of the code.  To make that
> happen, they created loops that go through each element in a column and
> removed all conditionals from the body of the loops.  To the best of my
> knowledge, that would still require a separate code path, like the one I am
> proposing, to make the different processing phases generate code that the
> JVM can compile down to SIMD instructions.  The generated code is full of
> null checks for each element, which would prevent the operations we want.
> Also, the intermediate results are often stored in UnsafeRow instances.
> This is really fast for row-based processing, but I believe the complexity
> of how they work would prevent the JVM from being able to vectorize the
> processing.  If you have a better way to take Java code and vectorize it,
> we should put it into OpenJDK instead of Spark so everyone can benefit
> from it.
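
A small Scala illustration of that auto-vectorization point (an editorial sketch, not from the thread): the HotSpot JIT can typically turn the second loop into SIMD instructions, while the per-element null check in the first usually defeats it.

    object VectorizationSketch {
      // Per-element null checks put a branch in the hot loop, which tends
      // to block the JIT's auto-vectorization.
      def sumWithNullChecks(values: Array[Int], isNull: Array[Boolean]): Long = {
        var sum = 0L
        var i = 0
        while (i < values.length) {
          if (!isNull(i)) sum += values(i)
          i += 1
        }
        sum
      }

      // Hive-style alternative: handle nulls up front (e.g. zero out null
      // slots), leaving a branch-free loop the JIT can vectorize.
      def sumBranchFree(values: Array[Int]): Long = {
        var sum = 0L
        var i = 0
        while (i < values.length) {
          sum += values(i)
          i += 1
        }
        sum
      }
    }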
>
> Trying to compile directly from generated Java code to something a GPU
> can process is something we are tackling, but we decided to go a different
> route from what you proposed.  From talking with several compiler experts
> here at NVIDIA, my understanding is that IBM, in partnership with NVIDIA,
> attempted in the past to extend the JVM to run at least partially on GPUs,
> but it was really difficult to get right, especially with how Java does
> memory management and memory layout.
>
> To avoid that complexity, we decided to split the JITing up into two
> separate pieces.  I didn't mention any of this before because this
> discussion was intended to be just about the memory layout support, and
> not GPU processing.  The fi