Re: Does anyone know how to build spark with scala12.4?

2017-11-29 Thread Sean Owen
No, you have to run ./dev/change-scala-version.sh 2.12 before building for 2.12. That makes all the necessary POM changes. On Wed, Nov 29, 2017 at 8:11 PM Zhang, Liyun wrote: > Hi Sean: > > I have tried to use following script to build package but have problem( > I am building a spark package

RE: Does anyone know how to build spark with scala12.4?

2017-11-29 Thread Zhang, Liyun
Hi Sean: I have tried to use the following script to build a package but have a problem (I am building a Spark package for Hive on Spark, so I use hadoop2-without-hive) ./dev/make-distribution.sh --name hadoop2-without-hive --tgz -Pscala-2.12 -Phadoop-2.7 -Pyarn -Pparquet-provided -Dhadoop.version=2.7
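Putting Sean's reply together with the command above, the full build sequence for Scala 2.12 would look roughly like this (a sketch following the thread; exact profiles and flags may vary by Spark version):

```shell
# Per Sean's reply: switch the POMs to Scala 2.12 first,
# then run the distribution build with the same flags as above.
./dev/change-scala-version.sh 2.12
./dev/make-distribution.sh --name hadoop2-without-hive --tgz \
  -Pscala-2.12 -Phadoop-2.7 -Pyarn -Pparquet-provided \
  -Dhadoop.version=2.7
```

Running `make-distribution.sh` with only `-Pscala-2.12` is not enough, because the POM changes made by `change-scala-version.sh` are a prerequisite.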

Re: [VOTE] Spark 2.2.1 (RC2)

2017-11-29 Thread Weichen Xu
+1 On Thu, Nov 30, 2017 at 6:27 AM, Shivaram Venkataraman < shiva...@eecs.berkeley.edu> wrote: > +1 > > SHA, MD5 and signatures look fine. Built and ran Maven tests on my Macbook. > > Thanks > Shivaram > > On Wed, Nov 29, 2017 at 10:43 AM, Holden Karau > wrote: > >> +1 (non-binding) >> >> PySpar

Re: [VOTE] Spark 2.2.1 (RC2)

2017-11-29 Thread Shivaram Venkataraman
+1 SHA, MD5 and signatures look fine. Built and ran Maven tests on my Macbook. Thanks Shivaram On Wed, Nov 29, 2017 at 10:43 AM, Holden Karau wrote: > +1 (non-binding) > > PySpark install into a virtualenv works, PKG-INFO looks correctly > populated (mostly checking for the pypandoc conversion

Leveraging S3 select

2017-11-29 Thread Lalwani, Jayesh
AWS announced at re:Invent that they are launching S3 Select. This can allow Spark to push down predicates to S3, rather than read the entire file into memory. Are there any plans to update Spark to use S3 Select?
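For context, S3 Select takes a SQL expression so that only matching bytes are returned from the object. A hedged sketch of the kind of request Spark could issue under the hood; the bucket, key, and column names here are made up for illustration, and with boto3 this dict would be passed to `s3.select_object_content(**request)`:

```python
# Sketch of an S3 Select request; bucket, key, and columns are hypothetical.
request = {
    "Bucket": "example-bucket",   # hypothetical
    "Key": "data/events.csv",     # hypothetical
    "ExpressionType": "SQL",
    # Only rows matching the predicate come back from S3 -- this is
    # the filter pushdown the thread is asking about.
    "Expression": "SELECT s.user_id FROM S3Object s WHERE s.country = 'US'",
    "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
    "OutputSerialization": {"CSV": {}},
}
print(request["Expression"])
```

The win is that filtering happens server-side, so Spark would read back only the selected column for the matching rows instead of the whole object.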

Request for review of SPARK-22599

2017-11-29 Thread Nan Zhu
Hi, all. When we ran perf tests for Spark, we found that enabling table cache does not bring the expected speedup compared to cloud-storage + parquet in many scenarios. We identified that the performance cost comes from the fact that the current InMemoryRelation/InMemoryTableScanExec will traver

Re: CrossValidation distribution - is it in the roadmap?

2017-11-29 Thread Nick Pentreath
Hi Tomasz, parallel evaluation for CrossValidation and TrainValidationSplit was added for Spark 2.3 in https://issues.apache.org/jira/browse/SPARK-19357 On Wed, 29 Nov 2017 at 16:31 Tomasz Dudek wrote: > Hey, > > is there a way to make the following code: > > val paramGrid = new ParamGridBuilde
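Conceptually, SPARK-19357 lets the candidate models in the grid be fitted concurrently rather than one at a time. A plain-Python sketch of that idea, not the actual Spark implementation; the grid values and the scoring function are stand-ins:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical param grid -- stands in for ParamGridBuilder's output.
grid = [{"regParam": r, "maxDepth": d}
        for r, d in product([0.01, 0.1, 1.0], [2, 4, 8])]

def fit_and_score(params):
    # Stand-in for fitting one candidate model and cross-validating it.
    return params, 1.0 / (params["regParam"] + params["maxDepth"])

# Evaluate candidates in parallel, which is roughly what the
# parallelism setting added in SPARK-19357 enables in Spark 2.3+.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fit_and_score, grid))

best_params, best_score = max(results, key=lambda t: t[1])
print(best_params)
```

With hundreds of combinations (as in Tomasz's question), the speedup comes simply from overlapping independent fits, since each grid point is evaluated independently.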

Re: [VOTE] Spark 2.2.1 (RC2)

2017-11-29 Thread Holden Karau
+1 (non-binding) PySpark install into a virtualenv works, PKG-INFO looks correctly populated (mostly checking for the pypandoc conversion there). Thanks for your hard work Felix (and all of the testers :)) :) On Wed, Nov 29, 2017 at 9:33 AM, Wenchen Fan wrote: > +1 > > On Thu, Nov 30, 2017 at

Re: [VOTE] Spark 2.2.1 (RC2)

2017-11-29 Thread Wenchen Fan
+1 On Thu, Nov 30, 2017 at 1:28 AM, Kazuaki Ishizaki wrote: > +1 (non-binding) > > I tested it on Ubuntu 16.04 and OpenJDK8 on ppc64le. All of the tests for > core/sql-core/sql-catalyst/mllib/mllib-local have passed. > > $ java -version > openjdk version "1.8.0_131" > OpenJDK Runtime Environment

Re: [VOTE] Spark 2.2.1 (RC2)

2017-11-29 Thread Kazuaki Ishizaki
+1 (non-binding) I tested it on Ubuntu 16.04 and OpenJDK8 on ppc64le. All of the tests for core/sql-core/sql-catalyst/mllib/mllib-local have passed. $ java -version openjdk version "1.8.0_131" OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11) OpenJDK 64-Bit Server VM

Re: Publishing official docker images for KubernetesSchedulerBackend

2017-11-29 Thread Mridul Muralidharan
We do support running on Apache Mesos via docker images - so this would not be restricted to k8s. But unlike mesos support, which has other modes of running, I believe k8s support more heavily depends on availability of docker images. Regards, Mridul On Wed, Nov 29, 2017 at 8:56 AM, Sean Owen

Re: Publishing official docker images for KubernetesSchedulerBackend

2017-11-29 Thread Sean Owen
Would it be logical to provide Docker-based distributions of other pieces of Spark? or is this specific to K8S? The problem is we wouldn't generally also provide a distribution of Spark for the reasons you give, because if we did that, then why not RPMs and so on. On Wed, Nov 29, 2017 at 10:41 AM Anirudh

Re: Publishing official docker images for KubernetesSchedulerBackend

2017-11-29 Thread Anirudh Ramanathan
In this context, I think the docker images are similar to the binaries rather than an extension. It's packaging the compiled distribution to save people the effort of building one themselves, akin to binaries or the python package. For reference, this is the base dockerfile
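(The base dockerfile referenced above is truncated in the archive.) As a rough, hypothetical sketch of what "packaging the compiled distribution" into an image could look like; the base image, paths, and entrypoint here are assumptions, not the project's real file:

```dockerfile
# Hypothetical sketch only -- not the dockerfile referenced in the thread.
FROM openjdk:8-jre-slim

# Copy an already-built Spark distribution (e.g. the output of
# ./dev/make-distribution.sh) into the image, so users need not
# build from source themselves.
COPY spark-dist /opt/spark
ENV SPARK_HOME=/opt/spark
ENV PATH=$PATH:/opt/spark/bin

WORKDIR /opt/spark
ENTRYPOINT ["/opt/spark/bin/spark-class"]
```

The point of the analogy to binaries or the PyPI package is that the image adds no new source, only a ready-to-run packaging of the existing distribution.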

Re: Publishing official docker images for KubernetesSchedulerBackend

2017-11-29 Thread Mark Hamstra
It's probably also worth considering whether there is only one, well-defined, correct way to create such an image or whether this is a reasonable avenue for customization. Part of why we don't do something like maintain and publish canonical Debian packages for Spark is because different organizati

Re: Spark.ml roadmap 2.3.0 and beyond

2017-11-29 Thread Stephen Boesch
There are several JIRAs and/or PRs that contain logic that the Data Science teams I work with use in their local models. We are trying to determine if/when these features may gain traction again. In at least one case all of the work was done but the shepherd said that getting it committed were

Re: Spark.ml roadmap 2.3.0 and beyond

2017-11-29 Thread Stephen Boesch
Any further information/thoughts? 2017-11-22 15:07 GMT-08:00 Stephen Boesch: > The roadmaps for prior releases e.g. 1.6 2.0 2.1 2.2 were available: > > 2.2.0 https://issues.apache.org/jira/browse/SPARK-18813 > > 2.1.0 https://issues.apache.org/jira/browse/SPARK-15581 > .. > > It seems those r

CrossValidation distribution - is it in the roadmap?

2017-11-29 Thread Tomasz Dudek
Hey, is there a way to make the following code: val paramGrid = new ParamGridBuilder(). // omitted for brevity - let's say we have hundreds of param combinations here val cv = new CrossValidator().setNumFolds(3).setEstimator(pipeline).setEstimatorParamMaps(paramGrid) automatically distribute itsel

Re: [build system] power outage @ berkeley, again. jenkins offline ~2-6am nov 29th

2017-11-29 Thread shane knapp
this maintenance was cancelled last night, and will take place some time in 2018. i'll be sure to update everyone when i get more information. On Tue, Nov 28, 2017 at 11:53 AM, shane knapp wrote: > more electrical repairs need to be done on the high voltage leads to our > building, and we w

Re: Publishing official docker images for KubernetesSchedulerBackend

2017-11-29 Thread Sean Owen
Source code is the primary release; compiled binary releases are conveniences that are also released. A docker image sounds fairly different though. To the extent it's the standard delivery mechanism for some artifact (think: pyspark on PyPI as well) that makes sense, but is that the situation? if

Publishing official docker images for KubernetesSchedulerBackend

2017-11-29 Thread Anirudh Ramanathan
Hi all, We're all working towards the Kubernetes scheduler backend (full steam ahead!) that's targeted towards Spark 2.3. One of the questions that comes up often is docker images. While we're making available dockerfiles to allow people to create their own docker images from source, ideally, we'