Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-23 Thread Yin Huai
-1 because of https://issues.apache.org/jira/browse/SPARK-16121. This jira was resolved after 2.0.0-RC1 was cut. Without the fix, Spark SQL effectively only uses the driver to list files when loading datasets and the driver-side file listing is very slow for datasets having many files and partitio

destroyPythonWorker job in PySpark

2016-06-23 Thread Krishna
Hi, I am running a PySpark app with 1000's of cores (partitions is a small multiple of # of cores) and the overall application performance is fine. However, I noticed that, at the end of the job, PySpark initiates job clean-up procedures and as part of this procedure, PySpark executes a job shown

Does CoarseGrainedSchedulerBackend care about cores only? And disregards memory?

2016-06-23 Thread Jacek Laskowski
Hi, After reviewing makeOffer and launchTasks in CoarseGrainedSchedulerBackend I came to the following conclusion: Scheduling in Spark relies on cores only (not memory), i.e. the number of tasks Spark can run on an executor is constrained by the number of cores available only. When submitting Spa
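
A minimal sketch of the core-only accounting described in this message, not Spark's actual code. It assumes each task reserves a fixed number of cores (Spark exposes this as the `spark.task.cpus` setting, default 1) and that memory is never consulted when handing out tasks. The function names `task_slots` and `launch_tasks` and the executor IDs are hypothetical and exist only for illustration.

```python
def task_slots(executor_cores, cpus_per_task=1):
    """Concurrent tasks one executor can run when only cores are counted."""
    if executor_cores < 0 or cpus_per_task < 1:
        raise ValueError("executor_cores must be >= 0 and cpus_per_task >= 1")
    return executor_cores // cpus_per_task


def launch_tasks(free_cores, pending_tasks, cpus_per_task=1):
    """Hand out pending tasks to executors based on free cores alone.

    free_cores maps a (hypothetical) executor ID to its free core count.
    Returns (tasks launched per executor, tasks still pending).
    Memory is deliberately absent: this models core-only scheduling.
    """
    remaining = dict(free_cores)  # copy so the caller's dict is untouched
    launched = {executor: 0 for executor in remaining}
    for executor in remaining:
        # Keep assigning while this executor still has enough free cores.
        while pending_tasks > 0 and remaining[executor] >= cpus_per_task:
            remaining[executor] -= cpus_per_task
            launched[executor] += 1
            pending_tasks -= 1
    return launched, pending_tasks


if __name__ == "__main__":
    launched, pending = launch_tasks({"exec-1": 4, "exec-2": 2}, pending_tasks=10)
    print(launched, pending)
```

Under this model, an executor's memory size changes nothing about how many tasks it receives; only its core count and the per-task core requirement do.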

Re: Spark Thrift Server Concurrency

2016-06-23 Thread Michael Segel
Hi, There are a lot of moving parts and a lot of unknowns in your description, besides the version stuff. How many executors, how many cores? How much memory? Are you persisting (memory and disk) or just caching (memory)? During the execution… same tables… are you seeing a lot of shufflin

Re: [VOTE][RESULT] Release Apache Spark 1.6.2 (RC2)

2016-06-23 Thread Reynold Xin
Vote passed. Please see below. I will work on packaging the release. +1 (9 votes, 4 binding) Reynold Xin* Sean Owen* Tim Hunter Michael Armbrust* Sean McNamara* Kousuke Saruta Sameer Agarwal Krishna Sankar Vaquar Khan 0 none -1 Maciej Bryński * binding votes On Sun, Jun 19, 2016 at 9:24 PM,

Re: [VOTE] Release Apache Spark 1.6.2 (RC2)

2016-06-23 Thread Reynold Xin
Maciej let's fix SPARK-13283. It won't block 1.6.2 though. On Thu, Jun 23, 2016 at 5:45 AM, Maciej Bryński wrote: > -1 > > I need SPARK-13283 to be solved. > > Regards, > Maciek Bryński > > 2016-06-23 0:13 GMT+02:00 Krishna Sankar : > >> +1 (non-binding, of course) >> >> 1. Compiled OSX 10.10 (Y

Re: [VOTE] Release Apache Spark 1.6.2 (RC2)

2016-06-23 Thread vaquar khan
+1 (non-binding) Regards, Vaquar khan On 23 Jun 2016 07:50, "Sean Owen" wrote: > I don't think that qualifies as a blocker; not even clear it's a > regression. Even non-binding votes here should focus on whether this > is OK to release as a maintenance update to 1.6.1. > > On Thu, Jun 23, 2016 at

Re: [VOTE] Release Apache Spark 1.6.2 (RC2)

2016-06-23 Thread Sean Owen
I don't think that qualifies as a blocker; not even clear it's a regression. Even non-binding votes here should focus on whether this is OK to release as a maintenance update to 1.6.1. On Thu, Jun 23, 2016 at 1:45 PM, Maciej Bryński wrote: > -1 > > I need SPARK-13283 to be solved. > > Regards, >

Re: [VOTE] Release Apache Spark 1.6.2 (RC2)

2016-06-23 Thread Maciej Bryński
-1 I need SPARK-13283 to be solved. Regards, Maciek Bryński 2016-06-23 0:13 GMT+02:00 Krishna Sankar : > +1 (non-binding, of course) > > 1. Compiled OSX 10.10 (Yosemite) OK Total time: 37:11 min > mvn clean package -Pyarn -Phadoop-2.6 -DskipTests > 2. Tested pyspark, mllib (iPython 4.0) >

Spark Thrift Server Concurrency

2016-06-23 Thread Prabhu Joseph
Hi All, On submitting the same SQL query 20 times in parallel to Spark Thrift Server, the execution time for some queries is less than a second and for others is more than 2 seconds. The Spark Thrift Server logs show all 20 queries are submitted at the same time, 16/06/23 12:12:01, but the result schema are at

Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-23 Thread Pete Robbins
I'm also seeing some of these same failures: - spilling with compression *** FAILED *** I have seen this occasionally - to UTC timestamp *** FAILED *** This was fixed yesterday in branch-2.0 (https://issues.apache.org/jira/browse/SPARK-16078) - offset recovery *** FAILED *** Haven't seen this f

Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-23 Thread Sean Owen
First pass of feedback on the RC: all the sigs, hashes, etc are fine. Licensing is up to date to the best of my knowledge. I'm hitting test failures, some of which may be spurious. Just putting them out there to see if they ring bells. This is Java 8 on Ubuntu 16. - spilling with compression ***