[SparkSQL 1.4.0] The result of SUM(xxx) in SparkSQL is 0.0 rather than null when the column xxx is all null

2015-07-02 Thread StanZhai
Hi all, I have a table named test like this: | a | b | | 1 | null | | 2 | null | After upgrading the cluster from Spark 1.3.1 to 1.4.0, I found that the Sum function behaves differently in Spark 1.4 and 1.3. The SQL is: select sum(b) from test. In Spark 1.4.0 the result is 0.0; in Spark 1.3.1 the
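The behaviour the thread expects matches ANSI SQL: SUM skips NULLs, and when every input is NULL the result is NULL, not 0. A minimal plain-Python sketch of those semantics (an illustration, not Spark's implementation) applied to the example table's column b:

```python
def sql_sum(values):
    """SUM with ANSI SQL semantics: NULL (None) inputs are skipped,
    and if every input is NULL the result is NULL, not 0."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) if non_null else None

# Column b of the example table holds only nulls:
print(sql_sum([None, None]))   # None -- what Spark 1.3.1 returned, not 0.0
print(sql_sum([1, None, 2]))   # 3 -- nulls are simply skipped
```

Under these semantics the 0.0 result reported for 1.4.0 looks like a regression rather than a deliberate change.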

Re: except vs subtract

2015-07-02 Thread Reynold Xin
"except" is a keyword in Python unfortunately. On Thu, Jul 2, 2015 at 11:54 PM, Krishna Sankar wrote: > Guys, >Scala says except while python has subtract. (I verified that except > doesn't exist in python) Why the difference in syntax for the same > functionality ? > Cheers > >

except vs subtract

2015-07-02 Thread Krishna Sankar
Guys, Scala says except while Python has subtract. (I verified that except doesn't exist in Python.) Why the difference in syntax for the same functionality? Cheers
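Whichever name is used, both methods denote the same operation. A plain-Python sketch of one plausible reading, assuming the distinct set-difference semantics of SQL EXCEPT (the `subtract` helper below is illustrative only, not Spark's API):

```python
def subtract(left, right):
    """Distinct rows of `left` that do not appear in `right`,
    in the spirit of SQL EXCEPT."""
    right_set = set(right)
    seen = set()
    out = []
    for row in left:
        if row not in right_set and row not in seen:
            seen.add(row)
            out.append(row)
    return out

print(subtract([(1,), (2,), (2,), (3,)], [(3,)]))  # [(1,), (2,)]
```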

Differential Equation Spark Solver

2015-07-02 Thread jamaica
Dear Spark Devs, I have written an experimental 1D Laplace parallel Spark solver, out of curiosity regarding this
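For context on what such a solver iterates, here is a tiny single-machine Jacobi sketch for the 1D Laplace equation u'' = 0 (an illustration of the numerical scheme only, not the poster's parallel implementation):

```python
def jacobi_laplace_1d(u, iterations=2000):
    """Jacobi iterations for the 1D Laplace equation u'' = 0:
    each interior point is replaced by the average of its two
    neighbours.  The boundary values u[0] and u[-1] stay fixed."""
    u = list(u)
    for _ in range(iterations):
        u = ([u[0]]
             + [(u[i - 1] + u[i + 1]) / 2 for i in range(1, len(u) - 1)]
             + [u[-1]])
    return u

# With boundaries 0 and 1 the solution converges to a straight line:
print(jacobi_laplace_1d([0, 0, 0, 0, 1]))
```

Parallelizing this on Spark amounts to partitioning the grid and exchanging the halo (boundary) points between partitions on each iteration.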

Re: Grouping runs of elements in a RDD

2015-07-02 Thread RJ Nowling
Thanks, Mohit. It sounds like we're on the same page -- I used a similar approach. On Thu, Jul 2, 2015 at 12:27 PM, Mohit Jaggi wrote: > if you are joining successive lines together based on a predicate, then > you are doing a "flatMap" not an "aggregate". you are on the right track > with a mu

Re: [VOTE] Release Apache Spark 1.4.1

2015-07-02 Thread Andrew Or
@Sean I believe that is a real issue. I have submitted a patch to fix it: https://github.com/apache/spark/pull/7193. Unfortunately this would mean we need to cut a new RC to include it. When we do so we should also do another careful pass over the commits that are merged since the first RC. -1 20

Re: Grouping runs of elements in a RDD

2015-07-02 Thread Mohit Jaggi
If you are joining successive lines together based on a predicate, then you are doing a "flatMap", not an "aggregate". You are on the right track with a multi-pass solution. I had the same challenge when I needed a sliding window over an RDD (see below). [I had suggested that the sliding window API
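The run-grouping being discussed can be sketched locally with `itertools.groupby`. This single-partition version is only the easy half of the problem: it ignores the cross-partition boundary handling that makes the RDD case a multi-pass job.

```python
from itertools import groupby

def group_runs(xs, pred):
    """Group consecutive elements into runs by whether they satisfy
    `pred` -- a single-machine analogue of the multi-pass RDD approach
    discussed in the thread."""
    return [list(run) for _, run in groupby(xs, key=pred)]

# Runs alternate between odd and even elements:
print(group_runs([1, 3, 5, 2, 4, 7], lambda x: x % 2 == 0))
# [[1, 3, 5], [2, 4], [7]]
```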

A proposal for "Test matrix decompositions for speed/stability" (SPARK-7210)

2015-07-02 Thread Chris Harvey
Hello, I am new to the Apache Spark project but I would like to contribute to issue SPARK-7210. There has been some conversation on that issue and I would like to take a shot at it. Before doing so, I want to run my plan by everyone. From the description and the comments, the goal is to test oth

Re: [VOTE] Release Apache Spark 1.4.1

2015-07-02 Thread Shivaram Venkataraman
+1 Tested the EC2 launch scripts and the Spark version and EC2 branch etc. look good. Shivaram On Thu, Jul 2, 2015 at 8:22 AM, Patrick Wendell wrote: > Hey Sean - yes I think that is an issue. Our published poms need to > have the dependency versions inlined. > > We probably need to revert that

[SPARK-8794] [SQL] PrunedScan problem

2015-07-02 Thread Eron Wright
I filed an issue about a problem I see with PrunedScan that causes sub-optimal performance in ML pipelines. Sorry if the issue is already known. Having tried a few approaches to working with large binary files in Spark ML, I prefer loading the data into a vector-type column from a relation
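The PrunedScan contract under discussion can be illustrated with a plain-Python stand-in: the data source is told which columns the query actually needs and materializes only those, rather than building every column of every row (the `pruned_scan` helper below is hypothetical, not Spark's API):

```python
def pruned_scan(rows, required_columns):
    """Sketch of column pruning: given the columns a query needs,
    return rows containing only those columns.  Skipping the wide
    'features' column here is the win PrunedScan is meant to deliver."""
    return [tuple(row[c] for c in required_columns) for row in rows]

rows = [{"id": 1, "features": [0.1] * 3},
        {"id": 2, "features": [0.2] * 3}]
print(pruned_scan(rows, ["id"]))  # [(1,), (2,)] -- 'features' never built
```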

Size of RDD partitions

2015-07-02 Thread prateek3.14
Hello everyone, Are there metrics for capturing the size of RDD partitions? Would the memory usage of an executor be a good proxy for the same? Thanks, --Prateek -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Size-of-RDD-partitions-tp12996.htm
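One rough way to approach the question locally is to count records and estimate bytes per partition. In Spark, logic of this shape would typically run inside `mapPartitionsWithIndex`; the sketch below is a plain-Python stand-in over lists of records, and `sys.getsizeof` gives only a shallow per-object estimate, not true serialized size:

```python
import sys

def partition_sizes(partitions):
    """Per-partition (index, record count, rough byte estimate).
    A plain-Python stand-in for what mapPartitionsWithIndex-style
    instrumentation would report in Spark."""
    return [(i, len(p), sum(sys.getsizeof(r) for r in p))
            for i, p in enumerate(partitions)]

sizes = partition_sizes([[b"a" * 10] * 3, [b"a" * 10]])
print(sizes)
```

Executor memory usage is a poor proxy by comparison, since it mixes cached data with shuffle buffers and task overhead.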

Re: [VOTE] Release Apache Spark 1.4.1

2015-07-02 Thread Patrick Wendell
Hey Sean - yes I think that is an issue. Our published poms need to have the dependency versions inlined. We probably need to revert that bit of the build patch. - Patrick On Thu, Jul 2, 2015 at 7:21 AM, vaquar khan wrote: > +1 > > On 2 Jul 2015 18:03, "shenyan zhen" wrote: >> >> +1 >> >> On J

Re: [VOTE] Release Apache Spark 1.4.1

2015-07-02 Thread vaquar khan
+1 On 2 Jul 2015 18:03, "shenyan zhen" wrote: > +1 > On Jun 30, 2015 8:28 PM, "Reynold Xin" wrote: > >> +1 >> >> On Tue, Jun 23, 2015 at 10:37 PM, Patrick Wendell >> wrote: >> >>> Please vote on releasing the following candidate as Apache Spark version >>> 1.4.1! >>> >>> This release fixes a ha

Re: enum-like types in Spark

2015-07-02 Thread Imran Rashid
Hi Stephen, I'm not sure which link you are referring to for the example code -- but yes, the recommendation is that you create the enum in Java, e.g. see https://github.com/apache/spark/blob/v1.4.0/core/src/main/java/org/apache/spark/status/api/v1/StageStatus.java Then nothing special is require
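The enum-like pattern being recommended (a plain Java enum such as the linked StageStatus) can be sketched for illustration with Python's stdlib `enum`; the class below is a hypothetical analogue, not Spark's API:

```python
from enum import Enum

class StageStatus(Enum):
    """Illustrative analogue of an enum-like status type: a closed set
    of named values with stable string forms and round-tripping."""
    ACTIVE = "ACTIVE"
    COMPLETE = "COMPLETE"
    FAILED = "FAILED"

    @classmethod
    def from_string(cls, s):
        # Case-insensitive parse back to the enum member.
        return cls(s.upper())

print(StageStatus.from_string("failed"))
```

The appeal in both languages is the same: exhaustive matching, stable names for serialization, and no magic strings scattered through the codebase.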

Re: [VOTE] Release Apache Spark 1.4.1

2015-07-02 Thread shenyan zhen
+1 On Jun 30, 2015 8:28 PM, "Reynold Xin" wrote: > +1 > > On Tue, Jun 23, 2015 at 10:37 PM, Patrick Wendell > wrote: > >> Please vote on releasing the following candidate as Apache Spark version >> 1.4.1! >> >> This release fixes a handful of known issues in Spark 1.4.0, listed here: >> http://s

Re: [VOTE] Release Apache Spark 1.4.1

2015-07-02 Thread Sean Owen
I wanted to flag a potential blocker here, but pardon me if this is still after all this time just my misunderstanding of the POM/build theory -- So this is the final candidate release POM, right? https://repository.apache.org/content/repositories/orgapachespark-1118/org/apache/spark/spark-core_2.10