Re: Spark 2.0 issue with left_outer join

2017-03-04 Thread ayan guha
How about running this:

select * from (select *, count(*) over (partition by id) cnt from filteredDS) f where f.cnt < 75000
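A minimal runnable sketch of that suggestion in the thread's Java API, assuming filteredDS has been registered as a temporary view; the input path is a placeholder, and the id column and the 75000 cutoff come from the original post further down:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class WindowCountFilter {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("window-count-filter")
                    .master("local[*]")
                    .getOrCreate();

            // Placeholder input; in the thread, filteredDS comes from earlier steps.
            Dataset<Row> filteredDS = spark.read().parquet("/path/to/filteredDS");
            filteredDS.createOrReplaceTempView("filteredDS");

            // Attach a per-id count with a window function, then keep rows whose
            // id occurs fewer than 75000 times: one pass, with no separate badIds
            // set to compute and join back.
            Dataset<Row> goodRows = spark.sql(
                "SELECT * FROM ("
              + "  SELECT *, count(*) OVER (PARTITION BY id) AS cnt FROM filteredDS"
              + ") f WHERE f.cnt < 75000");

            goodRows.show();
            spark.stop();
        }
    }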

Re: Spark 2.0 issue with left_outer join

2017-03-04 Thread Ankur Srivastava
Yes, every time I run this code with production-scale data it fails. A test case with a small dataset of 50 records on a local box runs fine. Thanks, Ankur

Re: Spark 2.0 issue with left_outer join

2017-03-04 Thread ayan guha
Just to be sure, can you reproduce the error using the SQL API?

Re: Spark 2.0 issue with left_outer join

2017-03-03 Thread Ankur Srivastava
Adding DEV. Or is there any other way to do subtractByKey using the Dataset API? Thanks, Ankur
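One possibility, sketched under the assumption that the Datasets have the shapes shown in the original post below, and untested against the failure described in this thread: Spark 2.0 supports a left_anti join type, which keeps only the left-side rows that have no match on the right, the same semantics as RDD.subtractByKey for pair RDDs.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    public class SubtractByKey {
        // filteredDS and badIds follow the original post: badIds holds one row
        // per over-represented id, with the key aliased as "bid".
        static Dataset<Row> subtractByKey(Dataset<Row> filteredDS, Dataset<Row> badIds) {
            // left_anti keeps rows of filteredDS whose id never appears in badIds.
            return filteredDS.join(
                    badIds,
                    filteredDS.col("id").equalTo(badIds.col("bid")),
                    "left_anti");
        }
    }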

Spark 2.0 issue with left_outer join

2017-03-01 Thread Ankur Srivastava
Hi Users, We are facing an issue with a left_outer join using the Spark 2.0 Dataset Java API. Below is the code we have: Dataset<Row> badIds = filteredDS.groupBy(col("id").alias("bid")).count().filter((FilterFunction<Row>) row -> (Long) row.getAs("count") > 75000); _logger.info("Id count with over
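The post is cut off here; a hedged reconstruction of the pattern under discussion, continuing from the badIds and filteredDS variables above, is a left_outer join followed by an is-null filter:

    // Hedged reconstruction, not the poster's exact code: emulating
    // subtractByKey with a left_outer join plus an is-null filter.
    Dataset<Row> result = filteredDS
        .join(badIds,
              filteredDS.col("id").equalTo(badIds.col("bid")),
              "left_outer")
        .filter(badIds.col("bid").isNull())   // keep ids absent from badIds
        .select(filteredDS.col("*"));         // drop the columns from badIds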

Re: Spark 2.0 issue

2016-09-29 Thread Xiao Li
Hi Ashish, I will take a look at this soon. Thanks for reporting it, Xiao

Spark 2.0 issue

2016-09-29 Thread Ashish Shrowty
If I try to inner-join two dataframes that originated from the same initial dataframe, loaded using a spark.sql() call, it results in an error: // reading from Hive .. the data is stored in Parquet format in Amazon S3 val d1 = spark.sql("select * from ") val df1 = d1.groupBy("
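The snippet is truncated, so the exact join is unknown; a hedged workaround sketch for this class of self-join failure is to alias both derived dataframes before joining, which gives the analyzer unambiguous lineage for the join condition. Shown in the Java API used elsewhere in this digest, with placeholder table and column names (sales, key, amount) since the originals are cut off:

    import static org.apache.spark.sql.functions.col;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class SelfJoinWorkaround {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("self-join-workaround")
                    .master("local[*]")
                    .getOrCreate();

            // "sales", "key", and "amount" stand in for the truncated originals.
            Dataset<Row> d1 = spark.sql("select * from sales");
            Dataset<Row> counts = d1.groupBy("key").count().alias("l");
            Dataset<Row> averages = d1.groupBy("key").avg("amount").alias("r");

            // Joining two children of the same parent through explicit aliases
            // can sidestep ambiguous-attribute errors in self joins.
            Dataset<Row> joined = counts.join(
                    averages, col("l.key").equalTo(col("r.key")), "inner");

            joined.show();
            spark.stop();
        }
    }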

Re: spark 2.0 issue with yarn?

2016-05-10 Thread Steve Loughran
On 9 May 2016, at 21:24, Jesse F Chen <jfc...@us.ibm.com> wrote: I had been running fine until builds around 05/07/2016. If I used "--master yarn" in builds after 05/07, I got the following error... sounds like some jars are missing. I am using YARN 2.7.2 and Hive 1.2.1. D

Re: spark 2.0 issue with yarn?

2016-05-09 Thread Marcelo Vanzin
On Mon, May 9, 2016 at 3:34 PM, Matt Cheah wrote: > @Marcelo: Interesting - why would this manifest on the YARN-client side though (as Spark is the client to YARN in this case)? The ATS client is based on

Re: spark 2.0 issue with yarn?

2016-05-09 Thread Matt Cheah
@Marcelo: Interesting - why would this manifest on the YARN-client side though (as Spark is the client to YARN in this case)? Spark as a client shouldn’t care about what auxiliary services are on the YARN cluster. @Jesse: The change I wrote excludes all artifacts from the com.sun.jersey group. So

Re: spark 2.0 issue with yarn?

2016-05-09 Thread Marcelo Vanzin
Hi Jesse, On Mon, May 9, 2016 at 2:52 PM, Jesse F Chen wrote: > Sean - thanks. Definitely related to SPARK-12154. > Is there a way to continue using Jersey 1 for an existing working environment? The error you're getting is because of a third-party extension that tries to talk to the YARN ATS; that's
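A hedged sketch of the workaround this diagnosis points at: if the Jersey 1 code path is only reached when Spark's YARN client talks to the Application Timeline Service, disabling the timeline client avoids it. spark.hadoop.* properties are copied into the Hadoop Configuration; treat the exact property name as an assumption to verify against your cluster.

    import org.apache.spark.SparkConf;
    import org.apache.spark.sql.SparkSession;

    public class DisableAtsClient {
        public static void main(String[] args) {
            // Turn off yarn.timeline-service.enabled for the YARN client so it
            // never loads the Jersey-1-based ATS client classes.
            SparkConf conf = new SparkConf()
                    .setAppName("no-ats")
                    .set("spark.hadoop.yarn.timeline-service.enabled", "false");

            SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
            // ... run the job as usual ...
            spark.stop();
        }
    }

The same setting can be passed at launch with --conf spark.hadoop.yarn.timeline-service.enabled=false.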

Re: spark 2.0 issue with yarn?

2016-05-09 Thread Sean Owen
This may be related to updating to Jersey 2, which happened 4 days ago: https://issues.apache.org/jira/browse/SPARK-12154

Re: spark 2.0 issue with yarn?

2016-05-09 Thread Jesse F Chen
Hm, this may be related to updating to Jersey 2, which happened 4 days ago: https://issues.apache.org/jira/browse/SPARK-12154 That is a Jersey 1 class that's missing. H

Re: spark 2.0 issue with yarn?

2016-05-09 Thread Sean Owen
Hm, this may be related to updating to Jersey 2, which happened 4 days ago: https://issues.apache.org/jira/browse/SPARK-12154 That is a Jersey 1 class that's missing. How are you building and running Spark? I think the theory was that Jersey 1 would still be supplied at runtime. We may have to re

spark 2.0 issue with yarn?

2016-05-09 Thread Jesse F Chen
I had been running fine until builds around 05/07/2016. If I used "--master yarn" in builds after 05/07, I got the following error... sounds like some jars are missing. I am using YARN 2.7.2 and Hive 1.2.1. Do I need to deploy something new related to YARN? bin/spark-sql -driver-me