Re: Welcoming Tejas Patil as a Spark committer

2017-09-30 Thread Sameer Agarwal
lity issues and SQL. Please > join me in welcoming Tejas! > > Matei > > ----- > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org > > > > -- Sameer Agarwal Software Engineer | Databricks Inc. http://cs.berkeley.edu/~sameerag

Re: Timeline for Spark 2.3

2017-11-10 Thread Sameer Agarwal
lease. >> Specifically, the work on the history server, Kubernetes and continuous >> processing >> 3. Given the actual release date of Spark 2.2, I think we'll still get >> Spark 2.3 out roughly 6 months after. >> >> Thoughts? >> >> Michael >> > -- Sameer Agarwal Software Engineer | Databricks Inc. http://cs.berkeley.edu/~sameerag

Re: Timeline for Spark 2.3

2017-12-19 Thread Sameer Agarwal
> Dong >>> >>> >>> >>> -- >>> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/ >>> >>> - >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org >>> >>> >> -- > Twitter: https://twitter.com/holdenkarau > -- Sameer Agarwal Software Engineer | Databricks Inc. http://cs.berkeley.edu/~sameerag

Re: Timeline for Spark 2.3

2017-12-29 Thread Sameer Agarwal
> > From:Felix Cheung > To:Michael Armbrust , Holden Karau < > hol...@pigscanfly.ca> > Cc:Sameer Agarwal , Erik Erlandson < > eerla...@redhat.com>, dev > Date:2017/12/21 04:48 > Subject:Re: Timeline for Spark 2.3 >

Branch 2.3 is cut

2018-01-01 Thread Sameer Agarwal
We've just cut the release branch for Spark 2.3. Committers, please backport all important bug fixes and PRs as appropriate. Next, I'll go ahead and create the jenkins jobs for the release branch and then follow up with an RC early next week.

Re: Branch 2.3 is cut

2018-01-08 Thread Sameer Agarwal
d create an RC as soon as they're resolved. All relevant jenkins jobs for the release branch can be accessed at: https://amplab.cs.berkeley.edu/jenkins/ Regards, Sameer On Mon, Jan 1, 2018 at 5:22 PM, Sameer Agarwal wrote: > We've just cut the release branch for Spark 2.3. Com

Re: Branch 2.3 is cut

2018-01-11 Thread Sameer Agarwal
ection, I'll shortly follow up with an RC to get the QA started in parallel. Thanks, Sameer On Mon, Jan 8, 2018 at 5:03 PM, Sameer Agarwal wrote: > Hello everyone, > > Just a quick update on the release. There are currently 2 correctness > blockers (SPARK-22984 <https://iss

[VOTE] Spark 2.3.0 (RC1)

2018-01-12 Thread Sameer Agarwal
Please vote on releasing the following candidate as Apache Spark version 2.3.0. The vote is open until Thursday January 18, 2018 at 8:00:00 am UTC and passes if a majority of at least 3 PMC +1 votes are cast. [ ] +1 Release this package as Apache Spark 2.3.0 [ ] -1 Do not release this package be
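
The passing rule quoted in these vote threads can be sketched as a small tally. This is only an illustration, and it assumes the rule is read as "at least three binding (PMC) +1 votes, and more binding +1 than binding -1 votes"; the function name and the (vote, is_binding) tuple layout are hypothetical and do not come from the thread.

```python
from collections import Counter


def release_vote_passes(votes):
    """Return True if a release vote passes under the assumed reading.

    `votes` is an iterable of (vote, is_binding) pairs, where vote is
    +1, 0, or -1 and is_binding marks a PMC member's vote.
    Assumption: only binding votes count toward the result.
    """
    binding = Counter(vote for vote, is_binding in votes if is_binding)
    plus, minus = binding[1], binding[-1]
    # At least three binding +1 votes, and a majority of +1 over -1.
    return plus >= 3 and plus > minus


# Three binding +1 votes plus one non-binding -1.
example = [(1, True), (1, True), (1, True), (-1, False)]
passed = release_vote_passes(example)
```

Non-binding votes are ignored by the tally here, although in these threads a non-binding -1 that points at a real regression still led to a new RC.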

Re: [VOTE] Spark 2.3.0 (RC1)

2018-01-16 Thread Sameer Agarwal
> Critical: >> SPARK-22739 Additional Expression Support for Objects >> >> I actually don't think any of those Blockers should be Blockers; not sure >> if the last one is really critical either. >> >> I think this release will have to be re-rolled so I'd

Re: [VOTE] Spark 2.3.0 (RC1)

2018-01-17 Thread Sameer Agarwal
ublish it's important to get > the key in the Apache web of trust. > > On Tue, Jan 16, 2018 at 3:00 PM, Sameer Agarwal > wrote: > >> Yes, I'll cut an RC2 as soon as the remaining blockers are resolved. In >> the meantime, please continue to report any other issu

Re: Build timed out for `branch-2.3 (hadoop-2.7)`

2018-01-17 Thread Sameer Agarwal
8] Bump master branch version to 2.4.0-SNAPSHOT > <https://github.com/apache/spark/commit/651f76153f5e9b185aaf593161d40cabe7994fea> > > 2. Marco Gaido reports a flaky test suite and it turns out that the test > suite hangs in SPARK-23055 > <https://issues.apache.org/jira/brow

Re: [VOTE] Spark 2.3.0 (RC1)

2018-01-18 Thread Sameer Agarwal
This vote has failed in favor of a new RC. I'll follow up with a new RC2 as soon as the 3 remaining test/UI blockers <https://s.apache.org/oXKi> are resolved. On 17 January 2018 at 16:38, Sameer Agarwal wrote: > Thanks, will do! > > On 16 January 2018 at 22:09, Holden K

[VOTE] Spark 2.3.0 (RC2)

2018-01-22 Thread Sameer Agarwal
Please vote on releasing the following candidate as Apache Spark version 2.3.0. The vote is open until Friday January 26, 2018 at 8:00:00 am UTC and passes if a majority of at least 3 PMC +1 votes are cast. [ ] +1 Release this package as Apache Spark 2.3.0 [ ] -1 Do not release this package beca

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Sameer Agarwal
>> >> >> My concern is the list of open bugs targeted at 2.3.0 (ignoring the >> >> documentation ones). It is not long, but it seems some of those need >> >> to be looked at. It would be nice for the committers who are involved >> >> in those bugs to t

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Sameer Agarwal
rce archive; perhaps we should add "use GNU tar" >>> to the RM checklist? >>> >>> Also ran our internal tests and they seem happy. >>> >>> My concern is the list of open bugs targeted at 2.3.0 (ignoring the >>> documentation ones). It is

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-26 Thread Sameer Agarwal
This vote has failed due to a number of aforementioned blockers. I'll follow up with RC3 as soon as the 2 remaining (non-QA) blockers are resolved: https://s.apache.org/oXKi On 25 January 2018 at 12:59, Sameer Agarwal wrote: > > Most tests pass on RC2, except I'm still se

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-31 Thread Sameer Agarwal
> 2.2.0. The ticket has a simple repro included, showing a query that works > in prior releases but now fails with an exception in the catalyst optimizer. > > On Fri, Jan 26, 2018 at 10:41 AM, Sameer Agarwal > wrote: > >> This vote has failed due to a number of aforementioned bl

Re: [VOTE] Spark 2.3.0 (RC2)

2018-02-01 Thread Sameer Agarwal
>> Perhaps a mention in release notes? >> >>michael >> >> >> On Thu, Feb 1, 2018 at 3:29 AM, Nick Pentreath >> wrote: >> >> All MLlib QA JIRAs resolved. Looks like SparkR too, so from the ML side >> that should be everything outstan

Re: [VOTE] Spark 2.3.0 (RC2)

2018-02-06 Thread Sameer Agarwal
s://issues.apache.org/jira/browse/SPARK-23304> for >> the coalesce issue. >> >> [SPARK-23304] Spark SQL coalesce() against hive not working - ASF JIRA >> >> <https://issues.apache.org/jira/browse/SPARK-23304> >> >> >> Tom >> >>

[VOTE] Spark 2.3.0 (RC3)

2018-02-12 Thread Sameer Agarwal
Now that all known blockers have once again been resolved, please vote on releasing the following candidate as Apache Spark version 2.3.0. The vote is open until Friday February 16, 2018 at 8:00:00 am UTC and passes if a majority of at least 3 PMC +1 votes are cast. [ ] +1 Release this package as

Re: [VOTE] Spark 2.3.0 (RC3)

2018-02-12 Thread Sameer Agarwal
I'll start the vote with a +1. As of today, all known release blockers and QA tasks have been resolved, and the jenkins builds are healthy: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/ On 12 February 2018 at 22:30, Sameer Agarwal wrote: > Now that a

Re: [VOTE] Spark 2.3.0 (RC3)

2018-02-13 Thread Sameer Agarwal
PARK-23316 AnalysisException after max iteration reached for IN query > > ... though the pandas tests issue is "Critical". > > (SPARK-23083 is an update to the main site that should happen as the > artifacts are released, so it's OK.) > > On Tue, Feb 13, 2018 at 12:

Re: [VOTE] Spark 2.3.0 (RC3)

2018-02-15 Thread Sameer Agarwal
In addition to the issues mentioned above, Wenchen and Xiao have flagged two other regressions (https://issues.apache.org/jira/browse/SPARK-23316 and https://issues.apache.org/jira/browse/SPARK-23388) that were merged after RC3 was cut. Due to these, this vote fails. I'll follow up with an RC4 in

[VOTE] Spark 2.3.0 (RC4)

2018-02-17 Thread Sameer Agarwal
Please vote on releasing the following candidate as Apache Spark version 2.3.0. The vote is open until Thursday February 22, 2018 at 8:00:00 am UTC and passes if a majority of at least 3 PMC +1 votes are cast. [ ] +1 Release this package as Apache Spark 2.3.0 [ ] -1 Do not release this package b

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-17 Thread Sameer Agarwal
I'll start with a +1 once again. All blockers reported against RC3 have been resolved and the builds are healthy. On 17 February 2018 at 13:41, Sameer Agarwal wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.3.0. The vote is open until Thursday

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-19 Thread Sameer Agarwal
> > this file shouldn't be included? https://dist.apache.org/repos/ > dist/dev/spark/v2.3.0-rc4-bin/spark-parent_2.11.iml > I've now deleted this file *From:* Sameer Agarwal > *Sent:* Saturday, February 17, 2018 1:43:39 PM > *To:* Sameer Agarwal > *Cc:* dev >

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-20 Thread Sameer Agarwal
>>>> On Tue, Feb 20, 2018 at 2:14 PM, Xingbo Jiang >>>>> wrote: >>>>> >>>>>> +1 >>>>>> >>>>>> >>>>>> On Tue, Feb 20, 2018 at 1:09 PM, Wenchen Fan wrote: >>>>>> >>>>>>>

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-20 Thread Sameer Agarwal
it's pretty safe. > > On Tue, Feb 20, 2018 at 5:58 PM, Sameer Agarwal > wrote: > > This RC has failed due to https://issues.apache.org/ > jira/browse/SPARK-23470. > > Now that the fix has been merged in 2.3 (thanks Marcelo!), I'll follow up > > with an

[VOTE] Spark 2.3.0 (RC5)

2018-02-22 Thread Sameer Agarwal
Please vote on releasing the following candidate as Apache Spark version 2.3.0. The vote is open until Tuesday February 27, 2018 at 8:00:00 am UTC and passes if a majority of at least 3 PMC +1 votes are cast. [ ] +1 Release this package as Apache Spark 2.3.0 [ ] -1 Do not release this package be

Re: [VOTE] Spark 2.3.0 (RC5)

2018-02-27 Thread Sameer Agarwal
>>> hol...@pigscanfly.ca> wrote: >>>>>>>>> >>>>>>>>>> Note: given the state of Jenkins I'd love to see Bryan Cutler or >>>>>>>>>> someone with Arrow experience sign off on this release. >>>>

[ANNOUNCE] Announcing Apache Spark 2.3.0

2018-02-28 Thread Sameer Agarwal
Hi all, Apache Spark 2.3.0 is the fourth major release in the 2.x line. This release adds support for continuous processing in structured streaming along with a brand new Kubernetes scheduler backend. Other major updates include the new data source and structured streaming v2 APIs, a standard imag

Re: Welcoming some new committers

2018-03-03 Thread Sameer Agarwal
Congratulations!! On 3 March 2018 at 13:12, Mridul Muralidharan wrote: > Congratulations ! > > > Regards, > Mridul > > > On Fri, Mar 2, 2018 at 2:41 PM, Matei Zaharia > wrote: > > Hi everyone, > > > > The Spark PMC has recently voted to add several new committers to the > project, based on thei

Re: Welcome Zhenhua Wang as a Spark committer

2018-04-01 Thread Sameer Agarwal
henhua is the major contributor of the CBO project, and has been >>> contributing across several areas of Spark for a while, focusing especially >>> on analyzer, optimizer in Spark SQL. Please join me in welcoming Zhenhua! >>> > >>> > Wenchen >>> >

Re: V2.3 Scala API to Github Links Incorrect

2018-04-15 Thread Sameer Agarwal
[+Hyukjin] Thanks for flagging this, Jayesh. https://github.com/apache/spark-website/pull/111 is tracking a short-term fix to the API docs and https://issues.apache.org/jira/browse/SPARK-23732 tracks the fix to the release scripts. Regards, Sameer On 15 April 2018 at 18:50, Thakrar, Jayesh wro

Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-26 Thread Sameer Agarwal
>>>>>>> [ ] +1 Release this package as Apache Spark 2.0.1 > >>>>>>>>>>> [ ] -1 Do not release this package because ... > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> The tag to be voted on is v2.0

Re: [VOTE] Release Apache Spark 2.0.1 (RC4)

2016-09-29 Thread Sameer Agarwal
ase mark the fix version as 2.0.2, rather than 2.0.1. If a new RC > > (i.e. RC5) is cut, I will change the fix version of those patches to > 2.0.1. > > > > > > - > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org > > -- Sameer Agarwal Software Engineer | Databricks Inc. http://cs.berkeley.edu/~sameerag

Re: Uploading PySpark 2.1.1 to PyPi

2017-05-12 Thread Sameer Agarwal
> View this message in context: http://apache-spark-developers >> -list.1001551.n3.nabble.com/Uploading-PySpark-2-1-1-to- >> PyPi-tp21531p21532.html >> Sent from the Apache Spark Developers List mailing list archive at >> Nabble.com. >> >>

Re: Will higher order functions in spark SQL be pushed upstream?

2017-06-09 Thread Sameer Agarwal
> > * As a heavy user of complex data types I was wondering if there was > any plan to push those changes upstream? > Yes, we intend to contribute this to open source. > * In addition, I was wondering if as part of this change it also tries > to solve the column pruning / filter pushdown issues

Re: [VOTE] Apache Spark 2.2.0 (RC6)

2017-07-03 Thread Sameer Agarwal
he.org/~pwendell/spark-releases/spark- >>>> 2.2.0-rc6-docs/ >>>> >>>> >>>> *FAQ* >>>> >>>> *How can I help test this release?* >>>> >>>> If you are a Spark user, you can help us test this release by taking an >>>> existing Spark workload and running on this release candidate, then >>>> reporting any regressions. >>>> >>>> *What should happen to JIRA tickets still targeting 2.2.0?* >>>> >>>> Committers should look at those and triage. Extremely important bug >>>> fixes, documentation, and API tweaks that impact compatibility should be >>>> worked on immediately. Everything else please retarget to 2.3.0 or 2.2.1. >>>> >>>> *But my bug isn't fixed!??!* >>>> >>>> In order to make timely releases, we will typically not hold the >>>> release unless the bug in question is a regression from 2.1.1. >>>> >>>> >>>> >>> >>> >> > -- Sameer Agarwal Software Engineer | Databricks Inc. http://cs.berkeley.edu/~sameerag

Re: Welcoming Hyukjin Kwon and Sameer Agarwal as committers

2017-08-08 Thread Sameer Agarwal
Laskowski wrote: > >> Hi, >> >> Congrats!! Looks like Sean is gonna be less busy these days ;-) >> >> Jacek >> >> On 7 Aug 2017 5:53 p.m., "Matei Zaharia" wrote: >> >>> Hi everyone, >>> >>> The Spark PMC recently vo

Re: [VOTE] [SPIP] SPARK-15689: Data Source API V2 read path

2017-09-06 Thread Sameer Agarwal
> The vote will be up for the next 72 hours. Please reply with your vote: >>> >>> +1: Yeah, let's go forward and implement the SPIP. >>> +0: Don't really care. >>> -1: I don't think this is a good idea because of the following technical >>> reasons. >>> >>> Thanks! >>> >> >> > -- Sameer Agarwal Software Engineer | Databricks Inc. http://cs.berkeley.edu/~sameerag

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-11 Thread Sameer Agarwal
g the length of the input value for 0-parameter >>>>>>> UDFs. >>>>>>> The return value should be pandas.Series of the specified type and >>>>>>> the length of the returned value should be the same as input value. >>>>>>> >>>>>>> We can define vectorized UDFs as: >>>>>>> >>>>>>> @pandas_udf(DoubleType()) >>>>>>> def plus(v1, v2): >>>>>>> return v1 + v2 >>>>>>> >>>>>>> or we can define as: >>>>>>> >>>>>>> plus = pandas_udf(lambda v1, v2: v1 + v2, DoubleType()) >>>>>>> >>>>>>> We can use it similar to row-by-row UDFs: >>>>>>> >>>>>>> df.withColumn('sum', plus(df.v1, df.v2)) >>>>>>> >>>>>>> As for 0-parameter UDFs, we can define and use as: >>>>>>> >>>>>>> @pandas_udf(LongType()) >>>>>>> def f0(size): >>>>>>> return pd.Series(1).repeat(size) >>>>>>> >>>>>>> df.select(f0()) >>>>>>> >>>>>>> >>>>>>> >>>>>>> The vote will be up for the next 72 hours. Please reply with your >>>>>>> vote: >>>>>>> >>>>>>> +1: Yeah, let's go forward and implement the SPIP. >>>>>>> +0: Don't really care. >>>>>>> -1: I don't think this is a good idea because of the following technical >>>>>>> reasons. >>>>>>> >>>>>>> Thanks! >>>>>>> >>>>>>> -- >>>>>>> Takuya UESHIN >>>>>>> Tokyo, Japan >>>>>>> >>>>>>> http://twitter.com/ueshin >>>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> Takuya UESHIN >>>>> Tokyo, Japan >>>>> >>>>> http://twitter.com/ueshin >>>>> >>>> >>> >> >> >> -- >> Takuya UESHIN >> Tokyo, Japan >> >> http://twitter.com/ueshin >> > > -- Sameer Agarwal Software Engineer | Databricks Inc. http://cs.berkeley.edu/~sameerag
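
The contract described in the quoted proposal (each argument arrives as a pandas.Series, and the result must be a pandas.Series of the same length as the input) can be checked without Spark. The sketch below uses plain pandas only; the functions are undecorated stand-ins for the bodies quoted above, so it shows the shape of the contract, not the behaviour of `pandas_udf` itself. The sample values are made up for illustration.

```python
import pandas as pd


def plus(v1: pd.Series, v2: pd.Series) -> pd.Series:
    # Body of the `plus` example from the proposal: element-wise addition
    # over a whole batch of values at once.
    return v1 + v2


def f0(size: int) -> pd.Series:
    # Body of the 0-parameter example from the proposal: with no input
    # columns, the batch length is passed in so the output can match it.
    return pd.Series(1).repeat(size)


# Hypothetical batch standing in for columns df.v1 and df.v2.
v1 = pd.Series([1.0, 2.0, 3.0])
v2 = pd.Series([10.0, 20.0, 30.0])

result = plus(v1, v2)
constant = f0(len(v1))

# The proposal requires the output length to equal the input length.
assert len(result) == len(v1)
assert len(constant) == len(v1)
```

Operating on whole Series at a time is what distinguishes these from row-by-row UDFs, where the function body would be called once per row.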

Re: tpcds q1 - java.lang.NegativeArraySizeException

2016-06-13 Thread Sameer Agarwal
pper.hasNext(Wrappers.scala:30) >> at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664) >> at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37) >> at >> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365) >> at >> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362) >> at >> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757) >> at >> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757) >> at >> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) >> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318) >> at org.apache.spark.rdd.RDD.iterator(RDD.scala:282) >> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) >> at org.apache.spark.scheduler.Task.run(Task.scala:85) >> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) >> at >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >> at java.lang.Thread.run(Thread.java:745) >> >> > > -- Sameer Agarwal Software Engineer | Databricks Inc. http://cs.berkeley.edu/~sameerag

Re: [VOTE] Release Apache Spark 1.6.2 (RC2)

2016-06-22 Thread Sameer Agarwal
>>>>> earlierOffsetRangesAsSets.contains(scala.Tuple2.apply[org.apache.spark.streaming.Time, >>>>> >>> >>>>> >>> >>>>> scala.collection.immutable.Set[org.apache.spark.streaming.kafka.OffsetRange]](or._1, >>>>