> scalability issues and SQL. Please
> join me in welcoming Tejas!
>
> Matei
>
--
Sameer Agarwal
Software Engineer | Databricks Inc.
http://cs.berkeley.edu/~sameerag
>> release.
>> Specifically, the work on the history server, Kubernetes and continuous
>> processing
>> 3. Given the actual release date of Spark 2.2, I think we'll still get
>> Spark 2.3 out roughly 6 months after.
>>
>> Thoughts?
>>
>> Michael
>>
>
> Dong
>>>
>>>
>>>
>>> --
>>> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>>>
>> --
> Twitter: https://twitter.com/holdenkarau
>
> From: Felix Cheung
> To: Michael Armbrust , Holden Karau <
> hol...@pigscanfly.ca>
> Cc: Sameer Agarwal , Erik Erlandson <
> eerla...@redhat.com>, dev
> Date: 2017/12/21 04:48
> Subject: Re: Timeline for Spark 2.3
>
We've just cut the release branch for Spark 2.3. Committers, please
backport all important bug fixes and PRs as appropriate.
Next, I'll go ahead and create the jenkins jobs for the release branch and
then follow up with an RC early next week.
and create an RC as soon as
they're resolved. All relevant jenkins jobs for the release branch can be
accessed at: https://amplab.cs.berkeley.edu/jenkins/
Regards,
Sameer
On Mon, Jan 1, 2018 at 5:22 PM, Sameer Agarwal
wrote:
> We've just cut the release branch for Spark 2.3. Committers, please
> backport all important bug fixes and PRs as appropriate.
objection, I'll shortly follow up
with an RC to get the QA started in parallel.
Thanks,
Sameer
On Mon, Jan 8, 2018 at 5:03 PM, Sameer Agarwal
wrote:
> Hello everyone,
>
> Just a quick update on the release. There are currently 2 correctness
> blockers (SPARK-22984 <https://issues.apache.org/jira/browse/SPARK-22984>
Please vote on releasing the following candidate as Apache Spark version
2.3.0. The vote is open until Thursday January 18, 2018 at 8:00:00 am UTC
and passes if a majority of at least 3 PMC +1 votes are cast.
[ ] +1 Release this package as Apache Spark 2.3.0
[ ] -1 Do not release this package because ...
>> Critical:
>> SPARK-22739 Additional Expression Support for Objects
>>
>> I actually don't think any of those Blockers should be Blockers; not sure
>> if the last one is really critical either.
>>
>> I think this release will have to be re-rolled so I'd
publish, it's important to get
> the key in the Apache web of trust.
>
> On Tue, Jan 16, 2018 at 3:00 PM, Sameer Agarwal
> wrote:
>
>> Yes, I'll cut an RC2 as soon as the remaining blockers are resolved. In
>> the meantime, please continue to report any other issues.
8] Bump master branch version to 2.4.0-SNAPSHOT
> <https://github.com/apache/spark/commit/651f76153f5e9b185aaf593161d40cabe7994fea>
>
> 2. Marco Gaido reports a flaky test suite and it turns out that the test
> suite hangs in SPARK-23055
> <https://issues.apache.org/jira/browse/SPARK-23055>
This vote has failed in favor of a new RC. I'll follow up with a new RC2 as
soon as the 3 remaining test/UI blockers <https://s.apache.org/oXKi> are
resolved.
On 17 January 2018 at 16:38, Sameer Agarwal wrote:
> Thanks, will do!
>
> On 16 January 2018 at 22:09, Holden Karau wrote:
Please vote on releasing the following candidate as Apache Spark version
2.3.0. The vote is open until Friday January 26, 2018 at 8:00:00 am UTC and
passes if a majority of at least 3 PMC +1 votes are cast.
[ ] +1 Release this package as Apache Spark 2.3.0
[ ] -1 Do not release this package because ...
>>
>> >> My concern is the list of open bugs targeted at 2.3.0 (ignoring the
>> >> documentation ones). It is not long, but it seems some of those need
>> >> to be looked at. It would be nice for the committers who are involved
>> >> in those bugs to take a look at them.
source archive; perhaps we should add "use GNU tar"
>>> to the RM checklist?
>>>
>>> Also ran our internal tests and they seem happy.
>>>
>>> My concern is the list of open bugs targeted at 2.3.0 (ignoring the
>>> documentation ones). It is
This vote has failed due to a number of aforementioned blockers. I'll
follow up with RC3 as soon as the 2 remaining (non-QA) blockers are
resolved: https://s.apache.org/oXKi
On 25 January 2018 at 12:59, Sameer Agarwal wrote:
>
> Most tests pass on RC2, except I'm still seeing
> 2.2.0. The ticket has a simple repro included, showing a query that works
> in prior releases but now fails with an exception in the catalyst optimizer.
>
> On Fri, Jan 26, 2018 at 10:41 AM, Sameer Agarwal
> wrote:
>
>> This vote has failed due to a number of aforementioned blockers.
>> Perhaps a mention in release notes?
>>
>> michael
>>
>>
>> On Thu, Feb 1, 2018 at 3:29 AM, Nick Pentreath
>> wrote:
>>
>> All MLlib QA JIRAs resolved. Looks like SparkR too, so from the ML side
>> that should be everything outstanding.
>> <https://issues.apache.org/jira/browse/SPARK-23304> for
>> the coalesce issue.
>>
>> [SPARK-23304] Spark SQL coalesce() against hive not working - ASF JIRA
>>
>> <https://issues.apache.org/jira/browse/SPARK-23304>
>>
>>
>> Tom
>>
>>
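For context on the coalesce issue tracked above, here is a minimal sketch of
the call pattern SPARK-23304 describes; the Hive table name is hypothetical,
not taken from the thread:

    # Hedged sketch of the coalesce() pattern at issue in SPARK-23304; the
    # Hive table name "events" is hypothetical, not taken from the thread.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("coalesce-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # coalesce(n) lowers the partition count without a shuffle; the JIRA
    # reports it not taking effect when the source is a Hive table.
    df = spark.table("events").coalesce(1)
    print(df.rdd.getNumPartitions())
    spark.stop()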
Now that all known blockers have once again been resolved, please vote on
releasing the following candidate as Apache Spark version 2.3.0. The vote
is open until Friday February 16, 2018 at 8:00:00 am UTC and passes if a
majority of at least 3 PMC +1 votes are cast.
[ ] +1 Release this package as Apache Spark 2.3.0
[ ] -1 Do not release this package because ...
I'll start the vote with a +1.
As of today, all known release blockers and QA tasks have been resolved,
and the jenkins builds are healthy:
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/
On 12 February 2018 at 22:30, Sameer Agarwal wrote:
> Now that all known blockers have once again been resolved, please vote on
SPARK-23316 AnalysisException after max iteration reached for IN query
>
> ... though the pandas tests issue is "Critical".
>
> (SPARK-23083 is an update to the main site that should happen as the
> artifacts are released, so it's OK.)
>
> On Tue, Feb 13, 2018 at 12:
In addition to the issues mentioned above, Wenchen and Xiao have flagged
two other regressions (https://issues.apache.org/jira/browse/SPARK-23316
and https://issues.apache.org/jira/browse/SPARK-23388) that were merged
after RC3 was cut.
Due to these, this vote fails. I'll follow up with an RC4 in
Please vote on releasing the following candidate as Apache Spark version
2.3.0. The vote is open until Thursday February 22, 2018 at 8:00:00 am UTC
and passes if a majority of at least 3 PMC +1 votes are cast.
[ ] +1 Release this package as Apache Spark 2.3.0
[ ] -1 Do not release this package because ...
I'll start with a +1 once again.
All blockers reported against RC3 have been resolved and the builds are
healthy.
On 17 February 2018 at 13:41, Sameer Agarwal wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 2.3.0. The vote is open until Thursday
>
> this file shouldn't be included? https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/spark-parent_2.11.iml
>
I've now deleted this file.
*From:* Sameer Agarwal
> *Sent:* Saturday, February 17, 2018 1:43:39 PM
> *To:* Sameer Agarwal
> *Cc:* dev
>
>>>> On Tue, Feb 20, 2018 at 2:14 PM, Xingbo Jiang
>>>>> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>>
>>>>>> Wenchen Fan wrote on Tuesday, February 20, 2018 at 1:09 PM:
>>>>>>
>>>>>>>
it's pretty safe.
>
> On Tue, Feb 20, 2018 at 5:58 PM, Sameer Agarwal
> wrote:
> > This RC has failed due to https://issues.apache.org/jira/browse/SPARK-23470.
> > Now that the fix has been merged in 2.3 (thanks Marcelo!), I'll follow up
> > with an
Please vote on releasing the following candidate as Apache Spark version
2.3.0. The vote is open until Tuesday February 27, 2018 at 8:00:00 am UTC
and passes if a majority of at least 3 PMC +1 votes are cast.
[ ] +1 Release this package as Apache Spark 2.3.0
[ ] -1 Do not release this package because ...
>>> hol...@pigscanfly.ca> wrote:
>>>>>>>>>
>>>>>>>>>> Note: given the state of Jenkins I'd love to see Bryan Cutler or
>>>>>>>>>> someone with Arrow experience sign off on this release.
>>>>
Hi all,
Apache Spark 2.3.0 is the fourth major release in the 2.x line. This
release adds support for continuous processing in structured streaming
along with a brand new Kubernetes scheduler backend. Other major updates
include the new data source and structured streaming v2 APIs, a standard
imag
Congratulations!!
On 3 March 2018 at 13:12, Mridul Muralidharan wrote:
> Congratulations !
>
>
> Regards,
> Mridul
>
>
> On Fri, Mar 2, 2018 at 2:41 PM, Matei Zaharia
> wrote:
> > Hi everyone,
> >
> > The Spark PMC has recently voted to add several new committers to the
> project, based on their contributions.
>>> Zhenhua is the major contributor of the CBO project, and has been
>>> contributing across several areas of Spark for a while, focusing especially
>>> on the analyzer and optimizer in Spark SQL. Please join me in welcoming Zhenhua!
>>> >
>>> > Wenchen
>>>
[+Hyukjin]
Thanks for flagging this Jayesh. https://github.com/apache/spark-website/pull/111
is tracking a short term fix to the API docs and
https://issues.apache.org/jira/browse/SPARK-23732 tracks the fix to the
release scripts.
Regards,
Sameer
On 15 April 2018 at 18:50, Thakrar, Jayesh
wrote:
>>>>>>> [ ] +1 Release this package as Apache Spark 2.0.1
> >>>>>>>>>>> [ ] -1 Do not release this package because ...
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> The tag to be voted on is v2.0
> > Please mark the fix version as 2.0.2, rather than 2.0.1. If a new RC
> > (i.e. RC5) is cut, I will change the fix version of those patches to
> 2.0.1.
> >
> >
>
>
> * As a heavy user of complex data types I was wondering if there was
> any plan to push those changes upstream?
>
Yes, we intend to contribute this to open source.
> * In addition, I was wondering if as part of this change it also tries
> to solve the column pruning / filter pushdown issues
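To make the column-pruning concern concrete, here is a small hedged
illustration (schema and field names are hypothetical, not from the thread):
selecting a single nested field should ideally avoid reading the rest of the
struct:

    # Hedged illustration of nested column pruning for complex types; the
    # schema and field names are hypothetical, not from the thread.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import (StructType, StructField,
                                   IntegerType, StringType)

    spark = SparkSession.builder.appName("pruning-sketch").getOrCreate()
    schema = StructType([StructField("s", StructType([
        StructField("id", IntegerType()),
        StructField("name", StringType())]))])
    df = spark.createDataFrame([((1, "a"),)], schema)

    # Ideally a scan serving this query reads only s.id, not the whole
    # struct; whether readers prune nested fields is the issue raised above.
    df.select("s.id").explain()
    spark.stop()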
>>>> http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc6-docs/
>>>>
>>>>
>>>> *FAQ*
>>>>
>>>> *How can I help test this release?*
>>>>
>>>> If you are a Spark user, you can help us test this release by taking an
>>>> existing Spark workload and running on this release candidate, then
>>>> reporting any regressions.
>>>>
>>>> *What should happen to JIRA tickets still targeting 2.2.0?*
>>>>
>>>> Committers should look at those and triage. Extremely important bug
>>>> fixes, documentation, and API tweaks that impact compatibility should be
>>>> worked on immediately. Everything else please retarget to 2.3.0 or 2.2.1.
>>>>
>>>> *But my bug isn't fixed!??!*
>>>>
>>>> In order to make timely releases, we will typically not hold the
>>>> release unless the bug in question is a regression from 2.1.1.
>>>>
>>>>
>>>>
>>>
>>>
>>
>
Jacek Laskowski wrote:
>
>> Hi,
>>
>> Congrats!! Looks like Sean is gonna be less busy these days ;-)
>>
>> Jacek
>>
>> On 7 Aug 2017 5:53 p.m., "Matei Zaharia" wrote:
>>
>>> Hi everyone,
>>>
>>> The Spark PMC recently voted
>>> The vote will be up for the next 72 hours. Please reply with your vote:
>>>
>>> +1: Yeah, let's go forward and implement the SPIP.
>>> +0: Don't really care.
>>> -1: I don't think this is a good idea because of the following technical
>>> reasons.
>>>
>>> Thanks!
>>>
>>
>>
>
--
Sameer Agarwal
Software Engineer | Databricks Inc.
http://cs.berkeley.edu/~sameerag
g the length of the input value for 0-parameter
>>>>>>> UDFs.
>>>>>>> The return value should be pandas.Series of the specified type and
>>>>>>> the length of the returned value should be the same as input value.
>>>>>>>
>>>>>>> We can define vectorized UDFs as:
>>>>>>>
>>>>>>> @pandas_udf(DoubleType())
>>>>>>> def plus(v1, v2):
>>>>>>>     return v1 + v2
>>>>>>>
>>>>>>> or we can define as:
>>>>>>>
>>>>>>> plus = pandas_udf(lambda v1, v2: v1 + v2, DoubleType())
>>>>>>>
>>>>>>> We can use it similar to row-by-row UDFs:
>>>>>>>
>>>>>>> df.withColumn('sum', plus(df.v1, df.v2))
>>>>>>>
>>>>>>> As for 0-parameter UDFs, we can define and use as:
>>>>>>>
>>>>>>> @pandas_udf(LongType())
>>>>>>> def f0(size):
>>>>>>>     return pd.Series(1).repeat(size)
>>>>>>>
>>>>>>> df.select(f0())
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> The vote will be up for the next 72 hours. Please reply with your
>>>>>>> vote:
>>>>>>>
>>>>>>> +1: Yeah, let's go forward and implement the SPIP.
>>>>>>> +0: Don't really care.
>>>>>>> -1: I don't think this is a good idea because of the following technical
>>>>>>> reasons.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> --
>>>>>>> Takuya UESHIN
>>>>>>> Tokyo, Japan
>>>>>>>
>>>>>>> http://twitter.com/ueshin
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Takuya UESHIN
>>>>> Tokyo, Japan
>>>>>
>>>>> http://twitter.com/ueshin
>>>>>
>>>>
>>>
>>
>>
>> --
>> Takuya UESHIN
>> Tokyo, Japan
>>
>> http://twitter.com/ueshin
>>
>
>
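Pulling the API from the SPIP above into one place, here is a minimal
self-contained sketch of the proposed vectorized UDF usage, assuming Spark
2.3+ with pandas and pyarrow installed:

    # Minimal self-contained sketch of the vectorized (pandas) UDF API voted
    # on above; assumes Spark 2.3+ with pandas and pyarrow installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf
    from pyspark.sql.types import DoubleType

    spark = SparkSession.builder.appName("pandas-udf-sketch").getOrCreate()
    df = spark.createDataFrame([(1.0, 2.0), (3.0, 4.0)], ["v1", "v2"])

    @pandas_udf(DoubleType())
    def plus(v1, v2):
        # v1 and v2 arrive as pandas.Series; the returned Series must have
        # the same length as the input.
        return v1 + v2

    df.withColumn("sum", plus(df.v1, df.v2)).show()
    spark.stop()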
pper.hasNext(Wrappers.scala:30)
>> at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
>> at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
>> at
>> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
>> at
>> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
>> at
>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>> at
>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>> at
>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>> at org.apache.spark.scheduler.Task.run(Task.scala:85)
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>>
>>
>
>
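The trace above passes through RDD.takeOrdered and Guava's Ordering.leastOf.
Purely as a hedged illustration of that code path (the original poster's job
is not shown in this excerpt):

    # Hedged illustration of the RDD.takeOrdered code path seen in the
    # trace above; not the original poster's job, which this excerpt omits.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("takeOrdered-sketch").getOrCreate()
    rdd = spark.sparkContext.parallelize([5, 3, 9, 1, 7])

    # takeOrdered(n) returns the n smallest elements; each partition is
    # reduced with a bounded ordering, matching the Ordering.leastOf frame.
    print(rdd.takeOrdered(3))  # [1, 3, 5]
    spark.stop()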
;>>>> earlierOffsetRangesAsSets.contains(scala.Tuple2.apply[org.apache.spark.streaming.Time,
>>>>> >>>
>>>>> >>>
>>>>> scala.collection.immutable.Set[org.apache.spark.streaming.kafka.OffsetRange]](or._1,
>>>>