Re: [VOTE] Apache Spark 2.2.0 (RC6)

2017-07-07 Thread Michael Armbrust
This vote passes! I'll follow up with the release on Monday. +1: Michael Armbrust (binding) Kazuaki Ishizaki Sean Owen (binding) Joseph Bradley (binding) Ricardo Almeida Herman van Hövell tot Westerflier (binding) Yanbo Liang Nick Pentreath (binding) Wenchen Fan (binding) Sameer Agarwal Denny Lee F

Re: [build system] important: potential upgrading of the python build environment

2017-07-07 Thread shane knapp
this is done. i'll babysit builds today. On Fri, Jul 7, 2017 at 8:40 AM, shane knapp wrote: > doing this now. > > On Thu, Jul 6, 2017 at 1:34 PM, shane knapp wrote: >> (big CC list so the people involved have visibility) >> >> we're currently using a (very old) installation of anaconda python t

Re: [build system] important: potential upgrading of the python build environment

2017-07-07 Thread shane knapp
doing this now. On Thu, Jul 6, 2017 at 1:34 PM, shane knapp wrote: > (big CC list so the people involved have visibility) > > we're currently using a (very old) installation of anaconda python to > manage the python3 build deps, but it turns out that our (very old) > versions of numpy and pandas

RE: [SS] Why does ConsoleSink's addBatch convert input DataFrame to show it?

2017-07-07 Thread assaf.mendelson
I asked the same thing a couple of weeks ago. Apparently, a structured streaming plan differs from a batch plan and is fixed so that aggregation works properly. If you perform most operations on the DataFrame, it will recalculate the plan as a batch plan and will t

[SS] Why does ConsoleSink's addBatch convert input DataFrame to show it?

2017-07-07 Thread Jacek Laskowski
Hi, I just noticed that the input DataFrame is collect'ed and then parallelize'd simply to show it on the console [1]. Why so many fairly expensive operations for show? I'd appreciate some help understanding this code. Thanks. [1] https://github.com/apache/spark/blob/master/sql/core/src/main/scal

Re: [VOTE] Apache Spark 2.2.0 (RC6)

2017-07-07 Thread Xiao Li
+1 Xiao Li 2017-07-06 22:18 GMT-07:00 Yin Huai : > +1 > > On Thu, Jul 6, 2017 at 8:40 PM, Hyukjin Kwon wrote: > >> +1 >> >> 2017-07-07 6:41 GMT+09:00 Reynold Xin : >> >>> +1 >>> >>> >>> On Fri, Jun 30, 2017 at 6:44 PM, Michael Armbrust < >>> mich...@databricks.com> wrote: >>> Please vote o