Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-19 Thread Xiangrui Meng
I posted my comment in the JIRA. Main concerns here: 1. Exposing third-party Java APIs in Spark is risky. Arrow might have 1.0 rele

RE: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-19 Thread tcondie
+1 (non-binding) for better columnar data processing support. From: Jules Damji Sent: Friday, April 19, 2019 12:21 PM To: Bryan Cutler Cc: Dev Subject: Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support + (non-binding) Sent from my iPhone Pardon the du

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-19 Thread Jules Damji
+ (non-binding) Sent from my iPhone Pardon the dumb thumb typos :) > On Apr 19, 2019, at 10:30 AM, Bryan Cutler wrote: > > +1 (non-binding) > >> On Thu, Apr 18, 2019 at 11:41 AM Jason Lowe wrote: >> +1 (non-binding). Looking forward to seeing better support for processing >> columnar data.

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-19 Thread shane knapp
-1, as i'd like to be sure that the python test infra change for jenkins is included (https://github.com/apache/spark/pull/24379) On Fri, Apr 19, 2019 at 12:01 PM Michael Armbrust wrote: > +1 (binding), we've tested this and it LGTM. > > On Thu, Apr 18, 2019 at 7:51 PM Wenchen Fan wrote: > >> Ple

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-19 Thread Michael Armbrust
+1 (binding), we've tested this and it LGTM. On Thu, Apr 18, 2019 at 7:51 PM Wenchen Fan wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.4.2. > > The vote is open until April 23 PST and passes if a majority +1 PMC votes > are cast, with > a minimum of 3 +1 vot

Re: pyspark.sql.functions ide friendly

2019-04-19 Thread educhana
It's not only the linter, but also autocompletion and help. Aside, in the module some functions are declared statically and the difference is not clear. On 2019/04/17 11:35:53, Sean Owen wrote: > I use IntelliJ and have never seen an issue parsing the pyspark > functions... you're just saying

Re: Spark 2.4.2

2019-04-19 Thread Sean Owen
While we're on this subject, there are two more dependency updates that we could consider including in 2.4.2 on the same grounds, as they're dependencies with CVEs. However, it's not clear whether the CVEs actually affect Spark. These are already in master. https://issues.apache.org/jira/browse/SP

Re: Spark 2.4.2

2019-04-19 Thread Sean Owen
All: here is the backport of changes to update to 2.9.8 from master back to 2.4. https://github.com/apache/spark/pull/24418 master has been on 2.9.8 for a while, so the concern isn't Spark so much. It's that user apps would face the same change of behavior if they used Jackson in a similar way. I'

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-19 Thread Bryan Cutler
+1 (non-binding) On Thu, Apr 18, 2019 at 11:41 AM Jason Lowe wrote: > +1 (non-binding). Looking forward to seeing better support for processing > columnar data. > > Jason > > On Tue, Apr 16, 2019 at 10:38 AM Tom Graves > wrote: > >> Hi everyone, >> >> I'd like to call for a vote on SPARK-27396

Re: Spark 2.4.2

2019-04-19 Thread Driesprong, Fokko
For me a +1 on upgrading Jackson as well. This has been long overdue. There are some behavioural changes regarding handling null/None. This is also described in the PR: https://github.com/apache/spark/pull/21596 Also it has a positive impact on the performance. Cheers, Fokko Op vr 19 apr. 2019 o

Re: Spark 2.4.2

2019-04-19 Thread Arun Mahadevan
+1 to upgrade Jackson. It has come up multiple times due to CVEs and the backport has worked out, but it may be good to include if it's not going to delay the release. On Thu, 18 Apr 2019 at 19:53, Wenchen Fan wrote: > I've cut RC1. If people think we must upgrade Jackson in 2.4, I can cut > RC2

DataSourceV2 sync, 17 April 2019

2019-04-19 Thread Ryan Blue
Here are my notes from the last DSv2 sync. As always: - If you’d like to attend the sync, send me an email and I’ll add you to the invite. Everyone is welcome. - These notes are what I wrote down and remember. If you have corrections or comments, please reply. *Topics*: - TableCat

Re: [SPARK-25079][build system] the future of python3.6 is upon us!

2019-04-19 Thread shane knapp
and this is done. welcome to the brave new world of python3.6! On Fri, Apr 19, 2019 at 9:34 AM shane knapp wrote: > i will actually be doing this now! > > > > On Thu, Apr 18, 2019 at 2:57 PM shane knapp wrote: > >> well, upon us on monday. :) >> >> firstly, an important note: if you have an

Re: [SPARK-25079][build system] the future of python3.6 is upon us!

2019-04-19 Thread shane knapp
i will actually be doing this now! On Thu, Apr 18, 2019 at 2:57 PM shane knapp wrote: > well, upon us on monday. :) > > firstly, an important note: if you have an open PR, please check to see > if you need to rebase my changes on it before testing. > > monday @ 11am PST, i will begin. in or

Re: pyspark.sql.functions ide friendly

2019-04-19 Thread Hyukjin Kwon
+1 I'm good with changing too. On Thu, 18 Apr 2019, 01:18 Reynold Xin, wrote: > Are you talking about the ones that are defined in a dictionary? If yes, > that was actually not that great in hindsight (makes it harder to read & > change), so I'm OK changing it. > > E.g. > > _functions = { >
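
The dictionary-driven pattern discussed in this thread can be illustrated with a minimal sketch. This is not PySpark's actual code: the helper `_make_function`, the dictionary contents, and the string return values are assumptions made for illustration only. It shows why functions generated from a dictionary at import time tend to be invisible to linters and autocompletion, which usually work from the source text, while a statically declared function is not.

```python
def _make_function(name, doc):
    """Build a function at import time (hypothetical helper, for illustration)."""
    def _(col):
        # A real implementation would return a Column; a string keeps this self-contained.
        return f"{name}({col})"
    _.__name__ = name
    _.__doc__ = doc
    return _


# Dictionary-driven definitions: these names exist only after the loop below runs,
# so tools that read the source without executing it typically cannot resolve them.
_functions = {
    "lower": "Converts a string expression to lower case.",
    "upper": "Converts a string expression to upper case.",
}

for _name, _doc in _functions.items():
    globals()[_name] = _make_function(_name, _doc)


# Static definition: the name, signature, and docstring are all visible in the source.
def sqrt(col):
    """Computes the square root of the specified value."""
    return f"sqrt({col})"
```

Both styles behave the same at runtime (`lower("name")` and `sqrt("x")` are both callable), so the difference is only in what editors and linters can see, which is the concern raised in the thread.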

In Apache Spark JIRA, spark/dev/github_jira_sync.py not running properly

2019-04-19 Thread Hyukjin Kwon
Hi all, It looks like 'spark/dev/github_jira_sync.py' is not running correctly somewhere. Usually the JIRA's status should be updated to "IN PROGRESS" when somebody opens a PR against a JIRA. It looks like it now only leaves a link and does not change the JIRA's status. Can someone else who knows where it's running