Re: In Apache Spark JIRA, spark/dev/github_jira_sync.py not running properly

2020-04-23 Thread Hyukjin Kwon
Hi all, It seems the github_jira_sync.py script has stopped working completely now. https://issues.apache.org/jira/browse/SPARK-31532 <> https://github.com/apache/spark/pull/28316 https://issues.apache.org/jira/browse/SPAR

Re: Getting the ball started on a 2.4.6 release

2020-04-23 Thread Holden Karau
Sounds good, I’ll make the JIRAs for tracking then, and I can ping the original PR authors in there and, based on their feedback, either include each one or not. On Thu, Apr 23, 2020 at 11:51 AM Xiao Li wrote: > Actually, SPARK-26390 https://github.com/apache/spark/pull/23343 is just > a small clean up. I d

Re: Getting the ball started on a 2.4.6 release

2020-04-23 Thread Xiao Li
Actually, SPARK-26390 https://github.com/apache/spark/pull/23343 is just a small clean up. I do not think it fixes any correctness bugs. I think we should discuss your backport plans one by one with the PR authors and reviewers, since most of them are not closely following the dev list. Xiao On

Re: Getting the ball started on a 2.4.6 release

2020-04-23 Thread Holden Karau
I included 26390 as a candidate since it sounded like it bordered on a correctness/expected behaviour fix (e.g. the ColumnPruning rule doing more than column pruning), but if it’s too big a change I'm happy to drop that one. On Thu, Apr 23, 2020 at 11:43 AM Xiao Li wrote: > Hi, Holden, > > We are trying to

Re: Getting the ball started on a 2.4.6 release

2020-04-23 Thread Xiao Li
Hi, Holden, We are trying to avoid backporting the improvement/cleanup PRs to the maintenance releases, especially for the core modules, like Spark Core and SQL. SPARK-26390 is a good example. Xiao On Thu, Apr 23, 2020 at 11:17 AM Holden Karau wrote: > Tentatively I'm planning on this

Re: Getting the ball started on a 2.4.6 release

2020-04-23 Thread Holden Karau
Tentatively I'm planning on this list to start backporting. If no one sees any issues with those I'll start to make backport JIRAs for them for tracking this afternoon. SPARK-26390 ColumnPruning rule should only do column pruning SPARK-25407 Allow nested access for non-existent field fo

Re: Getting the ball started on a 2.4.6 release

2020-04-23 Thread Holden Karau
On Thu, Apr 23, 2020 at 9:07 AM edeesis wrote: > There's other information you can obtain from the Pod metadata on a > describe > than just from the logs, which are typically what's being printed by the > Application itself. Would get pods -w -o yaml do the trick here or is there going to be inf

Re: Getting the ball started on a 2.4.6 release

2020-04-23 Thread edeesis
There's other information you can obtain from the Pod metadata on a describe than just from the logs, which are typically what's being printed by the Application itself. I've also found that Spark has some trouble obtaining the reason for a K8S executor death (as evidenced by the spark.kubernetes.ex

Re: Error while reading hive tables with tmp/hidden files inside partitions

2020-04-23 Thread Wenchen Fan
Yea, please report the bug on a supported Spark version like 2.4. On Thu, Apr 23, 2020 at 3:40 PM Dhrubajyoti Hati wrote: > FYI we are using Spark 2.2.0. Should the change be present in this Spark > version? I wanted to check before opening a JIRA ticket. > > > > > *Regards, Dhrubajyoti Hati.* > >

Re: Error while reading hive tables with tmp/hidden files inside partitions

2020-04-23 Thread Dhrubajyoti Hati
FYI we are using Spark 2.2.0. Should the change be present in this Spark version? I wanted to check before opening a JIRA ticket. *Regards, Dhrubajyoti Hati.* On Thu, Apr 23, 2020 at 10:12 AM Wenchen Fan wrote: > This looks like a bug where the path filter doesn't work for Hive table > reading. Can