Re: Time to cut an Apache 2.4.1 release?

2019-02-14 Thread Sean Owen
(That may be so, but it may still be correct to revert a change in Spark, if necessary, to avoid being exposed to it in the short term. I have no idea whether that's the right thing here or not; I'm just answering the point about why we'd care about a bug in another project. Also, it's not clear which Hive release

Re: Time to cut an Apache 2.4.1 release?

2019-02-14 Thread Darcy Shen
Well, it is not a bug in Spark 2.4 but a bug in Hive 2.1.1. My colleague will report it on the Spark JIRA later. Presto works fine when reading the ORC table created by Spark 2.4. We've decided to fix it in Hive 2.1.1. Since Hive 2.1.1 is widely used, I suggest that we should keep a goo

Re: Time to cut an Apache 2.4.1 release?

2019-02-14 Thread Wenchen Fan
Do you know which bug ORC 1.5.2 introduced? Or is it because Hive uses a legacy version of ORC which has a bug?

On Thu, Feb 14, 2019 at 2:35 PM Darcy Shen wrote:
> We found that ORC table created by Spark 2.4 failed to be read by Hive 2.1.1.
>
> spark-sql -e 'CREATE TABLE tmp.orcTable2 USI

Re: Vectorized R gapply[Collect]() implementation

2019-02-14 Thread Hyukjin Kwon
Thanks guys <3. FYI, I made PRs for collect and vectorized dapply too. In my tests, they speed things up by 1500%+ and 4600%+, respectively. https://github.com/apache/spark/pull/23760 https://github.com/apache/spark/pull/23787

On Mon, Feb 11, 2019 at 4:45 AM, Felix Cheung wrote:
> This is super awesome!