Yes. From my side, it's -1 for RC3.
Bests,
Dongjoon.
On Sat, Oct 13, 2018 at 1:24 PM Holden Karau wrote:
> So if it's a blocker would you think this should be a -1?
>
> On Fri, Oct 12, 2018 at 3:52 PM Dongjoon Hyun
> wrote:
>
>> Hi, Holden.
>>
>> Since that's a performance regression at 2.4.0, I marked it as `Blocker` four days ago.
So if it's a blocker would you think this should be a -1?
On Fri, Oct 12, 2018 at 3:52 PM Dongjoon Hyun
wrote:
> Hi, Holden.
>
> Since that's a performance regression at 2.4.0, I marked it as `Blocker` four days ago.
>
> Bests,
> Dongjoon.
>
>
> On Fri, Oct 12, 2018 at 11:45 AM Holden Karau
> wrote:
>
>> Fol
We have a collection of programs in the DataFrame API that all do big shuffles, for which we use 2048+ partitions. This works fine, but it produces a lot of (small) output files, which puts pressure on the memory of the driver programs of any Spark program that reads this data in again.
So one of our de
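For context, a common mitigation (a hedged sketch, not taken from the thread — the paths, column name, and partition counts below are hypothetical) is to keep the shuffle wide but add an extra shuffle down to a small partition count just before writing, so the job keeps its parallelism while emitting only a few output files:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Keep the heavy shuffles wide, as in the thread.
spark.conf.set("spark.sql.shuffle.partitions", "2048")

val df = spark.read.parquet("/tmp/input")     // hypothetical input path
val aggregated = df.groupBy("key").count()    // aggregation runs with 2048 partitions

aggregated
  .repartition(16)                            // extra shuffle: 16 output files, not 2048
  .write.parquet("/tmp/output")               // hypothetical output path

// Note: coalesce(16) would avoid the extra shuffle, but because coalesce is a
// narrow transformation it can collapse the upstream aggregation down to 16
// tasks as well, losing the shuffle parallelism the 2048 partitions provide.
```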
I've tried the same sample with the DataFrame API and it's much more
stable, although it's backed by the RDD API.
This sample works without any issues or any additional Spark tuning:
val rdd = sc.sequenceFile("/tmp/random-strings", classOf[Text], classOf[Text])
val df = rdd.map(item => item._1.toString -> item._2.toString).toDF()
Hi,
I use the HCatalog Streaming Mutation API to write data to a Hive transactional
table, and then I use SparkSQL to read data from the Hive transactional table.
I get the right result.
However, SparkSQL takes more time to read the Hive ORC bucketed transactional table,
because SparkSQL reads all columns(
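For context (a hedged sketch, not from the thread — the table and column names are hypothetical): ORC supports column pruning, so selecting only the needed columns normally reduces read time; the complaint above is that this pruning does not happen when reading a Hive transactional (ACID) table.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

// Ideally this scan reads only `id` and `name` from the ORC files; the thread
// reports that for an ORC bucketed transactional table, all columns are read
// instead. Table and column names here are made up for illustration.
val df = spark.sql("SELECT id, name FROM acid_table WHERE id > 100")

// Inspect the physical plan to see which columns the scan actually requests.
df.explain()
```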