Hi,
How many files do you read? Are they splittable?
If you have 4 non-splittable files, your dataset will have 4 partitions,
and you will only see one task per partition, each handled by one executor.
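The rule above can be sketched in plain Python (this is an illustration of the partitioning behaviour described, not Spark's actual code; the 128 MB split size is an assumption matching the common HDFS default):

```python
import math

SPLIT_SIZE = 128 * 1024 * 1024  # assumed max split size (typical HDFS block size)

def partition_count(files):
    """files: list of (size_bytes, splittable) tuples.

    A splittable file (e.g. plain text, bzip2) is cut into roughly
    size / SPLIT_SIZE partitions; a non-splittable file (e.g. gzip)
    always yields exactly one partition, hence one task.
    """
    total = 0
    for size_bytes, splittable in files:
        if splittable:
            total += max(1, math.ceil(size_bytes / SPLIT_SIZE))
        else:
            total += 1
    return total

# Four non-splittable files -> 4 partitions, so only 4 tasks run,
# no matter how many executor cores are available.
print(partition_count([(500 * 1024 * 1024, False)] * 4))
```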
Regards,
Arnaud
On Tue, May 28, 2019 at 10:06 AM Sachit Murarka
wrote:
> Hi All,
>
> I am using spa
Hi Shivam,
In the end, a file takes only its own space, regardless of the block size.
So if your file is just a few KB, it will take only those few KB.
But I've noticed that when the file is written, a block is somehow
allocated and the Namenode considers the whole block size as used
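The distinction above can be sketched as follows (a simplified illustration of HDFS space accounting, not the Namenode's actual bookkeeping; the 128 MB default block size is an assumption):

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed default HDFS block size

def hdfs_usage(file_size_bytes, block_size=BLOCK_SIZE):
    """Contrast the bytes a file actually occupies on disk with the
    number of blocks the Namenode tracks for it.  A small file still
    occupies a whole block entry in the Namenode's metadata, even
    though only file_size_bytes of real disk space are consumed."""
    blocks = max(1, math.ceil(file_size_bytes / block_size))
    return {"actual_bytes": file_size_bytes, "blocks_allocated": blocks}

# A 4 KB file: 4096 real bytes on disk, but one full block entry.
print(hdfs_usage(4096))
```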
Hi Vladimir,
I tried to do the same here when I attempted to write a Spark connector
for remote files.
From my point of view, there were a lot of changes in the V2 API => better
semantics at least!
I understood that only continuous streaming uses DataSourceV2 (not sure if
I'm correct). But for file
Hi,
Indeed, Spark uses spark.sql.autoBroadcastJoinThreshold to choose whether it
auto-broadcasts a dataset or not. The default value is 10 MB.
You may run an explain and check the different plans to see whether
broadcast hash joins are being used, then change the threshold accordingly.
There is no use in increasing it too
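The decision described above can be sketched in plain Python (an illustration of the planner's threshold check, not Spark's actual implementation; the size-estimation details are simplified):

```python
# Spark's default for spark.sql.autoBroadcastJoinThreshold is 10 MB;
# setting it to -1 disables automatic broadcasting entirely.
DEFAULT_THRESHOLD = 10 * 1024 * 1024

def choose_join_strategy(smaller_side_bytes, threshold=DEFAULT_THRESHOLD):
    """Broadcast the smaller side if its estimated size is at or below
    the threshold; otherwise fall back to a sort-merge join."""
    if threshold >= 0 and smaller_side_bytes <= threshold:
        return "BroadcastHashJoin"
    return "SortMergeJoin"

# A 5 MB dimension table would be broadcast; a 50 MB one would not.
print(choose_join_strategy(5 * 1024 * 1024))
print(choose_join_strategy(50 * 1024 * 1024))
```

In a real job you would check `df.explain()` for `BroadcastHashJoin` in the physical plan and adjust the threshold via `spark.conf.set(...)` if needed.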