Compute n = (size of your dataset) / HDFS block size.
Then run a simple Spark job to read and repartition based on 'n'.

Hichame
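A minimal sketch of that approach in Scala follows; the input path, target
Hive table, and the 128 MB fallback block size are illustrative assumptions,
not details from this thread:

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

// Hypothetical location of the dataset on HDFS.
val inputPath = new Path("/data/input")
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

// Total dataset size in bytes.
val datasetSize = fs.getContentSummary(inputPath).getLength

// HDFS block size; fall back to the common 128 MB default.
val blockSize = spark.sparkContext.hadoopConfiguration
  .getLongBytes("dfs.blocksize", 128L * 1024 * 1024)

// n = dataset size / block size, rounded up, at least 1.
val n = math.max(1, math.ceil(datasetSize.toDouble / blockSize).toInt)

// Repartition to n so each output file is roughly one HDFS block.
spark.read.parquet(inputPath.toString)
  .repartition(n)
  .write
  .mode("overwrite")
  .insertInto("target_hive_table")  // hypothetical table, must already exist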
*From:* felixcheun...@hotmail.com
*Sent:* January 19, 2019 2:06 PM
*To:* 28shivamsha...@gmail.com; user@spark.apache.org
*Subject:* Re: Persist Dataframe to HDFS considering HDFS Block Size.
You can call coalesce to combine partitions..
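For example (df, n, and the table name below are placeholders, not from the
thread):

// coalesce(n) merges existing partitions down to n without a full shuffle,
// unlike repartition(n), so it is a cheap way to reduce the output file count.
val compacted = df.coalesce(n)
compacted.write.mode("append").insertInto("target_hive_table")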
*From:* Shivam Sharma <28shivamsha...@gmail.com>
*Sent:* Saturday, January 19, 2019 7:43 AM
*To:* user@spark.apache.org
*Subject:* Persist Dataframe to HDFS considering HDFS Block Size.
Hi All,
I wanted to persist a dataframe on HDFS. Basically, I am inserting data into
a Hive table using Spark. Currently, at the time of writing to the Hive table
I have set total shuffle partitions = 400, so 400 files are being created,
which does not take the HDFS block size into account. How can I tell Spark to
size the output files according to the HDFS block size?
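For reference, a sketch of the write path being described, assuming
spark.sql.shuffle.partitions is the setting referred to above (the grouping
column and table name are hypothetical):

spark.conf.set("spark.sql.shuffle.partitions", "400")

// Any shuffle before the write now produces 400 tasks, so the insert
// below creates 400 output files regardless of the HDFS block size.
df.groupBy("some_key")
  .count()
  .write
  .insertInto("target_hive_table")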