From: "zhangliyun"
> Sent: 2019-12-03 05:56:55
> To: "Wenchen Fan"
> Subject: Re:Re: A question about radd bytes size
>
> Hi Fan:
> Thanks for the reply. I agree that how the data is stored determines the
> total bytes of the table file.
> In my experiment, I fou
When we talk about byte size, we need to specify how the data is stored.
For example, if we cache the DataFrame, the byte size is the number of
bytes in the binary format of the table cache. If we write to Hive
tables, the byte size is the total size of the table's data files.
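
To illustrate the point that the byte size depends on how the data is stored, here is a minimal sketch in plain Python. It is not Spark's cache format or a Hive file format; the sample rows, the JSON-lines layout, and the use of gzip are all assumptions chosen only to show the same records occupying different numbers of bytes.

```python
import gzip
import json

# Hypothetical sample rows standing in for a DataFrame (not from the thread).
rows = [{"id": i, "city": "Hangzhou", "score": i % 10} for i in range(10_000)]

# Layout 1: every row serialized as uncompressed JSON text, one row per line.
as_text = "\n".join(json.dumps(r) for r in rows).encode("utf-8")

# Layout 2: the same bytes gzip-compressed, loosely analogous to a
# compressed data file on disk.
as_gzip = gzip.compress(as_text)

sizes = {"text": len(as_text), "gzip": len(as_gzip)}
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

Both layouts hold exactly the same records, yet the compressed one is smaller. A size computed from in-memory rows and a size computed from files on disk can differ for the same kind of reason.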
On
Hi:
I want to get the total bytes of a DataFrame using the following function, but
when I insert the DataFrame into Hive, the value returned by the function is
different from spark.sql.statistics.totalSize. The
spark.sql.statistics.totalSize is less than the result of the following function
getRDDB