The Dataset is defined as a case class with many fields, several of which have nested structure (a Map, a List of another case class, etc.).
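For context, here is a minimal sketch of the kind of schema involved (the names below are hypothetical stand-ins, not the real fields):

    // Hypothetical schema for illustration; the real case class has many more fields.
    case class Address(city: String, zip: String)

    case class UserRecord(
      id: Long,
      attributes: Map[String, String],  // nested Map field
      addresses: List[Address]          // List of another case class
    )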
The Dataset is only about 1 TB when saved to disk as a Parquet file, but when I join it the shuffle write size grows to as much as 12 TB.
Is there a way to cut it down without changing the Dataset definition?
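The join itself is roughly the following (a spark-shell sketch; the Dataset names, paths, and join key are all hypothetical):

    import org.apache.spark.sql.{Dataset, SparkSession}

    val spark = SparkSession.builder().appName("join-example").getOrCreate()
    import spark.implicits._

    // Read the Parquet data back as typed Datasets (paths are made up).
    val records: Dataset[UserRecord] = spark.read.parquet("/data/records").as[UserRecord]
    val other: Dataset[UserRecord] = spark.read.parquet("/data/other").as[UserRecord]

    // This join is what produces the ~12 TB shuffle write.
    val joined = records.joinWith(other, records("id") === other("id"))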
Below are the log messages, which seem to repeat indefinitely:
16/07/30 23:25:38 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 1ms
16/07/30 23:25:39 DEBUG Client: IPC Client (1735131305) connection to /10.80.1.168:8032 from zhuotao sending #147247
16/07/30 23:25:39 DEBUG Client: IPC Client (173