Thanks for the explanation, Yin.
Justin
On Tue, Apr 7, 2015 at 7:36 PM, Yin Huai wrote:
I think the slowness is caused by the way that we serialize/deserialize the
value of a complex type. I have opened
https://issues.apache.org/jira/browse/SPARK-6759 to track the improvement.
On Tue, Apr 7, 2015 at 6:59 PM, Justin Yip wrote:
The schema has a StructType.
Justin
On Tue, Apr 7, 2015 at 6:58 PM, Yin Huai wrote:
Hi Justin,
Does the schema of your data have any decimal, array, map, or struct type?
Thanks,
Yin
On Tue, Apr 7, 2015 at 6:31 PM, Justin Yip wrote:
Hello,
I have a Parquet file of around 55M rows (~1 GB on disk). Performing a simple
grouping operation is pretty efficient (I get results within 10 seconds).
However, after calling DataFrame.cache, I observe a significant performance
degradation: the same operation now takes 3+ minutes.
My hunch is that