Re: Parquet problems

Cheng Lian Wed, 22 Jul 2015 08:30:36 -0700

How many columns are there in these Parquet files? Could you load a small portion of the original large dataset successfully?


Cheng


On 6/25/15 5:52 PM, Anders Arpteg wrote:

Yes, both the driver and the executors. It works a little better with more space, but there is still a leak that causes failure after a number of reads. There are about 700 different data sources that need to be loaded, lots of data...
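
For reference, a minimal sketch of how the PermGen limit could be raised on both the driver and the executors when launching a job. The helper name spark_submit_args, the jar name my_job.jar, and the 512m value are hypothetical placeholders, not settings taken from this thread; -XX:MaxPermSize only has an effect on Java 7 and earlier JVMs.

```python
import shlex


def spark_submit_args(app, max_perm_size="512m"):
    """Build a spark-submit argument list that raises PermGen on driver and executors.

    `app` and `max_perm_size` are illustrative values, not recommendations.
    """
    jvm_flag = "-XX:MaxPermSize=" + max_perm_size
    return [
        "spark-submit",
        # In client mode the driver JVM is already running when SparkConf is read,
        # so driver JVM options are passed on the command line instead.
        "--driver-java-options", jvm_flag,
        # Executors are launched later, so a regular Spark property works for them.
        "--conf", "spark.executor.extraJavaOptions=" + jvm_flag,
        app,
    ]


if __name__ == "__main__":
    # Print the command as a single shell-safe line for inspection.
    print(" ".join(shlex.quote(a) for a in spark_submit_args("my_job.jar")))
```

As reported above, a larger limit only delays the failure if classes keep accumulating, so this is a stopgap rather than a fix.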

On Thu, 25 Jun 2015 at 08:02, Sabarish Sasidharan <[email protected]<mailto:[email protected]>> wrote:


    Did you try increasing the perm gen for the driver?

    Regards
    Sab

    On 24-Jun-2015 4:40 pm, "Anders Arpteg" <[email protected]
    <mailto:[email protected]>> wrote:

        When reading large (and many) datasets with the Spark 1.4.0
        DataFrames parquet reader (the org.apache.spark.sql.parquet
        format), the following exceptions are thrown:

        Exception in thread "task-result-getter-0"
        Exception: java.lang.OutOfMemoryError thrown from the
        UncaughtExceptionHandler in thread "task-result-getter-0"
        Exception in thread "task-result-getter-3"
        java.lang.OutOfMemoryError: PermGen space
        Exception in thread "task-result-getter-1"
        java.lang.OutOfMemoryError: PermGen space
        Exception in thread "task-result-getter-2"
        java.lang.OutOfMemoryError: PermGen space

        and many more like these from different threads. I've tried
        increasing the PermGen space using the -XX:MaxPermSize VM
        setting, but even after tripling the space, the same errors
        occur. I've also tried storing intermediate results, and am
        able to get the full job completed by running it multiple
        times and starting from the last successful intermediate
        result. There seems to be some memory leak in the parquet
        format. Any hints on how to fix this problem?

        Thanks,
        Anders
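
The workaround described in the original message (store intermediate results, rerun, and resume from the last successful one) might look roughly like the sketch below. It is plain Python and makes no Spark calls: run_with_resume, process, and state_path are hypothetical names, and process stands in for whatever loads and stores a single data source.

```python
import json
import os


def run_with_resume(sources, process, state_path):
    """Process each source once, recording completed sources in state_path.

    Rerunning after a crash skips everything that already finished.
    """
    done = set()
    if os.path.exists(state_path):
        with open(state_path) as f:
            done = set(json.load(f))

    for src in sources:
        if src in done:
            continue  # finished in an earlier run
        process(src)  # may raise, e.g. when the job runs out of memory
        done.add(src)
        # Write to a temp file and rename it, so a crash during the write
        # cannot leave a half-written state file behind.
        tmp = state_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(sorted(done), f)
        os.replace(tmp, state_path)
    return done
```

Each rerun starts a fresh JVM, which is presumably why this approach gets past the PermGen limit; it does not address the underlying leak.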
