On 3/29/15 12:26 AM, Patrick Woody wrote:
Hey Cheng,

I didn't mean that Catalyst casting was eager, just that my approaches thus far seem to have been. Maybe I should give a concrete example?

I have columns A, B, and C, where B is saved as a String, but I'd like all references to B to go through a Cast to Decimal regardless of the code used on the SchemaRDD. So if someone does a min(B), it uses Decimal ordering instead of String ordering.
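
Roughly, what I'm after is something like this (a sketch only, assuming Spark 1.3's DataFrame API and a made-up table name "events"):

    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types.DecimalType

    // Table with columns A, B (stored as String), and C.
    val df = sqlContext.table("events")

    // A view of the table where every reference to B goes through a cast.
    val casted = df.select(
      col("A"),
      col("B").cast(DecimalType.Unlimited).as("B"),
      col("C"))

    // min now uses Decimal ordering rather than String ordering.
    casted.agg(min("B")).show()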

One approach I had taken was to do a select of everything with casts on certain columns, but when I then did a count(literal(1)) on top of that RDD, it seemed to bring in the whole row.
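
In sketch form (again assuming the 1.3-style API, not my exact code), the pattern was roughly:

    // Counting on top of the casted view from above. I'd expect this to
    // prune down to at most one Parquet column, but it appears to
    // materialize the whole row instead.
    casted.agg(count(lit(1))).show()

    // Looking at the analyzed/optimized plans to see whether column
    // pruning survives the casts:
    casted.agg(count(lit(1))).explain(true)
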
What version of Spark SQL are you using? Would you mind providing a brief snippet that reproduces this issue? This might be a bug depending on your concrete usage. Thanks in advance!

Thanks!
-Pat

On Sat, Mar 28, 2015 at 11:35 AM, Cheng Lian <lian.cs....@gmail.com> wrote:

    Hi Pat,

    I don't understand what "lazy casting" means here. Why do you think
    current Catalyst casting is "eager"? Casting happens at runtime and
    doesn't disable column pruning.

    Cheng


    On 3/28/15 11:26 PM, Patrick Woody wrote:

        Hi all,

        In my application, we take input from Parquet files where
        BigDecimals are written as Strings to maintain arbitrary precision.

        I was hoping to convert these back over to Decimal with Unlimited
        precision, but I'd still like to maintain the Parquet column
        pruning (all my attempts thus far seem to bring in the whole Row).
        Is it possible to do this lazily through Catalyst?

        Basically I'd want to do Cast(col, DecimalType()) whenever col is
        actually referenced. Any tips on how to approach this would be
        appreciated.
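
        (One direction I've been eyeing, purely as a sketch and with no
        idea whether these internal APIs line up with my Spark version or
        whether this keeps pruning intact: a Catalyst rule that rewrites
        expressions over the String column, hard-coded here to a column
        named "B", into casts.)

            import org.apache.spark.sql.catalyst.expressions.AttributeReference
            import org.apache.spark.sql.catalyst.expressions.{Cast, Expression}
            import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
            import org.apache.spark.sql.catalyst.rules.Rule
            import org.apache.spark.sql.types.{DecimalType, StringType}

            object CastBToDecimal extends Rule[LogicalPlan] {
              private def isStringB(e: Expression): Boolean = e match {
                case a: AttributeReference => a.name == "B" && a.dataType == StringType
                case _ => false
              }

              def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
                // Wrap B in a cast wherever it appears inside another expression
                // (e.g. min(B)). Skip Cast nodes themselves so B isn't wrapped
                // twice; a bare projection of B would still need separate
                // handling (e.g. an Alias'd cast).
                case e if !e.isInstanceOf[Cast] && e.children.exists(isStringB) =>
                  e.withNewChildren(e.children.map {
                    case a: AttributeReference if isStringB(a) =>
                      Cast(a, DecimalType.Unlimited)
                    case other => other
                  })
              }
            }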

        Thanks!
        -Pat



