On 3/29/15 12:26 AM, Patrick Woody wrote:
Hey Cheng,

I didn't mean that Catalyst casting was eager, just that my approaches thus far seem to have been. Maybe I should give a concrete example?

I have columns A, B, and C, where B is saved as a String, but I'd like all references to B to go through a Cast to Decimal regardless of the code used on the SchemaRDD. So if someone does a min(B), it uses Decimal ordering instead of String ordering.
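
Roughly, what I'm after is something like this (a sketch only, assuming Spark 1.3's DataFrame API and a made-up table name "events"):

    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types.DecimalType

    // Table with columns A, B (stored as String), and C.
    val df = sqlContext.table("events")

    // A view of the table where every reference to B goes through a cast.
    val casted = df.select(
      col("A"),
      col("B").cast(DecimalType.Unlimited).as("B"),
      col("C"))

    // min now uses Decimal ordering rather than String ordering.
    casted.agg(min("B")).show()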

One approach I had taken was to do a select of everything with casts on certain columns, but when I then did a count(literal(1)) on top of that RDD, it seemed to bring in the whole row.
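
In sketch form (again assuming the 1.3-style API, not my exact code), the pattern was roughly:

    // Counting on top of the casted view from above. I'd expect this to
    // prune down to at most one Parquet column, but it appears to
    // materialize the whole row instead.
    casted.agg(count(lit(1))).show()

    // Looking at the analyzed/optimized plans to see whether column
    // pruning survives the casts:
    casted.agg(count(lit(1))).explain(true)
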
What version of Spark SQL are you using? Would you mind providing a brief snippet that reproduces this issue? This might be a bug depending on your concrete usage. Thanks in advance!

Thanks!
-Pat

On Sat, Mar 28, 2015 at 11:35 AM, Cheng Lian <lian.cs....@gmail.com> wrote:

    Hi Pat,

    I don't understand what "lazy casting" means here. Why do you think
    current Catalyst casting is "eager"? Casting happens at runtime and
    doesn't disable column pruning.

    Cheng


    On 3/28/15 11:26 PM, Patrick Woody wrote:

        Hi all,

        In my application, we take input from Parquet files where
        BigDecimals are written as Strings to maintain arbitrary precision.

        I was hoping to convert these back over to Decimal with Unlimited
        precision, but I'd still like to maintain the Parquet column
        pruning (all my attempts thus far seem to bring in the whole Row).
        Is it possible to do this lazily through Catalyst?

        Basically I'd want to do Cast(col, DecimalType()) whenever col is
        actually referenced. Any tips on how to approach this would be
        appreciated.
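
        (One direction I've been eyeing, purely as a sketch and with no
        idea whether these internal APIs line up with my Spark version or
        whether this keeps pruning intact: a Catalyst rule that rewrites
        expressions over the String column, hard-coded here to a column
        named "B", into casts.)

            import org.apache.spark.sql.catalyst.expressions.AttributeReference
            import org.apache.spark.sql.catalyst.expressions.{Cast, Expression}
            import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
            import org.apache.spark.sql.catalyst.rules.Rule
            import org.apache.spark.sql.types.{DecimalType, StringType}

            object CastBToDecimal extends Rule[LogicalPlan] {
              private def isStringB(e: Expression): Boolean = e match {
                case a: AttributeReference => a.name == "B" && a.dataType == StringType
                case _ => false
              }

              def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
                // Wrap B in a cast wherever it appears inside another expression
                // (e.g. min(B)). Skip Cast nodes themselves so B isn't wrapped
                // twice; a bare projection of B would still need separate
                // handling (e.g. an Alias'd cast).
                case e if !e.isInstanceOf[Cast] && e.children.exists(isStringB) =>
                  e.withNewChildren(e.children.map {
                    case a: AttributeReference if isStringB(a) =>
                      Cast(a, DecimalType.Unlimited)
                    case other => other
                  })
              }
            }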

        Thanks!
        -Pat



