On 3/29/15 12:26 AM, Patrick Woody wrote:
Hey Cheng,
I didn't mean that Catalyst casting was eager, just that my
approaches thus far seem to have been. Maybe I should give a concrete
example?
I have columns A, B, C where B is saved as a String but I'd like all
references to B to go through a Cast to decimal regardless of the code
used on the SchemaRDD. So if someone does a min(B) it uses Decimal
ordering instead of String.
One approach that I had taken was to do a select of everything with
the casts on certain columns, but then when I did a count(literal(1))
on top of that RDD it seemed to bring in the whole row.
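Roughly something like this (a minimal sketch of the idea, with placeholder
column names and path, written against the 1.3 DataFrame API for concreteness
and assuming sqlContext from the shell; count(lit(1)) is just the
DataFrame-API spelling of the count(literal(1)) above):

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.DecimalType

val df = sqlContext.parquetFile("/path/to/table")  // placeholder path

// select everything, casting B from String to unlimited-precision Decimal
val casted = df.select(
  col("A"),
  col("B").cast(DecimalType.Unlimited).as("B"),
  col("C"))

// a trivial aggregate that shouldn't need any columns at all;
// explain() shows whether the Parquet scan still prunes columns
casted.agg(count(lit(1))).explain()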
What version of Spark SQL are you using? Would you mind providing a
brief snippet that can reproduce this issue? This might be a bug
depending on your concrete usage. Thanks in advance!
Thanks!
-Pat
On Sat, Mar 28, 2015 at 11:35 AM, Cheng Lian <lian.cs....@gmail.com> wrote:
Hi Pat,
I don't understand what "lazy casting" means here. Why do you think
the current Catalyst casting is "eager"? Casting happens at runtime
and doesn't disable column pruning.
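For example, a quick way to check is to look at the physical plan of a query
that casts the column at the point of use (a minimal sketch assuming the 1.3
DataFrame API in the shell; the path and column name are placeholders):

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.DecimalType

val df = sqlContext.parquetFile("/path/to/table")  // placeholder path

// min over a casted column; the Parquet scan node in the plan
// should still request only column B
df.agg(min(df("B").cast(DecimalType.Unlimited))).explain()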
Cheng
On 3/28/15 11:26 PM, Patrick Woody wrote:
Hi all,
In my application, we take input from Parquet files where BigDecimals are
written as Strings to maintain arbitrary precision. I was hoping to convert
these back over to Decimal with Unlimited precision, but I'd still like to
maintain the Parquet column pruning (all my attempts thus far seem to bring
in the whole Row). Is it possible to do this lazily through Catalyst?
Basically I'd want to do Cast(col, DecimalType()) whenever col is actually
referenced. Any tips on how to approach this would be appreciated.
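To illustrate the shape of what I'm after, something roughly like the rule
below, rewriting every reference to B into a cast (a very rough, untested
sketch against the 1.3 Catalyst internals, just to show the intent; I'm not
sure where such a rule would best be plugged in):

import org.apache.spark.sql.catalyst.expressions.{Alias, AttributeReference, Cast}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.types.{DecimalType, StringType}

object CastBToDecimal extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    case node => node transformExpressionsUp {
      // Wrap every reference to the String column B in a Cast, keeping the
      // original name and exprId so downstream references still resolve.
      // Untested; a real rule would also need a guard to stay idempotent.
      case a: AttributeReference if a.name == "B" && a.dataType == StringType =>
        Alias(Cast(a, DecimalType.Unlimited), a.name)(exprId = a.exprId)
    }
  }
}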
Thanks!
-Pat