Yea - I'd just add a bunch of columns. Doesn't seem like that big of a deal.
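Something like this, say (a minimal sketch; the column names and the "good" check are only examples, and input_file_name() is only there if your Spark version provides it):

import org.apache.spark.sql.functions._

// df is assumed to be an existing DataFrame with a "value" column;
// the column names are only examples.
val marked = df
  .withColumn("is_good", col("value").isNotNull)   // whatever "good" means for your data
  .withColumn("source_file", lit("unknown"))       // or input_file_name() if your version has it

// later stages can route on the flag instead of blowing up
val good = marked.filter(col("is_good"))
val bad  = marked.filter(!col("is_good"))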
On Wed, Jul 15, 2015 at 10:53 AM, RJ Nowling wrote:
I'm considering a few approaches -- one of which is to provide new
functions like mapLeft, mapRight, filterLeft, etc.
But this all falls short with DataFrames. RDDs can easily be extended
from RDD[T] to RDD[Record[T]]. I guess with DataFrames, I could add
special columns?
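For the RDD side, a rough sketch of those helpers (Record is just a working name here for Either with an error message on the left):

import org.apache.spark.rdd.RDD

object RecordOps {
  type Record[T] = Either[String, T]

  implicit class RecordRDD[T](rdd: RDD[Record[T]]) {
    // apply f to good records only; bad records pass through unchanged
    def mapRight[U](f: T => U): RDD[Record[U]] =
      rdd.map(_.right.map(f))

    // pull out just the errors, e.g. to write them to a side output
    def filterLeft: RDD[String] =
      rdd.collect { case Left(err) => err }
  }
}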
On Wed, Jul 15, 2015
How about just using two fields, one boolean field to mark good/bad, and
another to get the source file?
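e.g., roughly (all the names here are placeholders):

case class Tagged[T](good: Boolean, sourceFile: String, value: T)

// build the tag at load time, then filter on it before the real transformations
val t = Tagged(good = true, sourceFile = "part-00000", value = "raw,csv,line")
// an RDD[Tagged[String]] can then be split with rdd.filter(_.good) / rdd.filter(!_.good)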
On Wed, Jul 15, 2015 at 10:31 AM, RJ Nowling wrote:
Hi all,
I'm working on an ETL task with Spark. As part of this work, I'd like to
mark records with some info such as:
1. Whether the record is good or bad (e.g., Either)
2. Originating file and lines
Part of my motivation is to prevent errors with individual records from
stopping the entire pipeline.
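For concreteness, roughly the shape I'm after (a sketch only; Origin, parseAmount, and the path are placeholders, and sc is the usual SparkContext):

import scala.util.{Failure, Success, Try}

case class Origin(file: String, line: Long)

// stand-in for whatever per-record parsing the ETL actually does
def parseAmount(s: String): Double = s.trim.toDouble

// Either[(where it came from, error message), (where it came from, parsed value)]
def tagLine(origin: Origin, text: String): Either[(Origin, String), (Origin, Double)] =
  Try(parseAmount(text)) match {
    case Success(v) => Right((origin, v))
    case Failure(e) => Left((origin, e.getMessage))
  }

// wholeTextFiles keeps the file name next to its contents (fine for smallish files);
// a bad line becomes a Left instead of an exception that kills the job
val records = sc.wholeTextFiles("data/*.csv").flatMap { case (file, contents) =>
  contents.split("\n").zipWithIndex.map { case (text, idx) =>
    tagLine(Origin(file, idx + 1L), text)
  }
}

val badLines = records.collect { case Left(bad) => bad }  // (Origin, error message) pairs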