It does generalize types, but apparently only on the intersection of the
columns. There may be a way to get the union of the columns as well using
HiveQL. Types generalize upward, with string being the "most general".
Matei
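The widening behavior described above can be sketched with plain Scala maps standing in for schemas (a hypothetical illustration of the stated rules, not Spark's actual merge logic; the helper names are made up):

```scala
// Toy model of the behavior described above: merge two schemas on the
// intersection of their columns, widening mismatched types, with string
// as the most general type. Column types are represented as plain strings.
object SchemaWiden {
  // Widen two column types: identical types stay as-is, int/double widens
  // to double, and anything else falls back to string.
  def widen(a: String, b: String): String =
    if (a == b) a
    else if (Set(a, b) == Set("int", "double")) "double"
    else "string"

  // Keep only the columns both schemas share, widening each type pair.
  def mergeSchemas(s1: Map[String, String], s2: Map[String, String]): Map[String, String] =
    (s1.keySet intersect s2.keySet).map(c => c -> widen(s1(c), s2(c))).toMap
}
```

For example, merging {a: int, b: string} with {a: double, c: int} would keep only column a, widened to double.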
> On Nov 1, 2014, at 6:22 PM, Daniel Mahler wrote:
Thanks Matei. What does unionAll do if the input RDD schemas are not 100%
compatible? Does it take the union of the columns and generalize the types?
thanks
Daniel
On Sat, Nov 1, 2014 at 6:08 PM, Matei Zaharia
wrote:
Try unionAll, which is a special method on SchemaRDDs that keeps the schema
in the result.
Matei
> On Nov 1, 2014, at 3:57 PM, Daniel Mahler wrote:
I would like to combine two Parquet tables I have created.
I tried:
sc.union(sqx.parquetFile("fileA"), sqx.parquetFile("fileB"))
but that just returns RDD[Row].
How do I combine them to get a SchemaRDD[Row]?
thanks
Daniel
>
> * unionAll preserves duplicates vs. union, which does not
>
This is true; if you want to eliminate duplicate items, you should follow
the union with a distinct().
> * In SQL, union and unionAll result in the same output format (another SQL
> table), vs. the different RDD types here.
>
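The duplicate-handling point above can be illustrated with plain Scala collections (an analogy for the semantics under discussion, not actual Spark code):

```scala
// unionAll-style concatenation keeps duplicate rows; a following distinct()
// removes them (analogous to unionAll(...).distinct() on a SchemaRDD).
val a = Seq("alice", "bob")
val b = Seq("bob", "carol")
val withDups = a ++ b            // duplicates preserved: alice, bob, bob, carol
val deduped  = withDups.distinct // duplicates removed: alice, bob, carol
```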
Hi Aaron,
unionAll is a workaround ...
* unionAll preserves duplicates vs. union, which does not
* In SQL, union and unionAll result in the same output format (another SQL
table), vs. the different RDD types here.
* Understand the existing union contract issue. This may be a class
hierarchy discussion for SchemaRDD.
Looks like there is a "unionAll" function on SchemaRDD which will do what
you want. The contract of RDD#union is unfortunately too general to allow
it to return a SchemaRDD without downcasting.
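The contract issue can be sketched as a toy class hierarchy (hypothetical names, not Spark's actual classes): the base union is declared to return the base type, so the schema-carrying static type is lost unless the subclass adds its own method.

```scala
// Toy model: BaseRDD#union returns BaseRDD, so even when both operands are
// ToySchemaRDDs, the static type (and with it the schema) is lost without a
// downcast. A subclass-specific unionAll keeps both.
class BaseRDD(val rows: Seq[Seq[Any]]) {
  def union(other: BaseRDD): BaseRDD = new BaseRDD(rows ++ other.rows)
}

class ToySchemaRDD(rows: Seq[Seq[Any]], val schema: Seq[String]) extends BaseRDD(rows) {
  // Returns the more specific type, carrying the schema along.
  def unionAll(other: ToySchemaRDD): ToySchemaRDD =
    new ToySchemaRDD(rows ++ other.rows, schema)
}
```

Here `a.union(b)` is statically a BaseRDD, while `a.unionAll(b).schema` remains accessible.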
On Sun, Mar 30, 2014 at 7:56 AM, Manoj Samel wrote:
Hi,
I am trying Spark SQL based on the example in the docs ...
val people = sc.textFile("/data/spark/examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(p => Person(p(0), p(1).trim.toInt))
val olderThanTeans = people.where('age > 19)
val youngerThanTeans = people.where('age < 13)
val