Re: union of SchemaRDDs

2014-11-01 Thread Matei Zaharia
It does generalize types, but only on the intersection of the columns, it seems. There might be a way to get the union of the columns too using HiveQL. Types generalize up, with string being the "most general".

Matei

> On Nov 1, 2014, at 6:22 PM, Daniel Mahler wrote:
>
> Thanks Matei. What does
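The behaviour Matei describes (keep only the shared columns, widen each column's type, treat string as the most general type) can be pictured with a toy model in plain Scala. This is only a sketch: `SchemaMerge`, the type names, and the ordering of the numeric types are assumptions made for illustration, not Spark's actual coercion rules.

```scala
// Toy model only: these names and the numeric ordering are assumptions,
// not Spark's actual type-coercion rules.
object SchemaMerge {
  // Rank of each toy type; a higher rank is "more general". String is the top.
  val rank: Map[String, Int] =
    Map("int" -> 0, "long" -> 1, "double" -> 2, "string" -> 3)

  // The more general of two toy types.
  def widen(a: String, b: String): String =
    if (rank(a) >= rank(b)) a else b

  // Keep only columns present in both schemas, widening each column's type.
  def merge(left: Map[String, String], right: Map[String, String]): Map[String, String] =
    left.collect {
      case (name, tpe) if right.contains(name) => name -> widen(tpe, right(name))
    }
}
```

Merging a schema with columns `id: int`, `name: string`, `score: double` and one with `id: long`, `score: string`, `city: string` leaves only `id` and `score`, typed `long` and `string`. The columns that appear on only one side are dropped.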

Re: union of SchemaRDDs

2014-11-01 Thread Daniel Mahler
Thanks Matei. What does unionAll do if the input RDD schemas are not 100% compatible? Does it take the union of the columns and generalize the types?

thanks
Daniel

On Sat, Nov 1, 2014 at 6:08 PM, Matei Zaharia wrote:
> Try unionAll, which is a special method on SchemaRDDs that keeps the
> schem

Re: union of SchemaRDDs

2014-11-01 Thread Matei Zaharia
Try unionAll, which is a special method on SchemaRDDs that keeps the schema on the results.

Matei

> On Nov 1, 2014, at 3:57 PM, Daniel Mahler wrote:
>
> I would like to combine 2 parquet tables I have created.
> I tried:
>
> sc.union(sqx.parquetFile("fileA"), sqx.parquetFile("fileB"))
>

union of SchemaRDDs

2014-11-01 Thread Daniel Mahler
I would like to combine 2 parquet tables I have created. I tried:

sc.union(sqx.parquetFile("fileA"), sqx.parquetFile("fileB"))

but that just returns RDD[Row]. How do I combine them to get a SchemaRDD?

thanks
Daniel

Re: Shouldn't the UNION of SchemaRDDs produce SchemaRDD ?

2014-03-31 Thread Michael Armbrust
> * unionAll preserve duplicate v/s union that does not

This is true. If you want to eliminate duplicate items, you should follow the union with a distinct().

> * SQL union and unionAll result in same output format i.e. another SQL v/s
> different RDD types here.
> * Understand the existing un
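The difference between keeping duplicates and following the union with distinct() can be seen with ordinary Scala collections. This is a minimal sketch: the `UnionSemantics` object and its sample values are made up for illustration, and plain `Seq`s stand in for the RDDs.

```scala
object UnionSemantics {
  // Plain Scala collections stand in for the two inputs here.
  val a = Seq("alice", "bob")
  val b = Seq("bob", "carol")

  // Concatenation keeps every element, duplicates included.
  val keepDuplicates: Seq[String] = a ++ b

  // Following the concatenation with distinct removes the repeated "bob".
  val dropDuplicates: Seq[String] = (a ++ b).distinct
}
```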

Re: Shouldn't the UNION of SchemaRDDs produce SchemaRDD ?

2014-03-30 Thread Manoj Samel
Hi Aaron,

unionAll is a workaround ...

* unionAll preserves duplicates, whereas union does not
* SQL union and unionAll produce the same output format (another SQL result), whereas here they produce different RDD types.
* Understand the existing union contract issue. This may be a class hierarchy discussion for SchemaRDD,

Re: Shouldn't the UNION of SchemaRDDs produce SchemaRDD ?

2014-03-30 Thread Aaron Davidson
Looks like there is a "unionAll" function on SchemaRDD which will do what you want. The contract of RDD#union is unfortunately too general to allow it to return a SchemaRDD without downcasting.

On Sun, Mar 30, 2014 at 7:56 AM, Manoj Samel wrote:
> Hi,
>
> I am trying SparkSQL based on the exampl
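The point about the contract of RDD#union can be illustrated without Spark. In the sketch below, `SimpleRDD` and `SimpleSchemaRDD` are hypothetical stand-ins, not the real Spark classes. The inherited `union` only promises the base type, so a separate method is what lets the subclass promise a schema-carrying result.

```scala
// Hypothetical stand-ins for Spark's classes; not the real Spark API.
class SimpleRDD[T](val items: Seq[T]) {
  // The base contract: the result is only promised to be a SimpleRDD[T].
  def union(other: SimpleRDD[T]): SimpleRDD[T] =
    new SimpleRDD(items ++ other.items)
}

class SimpleSchemaRDD(val schema: Seq[String], rows: Seq[Seq[Any]])
    extends SimpleRDD[Seq[Any]](rows) {
  // A separate method can promise the narrower return type and keep the schema.
  def unionAll(other: SimpleSchemaRDD): SimpleSchemaRDD = {
    require(schema == other.schema, "schemas must match in this toy version")
    new SimpleSchemaRDD(schema, items ++ other.items)
  }
}

object UnionDemo {
  val a = new SimpleSchemaRDD(Seq("name", "age"), Seq(Seq[Any]("Ann", 30)))
  val b = new SimpleSchemaRDD(Seq("name", "age"), Seq(Seq[Any]("Bob", 12)))

  // Static type is the base class, so the schema is no longer reachable.
  val plain: SimpleRDD[Seq[Any]] = a.union(b)

  // Static type is the subclass, so the schema is still available.
  val typed: SimpleSchemaRDD = a.unionAll(b)
}
```

Getting a `SimpleSchemaRDD` back from `plain` would need a downcast. `typed.schema` compiles directly.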

Shouldn't the UNION of SchemaRDDs produce SchemaRDD ?

2014-03-30 Thread Manoj Samel
Hi,

I am trying SparkSQL based on the example on doc ...

val people = sc.textFile("/data/spark/examples/src/main/resources/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt))

val olderThanTeans = people.where('age > 19)
val youngerThanTeans = people.where('age < 13)
val