I agree with Yong Zhang. Perhaps Spark SQL with Hive could solve the problem:
http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables

On Thu, Feb 16, 2017 at 12:42 AM, Yong Zhang <java8...@hotmail.com> wrote:

> If it works under Hive, did you try just creating the DF from the Hive table
> directly in Spark? That should work, right?
>
> Yong
>
> ------------------------------
> *From:* Begar, Veena <veena.be...@hpe.com>
> *Sent:* Wednesday, February 15, 2017 10:16 AM
> *To:* Yong Zhang; smartzjp; user@spark.apache.org
> *Subject:* RE: How to specify default value for StructField?
>
> Thanks Yong.
>
> I know about the merge-schema option.
>
> Using Hive, we can read AVRO files that have different schemas, and we
> can do the same in Spark.
>
> Similarly, we can read ORC files that have different schemas in Hive, but we
> can't do the same in Spark using a dataframe. How can we do it using a
> dataframe?
>
> Thanks.
>
> *From:* Yong Zhang [mailto:java8...@hotmail.com]
> *Sent:* Tuesday, February 14, 2017 8:31 PM
> *To:* Begar, Veena <veena.be...@hpe.com>; smartzjp <zjp_j...@163.com>;
> user@spark.apache.org
> *Subject:* Re: How to specify default value for StructField?
>
> You may be looking for something like "spark.sql.parquet.mergeSchema"
> for ORC. Unfortunately, I don't think it is available, unless someone tells
> me I am wrong.
>
> You can create a JIRA to request this feature, but we all know that
> Parquet is the first-class citizen format 😊
>
> Yong
>
> ------------------------------
> *From:* Begar, Veena <veena.be...@hpe.com>
> *Sent:* Tuesday, February 14, 2017 10:37 AM
> *To:* smartzjp; user@spark.apache.org
> *Subject:* RE: How to specify default value for StructField?
>
> Thanks, it didn't work, because the folder has files from 2 different
> schemas.
> It fails with the following exception:
> org.apache.spark.sql.AnalysisException: cannot resolve '`f2`' given input columns: [f1];
>
> -----Original Message-----
> From: smartzjp [mailto:zjp_j...@163.com]
> Sent: Tuesday, February 14, 2017 10:32 AM
> To: Begar, Veena <veena.be...@hpe.com>; user@spark.apache.org
> Subject: Re: How to specify default value for StructField?
>
> You can try the code below.
>
> val df = spark.read.format("orc").load("/user/hos/orc_files_test_together")
> df.select("f1", "f2").show
>
> On 2017/2/14 at 6:54 AM, "vbegar" <user-return-67879-zjp_jdev=163....@spark.apache.org
> on behalf of veena.be...@hpe.com> wrote:
>
> >Hello,
> >
> >I specified a StructType like this:
> >
> >val mySchema = StructType(Array(StructField("f1", StringType, true),
> >  StructField("f2", StringType, true)))
> >
> >I have many ORC files stored in this HDFS location:
> >/user/hos/orc_files_test_together
> >
> >These files use different schemas: some of them have only the f1 column
> >and others have both f1 and f2 columns.
> >
> >I read the data from these files into a dataframe:
> >val df = spark.read.format("orc").schema(mySchema).load("/user/hos/orc_files_test_together")
> >
> >But now, when I give the following command to see the data, it fails:
> >df.show
> >
> >The error message says that the "f2" column doesn't exist.
> >
> >Since I have specified the nullable attribute as true for the f2 column,
> >why does it fail?
> >
> >Or, is there any way to specify a default value for StructField?
> >
> >I ask because, in an AVRO schema, we can specify the default value in this
> >way and can read AVRO files in a folder which have 2 different schemas
> >(either only the f1 column or both f1 and f2 columns):
> >
> >{
> >  "type": "record",
> >  "name": "myrecord",
> >  "fields":
> >  [
> >    {
> >      "name": "f1",
> >      "type": "string",
> >      "default": ""
> >    },
> >    {
> >      "name": "f2",
> >      "type": "string",
> >      "default": ""
> >    }
> >  ]
> >}
> >
> >Wondering why it doesn't work with ORC files.
> >
> >Thanks.
> >
> >--
> >View this message in context:
> >http://apache-spark-user-list.1001560.n3.nabble.com/How-to-specify-default-value-for-StructField-tp28386.html
> >Sent from the Apache Spark User List mailing list archive at Nabble.com.
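
To make the Hive-table suggestion at the top of this thread concrete, here is a minimal, untested sketch. The table name `orc_files_test` and the object name are hypothetical; the two string columns and the HDFS location are taken from the original question. It assumes Spark 2.x built with Hive support and a reachable Hive metastore. Whether files that lack `f2` really come back with nulls may depend on whether Spark reads the table through the Hive SerDe or through its own ORC reader (the `spark.sql.hive.convertMetastoreOrc` setting), so please check it against your data.

```scala
import org.apache.spark.sql.SparkSession

object OrcViaHiveTable {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark see and create tables in the Hive metastore.
    val spark = SparkSession.builder()
      .appName("OrcViaHiveTable")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical table name; columns and location come from the thread.
    // The table declares both f1 and f2 over the mixed-schema ORC folder.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS orc_files_test (f1 STRING, f2 STRING)
        |STORED AS ORC
        |LOCATION '/user/hos/orc_files_test_together'""".stripMargin)

    // Build the DataFrame from the Hive table instead of from the raw files.
    val df = spark.table("orc_files_test")
    df.select("f1", "f2").show()

    spark.stop()
  }
}
```

If the table already exists in Hive, the `CREATE EXTERNAL TABLE` step can be skipped and only `spark.table(...)` is needed.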