I agree with Yong Zhang;
perhaps Spark SQL with Hive could solve the problem:

http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
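
A minimal sketch of that idea, assuming a Hive table (called "orc_files_test_together" here, a hypothetical name) has already been defined over the ORC folder and that Spark was built with Hive support. Setting "spark.sql.hive.convertMetastoreOrc" to false is also an assumption on my part: it should make Spark read the table through the Hive SerDe, which is the path that already works in Hive, rather than through Spark's own ORC reader.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadOrcViaHive {
  // Load a table registered in the catalog (Hive metastore) as a DataFrame.
  def readTable(spark: SparkSession, tableName: String): DataFrame =
    spark.table(tableName)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ReadOrcViaHive")
      // Assumption: keep the Hive SerDe in charge of reading the ORC files.
      .config("spark.sql.hive.convertMetastoreOrc", "false")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical table name; replace with the table defined in Hive.
    val df = readTable(spark, "orc_files_test_together")
    df.select("f1", "f2").show()

    spark.stop()
  }
}
```

I have not run this against the files from the thread, so treat it as a starting point rather than a confirmed fix.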

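If going through Hive is not an option, another possibility is to read the files one schema group at a time, add the missing column as nulls, and union the results. This is only a sketch and was not suggested elsewhere in the thread: the helper names "conform" and "readAll" are hypothetical, and the caller has to supply the paths, grouped so that each path contains files with a single schema.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object PadMissingColumns {
  // Same schema as in the original question.
  val mySchema: StructType = StructType(Array(
    StructField("f1", StringType, true),
    StructField("f2", StringType, true)))

  // Project df onto the target schema; columns missing from df become typed nulls.
  def conform(df: DataFrame, target: StructType): DataFrame = {
    val present = df.columns.toSet
    val cols = target.fields.map { f =>
      if (present.contains(f.name)) col(f.name).cast(f.dataType).as(f.name)
      else lit(null).cast(f.dataType).as(f.name)
    }
    df.select(cols: _*)
  }

  // Read each path on its own (so each read sees one schema), pad, then union.
  // `paths` must be non-empty and is supplied by the caller (hypothetical input).
  def readAll(spark: SparkSession, paths: Seq[String], target: StructType): DataFrame =
    paths.map(p => conform(spark.read.format("orc").load(p), target)).reduce(_ union _)
}
```

Reading path by path can be slow for a large number of files, so this probably only makes sense when the files can be grouped into a few directories or path lists per schema.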



On Thu, Feb 16, 2017 at 12:42 AM, Yong Zhang <java8...@hotmail.com> wrote:

> If it works under Hive, have you tried creating the DF from the Hive table
> directly in Spark? That should work, right?
>
>
> Yong
>
>
> ------------------------------
> *From:* Begar, Veena <veena.be...@hpe.com>
> *Sent:* Wednesday, February 15, 2017 10:16 AM
> *To:* Yong Zhang; smartzjp; user@spark.apache.org
>
> *Subject:* RE: How to specify default value for StructField?
>
>
> Thanks Yong.
>
>
>
> I know about merging the schema option.
>
> Using Hive, we can read AVRO files that have different schemas, and we can
> do the same in Spark.
>
> Similarly, we can read ORC files that have different schemas in Hive. But we
> can't do the same in Spark using a DataFrame. How can we do it using a
> DataFrame?
>
>
>
> Thanks.
>
> *From:* Yong Zhang [mailto:java8...@hotmail.com]
> *Sent:* Tuesday, February 14, 2017 8:31 PM
> *To:* Begar, Veena <veena.be...@hpe.com>; smartzjp <zjp_j...@163.com>;
> user@spark.apache.org
> *Subject:* Re: How to specify default value for StructField?
>
>
>
> You may be looking for something like "spark.sql.parquet.mergeSchema"
> for ORC. Unfortunately, I don't think it is available, unless someone tells
> me I am wrong.
>
>
> You can create a JIRA to request this feature, but we all know that
> Parquet is the first-class citizen format 😊
>
>
>
> Yong
>
>
> ------------------------------
>
> *From:* Begar, Veena <veena.be...@hpe.com>
> *Sent:* Tuesday, February 14, 2017 10:37 AM
> *To:* smartzjp; user@spark.apache.org
> *Subject:* RE: How to specify default value for StructField?
>
>
>
> Thanks, it didn't work, because the folder has files with 2 different
> schemas.
> It fails with the following exception:
> org.apache.spark.sql.AnalysisException: cannot resolve '`f2`' given input
> columns: [f1];
>
>
> -----Original Message-----
> From: smartzjp [mailto:zjp_j...@163.com]
> Sent: Tuesday, February 14, 2017 10:32 AM
> To: Begar, Veena <veena.be...@hpe.com>; user@spark.apache.org
> Subject: Re: How to specify default value for StructField?
>
> You can try the below code.
>
> val df = spark.read.format("orc").load("/user/hos/orc_files_test_together")
> df.select("f1", "f2").show
>
>
>
>
>
> On 2017/2/14 at 6:54 AM, "vbegar" <user-return-67879-zjp_jdev=163....@spark.apache.org
> on behalf of veena.be...@hpe.com> wrote:
>
> >Hello,
> >
> >I specified a StructType like this:
> >
> >val mySchema = StructType(Array(StructField("f1", StringType, true),
> >  StructField("f2", StringType, true)))
> >
> >I have many ORC files stored in this HDFS location:
> >/user/hos/orc_files_test_together
> >
> >These files use different schemas: some of them have only the f1 column
> >and others have both the f1 and f2 columns.
> >
> >I read the data from these files to a dataframe:
> >val df = spark.read.format("orc").schema(mySchema).load("/user/hos/orc_files_test_together")
> >
> >But now, when I run the following command to see the data, it fails:
> >df.show
> >
> >The error message says that the "f2" column doesn't exist.
> >
> >Since I have specified the nullable attribute as true for the f2 column,
> >why does it fail?
> >
> >Or, is there any way to specify a default value for a StructField?
> >
> >In an AVRO schema, we can specify the default value in this way
> >and can then read AVRO files in a folder that have 2 different schemas
> >(either only the f1 column or both the f1 and f2 columns):
> >
> >{
> >   "type": "record",
> >   "name": "myrecord",
> >   "fields":
> >   [
> >      {
> >         "name": "f1",
> >         "type": "string",
> >         "default": ""
> >      },
> >      {
> >         "name": "f2",
> >         "type": "string",
> >         "default": ""
> >      }
> >   ]
> >}
> >
> >Wondering why it doesn't work with ORC files.
> >
> >thanks.
> >
> >
> >
>
>
