Thanks for the clarification, Xinh.
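
For the archives, here is a minimal sketch of what Xinh describes, against the
1.6-era DataFrame API in spark-shell (the path and column names below are made
up for illustration). The schema comes from the DataFrame itself, so there is
nothing Parquet-specific for the user to declare:

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)  // `sc` is the SparkContext provided by spark-shell
    import sqlContext.implicits._

    // A low-cardinality column such as "country" is a natural candidate for
    // dictionary encoding; the Parquet writer decides that per column chunk.
    val df = Seq(
      (1, "US"), (2, "US"), (3, "IN"), (4, "US"), (5, "IN")
    ).toDF("id", "country")

    df.printSchema()                         // id: integer, country: string
    df.write.parquet("/tmp/events.parquet")  // schema is translated to Parquet types here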
On Fri, Mar 4, 2016 at 12:30 PM, Xinh Huynh <xinh.hu...@gmail.com> wrote:
> Hi Ashok,
>
> On the Spark SQL side, when you create a dataframe, it will have a schema
> (each column has a type such as Int or String). Then when you save that
> dataframe as parquet format, Spark translates the dataframe schema into
> Parquet data types. (See spark.sql.execution.datasources.parquet.) Then
> Parquet does the dictionary encoding automatically (if applicable) based on
> the data values; this encoding is not specified by the user. Parquet
> figures out the right encoding to use for you.
>
> Xinh
>
> > On Mar 3, 2016, at 7:32 PM, ashokkumar rajendran
> > <ashokkumar.rajend...@gmail.com> wrote:
> >
> > Hi,
> >
> > I am exploring to use Apache Parquet with Spark SQL in our project. I
> > notice that Apache Parquet uses different encoding for different columns.
> > The dictionary encoding in Parquet will be one of the good ones for our
> > performance. I do not see much documentation in Spark or Parquet on how to
> > configure this. For example, how would Parquet know dictionary of words if
> > there is no schema provided by user? Where/how to specify my schema /
> > config for Parquet format?
> >
> > Could not find Apache Parquet mailing list in the official site. It
> > would be great if anyone could share it as well.
> >
> > Regards
> > Ashok
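
A follow-up for anyone who wants to verify what the writer actually chose:
the parquet-tools utility from the parquet-mr project can dump a file's
footer, e.g. "parquet-tools meta /tmp/events.parquet" (the path being the
illustrative one from the sketch above), and the per-column metadata lists
the encodings used, such as PLAIN_DICTIONARY. And if dictionary encoding
ever needs to be disabled, my understanding is that it is a writer-side
Hadoop setting rather than anything in the schema, e.g.:

    // disable Parquet dictionary encoding for subsequent writes (writer-side setting)
    sc.hadoopConfiguration.set("parquet.enable.dictionary", "false")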