Thanks for the clarification, Xinh.
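
For the archives, here is a minimal sketch of what Xinh describes, against the
1.6-era DataFrame API in spark-shell (the path and column names below are made
up for illustration). The schema comes from the DataFrame itself, so there is
nothing Parquet-specific for the user to declare:

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)  // `sc` is the SparkContext provided by spark-shell
    import sqlContext.implicits._

    // A low-cardinality column such as "country" is a natural candidate for
    // dictionary encoding; the Parquet writer decides that per column chunk.
    val df = Seq(
      (1, "US"), (2, "US"), (3, "IN"), (4, "US"), (5, "IN")
    ).toDF("id", "country")

    df.printSchema()                         // id: integer, country: string
    df.write.parquet("/tmp/events.parquet")  // schema is translated to Parquet types here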
On Fri, Mar 4, 2016 at 12:30 PM, Xinh Huynh <xinh.hu...@gmail.com> wrote:
> Hi Ashok,
>
> On the Spark SQL side, when you create a dataframe, it will have a schema
> (each column has a type such as Int or String). Then when you save that
> dataframe as parquet format, Spark translates the dataframe schema into
> Parquet data types. (See spark.sql.execution.datasources.parquet.) Then
> Parquet does the dictionary encoding automatically (if applicable) based on
> the data values; this encoding is not specified by the user. Parquet
> figures out the right encoding to use for you.
>
> Xinh
>
> > On Mar 3, 2016, at 7:32 PM, ashokkumar rajendran
> > <ashokkumar.rajend...@gmail.com> wrote:
> >
> > Hi,
> >
> > I am exploring to use Apache Parquet with Spark SQL in our project. I
> > notice that Apache Parquet uses different encoding for different columns.
> > The dictionary encoding in Parquet will be one of the good ones for our
> > performance. I do not see much documentation in Spark or Parquet on how to
> > configure this. For example, how would Parquet know dictionary of words if
> > there is no schema provided by user? Where/how to specify my schema /
> > config for Parquet format?
> >
> > Could not find Apache Parquet mailing list in the official site. It
> > would be great if anyone could share it as well.
> >
> > Regards
> > Ashok
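
A follow-up for anyone who wants to verify what the writer actually chose:
the parquet-tools utility from the parquet-mr project can dump a file's
footer, e.g. "parquet-tools meta /tmp/events.parquet" (the path being the
illustrative one from the sketch above), and the per-column metadata lists
the encodings used, such as PLAIN_DICTIONARY. And if dictionary encoding
ever needs to be disabled, my understanding is that it is a writer-side
Hadoop setting rather than anything in the schema, e.g.:

    // disable Parquet dictionary encoding for subsequent writes (writer-side setting)
    sc.hadoopConfiguration.set("parquet.enable.dictionary", "false")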