What query did you run? Parquet should have predicate and column pushdown,
i.e. if your query only needs to read 3 columns, then only 3 will be read.
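For example, something like the following (a minimal sketch against the Spark 1.2-era API; the column names "name", "age" and "city" are made up purely for illustration) should end up reading only the three referenced columns from the Parquet file:

// Load the Parquet file and expose it to SQL (Spark 1.2-era API).
val parquetFile = sqlContext.parquetFile("people.parquet")
parquetFile.registerTempTable("people")

// Because the query references only three columns, Parquet column
// pruning means only those three columns are read from disk.
val subset = sqlContext.sql("SELECT name, age, city FROM people")
subset.collect().foreach(println)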
On Mon, Jan 12, 2015 at 10:20 PM, Ajay Srivastava <
a_k_srivast...@yahoo.com.invalid> wrote:
> Hi,
> I am trying to read a parquet file using -
> val parquetFile = sqlContext.parquetFile("people.parquet")
Setting spark.sql.hive.convertMetastoreParquet to true has fixed this.
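For reference, this is roughly how the setting can be applied (a minimal sketch assuming a HiveContext, since the option affects Parquet tables read through the Hive metastore):

// Enable conversion of Hive metastore Parquet tables to Spark SQL's
// native Parquet support, which does column pruning and pushdown.
sqlContext.setConf("spark.sql.hive.convertMetastoreParquet", "true")

// Equivalently, via a SQL statement:
sqlContext.sql("SET spark.sql.hive.convertMetastoreParquet=true")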
Regards,
Ajay
On Tuesday, January 13, 2015 11:50 AM, Ajay Srivastava wrote:
Hi,
I am trying to read a parquet file using -
val parquetFile = sqlContext.parquetFile("people.parquet")
There is no way to specify that I am interested in reading only some columns
from disk. For example, if the parquet file has 10 columns, I want to read
only 3 of those columns from disk.
We have don