Parquet file + increase read parallelism

SamyaMaiti Mon, 23 Mar 2015 10:12:39 -0700

Hi All,

Suppose I have a Parquet file of 100 MB in HDFS and my HDFS block size is 64 MB, so
I have 2 blocks of data.
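
To make the block count explicit, here is a minimal sketch of the arithmetic. The variable names are purely illustrative and not part of any Spark or HDFS API; the sizes are the ones from the example above.

```python
import math

file_size_mb = 100   # size of the Parquet file in the example
block_size_mb = 64   # HDFS block size in the example

# A file occupies ceil(file_size / block_size) HDFS blocks.
num_blocks = math.ceil(file_size_mb / block_size_mb)

# Size of the final, partially filled block.
last_block_mb = file_size_mb - (num_blocks - 1) * block_size_mb

print(num_blocks, last_block_mb)
```

So the file is stored as one full 64 MB block plus one 36 MB block.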


When I call *sqlContext.parquetFile("path")* followed by an action, two
tasks are started on two partitions.

My intent is to read these 2 blocks into more partitions to fully utilize my
cluster resources and increase parallelism.

Is there a way to do so, as in the case of
sc.textFile("path", *numberOfPartitions*)?

Please note, I don't want to use *repartition*, as that would result in a lot of
shuffling.

Thanks in advance.

Regards,
Sam



