Re: hdfs read performance issue

2014-08-20 Thread Gurvinder Singh
I got some time to look into it. It appears that Spark (latest git) is doing this operation much more often compared to the Aug 1 version. Here is the log from the operation I am referring to: 14/08/19 12:37:26 INFO spark.CacheManager: Partition rdd_8_414 not found, computing it 14/08/19 12:37:26 INFO r
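
One way to compare how often this happens between two builds is to count the CacheManager messages in each driver or executor log. The following is only a minimal sketch: the helper name count_cache_misses and the regular expression are assumptions based on the single log line quoted above, not part of Spark, and other Spark versions may word the message differently.

```python
import re
from collections import Counter

# Pattern built from the log line quoted in the message, e.g.
# "INFO spark.CacheManager: Partition rdd_8_414 not found, computing it".
# Assumption: the message format is the same in both builds being compared.
MISS = re.compile(r"CacheManager: Partition rdd_(\d+)_(\d+) not found, computing it")


def count_cache_misses(log_text):
    """Return a Counter mapping RDD id to the number of 'not found' messages."""
    misses = Counter()
    for match in MISS.finditer(log_text):
        misses[int(match.group(1))] += 1
    return misses


# The one complete log line available from the report.
sample = (
    "14/08/19 12:37:26 INFO spark.CacheManager: "
    "Partition rdd_8_414 not found, computing it\n"
)
print(count_cache_misses(sample))
```

Running the same count over the logs of the Aug 1 build and the newer build would show whether partitions are being recomputed more often per RDD, which is the behaviour described above.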

read performance issue

2014-08-14 Thread Gurvinder Singh
Hi, I am running Spark directly from git. I recently compiled a newer version (Aug 13), and it shows a 2-3x performance drop when reading from HDFS compared to the git version of Aug 1. So I am wondering which commit could have caused such a drop in read performance. The performance is almost sam