Re: Spark SQL step with many tasks takes a long time to begin processing

2016-02-17 Thread Teng Qiu
> *From:* Teng Qiu [mailto:teng...@gmail.com]
> *Sent:* Tuesday, February 16, 2016 12:11 PM
> *To:* Dukek, Dillon
> *Cc:* user@spark.apache.org
> *Subject:* Re: Spark SQL step with many tasks takes a long time to begin processing
>
> i believe this is a known issue for using sp

RE: Spark SQL step with many tasks takes a long time to begin processing

2016-02-16 Thread Dukek, Dillon
360-316-9309
Email: dillon.du...@t-mobile.com

From: Teng Qiu [mailto:teng...@gmail.com]
Sent: Tuesday, February 16, 2016 12:11 PM
To: Dukek, Dillon
Cc: user@spark.apache.org
Subject: Re: Spark SQL step with many tasks takes a long time to begin processing

i believe this is a known issue for u

Re: Spark SQL step with many tasks takes a long time to begin processing

2016-02-16 Thread Teng Qiu
I believe this is a known issue when using Spark/Hive with files on S3. The huge delay on the driver side is caused by partition listing and split computation, and it is more of a Hive issue: since you are using the Thrift server, the SQL queries run in HiveContext. Qubole made some optimizat
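
To illustrate why partition listing can delay the driver before any task starts, here is a rough toy model in Python. It is not Spark, Hive, or S3 code: `FakeObjectStore`, `plan_splits`, `estimated_delay_s`, the partition names, and the 0.05 s latency are all made up for illustration. It only shows that when each partition needs its own listing call, the up-front cost grows with the number of partitions.

```python
class FakeObjectStore:
    """Hypothetical in-memory stand-in for an object store; counts list calls."""

    def __init__(self, layout):
        self.layout = layout      # maps a partition prefix to its file paths
        self.list_calls = 0

    def list_prefix(self, prefix):
        # Each call stands for one remote listing request.
        self.list_calls += 1
        return list(self.layout.get(prefix, []))


def plan_splits(store, partitions):
    """List every partition up front, before any task could be scheduled."""
    splits = []
    for partition in partitions:
        splits.extend(store.list_prefix(partition))
    return splits


def estimated_delay_s(list_calls, latency_s_per_call):
    """Rough driver-side delay if the listing calls run one after another."""
    return list_calls * latency_s_per_call


# Assumed layout: 16 daily partitions with 3 files each (illustrative only).
layout = {
    f"dt=2016-02-{day:02d}": [
        f"dt=2016-02-{day:02d}/part-{i:05d}" for i in range(3)
    ]
    for day in range(1, 17)
}

store = FakeObjectStore(layout)
splits = plan_splits(store, sorted(layout))
delay = estimated_delay_s(store.list_calls, latency_s_per_call=0.05)

print(f"{len(splits)} splits planned after {store.list_calls} list calls")
print(f"estimated listing delay: {delay:.2f}s")
```

With thousands of partitions instead of 16, the same loop issues thousands of listing calls before the first task is launched, which would match a long pause at the start of the step.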