Hi Gopal,
Thanks for the informative answer, but my question was about the difference in
processing between Spark SQL and Hive. Right now I am not trying to optimize
either. I totally agree that Hive can perform much better than the number I
got.
I was just wondering, even though both systems would
On 1/22/15, 4:36 PM, chandra Reddy Bogala wrote:
Hi Gopal,
My question is related to GZIP files. I am sure a single GZIP file is an
anti-pattern. Are small gzip files (20 to 50 MB) also an anti-pattern? The
reason I am asking this question is that my application collectors generate
gzip files of that size, so I copy those to HDFS and add them as a partition
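As an aside on why gzip files matter here: a gzip stream carries no internal block index, so it can only be decompressed from the very beginning, and Hadoop therefore cannot split a .gz file and must hand each whole file to a single map task. A minimal sketch using Python's standard gzip module (an illustration of the format's behavior only, not Hadoop code):

```python
# Sketch: gzip data decompresses fine from the start of the stream,
# but a reader dropped into the middle cannot resynchronize --
# there is no valid split point inside a .gz file.
import gzip
import io

data = b"some line of log text\n" * 1000

# Compress the data into an in-memory gzip stream.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    f.write(data)
compressed = buf.getvalue()

# Decompressing from the beginning round-trips correctly.
assert gzip.decompress(compressed) == data

# Starting from the middle of the stream fails: the gzip header
# (and the DEFLATE state it implies) is only present at the front.
try:
    gzip.decompress(compressed[len(compressed) // 2:])
    mid_readable = True
except (OSError, EOFError):
    mid_readable = False
assert not mid_readable
```

This is why one huge .gz file serializes the entire read into one task, while many 20-50 MB files at least yield one task per file, so parallelism scales with the file count rather than with split count.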
On 1/22/15, 3:03 AM, Saumitra Shahapure (Vizury) wrote:
Is there an
I'm not answering your question, but could you give me more insight into where
and how you use Spark? I know that Spark has in-memory capabilities.
Also, I have a similar question on ways to optimize Hive queries and file
storage: which is better, ORC or Parquet, and when should compression be used?
Hello,
We were comparing the performance of some of our production Hive queries
between Hive and Spark. We compared Hive (0.13) + Hadoop (1.2.1) against both
Spark 0.9 and 1.1. We could see that the performance gains in Spark have been
good.
We tried a very simple query:
select count(*) from T where col