For the second question, I would say it is mainly because the projects do not have the same aim. Impala does have a "cost-based optimizer and predicate propagation capability", which is natural because it interprets pseudo-SQL queries. In the realm of relational databases, it is often not a good idea to compete against the optimizer; that is of course also true for 'BigData'.

Bertrand

On Sun, Jun 22, 2014 at 1:32 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:

> Hi folks,
> I was looking at the benchmark provided by Cloudera at
> http://blog.cloudera.com/blog/2014/05/new-sql-choices-in-the-apache-hadoop-ecosystem-why-impala-continues-to-lead/
> .
> Is it real that Shark cannot execute some query if you don't have enough
> memory?
> And is it true/reliable that Impala overcome so much Spark when executing
> complex queries?
>
> Best,
> Flavio