Take this simple join:

  SELECT m.title AS title, solr.aggCount AS aggCount
  FROM movies m
  INNER JOIN (
      SELECT movie_id, COUNT(*) AS aggCount
      FROM ratings
      WHERE rating >= 4
      GROUP BY movie_id
      ORDER BY aggCount DESC
      LIMIT 10
  ) AS solr ON solr.movie_id = m.movie_id
  ORDER BY aggCount DESC
I would like the ability to push the inner sub-query aliased as "solr" down into the data source engine (in this case Solr), since that would greatly reduce the amount of data that has to be transferred from Solr into Spark. I imagine this issue comes up frequently when the underlying engine is a JDBC data source as well.

Is this possible? Of course, my example is a bit cherry-picked, so determining whether a sub-query can be pushed down into the data source engine is probably not a trivial task, but I'm wondering if Spark has the hooks to let me try ;-)

Cheers,
Tim
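
P.S. For the JDBC case at least, the one workaround I'm aware of is to hand the sub-query to the data source yourself via the "dbtable" option, so the GROUP BY / LIMIT run remotely and only the top rows come back to Spark. A rough sketch of what I mean is below (the connection URL and the registered "movies" table are just placeholders, not my actual setup):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("pushdown-sketch").getOrCreate()
  import spark.implicits._

  // Pass the aggregate sub-query to the database verbatim via "dbtable";
  // Spark wraps it in a SELECT, so the GROUP BY / LIMIT execute remotely
  // and only the ten aggregated rows are transferred into Spark.
  val topRated = spark.read
    .format("jdbc")
    .option("url", "jdbc:postgresql://dbhost:5432/films")  // placeholder URL
    .option("dbtable",
      """(SELECT movie_id, COUNT(*) AS aggCount
        |    FROM ratings
        |   WHERE rating >= 4
        |   GROUP BY movie_id
        |   ORDER BY aggCount DESC
        |   LIMIT 10) AS solr""".stripMargin)
    .load()

  // Assumes a "movies" table is already registered in the session catalog.
  val movies = spark.table("movies")

  val result = movies
    .join(topRated, Seq("movie_id"))
    .select($"title", $"aggCount")
    .orderBy($"aggCount".desc)

  result.show()

That only works because the JDBC source happens to accept an arbitrary sub-query as its "table", and it's a manual pushdown rather than something the planner figures out, which is why I'm still hoping there's a hook that would let the Solr source do the same thing automatically.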