Hi Adam Are you familiar with MBR (Minimum Bounding Rectangle) and tessellations technics? You might be able to significantly reduce the number of potentially intersected polygons.
Dudu From: Adam [mailto:work....@gmail.com] Sent: Sunday, September 04, 2016 4:07 AM To: user@hive.apache.org Subject: RE: Beeline throws OOM on large input query Reply to Stephen Sprague 1) confirm your beeline java process is indeed running with expanded memory I used the -XX:+PrintCommandLineFlags which showed: -XX:MaxHeapSize=17179869184 confirming the 16g setting. 2) try the hive-cli (or the python one even.) or "beeline -u jdbc:hive2://" I was using the beeline jdbc connect: issuing: !connect jdbc:hive2: ........ 3) chop down your 6K points to 3K or something smaller to see just where the breaking point is I didn't bother though it would be good information since I found a work around and troubleshooting beeline wasn't my primary goal :) Reply to Markovitz, Dudu The query is basically finding geometry intersections. If you are familiar with Postgis, it is a Java version of the Postgis function ST_Intersects (http://postgis.net/docs/ST_Intersects.html) wrapped in a Hive UDF. We are checking intersection of a table's geometry column with a set of N geometries (6000+ in this case). select from table where st_intersects(table.geom, g1) OR st_intersects(table.geom, g2), etc. Unfortunately doing it with a table join requires a theta condition which Hive doesn't support, something like select from table inner join reftable on st_intersects(table.geom, reftable.geom) I tried pushing down the predicate but that required a cross join which was not feasible for the huge table sizes.