Hi Adam

Are you familiar with MBR (Minimum Bounding Rectangle) and tessellations 
technics?
You might be able to significantly reduce the number of potentially intersected 
polygons.

Dudu

From: Adam [mailto:work....@gmail.com]
Sent: Sunday, September 04, 2016 4:07 AM
To: user@hive.apache.org
Subject: RE: Beeline throws OOM on large input query

Reply to Stephen Sprague
1) confirm your beeline java process is indeed running with expanded
memory
I used the -XX:+PrintCommandLineFlags which showed:
    -XX:MaxHeapSize=17179869184
confirming the 16g setting.

2)
try the hive-cli (or the python one even.)  or "beeline -u
jdbc:hive2://"
I was using the beeline jdbc connect:
  issuing: !connect jdbc:hive2: ........

3) chop down your 6K points to 3K or something smaller to see just where
the breaking point is
I didn't bother though it would be good information since I found a work around 
and troubleshooting beeline wasn't my primary goal :)

Reply to Markovitz, Dudu
The query is basically finding geometry intersections.
If you are familiar with Postgis, it is a Java version of the Postgis function 
ST_Intersects (http://postgis.net/docs/ST_Intersects.html) wrapped in a Hive 
UDF.

We are checking intersection of a table's geometry column with a set of N 
geometries (6000+ in this case).

select from table
where st_intersects(table.geom, g1) OR st_intersects(table.geom, g2), etc.

Unfortunately doing it with a table join requires a theta condition which Hive 
doesn't support, something like

select from table inner join reftable on st_intersects(table.geom, 
reftable.geom)

I tried pushing down the predicate but that required a cross join which was not 
feasible for the huge table sizes.




Reply via email to