[ 
https://issues.apache.org/jira/browse/IMPALA-13851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-13851:
-------------------------------------
    Description: 
A common (and currently the most efficient) way to store point data is as 
double x/double y pairs instead of a single geometry (BINARY) column.

For predicates where these points must intersect a complex geometry, it can be a 
useful optimization to prefilter the x/y columns directly with the bounding 
rectangle of the complex geometry, in addition to running the st_ predicate. An 
example:

{code}
WHERE st_contains(<const_geometry>, st_point(x,y)) 
{code}
can be rewritten as:
{code}
WHERE st_contains(<const_geometry>, st_point(x, y))
  AND x >= st_xmin(<const_geometry>) AND y >= st_ymin(<const_geometry>)
  AND x <= st_xmax(<const_geometry>) AND y <= st_ymax(<const_geometry>)
{code}
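
The rewrite is safe because every point contained in a geometry also lies inside 
that geometry's bounding rectangle, so the added comparisons never remove a 
matching row. A minimal Python sketch of this, assuming a simple polygon and a 
ray-casting test as a stand-in for st_contains (the helper names bounding_box, 
contains, filter_plain and filter_prefiltered are illustrative, not Impala 
functions):

```python
from typing import List, Tuple

Point = Tuple[float, float]


def bounding_box(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Return (xmin, ymin, xmax, ymax); stands in for st_xmin/st_ymin/st_xmax/st_ymax."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def contains(polygon: List[Point], x: float, y: float) -> bool:
    """Ray-casting point-in-polygon test; a simplified stand-in for st_contains."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_plain(polygon: List[Point], points: List[Point]) -> List[Point]:
    """Original predicate: run the expensive test on every row."""
    return [p for p in points if contains(polygon, *p)]


def filter_prefiltered(polygon: List[Point], points: List[Point]):
    """Rewritten predicate: cheap bounding box check first, then the expensive test."""
    xmin, ymin, xmax, ymax = bounding_box(polygon)
    expensive_calls = 0
    result = []
    for x, y in points:
        if not (xmin <= x <= xmax and ymin <= y <= ymax):
            continue  # rejected without calling contains()
        expensive_calls += 1
        if contains(polygon, x, y):
            result.append((x, y))
    return result, expensive_calls


triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
points = [(1.0, 1.0), (3.0, 3.0), (5.0, 1.0), (-1.0, 2.0), (0.5, 2.5), (9.0, 9.0)]

plain = filter_plain(triangle, points)
prefiltered, expensive_calls = filter_prefiltered(triangle, points)
print(plain == prefiltered, expensive_calls, len(points))
```

Both filters return the same rows, but the prefiltered version calls the 
expensive test only for points that fall inside the bounding rectangle.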

This has two benefits:
1. The planner will evaluate the cheaper >=/<= predicates before st_contains(), 
which avoids the expensive per-row st_contains() call for points that fail the 
bounding box check.
2. >=/<= predicates can be pushed down in more cases, for example to Parquet or 
Iceberg min/max statistics filtering. If the files/pages have limited bounding 
boxes, this can save IO.
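
The second benefit can be sketched as follows. This is only an illustration of 
min/max based skipping, not how Impala, Parquet or Iceberg implement it: 
RowGroupStats, may_match and row_groups_to_scan are hypothetical names, and the 
statistics are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroupStats:
    """Hypothetical min/max statistics of the x and y columns for one file/page."""
    name: str
    x_min: float
    x_max: float
    y_min: float
    y_max: float


def may_match(stats: RowGroupStats, bbox: Tuple[float, float, float, float]) -> bool:
    """Return False only if no row in the range can satisfy all four comparisons."""
    xmin, ymin, xmax, ymax = bbox
    # x >= xmin is satisfiable only if the largest x reaches xmin, and so on.
    return (stats.x_max >= xmin and stats.x_min <= xmax
            and stats.y_max >= ymin and stats.y_min <= ymax)


def row_groups_to_scan(all_stats: List[RowGroupStats],
                       bbox: Tuple[float, float, float, float]) -> List[str]:
    """Names of the row groups that still have to be read."""
    return [s.name for s in all_stats if may_match(s, bbox)]


stats = [
    RowGroupStats("rg0", 0.0, 10.0, 0.0, 10.0),   # overlaps the bounding box
    RowGroupStats("rg1", 50.0, 60.0, 0.0, 10.0),  # x range entirely to the right
    RowGroupStats("rg2", 3.0, 5.0, 20.0, 30.0),   # y range entirely above
    RowGroupStats("rg3", 4.0, 4.5, 1.0, 2.0),     # touches the box at x == 4
]
bbox = (0.0, 0.0, 4.0, 4.0)  # (xmin, ymin, xmax, ymax) of the constant geometry

to_scan = row_groups_to_scan(stats, bbox)
print(to_scan)
```

The st_contains() predicate alone gives the scanner nothing to compare against 
such statistics, while the four plain comparisons do.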

An expression rewrite rule can be added that does this automatically. 
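
A rough sketch of what such a rule could do, written against a toy expression 
tree rather than Impala's frontend classes. All types and functions here 
(Column, Const, Call, Compare, bbox_conjuncts, rewrite, to_sql) are 
hypothetical, and restricting the match to plain column arguments is an 
assumption made for the example:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Const:
    text: str  # textual form of a constant expression, e.g. a constant geometry


@dataclass(frozen=True)
class Call:
    fn: str
    args: Tuple[object, ...]


@dataclass(frozen=True)
class Compare:
    op: str
    left: object
    right: object


def bbox_conjuncts(pred: object) -> List[Compare]:
    """Extra conjuncts for st_contains(<const>, st_point(<col>, <col>)), else []."""
    if not (isinstance(pred, Call) and pred.fn == "st_contains" and len(pred.args) == 2):
        return []
    geom, point = pred.args
    if not isinstance(geom, Const):
        return []  # the bounding box must be computable once, not per row
    if not (isinstance(point, Call) and point.fn == "st_point" and len(point.args) == 2):
        return []
    x, y = point.args
    if not (isinstance(x, Column) and isinstance(y, Column)):
        return []
    return [
        Compare(">=", x, Call("st_xmin", (geom,))),
        Compare(">=", y, Call("st_ymin", (geom,))),
        Compare("<=", x, Call("st_xmax", (geom,))),
        Compare("<=", y, Call("st_ymax", (geom,))),
    ]


def rewrite(conjuncts: List[object]) -> List[object]:
    """Append bounding box conjuncts; applying the rule twice adds nothing new."""
    out = list(conjuncts)
    for pred in conjuncts:
        for extra in bbox_conjuncts(pred):
            if extra not in out:
                out.append(extra)
    return out


def to_sql(expr: object) -> str:
    """Render the toy expression tree as SQL text."""
    if isinstance(expr, Column):
        return expr.name
    if isinstance(expr, Const):
        return expr.text
    if isinstance(expr, Call):
        return f"{expr.fn}({', '.join(to_sql(a) for a in expr.args)})"
    if isinstance(expr, Compare):
        return f"{to_sql(expr.left)} {expr.op} {to_sql(expr.right)}"
    raise TypeError(f"unsupported expression: {expr!r}")


original = [Call("st_contains", (Const("<const_geometry>"),
                                 Call("st_point", (Column("x"), Column("y")))))]
rewritten = rewrite(original)
print("WHERE " + "\n  AND ".join(to_sql(e) for e in rewritten))
```

The original st_contains() conjunct is kept, since the bounding box check is 
only a necessary condition, not a sufficient one.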


> Add geospatial expression rewrites for lat/lon coded points
> -----------------------------------------------------------
>
>                 Key: IMPALA-13851
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13851
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Csaba Ringhofer
>            Priority: Major
>              Labels: geospatial



