[ 
https://issues.apache.org/jira/browse/IMPALA-12961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Carlin updated IMPALA-12961:
----------------------------------
        Parent:     (was: IMPALA-12871)
    Issue Type: Bug  (was: Sub-task)

> Use a Map instead of an ArrayList for Expr in HDFS RelNode
> ----------------------------------------------------------
>
>                 Key: IMPALA-12961
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12961
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Steve Carlin
>            Priority: Major
>
> This came up in code review in ImpalaHdfsScanRel:
> "For wide tables where we only need a few columns projected, we will
> end up with a long list of mostly nulls. A LinkedHashMap (which preserves
> insertion order), where the key is the position and the value is the
> SlotRef, would be better suited despite the CPU cost of hashing. In
> general, in a query planner, memory is the most precious commodity since
> the plan search space can be large, so anything we can do to reduce memory
> footprint would be preferred."
> One counterargument: the list is used in other RelNodes, and it seems more
> natural. For instance, the Project RelNode will have a RexInputRef RexNode
> such as "$2", which refers to its input by position, so an array is the
> natural fit. Every other RelNode works this way except for the ScanNode.
> To add to the counterargument: take a worst-case scenario of a query
> that has 10 tables with 500 columns apiece. If we allocate 8-byte
> pointers, we would need 10 * 500 * 8 = 40,000 bytes to hold this
> information. Although reducing the memory footprint is generally the more
> important goal, saving 40,000 bytes really isn't going to make an impact.
> Even if we take into account that multiple queries would be running
> simultaneously, this is a very short-lived code path. So should we go with
> the more natural approach rather than the less memory-intensive approach?
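
To make the trade-off concrete, here is a minimal Java sketch of the two layouts discussed above. It is not Impala code: the class `ScanExprLayout`, its method names, and the use of plain strings as stand-ins for SlotRef objects are all assumptions made for illustration. The byte estimate only reproduces the 10 * 500 * 8 arithmetic for the list's reference slots; it does not attempt to measure the per-entry overhead of a LinkedHashMap.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; strings stand in for SlotRef expressions.
class ScanExprLayout {
  // Dense layout: one list slot per table column, null where the column is
  // not projected. Lookup by position is a plain index, like "$2".
  static List<String> denseLayout(int numColumns, Map<Integer, String> projected) {
    List<String> exprs =
        new ArrayList<>(Collections.nCopies(numColumns, (String) null));
    for (Map.Entry<Integer, String> e : projected.entrySet()) {
      exprs.set(e.getKey(), e.getValue());
    }
    return exprs;
  }

  // Sparse layout: only projected columns are stored, keyed by position.
  // LinkedHashMap keeps the order in which positions were inserted.
  static Map<Integer, String> sparseLayout(Map<Integer, String> projected) {
    return new LinkedHashMap<>(projected);
  }

  // Bytes used by the reference slots of the dense lists alone
  // (assumed reference size is passed in; 8 bytes in the description).
  static long denseReferenceBytes(int numTables, int columnsPerTable,
                                  int bytesPerReference) {
    return (long) numTables * columnsPerTable * bytesPerReference;
  }

  public static void main(String[] args) {
    Map<Integer, String> projected = new LinkedHashMap<>();
    projected.put(2, "slot_for_$2");
    projected.put(417, "slot_for_$417");

    List<String> dense = denseLayout(500, projected);
    Map<Integer, String> sparse = sparseLayout(projected);

    System.out.println("dense slots:    " + dense.size());
    System.out.println("sparse entries: " + sparse.size());
    System.out.println("same expr at $2: " + dense.get(2).equals(sparse.get(2)));
    System.out.println("worst-case dense bytes: " + denseReferenceBytes(10, 500, 8));
  }
}
```

Both layouts answer the same positional lookup; the dense one pays for every unprojected column with a null slot, while the sparse one pays per stored entry plus hashing. Under the worst case in the description, the dense cost comes to 40,000 bytes.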



