[ 
https://issues.apache.org/jira/browse/HIVE-24510?focusedWorklogId=533014&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-533014
 ]

ASF GitHub Bot logged work on HIVE-24510:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Jan/21 13:35
            Start Date: 08/Jan/21 13:35
    Worklog Time Spent: 10m 
      Work Description: abstractdog edited a comment on pull request #1824:
URL: https://github.com/apache/hive/pull/1824#issuecomment-756758376


   > I made a quick fix to allow that in early versions of this patch. Then I 
decided not to pursue it because I did not see the need for allowing a constant 
argument at runtime.
   > 
   > > you can still do something like:
   > > if compute_bit_vector: -> handle constant parameter
   > 
   > We do exactly that, not in the vectorizer but earlier in 
`ColumnStatsSemanticAnalyzer.java`. I am reluctant to implement extra 
functionality or add special cases unless it is necessary. Note that 
compute_bit_vector is a newly added UDF in 4.0, so there is no backward 
compatibility concern either.
   > Do you see any benefit other than preserving the earlier q.out outputs?
   
   No, I'm only concerned about the q.out changes.
   You're right: if compute_bit_vector is relatively new, then we can 
also ignore backward compatibility problems and go on with 
compute_bit_vector_hll.
   I would personally keep pursuing a smaller patch, as having 
"compute_bit_vector_hll" has no benefits either, but it's up to you. I think if 
the default HLL algorithm for stats won't be changed in the near future, we can go 
with the updated q.out files :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 533014)
    Time Spent: 3h  (was: 2h 50m)

> Vectorize compute_bit_vector
> ----------------------------
>
>                 Key: HIVE-24510
>                 URL: https://issues.apache.org/jira/browse/HIVE-24510
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Mustafa İman
>            Assignee: Mustafa İman
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> After https://issues.apache.org/jira/browse/HIVE-23530 , almost all compute 
> stats functions are vectorizable. The only function that is not vectorizable is 
> "compute_bit_vector" for NDV statistics computation. This causes "create 
> table as select" and "insert overwrite select" queries to run in 
> non-vectorized mode. 
> Even a very naive implementation of vectorized compute_bit_vector gives about 
> a 50% performance improvement on simple "insert overwrite select" queries. That 
> is because the entire mapper or reducer can run in vectorized mode.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
