[ https://issues.apache.org/jira/browse/HIVE-22702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010482#comment-17010482 ]

Peter Vary commented on HIVE-22702:
-----------------------------------

[~michaelchirico]: You did not mention the version of Hive you are using, but I 
suspect that it does not contain HIVE-6980 and HIVE-19783, as we did some 
optimizations for drop table in those jiras. Since the fixes are not in any 
current release, you might give them a try by building your own Hive.

Thanks,
Peter

> ALTER TABLE REMOVE PARTITION is inefficient
> -------------------------------------------
>
>                 Key: HIVE-22702
>                 URL: https://issues.apache.org/jira/browse/HIVE-22702
>             Project: Hive
>          Issue Type: Improvement
>          Components: Database/Schema
>            Reporter: Michael Chirico
>            Priority: Major
>
> I recently realized that the poor partitioning of a table of mine was becoming a 
> major bottleneck and set out to reset the partitioning.
> At that point, the table had about 56K partition combinations (year|month|day|city); 
> moving to the more efficient year|month scheme leaves only about 24 partitions.
> In the process, I was having trouble fixing the table's registration because of 
> the size of its partition DB; I happened upon this SO Q&A, which addresses the 
> same issue:
> https://stackoverflow.com/questions/50715939/drop-table-in-hive-via-spark-hangs/50814566#comment105440563_50814566
> I set about batching through ALTER TABLE x DROP PARTITION (...), PARTITION 
> (...) statements, 200 partitions at a time; it took about 2 hours to finish, which 
> strikes me as quite inefficient (an example of such a batched statement is sketched below).
> (apologies that I haven't done a proper analysis of the scaling behavior in 
> this ticket)
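>
> For concreteness, a minimal sketch of assembling one such batched statement (the 
> table name, partition columns, and values here are purely illustrative):
>
> import java.util.List;
> import java.util.stream.Collectors;
>
> // Illustrative sketch only: builds a single batched DROP PARTITION statement
> // for a hypothetical table partitioned by (year, month, day, city).
> public class BatchedPartitionDrop {
>
>     record PartitionSpec(int year, int month, int day, String city) {
>         String toClause() {
>             return String.format("PARTITION (year=%d, month=%d, day=%d, city='%s')",
>                                  year, month, day, city);
>         }
>     }
>
>     static String buildDropStatement(String table, List<PartitionSpec> batch) {
>         String clauses = batch.stream()
>                               .map(PartitionSpec::toClause)
>                               .collect(Collectors.joining(", "));
>         return "ALTER TABLE " + table + " DROP IF EXISTS " + clauses;
>     }
>
>     public static void main(String[] args) {
>         List<PartitionSpec> batch = List.of(
>             new PartitionSpec(2019, 12, 31, "tokyo"),
>             new PartitionSpec(2019, 12, 31, "osaka"));
>         // ALTER TABLE events DROP IF EXISTS PARTITION (...), PARTITION (...)
>         System.out.println(buildDropStatement("events", batch));
>     }
> }
>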
> If I were designing it from scratch, I would:
> * Keep the database of existing partitions sorted
> * Sort the incoming partitions to remove
> * Iterate via a "shrinking binary search" (each partition to remove is located 
> with a binary search, and everything up to the matched index can be excluded 
> from the search range on the next iteration); a minimal sketch follows below
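>
> A rough sketch of the idea (plain Java with String keys; not Hive metastore code, 
> and names/values are illustrative only):
>
> import java.util.ArrayList;
> import java.util.Collections;
> import java.util.List;
>
> // Both the existing partitions and the partitions to drop are sorted; each
> // lookup only searches the suffix after the previous match, so the searched
> // range shrinks as we go.
> public class ShrinkingBinarySearch {
>
>     static List<String> remove(List<String> existingSorted, List<String> toDrop) {
>         List<String> sortedDrops = new ArrayList<>(toDrop);
>         Collections.sort(sortedDrops);
>
>         List<String> kept = new ArrayList<>();
>         int from = 0;  // lower bound of the remaining search range
>         for (String drop : sortedDrops) {
>             int idx = Collections.binarySearch(
>                 existingSorted.subList(from, existingSorted.size()), drop);
>             if (idx >= 0) {
>                 int abs = from + idx;
>                 kept.addAll(existingSorted.subList(from, abs)); // keep everything before the match
>                 from = abs + 1;  // shrink: never look at earlier entries again
>             }
>         }
>         kept.addAll(existingSorted.subList(from, existingSorted.size()));
>         return kept;
>     }
>
>     public static void main(String[] args) {
>         List<String> existing = List.of("2019/12/30/osaka", "2019/12/30/tokyo",
>                                         "2019/12/31/osaka", "2019/12/31/tokyo");
>         List<String> drops = List.of("2019/12/31/tokyo", "2019/12/30/osaka");
>         System.out.println(remove(existing, drops)); // [2019/12/30/tokyo, 2019/12/31/osaka]
>     }
> }
>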
> Is there something preventing this from being achieved?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
