[ https://issues.apache.org/jira/browse/HIVE-22702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010482#comment-17010482 ]
Peter Vary commented on HIVE-22702: ----------------------------------- [~michaelchirico]: You did not mention the version of hive you are using, but I suspect that it does not contain HIVE-6980, HIVE-19783 as we did some optimization in drop table in these jiras. Since the fixes are not in any current releases, you might be able to give it a try by building your own hive. Thanks, Peter > ALTER TABLE REMOVE PARTITION is inefficient > ------------------------------------------- > > Key: HIVE-22702 > URL: https://issues.apache.org/jira/browse/HIVE-22702 > Project: Hive > Issue Type: Improvement > Components: Database/Schema > Reporter: Michael Chirico > Priority: Major > > I recently realized the poor partitioning of a table of mine was becoming a > major bottleneck and endeavored to reset the partitioning. > At this point, the table had about 56K partitions (year|month|day|city) > combinations; moving to the more efficient year|month partitions means > there's about 24. > In the process, I was having trouble fixing the registration of the table > because of the size of its partition DB; I happened upon this SO Q&A which > addresses the same issue: > https://stackoverflow.com/questions/50715939/drop-table-in-hive-via-spark-hangs/50814566#comment105440563_50814566 > I set about batching through ALTER TABLE x DROP PARTITION (...), PARTITION > (...) 200 at a time; it would run for about 2 hours to accomplish this, which > strikes me as being quite inefficient. > (apologies that I haven't done a fully proper analysis of the scaling > efficiency in this ticket) > If I were designing it from scratch, I would: > * Keep the database of existing partitions sorted > * Sort the incoming partitions to remove > * Iterate via "shrinking binary search" (each partition is searched with > binary search, and we can eliminate from the existing DB anything "less than" > the current index when moving to the next iteration) > Is there something preventing this from being achieved? -- This message was sent by Atlassian Jira (v8.3.4#803005)