[ https://issues.apache.org/jira/browse/HIVE-14870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746299#comment-15746299 ]
Chris Drome commented on HIVE-14870:
------------------------------------

[~alangates], let me answer from the bottom up.

Pages 2-3 explain how I deduplicated the data. In short, I removed LOCATION and CD_ID from the SDS table, because those columns make every entry unique per table/partition. I also collapsed the SDS and SERDES tables into a single table. These two changes result in a decrease from 3.4M records to 15 records.

I didn't check the impact of each individual change, but all of the changes in aggregate result in a 3-4x speedup for getTable calls.

I haven't tested using array types to replace columns, etc., because some of our tables consist of hundreds of columns and I felt that the tradeoff would not be worth it. I plan to implement the same caching mechanism that you employ in HBaseStore, so the savings we would get would be minimal. Furthermore, getTable calls take a fraction of the time that getPartitions calls take, so the majority of the effort went into optimizing those calls.

I'm currently working with our QE to hammer out the last couple of failures that we are hitting in regression/integration tests. I'd like to refactor and clean up some code around the getPartitions calls as well. I hope to have a cleaner version that I can post before the end of the year.

> OracleStore: RawStore implementation optimized for Oracle
> ---------------------------------------------------------
>
>                 Key: HIVE-14870
>                 URL: https://issues.apache.org/jira/browse/HIVE-14870
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>            Reporter: Chris Drome
>            Assignee: Chris Drome
>         Attachments: OracleStoreDesignProposal.pdf
>
>
> The attached document is a proposal for a RawStore implementation which is
> optimized for Oracle and replaces DataNucleus. The document outlines schema
> changes, OracleStore implementation details, and performance tests against
> ObjectStore, ObjectStore+DirectSQL, and OracleStore.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)