[ 
https://issues.apache.org/jira/browse/HIVE-14870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746299#comment-15746299
 ] 

Chris Drome commented on HIVE-14870:
------------------------------------

[~alangates], let me answer from the bottom up.

Pages 2-3 explain what I did to deduplicate data. In short, I removed
LOCATION and CD_ID from the SDS table, because those columns make every
entry unique per table/partition. I also collapsed the SDS and SERDES tables
into a single table. These two changes reduce the record count from 3.4M to
15.
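
To illustrate why dropping the per-partition columns deduplicates so well,
here is a minimal sketch. It is not the OracleStore code: the
StorageDescriptor tuple, its reduced field list, and the sample rows are
all hypothetical, and the real SDS/SERDES tables carry more columns.

```python
from collections import namedtuple

# Hypothetical, simplified storage-descriptor row (assumption, not the
# actual metastore schema).
StorageDescriptor = namedtuple(
    "StorageDescriptor",
    ["location", "cd_id", "input_format", "output_format", "serde_lib"],
)


def shared_key(sd):
    """Return the fields that remain once LOCATION and CD_ID are removed."""
    return (sd.input_format, sd.output_format, sd.serde_lib)


def deduplicate(descriptors):
    """Assign one small integer id to each distinct shared key."""
    ids = {}
    for sd in descriptors:
        ids.setdefault(shared_key(sd), len(ids))
    return ids


# Made-up sample data: 1000 partitions of one ORC table, each with its own
# location, plus a single text table.
rows = [
    StorageDescriptor(
        f"/warehouse/t/part={i}", i % 10,
        "OrcInputFormat", "OrcOutputFormat", "OrcSerde",
    )
    for i in range(1000)
]
rows.append(
    StorageDescriptor(
        "/warehouse/u", 99,
        "TextInputFormat", "HiveIgnoreKeyTextOutputFormat", "LazySimpleSerDe",
    )
)

unique = deduplicate(rows)
print(len(rows), "rows ->", len(unique), "distinct descriptors")
```

With LOCATION and CD_ID in the key, every row in this sample would be
distinct; without them, only the format/serde combinations remain.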

I didn't measure the impact of each individual change, but all of the changes
in aggregate result in a 3-4x speedup for getTable calls.

I haven't tested array types to replace columns, etc., because some of our
tables consist of hundreds of columns and I felt that the tradeoff would not
be worth it. I plan to implement the same caching mechanism that you employ
in HBaseStore, so any additional savings we would get would be minimal.
Furthermore, getTable calls take a fraction of the time that getPartitions
calls take, so the majority of the effort went into optimizing those calls.

I'm currently working with our QE to hammer out the last couple of failures 
that we are hitting in regression/integration tests. I'd like to refactor and 
clean up some code around the getPartitions calls as well. I hope to have a 
cleaner version that I can post before the end of the year.

> OracleStore: RawStore implementation optimized for Oracle
> ---------------------------------------------------------
>
>                 Key: HIVE-14870
>                 URL: https://issues.apache.org/jira/browse/HIVE-14870
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>            Reporter: Chris Drome
>            Assignee: Chris Drome
>         Attachments: OracleStoreDesignProposal.pdf
>
>
> The attached document is a proposal for a RawStore implementation which is 
> optimized for Oracle and replaces DataNucleus. The document outlines schema 
> changes, OracleStore implementation details, and performance tests against 
> ObjectStore, ObjectStore+DirectSQL, and OracleStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
