[ https://issues.apache.org/jira/browse/HIVE-14870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549918#comment-15549918 ]
Sergey Shelukhin edited comment on HIVE-14870 at 10/5/16 9:07 PM:
------------------------------------------------------------------

We only need very limited functionality compared to DN. A layer like this already exists in ACID, so I don't see why it cannot be reused and augmented. The only changes needed would be the ability to replace some parts to optimize for Oracle (or other DBs), via some sort of a plugin option (or even a switch statement), which will not be pretty but is imho preferable to the alternatives.

As I see it, I would be merely -0 on the thing in itself - it's bad enough to have 2.5 SQL "engines" (ORM, the one in ACID, and directsql), let alone add a third and then another federation thing that is not hidden at a lower level like the direct SQL one. The direct SQL one caused (and will probably cause ;)) a few problems and special cases, simple as it is (plus the confusion with failures-that-are-not-really-failures, failure to fall back, sudden unexplained slowdowns when the fallback is successful, etc.).

There are probably all kinds of other issues; e.g. off the top of my head, how does this work with upgrade scripts - would we need to create and maintain another set? Would scripts to switch the schema between the old and the new always be the same, or would there need to be a back and forth script for every version eventually (I don't think one would ever need that but it is a possibility)? Etc.

However, my main meta concern is about the approach - what do we do if someone wants to have an optimized MySqlEngine, or MsSqlEngine, AzureEngine, etc? They would totally copy-paste the Oracle one, rewrite a few critical SQL queries, and submit a patch. That can quickly turn into a maintenance nightmare.

It appears to me that the existing custom-SQL layer in ACID could be reused, if desired (or used as inspiration), to make this store ANSI-ish (does it have any significant limitations currently?).
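As a rough illustration of the "switch statement" idea, here is a minimal sketch of one shared query builder where only a small database-specific part varies. All names (`DbProduct`, `LimitClauseSketch`, `firstRows`) are hypothetical and are not taken from the Hive codebase, and the choice of a row-limit clause as the varying part is an assumption made purely for the example.

```java
// Hypothetical names throughout; none of these are Hive classes.
enum DbProduct { DERBY, MYSQL, POSTGRES, ORACLE, SQLSERVER, OTHER }

final class LimitClauseSketch {
    // One shared query; only the row-limiting syntax differs per database.
    static String firstRows(DbProduct db, String table, int limit) {
        String base = "SELECT * FROM " + table;
        switch (db) {
            case ORACLE:
                // Oracle-specific form using the ROWNUM pseudocolumn.
                return "SELECT * FROM (" + base + ") WHERE ROWNUM <= " + limit;
            case SQLSERVER:
                return "SELECT TOP " + limit + " * FROM " + table;
            case MYSQL:
            case POSTGRES:
                return base + " LIMIT " + limit;
            default:
                // ANSI-style fallback for everything else.
                return base + " FETCH FIRST " + limit + " ROWS ONLY";
        }
    }
}
```

Under this arrangement, a database-specific optimization is one extra `case` in a shared code path rather than a copy of an entire store implementation.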
That way we can keep query optimizations in a plugin (or even a switch statement if need be). This also has an additional advantage of being able to deprecate and then ditch ORM altogether, which would simplify things instead of making them more complex.

Another alternative path (that could be pursued in parallel) is making RawStore pluggable so that such specific implementations could be used, while not being a supported part of the Hive codebase.

Perhaps if there is already a patch we can have a collective effort to do the ANSI SQL thing. Making an entirely SQL-based access layer is a very valuable thing for the Hive community; however, we want to make sure that we don't actually go in the opposite direction with this effort.

> OracleStore: RawStore implementation optimized for Oracle
> ---------------------------------------------------------
>
>                 Key: HIVE-14870
>                 URL: https://issues.apache.org/jira/browse/HIVE-14870
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>            Reporter: Chris Drome
>            Assignee: Chris Drome
>         Attachments: OracleStoreDesignProposal.pdf
>
>
> The attached document is a proposal for a RawStore implementation which is
> optimized for Oracle and replaces DataNucleus. The document outlines schema
> changes, OracleStore implementation details, and performance tests against
> ObjectStore, ObjectStore+DirectSQL, and OracleStore.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)