[jira] [Work logged] (HIVE-24396) [New Feature] Add data connector support for remote datasources

ASF GitHub Bot (Jira) Thu, 25 Mar 2021 06:21:04 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-24396?focusedWorklogId=571876&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-571876
 ]


ASF GitHub Bot logged work on HIVE-24396:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 25/Mar/21 13:20
            Start Date: 25/Mar/21 13:20
    Worklog Time Spent: 10m 
      Work Description: nrg4878 commented on a change in pull request #2037:
URL: https://github.com/apache/hive/pull/2037#discussion_r601477518



##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/dataconnector/create/CreateDataConnectorOperation.java
##########
@@ -0,0 +1,71 @@
+/*

Review comment:
       1) We check for null/empty values for the URL and error out in those 
cases. Other than that, any non-empty value is accepted. I don't think we 
should check the URL for correctness, or that we even can, for that matter.
   a) The URL is meant to be a freeform value against dozens of datasource 
types (mysql, postgres, hive, AWS Glue, Redshift etc). For each such source 
type, there could be dozens of variations of the URL (including properties and 
other params specific to the source). So I don't think we can meaningfully 
detect incorrect URLs.
   For example, with MySQL, even though the URLs below might look fine 
syntactically, we cannot confirm that dbName1 or dbName2 exists without 
actually attempting to connect to the DB.
   jdbc:mysql://<hostname>:3306/<dbName1>
   jdbc:mysql://<hostname>:3306/<dbName2>
   
   b) The format of the URLs could change over time as well. Maintaining new 
formats in Hive is an unnecessary burden. We want to be able to plug in a new 
datasource type by simply adding a provider.
   
   c) To be able to validate the URL, we would have to establish the 
connection to the datasource at the time of creation. We are trying to delay 
making that connection as long as possible, until an actual "show tables" is 
called. That way we avoid using up extra resources and leaking connections.
   
   d) Users can do "create connector" .. followed by "alter connector set url". 
So any incorrect URLs can be modified using alter. Also, in this case, we would 
be checking the URL twice. It is better to put the onus of configuring it 
correctly on the end user.
   
   2) Passwords can be secured using jceks files as described in the "Securing 
Password" section of the doc below.
   https://cwiki.apache.org/confluence/display/Hive/JDBC+Storage+Handler
   So users have an option of using non-CTVs
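
To make point 1 concrete, here is a minimal sketch of "reject only null/empty, 
connect lazily". This is not the code in the pull request: ConnectorUrlSketch, 
requireNonEmptyUrl and LazyConnector are hypothetical names used only for 
illustration, and the Supplier stands in for whatever a provider would do to 
open a real connection.

```java
import java.util.function.Supplier;

public class ConnectorUrlSketch {

    /** Rejects only null or blank URLs; any other value is accepted as-is. */
    static String requireNonEmptyUrl(String url) {
        if (url == null || url.trim().isEmpty()) {
            throw new IllegalArgumentException("Data connector URL must not be null or empty");
        }
        return url;
    }

    /** Hypothetical connector that defers opening a connection until first use. */
    static final class LazyConnector<C> {
        private final String url;
        private final Supplier<C> opener; // assumed stand-in for a provider's connect logic
        private C connection;

        LazyConnector(String url, Supplier<C> opener) {
            // Only the null/empty check happens at creation time; no connection is made.
            this.url = requireNonEmptyUrl(url);
            this.opener = opener;
        }

        /** Opens the connection on the first call and reuses it afterwards. */
        synchronized C connection() {
            if (connection == null) {
                connection = opener.get();
            }
            return connection;
        }

        synchronized boolean isOpen() {
            return connection != null;
        }

        String url() {
            return url;
        }
    }

    public static void main(String[] args) {
        LazyConnector<String> connector =
            new LazyConnector<>("jdbc:mysql://<hostname>:3306/<dbName1>", () -> "fake-connection");
        System.out.println("open after create: " + connector.isOpen());
        connector.connection(); // first real use, e.g. listing tables
        System.out.println("open after first use: " + connector.isOpen());
    }
}
```

With this shape, a syntactically odd or unreachable URL only surfaces as an 
error when the connection is first used, which matches the trade-off described 
in c) and d) above.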




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 571876)
    Time Spent: 4h 20m  (was: 4h 10m)

> [New Feature] Add data connector support for remote datasources
> ---------------------------------------------------------------
>
>                 Key: HIVE-24396
>                 URL: https://issues.apache.org/jira/browse/HIVE-24396
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>            Reporter: Naveen Gangam
>            Assignee: Naveen Gangam
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> This feature adds support in Hive Metastore for configuring data connectors 
> for remote datasources and mapping databases. We 
> currently have support for remote tables via StorageHandlers like 
> JDBCStorageHandler and HBaseStorageHandler.
> Data connectors are a natural extension to this where we can map an entire 
> database or catalogs instead of individual tables. The tables within are 
> automagically mapped at runtime. The metadata for these tables is not 
> persisted in Hive. They are always mapped and built at runtime. 
> With this feature, we introduce a concept of type for Databases in Hive: 
> NATIVE vs. REMOTE. All current databases are NATIVE. To create a REMOTE 
> database, the following syntax is to be used:
> CREATE REMOTE DATABASE remote_db USING <dataconnector> WITH DCPROPERTIES 
> (....);
> Will attach a design doc to this jira. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
