[ https://issues.apache.org/jira/browse/HIVE-25405?focusedWorklogId=682553&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-682553 ]
ASF GitHub Bot logged work on HIVE-25405: ----------------------------------------- Author: ASF GitHub Bot Created on: 17/Nov/21 11:55 Start Date: 17/Nov/21 11:55 Worklog Time Spent: 10m Work Description: zabetak commented on a change in pull request #2546: URL: https://github.com/apache/hive/pull/2546#discussion_r751148777 ########## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/RedshiftConnectorProvider.java ########## @@ -0,0 +1,86 @@ +package org.apache.hadoop.hive.metastore.dataconnector.jdbc; + +import org.apache.hadoop.hive.metastore.ColumnType; +import org.apache.hadoop.hive.metastore.api.DataConnector; +import org.apache.hadoop.hive.metastore.api.MetaException; +import org.apache.hadoop.hive.metastore.api.Table; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.sql.ResultSet; +import java.sql.SQLException; + +public class RedshiftConnectorProvider extends AbstractJDBCConnectorProvider { + private static Logger LOG = LoggerFactory.getLogger(RedshiftConnectorProvider.class); + + private static final String DRIVER_CLASS = "com.amazon.redshift.jdbc42.Driver".intern(); + + public RedshiftConnectorProvider(String dbName, DataConnector dataConn) { + super(dbName, dataConn, DRIVER_CLASS); + } + + protected String getDataType(String dbDataType, int size) { + String mappedType = super.getDataType(dbDataType, size); + + // The VOID type points to the corresponding datatype not existing in hive. These datatypes are datastore + // specific. They need special handling. An example would be the Geometric type that is not supported in Hive. + // The other cases where a datatype in redshift is resolved to a VOID type are during the use of aliases like + // float8, int8 etc. These can be mapped to existing hive types and are done below. + if (!mappedType.equalsIgnoreCase(ColumnType.VOID_TYPE_NAME)) { + return mappedType; + } + + // map any db specific types here. 
Review comment: The intention is clear, the comment is redundant. ```suggestion ``` ########## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/RedshiftConnectorProvider.java ########## @@ -0,0 +1,86 @@ +package org.apache.hadoop.hive.metastore.dataconnector.jdbc; + +import org.apache.hadoop.hive.metastore.ColumnType; +import org.apache.hadoop.hive.metastore.api.DataConnector; +import org.apache.hadoop.hive.metastore.api.MetaException; +import org.apache.hadoop.hive.metastore.api.Table; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.sql.ResultSet; +import java.sql.SQLException; + +public class RedshiftConnectorProvider extends AbstractJDBCConnectorProvider { + private static Logger LOG = LoggerFactory.getLogger(RedshiftConnectorProvider.class); + + private static final String DRIVER_CLASS = "com.amazon.redshift.jdbc42.Driver".intern(); Review comment: There is no benefit at all calling intern() on a static final field. Actually, I don't see a good reason to have a field declaration here. We could inline the string to the constructor. ########## File path: ql/src/test/queries/clientpositive/redshift_data_connector.q ########## @@ -0,0 +1,65 @@ +-- SORT_QUERY_RESULTS +SHOW CONNECTORS; + +-- CREATE with comment +CREATE CONNECTOR redshift_test +TYPE 'redshift' +URL 'jdbc:redshift://redshift-cluster-1.c1gffkxfot1v.us-east-2.redshift.amazonaws.com:5439/dev' Review comment: Agree with Zoltan, I don't think we can add this. Very likely now the Redshift instance specified here may not be available and the test will fail. Since Postgres and Redshift have many similarities I would suggest adding at least a CONNECTOR test with Postgres. An easy way to do it would be using `--!qt:database:postgres:` option. 
########## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/AbstractJDBCConnectorProvider.java ########## @@ -185,9 +185,18 @@ protected Connection getConnection() { return null; } - protected abstract ResultSet fetchTableMetadata(String tableName) throws MetaException; + protected ResultSet fetchTableMetadata(String tableName) throws MetaException { + try { + return fetchTablesViaDBMetaData(tableName); + } + catch (SQLException sqle) { + throw new MetaException("Error while trying to access the table names in the database" + sqle); + } + } - protected abstract ResultSet fetchTableNames() throws MetaException; + protected ResultSet fetchTableNames() throws MetaException { + return fetchTableMetadata(null); + } Review comment: I don't see any usages of `fetchTableMetadata` or `fetchTableNames` in the project. How about removing these methods completely? ########## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/RedshiftConnectorProvider.java ########## @@ -0,0 +1,86 @@ +package org.apache.hadoop.hive.metastore.dataconnector.jdbc; + +import org.apache.hadoop.hive.metastore.ColumnType; +import org.apache.hadoop.hive.metastore.api.DataConnector; +import org.apache.hadoop.hive.metastore.api.MetaException; +import org.apache.hadoop.hive.metastore.api.Table; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.sql.ResultSet; +import java.sql.SQLException; + +public class RedshiftConnectorProvider extends AbstractJDBCConnectorProvider { Review comment: Maybe it would be better to extend the Postgres connector instead of the Abstract. Most of the datatypes listed below do exist in Postgres as well so we could possibly add them there and override `getDataType` only for those that do not exist. 
########## File path: ql/src/test/queries/clientpositive/redshift_data_connector.q ########## @@ -0,0 +1,65 @@ +-- SORT_QUERY_RESULTS +SHOW CONNECTORS; + +-- CREATE with comment +CREATE CONNECTOR redshift_test +TYPE 'redshift' +URL 'jdbc:redshift://redshift-cluster-1.c1gffkxfot1v.us-east-2.redshift.amazonaws.com:5439/dev' Review comment: As mentioned previously, the test should have `SELECT` and `EXPLAIN SELECT` queries to verify the type mapping is done correctly. ########## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/RedshiftConnectorProvider.java ########## @@ -0,0 +1,86 @@ +package org.apache.hadoop.hive.metastore.dataconnector.jdbc; + +import org.apache.hadoop.hive.metastore.ColumnType; +import org.apache.hadoop.hive.metastore.api.DataConnector; +import org.apache.hadoop.hive.metastore.api.MetaException; +import org.apache.hadoop.hive.metastore.api.Table; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.sql.ResultSet; +import java.sql.SQLException; + +public class RedshiftConnectorProvider extends AbstractJDBCConnectorProvider { + private static Logger LOG = LoggerFactory.getLogger(RedshiftConnectorProvider.class); + + private static final String DRIVER_CLASS = "com.amazon.redshift.jdbc42.Driver".intern(); + + public RedshiftConnectorProvider(String dbName, DataConnector dataConn) { + super(dbName, dataConn, DRIVER_CLASS); + } + + protected String getDataType(String dbDataType, int size) { + String mappedType = super.getDataType(dbDataType, size); + + // The VOID type points to the corresponding datatype not existing in hive. These datatypes are datastore + // specific. They need special handling. An example would be the Geometric type that is not supported in Hive. + // The other cases where a datatype in redshift is resolved to a VOID type are during the use of aliases like + // float8, int8 etc. These can be mapped to existing hive types and are done below. 
+ if (!mappedType.equalsIgnoreCase(ColumnType.VOID_TYPE_NAME)) { + return mappedType; + } + + // map any db specific types here. + //TODO: Geometric and other redshift specific types need to be supported. + switch (dbDataType.toLowerCase()) + { + // The below mappings were tested by creating table definitions in redshift with the respective datatype + // and querying from the tables. + // e.g. + // create table test_int_8(i int8); + // insert into test_int_8 values(1); + // 0: jdbc:hive2://> select * from test_int_8; + //+---------------+ + //| test_int_8.i | + //+---------------+ + //| 1 | + //+---------------+ + //1 row selected (13.381 seconds) + case "bpchar": + mappedType = ColumnType.CHAR_TYPE_NAME + wrapSize(size); + break; + case "int2": + mappedType = ColumnType.SMALLINT_TYPE_NAME; + break; + case "int4": + mappedType = ColumnType.INT_TYPE_NAME; + break; + case "int8": + mappedType = ColumnType.BIGINT_TYPE_NAME; + break; + case "real": + mappedType = ColumnType.FLOAT_TYPE_NAME; + break; + case "float4": + mappedType = ColumnType.FLOAT_TYPE_NAME; + break; + case "float8": + mappedType = ColumnType.DOUBLE_TYPE_NAME; + break; + case "time": + case "time without time zone": + // A table with a time column gets resolved to "time without time zone" in hive. 
+ mappedType = ColumnType.TIMESTAMP_TYPE_NAME; + break; + case "timez": + case "time with time zone": + mappedType = ColumnType.TIMESTAMPTZ_TYPE_NAME; + break; + default: + mappedType = ColumnType.VOID_TYPE_NAME; + break; Review comment: ```suggestion ``` ########## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/RedshiftConnectorProvider.java ########## @@ -0,0 +1,86 @@ +package org.apache.hadoop.hive.metastore.dataconnector.jdbc; + +import org.apache.hadoop.hive.metastore.ColumnType; +import org.apache.hadoop.hive.metastore.api.DataConnector; +import org.apache.hadoop.hive.metastore.api.MetaException; +import org.apache.hadoop.hive.metastore.api.Table; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.sql.ResultSet; +import java.sql.SQLException; + +public class RedshiftConnectorProvider extends AbstractJDBCConnectorProvider { + private static Logger LOG = LoggerFactory.getLogger(RedshiftConnectorProvider.class); + + private static final String DRIVER_CLASS = "com.amazon.redshift.jdbc42.Driver".intern(); + + public RedshiftConnectorProvider(String dbName, DataConnector dataConn) { + super(dbName, dataConn, DRIVER_CLASS); + } + + protected String getDataType(String dbDataType, int size) { + String mappedType = super.getDataType(dbDataType, size); + + // The VOID type points to the corresponding datatype not existing in hive. These datatypes are datastore + // specific. They need special handling. An example would be the Geometric type that is not supported in Hive. + // The other cases where a datatype in redshift is resolved to a VOID type are during the use of aliases like + // float8, int8 etc. These can be mapped to existing hive types and are done below. + if (!mappedType.equalsIgnoreCase(ColumnType.VOID_TYPE_NAME)) { + return mappedType; + } + + // map any db specific types here. + //TODO: Geometric and other redshift specific types need to be supported. 
Review comment: Please create a JIRA and associate it with the TODO. ########## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/RedshiftConnectorProvider.java ########## @@ -0,0 +1,86 @@ +package org.apache.hadoop.hive.metastore.dataconnector.jdbc; + +import org.apache.hadoop.hive.metastore.ColumnType; +import org.apache.hadoop.hive.metastore.api.DataConnector; +import org.apache.hadoop.hive.metastore.api.MetaException; +import org.apache.hadoop.hive.metastore.api.Table; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.sql.ResultSet; +import java.sql.SQLException; + +public class RedshiftConnectorProvider extends AbstractJDBCConnectorProvider { + private static Logger LOG = LoggerFactory.getLogger(RedshiftConnectorProvider.class); + + private static final String DRIVER_CLASS = "com.amazon.redshift.jdbc42.Driver".intern(); + + public RedshiftConnectorProvider(String dbName, DataConnector dataConn) { + super(dbName, dataConn, DRIVER_CLASS); + } + + protected String getDataType(String dbDataType, int size) { + String mappedType = super.getDataType(dbDataType, size); + + // The VOID type points to the corresponding datatype not existing in hive. These datatypes are datastore + // specific. They need special handling. An example would be the Geometric type that is not supported in Hive. + // The other cases where a datatype in redshift is resolved to a VOID type are during the use of aliases like + // float8, int8 etc. These can be mapped to existing hive types and are done below. + if (!mappedType.equalsIgnoreCase(ColumnType.VOID_TYPE_NAME)) { + return mappedType; + } + + // map any db specific types here. + //TODO: Geometric and other redshift specific types need to be supported. + switch (dbDataType.toLowerCase()) + { + // The below mappings were tested by creating table definitions in redshift with the respective datatype + // and querying from the tables. + // e.g. 
+ // create table test_int_8(i int8); + // insert into test_int_8 values(1); + // 0: jdbc:hive2://> select * from test_int_8; + //+---------------+ + //| test_int_8.i | + //+---------------+ + //| 1 | + //+---------------+ + //1 row selected (13.381 seconds) Review comment: ```suggestion ``` Instead of explaining how things were tested concrete tests could be added. Create tables in Redshift/Postgres, populate them with some data and then run `SELECT` and `EXPLAIN SELECT` queries in Hive so that we can verify results are correct and types are mapped correctly. ########## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/RedshiftConnectorProvider.java ########## @@ -0,0 +1,86 @@ +package org.apache.hadoop.hive.metastore.dataconnector.jdbc; + +import org.apache.hadoop.hive.metastore.ColumnType; +import org.apache.hadoop.hive.metastore.api.DataConnector; +import org.apache.hadoop.hive.metastore.api.MetaException; +import org.apache.hadoop.hive.metastore.api.Table; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.sql.ResultSet; +import java.sql.SQLException; + +public class RedshiftConnectorProvider extends AbstractJDBCConnectorProvider { + private static Logger LOG = LoggerFactory.getLogger(RedshiftConnectorProvider.class); + + private static final String DRIVER_CLASS = "com.amazon.redshift.jdbc42.Driver".intern(); + + public RedshiftConnectorProvider(String dbName, DataConnector dataConn) { + super(dbName, dataConn, DRIVER_CLASS); + } + + protected String getDataType(String dbDataType, int size) { + String mappedType = super.getDataType(dbDataType, size); + + // The VOID type points to the corresponding datatype not existing in hive. These datatypes are datastore + // specific. They need special handling. An example would be the Geometric type that is not supported in Hive. 
+ // The other cases where a datatype in redshift is resolved to a VOID type are during the use of aliases like + // float8, int8 etc. These can be mapped to existing hive types and are done below. + if (!mappedType.equalsIgnoreCase(ColumnType.VOID_TYPE_NAME)) { + return mappedType; + } Review comment: ```suggestion ``` The `if` is not necessary and possibly wrong. In terms of inheritance it is more natural to continue to the switch case below to attempt to override the `mappedType` if necessary. To do that we have to skip the `default` case below. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking ------------------- Worklog Id: (was: 682553) Time Spent: 50m (was: 40m) > Implement Connector Provider for Amazon Redshift > ------------------------------------------------ > > Key: HIVE-25405 > URL: https://issues.apache.org/jira/browse/HIVE-25405 > Project: Hive > Issue Type: Sub-task > Reporter: Narayanan Venkateswaran > Assignee: Narayanan Venkateswaran > Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > h2. +DB Level Federation to Redshift+ > The work in this Jira helps setup federation to redshift databases at the DB > level, in the following way. > {noformat} > 0: jdbc:hive2://> describe connector redshift_test_7; > 21/07/30 20:53:53 [343005b1-06f3-4891-81d1-0c6ad8612260 main]: WARN > lazy.LazyStruct: Extra bytes detected at the end of the row! Ignoring similar > problems. 
> +------------------+-----------+----------------------------------------------------+-------------+-------------+--------------------------+ > | name | type | url > | owner_name | owner_type | comment | > +------------------+-----------+----------------------------------------------------+-------------+-------------+--------------------------+ > | redshift_test_7 | redshift | jdbc:redshift://<url> > | narayanan | USER | test redshift connector | > +------------------+-----------+----------------------------------------------------+-------------+-------------+--------------------------+ > 1 row selected (0.054 seconds) {noformat} > {noformat} > 0: jdbc:hive2://> describe database db_redshift_7; > 21/07/30 20:53:38 [343005b1-06f3-4891-81d1-0c6ad8612260 main]: WARN > lazy.LazyStruct: Extra bytes detected at the end of the row! Ignoring similar > problems. > +----------------+----------+-----------+------------------+-------------+-------------+------------------+----------------+ > | db_name | comment | location | managedlocation | owner_name | > owner_type | connector_name | remote_dbname | > +----------------+----------+-----------+------------------+-------------+-------------+------------------+----------------+ > | db_redshift_7 | | | | narayanan | > USER | redshift_test_7 | dev | > +----------------+----------+-----------+------------------+-------------+-------------+------------------+----------------+ > 1 row selected (0.066 seconds) {noformat} > {noformat} > 0: jdbc:hive2://> use db_redshift_7; > No rows affected (1.391 seconds) {noformat} > {noformat} > 0: jdbc:hive2://> show tables; > +-----------------+ > | tab_name | > +-----------------+ > | accommodations | > | category | > | date | > | event | > | listing | > | sales | > | sample | > | test_float_4 | > | test_int_2 | > | test_int_4 | > | test_int_8 | > | test_time | > | test_time_2 | > | test_timestamp | > | users | > | venue | > | zipcode | > +-----------------+ > 17 rows selected (7.773 seconds) 
{noformat} > {noformat} > 0: jdbc:hive2://> select * from test_float_4; > +-----------------+ > | test_float_4.i | > +-----------------+ > | 1.0 | > +-----------------+ > 1 row selected (18.214 seconds) {noformat} > {noformat} > 0: jdbc:hive2://> select * from test_int_8; > +---------------+ > | test_int_8.i | > +---------------+ > | 1 | > +---------------+ > 1 row selected (12.626 seconds) {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)