[ https://issues.apache.org/jira/browse/HIVE-26046?focusedWorklogId=804769&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-804769 ]
ASF GitHub Bot logged work on HIVE-26046: ----------------------------------------- Author: ASF GitHub Bot Created on: 30/Aug/22 10:09 Start Date: 30/Aug/22 10:09 Worklog Time Spent: 10m Work Description: zhangbutao commented on code in PR #3276: URL: https://github.com/apache/hive/pull/3276#discussion_r957951934 ########## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/MySQLConnectorProvider.java: ########## @@ -90,10 +90,20 @@ protected String getDataType(String dbDataType, int size) { // map any db specific types here. switch (dbDataType.toLowerCase()) { + case "bit": + return toHiveBitType(size); default: mappedType = ColumnType.VOID_TYPE_NAME; break; } return mappedType; } + + private String toHiveBitType(int size) { + if (size <= 1) { + return ColumnType.BOOLEAN_TYPE_NAME; + } else { + return ColumnType.BIGINT_TYPE_NAME; Review Comment: > option 1 above not possible where we set the hive.sql.query to include like a select bin(column_name) ... on the table containing bit type columns? @nrg4878 I have tried set the hive.sql.query to include query like` select bin(id) from testmysqlbit,` and it worked as expected. That is to say, hive.sql.query can push down MySQL's native query to remote MySQL datasouce, and we can get MySQL's bit datatype values using a visible hive's bigint type or hive's string type. Step to test: 1. create jdbc-mapping table with **hive.sql.query** in hive: ` CREATE EXTERNAL TABLE jdbc_testmysqlbit_with_query` `(` ` id bigint` `)` `STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'` `TBLPROPERTIES (` `"hive.sql.database.type" = "MYSQL",` `"hive.sql.jdbc.driver" = "com.mysql.jdbc.Driver",` `"hive.sql.jdbc.url" = "jdbc:mysql://localhost:3306/testdb",` `"hive.sql.dbcp.username" = "user",` `"hive.sql.dbcp.password" = "passwd",` `"hive.sql.dbcp.maxActive" = "1",` `"hive.sql.query" = "select bin(id) from testmysqlbit"` `);` 2. `select * from jdbc_testmysqlbit_with_query;` using hive beeline: ![image](https://user-images.githubusercontent.com/9760681/187336443-51942986-1284-4fc2-90b8-eae02db3beb2.png) However, one concern is that the _**hive.sql.query**_ is a fixed values and can not be rewrited with more optimized pushdown computation on top of the table. As the doc descripted: https://cwiki.apache.org/confluence/display/Hive/JDBC+Storage+Handler#JDBCStorageHandler-ComputationPushdown > Computation pushdown will only happen when the jdbc table is defined by “hive.sql.table”. Hive will rewrite the data source with a “hive.sql.query” property with more computation on top of the table. In the above example, mysql will run the query and retrieve the join result, rather than fetch both tables and do the join in Hive. Issue Time Tracking ------------------- Worklog Id: (was: 804769) Time Spent: 4h 20m (was: 4h 10m) > MySQL's bit datatype is default to void datatype in hive > -------------------------------------------------------- > > Key: HIVE-26046 > URL: https://issues.apache.org/jira/browse/HIVE-26046 > Project: Hive > Issue Type: Sub-task > Components: Standalone Metastore > Affects Versions: 4.0.0 > Reporter: Naveen Gangam > Assignee: zhangbutao > Priority: Major > Labels: pull-request-available > Time Spent: 4h 20m > Remaining Estimate: 0h > > describe on a table that contains a "bit" datatype gets mapped to void. We > need a explicit conversion logic in the MySQL ConnectorProvider to map it to > a suitable datatype in hive. > {noformat} > +-------------------------------+---------------------------------------------------+----------------------------------------------------+ > | col_name | data_type > | comment | > +-------------------------------+---------------------------------------------------+----------------------------------------------------+ > | tbl_id | bigint > | from deserializer | > | create_time | int > | from deserializer | > | db_id | bigint > | from deserializer | > | last_access_time | int > | from deserializer | > | owner | varchar(767) > | from deserializer | > | owner_type | varchar(10) > | from deserializer | > | retention | int > | from deserializer | > | sd_id | bigint > | from deserializer | > | tbl_name | varchar(256) > | from deserializer | > | tbl_type | varchar(128) > | from deserializer | > | view_expanded_text | string > | from deserializer | > | view_original_text | string > | from deserializer | > | is_rewrite_enabled | void > | from deserializer | > | write_id | bigint > | from deserializer > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)