[ https://issues.apache.org/jira/browse/HIVE-26046?focusedWorklogId=780513&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780513 ]
ASF GitHub Bot logged work on HIVE-26046: ----------------------------------------- Author: ASF GitHub Bot Created on: 11/Jun/22 14:32 Start Date: 11/Jun/22 14:32 Worklog Time Spent: 10m Work Description: zhangbutao commented on code in PR #3276: URL: https://github.com/apache/hive/pull/3276#discussion_r895032467 ########## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/MySQLConnectorProvider.java: ########## @@ -90,10 +90,20 @@ protected String getDataType(String dbDataType, int size) { // map any db specific types here. switch (dbDataType.toLowerCase()) { + case "bit": + return toHiveBitType(size); default: mappedType = ColumnType.VOID_TYPE_NAME; break; } return mappedType; } + + private String toHiveBitType(int size) { + if (size <= 1) { + return ColumnType.BOOLEAN_TYPE_NAME; + } else { + return ColumnType.BIGINT_TYPE_NAME; Review Comment: @nrg4878 Sorry for not updating the issue for a long time. In my opinion, we have two ways to get origina MySQL BITs values, that is b'111000' in MySQL will be read as '111000'(binary number) in hive. First way: In MySQL, we can get bit values using 'select bin(days) from work_calendar'. So i was thinking how we use MySQL native language to get bit values in hive. After some research, I found we could use jdbc table computation pushdown(hive.sql.query) to rewrite query sql. That is to say, when we query a bit type table using 'select days from work_calendar' in hive, we can rewrite the pushdown query to 'select bin(days) from work_calendar'. In this way, we may need check original MySQL data type and rewrite bit type query when pushdown. Maybe we should do some change in JdbcSerde.java. https://cwiki.apache.org/confluence/display/Hive/JDBC+Storage+Handler#JDBCStorageHandler-ComputationPushdown Second way: I was also thinking using udf to to cast valus to BITs as you suggested. That is, 56(bigint) is cast to '111000'(binary number) in hive internal. Fortunately, I found hive already has this udf bin: https://github.com/apache/hive/blob/f29cb2245c97102975ea0dd73783049eaa0947a0/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFBin.java#L28-L37 So, based on this pr, we can get bit values using hive query 'select bin(days) from work_calendar'. And we no need change current jdbc code(JdbcSerDe.java or others). I like this simple way as we can reuse bin udf to acheive our goal. wdyt? if you have better ways, feel free to come up with. thank you. Issue Time Tracking ------------------- Worklog Id: (was: 780513) Time Spent: 2h 10m (was: 2h) > MySQL's bit datatype is default to void datatype in hive > -------------------------------------------------------- > > Key: HIVE-26046 > URL: https://issues.apache.org/jira/browse/HIVE-26046 > Project: Hive > Issue Type: Sub-task > Components: Standalone Metastore > Affects Versions: 4.0.0 > Reporter: Naveen Gangam > Assignee: zhangbutao > Priority: Major > Labels: pull-request-available > Time Spent: 2h 10m > Remaining Estimate: 0h > > describe on a table that contains a "bit" datatype gets mapped to void. We > need a explicit conversion logic in the MySQL ConnectorProvider to map it to > a suitable datatype in hive. > {noformat} > +-------------------------------+---------------------------------------------------+----------------------------------------------------+ > | col_name | data_type > | comment | > +-------------------------------+---------------------------------------------------+----------------------------------------------------+ > | tbl_id | bigint > | from deserializer | > | create_time | int > | from deserializer | > | db_id | bigint > | from deserializer | > | last_access_time | int > | from deserializer | > | owner | varchar(767) > | from deserializer | > | owner_type | varchar(10) > | from deserializer | > | retention | int > | from deserializer | > | sd_id | bigint > | from deserializer | > | tbl_name | varchar(256) > | from deserializer | > | tbl_type | varchar(128) > | from deserializer | > | view_expanded_text | string > | from deserializer | > | view_original_text | string > | from deserializer | > | is_rewrite_enabled | void > | from deserializer | > | write_id | bigint > | from deserializer > {noformat} -- This message was sent by Atlassian Jira (v8.20.7#820007)