[ https://issues.apache.org/jira/browse/HIVE-26015?focusedWorklogId=745708&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-745708 ]
ASF GitHub Bot logged work on HIVE-26015: ----------------------------------------- Author: ASF GitHub Bot Created on: 22/Mar/22 10:50 Start Date: 22/Mar/22 10:50 Worklog Time Spent: 10m Work Description: zabetak commented on a change in pull request #3084: URL: https://github.com/apache/hive/pull/3084#discussion_r832014955 ########## File path: hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseStorageHandler.java ########## @@ -46,6 +53,46 @@ public void testHbaseConfigIsAddedToJobConf() { jobConfToConfigure.get("hbase.some.fake.option.from.xml.file") != null); } + @Test + public void testGetUriForAuth() { + try { + HBaseStorageHandler hbaseStorageHandler = new HBaseStorageHandler(); + hbaseStorageHandler.setConf(new JobConf(new HiveConf())); + Table table = createMockTable(new HashMap<>()); + URI uri = hbaseStorageHandler.getURIForAuth(table); + // If there is no tablename provided, the default "null" is still + // written out. At the time this test was written, this was the current + // behavior, so I left this test as/is. Need to research if a null + // table can be provided here. + Assert.assertEquals("hbase://localhost:2181/null", uri.toString()); + + Map<String, String> serdeParams = new HashMap<>(); + serdeParams.put("hbase.zookeeper.quorum", "testhost"); + serdeParams.put("hbase.zookeeper.property.clientPort", "8765"); + table = createMockTable(serdeParams); + uri = hbaseStorageHandler.getURIForAuth(table); + Assert.assertEquals("hbase://testhost:8765/null", uri.toString()); + + serdeParams.put("hbase.table.name", "mytbl"); + table = createMockTable(serdeParams); + uri = hbaseStorageHandler.getURIForAuth(table); + Assert.assertEquals("hbase://testhost:8765/mytbl", uri.toString()); + + serdeParams.put("hbase.columns.mapping", "mycolumns"); + table = createMockTable(serdeParams); + uri = hbaseStorageHandler.getURIForAuth(table); + Assert.assertEquals("hbase://testhost:8765/mytbl/mycolumns", uri.toString()); + + serdeParams.put("hbase.table.name", "my#tbl"); + serdeParams.put("hbase.columns.mapping", "myco#lumns"); + table = createMockTable(serdeParams); + uri = hbaseStorageHandler.getURIForAuth(table); + Assert.assertEquals("hbase://testhost:8765/my%23tbl/myco%23lumns", uri.toString()); + } catch (URISyntaxException e) { + throw new RuntimeException(e); + } Review comment: The catch and rethrow pattern is rarely necessary in unit tests. Moreover, it has few disadvantages such as obscuring the original exception & error message, increasing indentation etc. The try-catch block can be removed and the `URISyntaxException` can go into the declaration of the method if necessary. ########## File path: hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseStorageHandler.java ########## @@ -56,4 +103,16 @@ private TableDesc getHBaseTableDesc() { Mockito.when(tableDesc.getProperties()).thenReturn(properties); return tableDesc; } + + private Table createMockTable(Map<String, String> serdeParams) { + Table table = new Table(); + StorageDescriptor sd = new StorageDescriptor(); + SerDeInfo sdi = new SerDeInfo(); + Map<String, String> params = new HashMap<>(); + sdi.setParameters(serdeParams); + sd.setSerdeInfo(sdi); + table.setSd(sd); + table.setParameters(params); Review comment: nit: Change to `table.setParameters(Collections.emptyMap())` or `table.setParameters(new HashMap<>())` or remove entirely if it doesn't call failures ########## File path: hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java ########## @@ -293,14 +294,23 @@ public void configureTableJobProperties( public URI getURIForAuth(Table table) throws URISyntaxException { Map<String, String> tableProperties = HiveCustomStorageHandlerUtils.getTableProperties(table); hbaseConf = getConf(); - String hbase_host = tableProperties.containsKey(HBASE_HOST_NAME)? tableProperties.get(HBASE_HOST_NAME) : hbaseConf.get(HBASE_HOST_NAME); - String hbase_port = tableProperties.containsKey(HBASE_CLIENT_PORT)? tableProperties.get(HBASE_CLIENT_PORT) : hbaseConf.get(HBASE_CLIENT_PORT); - String table_name = tableProperties.getOrDefault(HBaseSerDe.HBASE_TABLE_NAME, null); - String column_family = tableProperties.getOrDefault(HBaseSerDe.HBASE_COLUMNS_MAPPING, null); - if (column_family != null) - return new URI(HBASE_PREFIX+"//"+hbase_host+":"+hbase_port+"/"+table_name+"/"+column_family); - else - return new URI(HBASE_PREFIX+"//"+hbase_host+":"+hbase_port+"/"+table_name); + String hbase_host = tableProperties.getOrDefault(HBASE_HOST_NAME, + hbaseConf.get(HBASE_HOST_NAME)); + String hbase_port = tableProperties.getOrDefault(HBASE_CLIENT_PORT, + hbaseConf.get(HBASE_CLIENT_PORT)); + String table_name = encodeString(tableProperties.getOrDefault(HBaseSerDe.HBASE_TABLE_NAME, + null)); + String column_family = encodeString(tableProperties.getOrDefault( + HBaseSerDe.HBASE_COLUMNS_MAPPING, null)); + String URIString = HBASE_PREFIX + "//" + hbase_host + ":" + hbase_port + "/" + table_name; + if (column_family != null) { + URIString += "/" + column_family; + } + return new URI(URIString); + } + + private static String encodeString(String encodedString) { Review comment: nit: `encodedString -> rawString` ########## File path: hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseStorageHandler.java ########## @@ -46,6 +53,46 @@ public void testHbaseConfigIsAddedToJobConf() { jobConfToConfigure.get("hbase.some.fake.option.from.xml.file") != null); } + @Test + public void testGetUriForAuth() { Review comment: The method seems to contains five tests. They all test `getURIForAuth` method but the input for every test is different. I am a big fan of keeping one assert per test method which tends to make test more readable and maintainable. There are various books (e.g., "The art of unit testing" by Roy Osherove) which go into more detail of why this is a good idea and in this case it seems beneficial to split and refactor the method into five separate tests. The common test code could go into a method `checkUriForAuth(serdeParams, expectedUri)` and the 5 test methods would call this with a different set of parameters. ########## File path: hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java ########## @@ -293,14 +294,23 @@ public void configureTableJobProperties( public URI getURIForAuth(Table table) throws URISyntaxException { Map<String, String> tableProperties = HiveCustomStorageHandlerUtils.getTableProperties(table); hbaseConf = getConf(); - String hbase_host = tableProperties.containsKey(HBASE_HOST_NAME)? tableProperties.get(HBASE_HOST_NAME) : hbaseConf.get(HBASE_HOST_NAME); - String hbase_port = tableProperties.containsKey(HBASE_CLIENT_PORT)? tableProperties.get(HBASE_CLIENT_PORT) : hbaseConf.get(HBASE_CLIENT_PORT); - String table_name = tableProperties.getOrDefault(HBaseSerDe.HBASE_TABLE_NAME, null); - String column_family = tableProperties.getOrDefault(HBaseSerDe.HBASE_COLUMNS_MAPPING, null); - if (column_family != null) - return new URI(HBASE_PREFIX+"//"+hbase_host+":"+hbase_port+"/"+table_name+"/"+column_family); - else - return new URI(HBASE_PREFIX+"//"+hbase_host+":"+hbase_port+"/"+table_name); + String hbase_host = tableProperties.getOrDefault(HBASE_HOST_NAME, + hbaseConf.get(HBASE_HOST_NAME)); + String hbase_port = tableProperties.getOrDefault(HBASE_CLIENT_PORT, + hbaseConf.get(HBASE_CLIENT_PORT)); + String table_name = encodeString(tableProperties.getOrDefault(HBaseSerDe.HBASE_TABLE_NAME, + null)); + String column_family = encodeString(tableProperties.getOrDefault( + HBaseSerDe.HBASE_COLUMNS_MAPPING, null)); + String URIString = HBASE_PREFIX + "//" + hbase_host + ":" + hbase_port + "/" + table_name; + if (column_family != null) { + URIString += "/" + column_family; + } + return new URI(URIString); + } + + private static String encodeString(String encodedString) { + return encodedString != null ? URLEncoder.encode(encodedString): null; Review comment: Is it safe to rely on the platforms default encoding? Note that URLEncoder.encode(s) is deprecated and the suggested replacement is `URLEncoder.encode(String s, String enc)` with `UTF-8`. In various places in Hive we are using the latter. ########## File path: hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseStorageHandler.java ########## @@ -56,4 +103,16 @@ private TableDesc getHBaseTableDesc() { Mockito.when(tableDesc.getProperties()).thenReturn(properties); return tableDesc; } + + private Table createMockTable(Map<String, String> serdeParams) { Review comment: Make static? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking ------------------- Worklog Id: (was: 745708) Time Spent: 1.5h (was: 1h 20m) > HBase table with Ranger authentication fails; needs URLEncoding > --------------------------------------------------------------- > > Key: HIVE-26015 > URL: https://issues.apache.org/jira/browse/HIVE-26015 > Project: Hive > Issue Type: New Feature > Reporter: Steve Carlin > Assignee: Steve Carlin > Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > A Create table statement is failing for HBase going through Ranger. > The stack trace shows a problem with the getURIForAuth method. > The table is creating someting like this: > CREATE EXTERNAL TABLE `mytesttbl`( `field1` string COMMENT 'from > deserializer',`field2` string COMMENT 'from deserializer',`field3` string > COMMENT 'from deserializer',`field4` string COMMENT 'from > deserializer',`field5` string COMMENT 'from deserializer',`field6` int > COMMENT 'from deserializer', `field7` string COMMENT 'from deserializer', > `field8` int COMMENT 'from deserializer') ROW FORMAT SERDE > 'org.apache.hadoop.hive.hbase.HBaseSerDe' STORED BY > 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( > 'hbase.columns.mapping'=':key,field1,field2,field3,field4,field5#b,field6,cf:field7#b','serialization.format'='1') > TBLPROPERTIES ( 'hbase.table.name'='mytesttbl'); > Essentially, the SERDEPROPERTIES contain hash tabs which is causing a problem > when creating a URI -- This message was sent by Atlassian Jira (v8.20.1#820001)