[ https://issues.apache.org/jira/browse/HIVE-27775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17794502#comment-17794502 ]
Zhihua Deng commented on HIVE-27775:
------------------------------------

On the JDO path we use the partition name to fetch the matching partitions:
{code:java}
SELECT DISTINCT 'org.apache.hadoop.hive.metastore.model.MPartition' AS DN_TYPE,
       A0.CREATE_TIME, A0.LAST_ACCESS_TIME, A0.PART_NAME AS NUCORDER0, A0.WRITE_ID, A0.PART_ID
FROM PARTITIONS A0
  LEFT OUTER JOIN TBLS B0 ON A0.TBL_ID = B0.TBL_ID
  LEFT OUTER JOIN DBS C0 ON B0.DB_ID = C0.DB_ID
WHERE B0.TBL_NAME = <'payments'>
  AND C0."NAME" = <'default'>
  AND C0.CTLG_NAME = <'hive'>
  AND A0.PART_NAME = <'txn_datetime=2023-03-26 03%3A30%3A00'>
ORDER BY NUCORDER0
{code}
In the above example the filter is A0.PART_NAME = <'txn_datetime=2023-03-26 03%3A30%3A00'>. A0.PART_NAME is timezone agnostic, so this can produce a wrong result if <'txn_datetime=2023-03-26 03%3A30%3A00'> is a timezone-based timestamp.

Compared to that, in direct SQL mode:
{code:java}
select "PARTITIONS"."PART_ID"
from "PARTITIONS"
  inner join "TBLS" on "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = <'payments'>
  inner join "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID" and "DBS"."NAME" = <'default'>
  inner join "PARTITION_KEY_VALS" "FILTER0" on "FILTER0"."PART_ID" = "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0
where "DBS"."CTLG_NAME" = <'hive'>
  and ((cast((case when "FILTER0"."PART_KEY_VAL" <> <'__HIVE_DEFAULT_PARTITION__'> and "TBLS"."TBL_NAME" = <'payments'> and "DBS"."NAME" = <'default'> and "DBS"."CTLG_NAME" = <'hive'> and "FILTER0"."PART_ID" = "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0 then cast("FILTER0"."PART_KEY_VAL" as TIMESTAMP) else null end) as TIMESTAMP) = <'2023-03-26 03:30:00'>))
{code}
Here the filter is cast(... "FILTER0"."PART_KEY_VAL" as TIMESTAMP ... ) = <'2023-03-26 03:30:00'>, so if we push down a timezone-based <'2023-03-26 03:30:00'> and the underlying database evaluates the cast to TIMESTAMP properly, we get the expected result.

However, when I switch the backing DB to Postgres, the direct SQL throws an exception:
{noformat}
Caused by: org.postgresql.util.PSQLException: ERROR: operator does not exist: timestamp without time zone = character varying
  Hint: No operator matches the given name and argument types. You might need to add explicit type casts.
  Position: 662
{noformat}
MySQL fails as well:
{code:java}
Caused by: java.sql.SQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'TIMESTAMP) else null end) as TIMESTAMP) = '2023-03-26 03:30:00'))' at line 1
{code}
So I think we should fix the errors on the direct SQL path and make the timestamp/date handling timezone agnostic on the JDO path.
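To make the timezone sensitivity concrete, here is a minimal java.time sketch (DstGapDemo is a made-up demo class, not Hive code): 2023-03-26 02:30 does not exist as a wall-clock time in Europe/Paris, and a zone-aware resolution moves it forward by the length of the DST gap. Assuming the literal passes through a timezone-sensitive conversion somewhere before it is pushed down, this is one way the 02:30 written by the INSERT could surface as the 03%3A30%3A00 seen in both generated queries:
{code:java}
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Demo only, not Hive code: shows how the Europe/Paris DST gap turns the
// timezone-agnostic 02:30 into 03:30 as soon as a time zone gets involved.
public class DstGapDemo {
  public static void main(String[] args) {
    LocalDateTime local = LocalDateTime.of(2023, 3, 26, 2, 30);
    ZonedDateTime zoned = ZonedDateTime.of(local, ZoneId.of("Europe/Paris"));
    System.out.println(local); // 2023-03-26T02:30
    System.out.println(zoned); // 2023-03-26T03:30+02:00[Europe/Paris], shifted by the 02:00 -> 03:00 gap
  }
}
{code}
On the direct SQL side, the Postgres error shows that the right-hand side of the comparison reaches the database as character varying. Purely as an illustration of that type mismatch (fetchPartitionIds is a hypothetical helper, not the actual MetaStoreDirectSql change), binding the pushed-down value as a typed TIMESTAMP parameter makes the comparison typecheck on Postgres; keeping the value timezone agnostic end to end is a separate concern:
{code:java}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;

// Hypothetical helper, not Hive code: compares the partition key value as a real
// TIMESTAMP and binds the literal as a TIMESTAMP parameter, so Postgres no longer
// sees "timestamp without time zone = character varying".
// (The real generated query also guards non-timestamp values such as
// __HIVE_DEFAULT_PARTITION__ with a CASE expression.)
public class TypedTimestampFilter {
  static void fetchPartitionIds(Connection conn, Timestamp ts) throws SQLException {
    String sql = "select \"PART_ID\" from \"PARTITION_KEY_VALS\""
        + " where \"INTEGER_IDX\" = 0 and cast(\"PART_KEY_VAL\" as TIMESTAMP) = ?";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
      ps.setTimestamp(1, ts);
      try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
          System.out.println(rs.getLong("PART_ID"));
        }
      }
    }
  }
}
{code}
The MySQL failure looks like a dialect issue on top of that: as far as I know, MySQL's CAST accepts DATETIME but not TIMESTAMP as a target type, which would explain why the generated cast(... as TIMESTAMP) is rejected as a syntax error there.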
> DirectSQL and JDO results are different when fetching partitions by timestamp in DST shift
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-27775
>                 URL: https://issues.apache.org/jira/browse/HIVE-27775
>             Project: Hive
>          Issue Type: Bug
>          Components: Standalone Metastore
>    Affects Versions: 4.0.0-beta-1
>            Reporter: Stamatis Zampetakis
>            Assignee: Zhihua Deng
>            Priority: Critical
>
> DirectSQL and JDO results are different when fetching partitions by timestamp in DST shift.
> {code:sql}
> --! qt:timezone:Europe/Paris
> CREATE EXTERNAL TABLE payments (card string) PARTITIONED BY(txn_datetime TIMESTAMP) STORED AS ORC;
> INSERT into payments VALUES('3333-4444-2222-9999', '2023-03-26 02:30:00');
> SELECT * FROM payments WHERE txn_datetime = '2023-03-26 02:30:00';
> {code}
> The '2023-03-26 02:30:00' is a timestamp that in Europe/Paris timezone falls exactly in the middle of the DST shift. In this particular timezone this date time never really exists since we are jumping directly from 02:00:00 to 03:00:00. However, the TIMESTAMP data type in Hive is timezone agnostic (https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types) so it is a perfectly valid timestamp that can be inserted in a table and we must be able to recover it back.
> For the SELECT query above, partition pruning kicks in and calls the ObjectStore#getPartitionsByExpr method in order to fetch the respective partitions matching the timestamp from HMS.
> The tests however reveal that DirectSQL and JDO paths are not returning the same results leading to an exception when VerifyingObjectStore is used. According to the error below DirectSQL is able to recover one partition from HMS (expected) while JDO/ORM returns empty (not expected).
> {noformat}
> 2023-10-06T03:51:19,406 ERROR [80252df4-3fdc-4971-badf-ad67ce8567c7 main] metastore.VerifyingObjectStore: Lists are not the same size: SQL 1, ORM 0
> 2023-10-06T03:51:19,409 ERROR [80252df4-3fdc-4971-badf-ad67ce8567c7 main] metastore.RetryingHMSHandler: MetaException(message:Lists are not the same size: SQL 1, ORM 0)
>     at org.apache.hadoop.hive.metastore.VerifyingObjectStore.verifyLists(VerifyingObjectStore.java:148)
>     at org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:88)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
>     at com.sun.proxy.$Proxy57.getPartitionsByExpr(Unknown Source)
>     at org.apache.hadoop.hive.metastore.HMSHandler.get_partitions_spec_by_expr(HMSHandler.java:7330)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:98)
>     at org.apache.hadoop.hive.metastore.AbstractHMSHandlerProxy.invoke(AbstractHMSHandlerProxy.java:82)
>     at com.sun.proxy.$Proxy59.get_partitions_spec_by_expr(Unknown Source)
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartitionsSpecByExprInternal(HiveMetaStoreClient.java:2472)
>     at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreClientWithLocalCache.getPartitionsSpecByExprInternal(HiveMetaStoreClientWithLocalCache.java:396)
>     at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getPartitionsSpecByExprInternal(SessionHiveMetaStoreClient.java:2279)
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsSpecByExpr(HiveMetaStoreClient.java:2484)
>     at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.listPartitionsSpecByExpr(SessionHiveMetaStoreClient.java:1346)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:213)
>     at com.sun.proxy.$Proxy60.listPartitionsSpecByExpr(Unknown Source)
>     at org.apache.hadoop.hive.ql.metadata.Hive.getPartitionsByExpr(Hive.java:4507)
>     at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.getPartitionsFromServer(PartitionPruner.java:457)
>     at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:230)
>     at org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.computePartitionList(RelOptHiveTable.java:480)
>     at org.apache.hadoop.hive.ql.optimizer.calcite.rules.HivePartitionPruneRule.perform(HivePartitionPruneRule.java:63)
>     at org.apache.hadoop.hive.ql.optimizer.calcite.rules.HivePartitionPruneRule.onMatch(HivePartitionPruneRule.java:46)
>     at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333)
>     at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542)
>     at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407)
>     at org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:243)
>     at org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127)
>     at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202)
>     at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189)
>     at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2471)
>     at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2430)
>     at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:1968)
>     at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1712)
>     at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1593)
>     at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131)
>     at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914)
>     at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180)
>     at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126)
>     at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1345)
>     at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:572)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13048)
>     at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:467)
>     at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>     at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
>     at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
>     at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471)
>     at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436)
>     at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430)
>     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121)
>     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:227)
>     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356)
>     at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:733)
>     at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:703)
>     at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:115)
>     at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
>     at org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
> {noformat}


--
This message was sent by Atlassian Jira
(v8.20.10#820010)