[ https://issues.apache.org/jira/browse/HIVE-5304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Shelukhin updated HIVE-5304: ----------------------------------- Description: [removed old description] Hive JDOQL filter pushdown and direct SQL may end up pushing StringCol op 'SomeString' to underlying SQL datastore. However, the datastore may handle these differently based on the encoding and collation used for the columns of the database. So, query results can change depending on the underlying store for the metastore, and OS used I am assuming that byte-order sort if the correct way to order things. Our MySQL script specifies _bin collation, and Postgres, as far as I see, defaults to "C"; both of those are byte-order collations. However, MySQL by default doesn't use _bin collation, so if database is auto-created, the order of things is going to change. Derby also uses the non-byte-order by drop_partitions_filter.q illustrates this problem. In byte order collation (proper way) USA is sorted before Uganda was: Hive uses JDOQL filters to optimize partition retrieval; recently direct SQL was added to optimize it further. Both of these methods may end up pushing StringCol op 'SomeString' to underlying SQL datastore. Many paths also pushes order by-s, although these are not as problematic. The problem is that different datastores handle string compares differently. While testing on Postgres, I see that results in different things, from innocent like order changes in "show partitions", to more serious like {code} alter table ptestfilter drop partition (c>='US', d<='2') {code} in drop_partitions_filter.q - in Derby, with which the .q.out file was generated, it drops "c=Uganda/d=2"; this also passes on MySQL (I ran tests with autocreated db); on Postgres with a db from the script it doesn't. Looks like we need to enforce collation in partition names and part_key_values-es; both in the create scripts, as well as during autocreate (via package.jdo?) EDIT: also affected are - show indexes. So all names need to be taken care of Then; describe_comment_nonascii.q fails against MySQL on autocreated db due to problems with commends. > Hive results can depend on metastore's underlying datastore, if autocreate is > used > ---------------------------------------------------------------------------------- > > Key: HIVE-5304 > URL: https://issues.apache.org/jira/browse/HIVE-5304 > Project: Hive > Issue Type: Bug > Components: Metastore > Reporter: Sergey Shelukhin > > [removed old description] > Hive JDOQL filter pushdown and direct SQL may end up pushing StringCol op > 'SomeString' to underlying SQL datastore. However, the datastore may handle > these differently based on the encoding and collation used for the columns of > the database. > So, query results can change depending on the underlying store for the > metastore, and OS used > I am assuming that byte-order sort if the correct way to order things. > Our MySQL script specifies _bin collation, and Postgres, as far as I see, > defaults to "C"; both of those are byte-order collations. > However, MySQL by default doesn't use _bin collation, so if database is > auto-created, the order of things is going to change. > Derby also uses the non-byte-order by > drop_partitions_filter.q illustrates this problem. In byte order collation > (proper way) USA is sorted before Uganda -- This message was sent by Atlassian JIRA (v6.1#6144)