Hey Zoltan,

Thanks for the feedback and for sharing HIVE-16496.

I think HIVE-16496 is a better approach because it keeps the standard SQL
behavior of object identifiers, while the SQL syntax is expanded (instead of
overloaded) to provide this feature. Also, if a user would like to do some
sort of regex matching, they can query the information_schema (if/when Hive
gets that).

Also, I just re-read my previous email and I do apologize: I provided the
wrong JIRA. The correct one for the removal is:
https://issues.apache.org/jira/browse/HIVE-23176

Thanks.
David

On Tue, Apr 14, 2020 at 12:16 PM Zoltan Haindrich <k...@rxd.hu> wrote:

> Hey,
>
> I don't want to protect this feature - but I think it could be useful;
> probably it would be ok to remove it but we should provide something else
> instead - I think this is the only way to "exclude" some specific columns
> from the output - without listing all the columns.
>
> How much do users actually use this feature?
>
> We had a somewhat related discussion a few years ago:
> https://issues.apache.org/jira/browse/HIVE-16496
>
> cheers,
> Zoltan
>
> On 4/13/20 3:56 PM, David Mollitor wrote:
> > Hello Gang,
> >
> > I've been tracking a lot of issues recently regarding qualified table
> > names, table names using back ticks, and other similar circumstances.
> >
> > I've looked into trying to address some of these and noted that these
> > issues go way back and go all the way down to the core of Hive.
> >
> > To start with, I wanted to use the ANTLR grammar to address some of these
> > issues and to standardize behavior across all queries. For example, there
> > is currently a patch that disallows table names from having a 'dot' in
> > the name. I'm not 100% sure it applies to all queries, so I wanted to
> > codify this restriction in the parser grammar. So it got me looking at
> > the grammar.
> >
> > In parallel, I also tried to build a supplemental parser in Java for
> > parsing table names (HIVE-23150) and I was hitting some weird, and
> > confusing, edge cases bubbling up from the parser. I eventually traced it
> > back to the fact that there are a lot of weird rules around table names
> > in the grammar including something called "REGEX Column Specification."
> >
> > This feature is problematic as it blindly labels most table names as
> > being a regex. It really should only apply to column names, but the
> > grammar defines a table name as also possibly being a regex. There is a
> > lot of ambiguity because a table named "a" could be a literal value or a
> > legal regex. When a table name is defined as a regex, a different code
> > path is taken from when a table name is considered to be a literal value.
> > Where I first saw this issue was in a qtest where a table name `s/c` was
> > producing a different result than a table named `s+c`.
> >
> > This regex feature is not something I've seen in MySQL or Postgres. In
> > MySQL, any table name surrounded with a back tick can be just about any
> > UTF-8 character, so it's not really feasible to tell, without some kind
> > of SQL hint, that this table name is a regex or a literal value.
> >
> > This feature adds a lot of ambiguity and complexity, it is not supported
> > by other major RDBMS, and it adds only very minor benefit. I also hope to
> > move Hive in a direction of fully supporting UTF-8.
> >
> > I have put a patch up to remove it:
> > https://issues.apache.org/jira/browse/HIVE-23183
> >
> > References:
> >
> > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-REGEXColumnSpecification
> >
> > https://dev.mysql.com/doc/refman/8.0/en/identifiers.html
> >
> > Thanks,
> > David