Re: Remove REGEX Column Specification

David Mollitor Tue, 14 Apr 2020 09:29:14 -0700

Hey Zoltan,

Thanks for the feedback and for sharing HIVE-16496.


I think HIVE-16496 is a better approach because it allows for the standard
SQL behavior of object identifiers, but the SQL syntax is expanded (instead
of overloaded) to provide this feature.

Also, if a user would like to do some sort of regex, they can query the
information_schema (if/when Hive gets that).
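
As a rough illustration of that idea (a sketch only, not an existing Hive
feature): assuming the column names of a table have already been fetched from
some metadata source such as an information_schema query, a client could apply
the regex itself and generate an explicit column list. The helper name
build_select and the sample table and column names below are hypothetical.

```python
import re

def build_select(table, columns, exclude_regex):
    """Build a SELECT that lists every column not fully matched by exclude_regex.

    Assumes identifiers contain no back ticks, so plain quoting is enough.
    """
    pattern = re.compile(exclude_regex)
    kept = [c for c in columns if not pattern.fullmatch(c)]
    if not kept:
        raise ValueError("pattern excludes every column")
    column_list = ", ".join("`" + c + "`" for c in kept)
    return "SELECT " + column_list + " FROM `" + table + "`"

# Hypothetical column list, as it might come back from a metadata query.
print(build_select("sales", ["ds", "hr", "amount", "region"], "ds|hr"))
```

The regex is applied by the client before the query is sent, so the statement
Hive parses contains only literal identifiers.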

Also, I just re-read my previous email, and I apologize: I provided the
wrong JIRA.  The correct one for removal is:

https://issues.apache.org/jira/browse/HIVE-23176

Thanks.



David

On Tue, Apr 14, 2020 at 12:16 PM Zoltan Haindrich <k...@rxd.hu> wrote:

> Hey,
>
> I don't want to protect this feature - but I think it could be useful;
> probably it would be ok to remove it but we should provide something else
> instead - I think this is the only way to "exclude" some specific columns
> from the output - without listing all the columns.
>
> How much do users actually use this feature?
>
> We had a somewhat related discussion a few years ago:
> https://issues.apache.org/jira/browse/HIVE-16496
>
> cheers,
> Zoltan
>
> On 4/13/20 3:56 PM, David Mollitor wrote:
> > Hello Gang,
> >
> > I've been tracking a lot of issues recently regarding qualified table
> > names, table names using back ticks, and other similar circumstances.
> >
> > I've looked into trying to address some of these and noted that these
> > issues go way back and go all the way down to the core of Hive.
> >
> > To start with, I wanted to use the ANTLR grammar to address some of these
> > issues and to standardize behavior across all queries.  For example,
> > there is currently a patch that disallows table names from having a 'dot'
> > in the name.  I'm not 100% sure it applies to all queries, so I wanted to
> > codify this restriction in the parser grammar.  So it got me looking at
> > the grammar.
> >
> > In parallel, I also tried to build a supplemental parser in Java for
> > parsing table names (HIVE-23150) and I was hitting some weird and
> > confusing edge cases bubbling up from the parser.  I eventually traced
> > them back to the fact that there are a lot of weird rules around table
> > names in the grammar, including something called "REGEX Column
> > Specification."
> >
> > This feature is problematic as it blindly labels most table names as
> > being a regex.  It really should only apply to column names, but the
> > grammar defines a table name as also possibly being a regex.  There is a
> > lot of ambiguity because a table named "a" could be a literal value or a
> > legal regex.  When a table name is defined as a regex, a different code
> > path is taken from when a table name is considered to be a literal value.
> > I first saw this issue in a qtest where a table named `s/c` was producing
> > a different result than a table named `s+c`.
> >
> > This regex feature is not something I've seen in MySQL or Postgres.  In
> > MySQL, a table name surrounded with back ticks can contain just about
> > any UTF-8 character, so it's not really feasible to tell, without some
> > kind of SQL hint, whether a table name is a regex or a literal value.
> >
> > This feature adds a lot of ambiguity and complexity, is not supported by
> > other major RDBMSs, and adds only very minor benefit.  I also hope to
> > move Hive in a direction of fully supporting UTF-8.
> >
> > I have put a patch up to remove it:
> > https://issues.apache.org/jira/browse/HIVE-23183
> >
> >
> > References:
> >
> > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-REGEXColumnSpecification
> >
> >
> > https://dev.mysql.com/doc/refman/8.0/en/identifiers.html
> >
> >
> > Thanks,
> > David
> >
>
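
For illustration, the ambiguity described in the quoted message can be
reproduced outside Hive. The following is a minimal Python sketch, not Hive's
parser code; the helper names and the sample table names are hypothetical. It
shows that an identifier such as s+c selects different tables when read as a
regex than when read as a literal, while s/c and a happen to select the same
table either way.

```python
import re

def matches_as_regex(identifier, candidates):
    """Return the candidates fully matched when `identifier` is read as a regex."""
    pattern = re.compile(identifier)
    return [name for name in candidates if pattern.fullmatch(name)]

def matches_as_literal(identifier, candidates):
    """Return the candidates equal to `identifier` when it is read as a literal."""
    return [name for name in candidates if name == identifier]

# Hypothetical set of existing table names.
tables = ["a", "sc", "ssc", "s+c", "s/c"]

for ident in ["a", "s/c", "s+c"]:
    print(ident, matches_as_regex(ident, tables), matches_as_literal(ident, tables))
```

Nothing in the text of the identifier says which reading was intended, which
is why a parser that accepts both has to guess.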
