Hi Jay,

Yes, the Hive parser is stripping out the "as". As Gopal explained in the earlier thread, the parser is used to translate a SQL string into an AST, i.e., a tree that the compiler can understand. That said, as long as the compiler that follows knows "oh, this is a renaming", it is fine. In other words, it is OK if you want to add the "as" back, as long as your compiler understands it. In Hive's case, IMHO, I do not think it is necessary to add it back, because the compiler already knows what it means.

Best,
Pengcheng
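To see Pengcheng's point concretely, here is a minimal sketch, assuming the Hive 2.x-era ParseDriver API in org.apache.hadoop.hive.ql.parse (the class below is illustrative, not part of Hive): it parses both spellings and dumps the trees, which should come out identical, with the alias c attached to TOK_SUBQUERY and no node for the AS keyword.

import org.apache.hadoop.hive.ql.parse.ASTNode;
import org.apache.hadoop.hive.ql.parse.ParseDriver;
import org.apache.hadoop.hive.ql.parse.ParseException;

public class AliasAstCheck {
  public static void main(String[] args) throws ParseException {
    ParseDriver pd = new ParseDriver();
    // Jay's example, with and without the optional AS keyword.
    ASTNode withAs = pd.parse("select a from (select a from b) as c");
    ASTNode withoutAs = pd.parse("select a from (select a from b) c");
    // dump() pretty-prints the tree; both dumps should be identical,
    // showing a TOK_SUBQUERY node that carries the alias but no AS token.
    System.out.println(withAs.dump());
    System.out.println(withoutAs.dump());
  }
}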
On Fri, Apr 13, 2018 at 6:29 AM, Furcy Pin <pin.fu...@gmail.com> wrote:
> Hi Jay,
>
> I noticed the same thing when I built my tool, and it makes sense: syntactically they are both equivalent, so the Hive parser does not care. You could probably update the HiveParser so that it keeps the information in the AST, but not without breaking every part of the code that reads that AST, I'm afraid.
>
> In the long run, I believe it would be really valuable if someone had the courage to completely rewrite Hive parsing using best practices and good tools (e.g. ANTLR 4), but I doubt that it will ever happen, unfortunately :-(
>
> On 13 April 2018 at 14:25, Jay Green-Stevens <t-jgreenstev...@hotels.com> wrote:
>> Afternoon all,
>>
>> We have successfully managed to build a Java tool which will translate a Hive query into a syntax tree, and then turn this back into a Hive query equivalent to the input.
>>
>> But we have found that for a query such as
>>
>> select a
>> from (
>>   select a
>>   from b
>> ) as c
>>
>> the Hive parser is stripping out the ‘as’ when building the tree. This then means that when the query string is rebuilt the output is ‘select a from (select a from b) c’, and although this is technically valid it is not equivalent to the input query.
>>
>> Is there any way we can fix this issue?
>> Could we change the parser in some way to stop the ‘as’ from being stripped out?
>>
>> Any ideas would be greatly appreciated!
>>
>> Thanks,
>>
>> Jay
>>
>>
>> From: Elliot West <tea...@gmail.com>
>> Date: 19 March 2018 at 21:33
>> Subject: Re: HQL parser internals
>> To: user@hive.apache.org
>>
>> Hello again,
>>
>> We're now testing our system against a corpus of Hive SQL statements in an effort to quickly highlight edge cases, limitations etc. We're finding that org.apache.hadoop.hive.ql.parse.ParseUtils is stumbling on variables such as ${hiveconf:varname}. Are variable substitutions handled prior to parsing or within the parser itself? If in a pre-processing stage, are there any code or utility classes within Hive that we can use as a reference, or to provide this functionality?
>>
>> Cheers,
>>
>> Elliot.
>>
>> On 19 February 2018 at 11:10, Elliot West <tea...@gmail.com> wrote:
>> Thank you all for your rapid responses; some really useful information and pointers in there.
>>
>> We'll keep the list updated with our progress.
>>
>> On 18 February 2018 at 19:00, Dharmesh Kakadia <dhkaka...@gmail.com> wrote:
>> +1 for using ParseDriver for this. I also have used it to intercept and augment the query AST.
>>
>> Also, I would echo others' sentiment that it's quite ugly. It would be great if we could refactor/standardize this. That would make integrating other systems a lot easier.
>>
>> Thanks,
>> Dharmesh
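As a rough illustration of the kind of interception and augmentation Dharmesh mentions, here is a hedged sketch assuming Hive's ANTLR-3-based ASTNode API; the class and method names are made up for the example, not Hive API. It walks the tree returned by ParseDriver.parse, prints every node, and flags subquery aliases (in the Hive grammar the alias is typically the second child of TOK_SUBQUERY); the same walk is the natural place to rewrite or annotate nodes.

import org.apache.hadoop.hive.ql.parse.ASTNode;
import org.apache.hadoop.hive.ql.parse.HiveParser;

public final class AstWalker {

  private AstWalker() {}

  // Recursively prints each node's token text, indented by depth, and
  // reports the alias of every subquery it encounters.
  public static void visit(ASTNode node, int depth) {
    System.out.println(indent(depth) + node.getText());
    if (node.getType() == HiveParser.TOK_SUBQUERY && node.getChildCount() == 2) {
      System.out.println(indent(depth) + "-> subquery alias: "
          + node.getChild(1).getText());
    }
    for (int i = 0; i < node.getChildCount(); i++) {
      visit((ASTNode) node.getChild(i), depth + 1);
    }
  }

  private static String indent(int depth) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < depth; i++) {
      sb.append("  ");
    }
    return sb.toString();
  }
}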
>> On Sat, Feb 17, 2018 at 12:07 AM, Furcy Pin <pin.fu...@gmail.com> wrote:
>> Hi Elliot,
>>
>> Actually, I have done quite similar work on custom Hive parsing; you should have a look at my project: https://github.com/flaminem/flamy
>>
>> The Hive parsing related stuff is here: https://github.com/flaminem/flamy/tree/master/src/main/scala/com/flaminem/flamy/parsing/hive
>> A good starting point to see how to parse queries is here:
>> https://github.com/flaminem/flamy/blob/master/src/main/scala/com/flaminem/flamy/parsing/hive/PopulateParserInfo.scala#L492
>>
>> Basically, all you need is to use an org.apache.hadoop.hive.ql.parse.ParseDriver:
>>
>> val pd: ParseDriver = new ParseDriver
>> val tree: ASTNode = pd.parse(query, hiveContext)
>>
>> You then get the ASTNode, which you can freely traverse and change. Also, I must say that it is quite ugly to manipulate, and the Presto parser seems to be much better designed (but it is not the same syntax, unfortunately). I recommend looking at it to get better design ideas.
>>
>> If you want to enrich your Hive syntax like I did (I wanted to be able to parse ${VARS} in queries), you will not be able to use the HiveParser without some workaround. What I did was replace these ${VARS} with strings "${VARS}" that the HiveParser would agree to parse, and that I could recognize afterwards...
>>
>> Also, if you are familiar with Scala, I recommend using it; it helps a lot...
>>
>> For instance, I have this class that transforms an AST back into a string query:
>> https://github.com/flaminem/flamy/blob/master/src/main/scala/com/flaminem/flamy/parsing/hive/ast/SQLFormatter.scala
>> I could never have done something that good looking in Java...
>>
>> Finally, this method helps a lot to understand how the hell the AST works:
>> https://github.com/flaminem/flamy/blob/master/src/main/scala/com/flaminem/flamy/parsing/hive/HiveParserUtils.scala#L593
>>
>> Make sure to write *tons* of unit tests too, you'll need them.
>>
>> Hope this helps,
>>
>> Furcy
>>
>> On 16 February 2018 at 21:20, Gopal Vijayaraghavan <gop...@apache.org> wrote:
>>
>> > However, ideally we wish to manipulate the original query as delivered by the user (or as close to it as possible), and we’re finding that the tree has been modified significantly by the time it hits the hook
>>
>> That's CBO. It takes the Query -> AST -> Calcite Tree -> AST -> hook path; the bushy join conversion is already done by the time the hook gets called. We need a parser hook to hook in ahead of CBO, not a Semantic Analyzer hook.
>>
>> > Additionally we wish to track back ASTNodes to the character sequences in the source HQL that were their origin (where sensible), and ultimately hope to be able to regenerate the query text from the AST.
>>
>> I started work on a Hive unparser a while back based on this class, but it is a world of verbose coding:
>>
>> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java#L850
>>
>> If you're doing active work on this, I'd like to help, because I need the AST -> query to debug CBO.
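To give a sense of why Gopal calls the unparser a world of verbose coding, here is a hedged, deliberately naive sketch of the AST -> text direction (nothing like a real unparser; it only concatenates leaf token text depth-first, which is enough to eyeball small trees but not to regenerate runnable HQL, since a faithful unparser has to handle nearly every TOK_* node explicitly):

import org.apache.hadoop.hive.ql.parse.ASTNode;

public final class RoughUnparser {

  private RoughUnparser() {}

  // Collects the text of leaf tokens in depth-first order. Most synthetic
  // TOK_* wrappers have children and therefore contribute no text here;
  // a real unparser has to special-case them one by one.
  public static String unparse(ASTNode node) {
    StringBuilder sb = new StringBuilder();
    append(node, sb);
    return sb.toString().trim();
  }

  private static void append(ASTNode node, StringBuilder sb) {
    if (node.getChildCount() == 0) {
      sb.append(node.getText()).append(' ');
      return;
    }
    for (int i = 0; i < node.getChildCount(); i++) {
      append((ASTNode) node.getChild(i), sb);
    }
  }
}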
>> > The use case, if you are interested, is a mutation testing framework for HQL. The testing of mutants is operational, but now we need to report on survivors, hence the need to track back from specific query elements to character sequences in the original query string.
>>
>> This sounds a lot like the fuzzing random-query-gen used in Cloudera to have Impala vs Hive bug-for-bug compat.
>>
>> https://cwiki.apache.org/confluence/download/attachments/27362054/Random%20Query%20Gen-%20Hive%20Meetup.pptx
>>
>> Cheers,
>> Gopal
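On the track-back problem Elliot describes (mapping query elements back to character sequences in the original string), here is a hedged sketch assuming Hive's ANTLR-3 parser classes (ParseDriver, Context, and the TokenRewriteStream the Context retains); the class below is illustrative. Parse with a Context so the token stream is kept, then slice it with a node's token start/stop indices.

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.Context;
import org.apache.hadoop.hive.ql.parse.ASTNode;
import org.apache.hadoop.hive.ql.parse.ParseDriver;

public class OriginalTextLookup {
  public static void main(String[] args) throws Exception {
    String query = "select a from (select a from b) as c";
    Context ctx = new Context(new HiveConf());
    ASTNode root = new ParseDriver().parse(query, ctx);
    // The parse result is typically a wrapper node whose first child is
    // TOK_QUERY; any subtree's token indices can be used the same way.
    ASTNode tokQuery = (ASTNode) root.getChild(0);
    // The indices refer to tokens in the retained stream, not characters,
    // so the slice comes straight from the token buffer and should keep
    // the user's original spelling, including an explicit AS.
    String originalText = ctx.getTokenRewriteStream().toString(
        tokQuery.getTokenStartIndex(), tokQuery.getTokenStopIndex());
    System.out.println(originalText);
  }
}

This is roughly the mechanism Hive's own semantic analyzer leans on (together with UnparseTranslator) to preserve the original text of view definitions.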