Viicos opened a new issue, #2236: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/2236
https://github.com/apache/datafusion-sqlparser-rs/pull/235 introduced support for HiveQL, and modified how CTEs are parsed to [parse an additional `FROM` keyword](https://github.com/apache/datafusion-sqlparser-rs/commit/9d9d681cbabf31d1d07ad166da7a1fd87d07d960#diff-4a04259da480a6b794a2e947e4cc03eff4d1aa9330836f5b91cac68c5398193fR2182-L2173). A user [raised some questions](https://github.com/apache/datafusion-sqlparser-rs/pull/235#issuecomment-1189199817) on the PR, and looking at the documentation links provided, it seems like HiveQL has the ability to use `FROM` directly after a CTE, but it is unclear what for. [This link](https://docs-archive.cloudera.com/HDPDocuments/HDP3/HDP-3.0.1/using-hiveql/content/hive_create_a_table_using_a_cte.html) shows an example of inserting from a CTE, and [this one](https://cwiki.apache.org/confluence/display/Hive/Common+Table+Expression) shows a `SELECT` statement using the _FROM_ first variant (but it also seems like the Hive dialect doesn't have [`supports_from_first_select()`](https://github.com/apache/datafusion-sqlparser-rs/blob/d9b53a0cdb369124d9b6ce6237959e66bad859af/src/dialect/mod.rs#L640-L649)?).
The issue is that when using the generic dialect (or dialects supporting from first), the parsing of the FROM keyword breaks, e.g.:

```sql
WITH test AS (FROM t SELECT a) FROM test SELECT a
```

The AST looks like (reduced for visibility):

```rs
Query(
    Query {
        with: Some(
            With {
                with_token: TokenWithSpan {
                    token: Word(
                        Word {
                            value: "WITH",
                            quote_style: None,
                            keyword: WITH,
                        },
                    ),
                    span: Span(Location(1,1)..Location(1,5)),
                },
                recursive: false,
                cte_tables: [
                    Cte {
                        alias: TableAlias {
                            name: Ident {
                                value: "test",
                                quote_style: None,
                                span: Span(Location(1,6)..Location(1,10)),
                            },
                            columns: [],
                        },
                        query: Query {
                            with: None,
                            body: Select(
                                Select {
                                    select_token: Some(
                                        TokenWithSpan {
                                            token: Word(
                                                Word {
                                                    value: "SELECT",
                                                    quote_style: None,
                                                    keyword: SELECT,
                                                },
                                            ),
                                            span: Span(Location(1,22)..Location(1,28)),
                                        },
                                    ),
                                    projection: [
                                        UnnamedExpr(
                                            Identifier(
                                                Ident {
                                                    value: "a",
                                                    quote_style: None,
                                                    span: Span(Location(1,29)..Location(1,30)),
                                                },
                                            ),
                                        ),
                                    ],
                                    from: [
                                        TableWithJoins {
                                            relation: Table {
                                                name: ObjectName(
                                                    [
                                                        Identifier(
                                                            Ident {
                                                                value: "t",
                                                                quote_style: None,
                                                                span: Span(Location(1,20)..Location(1,21)),
                                                            },
                                                        ),
                                                    ],
                                                ),
                                            },
                                        },
                                    ],
                                    flavor: FromFirst,
                                },
                            ),
                        },
                        from: Some( // CTE parsed the FROM
                            Ident {
                                value: "test",
                                quote_style: None,
                                span: Span(Location(1,37)..Location(1,41)),
                            },
                        ),
                        closing_paren_token: TokenWithSpan {
                            token: RParen,
                            span: Span(Location(1,30)..Location(1,31)),
                        },
                    },
                ],
            },
        ),
        body: Select(
            Select {
                select_token: Some(
                    TokenWithSpan {
                        token: Word(
                            Word {
                                value: "SELECT",
                                quote_style: None,
                                keyword: SELECT,
                            },
                        ),
                        span: Span(Location(1,42)..Location(1,48)),
                    },
                ),
                from_token: None, // The actual SELECT query doesn't have the FROM
                projection: [
                    UnnamedExpr(
                        Identifier(
                            Ident {
                                value: "a",
                                quote_style: None,
                                span: Span(Location(1,49)..Location(1,50)),
                            },
                        ),
                    ),
                ],
                from: [], // and no FROM available
                flavor: Standard,
            },
        ),
    },
)
```

I think the simplest fix (although not ideal according to https://github.com/apache/datafusion-sqlparser-rs/issues/1430) would be to gate the parsing of
the FROM keyword in CTEs only if the current dialect is Hive.
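
To make the proposed gating concrete, here is a minimal sketch of the idea as a toy model. It is not the sqlparser-rs code: the `Dialect` enum, the `Cte` struct and the `parse_cte_tail` function are all hypothetical names invented for illustration, and the input is simplified to a slice of whitespace-separated tokens. The only point it shows is that the trailing `FROM <ident>` gets attached to the CTE when the dialect is Hive and is otherwise left for the outer query.

```rust
// Toy model only: these types are hypothetical and are NOT the sqlparser-rs API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Dialect {
    Generic,
    Hive,
}

#[derive(Debug, PartialEq)]
struct Cte {
    alias: String,
    // Trailing `FROM <ident>` attached to the CTE (the HiveQL form).
    from: Option<String>,
}

/// Handles what follows the closing parenthesis of a CTE.
///
/// `tokens` holds the remaining input. Returns the CTE together with the
/// number of tokens that were consumed from `tokens`.
fn parse_cte_tail(dialect: Dialect, alias: &str, tokens: &[&str]) -> (Cte, usize) {
    let next_is_from = tokens
        .first()
        .map_or(false, |t| t.eq_ignore_ascii_case("FROM"));

    // Proposed gate: only the Hive dialect lets the CTE consume `FROM <ident>`.
    if dialect == Dialect::Hive && next_is_from {
        if let Some(ident) = tokens.get(1) {
            let cte = Cte {
                alias: alias.to_string(),
                from: Some(ident.to_string()),
            };
            return (cte, 2);
        }
    }

    // Any other dialect: consume nothing, so a from-first SELECT that
    // follows the CTE list still sees its own FROM keyword.
    let cte = Cte {
        alias: alias.to_string(),
        from: None,
    };
    (cte, 0)
}
```

With the tokens `FROM test SELECT a` remaining after the CTE body, this sketch consumes nothing under `Dialect::Generic` and consumes `FROM test` under `Dialect::Hive`. Whether a dialect check or a dedicated dialect method is the better fit is left open, given the concerns in issue #1430.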
