findepi commented on code in PR #13706: URL: https://github.com/apache/datafusion/pull/13706#discussion_r1879654987
########## docs/source/user-guide/sql/dialect.md: ########## @@ -0,0 +1,53 @@ +<!--- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. +--> + +# SQL Dialect + +The included SQL supported in Apache DataFusion mostly follows the [PostgreSQL +SQL dialect], including: + +- The sql parser and [SQL planner] Review Comment: ```suggestion - The SQL parser and [SQL planner] ``` ########## docs/source/user-guide/sql/dialect.md: ########## @@ -0,0 +1,53 @@ +<!--- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. 
+--> + +# SQL Dialect + +The included SQL supported in Apache DataFusion mostly follows the [PostgreSQL +SQL dialect], including: + +- The sql parser and [SQL planner] +- Type checking, analyzer, and type coercions +- Semantics of functions bundled with DataFusion + +Notable exceptions: + +- Array/List functions and semantics follow the [DuckDB SQL dialect]. Review Comment: Not quite? DuckDB array seems to be fixed size. Is this saying DF array follows DuckDB's list semantics? ########## docs/source/user-guide/sql/dialect.md: ########## @@ -0,0 +1,53 @@ +<!--- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. +--> + +# SQL Dialect + +The included SQL supported in Apache DataFusion mostly follows the [PostgreSQL +SQL dialect], including: + +- The sql parser and [SQL planner] +- Type checking, analyzer, and type coercions +- Semantics of functions bundled with DataFusion + +Notable exceptions: + +- Array/List functions and semantics follow the [DuckDB SQL dialect]. +- DataFusion's type system is based on the [Apache Arrow type system], and the mapping to PostgrSQL types is not always 1:1. 
+- DataFusion has its own syntax (dialect) for certain operations (like [`CREATE EXTERNAL TABLE`]) + +As Apache DataFusion is designed to be fully customizable, systems built on +DataFusion can and do implement different SQL semantics. Using DataFusion's APs, Review Comment: ```suggestion DataFusion can and do implement different SQL semantics. Using DataFusion's APIs, ``` ########## docs/source/user-guide/sql/dialect.md: ########## @@ -0,0 +1,38 @@ +<!--- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. +--> + +# SQL Dialect + +By default, Apache DataFusion follows the [PostgreSQL SQL dialect]. +For Array/List functions and semantics, it follows the [DuckDB SQL dialect]. + +[duckdb sql dialect]: https://duckdb.org/docs/sql/functions/array +[postgresql sql dialect]: https://www.postgresql.org/docs/current/sql.html Review Comment: I agree we shouldn't follow PostgreSQL arrays as the guide. PostgreSQL arrays are not structured. You can declare dimensions of an array column, but that's advisory only. Each value can be an array of different dimensions. That won't work well with the Arrow type system - there is no Arrow type that corresponds to the PostgreSQL array type. 
But we need to admit the fact that we cannot just say "we follow dialect/database X, but for subcomponent we follow dialect/database Y", as this leads to inconsistencies. Yes, we will address the needs of downstream projects by providing the necessary extension points. In the worst case, they can override "everything". But DataFusion also has its own frontend and it should strive to be consistent. Consistency is important both to end users and to people building on top of DataFusion. ########## docs/source/user-guide/sql/dialect.md: ########## @@ -0,0 +1,53 @@ +<!--- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. +--> + +# SQL Dialect + +The included SQL supported in Apache DataFusion mostly follows the [PostgreSQL +SQL dialect], including: + +- The sql parser and [SQL planner] +- Type checking, analyzer, and type coercions +- Semantics of functions bundled with DataFusion + +Notable exceptions: + +- Array/List functions and semantics follow the [DuckDB SQL dialect]. +- DataFusion's type system is based on the [Apache Arrow type system], and the mapping to PostgrSQL types is not always 1:1. 
+- DataFusion has its own syntax (dialect) for certain operations (like [`CREATE EXTERNAL TABLE`]) + +As Apache DataFusion is designed to be fully customizable, systems built on +DataFusion can and do implement different SQL semantics. Using DataFusion's APs, +you can provide alternate function definitions, type rules, and/or SQL syntax +that matches other systems such as Apache Spark or MySQL or your own custom +semantics. + +[postgresql sql dialect]: https://www.postgresql.org/docs/current/sql.html +[sql planner]: https://docs.rs/datafusion/latest/datafusion/sql/planner/struct.SqlToRel.html +[duckdb sql dialect]: https://duckdb.org/docs/sql/functions/array +[apache arrow type system]: https://arrow.apache.org/docs/format/Columnar.html#data-types +[`create external table`]: ddl.md#create-external-table + +## Rationale + +SQL Engines have a choice to either use an existing SQL dialect or define their +own. Using an existing dialect may not fit perfectly as it is hard to match +semantics exactly (need bug-for-bug compatibility), and is likely not what all Review Comment: > rather well aligned w/ the SQL standard (at least that's my personal impression, after having faced MySQL) Mostly true (but I know of some deviations). If we wanted something "executable but also aligned with SQL std", I'd recommend Trino. I kind of assumed the PostgreSQL ship has sailed and we're just retro-documenting. But if the ball (choice) is still in play, my vote goes to Trino as a good reference implementation. ########## docs/source/user-guide/sql/dialect.md: ########## @@ -0,0 +1,53 @@ +<!--- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. 
You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. +--> + +# SQL Dialect + +The included SQL supported in Apache DataFusion mostly follows the [PostgreSQL +SQL dialect], including: + +- The sql parser and [SQL planner] +- Type checking, analyzer, and type coercions +- Semantics of functions bundled with DataFusion + +Notable exceptions: + +- Array/List functions and semantics follow the [DuckDB SQL dialect]. +- DataFusion's type system is based on the [Apache Arrow type system], and the mapping to PostgrSQL types is not always 1:1. Review Comment: ```suggestion - DataFusion's type system is based on the [Apache Arrow type system], and the mapping to PostgreSQL types is not always 1:1. ```
