alamb commented on code in PR #13664:
URL: https://github.com/apache/datafusion/pull/13664#discussion_r1931145846


##########
datafusion/sqllogictest/test_files/create_external_table.slt:
##########
@@ -33,23 +33,23 @@ statement error DataFusion error: SQL error: 
ParserError\("Missing LOCATION clau
 CREATE EXTERNAL TABLE t STORED AS CSV
 
 # Option value is missing
-statement error DataFusion error: SQL error: ParserError\("Expected: string or 
numeric value, found: \)"\)
+statement error DataFusion error: SQL error: ParserError\("Expected: string or 
numeric value, found: \) at Line: 1, Column: 66"\)

Review Comment:
   😍 



##########
datafusion/common/src/column.rs:
##########
@@ -18,10 +18,11 @@
 //! Column
 
 use arrow_schema::{Field, FieldRef};
+use sqlparser::tokenizer::Span;

Review Comment:
   I think this direct dependence on `sqlparser` is non ideal (because it makes 
the DataFuson `Expr` directly dependent on the SQL parser version.
   
   That being said the only alternative I have is to basically add a `Span` in 
`DataFusion` that is basically a copy/paste version of the one in SQLParser



##########
datafusion/sql/tests/cases/diagnostic.rs:
##########
@@ -0,0 +1,231 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use std::collections::HashMap;
+
+use datafusion_common::{Diagnostic, Result};
+use datafusion_sql::planner::{ParserOptions, SqlToRel};
+use regex::Regex;
+use sqlparser::{
+    dialect::GenericDialect,
+    parser::Parser,
+    tokenizer::{Location, Span},
+};
+
+use crate::{MockContextProvider, MockSessionState};
+
+fn do_query(sql: &'static str) -> Diagnostic {
+    let dialect = GenericDialect {};
+    let statement = Parser::new(&dialect)
+        .try_with_sql(sql)
+        .expect("unable to create parser")
+        .parse_statement()
+        .expect("unable to parse query");
+    let options = ParserOptions {
+        collect_spans: true,
+        ..ParserOptions::default()
+    };
+    let state = MockSessionState::default();
+    let context = MockContextProvider { state };
+    let sql_to_rel = SqlToRel::new_with_options(&context, options);
+    match sql_to_rel.sql_statement_to_plan(statement) {
+        Ok(_) => panic!("expected error"),
+        Err(err) => match err.diagnostic() {
+            Some(diag) => diag.clone(),
+            None => panic!("expected diagnostic"),
+        },
+    }
+}
+
+/// Given a query that contains tag delimited spans, returns a mapping from the
+/// span name to the [`Span`]. Tags are comments of the form `/*tag*/`. In case
+/// you want the same location to open two spans, or close open and open
+/// another, you can use '+' to separate the names (the order matters). The
+/// names of spans can only contain letters, digits, underscores, and dashes.
+///
+///
+/// ## Example
+///
+/// ```rust
+/// let spans = get_spans("SELECT /*myspan*/car/*myspan*/ FROM cars");

Review Comment:
   this is very cool



##########
datafusion/common/src/config.rs:
##########
@@ -250,6 +250,11 @@ config_namespace! {
         /// specified. The Arrow type system does not have a notion of maximum
         /// string length and thus DataFusion can not enforce such limits.
         pub support_varchar_with_length: bool, default = true
+
+        /// When set to true, the source locations relative to the original SQL

Review Comment:
   Is the idea that we will keep this off by default at first as we evaluate 
the impact it has on performance?



##########
datafusion/expr-common/Cargo.toml:
##########
@@ -41,3 +41,4 @@ arrow = { workspace = true }
 datafusion-common = { workspace = true }
 itertools = { workspace = true }
 paste = "^1.0"
+sqlparser = { workspace = true }

Review Comment:
   this new dependecy is also unfortunate, but I see it is required as the 
BinaryTypeCoercer needs to know about spans



##########
datafusion/expr/src/expr.rs:
##########
@@ -1663,6 +1667,13 @@ impl Expr {
             | Expr::Placeholder(..) => false,
         }
     }
+
+    pub fn spans(&self) -> Option<&Spans> {

Review Comment:
   Can you please add comments about what `Spans` means in this context?



##########
datafusion/common/src/column.rs:
##########
@@ -34,6 +35,7 @@ pub struct Column {
     pub relation: Option<TableReference>,
     /// field/column name.
     pub name: String,
+    pub spans: Spans,

Review Comment:
   ```suggestion
       /// Original source code location, if known
       pub spans: Spans,
   ```



##########
datafusion/common/src/diagnostic.rs:
##########
@@ -0,0 +1,112 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use sqlparser::tokenizer::Span;
+
+/// Additional contextual information intended for end users, to help them
+/// understand what went wrong by providing human-readable messages, and
+/// locations in the source query that relate to the error in some way.
+///
+/// You can think of a single [`Diagnostic`] as a single "block" of output from
+/// rustc. i.e. either an error or a warning, optionally with some notes and
+/// help messages.
+///
+/// If the diagnostic, a note, or a help message doesn't need to point to a
+/// specific location in the original SQL query (or the [`Span`] is not
+/// available), use [`Span::empty`].
+///
+/// Example:
+///
+/// ```rust
+/// # use datafusion_common::Diagnostic;
+/// # use sqlparser::tokenizer::Span;
+/// # let span = Span::empty();
+/// let diagnostic = Diagnostic::new_error("Something went wrong", span)
+///     .with_help("Have you tried turning it on and off again?", 
Span::empty());
+/// ```
+#[derive(Debug, Clone)]
+pub struct Diagnostic {
+    pub kind: DiagnosticKind,
+    pub message: String,
+    pub span: Span,

Review Comment:
   Since we don't always have a `Span` I think this would make more sense to be 
   
   ```suggestion
       pub span: Option<Span>,
   ```
   
   Which would make it easier to tell when we didn't have any Span information 
(and would likely avoid the use of `first_or_empty`)



##########
datafusion/sql/tests/cases/diagnostic.rs:
##########
@@ -0,0 +1,231 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use std::collections::HashMap;

Review Comment:
   This is a really nice way to test things



##########
datafusion/expr-common/src/type_coercion/binary.rs:
##########
@@ -68,11 +70,57 @@ impl Signature {
     }
 }
 
-/// Returns a [`Signature`] for applying `op` to arguments of type `lhs` and 
`rhs`
-fn signature(lhs: &DataType, op: &Operator, rhs: &DataType) -> 
Result<Signature> {
-    use arrow::datatypes::DataType::*;
-    use Operator::*;
-    match op {
+pub struct BinaryTypeCoercer<'a> {
+    lhs: &'a DataType,
+    op: &'a Operator,
+    rhs: &'a DataType,
+
+    lhs_spans: Spans,

Review Comment:
   What about simply having a single `spans: Span` that is computed on 
construction? Is there any reason to keep around the different set of spans?



##########
datafusion/expr-common/src/type_coercion/binary.rs:
##########
@@ -68,11 +70,57 @@ impl Signature {
     }
 }
 
-/// Returns a [`Signature`] for applying `op` to arguments of type `lhs` and 
`rhs`
-fn signature(lhs: &DataType, op: &Operator, rhs: &DataType) -> 
Result<Signature> {
-    use arrow::datatypes::DataType::*;
-    use Operator::*;
-    match op {
+pub struct BinaryTypeCoercer<'a> {

Review Comment:
   I like the idea of using this struct rather than a free function. If we are 
going to make it a `pub struct` I think it should have comments explaining what 
it does



##########
datafusion/expr/src/expr_rewriter/mod.rs:
##########
@@ -181,10 +178,14 @@ pub fn create_col_from_scalar_expr(
             Some::<TableReference>(subqry_alias.into()),
             name,
         )),
-        Expr::Column(Column { relation: _, name }) => Ok(Column::new(
-            Some::<TableReference>(subqry_alias.into()),
+        Expr::Column(Column {
+            relation: _,
             name,
-        )),
+            spans,
+        }) => Ok(
+            Column::new(Some::<TableReference>(subqry_alias.into()), name)
+                .with_spans(spans.clone()),
+        ),

Review Comment:
   You might be able to make this look nicer (and less likely to miss the span) 
via something like
   
   ```rust
           Expr::Column(col) => col.with_qualifier(subqry_alias.into())
   ```



##########
datafusion/sql/src/expr/mod.rs:
##########
@@ -165,7 +165,9 @@ impl<S: ContextProvider> SqlToRel<'_, S> {
         schema: &DFSchema,
         planner_context: &mut PlannerContext,
     ) -> Result<Expr> {
-        let mut expr = self.sql_expr_to_logical_expr(sql, schema, 
planner_context)?;
+        // The location of the original SQL expression in the source code
+        let mut expr =
+            self.sql_expr_to_logical_expr(sql.clone(), schema, 
planner_context)?;

Review Comment:
   We can now remove this clone, right? I tried it locally and it worked fine



##########
datafusion/expr-common/src/type_coercion/binary.rs:
##########
@@ -68,11 +70,57 @@ impl Signature {
     }
 }
 
-/// Returns a [`Signature`] for applying `op` to arguments of type `lhs` and 
`rhs`
-fn signature(lhs: &DataType, op: &Operator, rhs: &DataType) -> 
Result<Signature> {
-    use arrow::datatypes::DataType::*;
-    use Operator::*;
-    match op {
+pub struct BinaryTypeCoercer<'a> {
+    lhs: &'a DataType,
+    op: &'a Operator,
+    rhs: &'a DataType,
+
+    lhs_spans: Spans,
+    op_spans: Spans,
+    rhs_spans: Spans,
+}
+
+impl<'a> BinaryTypeCoercer<'a> {
+    pub fn new(lhs: &'a DataType, op: &'a Operator, rhs: &'a DataType) -> Self 
{
+        Self {
+            lhs,
+            op,
+            rhs,
+            lhs_spans: Spans::new(),
+            op_spans: Spans::new(),
+            rhs_spans: Spans::new(),
+        }
+    }
+
+    pub fn set_lhs_spans(&mut self, spans: Spans) {
+        self.lhs_spans = spans;
+    }
+
+    pub fn set_op_spans(&mut self, spans: Spans) {
+        self.op_spans = spans;
+    }
+
+    pub fn set_rhs_spans(&mut self, spans: Spans) {
+        self.rhs_spans = spans;
+    }
+
+    fn span(&self) -> Span {
+        Span::union_iter(
+            [
+                self.lhs_spans.first_or_empty(),

Review Comment:
   Why use `first_or_empty` if you are just going to union them together?
   
   Would this be the same?
   
   ```rust
           Span::union_iter(
               
self.lhs_spans.iter().chain(self.op_spans.iter()).chain(self.rhs_spans.iter())
                   .filter(|&&span| span != Span::empty())
               .copied()
           )
   ```
   
   (no change needed, just checking my understanding)



##########
datafusion/sql/src/set_expr.rs:
##########
@@ -36,8 +40,17 @@ impl<S: ContextProvider> SqlToRel<'_, S> {
                 right,
                 set_quantifier,
             } => {
-                let left_plan = self.set_expr_to_plan(*left, planner_context)?;
-                let right_plan = self.set_expr_to_plan(*right, 
planner_context)?;
+                let left_plan = self.set_expr_to_plan(*left.clone(), 
planner_context)?;

Review Comment:
   Can we please try and avoid this clone ? It looks like only the left and 
right spans are needed so we could copy them out rather than the entire set 
expr?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to