davisp opened a new issue, #17349: URL: https://github.com/apache/datafusion/issues/17349
### Describe the bug I got nerd sniped by a question in the DataFusion Discord channel into seeing if I couldn't implement something similar to [BigQuery's gap_fill function](https://cloud.google.com/bigquery/docs/reference/standard-sql/time-series-functions#gap_fill). I managed to get the table variant working with slightly different syntax, but the interesting bit was when I tried to pass a Subquery expression as an argument. The bad query looks like such: ```sql SELECT * FROM gap_fill((SELECT * FROM my_table), ts, INTERVAL 1 SECOND); ``` When I print out the `&[Exprs]` received by the `gap_fill` UDTF I get: ``` exprs: [ Column( Column { relation: None, name: "ts", }, ), Literal( IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 1000000000 }"), None, ), ] ``` I would have expected to either see a `Subquery` expression to be included or for some other parser/logical_plan error. Either way, having no error but the argument being elided seems like a bug. ### To Reproduce ```rust // Licensed to the Apache Software Foundation (ASF) under one // or more contributor license agreements. See the NOTICE file // distributed with this work for additional information // regarding copyright ownership. The ASF licenses this file // to you under the Apache License, Version 2.0 (the // "License"); you may not use this file except in compliance // with the License. You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, // software distributed under the License is distributed on an // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY // KIND, either express or implied. See the License for the // specific language governing permissions and limitations // under the License. use std::sync::Arc; use async_trait::async_trait; use datafusion::arrow::datatypes::SchemaRef; use datafusion::arrow::record_batch::RecordBatch; use datafusion::catalog::Session; use datafusion::catalog::TableFunctionImpl; use datafusion::common::arrow::datatypes::Schema; use datafusion::datasource::memory::MemorySourceConfig; use datafusion::datasource::TableProvider; use datafusion::error::Result; use datafusion::logical_expr::{Expr, TableType}; use datafusion::physical_plan::ExecutionPlan; use datafusion::prelude::*; #[tokio::main] async fn main() -> Result<()> { // create local execution context let ctx = SessionContext::new(); // register the table function that will be called in SQL statements by `read_csv` ctx.register_udtf("gap_fill", Arc::new(GapFillTableFunc {})); let df = ctx .sql("CREATE TABLE test(ts TIMESTAMP, value BIGINT);") .await?; df.show().await?; let df = ctx .sql("SELECT * FROM gap_fill(test.ts, INTERVAL 1 seconds);") .await?; df.show().await?; let df = ctx .sql("SELECT * FROM gap_fill((SELECT * FROM test), ts, INTERVAL 1 seconds);") .await?; df.show().await?; Ok(()) } #[derive(Debug)] struct GapFillTableFunc {} impl TableFunctionImpl for GapFillTableFunc { fn call(&self, exprs: &[Expr]) -> Result<Arc<dyn TableProvider>> { eprintln!("exprs: {:#?}", exprs); let schema = Arc::new(Schema::empty()); let table = MyTable { schema, batches: Default::default(), }; Ok(Arc::new(table)) } } #[derive(Debug)] struct MyTable { schema: SchemaRef, batches: Vec<RecordBatch>, } #[async_trait] impl TableProvider for MyTable { fn as_any(&self) -> &dyn std::any::Any { self } fn schema(&self) -> SchemaRef { self.schema.clone() } fn table_type(&self) -> TableType { TableType::Base } async fn scan( &self, _state: &dyn Session, projection: Option<&Vec<usize>>, _filters: &[Expr], _limit: Option<usize>, ) -> Result<Arc<dyn ExecutionPlan>> { Ok(MemorySourceConfig::try_new_exec( &[self.batches.clone()], TableProvider::schema(self), projection.cloned(), )?) } } ``` Output: ``` ++ ++ exprs: [ Column( Column { relation: Some( Bare { table: "test", }, ), name: "ts", }, ), Literal( IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 1000000000 }"), None, ), ] ++ ++ exprs: [ Column( Column { relation: None, name: "ts", }, ), Literal( IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 1000000000 }"), None, ), ] ++ ++ ``` ### Expected behavior I would expect to either see some sort of error or to have that Subquery expression be part of the expression list passed to the function. Silently discarding it seems like a bug. My personal preference would be allow Subquery expressions in arguments as it seems generally useful, but I may be missing a reason that's not possible. ### Additional context Also of note, I tried being a little sneaky and attempted to smuggle the subquery expression through one of the other `Expr` variants: 1. `(SELECT * FROM test) AS foo` - Parse error at `AS` 2. `(SELECT * FROM test) OR true` - Silently elided 3. `1 IN (SELECT * FROM test)` - Silently elided 4. `NOT (SELECT * FROM test)` - Silently elided 5. `EXISTS (SELECT * FROM test)` - Holy moly, this one actually works! Might be interesting to run down why the `IN` variant doesn't work given that `EXISTS` does as both of those allow for subqueries while the others are fairly non-sensical. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org