davisp opened a new issue, #17349:
URL: https://github.com/apache/datafusion/issues/17349

   ### Describe the bug
   
   I got nerd sniped by a question in the DataFusion Discord channel into 
seeing if I couldn't implement something similar to [BigQuery's gap_fill 
function](https://cloud.google.com/bigquery/docs/reference/standard-sql/time-series-functions#gap_fill).
 I managed to get the table variant working with slightly different syntax, but 
the interesting bit was when I tried to pass a Subquery expression as an 
argument. The bad query looks like such:
   
   ```sql
   SELECT * FROM gap_fill((SELECT * FROM my_table), ts, INTERVAL 1 SECOND);
   ```
   
   When I print out the `&[Exprs]` received by the `gap_fill` UDTF I get:
   
   ```
   exprs: [
       Column(
           Column {
               relation: None,
               name: "ts",
           },
       ),
       Literal(
           IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, 
nanoseconds: 1000000000 }"),
           None,
       ),
   ]
   ```
   
   I would have expected to either see a `Subquery` expression to be included 
or for some other parser/logical_plan error. Either way, having no error but 
the argument being elided seems like a bug.
   
   ### To Reproduce
   
   ```rust
   // Licensed to the Apache Software Foundation (ASF) under one
   // or more contributor license agreements.  See the NOTICE file
   // distributed with this work for additional information
   // regarding copyright ownership.  The ASF licenses this file
   // to you under the Apache License, Version 2.0 (the
   // "License"); you may not use this file except in compliance
   // with the License.  You may obtain a copy of the License at
   //
   //   http://www.apache.org/licenses/LICENSE-2.0
   //
   // Unless required by applicable law or agreed to in writing,
   // software distributed under the License is distributed on an
   // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   // KIND, either express or implied.  See the License for the
   // specific language governing permissions and limitations
   // under the License.
   
   use std::sync::Arc;
   
   use async_trait::async_trait;
   use datafusion::arrow::datatypes::SchemaRef;
   use datafusion::arrow::record_batch::RecordBatch;
   use datafusion::catalog::Session;
   use datafusion::catalog::TableFunctionImpl;
   use datafusion::common::arrow::datatypes::Schema;
   use datafusion::datasource::memory::MemorySourceConfig;
   use datafusion::datasource::TableProvider;
   use datafusion::error::Result;
   use datafusion::logical_expr::{Expr, TableType};
   use datafusion::physical_plan::ExecutionPlan;
   use datafusion::prelude::*;
   
   #[tokio::main]
   async fn main() -> Result<()> {
       // create local execution context
       let ctx = SessionContext::new();
   
       // register the table function that will be called in SQL statements by 
`read_csv`
       ctx.register_udtf("gap_fill", Arc::new(GapFillTableFunc {}));
   
       let df = ctx
           .sql("CREATE TABLE test(ts TIMESTAMP, value BIGINT);")
           .await?;
       df.show().await?;
   
       let df = ctx
           .sql("SELECT * FROM gap_fill(test.ts, INTERVAL 1 seconds);")
           .await?;
       df.show().await?;
   
       let df = ctx
           .sql("SELECT * FROM gap_fill((SELECT * FROM test), ts, INTERVAL 1 
seconds);")
           .await?;
       df.show().await?;
   
       Ok(())
   }
   
   #[derive(Debug)]
   struct GapFillTableFunc {}
   
   impl TableFunctionImpl for GapFillTableFunc {
       fn call(&self, exprs: &[Expr]) -> Result<Arc<dyn TableProvider>> {
           eprintln!("exprs: {:#?}", exprs);
   
           let schema = Arc::new(Schema::empty());
   
           let table = MyTable {
               schema,
               batches: Default::default(),
           };
           Ok(Arc::new(table))
       }
   }
   
   #[derive(Debug)]
   struct MyTable {
       schema: SchemaRef,
       batches: Vec<RecordBatch>,
   }
   
   #[async_trait]
   impl TableProvider for MyTable {
       fn as_any(&self) -> &dyn std::any::Any {
           self
       }
   
       fn schema(&self) -> SchemaRef {
           self.schema.clone()
       }
   
       fn table_type(&self) -> TableType {
           TableType::Base
       }
   
       async fn scan(
           &self,
           _state: &dyn Session,
           projection: Option<&Vec<usize>>,
           _filters: &[Expr],
           _limit: Option<usize>,
       ) -> Result<Arc<dyn ExecutionPlan>> {
           Ok(MemorySourceConfig::try_new_exec(
               &[self.batches.clone()],
               TableProvider::schema(self),
               projection.cloned(),
           )?)
       }
   }
   ```
   
   Output:
   
   ```
   ++
   ++
   exprs: [
       Column(
           Column {
               relation: Some(
                   Bare {
                       table: "test",
                   },
               ),
               name: "ts",
           },
       ),
       Literal(
           IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, 
nanoseconds: 1000000000 }"),
           None,
       ),
   ]
   ++
   ++
   exprs: [
       Column(
           Column {
               relation: None,
               name: "ts",
           },
       ),
       Literal(
           IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, 
nanoseconds: 1000000000 }"),
           None,
       ),
   ]
   ++
   ++
   ```
   
   ### Expected behavior
   
   I would expect to either see some sort of error or to have that Subquery 
expression be part of the expression list passed to the function. Silently 
discarding it seems like a bug. My personal preference would be allow Subquery 
expressions in arguments as it seems generally useful, but I may be missing a 
reason that's not possible.
   
   ### Additional context
   
   Also of note,  I tried being a little sneaky and attempted to smuggle the 
subquery expression through one of the other `Expr` variants:
   
   1. `(SELECT * FROM test) AS foo` - Parse error at `AS`
   2. `(SELECT * FROM test) OR true` - Silently elided
   3. `1 IN (SELECT * FROM test)` - Silently elided
   4. `NOT (SELECT * FROM test)` - Silently elided
   5. `EXISTS (SELECT * FROM test)` - Holy moly, this one actually works!
   
   Might be interesting to run down why the `IN` variant doesn't work given 
that `EXISTS` does as both of those allow for subqueries while the others are 
fairly non-sensical.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to