TheBuilderJR commented on issue #13785:
URL: https://github.com/apache/datafusion/issues/13785#issuecomment-2564053984

   @alamb @zhuqi-lucas 
   
   Thank you for the quick turnaround. I've rebased on top of your changes, but I still see query times grow with data size for a relatively simple ordered query:
   
   ```
   SELECT * FROM revenue_logs ORDER BY timestamp_utc ASC LIMIT 15
   ```
   
   Here is the code for my read path
   ```
   let config = ListingTableConfig::new_with_multi_paths(
       paths_str
           .into_iter()
           .map(|p| ListingTableUrl::parse(&p))
           // Collect into Result<Vec<ListingTableUrl>, _> and propagate errors
           .collect::<Result<Vec<_>, _>>()?,
   )
   .with_schema(Arc::new(schema.clone()))
   .infer(&ctx.state())
   .await?;

   // Declare that every file is already sorted by timestamp_utc (ascending, nulls first)
   let config = ListingTableConfig {
       options: Some(ListingOptions {
           file_sort_order: vec![vec![col("timestamp_utc").sort(true, true)]],
           ..config
               .options
               .unwrap_or_else(|| ListingOptions::new(Arc::new(ParquetFormat::default())))
       }),
       ..config
   };

   let listing_table = ListingTable::try_new(config)?;
   ctx.register_table(table_name, Arc::new(listing_table))?;
   ```
   
   Here is the code for my write path
   ```
   df.clone()
       .write_parquet(
           file_path
               .to_str()
               .ok_or_else(|| anyhow!("Invalid file path"))?,
           // Write a single file, sorted by timestamp_utc (ascending, nulls first)
           datafusion::dataframe::DataFrameWriteOptions::default()
               .with_single_file_output(true)
               .with_sort_by(vec![col("timestamp_utc").sort(true, true)]),
           None,
       )
       .await?;
   ```
   
   Is this expected? I would have expected the cost to stay roughly constant as the data grows, since the declared sort order should let the query scan only a constant number of rows.
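
To make the expectation above concrete, here is a toy model in plain Rust of a `LIMIT n` query over files that are each already sorted. It is only a sketch of the reasoning, not DataFusion's actual execution path: `top_n_from_sorted_runs` and `interleaved_runs` are hypothetical helpers, and each "file" is modeled as an in-memory `Vec<i64>` of timestamps. Under those assumptions, a k-way merge needs one row from each of the k files to start, plus one refill per emitted row, so the number of rows pulled depends on the number of files and the limit, not on how many rows each file holds.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merge already-sorted runs (one per hypothetical file) and return the
/// first `limit` values together with the number of rows actually pulled
/// from the inputs.
fn top_n_from_sorted_runs(runs: &[Vec<i64>], limit: usize) -> (Vec<i64>, usize) {
    // Min-heap of (value, run index, position within run).
    let mut heap: BinaryHeap<Reverse<(i64, usize, usize)>> = BinaryHeap::new();
    let mut rows_read = 0usize;

    // Prime the heap with the first row of every non-empty run.
    for (run_idx, run) in runs.iter().enumerate() {
        if let Some(&value) = run.first() {
            heap.push(Reverse((value, run_idx, 0)));
            rows_read += 1;
        }
    }

    let mut out = Vec::with_capacity(limit);
    while out.len() < limit {
        let (value, run_idx, pos) = match heap.pop() {
            Some(Reverse(entry)) => entry,
            None => break, // all runs exhausted before reaching the limit
        };
        out.push(value);
        if out.len() == limit {
            break; // limit satisfied: do not pull another row
        }
        // Refill from the run that just produced the smallest value.
        if let Some(&next) = runs[run_idx].get(pos + 1) {
            heap.push(Reverse((next, run_idx, pos + 1)));
            rows_read += 1;
        }
    }
    (out, rows_read)
}

/// Build `k` sorted runs whose values interleave: run i holds i, i + k, i + 2k, ...
fn interleaved_runs(k: usize, rows_per_run: usize) -> Vec<Vec<i64>> {
    (0..k)
        .map(|i| (0..rows_per_run).map(|j| (i + j * k) as i64).collect())
        .collect()
}
```

With 4 runs and a limit of 15, this model pulls 18 rows whether each run holds 1,000 or 100,000 rows; with 40 runs it pulls 54. So in this model the ideal cost is constant in rows per file but grows with the number of files, which may be worth keeping in mind when comparing timings across data sizes.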


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

