TheBuilderJR commented on issue #13785: URL: https://github.com/apache/datafusion/issues/13785#issuecomment-2564053984
@alamb @zhuqi-lucas Thank you for the quick turnaround. I've rebased on top of your changes but still seem to see growth in query times as the data size grows for a relatively simple ordered query:

```
SELECT * FROM revenue_logs ORDER BY timestamp_utc ASC LIMIT 15
```

Here is the code for my read path:

```
let config = ListingTableConfig::new_with_multi_paths(
    paths_str
        .into_iter()
        .map(|p| ListingTableUrl::parse(&p))
        .collect::<Result<Vec<_>, _>>()? // Collect into Result<Vec<ListingTableUrl>, _> and propagate errors
)
.with_schema(Arc::new(schema.clone()))
.infer(&ctx.state())
.await?;

let config = ListingTableConfig {
    options: Some(ListingOptions {
        file_sort_order: vec![vec![col("timestamp_utc").sort(true, true)]],
        ..config
            .options
            .unwrap_or_else(|| ListingOptions::new(Arc::new(ParquetFormat::default())))
    }),
    ..config
};

let listing_table = ListingTable::try_new(config)?;
ctx.register_table(table_name, Arc::new(listing_table))?;
```

Here is the code for my write path:

```
df.clone()
    .write_parquet(
        file_path.to_str().ok_or(anyhow!("Invalid file path"))?,
        datafusion::dataframe::DataFrameWriteOptions::default()
            .with_single_file_output(true)
            .with_sort_by(vec![col("timestamp_utc").sort(true, true)]),
        None,
    )
    .await?;
```

Is this expected? I would have imagined the cost should be constant since you can use the sort constraint to always scan a constant number of rows.
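For reference, the expectation in the question can be sketched without DataFusion at all. The following is a minimal, std-only Rust illustration of a k-way merge with a limit over several inputs that are each already sorted. It is not DataFusion's implementation: `merge_limit`, the in-memory `Vec<i64>` "files", and the synthetic timestamps are all hypothetical names made up for this example. Under these assumptions the number of rows read is bounded by the limit plus the number of files, so it does not depend on the rows per file but does still grow with the file count.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Returns the `limit` smallest values across `files` (each assumed to be
/// sorted ascending), plus the number of rows that were actually read.
/// Hypothetical helper for illustration only; not a DataFusion API.
fn merge_limit(files: &[Vec<i64>], limit: usize) -> (Vec<i64>, usize) {
    // Min-heap of (timestamp, file index, row index) via `Reverse`.
    let mut heap = BinaryHeap::new();
    let mut rows_read = 0usize;

    // Prime the heap with the first row of every non-empty file.
    for (file_idx, file) in files.iter().enumerate() {
        if let Some(&ts) = file.first() {
            heap.push(Reverse((ts, file_idx, 0usize)));
            rows_read += 1;
        }
    }

    let mut out: Vec<i64> = Vec::new();
    while out.len() < limit {
        let (ts, file_idx, row_idx) = match heap.pop() {
            Some(Reverse(entry)) => entry,
            None => break, // every file is exhausted
        };
        out.push(ts);
        // Read one more row from the file that produced the current minimum.
        if let Some(&next) = files[file_idx].get(row_idx + 1) {
            heap.push(Reverse((next, file_idx, row_idx + 1)));
            rows_read += 1;
        }
    }
    (out, rows_read)
}

fn main() {
    // Three hypothetical files of 1,000 rows each, sorted by timestamp.
    let files: Vec<Vec<i64>> = vec![
        (0..1_000).map(|i| i * 3).collect(),
        (0..1_000).map(|i| i * 3 + 1).collect(),
        (0..1_000).map(|i| i * 3 + 2).collect(),
    ];
    let (rows, rows_read) = merge_limit(&files, 15);
    println!("first rows: {:?}", rows);
    println!("rows read: {} of 3000", rows_read);
}
```

With three files of 1,000 rows each and a limit of 15, this sketch reads 18 rows (one to prime each file, plus one replacement per emitted row). Whether the real query plan behaves this way depends on the planner making use of the declared `file_sort_order`, which the sketch does not model.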