Re: [PR] Support data source sampling with TABLESAMPLE [datafusion]

2025-07-30 Thread via GitHub
theirix commented on PR #16325: URL: https://github.com/apache/datafusion/pull/16325#issuecomment-3137673146 > @2010YOUY01 I'd like to double-check if a volatile filter pushdown to a Parquet executor is expected. In the mentioned PR, I disabled optimisation in a logical plan optimiser to pu

Re: [PR] Support data source sampling with TABLESAMPLE [datafusion]

2025-06-30 Thread via GitHub
milenkovicm commented on PR #16325: URL: https://github.com/apache/datafusion/pull/16325#issuecomment-3021791270 I wonder whether creating a new physical plan operator to do per-batch sampling would avoid the issues @theirix mentioned. Something similar to https://github.com/milenkovicm/ballista_

Re: [PR] Support data source sampling with TABLESAMPLE [datafusion]

2025-06-25 Thread via GitHub
theirix commented on PR #16325: URL: https://github.com/apache/datafusion/pull/16325#issuecomment-3003725980 > I'd like to double-check if a volatile filter pushdown to a Parquet executor is expected. In the mentioned PR, I disabled optimisation in a logical plan optimiser to push down vola

Re: [PR] Support data source sampling with TABLESAMPLE [datafusion]

2025-06-18 Thread via GitHub
theirix commented on PR #16325: URL: https://github.com/apache/datafusion/pull/16325#issuecomment-2985522134 > According to PostgreSQL's reference: https://wiki.postgresql.org/wiki/TABLESAMPLE_Implementation#SYSTEM_Option I believe `SYSTEM` option is equivalent to keep the entire `RecordBat

Re: [PR] Support data source sampling with TABLESAMPLE [datafusion]

2025-06-14 Thread via GitHub
theirix commented on code in PR #16325: URL: https://github.com/apache/datafusion/pull/16325#discussion_r2146926102 ## datafusion/sql/tests/sql_integration.rs: ## @@ -4714,3 +4714,115 @@ fn test_using_join_wildcard_schema() { ] ); } + +#[test] Review Comment:

Re: [PR] Support data source sampling with TABLESAMPLE [datafusion]

2025-06-11 Thread via GitHub
theirix commented on PR #16325: URL: https://github.com/apache/datafusion/pull/16325#issuecomment-2962290508 Thank you for the review and suggestions! I'll rework the testing approach and get back with the improved version.

Re: [PR] Support data source sampling with TABLESAMPLE [datafusion]

2025-06-09 Thread via GitHub
xudong963 commented on code in PR #16325: URL: https://github.com/apache/datafusion/pull/16325#discussion_r2135409445 ## datafusion/sql/src/select.rs: ## @@ -77,11 +82,29 @@ impl SqlToRel<'_, S> { } // Process `from` clause -let plan = self.plan_from_

Re: [PR] Support data source sampling with TABLESAMPLE [datafusion]

2025-06-08 Thread via GitHub
2010YOUY01 commented on code in PR #16325: URL: https://github.com/apache/datafusion/pull/16325#discussion_r2135088072 ## datafusion/sql/src/select.rs: ## @@ -77,11 +82,29 @@ impl SqlToRel<'_, S> { } // Process `from` clause -let plan = self.plan_from
