Re: [I] Enable `split_file_groups_by_statistics` by default [datafusion]

2025-04-05 Thread via GitHub
xudong963 commented on issue #10336: URL: https://github.com/apache/datafusion/issues/10336#issuecomment-2780559411 > I think one of the asks in the original post was additional tests. Oh, checked the issue again and got it lol -- This is an automated message from the Apache Git Ser

Re: [I] Enable `split_file_groups_by_statistics` by default [datafusion]

2025-04-05 Thread via GitHub
suremarc commented on issue #10336: URL: https://github.com/apache/datafusion/issues/10336#issuecomment-2780386970 Also, there are two other issues I'd like to call out: ## Unit tests for `FileScanConfig::split_groups_by_statistics_with_target_partitions` There are some table-drive

Re: [I] Enable `split_file_groups_by_statistics` by default [datafusion]

2025-04-05 Thread via GitHub
suremarc commented on issue #10336: URL: https://github.com/apache/datafusion/issues/10336#issuecomment-2780379669 > I'll open a follow-up PR to make it default I think one of the asks in the original post was additional tests. I think some of the asks are already covered in the sqllo

Re: [I] Enable `split_file_groups_by_statistics` by default [datafusion]

2025-04-04 Thread via GitHub
xudong963 commented on issue #10336: URL: https://github.com/apache/datafusion/issues/10336#issuecomment-2780179649 I'll open a follow-up PR to make it default

Re: [I] Enable `split_file_groups_by_statistics` by default [datafusion]

2025-04-04 Thread via GitHub
alamb closed issue #10336: Enable `split_file_groups_by_statistics` by default URL: https://github.com/apache/datafusion/issues/10336

Re: [I] Enable `split_file_groups_by_statistics` by default [datafusion]

2025-04-04 Thread via GitHub
leoyvens commented on issue #10336: URL: https://github.com/apache/datafusion/issues/10336#issuecomment-2779659064 Should this issue have been closed? Did #15473 change default behaviour?

Re: [I] Enable `split_file_groups_by_statistics` by default [datafusion]

2025-03-28 Thread via GitHub
xudong963 commented on issue #10336: URL: https://github.com/apache/datafusion/issues/10336#issuecomment-2760506147 Fyi, I'm working on it.

Re: [I] Enable `split_file_groups_by_statistics` by default [datafusion]

2025-03-27 Thread via GitHub
suremarc commented on issue #10336: URL: https://github.com/apache/datafusion/issues/10336#issuecomment-2758082825 Leaving some thoughts here as I was asked in [another issue](https://github.com/apache/datafusion/issues/15191#issuecomment-2756831956) about what it would take to turn this fe

Re: [I] Enable `split_file_groups_by_statistics` by default [datafusion]

2024-07-29 Thread via GitHub
alamb commented on issue #10336: URL: https://github.com/apache/datafusion/issues/10336#issuecomment-2256432095 Sorry for the delay @leoyvens and thank you for this analysis > https://github.com/apache/datafusion/issues/11170 I would personally love to take this approach

Re: [I] Enable `split_file_groups_by_statistics` by default [datafusion]

2024-07-23 Thread via GitHub
leoyvens commented on issue #10336: URL: https://github.com/apache/datafusion/issues/10336#issuecomment-2246022064 One thing I've noticed is that after DataFusion 40 this actually works in my use case, likely thanks to the statistics code getting fixed, so good news there! It does require a

Re: [I] Enable `split_file_groups_by_statistics` by default [datafusion]

2024-05-04 Thread via GitHub
alamb commented on issue #10336: URL: https://github.com/apache/datafusion/issues/10336#issuecomment-2094127979 Thank you @yyy1000 🙏 I think a good place to start would be to write some sqllogic level tests to cover the important cases. Perhaps for the first test: 1. Create

Re: [I] Enable `split_file_groups_by_statistics` by default [datafusion]

2024-05-03 Thread via GitHub
yyy1000 commented on issue #10336: URL: https://github.com/apache/datafusion/issues/10336#issuecomment-2093968410 I'd like to help it. 🙌

Re: [I] Enable `split_file_groups_by_statistics` by default [datafusion]

2024-05-01 Thread via GitHub
alamb commented on issue #10336: URL: https://github.com/apache/datafusion/issues/10336#issuecomment-2089121776 Example test coverage we should add I think: https://github.com/apache/datafusion/pull/9593#discussion_r1585517605