Re: [PR] Add H2O.ai Database-like Ops benchmark to dfbench (join support) [datafusion]

2025-02-28 Thread via GitHub
Dandandan commented on PR #14902: URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2690995317 Nice, hopefully we can find ways to improve joins further :muscle: -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] Add H2O.ai Database-like Ops benchmark to dfbench (join support) [datafusion]

2025-02-28 Thread via GitHub
Dandandan merged PR #14902: URL: https://github.com/apache/datafusion/pull/14902

Re: [PR] Add H2O.ai Database-like Ops benchmark to dfbench (join support) [datafusion]

2025-02-27 Thread via GitHub
zhuqi-lucas commented on PR #14902: URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2687230304 > Thank you for the benchmark, I've tested it locally and it's working well. I have several small suggestions: > > 1. Add document for this new join benchmark https://gith

Re: [PR] Add H2O.ai Database-like Ops benchmark to dfbench (join support) [datafusion]

2025-02-26 Thread via GitHub
zhuqi-lucas commented on PR #14902: URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2687169996 Thanks @2010YOUY01 and @SemyonSinchenko for the review. I tried again and it is no longer a problem for me; the earlier failure was probably because my disk did not have enough free space, so I cleaned up some disk usage.

Re: [PR] Add H2O.ai Database-like Ops benchmark to dfbench (join support) [datafusion]

2025-02-26 Thread via GitHub
2010YOUY01 commented on PR #14902: URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2687165121 Thank you for the benchmark, I've tested it locally and it's working well. I have several small suggestions: 1. Add document for this new join benchmark https://github.com/apac

Re: [PR] Add H2O.ai Database-like Ops benchmark to dfbench (join support) [datafusion]

2025-02-26 Thread via GitHub
SemyonSinchenko commented on PR #14902: URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2687145015 @zhuqi-lucas Did you try increasing the `batch_size` argument? It is designed to avoid OOMs, but a small batch size can also reduce the generation speed. If your computer h

Re: [PR] Add H2O.ai Database-like Ops benchmark to dfbench (join support) [datafusion]

2025-02-26 Thread via GitHub
2010YOUY01 commented on PR #14902: URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2687143242 > > The data generation will take long time for big data. > > How bad is it? I can try to dig into the problem and try to improve it on the side of `falsa` (generation libra

Re: [PR] Add H2O.ai Database-like Ops benchmark to dfbench (join support) [datafusion]

2025-02-26 Thread via GitHub
zhuqi-lucas commented on PR #14902: URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2686901486 Thanks @SemyonSinchenko for the review. It is not too bad in my case; it takes more time, which is expected for huge file generation. But my computer has 48GB of memory, I assume lower mem

Re: [PR] Add H2O.ai Database-like Ops benchmark to dfbench (join support) [datafusion]

2025-02-26 Thread via GitHub
SemyonSinchenko commented on PR #14902: URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2685735020 > The data generation will take long time for big data. How bad is it? I can try to dig into the problem and try to improve it on the side of `falsa` (generation librar

Re: [PR] Add H2O.ai Database-like Ops benchmark to dfbench (join support) [datafusion]

2025-02-26 Thread via GitHub
zhuqi-lucas commented on PR #14902: URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2685544998 I tried to reproduce https://github.com/apache/datafusion/issues/13765, but on the current main branch our join passed! It takes about 50s, which is a good result! cc @alamb