Dandandan commented on PR #14902:
URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2690995317
Nice, hopefully we can find ways to improve joins further :muscle:
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub
Dandandan merged PR #14902:
URL: https://github.com/apache/datafusion/pull/14902
zhuqi-lucas commented on PR #14902:
URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2687230304
> Thank you for the benchmark, I've tested it locally and it's working well.
I have several small suggestions:
>
> 1. Add documentation for this new join benchmark
https://gith
zhuqi-lucas commented on PR #14902:
URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2687169996
Thanks @2010YOUY01 @SemyonSinchenko for the review. I tried again and it's not
a problem for me now. Previously it may have been because I did not have enough
disk space, so I cleaned up some disk usage.
2010YOUY01 commented on PR #14902:
URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2687165121
Thank you for the benchmark, I've tested it locally and it's working well. I
have several small suggestions:
1. Add documentation for this new join benchmark
https://github.com/apac
SemyonSinchenko commented on PR #14902:
URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2687145015
@zhuqi-lucas Did you try to increase the `batch_size` argument? It is
designed to avoid OOMs but the small batch size can also reduce the generation
speed. If your computer h
2010YOUY01 commented on PR #14902:
URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2687143242
> > The data generation will take a long time for big data.
>
> How bad is it? I can try to dig into the problem and try to improve it on
the side of `falsa` (generation libra
zhuqi-lucas commented on PR #14902:
URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2686901486
Thanks @SemyonSinchenko for the review. It's not too bad in my case; it takes
more time, which is expected for huge file generation. But my computer has 48GB
of memory, I assume lower mem
SemyonSinchenko commented on PR #14902:
URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2685735020
> The data generation will take a long time for big data.
How bad is it? I can try to dig into the problem and try to improve it on
the side of `falsa` (generation librar
zhuqi-lucas commented on PR #14902:
URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2685544998
Tried to reproduce
https://github.com/apache/datafusion/issues/13765
But on the current main branch, our join passed! It takes about 50s, which is
a good result! cc @alamb