Re: [Rust] DataFusion TPCH benchmark overview

2021-02-06 Thread Daniël Heres
Update: I created the list of issues in JIRA so we can keep track of progress there. https://issues.apache.org/jira/browse/ARROW-11519 On Thu, 4 Feb 2021 at 12:45, Daniël Heres wrote: > Thanks all for your input! > > I will create an umbrella tic

Re: [Rust] DataFusion TPCH benchmark overview

2021-02-04 Thread Daniël Heres
Thanks all for your input! I will create an umbrella ticket + linked failures / issues to track progress for TPCH support in the coming days and will share it here. Daniël On Thu, 4 Feb 2021 at 00:13, Andrew Lamb wrote: > This is awesome, thank you Daniel. I agree that focusing on enough SQL for > TP

Re: [Rust] DataFusion TPCH benchmark overview

2021-02-04 Thread Fernando Herrera
For me that would be great. I'm going to start reading the code and see what I can write for the Arrow guide I'm working on. Thanks On Thu, 4 Feb 2021, 11:28 Andrew Lamb wrote: > Hi Fernando, yes I would be delighted. > > I am planning on creating a high level overview w/ slides as a Tech Talk >

Re: [Rust] DataFusion TPCH benchmark overview

2021-02-04 Thread Andrew Lamb
Hi Fernando, yes I would be delighted. I am planning on creating a high level overview w/ slides as a Tech Talk (for work, but will be open to the public) sometime in March. How about I pull together some initial material, and then I can share that / go over it with anyone who is interested? What

Re: [Rust] DataFusion TPCH benchmark overview

2021-02-04 Thread Fernando Herrera
Hi Andrew, I would like to work a little bit more on DataFusion, so I was wondering if you could give a small walkthrough of the code and how the queries are constructed. Do you think that would be possible? Fernando On Wed, Feb 3, 2021 at 11:13 PM Andrew Lamb wrote: > This is awesome, thank yo

Re: [Rust] DataFusion TPCH benchmark overview

2021-02-03 Thread Andrew Lamb
This is awesome, thank you Daniel. I agree that focusing on enough SQL for TPCH queries would be a great idea and way to focus our efforts. Subqueries may be the largest remaining outstanding item that I see -- I have some ideas of how to implement them on the planner side if others are interested

Re: [Rust] DataFusion TPCH benchmark overview

2021-02-03 Thread Andy Grove
Thanks for the update on this, Daniël. It is great to see the progress with this! Perhaps it is worth creating one JIRA issue per failing query detailing the errors and we can link these to the issues that are causing the failures? On Wed, Feb 3, 2021 at 1:57 PM Mike Seddon wrote: > Hi Daniël,

Re: [Rust] DataFusion TPCH benchmark overview

2021-02-03 Thread Mike Seddon
Hi Daniël, I am working on query 22 as part of https://github.com/apache/arrow/pull/9243 We also need to convert all the Float64 schema types to Decimal(n). Cheers, Mike On Thu, Feb 4, 2021 at 5:44 AM Daniël Heres wrote: > Hey all, > > Quite some features have been added to DataFusion in the last c

[Rust] DataFusion TPCH benchmark overview

2021-02-03 Thread Daniël Heres
Hey all, Quite some features have been added to DataFusion in the last couple of months. One test of the functionality we support is the TPC-H benchmark. We can now run 7 out of 22 queries without errors. I think a nice goal would be having complete support for the full suite, as it means a