clflushopt commented on issue #14608: URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2708663987
Hey @alamb @lmwnshn, I've actually been following the CMU 15-799 course (nights and weekends mostly) and started working on a Rust port of the BenchBase Java implementation after seeing this discussion. I am also looking at Trino's TPC-H generator and DuckDB's generator.

I have ported most of the randomness logic and tried to keep it compatible (bug for bug). I am currently working on sanity checks for the RNG code, but I am seeing some discrepancies.

Example (my implementation):

```
+-------------+--------------+-----------------------------------------------------------------------------------+
| r_regionkey | r_name       | r_comment                                                                         |
+-------------+--------------+-----------------------------------------------------------------------------------+
| 0           | AFRICA       | e. blithely special packages boost finally bold, quiet pains. furiously regular   |
|             |              | instructions cajole furiously! fina                                               |
+-------------+--------------+-----------------------------------------------------------------------------------+
| 1           | AMERICA      | counts. ironic, even ideas use                                                    |
+-------------+--------------+-----------------------------------------------------------------------------------+
| 2           | ASIA         | , ironic platelets. regular, qu                                                   |
+-------------+--------------+-----------------------------------------------------------------------------------+
| 3           | EUROPE       | . slyly express dolphins use carefully. even                                      |
+-------------+--------------+-----------------------------------------------------------------------------------+
| 4           | MIDDLE EAST  | n foxes. slowly unusual deposits might cajole blithely special theodolites.       |
|             |              | evenly express deposits sleep ca                                                  |
+-------------+--------------+-----------------------------------------------------------------------------------+
```

DuckDB's output after running `INSTALL tpch; LOAD tpch; CALL dbgen(sf = 1); select * from region`:
```
┌─────────────┬─────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ r_regionkey │   r_name    │                                                      r_comment                                                      │
│    int32    │   varchar   │                                                       varchar                                                       │
├─────────────┼─────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│           0 │ AFRICA      │ ar packages. regular excuses among the ironic requests cajole fluffily blithely final requests. furiously express p │
│           1 │ AMERICA     │ s are. furiously even pinto bea                                                                                     │
│           2 │ ASIA        │ c, special dependencies around                                                                                      │
│           3 │ EUROPE      │ e dolphins are furiously about the carefully                                                                        │
│           4 │ MIDDLE EAST │ foxes boost furiously along the carefully dogged tithes. slyly regular orbits according to the special epit         │
└─────────────┴─────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```

I am debugging this issue before I add support for the remaining tables, which shouldn't take too long. My implementation is currently in a lib crate, and I'll also add a CLI crate.

My goal is to potentially donate it to the [datafusion-contrib](https://github.com/datafusion-contrib) organization and then remain as the maintainer; this way we can coordinate how to integrate it into DataFusion as an extension. I am keeping the repo private for now until I finish a 0.1.0 release, but I can invite you and others.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org