alamb commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-3016555361
I have been thinking about this one a lot and I am sorry I haven't written
this up before now. I was trying to collect my thoughts.
I feel the core issue is a tension betw
alamb commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2954201095
Sorry for not having a chance to test this work earlier @clflushopt
I really look forward to checking it out and will try to do so later this
week.
--
This is an
kevinjqliu commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2952854828
This is great, thanks @clflushopt
I couldn't find a way to use datafusion to write multiple parquet files, but
i think this is a limitation with datafusion's `COPY` co
clflushopt commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2948048431
Hey @alamb following suggestions from @kevinjqliu I am happy to say that
https://github.com/clflushopt/datafusion-tpch provides a ux on par with duckdb
and what we discussed
alamb commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2907767246
Thanks @clflushopt -- I'll try and check this out tomorrow or Tuesday
clflushopt commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2906345545
Hey @alamb @kevinjqliu I have individual TPCH table generators working fine
https://github.com/clflushopt/datafusion-tpch/blob/main/src/lib.rs but I am
still scratching my he
clflushopt commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2856948236
I was stuck trying to decide between a scalar function or a table function
for `tpchgen(sf)` I really like your suggestion @alamb thanks for unblocking.
I'll have a v0.1.0 do
alamb commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2851315858
I think we could use a user defined **TABLE** function:
https://datafusion.apache.org/library-user-guide/adding-udfs.html#adding-a-user-defined-table-function
So tha
kevinjqliu commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2848681742
sounds good to me!
`SELECT * FROM lineitem(1.0)` makes sense
`SELECT 1 FROM tpchgen(1.0)` looks a bit odd but i cant think of a better
alternative
clflushopt commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2844075397
I agree with @alamb on this one, regarding the separation of creation &
storing the files on disk explicitly. One suggestion I would propose is that I
would add a scalar func
alamb commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2837210261
> duckdb's CALL dbgen(sf = 1); creates tables in the current schema and then
populates those tables with data using its own format.
The other thing we can do is to just make
kevinjqliu commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2816875067
> In order to try and make progress on this, I decided to go with having a
single function that builds all tables for a single scale factor similar to how
DuckDB does it. My
clflushopt commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2811379764
In order to try and make progress on this, I decided to go with having a
single function that builds all tables for a single scale factor similar to how
DuckDB does it. My re
alamb commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2810241760
> [@alamb](https://github.com/alamb) Yes once I address the couple of
prioritized issues I have open for `v1.0.0` the next step will be to work on
the integration, I agree with ha
clflushopt commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2798250520
@alamb Yes once I address the couple of prioritized issues I have open for
`v1.0.0` the next step will be to work on the integration, I agree with having
table functions but
alamb commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2797923658
> I just read your blogpost today, and I am really happy to have a faster
generator. The post focussed on generating tpc-h to files, but I see you also
discussed something like th
m-mueller678 commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2797014499
I just read your blogpost today, and I am really happy to have a faster
generator. The post focussed on generating tpc-h to files, but I see you also
discussed something l
alamb commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2789578819
@clflushopt -- do you have any next steps planned for this project?
I think tpchgen is basically ready / done (though I predict we may get a
flurry of additional interest on
alamb commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2779688162
We have drafted a blog about this project in case anyone wants to review /
check it out:
- https://github.com/apache/datafusion-site/pull/67
alamb commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2770127700
@scsmithr of GlareDB integrated the tpchgen library in glaredb as a table
function
- https://github.com/GlareDB/glaredb/pull/3549
Which is quite cool
```shell
g
matthewmturner commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2764969834
@clflushopt awesome, im really excited to add this to dft - it will be the
next item i work on. will let you know if any questions / comments.
alamb commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2764746130
I think the next question in my mind is exactly how to integrate this into
datafusion-cli
We could follow the model of duckdb and create a table function like
`dbgen(sf = 1
clflushopt commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2764733544
@lmwnshn @matthewmturner we now have a live crate for integrations
https://crates.io/crates/tpchgen and a cli available
https://github.com/clflushopt/tpchgen-rs special thank
alamb commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2749315813
> Good to see this rust generator. We have adopted it in our database
projection for benchmarking.
Thanks @niebayes -- here is a preview of what I am currently working on
niebayes commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2746782663
Good to see this rust generator. We have adopted it in our database
projection for benchmarking.
alamb commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2722682304
@clflushopt has some very cool ideas for testing in tpchdbgen-rs
Specifically we verified that the output data (for SF 0.001 and SF 0.01) is
byte-for-byte identical with db
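
A minimal sketch of what a byte-for-byte check of the kind mentioned above could look like. The helper name `first_mismatch` and the sample rows are hypothetical and are not taken from the project's test suite; the sketch only assumes that both outputs can be read as byte streams.

```rust
use std::io::{self, Read};

/// Compare two byte streams and return the offset of the first difference,
/// or `None` when the streams are byte-for-byte identical.
/// Hypothetical helper for illustration only.
fn first_mismatch<A: Read, B: Read>(a: A, b: B) -> io::Result<Option<u64>> {
    let mut a = a.bytes();
    let mut b = b.bytes();
    let mut offset = 0u64;
    loop {
        match (a.next().transpose()?, b.next().transpose()?) {
            // Both streams ended at the same position: identical.
            (None, None) => return Ok(None),
            // Same byte on both sides: keep going.
            (x, y) if x == y => offset += 1,
            // Different bytes, or one stream ended early.
            _ => return Ok(Some(offset)),
        }
    }
}

fn main() -> io::Result<()> {
    // Made-up rows standing in for generated output and reference output.
    let reference: &[u8] = b"1|alpha|\n2|beta|\n";
    let same: &[u8] = b"1|alpha|\n2|beta|\n";
    let different: &[u8] = b"1|alpha|\n2|bet4|\n";

    assert_eq!(first_mismatch(reference, same)?, None);
    assert_eq!(first_mismatch(reference, different)?, Some(14));
    Ok(())
}
```

When comparing files rather than in-memory slices, wrapping each `File` in a `std::io::BufReader` avoids one read call per byte.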
lmwnshn commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2721519020
@clflushopt Nice work! Re: randomness, the TPC-H spec has a "qualification
database" (dataset) with specific "query validation" tests (instantiating the
SQL queries with specifi
matthewmturner commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2712539973
@clflushopt this is _awesome_. Once you release I will likely add this to
[dft](https://github.com/datafusion-contrib/datafusion-dft).
clflushopt commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2712464986
For anyone following this issue I have a full port here
https://github.com/clflushopt/tpchgen-rs and I am working on completing a first
release (I have issues to track that m
clflushopt commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2710564179
@alamb Hey, yeah, sorry, it's just that by habit I like to complete things
before "releasing" them, but I just made it open!
alamb commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2710281339
> Hey [@alamb](https://github.com/alamb) as of today I have a fully working
implementation that matches Apache Trino and OLTPBenchmark's, I found the issue
I mentioned in the mes
clflushopt commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2709364715
Hey @alamb as of today I have a fully working implementation that matches
Apache Trino and OLTPBenchmark's, I found the issue I mentioned in the message
above which was due
alamb commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2708856273
> My goal is to potentially donate it to the [datafusion-contrib
](https://github.com/datafusion-contrib) organization and then keep maintaining
it there this way we can coordinat
clflushopt commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2708672112
take
clflushopt commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2708663987
Hey @alamb @lmwnshn I've actually been following the CMU 15-799 course
(nights and weekends mostly) and started working on a Rust port of the
benchbase Java implementation a
lmwnshn commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2651044600
If you prefer Java to C, CMU-DB's BenchBase project does implement support
for generating and loading TPC-H data in parallel:
https://github.com/cmu-db/benchbase/tree/main/src/m
alamb commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2651062245
Thanks @lmwnshn -- the Java implementation might be easier to transliterate
to Rust...
Also BTW I am pretty sure other rust data projects would be interested in a
Rust imp
37 matches