pan3793 commented on code in PR #50857: URL: https://github.com/apache/spark/pull/50857#discussion_r2084503790
##########
.github/workflows/benchmark.yml:
##########
@@ -138,6 +142,7 @@ jobs:
       # To prevent spark.test.home not being set. See more detail in SPARK-36007.
       SPARK_HOME: ${{ github.workspace }}
       SPARK_TPCDS_DATA: ${{ github.workspace }}/tpcds-sf-1
+      SPARK_TPCDS_DATA_TEXT: ${{ github.workspace }}/tpcds-sf-1-text

Review Comment:
Hi @luben, in this round I'm using TPCDS-generated data for the zstd compression benchmark. The data can be generated with the following steps:

- build the tools by following https://github.com/databricks/tpcds-kit
- `mkdir -p tpcds-sf-1-text`
- `tpcds-kit/tools/dsdgen -DISTRIBUTIONS tpcds-kit/tools/tpcds.idx -SCALE 1 -DIR tpcds-sf-1-text`

My local test shows that zstd-jni 1.5.6 and 1.5.7 perform at basically the same level, with 1.5.7 slightly faster in some cases.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
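
For anyone reproducing the data set locally, the generation steps from the review comment could be scripted roughly as below. This is only a sketch, not part of the PR: the variable names (`KIT_DIR`, `OUT_DIR`, `SCALE`, `DRY_RUN`) and the `run` helper are made up for illustration, and the `make OS=LINUX` build step is an assumption based on the tpcds-kit README that should be checked against the repository for your platform. The `dsdgen` invocation itself is the one quoted in the comment. By default the script only prints the commands it would run.

```shell
#!/bin/sh
set -eu

# Hypothetical knobs, not defined anywhere in the PR.
KIT_DIR="${KIT_DIR:-tpcds-kit}"        # where tpcds-kit is (or will be) checked out
OUT_DIR="${OUT_DIR:-tpcds-sf-1-text}"  # directory name used by SPARK_TPCDS_DATA_TEXT
SCALE="${SCALE:-1}"                    # scale factor 1, as in the review comment
DRY_RUN="${DRY_RUN:-1}"                # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

generate_tpcds_text() {
  # Build dsdgen only if it is not already present.
  if [ ! -x "$KIT_DIR/tools/dsdgen" ]; then
    run git clone https://github.com/databricks/tpcds-kit.git "$KIT_DIR"
    # Assumed build command for Linux; see the tpcds-kit README for other platforms.
    run make -C "$KIT_DIR/tools" OS=LINUX
  fi
  run mkdir -p "$OUT_DIR"
  # Same dsdgen invocation as in the review comment.
  run "$KIT_DIR/tools/dsdgen" \
    -DISTRIBUTIONS "$KIT_DIR/tools/tpcds.idx" \
    -SCALE "$SCALE" \
    -DIR "$OUT_DIR"
}

generate_tpcds_text
```

Running it with `DRY_RUN=0` executes the steps for real; the resulting `tpcds-sf-1-text` directory is what the new `SPARK_TPCDS_DATA_TEXT` variable in the workflow points at.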