Re: [PR] [SPARK-52078][TEST] Rewrite ZStandardBenchmark with TPC-DS data [spark]

via GitHub Mon, 12 May 2025 04:52:12 -0700


pan3793 commented on code in PR #50857:
URL: https://github.com/apache/spark/pull/50857#discussion_r2084503790



##########
.github/workflows/benchmark.yml:
##########
@@ -138,6 +142,7 @@ jobs:
       # To prevent spark.test.home not being set. See more detail in SPARK-36007.
       SPARK_HOME: ${{ github.workspace }}
       SPARK_TPCDS_DATA: ${{ github.workspace }}/tpcds-sf-1
+      SPARK_TPCDS_DATA_TEXT: ${{ github.workspace }}/tpcds-sf-1-text

Review Comment:
   Hi @luben, in this round I'm trying to use TPC-DS generated data for the zstd compression benchmark.
   
   The data can be generated by the following steps:
   - build `dsdgen` by following https://github.com/databricks/tpcds-kit
   - `mkdir -p tpcds-sf-1-text`
   - `tpcds-kit/tools/dsdgen -DISTRIBUTIONS tpcds-kit/tools/tpcds.idx -SCALE 1 -DIR tpcds-sf-1-text`
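
The steps above can be sketched as a single script. This is only an illustration, assuming `tpcds-kit` has already been cloned and built next to the working directory per its README. The variable names (`KIT_DIR`, `SCALE`, `OUT_DIR`, `DRY_RUN`) and the helper functions are hypothetical and not part of the PR. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: generate TPC-DS text data for the zstd benchmark.
# Assumption: tpcds-kit is already built, so $KIT_DIR/tools/dsdgen exists.
set -eu

KIT_DIR="${KIT_DIR:-tpcds-kit}"               # hypothetical: tpcds-kit checkout
SCALE="${SCALE:-1}"                           # scale factor, as in the comment
OUT_DIR="${OUT_DIR:-tpcds-sf-${SCALE}-text}"  # matches SPARK_TPCDS_DATA_TEXT

# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

generate_tpcds_text() {
  run mkdir -p "$OUT_DIR"
  run "$KIT_DIR/tools/dsdgen" \
    -DISTRIBUTIONS "$KIT_DIR/tools/tpcds.idx" \
    -SCALE "$SCALE" \
    -DIR "$OUT_DIR"
}

generate_tpcds_text
```

Running it with `DRY_RUN=0` would actually create the directory and invoke `dsdgen`. The output directory can then be exposed to the benchmark through `SPARK_TPCDS_DATA_TEXT`, as in the workflow change.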
   
   In my local test, zstd-jni 1.5.6 and 1.5.7 perform at basically the same level, with 1.5.7 slightly faster in some cases.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

