Hi everybody,

As some of you may know, at Talend we've been working for a while to add the
TPC-DS benchmark suite to Beam. We believe that having TPC-DS as part of the
Beam testing workflow and release routine will help the community quickly
detect performance regressions or improvements, identify missing or incorrect
Beam SQL features, and execute Beam SQL on different runtime environments with
different runners.

What is TPC-DS? From the TPC-DS specification document [1]:

“TPC-DS is a decision support benchmark that models several generally 
applicable aspects of a decision support system, including queries and data 
maintenance. The benchmark provides a representative evaluation of performance 
as a general purpose decision support system.” 

The TPC-DS benchmark suite for Beam is implemented as a separate testing tool
for the Java SDK (like the well-known Nexmark benchmark suite) [2]. For now, it
supports a limited number of TPC-DS SQL queries (mostly because of limited SQL
syntax support in Beam) and CSV and Parquet as input data formats, and it runs
on Jenkins with the three most popular Beam runners (Spark [3], Flink [4],
Dataflow [5]). The job metrics are stored in InfluxDB and can be accessed
through Grafana dashboards [6][7][8].

More details can be found in the Beam documentation [9].

Of course, there is still plenty to do, such as adding new runners and
supporting other SDKs and data formats, so your contributions are very welcome
in any form. Still, we already have a first working, automated version that
the community can use.

Also, I’d like to thank everybody who worked on this improvement!

—
Alexey


[1] https://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp
[2] https://github.com/apache/beam/tree/master/sdks/java/testing/tpcds
[3] https://ci-beam.apache.org/job/beam_PostCommit_Java_Tpcds_Spark/
[4] https://ci-beam.apache.org/job/beam_PostCommit_Java_Tpcds_Flink/
[5] https://ci-beam.apache.org/job/beam_PostCommit_Java_Tpcds_Dataflow/
[6] http://metrics.beam.apache.org/d/tkqc0AdGk2/tpc-ds-spark-classic-new-sql?orgId=1
[7] http://metrics.beam.apache.org/d/8INnSY9Mv/tpc-ds-flink-sql?orgId=1
[8] http://metrics.beam.apache.org/d/tkqc0AdGk2/tpc-ds-spark-classic-new-sql?orgId=1
[9] https://beam.apache.org/documentation/sdks/java/testing/tpcds/




