jazracherif commented on code in PR #84:
URL: https://github.com/apache/datafusion-ray/pull/84#discussion_r2027648678


##########
tpch/tpcbench.py:
##########
@@ -186,8 +186,28 @@ def main(
 
     args = parser.parse_args()
 
+    if (args.qnum != -1 and args.query is not None):
+        print("Please specify either --qnum or --query, but not both")
+        exit(1)
+
+    queries = []
+    if (args.qnum != -1):
+        if args.qnum < 1 or args.qnum > 22:
+            print("Invalid query number. Please specify a number between 1 and 22.")
+            exit(1)
+        else:
+            queries.append((str(args.qnum), tpch_query(args.qnum)))

Review Comment:
   Explicitly mention TPCH in the id:
   
   ```suggestion
               queries.append((f"TPCH-{args.qnum}", tpch_query(args.qnum)))
   ```



##########
tpch/tpcbench.py:
##########
@@ -186,8 +186,28 @@ def main(
 
     args = parser.parse_args()
 
+    if (args.qnum != -1 and args.query is not None):
+        print("Please specify either --qnum or --query, but not both")
+        exit(1)
+
+    queries = []
+    if (args.qnum != -1):
+        if args.qnum < 1 or args.qnum > 22:
+            print("Invalid query number. Please specify a number between 1 and 22.")
+            exit(1)
+        else:
+            queries.append((str(args.qnum), tpch_query(args.qnum)))
+            print("Executing tpch query ", args.qnum)
+
+    elif (args.query is not None):
+        queries.append(("custom query", args.query))
+        print("Executing custom query: ", args.query)
+    else:
+        print("Executing all tpch queries")
+        queries = [(str(i), tpch_query(i)) for i in range(1, 23)]
+

Review Comment:
   Minor suggestion: extract this into its own function, for example:
   
   ```py
   from typing import List, Optional, Tuple

   def get_sql_queries(
       tpch_qnum: Optional[int] = None, sql_statement: Optional[str] = None
   ) -> List[Tuple[str, str]]:
       """
       Get the list of SQL statements from either the TPCH queries or a user-provided SQL statement.
       At most one of these parameters can be provided.
   
       :param tpch_qnum: the TPCH query number. If None, return all supported TPCH queries
       :param sql_statement: SQL statement string on the available data tables (e.g. ingested through make_data.py)
       :return: a list of tuples with the name of the query and the SQL statement string
       """
   ```
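
One possible body for such a helper is sketched below. It is only an illustration, not the PR's implementation: the `load_tpch_query` parameter is a hypothetical stand-in for the existing `tpch_query` function so that the snippet runs on its own, and raising `ValueError` instead of calling `exit(1)` is an assumption about how the caller wants to handle bad input.

```python
from typing import Callable, List, Optional, Tuple


def get_sql_queries(
    tpch_qnum: Optional[int] = None,
    sql_statement: Optional[str] = None,
    # Hypothetical stand-in for tpch_query(); the real one loads the SQL text.
    load_tpch_query: Callable[[int], str] = lambda n: f"-- TPCH query {n}",
) -> List[Tuple[str, str]]:
    """Return (name, SQL) pairs for one TPCH query, a custom query, or all 22."""
    if tpch_qnum is not None and sql_statement is not None:
        raise ValueError("Specify either tpch_qnum or sql_statement, not both")

    if tpch_qnum is not None:
        if not 1 <= tpch_qnum <= 22:
            raise ValueError("Invalid query number; expected a value from 1 to 22")
        return [(f"TPCH-{tpch_qnum}", load_tpch_query(tpch_qnum))]

    if sql_statement is not None:
        return [("custom query", sql_statement)]

    # Neither argument given: fall back to every TPCH query.
    return [(f"TPCH-{i}", load_tpch_query(i)) for i in range(1, 23)]
```

With the current `--qnum` default of `-1`, `main` would need to translate that sentinel first, for example `get_sql_queries(None if args.qnum == -1 else args.qnum, args.query, tpch_query)`.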



##########
docs/contributing.md:
##########
@@ -80,15 +80,15 @@ RAY_COLOR_PREFIX=1 RAY_DEDUP_LOGS=0 python tips.py --data-dir=$(pwd)/../testdata
 - In the `tpch` directory, use `make_data.py` to create a TPCH dataset at a provided scale factor, then
 
 ```bash
-RAY_COLOR_PREFIX=1 RAY_DEDUP_LOGS=0 python tpc.py --data=file:///path/to/your/tpch/directory/ --concurrency=2 --batch-size=8182 --worker-pool-min=10 --qnum 2
+RAY_COLOR_PREFIX=1 RAY_DEDUP_LOGS=0 python tpcbench.py --data=file:///path/to/your/tpch/directory/ --concurrency=2 --batch-size=8182 --worker-pool-min=10 --qnum 2

Review Comment:
   I would recommend standardizing the data directory to testdata/tpch and adding the correct make_data.py command just above, for example:
   
   ```suggestion
   RAY_COLOR_PREFIX=1 RAY_DEDUP_LOGS=0 python tpcbench.py --data=../testdata/tpch --concurrency=2 --batch-size=8182 --worker-pool-min=10 --qnum 2
   ```
   
   Before this, add more documentation on make_data.py:
   - In the `tpch` directory, use `make_data.py` to create a TPCH dataset at a provided scale factor and an output directory, such as the `testdata` directory
   ```bash
   python make_data.py 1 "../testdata/tpch"
   ```
   
   The setup could also define an environment variable for this:
   `TPCH_DATA=../testdata/tpch`
   
   and the examples could then use `$TPCH_DATA`.



##########
tpch/tpcbench.py:
##########
@@ -186,8 +186,28 @@ def main(
 
     args = parser.parse_args()
 
+    if (args.qnum != -1 and args.query is not None):
+        print("Please specify either --qnum or --query, but not both")
+        exit(1)
+
+    queries = []
+    if (args.qnum != -1):
+        if args.qnum < 1 or args.qnum > 22:
+            print("Invalid query number. Please specify a number between 1 and 22.")
+            exit(1)
+        else:
+            queries.append((str(args.qnum), tpch_query(args.qnum)))
+            print("Executing tpch query ", args.qnum)
+
+    elif (args.query is not None):
+        queries.append(("custom query", args.query))
+        print("Executing custom query: ", args.query)
+    else:
+        print("Executing all tpch queries")
+        queries = [(str(i), tpch_query(i)) for i in range(1, 23)]

Review Comment:
   ```suggestion
           queries = [(f"TPCH-{i}", tpch_query(i)) for i in range(1, 23)]
   ```



##########
tpch/tpcbench.py:
##########
@@ -154,7 +151,10 @@ def main(
     parser.add_argument(
         "--concurrency", required=True, help="Number of concurrent tasks"
     )
-    parser.add_argument("--qnum", type=int, default=-1, help="TPCH query number, 1-22")
+    parser.add_argument("--qnum", type=int, default=-1,
+                        help="TPCH query number, 1-22")
+    parser.add_argument("--query", required=False, type=str,
+                        help="Custom query to run with tpch tables")

Review Comment:
   ```suggestion
                           help="Custom SQL query statement to run with tpch tables")
   ```
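
As a possible alternative to the manual "either --qnum or --query" check, `argparse` can enforce both the exclusivity and the 1-22 range itself. The sketch below is a standalone illustration, not the PR's code: `build_parser` is a hypothetical helper name, and it assumes `--qnum` defaults to `None` rather than `-1`, so any `args.qnum != -1` checks would have to change accordingly.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser where --qnum and --query cannot be combined."""
    parser = argparse.ArgumentParser()
    # argparse exits with an error if both options of this group are given.
    group = parser.add_mutually_exclusive_group()
    group.add_argument(
        "--qnum",
        type=int,
        choices=range(1, 23),  # rejects values outside 1-22
        metavar="N",           # keeps the help text from listing all 22 choices
        help="TPCH query number, 1-22",
    )
    group.add_argument(
        "--query",
        type=str,
        help="Custom SQL query statement to run with tpch tables",
    )
    return parser
```

With this setup, passing neither option leaves both `args.qnum` and `args.query` as `None`, which can serve as the "run all TPCH queries" case.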



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

