[jira] [Created] (CASSANALYTICS-36) Bulk Reader should dynamically size the Spark job based on estimated table size

Doug Rohrer (Jira) Fri, 18 Apr 2025 11:29:04 -0700

Doug Rohrer created CASSANALYTICS-36:
----------------------------------------


             Summary: Bulk Reader should dynamically size the Spark job based 
on estimated table size
                 Key: CASSANALYTICS-36
                 URL: https://issues.apache.org/jira/browse/CASSANALYTICS-36
             Project: Apache Cassandra Analytics
          Issue Type: New Feature
          Components: Reader
            Reporter: Doug Rohrer


When reading a small dataset, spreading the work across a large number of 
Spark cores is less efficient than using fewer cores. By using the estimated 
table size provided by Cassandra (similar to the data reported by `nodetool 
tablestats`), the Bulk Reader can size the Spark job to match the data, 
limiting resource utilization and decreasing job runtime.
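
To make the idea concrete, here is a minimal sketch of size-based job sizing. It is not the Bulk Reader's implementation: the function name `cores_for_table`, the 128 MiB-per-task target, and the `max_cores` cap are all hypothetical, and the estimated size is assumed to be already available as a byte count (for example, from an estimate like the one `nodetool tablestats` reports).

```python
# Hypothetical target: roughly 128 MiB of table data per Spark task.
TARGET_BYTES_PER_TASK = 128 * 1024 * 1024


def cores_for_table(estimated_table_bytes: int,
                    max_cores: int,
                    target_bytes_per_task: int = TARGET_BYTES_PER_TASK) -> int:
    """Return how many Spark cores to request for a table of the given estimated size.

    All parameters are illustrative assumptions, not Cassandra Analytics options.
    """
    if max_cores < 1:
        raise ValueError("max_cores must be at least 1")
    if target_bytes_per_task < 1:
        raise ValueError("target_bytes_per_task must be at least 1")

    # Treat a negative (unknown) estimate as empty.
    size = max(estimated_table_bytes, 0)

    # Integer ceiling division: number of tasks needed to cover the table.
    wanted = (size + target_bytes_per_task - 1) // target_bytes_per_task

    # Always use at least one core, and never more than the configured cap.
    return min(max_cores, max(1, wanted))
```

Under these assumptions, a table estimated at 1 GiB with a cap of 64 cores would be read with 8 cores instead of all 64, while a multi-terabyte table would still use the full cap.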



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
