[ https://issues.apache.org/jira/browse/CASSANALYTICS-36?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Francisco Guerrero updated CASSANALYTICS-36:
--------------------------------------------
    Source Control Link: https://github.com/apache/cassandra-analytics/commit/1664b2b5378666afa604432f9ec30f6e2fd11780
              Resolution: Fixed
                  Status: Resolved  (was: Ready to Commit)

> Bulk Reader should dynamically size the Spark job based on estimated table size
> -------------------------------------------------------------------------------
>
>                 Key: CASSANALYTICS-36
>                 URL: https://issues.apache.org/jira/browse/CASSANALYTICS-36
>             Project: Apache Cassandra Analytics
>          Issue Type: New Feature
>          Components: Reader
>            Reporter: Doug Rohrer
>            Assignee: Francisco Guerrero
>            Priority: Normal
>             Fix For: 1.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When reading a smaller dataset, using a large number of Spark cores is
> actually less efficient than using fewer. By using the estimated table size
> provided by Cassandra (similar to the data reported by `nodetool tablestats`),
> we can better limit resource utilization and decrease job runtime.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
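The sizing idea in the description can be sketched as follows. This is a minimal illustration, not the actual Cassandra Analytics implementation: the class name, `chooseCoreCount` method, and the 1 GiB-per-core target are all hypothetical, standing in for whatever heuristic the committed change uses.

```java
// Hypothetical sketch: derive a Spark core count from an estimated table size,
// so small tables don't spin up far more cores than they can keep busy.
public class CoreSizer {
    // Illustrative assumption: each core should process roughly 1 GiB.
    static final long TARGET_BYTES_PER_CORE = 1L << 30;

    static int chooseCoreCount(long estimatedTableBytes, int maxCores) {
        // Round up so even a tiny table gets one core, and cap at the
        // maximum the cluster makes available.
        long needed = (estimatedTableBytes + TARGET_BYTES_PER_CORE - 1)
                / TARGET_BYTES_PER_CORE;
        return (int) Math.max(1, Math.min(maxCores, needed));
    }

    public static void main(String[] args) {
        System.out.println(chooseCoreCount(512L << 20, 256)); // 512 MiB table
        System.out.println(chooseCoreCount(100L << 30, 256)); // 100 GiB table
        System.out.println(chooseCoreCount(1L << 40, 256));   // 1 TiB table
    }
}
```

With these assumed numbers, a 512 MiB table would run on a single core, a 100 GiB table on 100 cores, and a 1 TiB table would be capped at the 256-core maximum rather than requesting 1024.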