[ https://issues.apache.org/jira/browse/CASSANALYTICS-36?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh McKenzie updated CASSANALYTICS-36:
---------------------------------------
    Fix Version/s: 1.0

> Bulk Reader should dynamically size the Spark job based on estimated table size
> --------------------------------------------------------------------------------
>
>                 Key: CASSANALYTICS-36
>                 URL: https://issues.apache.org/jira/browse/CASSANALYTICS-36
>             Project: Apache Cassandra Analytics
>          Issue Type: New Feature
>          Components: Reader
>            Reporter: Doug Rohrer
>            Priority: Normal
>             Fix For: 1.0
>
>
> When reading a smaller dataset, leveraging a large number of Spark cores is
> actually less efficient than using a smaller number. By using the estimated
> table size provided by Cassandra (similar to the data provided by `nodetool
> tablestats`), we can do a better job of limiting resource utilization and
> decreasing job runtime.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
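The sizing idea in the ticket description could be sketched roughly as below. This is only an illustration, not the Analytics implementation: the class, method names, and the 256 MiB-per-core target are all assumptions; the real reader would obtain the size estimate from Cassandra's table statistics.

```java
// Hypothetical sketch of dynamic Spark sizing from an estimated table size.
// All names and constants here are assumptions for illustration only.
public final class DynamicSizing {

    // Assumed target amount of data each Spark core should handle (~256 MiB).
    static final long TARGET_BYTES_PER_CORE = 256L * 1024 * 1024;

    /**
     * Derive a core count from the estimated table size, clamped to
     * [minCores, maxCores] so small tables do not fan out across the
     * whole cluster and huge tables do not request unbounded resources.
     */
    static int coresFor(long estimatedTableBytes, int minCores, int maxCores) {
        // Ceiling division: how many cores we would want at the target ratio.
        long wanted = (estimatedTableBytes + TARGET_BYTES_PER_CORE - 1)
                / TARGET_BYTES_PER_CORE;
        return (int) Math.max(minCores, Math.min(maxCores, wanted));
    }

    public static void main(String[] args) {
        // A 64 MiB table needs only the minimum parallelism.
        System.out.println(coresFor(64L * 1024 * 1024, 1, 256));          // 1
        // A 1 TiB table is capped at the configured maximum.
        System.out.println(coresFor(1024L * 1024 * 1024 * 1024, 1, 256)); // 256
    }
}
```

The key design point from the ticket is the clamp: without a lower/upper bound, a tablestats-style estimate (which can be off for heavily compacted or TTL'd data) would translate directly into a misprovisioned job.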