Yifan Cai created CASSANALYTICS-89:
--------------------------------------

             Summary: Create dedicated data class for broadcast variable during bulk write
                 Key: CASSANALYTICS-89
                 URL: https://issues.apache.org/jira/browse/CASSANALYTICS-89
             Project: Apache Cassandra Analytics
          Issue Type: Improvement
          Components: Writer
            Reporter: Yifan Cai

Bulk write in Analytics uses Spark’s broadcast variable feature to distribute job information (BulkWriterContext) to executors. While this works today, it triggers unnecessary work in Spark’s SizeEstimator, which inspects all fields, including transient ones. BulkWriterContext (and the objects it references) contains many transient fields that are not meant to be serialized, yet SizeEstimator still walks them via reflection, wasting CPU cycles.

A cleaner approach would be to introduce a dedicated data class for the broadcast variable, with only the minimal set of fields required for distribution.
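
A minimal sketch of the proposed shape, not the actual implementation: every name below (HeavyContext, BroadcastData, toBroadcastData, roundTrip) is hypothetical, and the three fields are placeholders rather than the real contents of BulkWriterContext. The sketch avoids a Spark dependency and uses plain Java serialization to stand in for distribution to executors. In a real job the data object would presumably be the value handed to the broadcast call instead of the full context.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class BroadcastSketch {

    // Hypothetical stand-in for a heavyweight driver-side context.
    static final class HeavyContext implements Serializable {
        private static final long serialVersionUID = 1L;

        final String keyspace;
        final String table;
        final int jobTimeoutSeconds;

        // Transient fields are skipped by Java serialization, but they remain
        // reachable by anything that walks the object graph via reflection.
        transient Object clusterClient = new byte[1024];
        transient Object schemaCache = new byte[1024];

        HeavyContext(String keyspace, String table, int jobTimeoutSeconds) {
            this.keyspace = keyspace;
            this.table = table;
            this.jobTimeoutSeconds = jobTimeoutSeconds;
        }

        // Copy only the fields executors need into the dedicated data class.
        BroadcastData toBroadcastData() {
            return new BroadcastData(keyspace, table, jobTimeoutSeconds);
        }
    }

    // Hypothetical dedicated data class: immutable, no transient fields,
    // and no references back to the heavyweight context.
    static final class BroadcastData implements Serializable {
        private static final long serialVersionUID = 1L;

        final String keyspace;
        final String table;
        final int jobTimeoutSeconds;

        BroadcastData(String keyspace, String table, int jobTimeoutSeconds) {
            this.keyspace = keyspace;
            this.table = table;
            this.jobTimeoutSeconds = jobTimeoutSeconds;
        }
    }

    // Serialize and deserialize a value, standing in for shipping it to an executor.
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Round trip failed", e);
        }
    }

    public static void main(String[] args) {
        HeavyContext context = new HeavyContext("ks", "tbl", 300);
        BroadcastData received = roundTrip(context.toBroadcastData());
        System.out.println(received.keyspace + "." + received.table
                           + " timeout=" + received.jobTimeoutSeconds);
    }
}
```

Because the data class holds only plain values, a reflective walk over it has nothing beyond those fields to visit, which is the effect the ticket is after. How executors would rebuild any heavier state from such an object is left open here.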