[ https://issues.apache.org/jira/browse/CASSANDRA-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235437#comment-13235437 ]
Peter Schuller commented on CASSANDRA-3943:
-------------------------------------------

We are working on generating single-range-per-reducer sstables, so that the sstables do not overlap and each reducer can send to a single node (or at least to one node per sstable generated). This doesn't address local storage, but it does address the problem in this issue. It also means that, combined with log(n) filtering of sstables by range in the read path, it would be feasible to bulk import thousands of sstables and disable compaction completely.

> Too many small size sstables after loading data using sstableloader or
> BulkOutputFormat increases compaction time.
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3943
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3943
>             Project: Cassandra
>          Issue Type: Wish
>          Components: Hadoop, Tools
>    Affects Versions: 0.8.2, 1.1.0
>            Reporter: Samarth Gahire
>            Priority: Minor
>              Labels: bulkloader, hadoop, ponies, sstableloader, streaming, tools
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> When we create sstables using SimpleUnsortedWriter or BulkOutputFormat, the
> size of the sstables created is around the buffer size provided.
> But after loading, the sstables created on the cluster nodes have a size of around
> {code}( (sstable_size_before_loading) * replication_factor ) / No_Of_Nodes_In_Cluster{code}
> As the number of nodes in the cluster increases, the size of each sstable loaded into
> a Cassandra node decreases. Such small sstables take much longer to
> compact (minor compaction) compared to relatively large sstables.
> One solution we have tried is to increase the buffer size while
> generating sstables. But as we increase the buffer size, the time taken to
> generate sstables increases. Is there any solution to this in existing
> versions, or are you fixing this in a future version?

--
This message is automatically generated by JIRA.
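The size formula quoted in the issue follows from how keys are spread over the ring: each reducer's buffer holds keys from the whole token range, so every node owns roughly `replication_factor / No_Of_Nodes_In_Cluster` of each source sstable and receives a correspondingly small sstable for it. A minimal sketch of that arithmetic, assuming uniformly distributed keys and ignoring index, bloom filter, and compression overhead; `streamed_sstable_size` is a hypothetical helper, not part of Cassandra:

```python
def streamed_sstable_size(source_size_mb: float, replication_factor: int, num_nodes: int) -> float:
    """Approximate size (MB) of the sstable one node receives from one source sstable.

    Mirrors the formula in the issue:
        (sstable_size_before_loading * replication_factor) / No_Of_Nodes_In_Cluster
    Assumes keys are spread uniformly over the token ring.
    """
    # A row cannot have more replicas than there are nodes.
    if not 1 <= replication_factor <= num_nodes:
        raise ValueError("replication_factor must be between 1 and num_nodes")
    return source_size_mb * replication_factor / num_nodes


if __name__ == "__main__":
    # Fixed 64 MB buffer and RF=3: the per-node sstables shrink as the cluster grows.
    for nodes in (3, 6, 12, 24, 48):
        size = streamed_sstable_size(64, 3, nodes)
        print(f"{nodes:>3} nodes -> {size:.1f} MB per streamed sstable")
```

With a 64 MB buffer and RF=3, a 24-node cluster ends up with sstables of roughly 8 MB. Keeping the streamed sstables at a fixed size would therefore mean growing the buffer in proportion to the cluster size divided by the replication factor, which is the trade-off the reporter describes.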
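The comment's point about log(n) filtering depends on the sstables having disjoint token ranges: once they are sorted by first token, a binary search selects the single candidate sstable for a key, so the selection step stays O(log n) even with thousands of sstables. A minimal sketch of that idea, assuming integer tokens and inclusive bounds; `SSTableRange` and `NonOverlappingIndex` are hypothetical names, and this is a plain binary search over sorted disjoint intervals, not Cassandra's actual read path:

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SSTableRange:
    """Hypothetical stand-in for an sstable's inclusive token bounds."""
    name: str
    first_token: int
    last_token: int


class NonOverlappingIndex:
    """Selects candidate sstables in O(log n) when their ranges are disjoint."""

    def __init__(self, sstables: List[SSTableRange]):
        self._tables = sorted(sstables, key=lambda t: t.first_token)
        for t in self._tables:
            if t.first_token > t.last_token:
                raise ValueError(f"{t.name} has first_token > last_token")
        # Sorted by first_token, so checking neighbours is enough to prove disjointness.
        for prev, cur in zip(self._tables, self._tables[1:]):
            if cur.first_token <= prev.last_token:
                raise ValueError(f"{prev.name} and {cur.name} overlap")
        self._firsts = [t.first_token for t in self._tables]

    def lookup(self, token: int) -> Optional[SSTableRange]:
        """Return the only sstable that can contain `token`, or None."""
        # Rightmost sstable whose range starts at or before the token.
        i = bisect_right(self._firsts, token) - 1
        if i >= 0 and token <= self._tables[i].last_token:
            return self._tables[i]
        return None

    def overlapping(self, lo: int, hi: int) -> List[SSTableRange]:
        """Return the sstables intersecting [lo, hi], in token order."""
        if lo > hi:
            return []
        i = max(bisect_right(self._firsts, lo) - 1, 0)
        result = []
        # Walk right from the first candidate until ranges start past `hi`.
        while i < len(self._tables) and self._tables[i].first_token <= hi:
            if self._tables[i].last_token >= lo:
                result.append(self._tables[i])
            i += 1
        return result


if __name__ == "__main__":
    index = NonOverlappingIndex([
        SSTableRange("c", 300, 399),
        SSTableRange("a", 0, 99),
        SSTableRange("b", 100, 199),
    ])
    print(index.lookup(150))  # SSTableRange(name='b', first_token=100, last_token=199)
    print(index.lookup(250))  # None (token 250 falls in a gap)
    print([t.name for t in index.overlapping(50, 320)])  # ['a', 'b', 'c']
```

The constructor rejects overlapping input because the binary search is only correct when ranges are disjoint. With overlapping sstables, such as the whole-ring ones described in the issue, every sstable whose range covers the key has to be considered instead.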