[ https://issues.apache.org/jira/browse/CASSANDRA-5982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769607#comment-13769607 ]

Jeremiah Jordan commented on CASSANDRA-5982:
--------------------------------------------

We should add https://github.com/jbellis/cassandra/commit/2e22cf23ec4e18cd99b69bb3d419931c55e3ba93 to tpstats also.

> OutOfMemoryError when writing text blobs to a very large number of tables
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5982
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5982
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Ryan McGuire
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.2.10, 2.0.1
>
>         Attachments: 2000CF_memtable_mem_usage.png, system.log.gz
>
>
> This test goes outside the norm for Cassandra, creating ~2000 column 
> families and writing large text blobs to them. 
> The process goes like this:
> Bring up a 6-node m2.2xlarge cluster on EC2. This instance type has enough 
> memory (34.2GB) that Cassandra will allocate a full 8GB heap without tuning 
> cassandra-env.sh. However, this instance type has only a single drive, so 
> data and commitlog are commingled. (This test has also been run on m1.xlarge 
> instances, which have four drives but less memory, and it exhibited similar 
> results when assigning one drive to commitlog and three to 
> datafile_directories.)
> Use the 'memtable_allocator: HeapAllocator' setting from CASSANDRA-5935.
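> For reference, the relevant cassandra.yaml settings for this layout would 
> look roughly like the following (the directory paths here are hypothetical, 
> not taken from the test; memtable_allocator is the CASSANDRA-5935 setting):
> {code}
> # Single-drive m2.2xlarge layout: commitlog and data share one device.
> commitlog_directory: /var/lib/cassandra/commitlog
> data_file_directories:
>     - /var/lib/cassandra/data
> # From CASSANDRA-5935: allocate memtable data on the Java heap.
> memtable_allocator: HeapAllocator
> {code}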
> Create 2000 CFs:
> {code}
> CREATE KEYSPACE cf_stress WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 3};
> CREATE COLUMNFAMILY cf_stress.tbl_00000 (id timeuuid PRIMARY KEY, val1 text, 
> val2 text, val3 text);
> -- repeat for tbl_00001, tbl_00002 ... tbl_02000
> {code}
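> As an illustration, the DDL can be scripted along these lines; this is a 
> minimal sketch assuming the DataStax Python driver, not the harness 
> actually used (the contact point address is made up):
> {code}
> from cassandra.cluster import Cluster  # DataStax Python driver (assumed)
>
> cluster = Cluster(['10.0.0.1'])  # hypothetical address; any node will do
> session = cluster.connect()
>
> session.execute("CREATE KEYSPACE cf_stress WITH replication = "
>                 "{'class': 'SimpleStrategy', 'replication_factor': 3}")
>
> # Each schema change has to propagate across all 6 nodes, which is part
> # of why creating ~2000 tables takes hours.
> for i in range(2001):  # tbl_00000 through tbl_02000, as above
>     session.execute("CREATE COLUMNFAMILY cf_stress.tbl_%05d "
>                     "(id timeuuid PRIMARY KEY, "
>                     "val1 text, val2 text, val3 text)" % i)
> {code}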
> Creating the tables takes a long time, about 5 hours, but anyone wanting 
> that many tables presumably only needs to do this once, so this may be 
> acceptable.
> Write data:
> The test dataset consists of writing 100K, 1M, and 10M documents to these 
> tables:
> {code}
> INSERT INTO {table_name} (id, val1, val2, val3) VALUES (?, ?, ?, ?)
> {code}
> With 5 threads doing these inserts across the cluster indefinitely, each 
> randomly choosing a table number from 1-2000, the cluster eventually topples 
> over with 'OutOfMemoryError: Java heap space'.
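> For concreteness, the write loop can be sketched like so (again a hedged 
> sketch assuming the DataStax Python driver; the payload size and contact 
> point are invented, not the test's actual values):
> {code}
> import random
> import threading
> import uuid
>
> from cassandra.cluster import Cluster
>
> cluster = Cluster(['10.0.0.1'])  # hypothetical contact point
> session = cluster.connect('cf_stress')
>
> BLOB = 'x' * 100000  # stand-in for the large text blobs
>
> def writer():
>     while True:
>         # Randomly pick one of the ~2000 tables, as in the test.
>         t = random.randint(0, 2000)
>         query = ("INSERT INTO tbl_%05d (id, val1, val2, val3) "
>                  "VALUES (%%s, %%s, %%s, %%s)" % t)
>         # uuid1() is a version-1 (time-based) UUID, matching the timeuuid key.
>         session.execute(query, (uuid.uuid1(), BLOB, BLOB, BLOB))
>
> threads = [threading.Thread(target=writer) for _ in range(5)]
> for th in threads:
>     th.daemon = True
>     th.start()
> for th in threads:
>     th.join()  # runs indefinitely, until the client or the cluster gives out
> {code}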
> A heap dump analysis indicates that it's mostly memtables:
> !2000CF_memtable_mem_usage.png!
> Best current theory is that this is commitlog bound and that the memtables 
> cannot flush fast enough due to locking issues. But I'll let [~jbellis] 
> comment more on that.

