Is writing too many rows to a single partition the cause of memory consumption?


What I want to achieve is this: say I have 5 partition IDs, each of which 
corresponds to 50 million IDs. Given a partition ID, I need to get its 
corresponding 50 million IDs. Is there another way to design the schema to 
avoid such a compound primary key?


I could use the clustering IDs as the primary key and create an index on the 
partition ID. But is that equivalent to creating another table with compound 
keys?
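
One alternative that might be worth trying is to split each logical partition 
into a fixed number of buckets, so that no single partition holds all 50 
million rows. The sketch below only illustrates the idea. The bucket column, 
the bucket count of 1000, and the PartitionBuckets class are all assumptions 
made for illustration, not something Cassandra provides, and I have not 
verified that this avoids the memory growth.

```java
import java.util.UUID;

// Hypothetical helper for illustration; not part of Cassandra or of the test case in this thread.
class PartitionBuckets {
    // Assumed bucket count: 50,000,000 rows / 1000 buckets is about 50,000 rows per partition.
    static final int NUM_BUCKETS = 1000;

    // Map a clustering ID to a bucket in [0, NUM_BUCKETS).
    // floorMod keeps the result non-negative even when hashCode() is negative.
    static int bucketOf(UUID y) {
        return Math.floorMod(y.hashCode(), NUM_BUCKETS);
    }

    public static void main(String[] args) {
        // Assumed schema with the bucket folded into the partition key:
        //   create table test.t (x uuid, bucket int, y uuid, primary key ((x, bucket), y))
        int[] counts = new int[NUM_BUCKETS];
        int samples = 1_000_000;
        for (int i = 0; i < samples; i++) {
            counts[bucketOf(UUID.randomUUID())]++;
        }
        // Report how evenly the sampled IDs spread across the buckets.
        int min = Integer.MAX_VALUE;
        int max = 0;
        for (int c : counts) {
            min = Math.min(min, c);
            max = Math.max(max, c);
        }
        System.out.println("rows per bucket: min=" + min + " max=" + max);
    }
}
```

With such a schema, each row would carry bucketOf(y) as an extra bound value, 
and fetching all IDs for one partition ID would mean querying every bucket 
from 0 to NUM_BUCKETS - 1 and merging the results.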


At 2014-06-06 00:16:13, "Jack Krupansky" <j...@basetechnology.com> wrote:
How many rows (primary key values) are you writing for each partition of the 
primary key? I mean, are there relatively few, or are these very wide 
partitions?
 
Oh, I see! You’re writing 50,000,000 rows to a single partition! My, that IS 
ambitious.
 
-- Jack Krupansky
 
From: Xu Zhongxing
Sent: Thursday, June 5, 2014 3:34 AM
To: user@cassandra.apache.org
Subject: CQLSSTableWriter memory leak
 

I am using Cassandra's CQLSSTableWriter to import a large amount of data into 
Cassandra. When I use CQLSSTableWriter to write to a table with a compound 
primary key, memory consumption keeps growing, and the JVM's GC cannot reclaim 
any of the used memory. When writing to tables without a compound primary key, 
the JVM GC works fine.

My Cassandra version is 2.0.5. The OS is Ubuntu 14.04 x86-64. JVM parameters 
are -Xms1g -Xmx2g. This is sufficient for all other non-compound primary key 
cases.

The problem can be reproduced by the following test case:

import org.apache.cassandra.io.sstable.CQLSSTableWriter;
import org.apache.cassandra.exceptions.InvalidRequestException;

import java.io.IOException;
import java.util.UUID;

class SS {
    public static void main(String[] args) {
        String schema = "create table test.t (x uuid, y uuid, primary key (x, y))";


        String insert = "insert into test.t (x, y) values (?, ?)";
        CQLSSTableWriter writer = CQLSSTableWriter.builder()
            .inDirectory("/tmp/test/t")
            .forTable(schema).withBufferSizeInMB(32)
            .using(insert).build();

        UUID id = UUID.randomUUID();
        try {
            for (int i = 0; i < 50000000; i++) {
                UUID id2 = UUID.randomUUID();
                writer.addRow(id, id2);
            }

            writer.close();
        } catch (Exception e) {
            // Print the full exception so a failure is not silently swallowed.
            e.printStackTrace();
        }
    }
}
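
To check whether the heap really keeps growing, a counter like the following 
could be logged from inside the loop. This is a minimal sketch using only the 
standard Runtime API. The HeapWatch class and its demo loop are hypothetical 
and are not part of the original test case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the original test case.
class HeapWatch {
    // Approximate heap currently in use: allocated heap minus the free part of it.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        // Demo: retain 1 MB blocks so the reported usage grows between samples.
        List<byte[]> retained = new ArrayList<>();
        for (int i = 1; i <= 50; i++) {
            retained.add(new byte[1024 * 1024]);
            if (i % 10 == 0) {
                System.gc(); // only a hint; the JVM may ignore it
                System.out.println("blocks=" + retained.size()
                        + " usedMB=" + usedHeapBytes() / (1024 * 1024));
            }
        }
    }
}
```

Calling usedHeapBytes() every few million iterations of the addRow loop would 
show whether usage levels off after each buffer flush or climbs steadily 
toward the -Xmx limit.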
