Hi there,

I see the following:

1) Add 8,000,000 columns to a single row. Each column name is a UUID.
2) Use cassandra-cli to run count keyspace.cf['myGUID']

The following is reported in the logs:

ERROR [DroppedMessagesLogger] 2010-12-12 18:17:36,046 CassandraDaemon.java 
(line 87) Uncaught exception in thread Thread[DroppedMessagesLogger,5,main]
java.lang.OutOfMemoryError: Java heap space
ERROR [pool-1-thread-2] 2010-12-12 18:17:36,046 Cassandra.java (line 1407) 
Internal error processing get_count
java.lang.OutOfMemoryError: Java heap space

and Cassandra falls over. I see the same behaviour with 0.6.6.

Increasing the memory allocation with the -Xmx & -Xms args to 4GB allows the 
count to return in this particular example (i.e. no OutOfMemory is thrown).
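A workaround I've been considering is counting client-side in pages of columns rather than asking the server for a single get_count over the whole row. The sketch below just demonstrates the paging logic: fetchPage is a hypothetical stand-in for a get_slice-style call (simulated here with an in-memory sorted map), not the actual Thrift API.

```scala
import scala.collection.immutable.TreeMap

object PagedCount {
  // Stand-in for a Cassandra row: 1000 columns with sortable names.
  val row: TreeMap[String, String] =
    TreeMap((1 to 1000).map(i => (f"col$i%05d", "1")): _*)

  // Simulated get_slice: up to `count` column names >= `start`.
  def fetchPage(start: String, count: Int): Seq[String] =
    row.iteratorFrom(start).map(_._1).take(count).toSeq

  // Count the row page by page, so only `pageSize` columns are
  // ever held in memory at once.
  def pagedCount(pageSize: Int): Int = {
    var total = 0
    var start = ""
    var done = false
    while (!done) {
      val page = fetchPage(start, pageSize)
      total += page.size
      if (page.size < pageSize) done = true
      // Resume strictly after the last column seen, so it is
      // not counted twice on the next fetch.
      else start = page.last + "\u0000"
    }
    total
  }

  def main(args: Array[String]): Unit =
    println(pagedCount(100))
}
```

The same start-after-last-column loop should work against the real get_slice call, at the cost of one round trip per page.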

Here's the Scala code that was run to load the columns, which uses the Akka 
persistence API:

import java.util.UUID

object ColumnTest {
        def main(args: Array[String]): Unit = {
                println("Super column test starting")
                val hosts = Array("localhost")
                val sessions = new CassandraSessionPool("occurrence",
                        StackPool(SocketProvider("localhost", 9160)),
                        Protocol.Binary, ConsistencyLevel.ONE)
                val session = sessions.newSession
                loadRow("myGUID", 8000000, session)
                session.close
        }

        def loadRow(key: String, noOfColumns: Int, session: CassandraSession) {
                print("loading: " + key + ", with columns: " + noOfColumns)
                val start = System.currentTimeMillis
                val rawPath = new ColumnPath("dr")
                for (i <- 0 until noOfColumns) {
                        val recordUuid = UUID.randomUUID.toString
                        session ++| (key, rawPath.setColumn(recordUuid.getBytes),
                                "1".getBytes, System.currentTimeMillis)
                        session.flush
                }
                val finish = System.currentTimeMillis
                print(", Time taken (secs): " + ((finish - start) / 1000) + " seconds.\n")
        }
}

Here's the configuration used:

# Arguments to pass to the JVM
JVM_OPTS=" \
        -ea \
        -Xms1G \
        -Xmx2G \
        -XX:+UseParNewGC \
        -XX:+UseConcMarkSweepGC \
        -XX:+CMSParallelRemarkEnabled \
        -XX:SurvivorRatio=8 \
        -XX:MaxTenuringThreshold=1 \
        -XX:CMSInitiatingOccupancyFraction=75 \
        -XX:+UseCMSInitiatingOccupancyOnly \
        -XX:+HeapDumpOnOutOfMemoryError \
        -Dcom.sun.management.jmxremote.port=8080 \
        -Dcom.sun.management.jmxremote.ssl=false \
        -Dcom.sun.management.jmxremote.authenticate=false"

Admittedly the resource allocation is small, but I wondered if there should be 
some configuration guidelines (e.g. memory allocation vs number of columns 
supported).
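To give a sense of what such a guideline might look like, here is the back-of-envelope arithmetic I did for this case. The per-column JVM overhead figure is an assumption on my part (object headers, byte[] wrappers, collection entries), not a measured value:

```scala
object HeapEstimate {
  // Rough in-heap size if the server materialises every column of
  // a row to answer get_count.
  def estimateBytes(columns: Long, nameLen: Int, valueLen: Int,
                    perColumnOverhead: Int): Long =
    columns * (nameLen + valueLen + perColumnOverhead)

  def main(args: Array[String]): Unit = {
    // 36-byte UUID string name, 1-byte value, and an *assumed*
    // ~120 bytes of JVM object overhead per column.
    val bytes = estimateBytes(8000000L, 36, 1, 120)
    println(f"~${bytes / (1024.0 * 1024 * 1024)}%.2f GB")
  }
}
```

Under those assumptions 8 million columns come to well over 1GB of heap, which would be consistent with a 2GB heap falling over and a 4GB heap coping.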
        
I'm running this on my MBP with a single node, with Java as follows:

$ java -version
java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3261)
Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode)
        
Here's the CF definition:

    <Keyspace Name="occurrence">
      <ColumnFamily Name="dr"
                    CompareWith="UTF8Type"
                    Comment="The column family for dataset tracking"/>
      <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
      <ReplicationFactor>1</ReplicationFactor>
      <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
    </Keyspace>
    
Apologies in advance if this is a known issue or a known limitation of 0.6.x.
I had wondered if I was hitting the 2GB row limit for 0.6.x releases, but 8 million
columns come to roughly 300MB in this particular case.
I guess it may also be a result of the limitations of Thrift (i.e. no
streaming capabilities).
    
Any thoughts appreciated,

Dave