Hi there, I see the following:
1) Add 8,000,000 columns to a single row. Each column name is a UUID.
2) Use cassandra-cli to run: count keyspace.cf['myGUID']

The following is reported in the logs:

ERROR [DroppedMessagesLogger] 2010-12-12 18:17:36,046 CassandraDaemon.java (line 87) Uncaught exception in thread Thread[DroppedMessagesLogger,5,main]
java.lang.OutOfMemoryError: Java heap space
ERROR [pool-1-thread-2] 2010-12-12 18:17:36,046 Cassandra.java (line 1407) Internal error processing get_count
java.lang.OutOfMemoryError: Java heap space

and Cassandra falls over. I see the same behaviour with 0.6.6. Increasing the memory allocation with the -Xmx and -Xms args to 4GB allows the count to return in this particular example (i.e. no OutOfMemoryError is thrown).

Here's the Scala code that was run to load the columns, which uses the Akka persistence API:

object ColumnTest {

  def main(args: Array[String]): Unit = {
    println("Super column test starting")
    val hosts = Array("localhost")
    val sessions = new CassandraSessionPool("occurrence", StackPool(SocketProvider("localhost", 9160)), Protocol.Binary, ConsistencyLevel.ONE)
    val session = sessions.newSession
    loadRow("myGUID", 8000000, session)
    session.close
  }

  def loadRow(key: String, noOfColumns: Int, session: CassandraSession) {
    print("loading: " + key + ", with columns: " + noOfColumns)
    val start = System.currentTimeMillis
    val rawPath = new ColumnPath("dr")
    for (i <- 0 until noOfColumns) {
      val recordUuid = UUID.randomUUID.toString
      session ++| (key, rawPath.setColumn(recordUuid.getBytes), "1".getBytes, System.currentTimeMillis)
      session.flush
    }
    val finish = System.currentTimeMillis
    print(", Time taken (secs): " + ((finish - start) / 1000) + " seconds.\n")
  }
}

Here's the configuration used:

# Arguments to pass to the JVM
JVM_OPTS=" \
  -ea \
  -Xms1G \
  -Xmx2G \
  -XX:+UseParNewGC \
  -XX:+UseConcMarkSweepGC \
  -XX:+CMSParallelRemarkEnabled \
  -XX:SurvivorRatio=8 \
  -XX:MaxTenuringThreshold=1 \
  -XX:CMSInitiatingOccupancyFraction=75 \
  -XX:+UseCMSInitiatingOccupancyOnly \
  -XX:+HeapDumpOnOutOfMemoryError \
  -Dcom.sun.management.jmxremote.port=8080 \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Dcom.sun.management.jmxremote.authenticate=false"

Admittedly the resource allocation is small, but I wondered whether there should be some configuration guidelines (e.g. memory allocation vs. number of columns supported). I'm running this on my MBP with a single node, with Java as follows:

$ java -version
java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3261)
Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode)

Here's the CF definition:

<Keyspace Name="occurrence">
  <ColumnFamily Name="dr" CompareWith="UTF8Type" Comment="The column family for dataset tracking"/>
  <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
  <ReplicationFactor>1</ReplicationFactor>
  <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
</Keyspace>

Apologies in advance if this is a known issue or a known limitation of 0.6.x. I had wondered if I was hitting the 2GB row limit of the 0.6.x releases, but 8 million columns come to roughly 300MB in this particular case (back-of-the-envelope arithmetic in the P.S. below). I guess it may also be a result of the limitations of Thrift (i.e. no streaming capabilities); a sketch of the client-side paging I'm considering as a workaround is in the second P.S.

Any thoughts appreciated,
Dave
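P.S. Here's the rough arithmetic behind the ~300MB figure, counting only raw column name and value bytes (per-column overhead such as timestamps would add more, but still nowhere near 2GB):

val columns    = 8000000L
val nameBytes  = 36L  // a UUID rendered as a UTF-8 string
val valueBytes = 1L   // the "1" payload
println((columns * (nameBytes + valueBytes)) / 1000000 + " MB")  // ~296 MB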
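P.P.S. The workaround I'm considering is to count client-side by paging over the row with get_slice rather than calling get_count, so the server never has to materialise all 8 million columns at once. This is only a sketch against what I believe are the 0.6-style Thrift signatures (get_slice taking the keyspace name, byte[] column names, SlicePredicate/SliceRange); I haven't verified it:

import org.apache.cassandra.thrift.{Cassandra, ColumnParent, ConsistencyLevel, SlicePredicate, SliceRange}
import org.apache.thrift.protocol.TBinaryProtocol
import org.apache.thrift.transport.TSocket

object PagedCount {
  def main(args: Array[String]): Unit = {
    val socket = new TSocket("localhost", 9160)
    val client = new Cassandra.Client(new TBinaryProtocol(socket))
    socket.open()

    val parent   = new ColumnParent("dr")
    val pageSize = 1000
    var start    = new Array[Byte](0)  // empty start = beginning of the row
    var total    = 0L
    var first    = true
    var done     = false

    while (!done) {
      val range     = new SliceRange(start, new Array[Byte](0), false, pageSize)
      val predicate = new SlicePredicate().setSlice_range(range)
      val page      = client.get_slice("occurrence", "myGUID", parent, predicate, ConsistencyLevel.ONE)
      // After the first page the start column is returned again, so don't count it twice.
      total += (if (first) page.size else page.size - 1)
      first = false
      if (page.size < pageSize) done = true
      else start = page.get(page.size - 1).getColumn.getName
    }

    println("column count: " + total)
    socket.close()
  }
}

The page size of 1000 is arbitrary; the point is just to keep any single Thrift response small.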