Unsubscribe Please
Sent from my iPad

On Dec 12, 2010, at 1:26 AM, Dave Martin <moyesys...@googlemail.com> wrote:

> Hi there,
>
> I see the following behaviour:
>
> 1) Add 8,000,000 columns to a single row. Each column name is a UUID.
> 2) Use cassandra-cli to run count keyspace.cf['myGUID']
>
> The following is reported in the logs:
>
> ERROR [DroppedMessagesLogger] 2010-12-12 18:17:36,046 CassandraDaemon.java
> (line 87) Uncaught exception in thread Thread[DroppedMessagesLogger,5,main]
> java.lang.OutOfMemoryError: Java heap space
> ERROR [pool-1-thread-2] 2010-12-12 18:17:36,046 Cassandra.java (line 1407)
> Internal error processing get_count
> java.lang.OutOfMemoryError: Java heap space
>
> and Cassandra falls over. I see the same behaviour with 0.6.6.
>
> Increasing the memory allocation with the -Xmx and -Xms args to 4GB allows
> the count to return in this particular example (i.e. no OutOfMemoryError is
> thrown).
>
> Here's the Scala code that was run to load the columns; it uses the Akka
> persistence API:
>
> import java.util.UUID
> import org.apache.cassandra.thrift.{ColumnPath, ConsistencyLevel}
> // CassandraSessionPool, CassandraSession, StackPool, SocketProvider and
> // Protocol come from the Akka persistence-cassandra module.
>
> object ColumnTest {
>   def main(args: Array[String]): Unit = {
>     println("Column test starting")
>     val sessions = new CassandraSessionPool("occurrence",
>       StackPool(SocketProvider("localhost", 9160)),
>       Protocol.Binary, ConsistencyLevel.ONE)
>     val session = sessions.newSession
>     loadRow("myGUID", 8000000, session)
>     session.close
>   }
>
>   def loadRow(key: String, noOfColumns: Int, session: CassandraSession) {
>     print("loading: " + key + ", with columns: " + noOfColumns)
>     val start = System.currentTimeMillis
>     val rawPath = new ColumnPath("dr")
>     for (i <- 0 until noOfColumns) {
>       // Each column name is a random UUID; the value is the string "1".
>       val recordUuid = UUID.randomUUID.toString
>       session ++| (key, rawPath.setColumn(recordUuid.getBytes),
>         "1".getBytes, System.currentTimeMillis)
>       session.flush
>     }
>     val finish = System.currentTimeMillis
>     print(", Time taken (secs): " + ((finish - start) / 1000) + " seconds.\n")
>   }
> }
>
> Here's the configuration used:
>
> # Arguments to pass to the JVM
> JVM_OPTS=" \
>         -ea \
>         -Xms1G \
>         -Xmx2G \
>         -XX:+UseParNewGC \
>         -XX:+UseConcMarkSweepGC \
>         -XX:+CMSParallelRemarkEnabled \
>         -XX:SurvivorRatio=8 \
>         -XX:MaxTenuringThreshold=1 \
>         -XX:CMSInitiatingOccupancyFraction=75 \
>         -XX:+UseCMSInitiatingOccupancyOnly \
>         -XX:+HeapDumpOnOutOfMemoryError \
>         -Dcom.sun.management.jmxremote.port=8080 \
>         -Dcom.sun.management.jmxremote.ssl=false \
>         -Dcom.sun.management.jmxremote.authenticate=false"
>
> Admittedly the resource allocation is small, but I wondered if there should
> be some configuration guidelines (e.g. memory allocation vs. number of
> columns supported).
>
> I'm running this on my MBP with a single node, with Java as follows:
>
> $ java -version
> java version "1.6.0_22"
> Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3261)
> Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode)
>
> Here's the CF definition:
>
> <Keyspace Name="occurrence">
>   <ColumnFamily Name="dr"
>                 CompareWith="UTF8Type"
>                 Comment="The column family for dataset tracking"/>
>
>   <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
>   <ReplicationFactor>1</ReplicationFactor>
>
>   <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
> </Keyspace>
>
> Apologies in advance if this is a known issue or a known limitation of
> 0.6.x. I had wondered if I was hitting the 2GB row limit for the 0.6.x
> releases, but 8 million columns is only roughly 300MB in this particular
> case. I guess it may also be a result of the limitations of Thrift (i.e.
> no streaming capabilities).
>
> Any thoughts appreciated,
>
> Dave
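
For what it's worth, the 300MB estimate looks plausible: 8,000,000 columns x (36-byte UUID name + 1-byte value) is roughly 300MB of raw name/value data before per-column overhead. The OOM is thrown from the get_count handler, so presumably the server is pulling the whole row into memory before counting it. One workaround is to page through the row with get_slice and count on the client, so only one page of columns is held in memory at a time. Below is a rough, untested sketch against the raw Thrift client generated for 0.6 (not the Akka wrapper used above); the host, port, keyspace, column family and row key are taken from the message, while the object name, helper name and page size are just illustrative, and it assumes the default unframed Thrift transport.

import org.apache.thrift.protocol.TBinaryProtocol
import org.apache.thrift.transport.TSocket
import org.apache.cassandra.thrift.{Cassandra, ColumnParent, ConsistencyLevel,
  SlicePredicate, SliceRange}

object PagedCount {
  def main(args: Array[String]): Unit = {
    val socket = new TSocket("localhost", 9160)
    val client = new Cassandra.Client(new TBinaryProtocol(socket))
    socket.open()
    try {
      println(countColumns(client, "occurrence", "myGUID", "dr"))
    } finally {
      socket.close()
    }
  }

  // Pages through the row 1,000 columns at a time, so only one page is ever
  // held in memory at once on either side of the connection.
  def countColumns(client: Cassandra.Client, keyspace: String,
                   key: String, columnFamily: String): Long = {
    val parent = new ColumnParent(columnFamily)
    val pageSize = 1000
    var start = Array.empty[Byte]   // empty start/finish = start/end of row
    var total = 0L
    var firstPage = true
    var done = false
    while (!done) {
      val range = new SliceRange(start, Array.empty[Byte], false, pageSize)
      val predicate = new SlicePredicate().setSlice_range(range)
      val page = client.get_slice(keyspace, key, parent, predicate,
        ConsistencyLevel.ONE)
      val fetched = page.size
      // After the first page, the first column returned is the one used as
      // the start of the slice, so it has already been counted.
      total += (if (firstPage) fetched else fetched - 1)
      if (fetched < pageSize) {
        done = true
      } else {
        start = page.get(fetched - 1).getColumn.getName
        firstPage = false
      }
    }
    total
  }
}

This only sidesteps the count; whether get_count itself can be made to stream rather than materialise the row is a separate question for the list.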