Unsubscribe Please
Sent from my iPad

On Dec 12, 2010, at 1:26 AM, Dave Martin <moyesys...@googlemail.com> wrote:

> Hi there,
>
> I see the following behaviour:
>
> 1) Add 8,000,000 columns to a single row. Each column name is a UUID.
> 2) Use cassandra-cli to run count keyspace.cf['myGUID']
>
> The following is reported in the logs:
>
> ERROR [DroppedMessagesLogger] 2010-12-12 18:17:36,046 CassandraDaemon.java
> (line 87) Uncaught exception in thread Thread[DroppedMessagesLogger,5,main]
> java.lang.OutOfMemoryError: Java heap space
> ERROR [pool-1-thread-2] 2010-12-12 18:17:36,046 Cassandra.java (line 1407)
> Internal error processing get_count
> java.lang.OutOfMemoryError: Java heap space
>
> and Cassandra falls over. I see the same behaviour with 0.6.6.
>
> Increasing the memory allocation with the -Xmx and -Xms args to 4GB allows
> the count to return in this particular example (i.e. no OutOfMemoryError is
> thrown).
>
> Here's the Scala code that was run to load the columns; it uses the Akka
> persistence API:
>
> import java.util.UUID
> import org.apache.cassandra.thrift.{ColumnPath, ConsistencyLevel}
> // CassandraSessionPool, CassandraSession, StackPool, SocketProvider and
> // Protocol come from the Akka persistence-cassandra module.
>
> object ColumnTest {
>   def main(args: Array[String]): Unit = {
>     println("Column test starting")
>     val sessions = new CassandraSessionPool("occurrence",
>       StackPool(SocketProvider("localhost", 9160)),
>       Protocol.Binary, ConsistencyLevel.ONE)
>     val session = sessions.newSession
>     loadRow("myGUID", 8000000, session)
>     session.close
>   }
>
>   def loadRow(key: String, noOfColumns: Int, session: CassandraSession) {
>     print("loading: " + key + ", with columns: " + noOfColumns)
>     val start = System.currentTimeMillis
>     val rawPath = new ColumnPath("dr")
>     for (i <- 0 until noOfColumns) {
>       // Each column name is a random UUID; the value is the string "1".
>       val recordUuid = UUID.randomUUID.toString
>       session ++| (key, rawPath.setColumn(recordUuid.getBytes),
>         "1".getBytes, System.currentTimeMillis)
>       session.flush
>     }
>     val finish = System.currentTimeMillis
>     print(", Time taken (secs): " + ((finish - start) / 1000) + " seconds.\n")
>   }
> }
>
> Here's the configuration used:
>
> # Arguments to pass to the JVM
> JVM_OPTS=" \
>         -ea \
>         -Xms1G \
>         -Xmx2G \
>         -XX:+UseParNewGC \
>         -XX:+UseConcMarkSweepGC \
>         -XX:+CMSParallelRemarkEnabled \
>         -XX:SurvivorRatio=8 \
>         -XX:MaxTenuringThreshold=1 \
>         -XX:CMSInitiatingOccupancyFraction=75 \
>         -XX:+UseCMSInitiatingOccupancyOnly \
>         -XX:+HeapDumpOnOutOfMemoryError \
>         -Dcom.sun.management.jmxremote.port=8080 \
>         -Dcom.sun.management.jmxremote.ssl=false \
>         -Dcom.sun.management.jmxremote.authenticate=false"
>
> Admittedly the resource allocation is small, but I wondered if there should
> be some configuration guidelines (e.g. memory allocation vs. number of
> columns supported).
>
> I'm running this on my MBP with a single node, with Java as follows:
>
> $ java -version
> java version "1.6.0_22"
> Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3261)
> Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode)
>
> Here's the CF definition:
>
> <Keyspace Name="occurrence">
>   <ColumnFamily Name="dr"
>                 CompareWith="UTF8Type"
>                 Comment="The column family for dataset tracking"/>
>
>   <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
>   <ReplicationFactor>1</ReplicationFactor>
>
>   <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
> </Keyspace>
>
> Apologies in advance if this is a known issue or a known limitation of
> 0.6.x. I had wondered if I was hitting the 2GB row limit for the 0.6.x
> releases, but 8 million columns is only roughly 300MB in this particular
> case. I guess it may also be a result of the limitations of Thrift (i.e.
> no streaming capabilities).
>
> Any thoughts appreciated,
>
> Dave
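
For what it's worth, the 300MB estimate looks plausible: 8,000,000 columns x (36-byte UUID name + 1-byte value) is roughly 300MB of raw name/value data before per-column overhead. The OOM is thrown from the get_count handler, so presumably the server is pulling the whole row into memory before counting it. One workaround is to page through the row with get_slice and count on the client, so only one page of columns is held in memory at a time. Below is a rough, untested sketch against the raw Thrift client generated for 0.6 (not the Akka wrapper used above); the host, port, keyspace, column family and row key are taken from the message, while the object name, helper name and page size are just illustrative, and it assumes the default unframed Thrift transport.

import org.apache.thrift.protocol.TBinaryProtocol
import org.apache.thrift.transport.TSocket
import org.apache.cassandra.thrift.{Cassandra, ColumnParent, ConsistencyLevel,
  SlicePredicate, SliceRange}

object PagedCount {
  def main(args: Array[String]): Unit = {
    val socket = new TSocket("localhost", 9160)
    val client = new Cassandra.Client(new TBinaryProtocol(socket))
    socket.open()
    try {
      println(countColumns(client, "occurrence", "myGUID", "dr"))
    } finally {
      socket.close()
    }
  }

  // Pages through the row 1,000 columns at a time, so only one page is ever
  // held in memory at once on either side of the connection.
  def countColumns(client: Cassandra.Client, keyspace: String,
                   key: String, columnFamily: String): Long = {
    val parent = new ColumnParent(columnFamily)
    val pageSize = 1000
    var start = Array.empty[Byte]   // empty start/finish = start/end of row
    var total = 0L
    var firstPage = true
    var done = false
    while (!done) {
      val range = new SliceRange(start, Array.empty[Byte], false, pageSize)
      val predicate = new SlicePredicate().setSlice_range(range)
      val page = client.get_slice(keyspace, key, parent, predicate,
        ConsistencyLevel.ONE)
      val fetched = page.size
      // After the first page, the first column returned is the one used as
      // the start of the slice, so it has already been counted.
      total += (if (firstPage) fetched else fetched - 1)
      if (fetched < pageSize) {
        done = true
      } else {
        start = page.get(fetched - 1).getColumn.getName
        firstPage = false
      }
    }
    total
  }
}

This only sidesteps the count; whether get_count itself can be made to stream rather than materialise the row is a separate question for the list.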