I was thinking this too, but I think the overall insert volume is not that big. The data is basically map data, and the files are map tiles, which I can easily make smaller. We are currently using this data from multiple nodes (GRID), but we want to get rid of the file-system hassle (basically Samba mounts).
Reads are always done per file (column). This is why I think Cassandra would be good: at least the read performance is more than good enough for us.

-jussi

Schubert Zhang wrote:
> I think your file (as a Cassandra column value) is too large.
> And I also think Cassandra is not good at storing files.
>
> On Wed, Apr 28, 2010 at 10:24 PM, Jussi P?öri
> <ju...@androidconsulting.com <mailto:ju...@androidconsulting.com>> wrote:
>
> new try, previous went to wrong place...
>
> Hi all,
>
> I'm trying to run a scenario of adding files from a specific folder
> to Cassandra. I have 64 files (about 15-20 MB per file), roughly 1 GB
> of data overall. I can insert around 40 files, but after that Cassandra
> goes into some GC loop and I finally get a timeout on the client.
> It does not go OOM, it just jams.
>
> Here are the last entries in the log file:
>
> INFO [GC inspection] 2010-04-28 10:07:55,297 GCInspector.java (line 110) GC for ParNew: 232 ms, 25731128 reclaimed leaving 553241120 used; max is 4108386304
> INFO [GC inspection] 2010-04-28 10:09:02,331 GCInspector.java (line 110) GC for ParNew: 2844 ms, 238909856 reclaimed leaving 1435582832 used; max is 4108386304
> INFO [GC inspection] 2010-04-28 10:09:49,421 GCInspector.java (line 110) GC for ParNew: 30666 ms, 11185824 reclaimed leaving 1679795336 used; max is 4108386304
> INFO [GC inspection] 2010-04-28 10:11:18,090 GCInspector.java (line 110) GC for ParNew: 895 ms, 17921680 reclaimed leaving 1589308456 used; max is 4108386304
>
> I think I must have something wrong in my configuration or in how I use
> Cassandra, because people here insert ten times more stuff and it works.
>
> The column family I am using:
> <ColumnFamily CompareWith="BytesType" Name="Standard1"/>
> Basically I insert with the folder name as the row key, the file name as
> the column name, and the file content as the value.
> I tried with Hector (mainly) and directly using Thrift (insert and
> batch_mutate).
>
> In my case, the data does not need to be readable immediately after
> insert, but I don't know if that helps in any way.
>
> My environment:
> mac and/or linux, tested on both
> java 1.6.0_17
> Cassandra 0.6.1
>
> <RpcTimeoutInMillis>60000</RpcTimeoutInMillis>
> <CommitLogRotationThresholdInMB>32</CommitLogRotationThresholdInMB>
> <RowWarningThresholdInMB>512</RowWarningThresholdInMB>
> <SlicedBufferSizeInKB>32</SlicedBufferSizeInKB>
> <FlushDataBufferSizeInMB>32</FlushDataBufferSizeInMB>
> <FlushIndexBufferSizeInMB>8</FlushIndexBufferSizeInMB>
> <ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>
> <MemtableThroughputInMB>64</MemtableThroughputInMB>
> <BinaryMemtableThroughputInMB>256</BinaryMemtableThroughputInMB>
> <MemtableOperationsInMillions>0.1</MemtableOperationsInMillions>
> <MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>
> <ConcurrentReads>8</ConcurrentReads>
> <ConcurrentWrites>32</ConcurrentWrites>
> <CommitLogSync>batch</CommitLogSync>
> <!-- CommitLogSyncPeriodInMS>10000</CommitLogSyncPeriodInMS -->
> <CommitLogSyncBatchWindowInMS>1.0</CommitLogSyncBatchWindowInMS>
> <GCGraceSeconds>500</GCGraceSeconds>
>
> JVM_OPTS=" \
>   -server \
>   -Xms3G \
>   -Xmx3G \
>   -XX:PermSize=512m \
>   -XX:MaxPermSize=800m \
>   -XX:MaxNewSize=256m \
>   -XX:NewSize=128m \
>   -XX:TargetSurvivorRatio=90 \
>   -XX:+AggressiveOpts \
>   -XX:+UseParNewGC \
>   -XX:+UseConcMarkSweepGC \
>   -XX:+CMSParallelRemarkEnabled \
>   -XX:+HeapDumpOnOutOfMemoryError \
>   -XX:SurvivorRatio=128 \
>   -XX:MaxTenuringThreshold=0 \
>   -XX:+DisableExplicitGC \
>   -Dcom.sun.management.jmxremote.port=8080 \
>   -Dcom.sun.management.jmxremote.ssl=false \
>   -Dcom.sun.management.jmxremote.authenticate=false"
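[Editor's note: following up on the "column value is too large" point above, a common workaround is to split each file into fixed-size chunks and store one chunk per column, so no single column value is 15-20 MB. The sketch below shows only the chunking step; the class name, CHUNK_SIZE, and the column-naming scheme are hypothetical, not from this thread.]

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: cut a file's bytes into small pieces so each
// Cassandra column value stays well under the memtable/GC pain threshold,
// instead of storing one 15-20 MB column per file.
public class FileChunker {
    // 512 KB per column -- an assumed value; tune for your heap and memtable settings.
    public static final int CHUNK_SIZE = 512 * 1024;

    // Returns the file content cut into CHUNK_SIZE pieces (the last piece may be shorter).
    public static List<byte[]> chunk(byte[] fileContent) {
        List<byte[]> chunks = new ArrayList<byte[]>();
        for (int off = 0; off < fileContent.length; off += CHUNK_SIZE) {
            int len = Math.min(CHUNK_SIZE, fileContent.length - off);
            byte[] piece = new byte[len];
            System.arraycopy(fileContent, off, piece, 0, len);
            chunks.add(piece);
        }
        return chunks;
    }

    // One column per chunk, named e.g. "tile.png/0007". With zero-padded
    // indices, a BytesType comparator sorts chunks of a file in order, so a
    // slice on the file-name prefix returns them ready to concatenate on read.
    public static String columnName(String fileName, int chunkIndex) {
        return String.format("%s/%04d", fileName, chunkIndex);
    }
}
```

On read, slice the row for columns starting with the file name and concatenate the values; on write, each insert or batch_mutate carries only small values, which keeps individual allocations out of the tenured generation.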