I was thinking this too, but I think the overall insert volume is not that big. The data is basically map data, and the files are map tiles, which I can easily make smaller. We are currently using this data from multiple nodes (GRID), but we want to get rid of the file-system hassle (basically Samba mounts).
Reads are always done per file (column). This is why I think Cassandra would be good: at least the read performance is more than good enough for us.

-jussi

Schubert Zhang wrote:
> I think your file (as a Cassandra column value) is too large.
> And I also think Cassandra is not good at storing files.
>
> On Wed, Apr 28, 2010 at 10:24 PM, Jussi P?öri
> <ju...@androidconsulting.com <mailto:ju...@androidconsulting.com>> wrote:
>
> new try, previous went to wrong place...
>
> Hi all,
>
> I'm trying to run a scenario of adding files from a specific folder
> to Cassandra. I have 64 files (about 15-20 MB per file), roughly 1 GB
> of data overall. I can insert around 40 files, but after that Cassandra
> goes into some GC loop and I finally get a timeout on the client.
> It does not go OOM, it just jams.
>
> Here are the last entries in the log file:
>
> INFO [GC inspection] 2010-04-28 10:07:55,297 GCInspector.java (line 110) GC for ParNew: 232 ms, 25731128 reclaimed leaving 553241120 used; max is 4108386304
> INFO [GC inspection] 2010-04-28 10:09:02,331 GCInspector.java (line 110) GC for ParNew: 2844 ms, 238909856 reclaimed leaving 1435582832 used; max is 4108386304
> INFO [GC inspection] 2010-04-28 10:09:49,421 GCInspector.java (line 110) GC for ParNew: 30666 ms, 11185824 reclaimed leaving 1679795336 used; max is 4108386304
> INFO [GC inspection] 2010-04-28 10:11:18,090 GCInspector.java (line 110) GC for ParNew: 895 ms, 17921680 reclaimed leaving 1589308456 used; max is 4108386304
>
> I think I must have something wrong in my configuration or in how I use
> Cassandra, because people here insert ten times more stuff and it works.
>
> The column family I am using:
> <ColumnFamily CompareWith="BytesType" Name="Standard1"/>
> Basically I insert with the folder name as the row key, the file name as
> the column name, and the file content as the value.
> I tried with Hector (mainly) and directly using Thrift (insert and
> batch_mutate).
>
> In my case, the data does not need to be readable immediately after
> insert, but I don't know if that helps in any way.
>
> My environment:
> mac and/or linux, tested on both
> java 1.6.0_17
> Cassandra 0.6.1
>
> <RpcTimeoutInMillis>60000</RpcTimeoutInMillis>
> <CommitLogRotationThresholdInMB>32</CommitLogRotationThresholdInMB>
> <RowWarningThresholdInMB>512</RowWarningThresholdInMB>
> <SlicedBufferSizeInKB>32</SlicedBufferSizeInKB>
> <FlushDataBufferSizeInMB>32</FlushDataBufferSizeInMB>
> <FlushIndexBufferSizeInMB>8</FlushIndexBufferSizeInMB>
> <ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>
> <MemtableThroughputInMB>64</MemtableThroughputInMB>
> <BinaryMemtableThroughputInMB>256</BinaryMemtableThroughputInMB>
> <MemtableOperationsInMillions>0.1</MemtableOperationsInMillions>
> <MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>
> <ConcurrentReads>8</ConcurrentReads>
> <ConcurrentWrites>32</ConcurrentWrites>
> <CommitLogSync>batch</CommitLogSync>
> <!-- CommitLogSyncPeriodInMS>10000</CommitLogSyncPeriodInMS -->
> <CommitLogSyncBatchWindowInMS>1.0</CommitLogSyncBatchWindowInMS>
> <GCGraceSeconds>500</GCGraceSeconds>
>
> JVM_OPTS=" \
>   -server \
>   -Xms3G \
>   -Xmx3G \
>   -XX:PermSize=512m \
>   -XX:MaxPermSize=800m \
>   -XX:MaxNewSize=256m \
>   -XX:NewSize=128m \
>   -XX:TargetSurvivorRatio=90 \
>   -XX:+AggressiveOpts \
>   -XX:+UseParNewGC \
>   -XX:+UseConcMarkSweepGC \
>   -XX:+CMSParallelRemarkEnabled \
>   -XX:+HeapDumpOnOutOfMemoryError \
>   -XX:SurvivorRatio=128 \
>   -XX:MaxTenuringThreshold=0 \
>   -XX:+DisableExplicitGC \
>   -Dcom.sun.management.jmxremote.port=8080 \
>   -Dcom.sun.management.jmxremote.ssl=false \
>   -Dcom.sun.management.jmxremote.authenticate=false"
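[Editor's note: following up on the "column value is too large" point above, a common workaround is to split each file into fixed-size chunks and store one chunk per column, so no single column value is 15-20 MB. The sketch below shows only the chunking step; the class name, CHUNK_SIZE, and the column-naming scheme are hypothetical, not from this thread.]

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: cut a file's bytes into small pieces so each
// Cassandra column value stays well under the memtable/GC pain threshold,
// instead of storing one 15-20 MB column per file.
public class FileChunker {
    // 512 KB per column -- an assumed value; tune for your heap and memtable settings.
    public static final int CHUNK_SIZE = 512 * 1024;

    // Returns the file content cut into CHUNK_SIZE pieces (the last piece may be shorter).
    public static List<byte[]> chunk(byte[] fileContent) {
        List<byte[]> chunks = new ArrayList<byte[]>();
        for (int off = 0; off < fileContent.length; off += CHUNK_SIZE) {
            int len = Math.min(CHUNK_SIZE, fileContent.length - off);
            byte[] piece = new byte[len];
            System.arraycopy(fileContent, off, piece, 0, len);
            chunks.add(piece);
        }
        return chunks;
    }

    // One column per chunk, named e.g. "tile.png/0007". With zero-padded
    // indices, a BytesType comparator sorts chunks of a file in order, so a
    // slice on the file-name prefix returns them ready to concatenate on read.
    public static String columnName(String fileName, int chunkIndex) {
        return String.format("%s/%04d", fileName, chunkIndex);
    }
}
```

On read, slice the row for columns starting with the file name and concatenate the values; on write, each insert or batch_mutate carries only small values, which keeps individual allocations out of the tenured generation.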