Hi Aaron,

Thank you very much for replying.
From the log, it seems the ERROR happens when trying to flush a memtable with a secondary index. (When inserting the data, I set the default value to '' for all pre-defined columns; it's for programming convenience.)

The following is the log:

INFO [OptionalTasks:1] 2012-11-13 14:24:20,650 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-(some_cf).(some_cf)_(some_idx)_idx_1@1216346401(485/8360 serialized/live bytes, 24 ops)
ERROR [FlushWriter:2123] 2012-11-13 14:24:20,650 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[FlushWriter:2123,5,main]
java.lang.AssertionError: Keys must not be empty
    at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:133)
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:176)
    at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:295)
    at org.apache.cassandra.db.Memtable.access$600(Memtable.java:48)
    at org.apache.cassandra.db.Memtable$5.runMayThrow(Memtable.java:316)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
INFO [FlushWriter:2125] 2012-11-13 14:24:20,651 Memtable.java (line 264) Writing Memtable-(some_cf).(some_cf)_(some_idx2)_idx_1@272356994(485/2426 serialized/live bytes, 24 ops)
ERROR [FlushWriter:2125] 2012-11-13 14:24:20,652 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[FlushWriter:2125,5,main]
java.lang.AssertionError: Keys must not be empty
    at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:133)
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:176)
    at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:295)
    at org.apache.cassandra.db.Memtable.access$600(Memtable.java:48)
    at org.apache.cassandra.db.Memtable$5.runMayThrow(Memtable.java:316)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)

(A small client-side guard against writing '' into indexed columns is sketched after the quoted thread below.)

Sincerely,
Hsiao

On Mon, Nov 26, 2012 at 3:52 AM, aaron morton <aa...@thelastpickle.com> wrote:
> I checked the log, and found some ERRORs about network problems,
> and some ERRORs about "Keys must not be empty".
>
> Do you have the full error stack ?
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 25/11/2012, at 4:14 AM, Chuan-Heng Hsiao <hsiao.chuanh...@gmail.com> wrote:
>
> Hi Cassandra Devs,
>
> After trying to set up the same settings (and import the same data)
> on 3 VMs on the same machine instead of 3 physical machines,
> I have so far been unable to replicate the exploded-commitlog situation.
>
> On my 4-physical-machine setup, everything seems to be
> back to normal (the commitlog size is less than the expected max setting)
> after restarting the nodes.
>
> This time the commitlog size of one node is set to 4G, and the
> others are set to 8G.
>
> A few days ago the 4G node exploded to 5+G (the 8G nodes remained at 8G).
> I checked the log, and found some ERRORs about network problems,
> and some ERRORs about "Keys must not be empty".
>
> I suspect that besides the network problems,
> the "Keys must not be empty" ERROR may be the main reason why
> the commitlog continues growing.
> (I've already ensured that the keys are not empty in my code,
> so the problem may arise when syncing internally in Cassandra.)
>
> I restarted the 4G node as an 8G node. Because there has been no heavy traffic
> since then, I am not yet sure whether increasing the commitlog size will
> solve/reduce this problem.
> I'll keep you posted once the commitlog gets exploded again.
>
> Sincerely,
> Hsiao
>
>
> On Mon, Nov 19, 2012 at 11:21 AM, Chuan-Heng Hsiao
> <hsiao.chuanh...@gmail.com> wrote:
>
> I have RF = 3. Read/write consistency has already been set to TWO.
>
> It did seem that the data were not consistent yet.
> (There are some CFs that I expected to be empty after the operations, but I still
> got some data, and the amount of data was decreasing when retrying
> to get all the data from those CFs.)
>
> Sincerely,
> Hsiao
>
>
> On Mon, Nov 19, 2012 at 11:14 AM, Tupshin Harper <tups...@tupshin.com> wrote:
>
> What consistency level are you writing with? If you were writing with ANY,
> try writing with a higher consistency level.
>
> -Tupshin
>
> On Nov 18, 2012 9:05 PM, "Chuan-Heng Hsiao" <hsiao.chuanh...@gmail.com> wrote:
>
> Hi Aaron,
>
> Thank you very much for the reply.
>
> The 700 CFs were created in the beginning (before any insertion).
>
> I did not do anything with commitlog_archiving.properties, so I guess
> I was not using commit log archiving.
>
> What I did was a lot of insertions (and some deletions)
> using another 4 machines with 32 processes in total.
> (There are 4 nodes in my setup, so there are 8 machines in total.)
>
> I did see huge logs in /var/log/cassandra after such a huge amount of insertions.
> Right now I can't tell whether a single insertion also causes huge logs.
>
> nodetool flush hung (maybe because of the 200G+ commitlog).
>
> Because these machines are not in production (guaranteed no more
> insertions/deletions), I ended up restarting Cassandra one node at a time,
> and the commitlog shrank back to 4G. I am doing a repair on each node now.
>
> I'll try to re-import and keep the logs when the commitlog increases insanely again.
>
> Sincerely,
> Hsiao
>
>
> On Mon, Nov 19, 2012 at 3:19 AM, aaron morton <aa...@thelastpickle.com> wrote:
>
> I am wondering whether the huge commitlog size is the expected behavior or not?
>
> Nope.
>
> Did you notice the large log size during or after the inserts ?
> If after, did the size settle ?
> Are you using commit log archiving ? (in commitlog_archiving.properties)
>
> and around 700 mini column families (around 10M in data_file_directories)
>
> Can you describe how you created the 700 CFs ?
>
> and how can we reduce the size of the commitlog?
>
> As a workaround, nodetool flush should checkpoint the log.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 17/11/2012, at 2:30 PM, Chuan-Heng Hsiao <hsiao.chuanh...@gmail.com> wrote:
>
> Hi Cassandra Developers,
>
> I am experiencing a huge commitlog size (200+G) after inserting a huge
> amount of data.
> It is a 4-node cluster with RF = 3, and currently each node has a 200+G
> commit log (so there is around 1T of commit log in total).
>
> The setting of commitlog_total_space_in_mb is the default.
>
> I am using 1.1.6.
>
> I have not done nodetool cleanup or nodetool flush yet, but
> I did nodetool repair -pr for each column family.
>
> There is 1 huge column family (around 68G in data_file_directories),
> 18 mid-huge column families (around 1G in data_file_directories),
> and around 700 mini column families (around 10M in data_file_directories).
>
> I am wondering whether the huge commitlog size is the expected behavior or not?
> And how can we reduce the size of the commitlog?
>
> Sincerely,
> Hsiao
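
On the empty-key assertion above: as far as I understand, the hidden index column family uses the indexed column's value as its row key, so writing '' into an indexed column would hand the index an empty key at flush time. Below is a minimal client-side sketch that simply drops empty-string values from a row before it is written; the class, helper, and column names are only illustrative, and the actual client write call is not shown.

import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch: drop empty-string values before writing a row, so an
// indexed column never receives '' as its value. The column names below
// are hypothetical placeholders; the filtered map would be passed to
// whatever client call actually performs the insert.
public class NonEmptyColumnFilter
{
    public static Map<String, String> withoutEmptyValues(Map<String, String> columns)
    {
        Map<String, String> filtered = new LinkedHashMap<String, String>();
        for (Map.Entry<String, String> entry : columns.entrySet())
        {
            String value = entry.getValue();
            if (value != null && !value.isEmpty())
                filtered.put(entry.getKey(), value); // keep only non-empty values
        }
        return filtered;
    }

    public static void main(String[] args)
    {
        Map<String, String> columns = new LinkedHashMap<String, String>();
        columns.put("some_idx", "");           // would become an empty index key
        columns.put("some_idx2", "value-2");
        columns.put("other_col", "");

        // Prints {some_idx2=value-2}; only this entry would be written.
        System.out.println(withoutEmptyValues(columns));
    }
}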