Hi Aaron,

Thank you very much for replying.
From the log, it seems the ERROR happens when trying to flush a memtable with a secondary index. (When inserting the data, I set the default value to '' for all pre-defined columns; it's for programming convenience.)

The following is the log:

INFO [OptionalTasks:1] 2012-11-13 14:24:20,650 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-(some_cf).(some_cf)_(some_idx)_idx_1@1216346401(485/8360 serialized/live bytes, 24 ops)
ERROR [FlushWriter:2123] 2012-11-13 14:24:20,650 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[FlushWriter:2123,5,main]
java.lang.AssertionError: Keys must not be empty
    at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:133)
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:176)
    at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:295)
    at org.apache.cassandra.db.Memtable.access$600(Memtable.java:48)
    at org.apache.cassandra.db.Memtable$5.runMayThrow(Memtable.java:316)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
INFO [FlushWriter:2125] 2012-11-13 14:24:20,651 Memtable.java (line 264) Writing Memtable-(some_cf).(some_cf)_(some_idx2)_idx_1@272356994(485/2426 serialized/live bytes, 24 ops)
ERROR [FlushWriter:2125] 2012-11-13 14:24:20,652 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[FlushWriter:2125,5,main]
java.lang.AssertionError: Keys must not be empty
    at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:133)
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:176)
    at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:295)
    at org.apache.cassandra.db.Memtable.access$600(Memtable.java:48)
    at org.apache.cassandra.db.Memtable$5.runMayThrow(Memtable.java:316)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)

(A small client-side guard against writing '' into indexed columns is sketched after the quoted thread below.)

Sincerely,
Hsiao

On Mon, Nov 26, 2012 at 3:52 AM, aaron morton <aa...@thelastpickle.com> wrote:
> I checked the log, and found some ERRORs about network problems,
> and some ERRORs about "Keys must not be empty".
>
> Do you have the full error stack ?
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 25/11/2012, at 4:14 AM, Chuan-Heng Hsiao <hsiao.chuanh...@gmail.com> wrote:
>
> Hi Cassandra Devs,
>
> After trying to set up the same settings (and import the same data)
> on 3 VMs on the same machine instead of 3 physical machines,
> I have so far been unable to replicate the exploded-commitlog situation.
>
> On my 4-physical-machine setup, everything seems to be
> back to normal (the commitlog size is less than the expected max setting)
> after restarting the nodes.
>
> This time the commitlog size of one node is set to 4G, and the
> others are set to 8G.
>
> A few days ago the 4G node exploded to 5+G (the 8G nodes remained at 8G).
> I checked the log, and found some ERRORs about network problems,
> and some ERRORs about "Keys must not be empty".
>
> I suspect that besides the network problems,
> the "Keys must not be empty" ERROR may be the main reason why
> the commitlog continues growing.
> (I've already ensured that the keys are not empty in my code,
> so the problem may arise when syncing internally in Cassandra.)
>
> I restarted the 4G node as an 8G node. Because there has been no heavy traffic
> since then, I am not yet sure whether increasing the commitlog size will
> solve/reduce this problem.
> I'll keep you posted once the commitlog gets exploded again.
>
> Sincerely,
> Hsiao
>
>
> On Mon, Nov 19, 2012 at 11:21 AM, Chuan-Heng Hsiao
> <hsiao.chuanh...@gmail.com> wrote:
>
> I have RF = 3. Read/write consistency has already been set to TWO.
>
> It did seem that the data were not consistent yet.
> (There are some CFs that I expected to be empty after the operations, but I still
> got some data, and the amount of data was decreasing when retrying
> to get all the data from those CFs.)
>
> Sincerely,
> Hsiao
>
>
> On Mon, Nov 19, 2012 at 11:14 AM, Tupshin Harper <tups...@tupshin.com> wrote:
>
> What consistency level are you writing with? If you were writing with ANY,
> try writing with a higher consistency level.
>
> -Tupshin
>
> On Nov 18, 2012 9:05 PM, "Chuan-Heng Hsiao" <hsiao.chuanh...@gmail.com> wrote:
>
> Hi Aaron,
>
> Thank you very much for the reply.
>
> The 700 CFs were created in the beginning (before any insertion).
>
> I did not do anything with commitlog_archiving.properties, so I guess
> I was not using commit log archiving.
>
> What I did was a lot of insertions (and some deletions)
> using another 4 machines with 32 processes in total.
> (There are 4 nodes in my setup, so there are 8 machines in total.)
>
> I did see huge logs in /var/log/cassandra after such a huge amount of insertions.
> Right now I can't tell whether a single insertion also causes huge logs.
>
> nodetool flush hung (maybe because of the 200G+ commitlog).
>
> Because these machines are not in production (guaranteed no more
> insertions/deletions), I ended up restarting Cassandra one node at a time,
> and the commitlog shrank back to 4G. I am doing a repair on each node now.
>
> I'll try to re-import and keep the logs when the commitlog increases insanely again.
>
> Sincerely,
> Hsiao
>
>
> On Mon, Nov 19, 2012 at 3:19 AM, aaron morton <aa...@thelastpickle.com> wrote:
>
> I am wondering whether the huge commitlog size is the expected behavior or not?
>
> Nope.
>
> Did you notice the large log size during or after the inserts ?
> If after, did the size settle ?
> Are you using commit log archiving ? (in commitlog_archiving.properties)
>
> and around 700 mini column families (around 10M in data_file_directories)
>
> Can you describe how you created the 700 CFs ?
>
> and how can we reduce the size of the commitlog?
>
> As a workaround, nodetool flush should checkpoint the log.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 17/11/2012, at 2:30 PM, Chuan-Heng Hsiao <hsiao.chuanh...@gmail.com> wrote:
>
> Hi Cassandra Developers,
>
> I am experiencing a huge commitlog size (200+G) after inserting a huge
> amount of data.
> It is a 4-node cluster with RF = 3, and currently each node has a 200+G
> commit log (so there is around 1T of commit log in total).
>
> The setting of commitlog_total_space_in_mb is the default.
>
> I am using 1.1.6.
>
> I have not done nodetool cleanup or nodetool flush yet, but
> I did nodetool repair -pr for each column family.
>
> There is 1 huge column family (around 68G in data_file_directories),
> 18 mid-huge column families (around 1G in data_file_directories),
> and around 700 mini column families (around 10M in data_file_directories).
>
> I am wondering whether the huge commitlog size is the expected behavior or not?
> And how can we reduce the size of the commitlog?
>
> Sincerely,
> Hsiao
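
On the empty-key assertion above: as far as I understand, the hidden index column family uses the indexed column's value as its row key, so writing '' into an indexed column would hand the index an empty key at flush time. Below is a minimal client-side sketch that simply drops empty-string values from a row before it is written; the class, helper, and column names are only illustrative, and the actual client write call is not shown.

import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch: drop empty-string values before writing a row, so an
// indexed column never receives '' as its value. The column names below
// are hypothetical placeholders; the filtered map would be passed to
// whatever client call actually performs the insert.
public class NonEmptyColumnFilter
{
    public static Map<String, String> withoutEmptyValues(Map<String, String> columns)
    {
        Map<String, String> filtered = new LinkedHashMap<String, String>();
        for (Map.Entry<String, String> entry : columns.entrySet())
        {
            String value = entry.getValue();
            if (value != null && !value.isEmpty())
                filtered.put(entry.getKey(), value); // keep only non-empty values
        }
        return filtered;
    }

    public static void main(String[] args)
    {
        Map<String, String> columns = new LinkedHashMap<String, String>();
        columns.put("some_idx", "");           // would become an empty index key
        columns.put("some_idx2", "value-2");
        columns.put("other_col", "");

        // Prints {some_idx2=value-2}; only this entry would be written.
        System.out.println(withoutEmptyValues(columns));
    }
}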