Can you please create a ticket for this on https://issues.apache.org/jira/browse/CASSANDRA?
Thanks

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 26/11/2012, at 1:58 PM, Chuan-Heng Hsiao <hsiao.chuanh...@gmail.com> wrote:

> Hi Aaron,
>
> Thank you very much for replying.
>
> From the log, it seems the ERROR happens when trying to flush a
> memtable with a secondary index.
> (When inserting the data, I set the default value to '' for all
> pre-defined columns; it's for programming convenience.)
>
> The following is the log:
>
> INFO [OptionalTasks:1] 2012-11-13 14:24:20,650 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-(some_cf).(some_cf)_(some_idx)_idx_1@1216346401(485/8360 serialized/live bytes, 24 ops)
> ERROR [FlushWriter:2123] 2012-11-13 14:24:20,650 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[FlushWriter:2123,5,main]
> java.lang.AssertionError: Keys must not be empty
>     at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:133)
>     at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:176)
>     at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:295)
>     at org.apache.cassandra.db.Memtable.access$600(Memtable.java:48)
>     at org.apache.cassandra.db.Memtable$5.runMayThrow(Memtable.java:316)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     at java.lang.Thread.run(Thread.java:722)
>
> INFO [FlushWriter:2125] 2012-11-13 14:24:20,651 Memtable.java (line 264) Writing Memtable-(some_cf).(some_cf)_(some_idx2)_idx_1@272356994(485/2426 serialized/live bytes, 24 ops)
> ERROR [FlushWriter:2125] 2012-11-13 14:24:20,652 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[FlushWriter:2125,5,main]
> java.lang.AssertionError: Keys must not be empty
>     at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:133)
>     at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:176)
>     at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:295)
>     at org.apache.cassandra.db.Memtable.access$600(Memtable.java:48)
>     at org.apache.cassandra.db.Memtable$5.runMayThrow(Memtable.java:316)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     at java.lang.Thread.run(Thread.java:722)
>
> Sincerely,
> Hsiao
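The assertion above is raised in SSTableWriter.beforeAppend when a row key is empty. For a built-in secondary index, the index row key is the indexed column's value, so writing '' as a default into an indexed column likely produces an empty index key when the index memtable is flushed. Below is a minimal client-side guard, sketched in plain Java with invented column names, that drops empty values for indexed columns instead of storing the '' placeholder; it illustrates the idea only and is not code from this thread.

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.Set;

    public class IndexedColumnGuard {

        // Columns that carry a secondary index (placeholders; substitute the real ones).
        private static final Set<String> INDEXED_COLUMNS =
                new HashSet<String>(Arrays.asList("some_indexed_col", "another_indexed_col"));

        // Returns a copy of the row with empty/null values removed for indexed columns.
        // Non-indexed columns may keep their '' defaults if that is convenient.
        public static Map<String, String> sanitize(Map<String, String> row) {
            Map<String, String> cleaned = new LinkedHashMap<String, String>();
            for (Map.Entry<String, String> e : row.entrySet()) {
                boolean indexed = INDEXED_COLUMNS.contains(e.getKey());
                boolean empty = (e.getValue() == null || e.getValue().isEmpty());
                if (indexed && empty) {
                    continue; // omit the column rather than writing '' into the index
                }
                cleaned.put(e.getKey(), e.getValue());
            }
            return cleaned;
        }

        public static void main(String[] args) {
            Map<String, String> row = new LinkedHashMap<String, String>();
            row.put("some_indexed_col", "");      // would become an empty index key
            row.put("title", "");                 // harmless: not indexed
            row.put("another_indexed_col", "x");  // kept: non-empty
            System.out.println(sanitize(row));    // prints {title=, another_indexed_col=x}
        }
    }

If an always-present value is needed for application convenience, a non-empty sentinel (for example "N/A") on the indexed columns would avoid the empty key while keeping the column queryable.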
> On Mon, Nov 26, 2012 at 3:52 AM, aaron morton <aa...@thelastpickle.com> wrote:
>> I checked the log, and found some ERRORs about network problems,
>> and some ERRORs about "Keys must not be empty".
>>
>> Do you have the full error stack?
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 25/11/2012, at 4:14 AM, Chuan-Heng Hsiao <hsiao.chuanh...@gmail.com> wrote:
>>
>> Hi Cassandra Devs,
>>
>> After setting up the same configuration (and importing the same data)
>> on 3 VMs on one machine instead of 3 physical machines,
>> so far I have not been able to replicate the exploded-commitlog situation.
>>
>> On my 4-physical-machine setup, everything seems to be
>> back to normal (the commitlog size is below the expected maximum setting)
>> after restarting the nodes.
>>
>> This time the commitlog size of one node is set to 4G, and the
>> others are set to 8G.
>>
>> A few days ago the commitlog on the 4G node exploded to 5+G
>> (the 8G nodes remained at 8G).
>> I checked the log, and found some ERRORs about network problems,
>> and some ERRORs about "Keys must not be empty".
>>
>> I suspect that, besides the network problems,
>> the "Keys must not be empty" ERROR may be the main reason why
>> the commitlog continues to grow.
>> (I've already ensured that keys are not empty in my code,
>> so the problem may be raised when Cassandra syncs internally.)
>>
>> I restarted the 4G node as an 8G node. Because there has been no heavy
>> traffic since then, I am not yet sure whether increasing the commitlog
>> size will solve or reduce this problem.
>> I'll keep you posted once the commitlog explodes again.
>>
>> Sincerely,
>> Hsiao
>>
>> On Mon, Nov 19, 2012 at 11:21 AM, Chuan-Heng Hsiao
>> <hsiao.chuanh...@gmail.com> wrote:
>>
>> I have RF = 3. Read/Write consistency has already been set to TWO.
>>
>> It did seem that the data were not consistent yet.
>> (There are some CFs that I expected to be empty after the operations,
>> but I still got some data back, and the amount of data returned kept
>> decreasing as I retried fetching all the data from those CFs.)
>>
>> Sincerely,
>> Hsiao
>>
>> On Mon, Nov 19, 2012 at 11:14 AM, Tupshin Harper <tups...@tupshin.com> wrote:
>>
>> What consistency level are you writing with? If you were writing with ANY,
>> try writing with a higher consistency level.
>>
>> -Tupshin
>>
>> On Nov 18, 2012 9:05 PM, "Chuan-Heng Hsiao" <hsiao.chuanh...@gmail.com> wrote:
>>
>> Hi Aaron,
>>
>> Thank you very much for replying.
>>
>> The 700 CFs were created in the beginning (before any insertions).
>>
>> I did not do anything with commitlog_archiving.properties, so I guess
>> I was not using commit log archiving.
>>
>> What I did was a lot of insertions (and some deletions)
>> from another 4 machines, with 32 processes in total.
>> (There are 4 nodes in my setting, so there are 8 machines in total.)
>>
>> I did see huge logs in /var/log/cassandra after such a large number
>> of insertions.
>> Right now I can't tell whether a single insertion also causes huge logs.
>>
>> nodetool flush hung (maybe because of the 200G+ commitlog).
>>
>> Because these machines are not in production (guaranteed no more
>> insertions/deletions), I ended up restarting Cassandra one node at a
>> time, and the commitlog shrank back to 4G. I am running repair on each
>> node now.
>>
>> I'll try to re-import and keep the logs if the commitlog grows
>> out of control again.
>>
>> Sincerely,
>> Hsiao
>>
>> On Mon, Nov 19, 2012 at 3:19 AM, aaron morton <aa...@thelastpickle.com> wrote:
>>
>> I am wondering whether the huge commitlog size is the expected behavior
>> or not
>>
>> Nope.
>>
>> Did you notice the large log size during or after the inserts?
>> If after, did the size settle?
>> Are you using commit log archiving? (in commitlog_archiving.properties)
>>
>> and around 700 mini column families (around 10M in data_file_directories)
>>
>> Can you describe how you created the 700 CFs?
>>
>> and how can we reduce the size of the commitlog?
>>
>> As a workaround, nodetool flush should checkpoint the log.
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
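On the consistency-level exchange above (RF = 3, reads and writes at TWO): the thread does not say which client library is in use, so the following is only a sketch of how a write consistency of TWO could be pinned with the Hector client that was common for Cassandra 1.1. The cluster, keyspace, and column family names are placeholders, not values from the thread.

    import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.HConsistencyLevel;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class WriteAtTwo {
        public static void main(String[] args) {
            // Placeholder cluster name and contact point.
            Cluster cluster = HFactory.getOrCreateCluster("my-cluster", "127.0.0.1:9160");

            // Pin reads and writes to CL.TWO so that, with RF = 3, a write is
            // acknowledged by at least two replicas before the client sees success
            // (in particular, never ANY, which can be satisfied by a hint alone).
            ConfigurableConsistencyLevel cl = new ConfigurableConsistencyLevel();
            cl.setDefaultWriteConsistencyLevel(HConsistencyLevel.TWO);
            cl.setDefaultReadConsistencyLevel(HConsistencyLevel.TWO);

            Keyspace keyspace = HFactory.createKeyspace("my_keyspace", cluster, cl);

            // One insertion as a usage example; column family and names are placeholders.
            Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
            mutator.addInsertion("row-key-1", "my_cf",
                    HFactory.createStringColumn("some_column", "some_value"));
            mutator.execute();

            HFactory.shutdownCluster(cluster);
        }
    }

With RF = 3, TWO requires the same number of replica acknowledgements as QUORUM (two), while still tolerating one replica being down.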
>> On 17/11/2012, at 2:30 PM, Chuan-Heng Hsiao <hsiao.chuanh...@gmail.com> wrote:
>>
>> Hi Cassandra Developers,
>>
>> I am experiencing a huge commitlog size (200+G) after inserting a huge
>> amount of data.
>> It is a 4-node cluster with RF = 3, and currently each node has a 200+G
>> commitlog (so there is around 1T of commitlog in total).
>>
>> The setting of commitlog_total_space_in_mb is the default.
>>
>> I am using 1.1.6.
>>
>> I have not done nodetool cleanup or nodetool flush yet, but
>> I did nodetool repair -pr for each column family.
>>
>> There is 1 huge column family (around 68G in data_file_directories),
>> 18 mid-sized column families (around 1G in data_file_directories),
>> and around 700 mini column families (around 10M in data_file_directories).
>>
>> I am wondering whether the huge commitlog size is the expected behavior
>> or not, and how can we reduce the size of the commitlog?
>>
>> Sincerely,
>> Hsiao
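On the original question of whether the growth is expected: commitlog_total_space_in_mb is a target rather than a hard limit. When it is exceeded, Cassandra flushes the oldest dirty memtables so that segments can be recycled, which is likely why a flush that keeps failing (such as the "Keys must not be empty" assertion earlier in the thread) can let the log grow far past the configured size. Below is a rough watchdog sketch for spotting that overshoot early; the directory and the 4096 MB cap are assumed defaults, not values confirmed in the thread.

    import java.io.File;

    public class CommitLogWatchdog {

        // Sums the sizes of the commitlog segment files in one directory.
        public static long directorySizeBytes(File dir) {
            long total = 0;
            File[] files = dir.listFiles();
            if (files == null) {
                return 0; // directory missing or unreadable
            }
            for (File f : files) {
                if (f.isFile()) {
                    total += f.length();
                }
            }
            return total;
        }

        public static void main(String[] args) {
            File commitLogDir = new File("/var/lib/cassandra/commitlog"); // assumed default path
            long capMb = 4096;                                            // assumed commitlog_total_space_in_mb
            long actualMb = directorySizeBytes(commitLogDir) / (1024 * 1024);

            System.out.println("commitlog on disk: " + actualMb + " MB (cap " + capMb + " MB)");
            if (actualMb > capMb) {
                // The cap is a target, not a hard limit; a large overshoot that never
                // settles (like the 200+G in this thread) is worth investigating, for
                // example memtable flushes failing with "Keys must not be empty".
                System.out.println("WARNING: commitlog exceeds its configured cap");
            }
        }
    }

Run on each node (for example from cron), something like this could make the kind of overshoot described above (4G drifting to 5+G, and eventually 200+G) visible before a restart becomes the only option.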