Hi all,

While testing the new 0.7.1 release I got the following exception:
ERROR [ReadStage:11] 2011-02-15 16:39:18,105 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.io.IOError: java.io.EOFException
        at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:75)
        at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
        at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1274)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1166)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1095)
        at org.apache.cassandra.db.Table.getRow(Table.java:384)
        at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
        at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:473)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:48)
        at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:30)
        at org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:108)
        at org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:106)
        at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:71)
        ... 12 more

I'm able to reliably reproduce this using the following one-node cluster:

- apache-cassandra-0.7.1-bin.tar.gz
- Fedora 14
- java version "1.6.0_20".
  OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
- Default cassandra.yaml
- cassandra-env.sh: MAX_HEAP_SIZE="1G"; HEAP_NEWSIZE="200M"

cassandra-cli initialization:

- create keyspace foo;
- use foo;
- create column family datasets;

$ python dataset_check.py (attached)
Inserting row 0 of 10
Inserting row 1 of 10
Inserting row 2 of 10
Inserting row 3 of 10
Inserting row 4 of 10
Inserting row 5 of 10
Inserting row 6 of 10
Inserting row 7 of 10
Inserting row 8 of 10
Inserting row 9 of 10
Attempting to fetch key 0
Traceback (most recent call last):
...
pycassa.pool.MaximumRetryException: Retried 6 times

After this I have 6 EOFExceptions in system.log. Running "get datasets[0]['name'];" in cassandra-cli also triggers the same exception. I've not been able to reproduce this with Cassandra 0.7.0.

Regards,
Jonas
import pycassa

pool = pycassa.ConnectionPool('foo', ['localhost:9160'], timeout=10)
cf = pycassa.ColumnFamily(pool, 'datasets')

def insert_dataset(key, num_cols=50000):
    """Insert one wide row in batches, plus a small 'name' column."""
    columns = {}
    extra_data = 'XXX' * 20
    for i in range(num_cols):
        col = 'r%08d' % i
        columns[col] = '%s:%s:%s' % (key, col, extra_data)
        if len(columns) >= 3000:
            cf.insert(key, columns)
            columns = {}
    # Flush any remaining partial batch
    if columns:
        cf.insert(key, columns)
        columns = {}
    cf.insert(key, {'name': 'key:%s' % key})

def test_insert_and_column_fetch(num=20):
    # Insert @num fairly large rows
    for i in range(num):
        print 'Inserting row %d of %d' % (i, num)
        insert_dataset(str(i))

    # Verify that the "name" column is correctly stored
    for i in range(num):
        print 'Attempting to fetch key %d' % i
        row = cf.get(str(i), columns=['name'])
        assert row['name'] == 'key:%d' % i

    for i, (key, row) in enumerate(cf.get_range(columns=['name'])):
        print '%d: get_range returned: key %s, name: "%s"' % (i, key, row['name'])
        assert row['name'] == 'key:' + key

test_insert_and_column_fetch(10)
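P.S. For anyone skimming the trace: the proximate failure is DataInputStream.readInt() hitting end-of-stream while deserializing the row-level bloom filter during the name-filtered read. A rough sketch of that failure mode in Python (purely illustrative; read_int is a made-up helper, not Cassandra code):

```python
import struct

def read_int(buf, offset=0):
    # Mimics DataInputStream.readInt(): consume 4 big-endian bytes or fail.
    if len(buf) - offset < 4:
        raise EOFError('need 4 bytes, have %d' % (len(buf) - offset))
    return struct.unpack_from('>i', buf, offset)[0]

# A well-formed serialized int deserializes fine...
assert read_int(struct.pack('>i', 42)) == 42

# ...but reading past the end of the data (e.g. if the row index points
# beyond the serialized bloom filter) raises EOFError, the analogue of
# the java.io.EOFException in the trace above.
try:
    read_int('\x00\x00')
except EOFError:
    pass
```

This is consistent with the exception appearing only on reads: the rows insert cleanly, and the error surfaces when the filter is deserialized to serve the get.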