Hi all,

While testing the new 0.7.1 release I got the following exception:

ERROR [ReadStage:11] 2011-02-15 16:39:18,105 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.io.IOError: java.io.EOFException
        at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:75)
        at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
        at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1274)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1166)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1095)
        at org.apache.cassandra.db.Table.getRow(Table.java:384)
        at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
        at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:473)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:48)
        at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:30)
        at org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:108)
        at org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:106)
        at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:71)
        ... 12 more

I'm able to reliably reproduce this using the following one-node cluster:
- apache-cassandra-0.7.1-bin.tar.gz
- Fedora 14
- java version "1.6.0_20"
  OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
- Default cassandra.yaml
- cassandra-env.sh: MAX_HEAP_SIZE="1G"; HEAP_NEWSIZE="200M"

cassandra-cli initialization:
- create keyspace foo;
- use foo;
- create column family datasets;

$ python dataset_check.py (attached)
Inserting row 0 of 10
Inserting row 1 of 10
Inserting row 2 of 10
Inserting row 3 of 10
Inserting row 4 of 10
Inserting row 5 of 10
Inserting row 6 of 10
Inserting row 7 of 10
Inserting row 8 of 10
Inserting row 9 of 10
Attempting to fetch key 0
Traceback (most recent call last):
...
pycassa.pool.MaximumRetryException: Retried 6 times

After this I have 6 EOFExceptions in system.log.
Running "get datasets[0]['name'];" using cassandra-cli also triggers the
same exception.
I've not been able to reproduce this with cassandra 0.7.0.
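
For anyone who wants to check their own log for the same error, here is a minimal sketch for counting these entries. The function name count_eof_errors is only illustrative, and it assumes each failure is logged with "java.io.IOError: java.io.EOFException" at the start of a line, as in the trace above.

```python
def count_eof_errors(lines):
    """Count log lines that open an IOError/EOFException stack trace.

    Assumption: the top-level exception is logged at the start of a line,
    so the nested "Caused by: java.io.EOFException" line is not counted.
    """
    marker = "java.io.IOError: java.io.EOFException"
    return sum(1 for line in lines if line.startswith(marker))
```

It takes any iterable of lines, so an open file handle on system.log can be passed directly (the location of that file depends on the log4j configuration).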

Regards,
Jonas


import pycassa

pool = pycassa.ConnectionPool('foo', ['localhost:9160'], timeout=10)
cf = pycassa.ColumnFamily(pool, 'datasets')


def insert_dataset(key, num_cols=50000):
    columns = {}
    extra_data = 'XXX' * 20
    for i in range(num_cols):
        col = 'r%08d' % i
        columns[col] = '%s:%s:%s' % (key, col, extra_data)
        if len(columns) >= 3000:
            cf.insert(key, columns)
            columns = {}
    # Flush the final partial batch (fewer than 3000 columns).
    if columns:
        cf.insert(key, columns)
        columns = {}
    cf.insert(key, {'name': 'key:%s' % key})


def test_insert_and_column_fetch(num=20):
    # Insert @num fairly large rows
    for i in range(num):
        print 'Inserting row %d of %d' % (i, num)
        insert_dataset(str(i))
    # Verify that the "name" column is correctly stored
    for i in range(num):
        print 'Attempting to fetch key %d' % i
        row = cf.get(str(i), columns=['name'])
        assert row['name'] == 'key:%d' % i
    for i, (key, row) in enumerate(cf.get_range(columns=['name'])):
        print '%d: get_range returned: key %s, name: "%s"' % (
            i, key, row['name'])
        assert row['name'] == 'key:' + key


test_insert_and_column_fetch(10)
