Can you attach the eleveldb portion of your app.config file? Configuration problems, especially max_open_files being too low, can often cause issues like this.
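One way to sanity-check max_open_files is to estimate how many LevelDB files a single node may hold open. Below is a rough Python calculation that rests on assumptions not stated in this thread:

* each Riak vnode runs its own LevelDB instance, so max_open_files applies per vnode;
* partitions are spread evenly across nodes;
* the ring uses Riak's default ring_creation_size of 64.

The helper names and the 1024-descriptor headroom are hypothetical.

```python
import math

# Assumptions (not confirmed in this thread): each vnode runs its own LevelDB
# instance, so eleveldb's max_open_files applies per vnode, and partitions are
# spread as evenly as possible across nodes.
DEFAULT_RING_SIZE = 64  # Riak's default ring_creation_size


def vnodes_per_node(ring_size: int, nodes: int) -> int:
    """Partitions owned by the busiest node when the ring is spread evenly."""
    return math.ceil(ring_size / nodes)


def leveldb_file_budget(ring_size: int, nodes: int, max_open_files: int) -> int:
    """Approximate number of LevelDB files one node may keep open across all vnodes."""
    return vnodes_per_node(ring_size, nodes) * max_open_files


def fits_ulimit(ring_size: int, nodes: int, max_open_files: int,
                ulimit_nofile: int, headroom: int = 1024) -> bool:
    """True if the LevelDB budget plus a hypothetical margin for sockets,
    logs and Erlang ports stays within the per-process descriptor limit."""
    need = leveldb_file_budget(ring_size, nodes, max_open_files) + headroom
    return need <= ulimit_nofile


if __name__ == "__main__":
    # 4 nodes as in the report; max_open_files=20 is only an example value.
    print(vnodes_per_node(DEFAULT_RING_SIZE, 4))          # 16
    print(leveldb_file_budget(DEFAULT_RING_SIZE, 4, 20))  # 320
    print(fits_ulimit(DEFAULT_RING_SIZE, 4, 20, 4096))    # True
```

With 4 nodes and a 64-partition ring, this gives 16 vnodes per node. That matches the 16 LevelDB directories reported for nodes 2 to 4. The resulting budget can then be compared with the open-files limit (ulimit -n) of the user running Riak.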
If it isn't sensitive, the whole app.config and vm.args files are also often helpful.

On Thu, Oct 11, 2012 at 9:12 AM, <jan.evangeli...@seznam.cz> wrote:
> Hello,
>
> I am writing a new application and I am testing it on a cluster with 4 Riak
> nodes (16 GB RAM, 2 x i3 3.4GHz - 2 cores).
>
> The application is tested with the expected load of 1000 requests/second;
> 90% of the requests cause a Riak read and write of a new key. The problem
> is that the performance starts falling after 18-20 hours and one of the Riak
> nodes stops responding after 23-25 hours.
>
> (The key is approx. 61 bytes long, the data is just 3 timestamps converted to
> binary, and there is a secondary key containing an expiration time. There should be
> a mapred job to delete keys older than 24 hours, but it is turned off while
> researching the performance problem.)
>
> Logs on the other nodes show that the problematic node cannot be contacted:
>
> 2012-10-11 11:33:57.473 [error] <0.908.0> ** Node 'riak@172.16.0.2' not
> responding **
> ** Removing (timedout) connection **
>
> The problematic node itself does not respond to "/usr/sbin/riak ping", but
> beam.smp is running and ALIVE messages are written regularly to the Erlang
> log. There is nothing suspicious in the logs on the node; its error log is
> empty.
>
> The beam.smp process consumes 20% of memory and 50-100% of 1 CPU (the other 3 CPUs sit
> idle), and the process has 267 open LevelDB files.
>
> The database sizes are:
>
> node1: 16249M, 281 files in 21 dirs (with 4 additional files like
> /home/riak/leveldb/0/lost/BLOCKS.bad); this is the problematic node
> node2: 16183M, 264 files in 16 dirs
> node3: 16664M, 264 files in 16 dirs
> node4: 16205M, 265 files in 16 dirs
>
> I tried to attach to the beam.smp process with Erlang, but it does not
> respond to net_adm:ping/1.
>
> I attached gdb to the process, and gdb shows that most of its 93 threads are
> idle (in ethr_event_wait), but 2 threads are in LevelDB code:
>
> Thread 24 (Thread 0x7f1a8ecc0700 (LWP 3912)):
> #0 0x00007f1a91f74d84 in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #1 0x00007f1a0ee0ae9d in leveldb::port::CondVar::Wait() () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #2 0x00007f1a0ede3841 in leveldb::DBImpl::MakeRoomForWrite(bool) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #3 0x00007f1a0ede91ad in leveldb::DBImpl::Write(leveldb::WriteOptions
> const&, leveldb::WriteBatch*) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #4 0x00007f1a0eddeca4 in eleveldb_write () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #5 0x0000000000534c16 in process_main ()
> #6 0x00000000004987e3 in ?? ()
> #7 0x0000000000595320 in ?? ()
> #8 0x00007f1a91f70e9a in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #9 0x00007f1a91a964bd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #10 0x0000000000000000 in ??
> ()
>
> Thread 20 (Thread 0x7f19fc727700 (LWP 3967)):
> #0 0x00007f1a0ee05a67 in leveldb::crc32c::Extend(unsigned int, char const*,
> unsigned long) () from /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #1 0x00007f1a0ee012b9 in
> leveldb::TableBuilder::WriteRawBlock(leveldb::Slice const&,
> leveldb::CompressionType, leveldb::BlockHandle*) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #2 0x00007f1a0ee01444 in
> leveldb::TableBuilder::WriteBlock(leveldb::BlockBuilder*,
> leveldb::BlockHandle*) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #3 0x00007f1a0ee015e4 in leveldb::TableBuilder::Flush() () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #4 0x00007f1a0ee0178b in leveldb::TableBuilder::Add(leveldb::Slice const&,
> leveldb::Slice const&) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #5 0x00007f1a0ede7cad in
> leveldb::DBImpl::DoCompactionWork(leveldb::DBImpl::CompactionState*) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #6 0x00007f1a0ede8456 in leveldb::DBImpl::BackgroundCompaction() () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #7 0x00007f1a0ede9038 in leveldb::DBImpl::BackgroundCall() () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #8 0x00007f1a0ee06c1e in ?? () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #9 0x00007f1a91f70e9a in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #10 0x00007f1a91a964bd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #11 0x0000000000000000 in ?? ()
>
> When I looked at thread 20 in the process again, the stack showed some
> Snappy compression. Many later inspections showed a call to fdatasync(2),
> which was then replaced by more compaction work. Thread 24 still sits in
> leveldb::DBImpl::MakeRoomForWrite.
>
> Thread 20 samples:
> #0 0x00007f1a0ee0ed6d in ?? () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #1 0x00007f1a0ee0edb3 in ??
> () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #2 0x00007f1a0ee0f9dc in snappy::internal::CompressFragment(char const*,
> unsigned long, char*, unsigned short*, int) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #3 0x00007f1a0ee10dc1 in snappy::Compress(snappy::Source*, snappy::Sink*)
> () from /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #4 0x00007f1a0ee1115a in snappy::RawCompress(char const*, unsigned long,
> char*, unsigned long*) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #5 0x00007f1a0ee014eb in
> leveldb::TableBuilder::WriteBlock(leveldb::BlockBuilder*,
> leveldb::BlockHandle*) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #6 0x00007f1a0ee015e4 in leveldb::TableBuilder::Flush() () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #7 0x00007f1a0ee0178b in leveldb::TableBuilder::Add(leveldb::Slice const&,
> leveldb::Slice const&) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #8 0x00007f1a0ede7cad in
> leveldb::DBImpl::DoCompactionWork(leveldb::DBImpl::CompactionState*) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #9 0x00007f1a0ede8456 in leveldb::DBImpl::BackgroundCompaction() () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #10 0x00007f1a0ede9038 in leveldb::DBImpl::BackgroundCall() () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #11 0x00007f1a0ee06c1e in ?? () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #12 0x00007f1a91f70e9a in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #13 0x00007f1a91a964bd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #14 0x0000000000000000 in ?? ()
>
> #0 0x00007f1a91a8fa5d in fdatasync () from /lib/x86_64-linux-gnu/libc.so.6
> #1 0x00007f1a0ee08d64 in ??
> () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #2 0x00007f1a0ede3357 in
> leveldb::DBImpl::FinishCompactionOutputFile(leveldb::DBImpl::CompactionState*,
> leveldb::Iterator*) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #3 0x00007f1a0ede7e6e in
> leveldb::DBImpl::DoCompactionWork(leveldb::DBImpl::CompactionState*) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #4 0x00007f1a0ede8456 in leveldb::DBImpl::BackgroundCompaction() () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #5 0x00007f1a0ede9038 in leveldb::DBImpl::BackgroundCall() () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #6 0x00007f1a0ee06c1e in ?? () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #7 0x00007f1a91f70e9a in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #8 0x00007f1a91a964bd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #9 0x0000000000000000 in ?? ()
>
> #0 0x00007f1a0ee05765 in ?? () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #1 0x00007f1a0ee0b6da in
> leveldb::InternalKeyComparator::Compare(leveldb::Slice const&,
> leveldb::Slice const&) const () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #2 0x00007f1a0ee00218 in ?? () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #3 0x00007f1a0ee006aa in ?? () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #4 0x00007f1a0ede7ccd in
> leveldb::DBImpl::DoCompactionWork(leveldb::DBImpl::CompactionState*) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #5 0x00007f1a0ede8456 in leveldb::DBImpl::BackgroundCompaction() () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #6 0x00007f1a0ede9038 in leveldb::DBImpl::BackgroundCall() () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #7 0x00007f1a0ee06c1e in ??
> () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #8 0x00007f1a91f70e9a in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #9 0x00007f1a91a964bd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #10 0x0000000000000000 in ?? ()
>
> #0 0x00007f1a0eb61dcb in std::string::_M_mutate(unsigned long, unsigned
> long, unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #1 0x00007f1a0eb61e1c in std::string::_M_replace_safe(unsigned long,
> unsigned long, char const*, unsigned long) () from
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #2 0x00007f1a0ee03559 in ?? () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #3 0x00007f1a0ee037bd in ?? () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #4 0x00007f1a0ee00680 in ?? () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #5 0x00007f1a0ede7ccd in
> leveldb::DBImpl::DoCompactionWork(leveldb::DBImpl::CompactionState*) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #6 0x00007f1a0ede8456 in leveldb::DBImpl::BackgroundCompaction() () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #7 0x00007f1a0ede9038 in leveldb::DBImpl::BackgroundCall() () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #8 0x00007f1a0ee06c1e in ?? () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #9 0x00007f1a91f70e9a in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #10 0x00007f1a91a964bd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #11 0x0000000000000000 in ?? ()
>
> Software used:
>
> OS: Ubuntu 12.04 LTS, amd64
> Riak: riak_1.2.1rc2, installed from the Basho-provided deb package
> client accesses Riak via riak-erlang-client 1.2.1
>
> Any hints?
>
> Thanks, Jan
>
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
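In the traces, thread 24 is parked in leveldb::port::CondVar::Wait() under DBImpl::MakeRoomForWrite while thread 20 runs compaction. That pattern is consistent with a writer waiting for background compaction. As a reading aid, here is a simplified Python model of the decision loop in upstream Google LevelDB's MakeRoomForWrite. It uses upstream's default level-0 thresholds: 8 files to slow writes and 12 to stop them.

It is a sketch under stated assumptions:

* Basho's eleveldb fork (1.2.2p5 in the traces) may use different thresholds or logic.
* The background-error check and the forced-flush path are omitted.
* The function and field names are hypothetical.

```python
from dataclasses import dataclass

# Upstream Google LevelDB defaults; Basho's eleveldb fork may differ.
L0_SLOWDOWN_WRITES_TRIGGER = 8   # upstream config::kL0_SlowdownWritesTrigger
L0_STOP_WRITES_TRIGGER = 12      # upstream config::kL0_StopWritesTrigger


@dataclass
class DbState:
    level0_files: int        # number of files currently in level 0
    memtable_bytes: int      # approximate memory used by the active memtable
    write_buffer_size: int   # Options::write_buffer_size
    immutable_pending: bool  # a full memtable is still waiting to be flushed


def make_room_action(s: DbState, allow_delay: bool = True) -> str:
    """Action a writer takes on one pass of the (simplified) loop.

    Upstream sleeps at most once per write: after "sleep_1ms" it retries
    with allow_delay=False.
    """
    if allow_delay and s.level0_files >= L0_SLOWDOWN_WRITES_TRIGGER:
        return "sleep_1ms"        # brief throttle, then retry without delay
    if s.memtable_bytes <= s.write_buffer_size:
        return "write"            # room left in the current memtable
    if s.immutable_pending:
        return "wait"             # CondVar wait until the old memtable is flushed
    if s.level0_files >= L0_STOP_WRITES_TRIGGER:
        return "wait"             # too many level-0 files: hard stall
    return "switch_memtable"      # start a new memtable, schedule compaction


if __name__ == "__main__":
    full = DbState(level0_files=12, memtable_bytes=5 << 20,
                   write_buffer_size=4 << 20, immutable_pending=False)
    print(make_room_action(full, allow_delay=False))  # wait
```

Under this model, a writer blocks on the condition variable in two cases. Either the memtable is full while the previous one is still being flushed, or there are 12 or more level-0 files. In both cases it stays blocked until compaction signals it, which would leave its stack looking like thread 24's.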