Vladimir,

I asked around the Basho chat room, and you have a crash that nobody has seen 
before.  This should be interesting.

The crash is happening during a compaction, specifically during the creation of 
the bloom filter for a new .sst file.  Maybe we can isolate the old file that is 
feeding this compaction and move it out of the way for further debugging, which 
would get you running again while the debugging happens off-line.

Would you tar/zip the following files (changing the paths as appropriate for 
your system):

tar -czf vladimir_LOGs.tgz /var/lib/riak/leveldb/*/LOG*
Please also send your app.config file.
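
If it is easier to send a single archive, the LOG files and app.config can be 
bundled in one step.  This is only a sketch: the function name 
collect_riak_logs and the default paths are assumptions (the app.config 
location in particular varies by install), so pass your own paths as arguments.

```shell
# Sketch: bundle the LevelDB LOG files and app.config into one archive.
# collect_riak_logs is a hypothetical helper; the default paths below are
# assumptions and will likely need changing for your system.
collect_riak_logs() {
    data_dir="${1:-/var/lib/riak/leveldb}"    # assumed leveldb data directory
    conf_file="${2:-/etc/riak/app.config}"    # assumed app.config location
    out="${3:-vladimir_LOGs.tgz}"             # name of the archive to create
    # LOG* matches each vnode directory's current LOG plus older copies
    # such as LOG.old, the same pattern as the tar command above.
    tar -czf "$out" "$data_dir"/*/LOG* "$conf_file"
}
```

Running "tar -tzf" on the resulting archive lists what was captured before you 
send it.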

I will see if I can determine where the bad input file resides and help you get 
back up and running.  Then we can decide how to look deeper for the root cause.

Matthew


On Jun 12, 2013, at 4:02 PM, Vladimir Shabanov <vshaban...@gmail.com> wrote:

> Hello,
> 
> I have a cluster of 8 Riak-1.3.1 nodes. Recently one of my nodes silently 
> crashed. Nothing unusual was reported in the logs.
> 
> When I tried to start the node again, it worked for a few seconds and silently 
> crashed again. I ran 'riak console' and saw "Segmentation fault".
> 
> gdb with dumped core shows:
> 
> Program terminated with signal 11, Segmentation fault.
> #0  0x00007f162547fa30 in MurmurHash64A(void const*, int, unsigned int) ()
>    from /tank/riak-1.3.1/lib/eleveldb-1.3.0/priv/eleveldb.so
> 
> Backtrace shows that it happens somewhere in LevelDB compaction.
> 
> (gdb) bt
> #0  0x00007f162547fa30 in MurmurHash64A(void const*, int, unsigned int) ()
>    from /tank/riak-1.3.1/lib/eleveldb-1.3.0/priv/eleveldb.so
> #1  0x00007f162547833c in leveldb::(anonymous 
> namespace)::BloomFilterPolicy2::CreateFilter(leveldb::Slice const*, int, 
> std::string*) const ()
>    from /tank/riak-1.3.1/lib/eleveldb-1.3.0/priv/eleveldb.so
> #2  0x00007f162548382d in leveldb::FilterBlockBuilder::GenerateFilter() ()
>    from /tank/riak-1.3.1/lib/eleveldb-1.3.0/priv/eleveldb.so
> #3  0x00007f1625483a58 in leveldb::FilterBlockBuilder::StartBlock(unsigned 
> long) ()
>    from /tank/riak-1.3.1/lib/eleveldb-1.3.0/priv/eleveldb.so
> #4  0x00007f1625475175 in leveldb::TableBuilder::Flush() ()
>    from /tank/riak-1.3.1/lib/eleveldb-1.3.0/priv/eleveldb.so
> #5  0x00007f1625475395 in leveldb::TableBuilder::Add(leveldb::Slice const&, 
> leveldb::Slice const&) () from 
> /tank/riak-1.3.1/lib/eleveldb-1.3.0/priv/eleveldb.so
> #6  0x00007f162545b561 in 
> leveldb::DBImpl::DoCompactionWork(leveldb::DBImpl::CompactionState*) () from 
> /tank/riak-1.3.1/lib/eleveldb-1.3.0/priv/eleveldb.so
> #7  0x00007f162545bd3b in leveldb::DBImpl::BackgroundCompaction() ()
>    from /tank/riak-1.3.1/lib/eleveldb-1.3.0/priv/eleveldb.so
> #8  0x00007f162545ca5d in leveldb::DBImpl::BackgroundCall() ()
>    from /tank/riak-1.3.1/lib/eleveldb-1.3.0/priv/eleveldb.so
> #9  0x00007f162547bb38 in leveldb::(anonymous 
> namespace)::PosixEnv::BGThreadWrapper(void*) () from 
> /tank/riak-1.3.1/lib/eleveldb-1.3.0/priv/eleveldb.so
> #10 0x00007f163366ab50 in start_thread () from 
> /lib/x86_64-linux-gnu/libpthread.so.0
> #11 0x00007f16331aca7d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #12 0x0000000000000000 in ?? ()
> 
> gdb output in gist
> https://gist.github.com/vshabanov/5768546
> 
> Why is it happening, and how can I bring the node back to life?
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

