Hi Antti,

Riak is not tested on btrfs and the file system is not officially
supported. We recommend ext4 or xfs for Linux. ZFS is an option on
Solaris derivatives and FreeBSD.

--
Luke Bakken
Engineer
lbak...@basho.com


On Fri, Dec 18, 2015 at 6:14 AM, Antti Kuusela
<antti.kuus...@firstbeat.fi> wrote:
> Hi,
>
> I have been testing Riak and Riak CS as a possible solution for our future
> storage needs. I have a five server cluster running Centos 7. Riak version
> is 2.1.3 (first installed as 2.1.1, updated twice via Basho repo) and Riak
> CS version is 2.1.0. The servers each have 64GB RAM and six 4TB disks in
> raid 6 using btrfs.
>
> I have been pushing random data into Riak-CS via s3cmd to see how the system
> behaves. Smallest objects have been 2000 bytes, largest 100MB. I have also
> been making btrfs snapshots of the entire platform data dir nightly for
> backup purposes. Stop Riak CS, wait 10 seconds, stop Riak, wait 10, make
> snapshot, start Riak, wait 180 seconds, start Riak CS. This is performed on
> each of the servers in turn with a five minute wait in between. I have added
> the waits to try spread the startup load and allow the system time to get
> things running. New data is constantly pushed to the S3 API but restarting
> the nodes in rotation causes by far the highest stress on the system.
>
> I have encountered one problem in particular. Quite often one of the Riak
> nodes starts up but after a couple of minutes it just drops, all processes
> exited except for epmd.
>
> Following is from /var/log/riak/console, most of the lines skipped for sake
> of brevity. Normal startup stuff, as far as I can see:
>
> 2015-12-16 00:26:04.446 [info] <0.7.0> Application lager started on node
> 'riak@192.168.50.32'
> ...
> 2015-12-16 00:26:04.490 [info] <0.72.0> alarm_handler:
> {set,{system_memory_high_watermark,[]}}
> ...
> 2015-12-16 00:26:04.781 [info]
> <0.206.0>@riak_core_capability:process_capability_changes:555 New
> capability: {riak_core,vnode_routing} = proxy
> ...
> 2015-12-16 00:26:04.869 [info] <0.7.0> Application riak_core started on node
> 'riak@192.168.50.32'
> ...
> 2015-12-16 00:26:04.969 [info] <0.407.0>@riak_kv_env:doc_env:46 Environment
> and OS variables:
> 2015-12-16 00:26:05.124 [warning] <0.6.0> lager_error_logger_h dropped 9
> messages in the last second that exceeded the limit of 100 messages/sec
> 2015-12-16 00:26:05.124 [info] <0.407.0> riak_kv_env: Open file limit: 65536
> 2015-12-16 00:26:05.124 [warning] <0.407.0> riak_kv_env: Cores are disabled,
> this may hinder debugging
> 2015-12-16 00:26:05.124 [info] <0.407.0> riak_kv_env: Erlang process limit:
> 262144
> 2015-12-16 00:26:05.125 [info] <0.407.0> riak_kv_env: Erlang ports limit:
> 65536
> 2015-12-16 00:26:05.125 [info] <0.407.0> riak_kv_env: ETS table count limit:
> 256000
> 2015-12-16 00:26:05.125 [info] <0.407.0> riak_kv_env: Thread pool size: 64
> 2015-12-16 00:26:05.125 [info] <0.407.0> riak_kv_env: Generations before
> full sweep: 0
> 2015-12-16 00:26:05.125 [info] <0.407.0> riak_kv_env: Schedulers: 12 for 12
> cores
> 2015-12-16 00:26:05.125 [info] <0.407.0> riak_kv_env: sysctl vm.swappiness
> is 0 greater than or equal to 0)
> 2015-12-16 00:26:05.125 [info] <0.407.0> riak_kv_env: sysctl
> net.core.wmem_default is 8388608 lesser than or equal to 8388608)
> ...
> 2015-12-16 00:26:05.139 [info] <0.478.0>@riak_core:wait_for_service:504
> Waiting for service riak_kv to start (0 seconds)
> 2015-12-16 00:26:05.158 [info]
> <0.495.0>@riak_kv_entropy_manager:set_aae_throttle_limits:790 Setting AAE
> throttle limits: [{-1,0},{200,10},{500,50},{750,250},{900,1000},{1100,5000}]
> ...
> 2015-12-16 00:26:30.160 [info]
> <0.495.0>@riak_kv_entropy_manager:perhaps_log_throttle_change:853 Changing
> AAE throttle from undefined -> 5000 msec/key, based on maximum vnode mailbox
> size {unknown_mailbox_sizes,node_list,['riak@192.168.50.32']} from
> ['riak@192.168.50.32']
> 2015-12-16 00:27:12.053 [info] <0.478.0>@riak_core:wait_for_service:504
> Waiting for service riak_kv to start (60 seconds)
> 2015-12-16 00:28:25.057 [info] <0.478.0>@riak_core:wait_for_service:504
> Waiting for service riak_kv to start (120 seconds)
>
> And then nothing
>
> From /var/log/messages:
>
> Dec 16 00:26:02 storage2 su: (to riak) root on none
> Dec 16 00:26:04 storage2 riak[48174]: Starting up
> Dec 16 00:28:59 storage2 kernel: traps: beam.smp[48492] general protection
> ip:7fcaf9402f16 sp:7fca6affcdd0 error:0 in eleveldb.so[7fcaf93b5000+93000]
> Dec 16 00:28:59 storage2 run_erl[48172]: Erlang closed the connection.
>
> On another node at a different time /var/log/riak/console.log had similar
> messages, and also some warnings about invalid hint files, such as:
>
> 2015-12-13 00:15:41.232 [warning] <0.815.0> Hintfile
> '/data/riak/bitcask/570899077082383952423314387779798054553098649600/56.bitcask.hint'
> invalid
>
> In this latter example riak was started with "systemctl start riak" rather
> than "riak start". From /var/log/messages:
>
> Dec 13 00:15:29 storage1 riak: Starting riak: [  OK  ]
> Dec 13 00:15:29 storage1 systemd: Started SYSV: Riak is a distributed data
> store.
> Dec 13 00:15:56 storage1 kernel: beam.smp[131820]: segfault at 160 ip
> 00007f24c0902ce6 sp 00007f24337fddd0 error 4 in
> eleveldb.so[7f24c08b5000+93000]
> Dec 13 00:15:56 storage1 run_erl[131501]: Erlang closed the connection.
>
> Of reported Riak bugs, this is similar to
> https://github.com/basho/riak/issues/790 . However, the poster of that issue
> reported that his problem was fixed by repairing leveldb partitions. I
> looked at this following
> http://docs.basho.com/riak/latest/ops/running/recovery/repairing-leveldb/
> but didn't find any errors.
>
> Incidentally, I started having problems with btrfs as well. On one node
> btrfs caused a kernel crash and on another kernel killed beam.smp process
> after it stopped responding for over 120 seconds while syncing to btrfs. The
> kernel in Centos 7 probably isn't best suited for working with btrfs.
> Advertised version is 3.10.0.
>
> So, my question is what is your take on this? Is this a bug in the leveldb
> library? The same one already reported? What log data would help debug or
> reproduce it? Or is there potentially some problem with my setup? Or could
> this be caused by a bug in btrfs? What is your take on using Riak with
> btrfs?
>
> --
> Antti Kuusela, M.Sc
> Senior Software Developer
> Firstbeat Technologies Ltd.
>
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Reply via email to