I had a thought that you could compare the
file's ctime to the process' stime, but apparently those will not
be equal. Just posting in case it sparks an idea for anyone else:

jesse@minerva:~> sudo !!
sudo cat /var/lib/riaksearch/bitcask/570899077082383952423314387779798054553098649600/bitcask.write.lock
[sudo] password for jesse:
16510 /var/lib/riaksearch/bitcask/570899077082383952423314387779798054553098649600/1304623349.bitcask.data
jesse@minerva:~> ls -lc /var/lib/riaksearch/bitcask/570899077082383952423314387779798054553098649600/bitcask.write.lock
-rw------- 1 riak riak 107 2011-05-05 15:17 /var/lib/riaksearch/bitcask/570899077082383952423314387779798054553098649600/bitcask.write.lock
jesse@minerva:~> ps -eo pid,lstart | egrep ^16510
16510 Wed May 4 22:00:45 2011

On 05/10/2011 09:11 AM, Greg Nelson wrote:

Would it work to change /usr/sbin/riak to delete stray .lock files on start?

Sent from my iPhone

On May 10, 2011, at 4:10 AM, Nico Meyer <nico.me...@adition.com> wrote:

Hi again!

I just encountered this problem again myself, so I was able to check my theory. One of the bitcask.write.lock files contained this:

2272 /var/lib/riak/bitcask/1121816686466884466511812771987303177196838846464/1305008752.bitcask.data

and sure enough 'ps axu' gives me:

riak 2269 0.0 0.0 10624 396 ? S 12:46 0:00 inet_gethost 4
riak 2270 0.0 0.0 10624 432 ? S 12:46 0:00 inet_gethost 4
riak 2271 0.0 0.0 10624 432 ? S 12:46 0:00 inet_gethost 4
riak 2272 0.0 0.0 10624 384 ? S 12:46 0:00 inet_gethost 4
root 3139 0.0 0.0 0 0 ? S 13:00 0:00 [flush-254:1]

Cheers,
Nico

On 10.05.2011 03:07, Gary William Flake wrote:

(Removing riak-users.)

This was on an Ubuntu 10.04 box. Riaksearch was auto-started in init.d, but we occasionally start/stop the service as part of our application stack. In this one case, we did a shutdown from an admin web console, which may not have called the proper shutdown procedures in init.d.

On restart, I noticed the issues and found the locked files. Removing them did the trick.

-- GWF

On May 9, 2011, at 7:10 AM, David Smith wrote:

Hmm...ok. Will have to ponder how we can fix that.
Thanks!

D.

On Mon, May 9, 2011 at 8:09 AM, Nico Meyer<nico.me...@adition.com> wrote:

Hi Dave,

I believe the problem occurs if there happens to be another process with the same PID as the old (now gone) riak node. This can happen if the machine was rebooted since the riak node crashed or if the PIDs wrapped; they are only two bytes after all.

os_pid_exists/1 only checks for ANY process with the PID from the lockfile (https://github.com/basho/bitcask/blob/master/src/bitcask_lockops.erl#L116).

On Monday, 09.05.2011, 07:06 -0600, David Smith wrote:

On Sat, May 7, 2011 at 9:25 AM, Gary William Flake<g...@flake.org> wrote:

That was it, Nico. Thanks. I know we did a forced shutdown this week, which was probably the cause. But I would have thought that riak would have taken care of its own lock file bookkeeping on restarting.

Bitcask does: https://github.com/basho/bitcask/blob/master/src/bitcask_lockops.erl#L46

It's curious that the logic didn't handle the case. What platform/OS are you on? Are you using init scripts to restart on boot?

Thanks,

D.

--
Dave Smith
Director, Engineering
Basho Technologies, Inc.
diz...@basho.com

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
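For anyone who wants to experiment with the idea raised in this thread (a PID from the lock file that merely exists is not proof that the lock holder is still alive), here is a minimal sketch in Python. It is not how bitcask does it. It assumes a Linux box where /proc/<pid>/comm is readable, and it assumes the Erlang VM shows up under the command name "beam.smp"; both are assumptions, and the function names are made up for illustration. The only thing taken from the thread is the lock file layout, "<pid> <data file path>".

```python
from typing import Callable, Optional, Tuple


def parse_lock(contents: str) -> Tuple[int, str]:
    """Split lock-file contents of the form '<pid> <data file path>'."""
    pid_text, _, path = contents.strip().partition(" ")
    return int(pid_text), path


def proc_comm(pid: int) -> Optional[str]:
    """Return the command name of a live process, or None if there is none.

    Assumption: Linux with /proc mounted; reads /proc/<pid>/comm.
    """
    try:
        with open(f"/proc/{pid}/comm") as f:
            return f.read().strip()
    except OSError:
        return None


def lock_is_stale(
    contents: str,
    expected_comm: str = "beam.smp",  # assumed name of the Erlang VM process
    lookup: Callable[[int], Optional[str]] = proc_comm,
) -> bool:
    """Treat the lock as stale unless the PID belongs to the expected command.

    A bare "does this PID exist" test would report the lock as held when an
    unrelated process has reused the PID; comparing the command name as well
    narrows that window (it does not close it entirely).
    """
    pid, _path = parse_lock(contents)
    comm = lookup(pid)
    return comm is None or comm != expected_comm
```

With the lock file from Nico's mail, where PID 2272 turned out to be an inet_gethost process, `lock_is_stale` would report the lock as stale, whereas an existence-only check reports it as held. The `lookup` parameter is there so the check can be tried out without touching /proc.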