Re: [ceph-users] Help needed porting Ceph to RSockets

2013-08-12 Thread Matthew Anderson
Hi Andreas,

I think we're both working on the same thing, I've just changed the
function calls over to rsockets in the source instead of using the pre-load
library. It explains why we're having the exact same problem!

From what I've been able to tell, the entire problem revolves around
rsockets not supporting POLLRDHUP. As far as I can tell the pipe will only
be removed when tcp_read_wait returns -1. With rsockets it never receives
the POLLRDHUP event after shutdown_socket() is called so the rpoll call
blocks until timeout (900 seconds) and the pipe stays active.

The question then would be: how can we destroy a pipe without relying on
POLLRDHUP? shutdown_socket() always gets called when the socket should be
closed, so there might be a way to trick tcp_read_wait() into returning -1
by doing something in shutdown_socket(), but I'm not sure how to go about
it.
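
One possible workaround is a self-pipe: according to the rpoll(3) man
page, rpoll() can multiplex regular fds alongside rsockets, so
shutdown_socket() could poke an ordinary pipe to wake the blocked
rpoll(). A rough sketch (the names sd and shutdown_pipe are illustrative,
not the actual Pipe.cc members; the pipe would be created once per Pipe,
e.g. in the constructor):

#include <rdma/rsocket.h>
#include <sys/socket.h>
#include <poll.h>
#include <unistd.h>

int shutdown_pipe[2];                     /* pipe(shutdown_pipe) at Pipe setup */

int tcp_read_wait(int sd, int timeout_ms)
{
    struct pollfd pfd[2];
    pfd[0].fd = sd;                       /* the rsocket */
    pfd[0].events = POLLIN;               /* no POLLRDHUP: rsockets lacks it */
    pfd[1].fd = shutdown_pipe[0];         /* read end of a regular pipe */
    pfd[1].events = POLLIN;

    if (rpoll(pfd, 2, timeout_ms) <= 0)
        return -1;                        /* timeout or error */
    if (pfd[1].revents & POLLIN)
        return -1;                        /* shutdown_socket() poked us */
    if (pfd[0].revents & (POLLHUP | POLLERR))
        return -1;
    return 0;                             /* data ready on the rsocket */
}

void shutdown_socket(int sd)
{
    rshutdown(sd, SHUT_RDWR);
    char c = 0;
    (void)write(shutdown_pipe[1], &c, 1); /* wake the blocked rpoll() */
}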

Any ideas?



On Mon, Aug 12, 2013 at 1:55 PM, Andreas Bluemle <
andreas.blue...@itxperts.de> wrote:

> Hi Matthew,
>
>
> On Fri, 9 Aug 2013 09:11:07 +0200
> Matthew Anderson  wrote:
>
> > So I've had a chance to re-visit this since Bécholey Alexandre was
> > kind enough to let me know how to compile Ceph with the RDMACM
> > library (thank you again!).
> >
> > At this stage it compiles and runs but there appears to be a problem
> > with calling rshutdown in Pipe as it seems to just wait forever for
> > the pipe to close which causes commands like 'ceph osd tree' to hang
> > indefinitely after they work successfully. Debug MS is here -
> > http://pastebin.com/WzMJNKZY
> >
>
> I am currently looking at a very similar problem.
> My test setup is to start ceph-mon monitors and check their state
> using "ceph mon stat".
>
> The monitors (3 instances) and the "ceph mon stat" command are started
> with LD_PRELOAD=.
>
> The behaviour is that the "ceph mon stat" command connects, sends the
> request and receives the answer, which shows a healthy state for the
> monitors. But the "ceph mon stat" does not terminate.
>
> On the monitor end I encounter an EOPNOTSUPP being set at the time
> the connection is terminated. This is detected in
> Pipe::tcp_read_wait(), where the socket is poll'ed for IN and HUP events.
>
> What I have found out already is that it is not the poll() / rpoll()
> which set the error: they do return a HUP event and are happy.
> As far as I can tell, the fact of the EOPNOTSUPP being set is historical
> at that point, i.e. it must have been set at some earlier stage.
>
> I am using ceph v0.61.7.
>
>
> Best Regards
>
> Andreas
>
>
> > I also tried RADOS bench but it appears to be doing something
> > similar. Debug MS is here - http://pastebin.com/3aXbjzqS
> >
> > It seems like it's very close to working... I must be missing
> > something small that's causing some grief. You can see the OSD coming
> > up in the ceph monitor and the PG's all become active+clean. When
> > shutting down the monitor I get the below which show's it waiting for
> > the pipes to close -
> >
> > 2013-08-09 15:08:31.339394 7f4643cfd700 20 accepter.accepter closing
> > 2013-08-09 15:08:31.382075 7f4643cfd700 10 accepter.accepter stopping
> > 2013-08-09 15:08:31.382115 7f464bd397c0 20 -- 172.16.0.1:6789/0 wait:
> > stopped accepter thread 2013-08-09 15:08:31.382127 7f464bd397c0 20 --
> > 172.16.0.1:6789/0 wait: stopping reaper thread 2013-08-09
> > 15:08:31.382146 7f4645500700 10 -- 172.16.0.1:6789/0 reaper_entry
> > done 2013-08-09 15:08:31.382182 7f464bd397c0 20 -- 172.16.0.1:6789/0
> > wait: stopped reaper thread 2013-08-09 15:08:31.382194 7f464bd397c0
> > 10 -- 172.16.0.1:6789/0 wait: closing pipes 2013-08-09
> > 15:08:31.382200 7f464bd397c0 10 -- 172.16.0.1:6789/0 reaper
> > 2013-08-09 15:08:31.382205 7f464bd397c0 10 -- 172.16.0.1:6789/0
> > reaper done 2013-08-09 15:08:31.382210 7f464bd397c0 10 --
> > 172.16.0.1:6789/0 wait: waiting for pipes
> > 0x3014c80,0x3015180,0x3015400 to close
> >
> > The git repo has been updated if anyone has a few spare minutes to
> > take a look - https://github.com/funkBuild/ceph-rsockets
> >
> > Thanks again
> > -Matt
> >
> >
> >
> >
> >
> > On Thu, Jun 20, 2013 at 5:09 PM, Matthew Anderson
> >  wrote:
> >
> > Hi All,
> >
> > I've had a few conversations on IRC about getting RDMA support into
> > Ceph and thought I would give it a quick attempt to hopefully spur
> > some interest. What I would like to accomplish is an RSockets only
> > implementation so I'm able to use Ceph, RBD and QEMU at full speed
> > over an Infiniband fabric.
> >
> > What I've tried to do is port Pipe.cc and Acceptor.cc to rsockets by
> > replacing the regular socket calls with the rsocket equivalent.
> > Unfortunately it doesn't compile and I get an error of -
> >
> >  CXXLD  ceph-osd
> > ./.libs/libglobal.a(libcommon_la-Accepter.o): In function
> > `Accepter::stop()':
> /home/matt/Desktop/ceph-0.61.3-rsockets/src/msg/Accepter.cc:243:
> > undefined reference to
> > `rshutdown'
> /home/matt/Desktop/ceph-0.61.3-rsockets/src/msg/Accepter.cc:251:
> > undefined refer

[ceph-users] Start Stop OSD

2013-08-12 Thread Joshua Young
I have 2 issues that I can not find a solution to.
First: I am unable to stop / start any osd by command. I have deployed with
ceph-deploy on Ubuntu 13.04 and everything seems to be working fine. I have 5
hosts, 5 mons and 20 osds.

Using initctl list | grep ceph gives me


ceph-mds-all-starter stop/waiting
ceph-mds-all start/running
ceph-osd-all start/running
ceph-osd-all-starter stop/waiting
ceph-all start/running
ceph-mon-all start/running
ceph-mon-all-starter stop/waiting
ceph-mon (ceph/cloud4) start/running, process 1841
ceph-create-keys stop/waiting
ceph-osd (ceph/15) start/running, process 2122
ceph-mds stop/waiting

However, OSDs 12, 13, 14 and 15 are all on this server.

sudo stop ceph-osd id=12
gives me stop: Unknown instance: ceph/12

Does anyone know what is wrong? Nothing in logs.

Also, when trying to put the journal on an SSD everything works fine. I can add 
all 4 disks per host to the same SSD. The issue is when I restart the server, 
only 1 out of the 3 OSDs will come back up. Has anyone else had this issue?

Thanks!
Josh
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] FW: [ANN] ceph-deploy v1.2 has been released!

2013-08-12 Thread Alfredo Deza
On Sun, Aug 11, 2013 at 9:31 PM, Harvey Skinner  wrote:

>  Hello Alfredo,
>
> when do you expect this updated version of ceph-deploy to make it into
> a cuttlefish release? I would like to give this updated version a
> try while I am working on deployment of a Ceph environment using
> ceph-deploy, but I don't have any experience with the Python install
> tools.
>

We have shifted how we are doing releases of ceph-deploy so they can be
decoupled from regular ceph releases.

So there is really no reason you should wait for a ceph release. We should
have RPM+DEB packages ready. And from now on, that
should be the case for ceph-deploy releases (Python package + RPM + DEB).

Let me confirm we got the other packages out and how you can get them. I
will also update the docs so this is a bit more clear :)



>
> thank you,
> Harvey
>
> >
> > From: ceph-users-boun...@lists.ceph.com
> > [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Alfredo Deza
> > Sent: Friday, August 09, 2013 5:52 PM
> > To: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org
> > Subject: [ceph-users] [ANN] ceph-deploy v1.2 has been released!
> >
> >
> >
> > I am very pleased to announce the release of ceph-deploy to the Python
> > Package Index.
> >
> > The OS packages are yet to come, I will make sure to update this thread
> when
> > they do.
> >
> > For now, if you are familiar with Python install tools, you can install
> > directly from PyPI with pip or easy_install:
> >
> > pip install ceph-deploy
> >
> >
> >
> > or
> >
> > easy_install ceph-deploy
> >
> > This release includes a massive effort for better error reporting and
> > granular information in remote hosts (for `install` and `mon create`
> > commands for now).
> >
> > There were about 18 bug fixes and improvements too, including upstream
> > libraries that are used by ceph-deploy.
> >
> > If you find any issues with ceph-deploy, please make sure you let me know
> > via this list or on irc at #ceph!
> >
> > Enjoy!
> >
> > -Alfredo
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph rbd io tracking (rbdtop?)

2013-08-12 Thread Jeff Moskow
Hi,

The activity on our ceph cluster has gone up a lot.  We are using exclusively 
RBD
storage right now.

Is there a tool/technique that could be used to find out which rbd images are
receiving the most activity (something like "rbdtop")?

Thanks,
Jeff

-- 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-deploy and journal on separate disk

2013-08-12 Thread Pavel Timoschenkov
Hi.
I have some problems with creating a journal on a separate disk, using the
ceph-deploy osd prepare command.
When I execute the following command:
ceph-deploy osd prepare ceph001:sdaa:sda1
where:
sdaa - disk for ceph data
sda1 - partition on ssd drive for journal
I get the following errors:

ceph@ceph-admin:~$ ceph-deploy osd prepare ceph001:sdaa:sda1
ceph-disk-prepare -- /dev/sdaa /dev/sda1 returned 1
Information: Moved requested sector from 34 to 2048 in
order to align on 2048-sector boundaries.
The operation has completed successfully.
meta-data=/dev/sdaa1 isize=2048   agcount=32, agsize=22892700 blks
 =   sectsz=512   attr=2, projid32bit=0
data =   bsize=4096   blocks=732566385, imaxpct=5
 =   sunit=0  swidth=0 blks
naming   =version 2  bsize=4096   ascii-ci=0
log  =internal log   bsize=4096   blocks=357698, version=2
 =   sectsz=512   sunit=0 blks, lazy-count=1
realtime =none   extsz=4096   blocks=0, rtextents=0

WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same 
device as the osd data
mount: /dev/sdaa1: more filesystems detected. This should not happen,
   use -t  to explicitly specify the filesystem type or
   use wipefs(8) to clean up the device.

mount: you must specify the filesystem type
ceph-disk: Mounting filesystem failed: Command '['mount', '-o', 'noatime', 
'--', '/dev/sdaa1', '/var/lib/ceph/tmp/mnt.ek6mog']' returned non-zero exit 
status 32

Has anyone had a similar problem?
Thanks for the help
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] suscribe

2013-08-12 Thread 不坏阿峰

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph rbd io tracking (rbdtop?)

2013-08-12 Thread Kasper Dieter
On Mon, Aug 12, 2013 at 03:19:04PM +0200, Jeff Moskow wrote:
> Hi,
> 
> The activity on our ceph cluster has gone up a lot.  We are using exclusively 
> RBD
> storage right now.
> 
> Is there a tool/technique that could be used to find out which rbd images are
> receiving the most activity (something like "rbdtop")?
I'm using (on each OSD node in a small xterm):

nmon    # http://nmon.sourceforge.net/pmwiki.php
then press 'd' (toggle the disk I/O view) and '.' (show only busy disks)

-Dieter

> 
> Thanks,
> Jeff
> 
> -- 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph rbd io tracking (rbdtop?)

2013-08-12 Thread Wido den Hollander

On 08/12/2013 03:19 PM, Jeff Moskow wrote:

Hi,

The activity on our ceph cluster has gone up a lot.  We are using exclusively 
RBD
storage right now.

Is there a tool/technique that could be used to find out which rbd images are
receiving the most activity (something like "rbdtop")?



Are you using libvirt with KVM? If so, you can always poll all the disk 
I/O for all VMs using libvirt.
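
For example (a sketch; the domain and disk target names, myvm and vda,
are hypothetical):

virsh domblkstat myvm vda           # rd/wr request and byte counters for one disk
for dom in $(virsh list --name); do
    echo "== $dom =="; virsh domblkstat "$dom" vda
done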



Thanks,
 Jeff




--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSD Keep Crashing

2013-08-12 Thread Stephane Boisvert

Hi,
It seems my OSD processes keep crashing randomly and I don't
know why.  It seems to happen when the cluster is trying to
re-balance... In normal usage I didn't notice any crash like that.

We're running ceph 0.61.7 on an up-to-date ubuntu 12.04 (all packages
including kernel are current).


Anyone have an idea ?


TRACE:


 ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff)
 1: /usr/bin/ceph-osd() [0x79219a]
 2: (()+0xfcb0) [0x7fd692da1cb0]
 3: (gsignal()+0x35) [0x7fd69155a425]
 4: (abort()+0x17b) [0x7fd69155db8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d)
[0x7fd691eac69d]
 6: (()+0xb5846) [0x7fd691eaa846]
 7: (()+0xb5873) [0x7fd691eaa873]
 8: (()+0xb596e) [0x7fd691eaa96e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x1df) [0x84303f]
 10:
(PG::RecoveryState::Recovered::Recovered(boost::statechart::state,
(boost::statechart::history_mode)0>::my_context)+0x38f)
[0x6d932f]
 11: (boost::statechart::state,
(boost::statechart::history_mode)0>::shallow_construct(boost::intrusive_ptr
const&,
boost::statechart::state_machine,
boost::statechart::null_exception_translator>&)+0x5c)
[0x6f270c]
 12: (PG::RecoveryState::Recovering::react(PG::AllReplicasRecovered
const&)+0xb4) [0x6d9454]
 13:
(boost::statechart::simple_state,
(boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base
const&, void const*)+0xda) [0x6f296a]
 14:
(boost::statechart::state_machine,
boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base
const&)+0x5b) [0x6e320b]
 15:
(boost::statechart::state_machine,
boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base
const&)+0x11) [0x6e34e1]
 16:
(PG::handle_peering_event(std::tr1::shared_ptr,
PG::RecoveryCtx*)+0x347) [0x69aaf7]
 17: (OSD::process_peering_events(std::list > const&,
ThreadPool::TPHandle&)+0x2f5) [0x632fc5]
 18: (OSD::PeeringWQ::_process(std::list > const&,
ThreadPool::TPHandle&)+0x12) [0x66e2d2]
 19: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x838476]
 20: (ThreadPool::WorkThread::entry()+0x10) [0x83a2a0]
 21: (()+0x7e9a) [0x7fd692d99e9a]
 22: (clone()+0x6d) [0x7fd691617ccd]
 NOTE: a copy of the executable, or `objdump -rdS
` is needed to interpret this.

--- begin dump of recent events ---
    -3> 2013-08-12 15:58:15.561005 7fd683d78700  1 --
10.136.48.18:6814/21240 <== osd.56 10.136.48.14:0/17437 44 
osd_ping(ping e8959 stamp 2013-08-12 15:58:15.556022) v2  47+0+0
(355096560 0 0) 0xc4e81c0 con 0x12fbeb00
    -2> 2013-08-12 15:58:15.561038 7fd683d78700  1 --
10.136.48.18:6814/21240 --> 10.136.48.14:0/17437 --
osd_ping(ping_reply e8959 stamp 2013-08-12 15:58:15.556022) v2 --
?+0 0x1683ec40 con 0x12fbeb00
    -1> 2013-08-12 15:58:15.568600 7fd67e56d700  1 --
10.136.48.18:6813/21240 --> osd.44 10.136.48.15:6820/25671 --
osd_sub_op(osd.20.0:1293 25.328
699ac328/rbd_data.ae2732ae8944a.00240828/head//25 [push] v
8424'11 snapset=0=[]:[] snapc=0=[]) v7 -- ?+0 0x2df0f400
 0> 2013-08-12 15:58:15.581608 7fd681d74700 -1 *** Caught
signal (Aborted) **
 in thread 7fd681d74700

 ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff)
 1: /usr/bin/ceph-osd() [0x79219a]
 2: (()+0xfcb0) [0x7fd692da1cb0]
 3: (gsignal()+0x35) [0x7fd69155a425]
 4: (abort()+0x17b) [0x7fd69155db8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d)
[0x7fd691eac69d]
 6: (()+0xb5846) [0x7fd691eaa846]
 7: (()+0xb5873) [0x7fd691eaa873]
 8: (()+0xb596e) [0x7fd691eaa96e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x1df) [0x84303f]
 10:
(PG::RecoveryState::Recovered::Recovered(boost::statechart::state,
(boost::statechart::history_mode)0>::my_context)+0x38f)
[0x6d932f]
 11: (boost::statechart::state,
(boost::statechart::history_mode)0>::shallow_construct(boost::intrusive_ptr
const&,
boost::statechart::state_machine,
boost::statechart::null_exception_translator>&)+0x5c)
[0x6f270c]
 12: (PG::RecoveryState::Recovering::react(PG::AllReplicasRecovered
const&)+0xb4) [0x6d9454]
 13:
(boost::statechart::simple_state,
(boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base
const&, void const*)+0xda) [0x6f296a]
 14:
(boost::statechart::state_machine,
boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base
const&)+0x5b) [0x6e320b]
 15:
(boost::statechart::state_machine,
boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base
const&)+0x11) [0x6e34e1]
 16:
(PG::handl

Re: [ceph-users] Help needed porting Ceph to RSockets

2013-08-12 Thread Andreas Bluemle
Hi Matthew,

I am not quite sure about the POLLRDHUP.
On the server side (ceph-mon), tcp_read_wait does see the
POLLHUP - which should be the indicator that the
other side is shutting down.

I have also taken a brief look at the client side (ceph mon stat).
It initiates a shutdown - but never finishes. See attached log file
from "ceph --log-file ceph-mon-stat.rsockets --debug-ms 30 mon stat".
I have also attached the corresponding log file for regular TCP/IP
sockets.

It looks to me that in the rsockets case, the reaper is able to clean
up even though there is still something left to do - and hence the
shutdown never completes.


Best Regards

Andreas Bluemle


On Mon, 12 Aug 2013 15:11:47 +0800
Matthew Anderson  wrote:

> Hi Andreas,
> 
> I think we're both working on the same thing, I've just changed the
> function calls over to rsockets in the source instead of using the
> pre-load library. It explains why we're having the exact same problem!
> 
> From what I've been able to tell, the entire problem revolves around
> rsockets not supporting POLLRDHUP. As far as I can tell the pipe will
> only be removed when tcp_read_wait returns -1. With rsockets it never
> receives the POLLRDHUP event after shutdown_socket() is called so the
> rpoll call blocks until timeout (900 seconds) and the pipe stays
> active.
> 
> The question then would be how can we destroy a pipe without relying
> on POLLRDHUP? shutdown_socket() always gets called when the socket
> should be closed, so there might be a way to trick
> tcp_read_wait() into returning -1 by doing something in
> shutdown_socket(), but I'm not sure how to go about it.
> 
> Any ideas?
> 
> 
> 
> On Mon, Aug 12, 2013 at 1:55 PM, Andreas Bluemle <
> andreas.blue...@itxperts.de> wrote:
> 
> > Hi Matthew,
> >
> >
> > On Fri, 9 Aug 2013 09:11:07 +0200
> > Matthew Anderson  wrote:
> >
> > > So I've had a chance to re-visit this since Bécholey Alexandre was
> > > kind enough to let me know how to compile Ceph with the RDMACM
> > > library (thank you again!).
> > >
> > > At this stage it compiles and runs but there appears to be a
> > > problem with calling rshutdown in Pipe as it seems to just wait
> > > forever for the pipe to close which causes commands like 'ceph
> > > osd tree' to hang indefinitely after they work successfully.
> > > Debug MS is here - http://pastebin.com/WzMJNKZY
> > >
> >
> > I am currently looking at a very similar problem.
> > My test setup is to start ceph-mon monitors and check their state
> > using "ceph mon stat".
> >
> > The monitors (3 instances) and the "ceph mon stat" command are
> > started with LD_PRELOAD=.
> >
> > The behaviour is that the "ceph mon stat" command connects, sends
> > the request and receives the answer, which shows a healthy state
> > for the monitors. But the "ceph mon stat" does not terminate.
> >
> > On the monitor end I encounter an EOPNOTSUPP being set at the time
> > the connection is terminated. This is detected in
> > Pipe::tcp_read_wait(), where the socket is poll'ed for IN and HUP
> > events.
> >
> > What I have found out already is that it is not the poll() / rpoll()
> > which set the error: they do return a HUP event and are happy.
> > As far as I can tell, the fact of the EOPNOTSUPP being set is
> > historical at that point, i.e. it must have been set at some
> > earlier stage.
> >
> > I am using ceph v0.61.7.
> >
> >
> > Best Regards
> >
> > Andreas
> >
> >
> > > I also tried RADOS bench but it appears to be doing something
> > > similar. Debug MS is here - http://pastebin.com/3aXbjzqS
> > >
> > > It seems like it's very close to working... I must be missing
> > > something small that's causing some grief. You can see the OSD
> > > coming up in the ceph monitor and the PG's all become
> > > active+clean. When shutting down the monitor I get the below
> > > which show's it waiting for the pipes to close -
> > >
> > > 2013-08-09 15:08:31.339394 7f4643cfd700 20 accepter.accepter
> > > closing 2013-08-09 15:08:31.382075 7f4643cfd700 10
> > > accepter.accepter stopping 2013-08-09 15:08:31.382115
> > > 7f464bd397c0 20 -- 172.16.0.1:6789/0 wait: stopped accepter
> > > thread 2013-08-09 15:08:31.382127 7f464bd397c0 20 --
> > > 172.16.0.1:6789/0 wait: stopping reaper thread 2013-08-09
> > > 15:08:31.382146 7f4645500700 10 -- 172.16.0.1:6789/0 reaper_entry
> > > done 2013-08-09 15:08:31.382182 7f464bd397c0 20 --
> > > 172.16.0.1:6789/0 wait: stopped reaper thread 2013-08-09
> > > 15:08:31.382194 7f464bd397c0 10 -- 172.16.0.1:6789/0 wait:
> > > closing pipes 2013-08-09 15:08:31.382200 7f464bd397c0 10 --
> > > 172.16.0.1:6789/0 reaper 2013-08-09 15:08:31.382205 7f464bd397c0
> > > 10 -- 172.16.0.1:6789/0 reaper done 2013-08-09 15:08:31.382210
> > > 7f464bd397c0 10 -- 172.16.0.1:6789/0 wait: waiting for pipes
> > > 0x3014c80,0x3015180,0x3015400 to close
> > >
> > > The git repo has been updated if anyone has a few spare minutes to
> > > take a look - https://github.com/funkBuild/ceph-rsockets
> > >
> > > Thanks aga

[ceph-users] basic single node set up issue on rhel6

2013-08-12 Thread Mathew, Sijo (KFRM 1)
Hi,

I have been trying to get ceph installed on a single node. But I'm stuck with 
the following error.

[host]$ ceph-deploy -v mon create ceph-server-299
Deploying mon, cluster ceph hosts ceph-server-299
Deploying mon to ceph-server-299
Distro RedHatEnterpriseServer codename Santiago, will use sysvinit
Traceback (most recent call last):
  File "/usr/bin/ceph-deploy", line 21, in 
main()
  File "/usr/lib/python2.6/site-packages/ceph_deploy/cli.py", line 112, in main
return args.func(args)
  File "/usr/lib/python2.6/site-packages/ceph_deploy/mon.py", line 234, in mon
mon_create(args)
  File "/usr/lib/python2.6/site-packages/ceph_deploy/mon.py", line 138, in 
mon_create
init=init,
  File "/usr/lib/python2.6/site-packages/pushy/protocol/proxy.py", line 255, in 

(conn.operator(type_, self, args, kwargs))
  File "/usr/lib/python2.6/site-packages/pushy/protocol/connection.py", line 
66, in operator
return self.send_request(type_, (object, args, kwargs))
  File "/usr/lib/python2.6/site-packages/pushy/protocol/baseconnection.py", 
line 323, in send_request
return self.__handle(m)
  File "/usr/lib/python2.6/site-packages/pushy/protocol/baseconnection.py", 
line 639, in __handle
raise e
pushy.protocol.proxy.ExceptionProxy: [Errno 2] No such file or directory

I saw a similar thread in the archives, but the solution given there doesn't 
seem to be that clear.

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-June/002344.html

I had to install all the rpms separately as the machines that I work with don't 
have internet access and "ceph-deploy install" needs internet access. Could 
someone suggest what might be wrong here?

Environment: RHEL 6.4, ceph 0.61

Thanks,
Sijo Mathew




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd map issues: no such file or directory (ENOENT) AND map wrong image

2013-08-12 Thread PJ
Hi All,

Before going on to the issue description, here are our hardware configurations:
- Physical machine * 3: each has quad-core CPU * 2, 64+ GB RAM, HDD * 12
(500GB ~ 1TB per drive; 1 for system, 11 for OSDs). Ceph OSDs run on the
physical machines.
- Each physical machine runs 5 virtual machines. One VM acts as a ceph MON
(i.e. 3 MONs in total); the other 4 VMs provide iSCSI or FTP/NFS services
- Physical machines and virtual machines are based on the same software
condition: Ubuntu 12.04 + kernel 3.6.11, ceph v0.61.7


The issues we met are,

1. Right after ceph installation, creating a pool, then creating an image
and mapping it is no problem. But if we do not use the environment for more
than half a day, the same process (create pool -> create image -> map image)
returns an error: no such file or directory (ENOENT). Once the issue occurs,
it can easily be reproduced by the same process. But the issue may disappear
if we wait 10+ minutes after pool creation. Rebooting the system also avoids
it.
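
For reference, the failing sequence is roughly (pool/image names from
the report; the pg count is arbitrary):

ceph osd pool create xxx 128
rbd -p xxx create AAA --size 1024
rbd -p xxx map AAA     # returns ENOENT while the issue is present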

I had success and failed straces logged on the same virtual machine (the
one provides FTP/NFS):
success: https://www.dropbox.com/s/u8jc4umak24kr1y/rbd_done.txt
failed: https://www.dropbox.com/s/ycuupmmrlc4d0ht/rbd_failed.txt


2. The second issue is with two images (AAA and BBB) created under one pool
(xxx): if we map AAA (e.g. "rbd -p xxx map AAA"), the map succeeds but it
shows BBB under /dev/rbd/xxx/. Using "rbd showmapped", it shows "AAA" of pool
xxx is mapped. I am not sure which one is really mapped because both images
are empty. This issue is hard to reproduce, but once it happens /dev/rbd/ is
messed up.

One more question, not about the rbd map issues. Our usage is to map one rbd
device and mount it in several places (in one virtual machine) for iSCSI, FTP
and NFS; does that cause any problem for ceph operation?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD Keep Crashing

2013-08-12 Thread John Wilkins
Stephane,

You should post any crash bugs with stack trace to ceph-devel
ceph-de...@vger.kernel.org.


On Mon, Aug 12, 2013 at 9:02 AM, Stephane Boisvert <
stephane.boisv...@gameloft.com> wrote:

>  Hi,
> It seems my OSD processes keep crashing randomly and I don't know
> why.  It seems to happen when the cluster is trying to re-balance... In
> normal usage I didn't notice any crash like that.
>
> We're running ceph 0.61.7 on an up-to-date ubuntu 12.04 (all packages
> including kernel are current).
>
>
> Anyone have an idea ?
>
>
> TRACE:
>
>
>  ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff)
>  1: /usr/bin/ceph-osd() [0x79219a]
>  2: (()+0xfcb0) [0x7fd692da1cb0]
>  3: (gsignal()+0x35) [0x7fd69155a425]
>  4: (abort()+0x17b) [0x7fd69155db8b]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fd691eac69d]
>  6: (()+0xb5846) [0x7fd691eaa846]
>  7: (()+0xb5873) [0x7fd691eaa873]
>  8: (()+0xb596e) [0x7fd691eaa96e]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x1df) [0x84303f]
>  10:
> (PG::RecoveryState::Recovered::Recovered(boost::statechart::state PG::RecoveryState::Active, boost::mpl::list mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na>,
> (boost::statechart::history_mode)0>::my_context)+0x38f) [0x6d932f]
>  11: (boost::statechart::state PG::RecoveryState::Active, boost::mpl::list mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na>,
> (boost::statechart::history_mode)0>::shallow_construct(boost::intrusive_ptr
> const&,
> boost::statechart::state_machine PG::RecoveryState::Initial, std::allocator,
> boost::statechart::null_exception_translator>&)+0x5c) [0x6f270c]
>  12: (PG::RecoveryState::Recovering::react(PG::AllReplicasRecovered
> const&)+0xb4) [0x6d9454]
>  13: (boost::statechart::simple_state PG::RecoveryState::Active, boost::mpl::list mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na>,
> (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base
> const&, void const*)+0xda) [0x6f296a]
>  14: (boost::statechart::state_machine PG::RecoveryState::Initial, std::allocator,
> boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base
> const&)+0x5b) [0x6e320b]
>  15: (boost::statechart::state_machine PG::RecoveryState::Initial, std::allocator,
> boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base
> const&)+0x11) [0x6e34e1]
>  16: (PG::handle_peering_event(std::tr1::shared_ptr,
> PG::RecoveryCtx*)+0x347) [0x69aaf7]
>  17: (OSD::process_peering_events(std::list >
> const&, ThreadPool::TPHandle&)+0x2f5) [0x632fc5]
>  18: (OSD::PeeringWQ::_process(std::list >
> const&, ThreadPool::TPHandle&)+0x12) [0x66e2d2]
>  19: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x838476]
>  20: (ThreadPool::WorkThread::entry()+0x10) [0x83a2a0]
>  21: (()+0x7e9a) [0x7fd692d99e9a]
>  22: (clone()+0x6d) [0x7fd691617ccd]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed
> to interpret this.
>
> --- begin dump of recent events ---
> -3> 2013-08-12 15:58:15.561005 7fd683d78700  1 --
> 10.136.48.18:6814/21240 <== osd.56 10.136.48.14:0/17437 44 
> osd_ping(ping e8959 stamp 2013-08-12 15:58:15.556022) v2  47+0+0
> (355096560 0 0) 0xc4e81c0 con 0x12fbeb00
> -2> 2013-08-12 15:58:15.561038 7fd683d78700  1 --
> 10.136.48.18:6814/21240 --> 10.136.48.14:0/17437 -- osd_ping(ping_reply
> e8959 stamp 2013-08-12 15:58:15.556022) v2 -- ?+0 0x1683ec40 con 0x12fbeb00
> -1> 2013-08-12 15:58:15.568600 7fd67e56d700  1 --
> 10.136.48.18:6813/21240 --> osd.44 10.136.48.15:6820/25671 --
> osd_sub_op(osd.20.0:1293 25.328
> 699ac328/rbd_data.ae2732ae8944a.00240828/head//25 [push] v 8424'11
> snapset=0=[]:[] snapc=0=[]) v7 -- ?+0 0x2df0f400
>  0> 2013-08-12 15:58:15.581608 7fd681d74700 -1 *** Caught signal
> (Aborted) **
>  in thread 7fd681d74700
>
>  ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff)
>  1: /usr/bin/ceph-osd() [0x79219a]
>  2: (()+0xfcb0) [0x7fd692da1cb0]
>  3: (gsignal()+0x35) [0x7fd69155a425]
>  4: (abort()+0x17b) [0x7fd69155db8b]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fd691eac69d]
>  6: (()+0xb5846) [0x7fd691eaa846]
>  7: (()+0xb5873) [0x7fd691eaa873]
>  8: (()+0xb596e) [0x7fd691eaa96e]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x1df) [0x84303f]
>  10:
> (PG::RecoveryState::Recovered::Recovered(boost::statechart::state PG::RecoveryState::Active, boost::mpl::list mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, m

Re: [ceph-users] OSD Keep Crashing

2013-08-12 Thread Samuel Just
Can you post more of the log?  There should be a line towards the bottom
indicating the line with the failed assert.  Can you also attach ceph pg
dump, ceph osd dump, ceph osd tree?
-Sam


On Mon, Aug 12, 2013 at 11:54 AM, John Wilkins wrote:

> Stephane,
>
> You should post any crash bugs with stack trace to ceph-devel
> ceph-de...@vger.kernel.org.
>
>
> On Mon, Aug 12, 2013 at 9:02 AM, Stephane Boisvert <
> stephane.boisv...@gameloft.com> wrote:
>
>>  Hi,
>> It seems my OSD processes keep crashing randomly and I don't know
>> why.  It seems to happen when the cluster is trying to re-balance... In
>> normal usage I didn't notice any crash like that.
>>
>> We're running ceph 0.61.7 on an up-to-date ubuntu 12.04 (all packages
>> including kernel are current).
>>
>>
>> Anyone have an idea ?
>>
>>
>> TRACE:
>>
>>
>>  ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff)
>>  1: /usr/bin/ceph-osd() [0x79219a]
>>  2: (()+0xfcb0) [0x7fd692da1cb0]
>>  3: (gsignal()+0x35) [0x7fd69155a425]
>>  4: (abort()+0x17b) [0x7fd69155db8b]
>>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fd691eac69d]
>>  6: (()+0xb5846) [0x7fd691eaa846]
>>  7: (()+0xb5873) [0x7fd691eaa873]
>>  8: (()+0xb596e) [0x7fd691eaa96e]
>>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x1df) [0x84303f]
>>  10:
>> (PG::RecoveryState::Recovered::Recovered(boost::statechart::state> PG::RecoveryState::Active, boost::mpl::list> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
>> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
>> mpl_::na, mpl_::na, mpl_::na>,
>> (boost::statechart::history_mode)0>::my_context)+0x38f) [0x6d932f]
>>  11: (boost::statechart::state> PG::RecoveryState::Active, boost::mpl::list> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
>> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
>> mpl_::na, mpl_::na, mpl_::na>,
>> (boost::statechart::history_mode)0>::shallow_construct(boost::intrusive_ptr
>> const&,
>> boost::statechart::state_machine> PG::RecoveryState::Initial, std::allocator,
>> boost::statechart::null_exception_translator>&)+0x5c) [0x6f270c]
>>  12: (PG::RecoveryState::Recovering::react(PG::AllReplicasRecovered
>> const&)+0xb4) [0x6d9454]
>>  13: (boost::statechart::simple_state> PG::RecoveryState::Active, boost::mpl::list> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
>> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
>> mpl_::na, mpl_::na, mpl_::na>,
>> (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base
>> const&, void const*)+0xda) [0x6f296a]
>>  14:
>> (boost::statechart::state_machine> PG::RecoveryState::Initial, std::allocator,
>> boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base
>> const&)+0x5b) [0x6e320b]
>>  15:
>> (boost::statechart::state_machine> PG::RecoveryState::Initial, std::allocator,
>> boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base
>> const&)+0x11) [0x6e34e1]
>>  16: (PG::handle_peering_event(std::tr1::shared_ptr,
>> PG::RecoveryCtx*)+0x347) [0x69aaf7]
>>  17: (OSD::process_peering_events(std::list >
>> const&, ThreadPool::TPHandle&)+0x2f5) [0x632fc5]
>>  18: (OSD::PeeringWQ::_process(std::list >
>> const&, ThreadPool::TPHandle&)+0x12) [0x66e2d2]
>>  19: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x838476]
>>  20: (ThreadPool::WorkThread::entry()+0x10) [0x83a2a0]
>>  21: (()+0x7e9a) [0x7fd692d99e9a]
>>  22: (clone()+0x6d) [0x7fd691617ccd]
>>  NOTE: a copy of the executable, or `objdump -rdS ` is needed
>> to interpret this.
>>
>> --- begin dump of recent events ---
>> -3> 2013-08-12 15:58:15.561005 7fd683d78700  1 --
>> 10.136.48.18:6814/21240 <== osd.56 10.136.48.14:0/17437 44 
>> osd_ping(ping e8959 stamp 2013-08-12 15:58:15.556022) v2  47+0+0
>> (355096560 0 0) 0xc4e81c0 con 0x12fbeb00
>> -2> 2013-08-12 15:58:15.561038 7fd683d78700  1 --
>> 10.136.48.18:6814/21240 --> 10.136.48.14:0/17437 -- osd_ping(ping_reply
>> e8959 stamp 2013-08-12 15:58:15.556022) v2 -- ?+0 0x1683ec40 con 0x12fbeb00
>> -1> 2013-08-12 15:58:15.568600 7fd67e56d700  1 --
>> 10.136.48.18:6813/21240 --> osd.44 10.136.48.15:6820/25671 --
>> osd_sub_op(osd.20.0:1293 25.328
>> 699ac328/rbd_data.ae2732ae8944a.00240828/head//25 [push] v 8424'11
>> snapset=0=[]:[] snapc=0=[]) v7 -- ?+0 0x2df0f400
>>  0> 2013-08-12 15:58:15.581608 7fd681d74700 -1 *** Caught signal
>> (Aborted) **
>>  in thread 7fd681d74700
>>
>>  ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff)
>>  1: /usr/bin/ceph-osd() [0x79219a]
>>  2: (()+0xfcb0) [0x7fd692da1cb0]
>>  3: (gsignal()+0x35) [0x7fd69155a425]
>>  4: (abort()+0x17b) [0x7fd69155db8b]
>>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fd691eac69d]
>>  6: (()+0xb5846) [0x7fd691eaa846]
>>  7: (()+0xb5873) [0x7fd691eaa873]
>>  8: (()+0xb596e) [0x7fd691eaa96e]
>>  9: 

Re: [ceph-users] ceph-deploy and journal on separate disk

2013-08-12 Thread Samuel Just
Did you try using ceph-deploy disk zap ceph001:sdaa first?
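
If not, a sequence along these lines should clear the leftover
filesystem signatures (zap is destructive, so double-check the device
names; they are taken from the report above):

ceph-deploy disk zap ceph001:sdaa
ceph-deploy osd prepare ceph001:sdaa:sda1
# or, on the node itself: wipefs -a /dev/sdaa1
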
-Sam

On Mon, Aug 12, 2013 at 6:21 AM, Pavel Timoschenkov
 wrote:
> Hi.
>
> I have some problems with creating a journal on a separate disk, using the
> ceph-deploy osd prepare command.
>
> When I execute the following command:
>
> ceph-deploy osd prepare ceph001:sdaa:sda1
>
> where:
>
> sdaa - disk for ceph data
>
> sda1 - partition on ssd drive for journal
>
> I get the following errors:
>
> 
>
> ceph@ceph-admin:~$ ceph-deploy osd prepare ceph001:sdaa:sda1
>
> ceph-disk-prepare -- /dev/sdaa /dev/sda1 returned 1
>
> Information: Moved requested sector from 34 to 2048 in
>
> order to align on 2048-sector boundaries.
>
> The operation has completed successfully.
>
> meta-data=/dev/sdaa1 isize=2048   agcount=32, agsize=22892700
> blks
>
>  =   sectsz=512   attr=2, projid32bit=0
>
> data =   bsize=4096   blocks=732566385, imaxpct=5
>
>  =   sunit=0  swidth=0 blks
>
> naming   =version 2  bsize=4096   ascii-ci=0
>
> log  =internal log   bsize=4096   blocks=357698, version=2
>
>  =   sectsz=512   sunit=0 blks, lazy-count=1
>
> realtime =none   extsz=4096   blocks=0, rtextents=0
>
>
>
> WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same
> device as the osd data
>
> mount: /dev/sdaa1: more filesystems detected. This should not happen,
>
>use -t  to explicitly specify the filesystem type or
>
>use wipefs(8) to clean up the device.
>
>
>
> mount: you must specify the filesystem type
>
> ceph-disk: Mounting filesystem failed: Command '['mount', '-o', 'noatime',
> '--', '/dev/sdaa1', '/var/lib/ceph/tmp/mnt.ek6mog']' returned non-zero exit
> status 32
>
>
>
> Someone had a similar problem?
>
> Thanks for the help
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mounting a pool via fuse

2013-08-12 Thread Samuel Just
Can you elaborate on what behavior you are looking for?
-Sam

On Fri, Aug 9, 2013 at 4:37 AM, Georg Höllrigl
 wrote:
> Hi,
>
> I'm using ceph 0.61.7.
>
> When using ceph-fuse, I couldn't find a way to mount only one pool.
>
> Is there a way to mount a pool - or is it simply not supported?
>
>
>
> Kind Regards,
> Georg
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-12 Thread Samuel Just
Can you attach the output of ceph osd tree?

Also, can you run

ceph osd getmap -o /tmp/osdmap

and attach /tmp/osdmap?
-Sam

On Fri, Aug 9, 2013 at 4:28 AM, Jeff Moskow  wrote:
> Thanks for the suggestion.  I had tried stopping each OSD for 30 seconds,
> then restarting it, waiting 2 minutes and then doing the next one (all OSD's
> eventually restarted).  I tried this twice.
>
> --
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] run ceph without auth

2013-08-12 Thread Samuel Just
I have referred you to someone more conversant with the details of
mkcephfs, but for dev purposes, most of us use the vstart.sh script in
src/ (http://ceph.com/docs/master/dev/).
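
For example, from a compiled source tree (a minimal sketch; check the
flags and variables against the vstart.sh in your checkout):

cd src
MON=1 OSD=1 MDS=1 ./vstart.sh -n -d -l   # -n new cluster, -d debug, -l localhost
./ceph -c ceph.conf -s                   # vstart writes a ceph.conf into src/
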
-Sam

On Fri, Aug 9, 2013 at 2:59 AM, Nulik Nol  wrote:
> Hi,
> I am configuring a single node for development purposes, but ceph asks
> me for a keyring. Here is what I do:
>
> [root@localhost ~]# mkcephfs -c /usr/local/etc/ceph/ceph.conf
> --prepare-monmap -d /tmp/foo
> preparing monmap in /tmp/foo/monmap
> /usr/local/bin/monmaptool --create --clobber --add a 127.0.0.1:6789
> --print /tmp/foo/monmap
> /usr/local/bin/monmaptool: monmap file /tmp/foo/monmap
> /usr/local/bin/monmaptool: generated fsid 7bd045a6-ca45-4f12-b9f3-e0c76718859a
> epoch 0
> fsid 7bd045a6-ca45-4f12-b9f3-e0c76718859a
> last_changed 2013-08-09 04:51:06.921996
> created 2013-08-09 04:51:06.921996
> 0: 127.0.0.1:6789/0 mon.a
> /usr/local/bin/monmaptool: writing epoch 0 to /tmp/foo/monmap (1 monitors)
> \nWARNING: mkcephfs is now deprecated in favour of ceph-deploy. Please
> see: \n http://github.com/ceph/ceph-deploy
> [root@localhost ~]# mkcephfs --init-local-daemons osd -d /tmp/foo
> \nWARNING: mkcephfs is now deprecated in favour of ceph-deploy. Please
> see: \n http://github.com/ceph/ceph-deploy
> [root@localhost ~]# mkcephfs --init-local-daemons mds -d /tmp/foo
> \nWARNING: mkcephfs is now deprecated in favour of ceph-deploy. Please
> see: \n http://github.com/ceph/ceph-deploy
> [root@localhost ~]# mkcephfs --prepare-mon -d /tmp/foo
> Building generic osdmap from /tmp/foo/conf
> /usr/local/bin/osdmaptool: osdmap file '/tmp/foo/osdmap'
> /usr/local/bin/osdmaptool: writing epoch 1 to /tmp/foo/osdmap
> Generating admin key at /tmp/foo/keyring.admin
> creating /tmp/foo/keyring.admin
> Building initial monitor keyring
> cat: /tmp/foo/key.*: No such file or directory
> \nWARNING: mkcephfs is now deprecated in favour of ceph-deploy. Please
> see: \n http://github.com/ceph/ceph-deploy
> [root@localhost ~]#
>
> How can I tell ceph not to use a keyring?
>
> This is my config file:
>
> [global]
> auth cluster required = none
> auth service required = none
> auth client required = none
> debug filestore = 20
> [mon]
> mon data = /data/mon
>
> [mon.a]
> host = s1
> mon addr = 127.0.0.1:6789
>
> [osd]
> osd journal size = 1000
> filestore_xattr_use_omap = true
>
> [osd.0]
> host = s1
> osd data = /data/osd/osd1
> osd mkfs type = bttr
> osd journal = /data/journal/log
> devs = /dev/loop0
>
> [mds.a]
> host = s1
>
>
> TIA
> Nulik
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-12 Thread Jeff Moskow
Sam,

I've attached both files.

Thanks!
Jeff

On Mon, Aug 12, 2013 at 01:46:57PM -0700, Samuel Just wrote:
> Can you attach the output of ceph osd tree?
> 
> Also, can you run
> 
> ceph osd getmap -o /tmp/osdmap
> 
> and attach /tmp/osdmap?
> -Sam
> 
> On Fri, Aug 9, 2013 at 4:28 AM, Jeff Moskow  wrote:
> > Thanks for the suggestion.  I had tried stopping each OSD for 30 seconds,
> > then restarting it, waiting 2 minutes and then doing the next one (all OSD's
> > eventually restarted).  I tried this twice.
> >
> > --
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 

# id    weight  type name       up/down reweight
-1  14.61   root default
-3  14.61   rack unknownrack
-2  2.783   host ceph1
0   0.919   osd.0   up  1   
1   0.932   osd.1   up  1   
2   0.932   osd.2   up  0   
-5  2.783   host ceph2
3   0.919   osd.3   down0   
4   0.932   osd.4   up  1   
5   0.932   osd.5   up  1   
-4  3.481   host ceph3
10  0.699   osd.10  up  1   
6   0.685   osd.6   up  1   
7   0.699   osd.7   up  1   
8   0.699   osd.8   up  1   
9   0.699   osd.9   up  1   
-6  2.783   host ceph4
14  0.919   osd.14  down0   
15  0.932   osd.15  up  1   
16  0.932   osd.16  down0   
-7  2.782   host ceph5
11  0.92osd.11  up  0   
12  0.931   osd.12  up  1   
13  0.931   osd.13  up  1   



osdmap
Description: Binary data
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Backup monmap, osdmap, and crushmap

2013-08-12 Thread Joao Eduardo Luis

On 08/08/13 15:21, Craig Lewis wrote:

I've seen a couple posts here about broken clusters that had to repair
by modifing the monmap, osdmap, or the crush rules.

The old school sysadmin in me says it would be a good idea to make
backups of these 3 databases.  So far though, it seems like everybody
was able to repair their clusters by dumping the current map and
modifying it.

I'll probably do it, just to assuage my paranoia, but I was wondering
what you guys thought.


Well, this could get you *some* info, but you wouldn't be able to 
reconstruct a monitor this way.  There are just way too many maps that 
you'd need to reconstruct the monitor.



The not-so-best approach would be to grab all map epochs, from 1 to the 
map's current epoch.  We don't currently have a way to expose to the 
user what is the first available map epoch in the store (the need for it 
never came up), so for now you'd have to start at 1 and increment it 
until you'd find an existing version (we trim old versions, so that 
could be at 1, 10k, or a few hundred thousand, depending on how many 
maps you have).  With all that information, you could somehow 
reconstruct a monitor with some effort -- and even so, we currently only 
expose an interface to obtain maps for some services such as the mon, 
osd, pg and mds;  we have a bunch of other versions kept in the monitor 
that are not currently exposed to the user.



This is something we definitely want to improve on, but as of this 
moment the best approach to backup monitors reliably would be to stop 
the monitor, copy the store, and restart the monitor.  Assuming you have 
3+ monitors, stopping just one of them wouldn't affect the quorum or 
cluster availability.  And assuming you're backing up a monitor that is 
in the quorum, backing it up is as good as backing up any other monitor.
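
On Ubuntu/upstart that boils down to something like this (a sketch
assuming mon.a and the default data path; adjust the id and paths):

stop ceph-mon id=a
tar czf /root/mon-a-backup.tar.gz /var/lib/ceph/mon/ceph-a
start ceph-mon id=a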



Hope this helps.


  -Joao





I'm thinking of cronning this on the MON servers:
#!/usr/bin/env bash

# Number of days to keep backups
cleanup_age="10"

# Fetch the current timestamp, to use in the backup filenames
date=$(date +"%Y-%m-%dT%H:%M:%S")

# Dump the current maps
cd /var/lib/ceph/backups/
ceph mon getmap -o ./monmap.${date}
ceph osd getmap -o ./osdmap.${date}
ceph osd getcrushmap -o ./crushmap.${date}

# Delete old maps (-r: don't run rm when nothing matched)
find . -type f -regextype posix-extended \
    -regex '\./(mon|osd|crush)map\..*' -mtime +${cleanup_age} -print0 | xargs -0 -r rm




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-12 Thread Samuel Just
Are you using any kernel clients?  Will osds 3,14,16 be coming back?
-Sam

On Mon, Aug 12, 2013 at 2:26 PM, Jeff Moskow  wrote:
> Sam,
>
> I've attached both files.
>
> Thanks!
> Jeff
>
> On Mon, Aug 12, 2013 at 01:46:57PM -0700, Samuel Just wrote:
>> Can you attach the output of ceph osd tree?
>>
>> Also, can you run
>>
>> ceph osd getmap -o /tmp/osdmap
>>
>> and attach /tmp/osdmap?
>> -Sam
>>
>> On Fri, Aug 9, 2013 at 4:28 AM, Jeff Moskow  wrote:
>> > Thanks for the suggestion.  I had tried stopping each OSD for 30 seconds,
>> > then restarting it, waiting 2 minutes and then doing the next one (all 
>> > OSD's
>> > eventually restarted).  I tried this twice.
>> >
>> > --
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> --
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to set Object Size/Stripe Width/Stripe Count?

2013-08-12 Thread Samuel Just
I think the docs you are looking for are
http://ceph.com/docs/master/man/8/cephfs/ (specifically the set_layout
command).
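
For example, a sketch of setting the layout on a directory of a mounted
cephfs (the mount point is hypothetical; check cephfs(8) for the exact
option names in your version):

cephfs /mnt/ceph/mydir set_layout --stripe_unit 1048576 --stripe_count 8 --object_size 4194304
cephfs /mnt/ceph/mydir show_layout
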
-Sam

On Thu, Aug 8, 2013 at 7:48 AM, Da Chun  wrote:
> Hi list,
> I saw the info about data striping in
> http://ceph.com/docs/master/architecture/#data-striping .
> But couldn't find the way to set these values.
>
> Could you please tell me how to that or give me a link? Thanks!
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Start Stop OSD

2013-08-12 Thread Dan Mick



On 08/12/2013 04:49 AM, Joshua Young wrote:

I have 2 issues that I can not find a solution to.

First: I am unable to stop / start any osd by command. I have deployed
with ceph-deploy on Ubuntu 13.04 and everything seems to be working
find. I have 5 hosts 5 mons and 20 osds.

Using initctl list | grep ceph gives me



ceph-osd (ceph/15) start/running, process 2122


The fact that only one is output means that upstart believes there's 
only one OSD job running.  Are you sure the other daemons are actually 
alive and started by upstart?
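
For reference, the per-OSD upstart jobs are normally driven like this
(id 12 taken from your example):

sudo status ceph-osd id=12
sudo start ceph-osd id=12
initctl list | grep ceph-osd   # expect one "ceph-osd (ceph/N)" line per OSD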



However, OSDs 12, 13, 14 and 15 are all on this server.

sudo stop ceph-osd id=12

gives me stop: Unknown instance: ceph/12

Does anyone know what is wrong? Nothing in logs.

Also, when trying to put the journal on an SSD everything works fine. I
can add all 4 disks per host to the same SSD. The issue is when I
restart the server, only 1 out of the 3 OSDs will come back up. Has
anyone else had this issue?


Are you using partitions on the SSD?  If not, that's obviously going to 
be a problem; the device is usable by only one journal at a time.
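
If they are going to the raw device now, one journal partition per OSD
is needed first, e.g. (a sketch; /dev/sdb is a hypothetical SSD, and
partitioning erases it):

parted -s /dev/sdb mklabel gpt
parted -s /dev/sdb mkpart journal-1 1GiB 11GiB
parted -s /dev/sdb mkpart journal-2 11GiB 21GiB
parted -s /dev/sdb mkpart journal-3 21GiB 31GiB
parted -s /dev/sdb mkpart journal-4 31GiB 41GiB
# then give each OSD its own partition, e.g.
# ceph-deploy osd prepare host:sdc:/dev/sdb1 host:sdd:/dev/sdb2 ...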

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph pgs stuck unclean

2013-08-12 Thread Samuel Just
Can you attach the output of:

ceph -s
ceph pg dump
ceph osd dump

and run

ceph osd getmap -o /tmp/osdmap

and attach /tmp/osdmap?
-Sam

On Wed, Aug 7, 2013 at 1:58 AM, Howarth, Chris  wrote:
> Hi,
>
> One of our OSD disks failed on a cluster and I replaced it, but when it
> failed it did not completely recover and I have a number of pgs which are
> stuck unclean:
>
>
>
> # ceph health detail
>
> HEALTH_WARN 7 pgs stuck unclean
>
> pg 3.5a is stuck unclean for 335339.172516, current state active, last
> acting [5,4]
>
> pg 3.54 is stuck unclean for 335339.157608, current state active, last
> acting [15,7]
>
> pg 3.55 is stuck unclean for 335339.167154, current state active, last
> acting [16,9]
>
> pg 3.1c is stuck unclean for 335339.174150, current state active, last
> acting [8,16]
>
> pg 3.a is stuck unclean for 335339.177001, current state active, last acting
> [0,8]
>
> pg 3.4 is stuck unclean for 335339.165377, current state active, last acting
> [17,4]
>
> pg 3.5 is stuck unclean for 335339.149507, current state active, last acting
> [2,6]
>
>
>
> Does anyone know how to fix these? I tried the following, but this does not
> seem to work:
>
>
>
> # ceph pg 3.5 mark_unfound_lost revert
>
> pg has no unfound objects
>
>
>
> thanks
>
>
>
> Chris
>
> __
>
> Chris Howarth
>
> OS Platforms Engineering
>
> Citi Architecture & Technology Engineering
>
> (e) chris.howa...@citi.com
>
> (t) +44 (0) 20 7508 3848
>
> (f) +44 (0) 20 7508 0964
>
> (mail-drop) CGC-06-3A
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] could not generate the bootstrap key

2013-08-12 Thread Samuel Just
Can you give a step by step account of what you did prior to the error?
-Sam

On Tue, Aug 6, 2013 at 10:52 PM, 於秀珠  wrote:
> Using ceph-deploy to manage an existing cluster, I followed the steps in the
> document, but there are some errors and I cannot gather the keys.
> When I run the command "ceph-deploy gatherkeys PS-16", the logs show the following:
>
> 2013-08-07 10:14:08,579 ceph_deploy.gatherkeys DEBUG Have
> ceph.client.admin.keyring
> 2013-08-07 10:14:08,579 ceph_deploy.gatherkeys DEBUG Checking PS-16 for
> /var/lib/ceph/mon/ceph-{hostname}/keyring
> 2013-08-07 10:14:08,674 ceph_deploy.gatherkeys DEBUG Got ceph.mon.keyring
> key from PS-16.
> 2013-08-07 10:14:08,674 ceph_deploy.gatherkeys DEBUG Checking PS-16 for
> /var/lib/ceph/bootstrap-osd/ceph.keyring
> 2013-08-07 10:14:08,774 ceph_deploy.gatherkeys WARNING Unable to find
> /var/lib/ceph/bootstrap-osd/ceph.keyring on ['PS-16']
> 2013-08-07 10:14:08,774 ceph_deploy.gatherkeys DEBUG Checking PS-16 for
> /var/lib/ceph/bootstrap-mds/ceph.keyring
> 2013-08-07 10:14:08,874 ceph_deploy.gatherkeys WARNING Unable to find
> /var/lib/ceph/bootstrap-mds/ceph.keyring on ['PS-16']
>
>
> I also tried to deploy a new ceph cluster and met the same problem: after I
> create the mon and then gather the keys, I still cannot gather the bootstrap
> keys.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-12 Thread Jeff Moskow
Sam,

3, 14 and 16 have been down for a while and I'll eventually replace 
those drives (I could do it now)
but didn't want to introduce more variables.

We are using RBD with Proxmox, so I think the answer about kernel 
clients is yes

Jeff

On Mon, Aug 12, 2013 at 02:41:11PM -0700, Samuel Just wrote:
> Are you using any kernel clients?  Will osds 3,14,16 be coming back?
> -Sam
> 
> On Mon, Aug 12, 2013 at 2:26 PM, Jeff Moskow  wrote:
> > Sam,
> >
> > I've attached both files.
> >
> > Thanks!
> > Jeff
> >
> > On Mon, Aug 12, 2013 at 01:46:57PM -0700, Samuel Just wrote:
> >> Can you attach the output of ceph osd tree?
> >>
> >> Also, can you run
> >>
> >> ceph osd getmap -o /tmp/osdmap
> >>
> >> and attach /tmp/osdmap?
> >> -Sam
> >>
> >> On Fri, Aug 9, 2013 at 4:28 AM, Jeff Moskow  wrote:
> >> > Thanks for the suggestion.  I had tried stopping each OSD for 30 seconds,
> >> > then restarting it, waiting 2 minutes and then doing the next one (all 
> >> > OSD's
> >> > eventually restarted).  I tried this twice.
> >> >
> >> > --
> >> >
> >> > ___
> >> > ceph-users mailing list
> >> > ceph-users@lists.ceph.com
> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> > --
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-12 Thread Samuel Just
Ok, your best bet is to remove osds 3,14,16:

ceph auth del osd.3
ceph osd crush rm osd.3
ceph osd rm osd.3

for each of them.  Each osd you remove may cause
some data rebalancing, so you should be ready for
that.
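
You can watch the rebalancing while it happens:

ceph -w    # streaming cluster status; 'ceph -s' gives a one-shot summary
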
-Sam

On Mon, Aug 12, 2013 at 3:01 PM, Jeff Moskow  wrote:
> Sam,
>
> 3, 14 and 16 have been down for a while and I'll eventually replace 
> those drives (I could do it now)
> but didn't want to introduce more variables.
>
> We are using RBD with Proxmox, so I think the answer about kernel 
> clients is yes
>
> Jeff
>
> On Mon, Aug 12, 2013 at 02:41:11PM -0700, Samuel Just wrote:
>> Are you using any kernel clients?  Will osds 3,14,16 be coming back?
>> -Sam
>>
>> On Mon, Aug 12, 2013 at 2:26 PM, Jeff Moskow  wrote:
>> > Sam,
>> >
>> > I've attached both files.
>> >
>> > Thanks!
>> > Jeff
>> >
>> > On Mon, Aug 12, 2013 at 01:46:57PM -0700, Samuel Just wrote:
>> >> Can you attach the output of ceph osd tree?
>> >>
>> >> Also, can you run
>> >>
>> >> ceph osd getmap -o /tmp/osdmap
>> >>
>> >> and attach /tmp/osdmap?
>> >> -Sam
>> >>
>> >> On Fri, Aug 9, 2013 at 4:28 AM, Jeff Moskow  wrote:
>> >> > Thanks for the suggestion.  I had tried stopping each OSD for 30 
>> >> > seconds,
>> >> > then restarting it, waiting 2 minutes and then doing the next one (all 
>> >> > OSD's
>> >> > eventually restarted).  I tried this twice.
>> >> >
>> >> > --
>> >> >
>> >> > ___
>> >> > ceph-users mailing list
>> >> > ceph-users@lists.ceph.com
>> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>> > --
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-12 Thread Joao Eduardo Luis
Following a discussion we had today on #ceph, I've added some extra 
functionality to 'ceph-monstore-tool' to allow copying the data out of a 
store into a new mon store; it can be found on branch wip-monstore-copy.


Using it as

ceph-monstore-tool --mon-store-path <mon-data-dir> --out <mon-data-out> \
--command store-copy


with mon-data-dir being the mon data dir where the current monitor lives 
(say, /var/lib/ceph/mon/ceph-a), and mon-data-out being another 
directory.  This last directory should be empty, allowing the tool to 
create a new store, but if a store already exists it will not error out, 
copying instead the keys from the first store to the already existing 
store, so beware!


Also, bear in mind that you must stop the monitor while doing 
this -- the tool won't work otherwise.
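
In shell terms, the whole procedure is roughly the following sketch (mon
id, paths, and the service invocation are illustrative; adjust for your
init system):

service ceph stop mon.a                   # stop the monitor first
mkdir -p /var/lib/ceph/mon/ceph-a-copy    # must be empty, see above
ceph-monstore-tool --mon-store-path /var/lib/ceph/mon/ceph-a \
    --out /var/lib/ceph/mon/ceph-a-copy --command store-copy
du -sh /var/lib/ceph/mon/ceph-a /var/lib/ceph/mon/ceph-a-copy
service ceph start mon.a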


Anyway, this should allow you to grab all your data from the current 
monitor.  You'll be presented with a few stats when the store finishes 
being copied, and hopefully you'll see that the tool didn't copy 220GB 
worth of data -- should be considerably less!


Let me know if this works out for you.

  -Joao

On 07/08/13 15:14, Jeppesen, Nelson wrote:

Joao,

Have you had a chance to look at my monitor issues? I ran 'ceph-mon -i FOO 
--compact' last week but it did not improve disk usage.

Let me know if there's anything else I can dig up. The monitor is still at 
0.67-rc2 with the OSDs at 0.61.7.


On 08/02/2013 12:15 AM, Jeppesen, Nelson wrote:

Thanks for the reply, but how can I fix this without an outage?

I tried adding 'mon compact on start = true' but the monitor just hung. 
Unfortunately this is a production cluster and can't take the outages (I'm 
assuming the cluster will fail without a monitor). I had three monitors; I 
was hit with the store.db bug and lost two of the three.

I have tried running with 0.61.5, 0.61.7 and 0.67-rc2. None of them seem to 
shrink the DB.


My guess is that the compaction policies we are enforcing won't cover
the portions of the store that haven't been compacted *prior* to the
upgrade.

Even today we still know of users with stores growing over dozens of
GBs, requiring occasional restarts to compact (which is far from an
acceptable fix).  Some of these stores can take several minutes to
compact when the monitors are restarted, although these guys can often
mitigate any down time by restarting monitors one at a time while
maintaining quorum.  Unfortunately you don't have that luxury. :-\

If however you are willing to manually force a compaction, you should be
able to do so with 'ceph-mon -i FOO --compact'.

Now, there is a possibility this is why you've been unable to add other
monitors to the cluster.  Chances are that the iterators used to
synchronize the store get stuck, or move slowly enough to make all sorts
of funny timeouts to be triggered.

I intend to look into your issue (especially the problems with adding
new monitors) in the morning to better assess what's happening.

-Joao



-Original Message-
From: Mike Dawson [mailto:mike.dawson at cloudapt.com]
Sent: Thursday, August 01, 2013 4:10 PM
To: Jeppesen, Nelson
Cc: ceph-users at lists.ceph.com
Subject: Re: [ceph-users] Why is my mon store.db is 220GB?

220GB is way, way too big. I suspect your monitors need to go through a 
successful leveldb compaction. The early releases of Cuttlefish suffered 
several issues with store.db growing unbounded. Most were fixed by 0.61.5, I 
believe.

You may have luck stopping all Ceph daemons, then starting the monitor by 
itself. When there were bugs, leveldb compaction tended to work better 
without OSD traffic hitting the monitors. Also, there are some settings to 
force a compact on startup, like 'mon compact on start = true' and 
'mon compact on trim = true'. I don't think either is required anymore 
though. See some history here:

http://tracker.ceph.com/issues/4895
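
For reference, those options would sit in the monitor section of ceph.conf,
something like the sketch below (and, as noted above, probably no longer
required):

[mon]
    mon compact on start = true   # compact the store when the daemon starts
    mon compact on trim = true    # compact whenever old states are trimmed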


Thanks,

Mike Dawson
Co-Founder & Director of Cloud Architecture Cloudapt LLC
6330 East 75th Street, Suite 170
Indianapolis, IN 46250

On 8/1/2013 6:52 PM, Jeppesen, Nelson wrote:

My Mon store.db has been at 220GB for a few months now. Why is this
and how can I fix it? I have one monitor in this cluster, and I suspect
that I can't add monitors to the cluster because it is too big. Thank you.



___
ceph-users mailing list
ceph-users at lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users at lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com







--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Backup monmap, osdmap, and crushmap

2013-08-12 Thread Craig Lewis
You saved me a bunch of time; I was planning to test my backup and 
restore later today.  Thanks!



It occurred to me that the backups won't be as useful as I thought.  I'd 
need to make sure that the PGs hadn't moved around after the backup was 
made.  If they had, I'd spend a lot of time tracking down the new 
locations and manually rsyncing data.  Not a big deal on a small 
cluster, but it gets harder as the cluster gets larger.  Now that I look 
at the dumps, it looks like PG locations are one of the things missing.



Binary backups of the MON directories are fine.  All of the problems 
I've seen on the list occurred during cluster upgrades, so I'll make the 
backup part of my upgrade procedure instead of a cron.



If I wanted to restore a backup, what would be required?  Looking at 
http://eu.ceph.com/docs/v0.47.1/ops/manage/grow/mon/#removing-a-monitor-from-an-unhealthy-or-down-cluster, 
I don't see a monmap directory inside /var/lib/ceph/mon/ceph-#/ 
anymore.  I assume it went away during the switch to LevelDB, so I think 
I'll need to dump a copy when I make the binary backup.


I'm assuming there is some node-specific data in each MON's store.  If 
not, could I just stop all monitors, rsync the backup to all monitors, 
and start them all up?


I'm not using cephx, but I assume the keyrings would complicate things.  
It would probably be easiest to make binary backups on all of the 
monitors (with some time delay, so only one is offline at a time), then 
start them up in newest-to-oldest-backup order. Or I could use LVM 
snapshots to make simultaneous backups on all monitors.



Thanks for the info.



*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com 

*Central Desktop. Work together in ways you never thought possible.*



On 8/12/13 14:39 , Joao Eduardo Luis wrote:

On 08/08/13 15:21, Craig Lewis wrote:

I've seen a couple posts here about broken clusters that had to repair
by modifing the monmap, osdmap, or the crush rules.

The old school sysadmin in me says it would be a good idea to make
backups of these 3 databases.  So far though, it seems like everybody
was able to repair their clusters by dumping the current map and
modifying it.

I'll probably do it, just to assuage my paranoia, but I was wondering
what you guys thought.


Well, this could get you *some* info, but you wouldn't be able to 
reconstruct a monitor this way.  There are just way too many maps that 
you'd need to reconstruct the monitor.



The not-so-best approach would be to grab all map epochs, from 1 to 
the map's current epoch.  We don't currently have a way to expose to 
the user what is the first available map epoch in the store (the need 
for it never came up), so for now you'd have to start at 1 and 
increment it until you'd find an existing version (we trim old 
versions, so that could be at 1, 10k, or a few hundred thousands, 
depending on how many maps you have).  With all that information, you 
could somehow reconstruct a monitor with some effort -- and even so, 
we currently only expose an interface to obtain maps for some services 
such as the mon, osd, pg and mds;  we have a bunch of other versions 
kept in the monitor that are not currently exposed to the user.
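
As an illustration, a brute-force probe for the first osdmap epoch still
in the store could look like the sketch below (assuming 'ceph osd getmap'
accepts an optional epoch argument; the same idea applies to the other
services):

#!/usr/bin/env bash
# walk epochs upward until 'ceph osd getmap' stops failing
epoch=1
until ceph osd getmap ${epoch} -o /dev/null 2>/dev/null; do
    epoch=$((epoch + 1))
done
echo "first available osdmap epoch: ${epoch}"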



This is something we definitely want to improve on, but as of this 
moment the best approach to back up monitors reliably would be to stop 
the monitor, copy the store, and restart the monitor. Assuming you 
have 3+ monitors, stopping just one of them wouldn't affect the quorum 
or cluster availability.  And assuming you're backing up a monitor 
that is in the quorum, then backing it up is as good as backing any 
other monitor.
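
In shell terms that is roughly (mon id, paths, and the service invocation
are illustrative):

service ceph stop mon.a
tar czf /backup/mon-store-$(date +%F).tar.gz -C /var/lib/ceph/mon ceph-a
service ceph start mon.a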



Hope this helps.


  -Joao





I'm thinking of cronning this on the MON servers:
#!/usr/bin/env bash

# Number of days to keep backups
cleanup_age="10"

# Fetch the current timestamp, to use in the backup filenames
date=$(date +"%Y-%m-%dT%H:%M:%S")

# Dump the current maps
cd /var/lib/ceph/backups/ || exit 1
ceph mon getmap -o ./monmap.${date}
ceph osd getmap -o ./osdmap.${date}
ceph osd getcrushmap -o ./crushmap.${date}

# Delete old maps (xargs -r avoids running rm when nothing matches)
find . -type f -regextype posix-extended \
    -regex '\./(mon|osd|crush)map\..*' -mtime +${cleanup_age} -print0 \
    | xargs -0 -r rm
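
A matching cron entry might look like this (/etc/cron.d syntax; the path
and schedule are hypothetical):

# /etc/cron.d/ceph-map-backup -- dump the maps daily at 02:00
0 2 * * * root /usr/local/sbin/ceph-map-backup.sh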




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd map issues: no such file or directory (ENOENT) AND map wrong image

2013-08-12 Thread Josh Durgin

On 08/12/2013 10:19 AM, PJ wrote:

Hi All,

Before going into the issue description, here are our hardware configurations:
- Physical machines * 3: each has quad-core CPU * 2, 64+ GB RAM, HDD * 12
(500GB ~ 1TB per drive; 1 for system, 11 for OSD). The ceph OSDs run on the
physical machines.
- Each physical machine runs 5 virtual machines. One VM is a ceph MON
(i.e. totally 3 MONs); the other 4 VMs provide either iSCSI or FTP/NFS
service.
- Physical machines and virtual machines are based on the same software
configuration: Ubuntu 12.04 + kernel 3.6.11, ceph v0.61.7


The issues we met are:

1. Right after ceph installation, creating a pool and then creating and
mapping an image is no problem. But if we leave the environment unused for
more than half a day, the same process (create pool -> create image -> map
image) returns an error: no such file or directory (ENOENT). Once the issue
occurs, it is easily reproduced by the same process. The issue may disappear
if we wait 10+ minutes after pool creation. Rebooting the system also
avoids it.
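
For reference, the repro sequence is essentially the following (pool and
image names are placeholders):

ceph osd pool create testpool 64           # create a new pool
rbd -p testpool create img1 --size 1024    # create a 1 GB image in it
rbd -p testpool map img1                   # returns ENOENT when the bug hits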


This sounds similar to http://tracker.ceph.com/issues/5925 - and
your case suggests it may be a monitor bug, since that test is userspace
and you're using the kernel client. Could you reproduce
this with logs from your monitors from the time of pool creation to
after the map fails with ENOENT, and these log settings on all mons:

debug ms = 1
debug mon = 20
debug paxos = 10

If you could attach those logs to the bug or otherwise make them
available that'd be great.


I had success and failed straces logged on the same virtual machine (the
one provides FTP/NFS):
success: https://www.dropbox.com/s/u8jc4umak24kr1y/rbd_done.txt
failed: https://www.dropbox.com/s/ycuupmmrlc4d0ht/rbd_failed.txt


Unfortunately these won't tell us much since the kernel is doing all the
work with rbd map.


2. The second issue: we create two images (AAA and BBB) under one pool
(xxx). If we map AAA with "rbd -p xxx map AAA", the map succeeds, but BBB
shows up under /dev/rbd/xxx/. "rbd showmapped" shows that "AAA" of
pool xxx is mapped. I am not sure which one is really mapped because
both images are empty. This issue is hard to reproduce, but once it
happens /dev/rbd/ is messed up.


That sounds very strange, since 'rbd showmapped' and the udev rule that
creates the /dev/rbd/pool/image symlinks use the same data source -
/sys/bus/rbd/N/name. This sounds like a race condition where sysfs is
being read (and reading stale memory) before the kernel finishes
populating it. Could you file this in the tracker? Checking whether
it still occurs in linux 3.10 would be great too. It doesn't seem
possible with the current code.


One more question, not about the rbd map issues. Our usage is to map one
rbd device and mount it in several places (in one virtual machine) for
iSCSI, FTP and NFS; does that cause any problem for ceph operation?


If it's read-only everywhere, it's fine, but otherwise you'll run into
problems unless you've got something on top of rbd managing access to
it, like ocfs2. You could use nfs on top of one rbd device, but having
multiple nfs servers on top of the same rbd device won't work unless
they can coordinate with each other. The same applies to iscsi and ftp.

Josh
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [list admin] - membership disabled due to bounces

2013-08-12 Thread Dan Mick
Do I understand you to mean, James, that you bounce spam messages back 
to the sender, even if the sender is a listserv?  That seems like a 
really bad idea, punishing the innocent at best, and causing problems 
like this at worst.


IMO the best spam strategy is to drop it as early as possible at the 
receiver (ideally while the SMTP connection is still open, before any 
queueing).


We can look at making spam rejection better for ceph-users, but I wanted 
to understand whether you and Alex are actually bouncing things back to 
ceph-users, and if so, ask about why.


On 08/11/2013 03:26 AM, James Harper wrote:

This list actually does get a bit of spam, unlike most lists I'm subscribed to. 
I'm surprised more reputation filters haven't blocked it. Rejecting spam is the 
only right way to do it (junk mail folders are dumb), but obviously the 
ceph-users list is taking the bounces as indicating a problem with your account.

Probably the only thing to do is to white list the address and put up with the 
spam.

James


-Original Message-
From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-
boun...@lists.ceph.com] On Behalf Of Alex Bligh
Sent: Sunday, 11 August 2013 6:43 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] [list admin] - membership disabled due to bounces

This is the third of these I've got in a month. I get them on no
other mailing lists and I'm on quite a lot of lists. I have an
outsourced spam thing which is generally well behaved but will
reject spam. Any ideas why this might be?

Alex

Begin forwarded message:


From: ceph-users-requ...@lists.ceph.com
Date: 11 August 2013 08:25:30 GMT+01:00
To: a...@alex.org.uk
Subject: confirm [REDACTED]

Your membership in the mailing list ceph-users has been disabled due
to excessive bounces The last bounce received from you was dated
11-Aug-2013.  You will not get any more messages from this list until
you re-enable your membership.  You will receive 3 more reminders like
this before your membership in the list is deleted.

To re-enable your membership, you can simply respond to this message
(leaving the Subject: line intact), or visit the confirmation page at

http://lists.ceph.com/confirm.cgi/ceph-users-ceph.com/[REDACTED]


You can also visit your membership page at

http://lists.ceph.com/options.cgi/ceph-users-ceph.com/[REDACTED]


On your membership page, you can change various delivery options such
as your email address and whether you get digests or not.  As a
reminder, your membership password is

[REDACTED]

If you have any questions or problems, you can contact the list owner
at

ceph-users-ow...@lists.ceph.com




--
Alex Bligh




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Dan Mick, Filesystem Engineering
Inktank Storage, Inc.   http://inktank.com
Ceph docs: http://ceph.com/docs
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph instead of RAID

2013-08-12 Thread Dmitry Postrigan
Hello community,

I am currently installing some backup servers with 6x3TB drives in them. I 
played with RAID-10 but I was not
impressed at all with how it performs during a recovery.

Anyway, I thought what if instead of RAID-10 I use ceph? All 6 disks will be 
local, so I could simply create
6 local OSDs + a monitor, right? Is there anything I need to watch out for in 
such configuration?

Another thing. I am using ceph-deploy and I have noticed that when I do this:

ceph-deploy --verbose  new localhost

the ceph.conf file is created in the current folder instead of /etc. Is this 
normal?

Also, in the ceph.conf there's a line:
mon host = ::1
Is this normal or I need to change this to point to localhost?

Thanks for any feedback on this.

Dmitry

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] pgs stuck unclean since forever, current state active+remapped

2013-08-12 Thread 不坏阿峰
I've got PGs that have been stuck for a long time and don't know how to fix
them. Can someone help check?

Environment: Debian 7 + ceph 0.61.7


root@ceph-admin:~# ceph -s
   health HEALTH_WARN 6 pgs stuck unclean
   monmap e2: 2 mons at {a=192.168.250.15:6789/0,b=192.168.250.8:6789/0},
election epoch 8, quorum 0,1 a,b
   osdmap e159: 4 osds: 4 up, 4 in
pgmap v23487: 584 pgs: 578 active+clean, 6 active+remapped; 4513 MB
data, 12658 MB used, 387 GB / 399 GB avail; 426B/s wr, 0op/s
   mdsmap e114: 1/1/1 up {0=a=up:active}, 1 up:standby

--
root@ceph-admin:~# ceph health detail
HEALTH_WARN 6 pgs stuck unclean
pg 0.50 is stuck unclean since forever, current state active+remapped, last
acting [3,1]
pg 1.4f is stuck unclean since forever, current state active+remapped, last
acting [3,1]
pg 2.4e is stuck unclean since forever, current state active+remapped, last
acting [3,1]
pg 1.8a is stuck unclean since forever, current state active+remapped, last
acting [2,1]
pg 0.8b is stuck unclean since forever, current state active+remapped, last
acting [2,1]
pg 2.89 is stuck unclean since forever, current state active+remapped, last
acting [2,1]
--
root@ceph-admin:~# ceph osd tree

# id    weight  type name       up/down reweight
-1      4       root default
-3      2               rack unknownrack
-2      2                       host ceph-admin
0       1                               osd.0   up      1
1       1                               osd.1   up      1
-4      1               host ceph-node02
2       1                       osd.2   down    1
-5      1               host ceph-node01
3       1                       osd.3   up      1
---
root@ceph-admin:~# ceph osd dump

epoch 159
fsid db32486a-7ad3-4afe-8b67-49ee2a6dcecf
created 2013-08-08 13:45:52.579015
modified 2013-08-12 05:18:37.895385
flags

pool 0 'data' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins
pg_num 192 pgp_num 192 last_change 1 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash
rjenkins pg_num 192 pgp_num 192 last_change 1 owner 0
pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins
pg_num 192 pgp_num 192 last_change 1 owner 0
pool 3 'volumes' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins
pg_num 8 pgp_num 8 last_change 39 owner 18446744073709551615

max_osd 5
osd.0 up   in  weight 1 up_from 138 up_thru 157 down_at 137
last_clean_interval [45,135) 192.168.250.15:6803/5735
192.168.250.15:6804/5735 192.168.250.15:6805/5735 exists,up
99f2aec0-2367-4b68-86f2-58d6d41589c6
osd.1 up   in  weight 1 up_from 140 up_thru 157 down_at 137
last_clean_interval [47,136) 192.168.250.15:6806/6882
192.168.250.15:6807/6882 192.168.250.15:6808/6882 exists,up
d458ca35-ec55-47a9-a7ce-47b9ddf4d889
osd.2 up   in  weight 1 up_from 157 up_thru 158 down_at 135
last_clean_interval [48,134) 192.168.250.8:6800/3564 192.168.250.8:6801/3564
192.168.250.8:6802/3564 exists,up c4ee9f05-bd5f-4536-8cb8-0af82c00d3d6
osd.3 up   in  weight 1 up_from 143 up_thru 157 down_at 141
last_clean_interval [53,141) 192.168.250.16:6802/14618
192.168.250.16:6804/14618 192.168.250.16:6805/14618 exists,up
e9d67b85-97d1-4635-95c8-f7c50cd7f6b1

pg_temp 0.50 [3,1]
pg_temp 0.8b [2,1]
pg_temp 1.4f [3,1]
pg_temp 1.8a [2,1]
pg_temp 2.4e [3,1]
pg_temp 2.89 [2,1]
--
root@ceph-admin:/etc/ceph# crushtool -d /tmp/crushmap
# begin crush map

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3

# types
type 0 osd
type 1 host
type 2 rack
type 3 row
type 4 room
type 5 datacenter
type 6 root

# buckets
host ceph-admin {
id -2   # do not change unnecessarily
# weight 2.000
alg straw
hash 0  # rjenkins1
item osd.0 weight 1.000
item osd.1 weight 1.000
}
rack unknownrack {
id -3   # do not change unnecessarily
# weight 2.000
alg straw
hash 0  # rjenkins1
item ceph-admin weight 2.000
}
host ceph-node02 {
id -4   # do not change unnecessarily
# weight 1.000
alg straw
hash 0  # rjenkins1
item osd.2 weight 1.000
}
host ceph-node01 {
id -5   # do not change unnecessarily
# weight 1.000
alg straw
hash 0  # rjenkins1
item osd.3 weight 1.000
}
root default {
id -1   # do not change unnecessarily
# weight 4.000
alg straw
hash 0  # rjenkins1
item unknownrack weight 2.000
item ceph-node02 weight 1.000
item ceph-node01 weight 1.000
}

# rules
rule data {
ruleset 0
type replicated
min_size 1
max_size 10
step take default
step choose firstn 0 type osd
step emit
}
rule volumes {
ruleset 3
type replicated
min_size 1
max_size 10
step take default
step choose firstn 0 type osd
step emit
}
rule metadata {
ruleset 1
type replicated
min_size 

Re: [ceph-users] Ceph instead of RAID

2013-08-12 Thread Dan Mick



On 08/12/2013 06:49 PM, Dmitry Postrigan wrote:

Hello community,

I am currently installing some backup servers with 6x3TB drives in them. I 
played with RAID-10 but I was not
impressed at all with how it performs during a recovery.

Anyway, I thought what if instead of RAID-10 I use ceph? All 6 disks will be 
local, so I could simply create
6 local OSDs + a monitor, right? Is there anything I need to watch out for in 
such configuration?


I mean, you can certainly do that.  1 mon and all OSDs on one server is 
not particularly fault-tolerant, perhaps, but if you have multiple such 
servers in the cluster, sure, why not?



Another thing. I am using ceph-deploy and I have noticed that when I do this:

 ceph-deploy --verbose  new localhost

the ceph.conf file is created in the current folder instead of /etc. Is this 
normal?


Yes.  ceph-deploy also distributes ceph.conf where it needs to go.


Also, in the ceph.conf there's a line:
 mon host = ::1
Is this normal or I need to change this to point to localhost?



You want to configure the machines such that they have resolvable 'real' 
IP addresses:


http://ceph.com/docs/master/start/quick-start-preflight/#hostname-resolution
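
Once the hostname resolves properly, the generated ceph.conf should end up
with something like the following instead of ::1 (name and address are
illustrative):

[global]
mon initial members = backup01
mon host = 192.168.1.10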



Thanks for any feedback on this.

Dmitry

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Dan Mick, Filesystem Engineering
Inktank Storage, Inc.   http://inktank.com
Ceph docs: http://ceph.com/docs
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd map issues: no such file or directory (ENOENT) AND map wrong image

2013-08-12 Thread Josh Durgin

[re-adding ceph-users so others can benefit from the archives]

On 08/12/2013 07:18 PM, PJ wrote:

2013/8/13 Josh Durgin :

On 08/12/2013 10:19 AM, PJ wrote:


Hi All,

Before go on the issue description, here is our hardware configurations:
- Physical machine * 3: each has quad-core CPU * 2, 64+ GB RAM, HDD * 12
(500GB ~ 1TB per drive; 1 for system, 11 for OSD). ceph OSD are on
physical machines.
- Each physical machine runs 5 virtual machines. One VM as ceph MON
(i.e. totally 3 MONs), the other 4 VMs provides either iSCSI or FTP/NFS
service
- Physical machines and virtual machines are based on the same software
condition: Ubuntu 12.04 + kernel 3.6.11, ceph v0.61.7


The issues we met are,

1. Right after ceph installation, create pool then create image and map
is no problem. But if we do not use the whole environment more than half
day, do the same process (create pool -> create image -> map image) will
return error: no such file or directory (ENOENT). Once the issue occurs,
it could be easily reproduce by the same process. But this issue may be
disappear if wait 10+ minutes after pool creation. Reboot system also
could avoid it.



This sounds similar to http://tracker.ceph.com/issues/5925 - and
your case suggests it may be a monitor bug, since that test is userspace
and you're using the kernel client. Could you reproduce
this with logs from your monitors from the time of pool creation to
after the map fails with ENOENT, and these log settings on all mons:

debug ms = 1
debug mon = 20
debug paxos = 10

If you could attach those logs to the bug or otherwise make them
available that'd be great.



We will add these settings to gather the logs. By the way, we try to
avoid this issue by using only the default pool (rbd). Will that be
useful in this case?


No, the case I'm interested in is when the 'rbd map' fails because
there's a new pool.




I had success and failed straces logged on the same virtual machine (the
one provides FTP/NFS):
success: https://www.dropbox.com/s/u8jc4umak24kr1y/rbd_done.txt
failed: https://www.dropbox.com/s/ycuupmmrlc4d0ht/rbd_failed.txt



Unfortunately these won't tell us much since the kernel is doing all the
work with rbd map.



2. The second issue is to create two images (AAA and BBB) under one pool
(xxx), if we map "rbd -p xxx image AAA", the result is success but it
shows BBB under /dev/rbd/xxx/. Use "rbd showmapped", it shows "AAA" of
pool xxx is mapped. I am not sure which one is really mapped because
both images are empty. This issue is hard to reproduce but once happens
/dev/rbd/ are mess-up.



That sounds very strange, since 'rbd showmapped' and the udev rule that
creates the /dev/rbd/pool/image symlinks use the same data source -
/sys/bus/rbd/N/name. This sounds like a race condition where sysfs is
being read (and reading stale memory) before the kernel finishes
populating it. Could you file this in the tracker?


I will file to tracker.


Checking whether it still occurs in linux 3.10 would be great too. It doesn't 
seem
possible with the current code.



Current code means Linux kernel 3.10 or 3.6?


Current code in 3.10 doesn't look like this issue is possible, unless
I'm missing something. There's been a lot of refactoring since 3.6
though, so it's possible the bug was fixed accidentally.


One more question but not about rbd map issues. Our usage is to map one
rbd device and mount in several places (in one virtual machine) for
iSCSI, FTP and NFS, does that cause any problem to ceph operation?



If it's read-only everywhere, it's fine, but otherwise you'll run into
problems unless you've got something on top of rbd managing access to
it, like ocfs2. You could use nfs on top of one rbd device, but having
multiple nfs servers on top of the same rbd device won't work unless
they can coordinate with each other. The same applies to iscsi and ftp.



If the target rbd device is only mapped on one virtual machine, we format it
as ext4 and mount it in two places:
   mount /dev/rbd0 /nfs --> for nfs server usage
   mount /dev/rbd0 /ftp  --> for ftp server usage
The nfs and ftp servers run on the same virtual machine. Will the file
system (ext4) help to handle the simultaneous access from nfs and ftp?


I doubt that'll work perfectly on a normal disk, although rbd should
behave the same in this case. Consider what happens when the same files
are modified at once by the ftp and nfs servers; there are going to be
issues. You could run ftp on an nfs client on a different machine
safely.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph instead of RAID

2013-08-12 Thread Dmitry Postrigan

> On 08/12/2013 06:49 PM, Dmitry Postrigan wrote:
>> Hello community,
>>
>> I am currently installing some backup servers with 6x3TB drives in them. I 
>> played with RAID-10 but I was not
>> impressed at all with how it performs during a recovery.
>>
>> Anyway, I thought what if instead of RAID-10 I use ceph? All 6 disks will be 
>> local, so I could simply create
>> 6 local OSDs + a monitor, right? Is there anything I need to watch out for 
>> in such configuration?

> I mean, you can certainly do that.  1 mon and all OSDs on one server is 
> not particularly fault-tolerant, perhaps, but if you have multiple such 
> servers in the cluster, sure, why not?

This will be a single-server configuration; the goal is to replace mdraid, 
hence I tried to use localhost (nothing more will be added to the cluster). 
Are you saying it will be less fault-tolerant than a RAID-10?

>> Another thing. I am using ceph-deploy and I have noticed that when I do this:
>>
>>  ceph-deploy --verbose  new localhost
>>
>> the ceph.conf file is created in the current folder instead of /etc. Is this 
>> normal?

> Yes.  ceph-deploy also distributes ceph.conf where it needs to go.

Hmm. Here's what I do:

ceph-deploy --verbose  new ***

Creating new cluster named ceph
Resolving host ***
Monitor *** at ***
Monitor initial members are ['***']
Monitor addrs are ['***']
Creating a random mon key...
Writing initial config to ceph.conf...
Writing monitor keyring to ceph.conf...

however, /etc/ceph.conf does not exist. There is an empty folder /etc/ceph, but 
that's it. ceph.conf only
exists in the current folder where I ran ceph-deploy. The hostname I specified 
is of the same server I ran
this on.

Should I manually copy ceph.conf to /etc?

Not sure if it matters, my OS is a fresh installation of centos-6 64-bit 
(3.10.5-1.el6.elrepo.x86_64).


>> Also, in the ceph.conf there's a line:
>>  mon host = ::1
>> Is this normal or I need to change this to point to localhost?


> You want to configure the machines such that they have resolvable 'real' 
> IP addresses:

> http://ceph.com/docs/master/start/quick-start-preflight/#hostname-resolution

Thank you. Looks like I would have to use the host name and then use iptables 
to prevent connections from
outside.



>>
>> Thanks for any feedback on this.
>>
>> Dmitry
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>


Dmitry

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-12 Thread Jeppesen, Nelson
Joao, 

(log file uploaded to http://pastebin.com/Ufrxn6fZ)

I had some good luck and some bad luck. I copied the store.db to a new 
monitor, injected a modified monmap and started it up (this is all on the 
same host). Very quickly it reached quorum (as far as I can tell) but didn't 
respond. Running 'ceph -w' just hung, no timeouts or errors. Same thing when 
restarting an OSD.

The last lines of the log file ('...ms_verify_authorizer...') are from the 
'ceph -w' attempts.

I restarted everything again and it sat there synchronizing. iostat reported 
about 100MB/s, but just reads. I let it sit there for 7 min but nothing 
happened.

Side question, how long can a ceph cluster run without a monitor? I was able to 
upload files via rados gateway without issue even when the monitor was down.




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-12 Thread Sage Weil
On Mon, 12 Aug 2013, Jeppesen, Nelson wrote:
> Joao, 
> 
> (log file uploaded to http://pastebin.com/Ufrxn6fZ)
> 
> I had some good luck and some bad luck. I copied the store.db to a new 
> monitor, injected a modified monmap and started it up (This is all on the 
> same host.) Very quickly it reached quorum (as far as I can tell) but didn't 
> respond. Running 'ceph -w' just hung, no timeouts or errors. Same thing when 
> restarting an OSD.
> 
> The last lines of the log file   '...ms_verify_authorizer..' are from 'ceph 
> -w' attempts.
> 
> I restarted everything again and it sat there synchronizing. IO stat reported 
> about 100MB/s, but just reads. I let it sit there for 7 min but nothing 
> happened.

Can you do this again with --debug-mon 20 --debug-ms 1?  It looks as 
though the main dispatch thread is blocked (7f71a1aa5700 does nothing 
after winning the election).  It would also be helpful to gdb attach to 
the running ceph-mon and capture the output from 'thread apply all bt'.
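
Something along these lines should capture the backtraces (assuming a 
single ceph-mon process on the host):

gdb -batch -ex 'thread apply all bt' -p $(pidof ceph-mon) > mon-bt.txt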

> Side question, how long can a ceph cluster run without a monitor? I was 
> able to upload files via rados gateway without issue even when the 
> monitor was down.

Quite a while, as long as no new processes need to authenticate, and no 
nodes go up or down.  Eventually the authentication keys are going to time 
out, though (1 hour is the default).

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How do you replace an OSD?

2013-08-12 Thread Dmitry Postrigan

I just got my small Ceph cluster running. I run 6 OSDs on the same server to 
basically replace mdraid.

I have tried to simulate a hard drive (OSD) failure: removed the OSD 
(out+stop), zapped it, and then
prepared and activated it. It worked, but I ended up with one extra OSD (and 
the old one still showing in the ceph -w output).
I guess this is not how I am supposed to do it?
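
For reference, the sequence I ran was essentially the sketch below (osd id,
host, and disk names are placeholders, and the service invocation depends
on the init system):

ceph osd out 5                            # mark the failed OSD out
service ceph stop osd.5                   # stop the daemon
ceph-deploy disk zap backup01:sdf         # wipe the replacement disk
ceph-deploy osd prepare backup01:sdf
ceph-deploy osd activate backup01:sdf1    # comes back as a brand-new osd id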

Documentation recommends manually editing the configuration; however, there 
are no osd entries in my /etc/ceph/ceph.conf.

So what would be the best way to replace a failed OSD?

Dmitry

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com