Re: [ceph-users] No monitor sockets after upgrading to Emperor

2013-11-12 Thread Joao Luis
On Nov 12, 2013 2:38 AM, "Berant Lemmenes"  wrote:
>
> I noticed the same behavior on my dumpling cluster. They wouldn't show up
after boot, but after a service restart they were there.
>
> I haven't tested a node reboot since I upgraded to emperor today. I'll
give it a shot tomorrow.
>
> Thanks,
> Berant
>
> On Nov 11, 2013 9:29 PM, "Peter Matulis" 
wrote:
>>
>> After upgrading from Dumpling to Emperor on Ubuntu 12.04 I noticed the
>> admin sockets for each of my monitors were missing although the cluster
>> seemed to continue running fine.  There wasn't anything under
>> /var/run/ceph.  After restarting the service on each monitor node they
>> reappeared.  Anyone?
>>
>> ~pmatulis
>>

Odd behavior. The monitors do remove the admin socket on shutdown and
proceed to create it when they start, but as long as they are running it
should exist. Have you checked the logs for some error message that could
provide more insight on the cause?

  -Joao
___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] how to use rados_exec

2013-11-12 Thread
 Hi all!
   long time no see!
   I  want to use the function rados_exec,  and I found  the class 
cls_crypto.cc  in the  code source of ceph;
so I  run the funtion like this:

   rados_exec(ioctx, "foo_object", "crypto" , "md5", buf, sizeof(buf),buf2, 
sizeof(buf2) )

ant the function return   operation not support!

 I check the source of ceph , and find that  cls_crypto.cc is not build。how 
can I bulid the class and run it!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-deploy: osd creating hung with one ssd disk as shared journal

2013-11-12 Thread Tim Zhang
Hi guys,
I use ceph-deploy to manage my cluster, but I get failed while creating the
OSD, the process seems to hang up at creating first osd. By the way,
SELinux is disabled, and my ceph-disk is patched according to the page:
http://www.spinics.net/lists/ceph-users/msg03258.html
can you guys give me some advise?
(1) the output of ceph-deploy is:
Invoked (1.3.1): /usr/bin/ceph-deploy osd create ceph0:sdb:sda
ceph0:sdd:sda ceph0:sde:sda ceph0:sdf:sda ceph0:sdg:sda ceph0:sdh:sda
ceph1:sdb:sda ceph1:sdd:sda ceph1:sde:sda ceph1:sdf:sda ceph1:sdg:sda
ceph1:sdh:sda ceph2:sdb:sda ceph2:sdd:sda ceph2:sde:sda ceph2:sdf:sda
ceph2:sdg:sda ceph2:sdh:sda
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks
ceph0:/dev/sdb:/dev/sda ceph0:/dev/sdd:/dev/sda ceph0:/dev/sde:/dev/sda
ceph0:/dev/sdf:/dev/sda ceph0:/dev/sdg:/dev/sda ceph0:/dev/sdh:/dev/sda
ceph1:/dev/sdb:/dev/sda ceph1:/dev/sdd:/dev/sda ceph1:/dev/sde:/dev/sda
ceph1:/dev/sdf:/dev/sda ceph1:/dev/sdg:/dev/sda ceph1:/dev/sdh:/dev/sda
ceph2:/dev/sdb:/dev/sda ceph2:/dev/sdd:/dev/sda ceph2:/dev/sde:/dev/sda
ceph2:/dev/sdf:/dev/sda ceph2:/dev/sdg:/dev/sda ceph2:/dev/sdh:/dev/sda
[ceph0][DEBUG ] connected to host: ceph0
[ceph0][DEBUG ] detect platform information from remote host
[ceph0][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: CentOS 6.4 Final
[ceph_deploy.osd][DEBUG ] Deploying osd to ceph0
[ceph0][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph0][INFO  ] Running command: udevadm trigger --subsystem-match=block
--action=add
[ceph_deploy.osd][DEBUG ] Preparing host ceph0 disk /dev/sdb journal
/dev/sda activate True
[ceph0][INFO  ] Running command: ceph-disk-prepare --fs-type xfs --cluster
ceph -- /dev/sdb /dev/sda
[ceph0][ERROR ] WARNING:ceph-disk:OSD will not be hot-swappable if journal
is not the same device as the osd data
[ceph0][ERROR ] Warning: WARNING: the kernel failed to re-read the
partition table on /dev/sda (Device or resource busy).  As a result, it may
not reflect all of your changes until after reboot.
[ceph0][ERROR ] BLKPG: Device or resource busy
[ceph0][ERROR ] error adding partition 1
[ceph0][DEBUG ] The operation has completed successfully.
[ceph0][DEBUG ] The operation has completed successfully.
[ceph0][DEBUG ] meta-data=/dev/sdb1  isize=2048   agcount=4,
agsize=61047597 blks
[ceph0][DEBUG ]  =   sectsz=512   attr=2,
projid32bit=0
[ceph0][DEBUG ] data =   bsize=4096
blocks=244190385, imaxpct=25
[ceph0][DEBUG ]  =   sunit=0  swidth=0 blks
[ceph0][DEBUG ] naming   =version 2  bsize=4096   ascii-ci=0
[ceph0][DEBUG ] log  =internal log   bsize=4096
blocks=119233, version=2
[ceph0][DEBUG ]  =   sectsz=512   sunit=0 blks,
lazy-count=1
[ceph0][DEBUG ] realtime =none   extsz=4096   blocks=0,
rtextents=0
[ceph0][DEBUG ] The operation has completed successfully.
[ceph0][INFO  ] Running command: udevadm trigger --subsystem-match=block
--action=add
[ceph_deploy.osd][DEBUG ] Host ceph0 is now ready for osd use.
[ceph0][DEBUG ] connected to host: ceph0
[ceph0][DEBUG ] detect platform information from remote host
[ceph0][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: CentOS 6.4 Final
[ceph_deploy.osd][DEBUG ] Preparing host ceph0 disk /dev/sdd journal
/dev/sda activate True
[ceph0][INFO  ] Running command: ceph-disk-prepare --fs-type xfs --cluster
ceph -- /dev/sdd /dev/sda
[ceph0][ERROR ] WARNING:ceph-disk:OSD will not be hot-swappable if journal
is not the same device as the osd data

2 the mount system for that osd shows:
[root@host ~]# mount -l
/dev/sdc1 on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
/dev/sdb1 on /var/lib/ceph/tmp/mnt.6D02EM type xfs (rw,noatime)

3 my testbed information is:
os: centos 6.4 Final
ceph: dumpling 67.4
three hosts: ceph0 ceph1 ceph2
each host have 3 disk sharing one ssd disk as journal

4 my ceph config is as this:
osd journal size = 9500
;osd mkfs type = xfs
;auth supported = none
auth_cluster_required = none
auth_service_required = none
auth_client_required = none
public_network = 172.18.11.0/24
cluster_network = 10.10.11.0/24
osd pool default size = 3
ms nocrc = true
osd op threads = 4
filestore op threads = 0
mon sync fs threshold = 0
osd pool default pg num = 100
osd pool default pgp num = 100

5 the output of ceph0 running command: pe -ef|grep ceph
[root@ceph0 ~]# ps -ef|grep ceph
root 13922 1  0 05:59 ?00:00:00 /bin/sh
/usr/sbin/ceph-disk-udev 1 sdb1 sdb
root 14059 13922  0 05:59 ?00:00:00 python /usr/sbin/ceph-disk
-v activate /dev/sdb1
root 14090 1  0 05:59 ?00:00:00 /bin/sh
/usr/sbin/ceph-disk-udev 1 sda1 sda
root 14107 14090  0 05:59 ?00:00:00 python /usr/sbin/ceph

[ceph-users] HDD bad sector, pg inconsistent, no object remapping

2013-11-12 Thread Mihály Árva-Tóth
Hello,

I have 3 node, with 3 OSD in each node. I'm using .rgw.buckets pool with 3
replica. One of my HDD (osd.0) has just bad sectors, when I try to read an
object from OSD direct, I get Input/output errror. dmesg:

[1214525.670065] mpt2sas0: log_info(0x3108): originator(PL),
code(0x08), sub_code(0x)
[1214525.670072] mpt2sas0: log_info(0x3108): originator(PL),
code(0x08), sub_code(0x)
[1214525.670100] sd 0:0:2:0: [sdc] Unhandled sense code
[1214525.670104] sd 0:0:2:0: [sdc]
[1214525.670107] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[1214525.670110] sd 0:0:2:0: [sdc]
[1214525.670112] Sense Key : Medium Error [current]
[1214525.670117] Info fld=0x60c8f21
[1214525.670120] sd 0:0:2:0: [sdc]
[1214525.670123] Add. Sense: Unrecovered read error
[1214525.670126] sd 0:0:2:0: [sdc] CDB:
[1214525.670128] Read(16): 88 00 00 00 00 00 06 0c 8f 20 00 00 00 08 00 00

Okay I known need to replace HDD.

Fragment of ceph -s  output:
  pgmap v922039: 856 pgs: 855 active+clean, 1 active+clean+inconsistent;

ceph pg dump | grep inconsistent

11.15d  25443   0   0   0   6185091790  30013001
active+clean+inconsistent   2013-11-06 02:30:45.23416.

ceph pg map 11.15d

osdmap e1600 pg 11.15d (11.15d) -> up [0,8,3] acting [0,8,3]

pg repair or deep-scrub can not fix this issue. But if I understand
correctly, osd has to known it can not retrieve object from osd.0 and need
to be replicate an another osd because there is no 3 working replicas now.

Thank you,
Mihaly
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to use rados_exec

2013-11-12 Thread Noah Watkins
The cls_crypto.cc file in src/ hasn't been included in the Ceph
compilation for a long time. Take a look at src/cls/* for a list of
modules that are compiled. In particular, there is a "Hello World"
example that is nice. These should work for you out-of-the-box.

You could also try to compile cls_crypto.cc (follow the basic
structure of src/cls/Makefile.am).

-Noah

On Tue, Nov 12, 2013 at 1:05 AM, 鹏  wrote:
>  Hi all!
>long time no see!
>I  want to use the function rados_exec,  and I found  the class
> cls_crypto.cc  in the  code source of ceph;
> so I  run the funtion like this:
>
>rados_exec(ioctx, "foo_object", "crypto" , "md5", buf,
> sizeof(buf),buf2, sizeof(buf2) )
>
> ant the function return   operation not support!
>
>  I check the source of ceph , and find that  cls_crypto.cc is not
> build。how can I bulid the class and run it!
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] No monitor sockets after upgrading to Emperor

2013-11-12 Thread Berant Lemmenes
I just restarted an OSD node and none of the admin sockets showed up on
reboot (though it joined the cluster fine and all OSDs are happy. The node
is a Ubuntu 12.04.3 system originally deployed via ceph-deploy on dumpling.

The only thing that stands out to me is the failure on lock_fsid and the
error converting store message.

Here are the snip from OSD 19 of a full reboot starting with the shutdown
complete entry, and going until all the reconnect messages.

2013-11-12 09:44:00.757576 7fb8a8e24780  1 --
192.168.200.54:6819/23261shutdown complete.
2013-11-12 09:47:05.843425 7f7918e9d780  0 ceph version 0.72
(5832e2603c7db5d40b433d0953408993a9b7c217), process ceph-osd, pid 1734
2013-11-12 09:47:05.892704 7f7918e9d780  1
filestore(/var/lib/ceph/osd/ceph-19) mount detected xfs
2013-11-12 09:47:05.892718 7f7918e9d780  1
filestore(/var/lib/ceph/osd/ceph-19)  disabling 'filestore replica fadvise'
due to known issues with fadvise(DONTNEED) on xfs
2013-11-12 09:47:05.944312 7f7918e9d780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP
ioctl is supported and appears to work
2013-11-12 09:47:05.944327 7f7918e9d780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP
ioctl is disabled via 'filestore fiemap' config option
2013-11-12 09:47:05.944743 7f7918e9d780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features:
syncfs(2) syscall fully supported (by glibc and kernel)
2013-11-12 09:47:06.258005 7f7918e9d780  0
filestore(/var/lib/ceph/osd/ceph-19) mount: enabling WRITEAHEAD journal
mode: checkpoint is not enabled
2013-11-12 09:47:07.567405 7f7918e9d780  1 journal _open
/var/lib/ceph/osd/ceph-19/journal fd 19: 10239344640 bytes, block size 4096
bytes, directio = 1, aio = 1
2013-11-12 09:47:07.570098 7f7918e9d780  1 journal _open
/var/lib/ceph/osd/ceph-19/journal fd 19: 10239344640 bytes, block size 4096
bytes, directio = 1, aio = 1
2013-11-12 09:47:07.570352 7f7918e9d780  1 journal close
/var/lib/ceph/osd/ceph-19/journal
2013-11-12 09:47:07.571215 7f7918e9d780  1
filestore(/var/lib/ceph/osd/ceph-19) mount detected xfs
2013-11-12 09:47:07.572742 7f7918e9d780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP
ioctl is supported and appears to work
2013-11-12 09:47:07.572750 7f7918e9d780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP
ioctl is disabled via 'filestore fiemap' config option
2013-11-12 09:47:07.573234 7f7918e9d780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features:
syncfs(2) syscall fully supported (by glibc and kernel)
2013-11-12 09:47:07.574879 7f7918e9d780  0
filestore(/var/lib/ceph/osd/ceph-19) mount: enabling WRITEAHEAD journal
mode: checkpoint is not enabled
2013-11-12 09:47:07.577043 7f7918e9d780  1 journal _open
/var/lib/ceph/osd/ceph-19/journal fd 23: 10239344640 bytes, block size 4096
bytes, directio = 1, aio = 1
2013-11-12 09:47:07.578649 7f7918e9d780  1 journal _open
/var/lib/ceph/osd/ceph-19/journal fd 23: 10239344640 bytes, block size 4096
bytes, directio = 1, aio = 1
2013-11-12 09:47:07.680531 7f7918e9d780  0 
cls/hello/cls_hello.cc:271: loading cls_hello
2013-11-12 09:47:09.670813 7f8151b5f780  0 ceph version 0.72
(5832e2603c7db5d40b433d0953408993a9b7c217), process ceph-osd, pid 2769
2013-11-12 09:47:09.673789 7f8151b5f780  0
filestore(/var/lib/ceph/osd/ceph-19) lock_fsid failed to lock
/var/lib/ceph/osd/ceph-19/fsid, is another ceph-osd still running? (11)
Resource temporarily unavailable
2013-11-12 09:47:09.673804 7f8151b5f780 -1
filestore(/var/lib/ceph/osd/ceph-19) FileStore::mount: lock_fsid failed
2013-11-12 09:47:09.673919 7f8151b5f780 -1  ** ERROR: error converting
store /var/lib/ceph/osd/ceph-19: (16) Device or resource busy
2013-11-12 09:47:14.169305 7f78fd548700  0 -- 10.200.1.54:6802/1734 >>
10.200.1.51:6800/13263 pipe(0x1e48c80 sd=42 :55275 s=2 pgs=5530 cs=1 l=0
c=0x1eae2c0).fault, initiating reconnect
2013-11-12 09:47:14.169444 7f78fd346700  0 -- 10.200.1.54:6802/1734 >>
10.200.1.57:6804/8226 pipe(0xc1ed500 sd=43 :47978 s=2 pgs=16845 cs=1 l=0
c=0x1eae840).fault, initiating reconnect
2013-11-12 09:47:14.169988 7f78fd144700  0 -- 10.200.1.54:6802/1734 >>
10.200.1.59:6810/4862 pipe(0xc1ed280 sd=46 :37094 s=2 pgs=42297 cs=1 l=0
c=0x1eae6e0).fault, initiating reconnect


And here is roughly the same snip from just doing a 'sudo restart
ceph-osd-all':

2013-11-12 09:56:36.658014 7f7918e9d780  1 --
192.168.200.54:6811/1734shutdown complete.
2013-11-12 09:56:37.556988 7f3793c21780  0 ceph version 0.72
(5832e2603c7db5d40b433d0953408993a9b7c217), process ceph-osd, pid 13723
2013-11-12 09:56:37.559314 7f3793c21780  1
filestore(/var/lib/ceph/osd/ceph-19) mount detected xfs
2013-11-12 09:56:37.559319 7f3793c21780  1
filestore(/var/lib/ceph/osd/ceph-19)  disabling 'filestore replica fadvise'
due to known issues with fadvise(DONTNEED) on xfs
2013-11-12 09:56:37.561350 7f3793c21780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features

Re: [ceph-users] ceph-deploy: osd creating hung with one ssd disk as shared journal

2013-11-12 Thread Gruher, Joseph R
I didn't think you could specify the journal in this manner (just pointing 
multiple OSDs on the same host all to journal /dev/sda).  Don't you either need 
to partition the SSD and point each SSD to a separate partition, or format and 
mount the SSD and each OSD will use a unique file on the mount?  I've always 
created a separate partition on the SSD for each journal.

Preparing cluster ceph disks ceph0:/dev/sdb:/dev/sda ceph0:/dev/sdd:/dev/sda 
ceph0:/dev/sde:/dev/sda ceph0:/dev/sdf:/dev/sda ceph0:/dev/sdg:/dev/sda 
ceph0:/dev/sdh:/dev/sda

From: ceph-users-boun...@lists.ceph.com 
[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Tim Zhang
Sent: Tuesday, November 12, 2013 2:20 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] ceph-deploy: osd creating hung with one ssd disk as 
shared journal

Hi guys,
I use ceph-deploy to manage my cluster, but I get failed while creating the 
OSD, the process seems to hang up at creating first osd. By the way, SELinux is 
disabled, and my ceph-disk is patched according to the 
page:http://www.spinics.net/lists/ceph-users/msg03258.html
can you guys give me some advise?
(1) the output of ceph-deploy is:
Invoked (1.3.1): /usr/bin/ceph-deploy osd create ceph0:sdb:sda ceph0:sdd:sda 
ceph0:sde:sda ceph0:sdf:sda ceph0:sdg:sda ceph0:sdh:sda ceph1:sdb:sda 
ceph1:sdd:sda ceph1:sde:sda ceph1:sdf:sda ceph1:sdg:sda ceph1:sdh:sda 
ceph2:sdb:sda ceph2:sdd:sda ceph2:sde:sda ceph2:sdf:sda ceph2:sdg:sda 
ceph2:sdh:sda
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks ceph0:/dev/sdb:/dev/sda 
ceph0:/dev/sdd:/dev/sda ceph0:/dev/sde:/dev/sda ceph0:/dev/sdf:/dev/sda 
ceph0:/dev/sdg:/dev/sda ceph0:/dev/sdh:/dev/sda ceph1:/dev/sdb:/dev/sda 
ceph1:/dev/sdd:/dev/sda ceph1:/dev/sde:/dev/sda ceph1:/dev/sdf:/dev/sda 
ceph1:/dev/sdg:/dev/sda ceph1:/dev/sdh:/dev/sda ceph2:/dev/sdb:/dev/sda 
ceph2:/dev/sdd:/dev/sda ceph2:/dev/sde:/dev/sda ceph2:/dev/sdf:/dev/sda 
ceph2:/dev/sdg:/dev/sda ceph2:/dev/sdh:/dev/sda
[ceph0][DEBUG ] connected to host: ceph0
[ceph0][DEBUG ] detect platform information from remote host
[ceph0][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: CentOS 6.4 Final
[ceph_deploy.osd][DEBUG ] Deploying osd to ceph0
[ceph0][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph0][INFO  ] Running command: udevadm trigger --subsystem-match=block 
--action=add
[ceph_deploy.osd][DEBUG ] Preparing host ceph0 disk /dev/sdb journal /dev/sda 
activate True
[ceph0][INFO  ] Running command: ceph-disk-prepare --fs-type xfs --cluster ceph 
-- /dev/sdb /dev/sda
[ceph0][ERROR ] WARNING:ceph-disk:OSD will not be hot-swappable if journal is 
not the same device as the osd data
[ceph0][ERROR ] Warning: WARNING: the kernel failed to re-read the partition 
table on /dev/sda (Device or resource busy).  As a result, it may not reflect 
all of your changes until after reboot.
[ceph0][ERROR ] BLKPG: Device or resource busy
[ceph0][ERROR ] error adding partition 1
[ceph0][DEBUG ] The operation has completed successfully.
[ceph0][DEBUG ] The operation has completed successfully.
[ceph0][DEBUG ] meta-data=/dev/sdb1  isize=2048   agcount=4, 
agsize=61047597 blks
[ceph0][DEBUG ]  =   sectsz=512   attr=2, 
projid32bit=0
[ceph0][DEBUG ] data =   bsize=4096   blocks=244190385, 
imaxpct=25
[ceph0][DEBUG ]  =   sunit=0  swidth=0 blks
[ceph0][DEBUG ] naming   =version 2  bsize=4096   ascii-ci=0
[ceph0][DEBUG ] log  =internal log   bsize=4096   blocks=119233, 
version=2
[ceph0][DEBUG ]  =   sectsz=512   sunit=0 blks, 
lazy-count=1
[ceph0][DEBUG ] realtime =none   extsz=4096   blocks=0, 
rtextents=0
[ceph0][DEBUG ] The operation has completed successfully.
[ceph0][INFO  ] Running command: udevadm trigger --subsystem-match=block 
--action=add
[ceph_deploy.osd][DEBUG ] Host ceph0 is now ready for osd use.
[ceph0][DEBUG ] connected to host: ceph0
[ceph0][DEBUG ] detect platform information from remote host
[ceph0][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: CentOS 6.4 Final
[ceph_deploy.osd][DEBUG ] Preparing host ceph0 disk /dev/sdd journal /dev/sda 
activate True
[ceph0][INFO  ] Running command: ceph-disk-prepare --fs-type xfs --cluster ceph 
-- /dev/sdd /dev/sda
[ceph0][ERROR ] WARNING:ceph-disk:OSD will not be hot-swappable if journal is 
not the same device as the osd data

2 the mount system for that osd shows:
[root@host ~]# mount -l
/dev/sdc1 on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
/dev/sdb1 on /var/lib/ceph/tmp/mnt.6D02EM type xfs (rw,noatime)

3 my testbed information is:
os: centos 6.4 Final
ceph: dumpling 67.4
three hosts: ceph0 ceph1 ceph2
each host have 3 disk sharing one ssd disk as journal

4 my ceph

Re: [ceph-users] ceph-deploy: osd creating hung with one ssd disk as shared journal

2013-11-12 Thread Michael
As long as there's room on the SSD for the partitioner it'll just use 
the conf value for osd journal size to section it up as it adds OSD's (I 
generally use the "ceph-deploy osd create srv:data:journal e.g. 
srv-12:/dev/sdb:/dev/sde" format when adding disks).
Does it being /dev/sda mean you're putting your journal onto an already 
partitioned and in use by the OS SSD?


-Michael

On 12/11/2013 18:09, Gruher, Joseph R wrote:


I didn't think you could specify the journal in this manner (just 
pointing multiple OSDs on the same host all to journal /dev/sda).  
Don't you either need to partition the SSD and point each SSD to a 
separate partition, or format and mount the SSD and each OSD will use 
a unique file on the mount? I've always created a separate partition 
on the SSD for each journal.


Preparing cluster ceph disks ceph0:/dev/sdb:/dev/sda 
ceph0:/dev/sdd:/dev/sda ceph0:/dev/sde:/dev/sda 
ceph0:/dev/sdf:/dev/sda ceph0:/dev/sdg:/dev/sda ceph0:/dev/sdh:/dev/sda


*From:*ceph-users-boun...@lists.ceph.com 
[mailto:ceph-users-boun...@lists.ceph.com] *On Behalf Of *Tim Zhang

*Sent:* Tuesday, November 12, 2013 2:20 AM
*To:* ceph-users@lists.ceph.com
*Subject:* [ceph-users] ceph-deploy: osd creating hung with one ssd 
disk as shared journal


Hi guys,

I use ceph-deploy to manage my cluster, but I get failed while 
creating the OSD, the process seems to hang up at creating first osd. 
By the way, SELinux is disabled, and my ceph-disk is patched according 
to the page:http://www.spinics.net/lists/ceph-users/msg03258.html


can you guys give me some advise?

(1) the output of ceph-deploy is:

Invoked (1.3.1): /usr/bin/ceph-deploy osd create ceph0:sdb:sda 
ceph0:sdd:sda ceph0:sde:sda ceph0:sdf:sda ceph0:sdg:sda ceph0:sdh:sda 
ceph1:sdb:sda ceph1:sdd:sda ceph1:sde:sda ceph1:sdf:sda ceph1:sdg:sda 
ceph1:sdh:sda ceph2:sdb:sda ceph2:sdd:sda ceph2:sde:sda ceph2:sdf:sda 
ceph2:sdg:sda ceph2:sdh:sda


[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks 
ceph0:/dev/sdb:/dev/sda ceph0:/dev/sdd:/dev/sda 
ceph0:/dev/sde:/dev/sda ceph0:/dev/sdf:/dev/sda 
ceph0:/dev/sdg:/dev/sda ceph0:/dev/sdh:/dev/sda 
ceph1:/dev/sdb:/dev/sda ceph1:/dev/sdd:/dev/sda 
ceph1:/dev/sde:/dev/sda ceph1:/dev/sdf:/dev/sda 
ceph1:/dev/sdg:/dev/sda ceph1:/dev/sdh:/dev/sda 
ceph2:/dev/sdb:/dev/sda ceph2:/dev/sdd:/dev/sda 
ceph2:/dev/sde:/dev/sda ceph2:/dev/sdf:/dev/sda 
ceph2:/dev/sdg:/dev/sda ceph2:/dev/sdh:/dev/sda


[ceph0][DEBUG ] connected to host: ceph0

[ceph0][DEBUG ] detect platform information from remote host

[ceph0][DEBUG ] detect machine type

[ceph_deploy.osd][INFO  ] Distro info: CentOS 6.4 Final

[ceph_deploy.osd][DEBUG ] Deploying osd to ceph0

[ceph0][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf

[ceph0][INFO  ] Running command: udevadm trigger 
--subsystem-match=block --action=add


[ceph_deploy.osd][DEBUG ] Preparing host ceph0 disk /dev/sdb journal 
/dev/sda activate True


[ceph0][INFO  ] Running command: ceph-disk-prepare --fs-type xfs 
--cluster ceph -- /dev/sdb /dev/sda


[ceph0][ERROR ] WARNING:ceph-disk:OSD will not be hot-swappable if 
journal is not the same device as the osd data


[ceph0][ERROR ] Warning: WARNING: the kernel failed to re-read the 
partition table on /dev/sda (Device or resource busy).  As a result, 
it may not reflect all of your changes until after reboot.


[ceph0][ERROR ] BLKPG: Device or resource busy

[ceph0][ERROR ] error adding partition 1

[ceph0][DEBUG ] The operation has completed successfully.

[ceph0][DEBUG ] The operation has completed successfully.

[ceph0][DEBUG ] meta-data=/dev/sdb1isize=2048   agcount=4, 
agsize=61047597 blks


[ceph0][DEBUG ]  = sectsz=512   attr=2, projid32bit=0

[ceph0][DEBUG ] data = bsize=4096   blocks=244190385, 
imaxpct=25


[ceph0][DEBUG ]  = sunit=0  swidth=0 blks

[ceph0][DEBUG ] naming   =version 2bsize=4096   ascii-ci=0

[ceph0][DEBUG ] log  =internal log   bsize=4096   
blocks=119233, version=2


[ceph0][DEBUG ]  = sectsz=512   sunit=0 blks, 
lazy-count=1


[ceph0][DEBUG ] realtime =none extsz=4096   blocks=0, 
rtextents=0


[ceph0][DEBUG ] The operation has completed successfully.

[ceph0][INFO  ] Running command: udevadm trigger 
--subsystem-match=block --action=add


[ceph_deploy.osd][DEBUG ] Host ceph0 is now ready for osd use.

[ceph0][DEBUG ] connected to host: ceph0

[ceph0][DEBUG ] detect platform information from remote host

[ceph0][DEBUG ] detect machine type

[ceph_deploy.osd][INFO  ] Distro info: CentOS 6.4 Final

[ceph_deploy.osd][DEBUG ] Preparing host ceph0 disk /dev/sdd journal 
/dev/sda activate True


[ceph0][INFO  ] Running command: ceph-disk-prepare --fs-type xfs 
--cluster ceph -- /dev/sdd /dev/sda


[ceph0][ERROR ] WARNING:ceph-disk:OSD will not be hot-swappable if 
journal is not the same device as the osd data


2 the mount system for that osd shows:

[root@host ~]# mount -l

/dev

Re: [ceph-users] ceph-deploy: osd creating hung with one ssd disk as shared journal

2013-11-12 Thread Michael
Sorry, just spotted you're mounting on sdc. Can you chuck out a partx -v 
/dev/sda to see if there's anything odd about the data currently on there?


-Michael

On 12/11/2013 18:22, Michael wrote:
As long as there's room on the SSD for the partitioner it'll just use 
the conf value for osd journal size to section it up as it adds OSD's 
(I generally use the "ceph-deploy osd create srv:data:journal e.g. 
srv-12:/dev/sdb:/dev/sde" format when adding disks).
Does it being /dev/sda mean you're putting your journal onto an 
already partitioned and in use by the OS SSD?


-Michael

On 12/11/2013 18:09, Gruher, Joseph R wrote:


I didn't think you could specify the journal in this manner (just 
pointing multiple OSDs on the same host all to journal /dev/sda).  
Don't you either need to partition the SSD and point each SSD to a 
separate partition, or format and mount the SSD and each OSD will use 
a unique file on the mount?  I've always created a separate partition 
on the SSD for each journal.


Preparing cluster ceph disks ceph0:/dev/sdb:/dev/sda 
ceph0:/dev/sdd:/dev/sda ceph0:/dev/sde:/dev/sda 
ceph0:/dev/sdf:/dev/sda ceph0:/dev/sdg:/dev/sda ceph0:/dev/sdh:/dev/sda


*From:*ceph-users-boun...@lists.ceph.com 
[mailto:ceph-users-boun...@lists.ceph.com] *On Behalf Of *Tim Zhang

*Sent:* Tuesday, November 12, 2013 2:20 AM
*To:* ceph-users@lists.ceph.com
*Subject:* [ceph-users] ceph-deploy: osd creating hung with one ssd 
disk as shared journal


Hi guys,

I use ceph-deploy to manage my cluster, but I get failed while 
creating the OSD, the process seems to hang up at creating first osd. 
By the way, SELinux is disabled, and my ceph-disk is patched 
according to the 
page:http://www.spinics.net/lists/ceph-users/msg03258.html


can you guys give me some advise?

(1) the output of ceph-deploy is:

Invoked (1.3.1): /usr/bin/ceph-deploy osd create ceph0:sdb:sda 
ceph0:sdd:sda ceph0:sde:sda ceph0:sdf:sda ceph0:sdg:sda ceph0:sdh:sda 
ceph1:sdb:sda ceph1:sdd:sda ceph1:sde:sda ceph1:sdf:sda ceph1:sdg:sda 
ceph1:sdh:sda ceph2:sdb:sda ceph2:sdd:sda ceph2:sde:sda ceph2:sdf:sda 
ceph2:sdg:sda ceph2:sdh:sda


[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks 
ceph0:/dev/sdb:/dev/sda ceph0:/dev/sdd:/dev/sda 
ceph0:/dev/sde:/dev/sda ceph0:/dev/sdf:/dev/sda 
ceph0:/dev/sdg:/dev/sda ceph0:/dev/sdh:/dev/sda 
ceph1:/dev/sdb:/dev/sda ceph1:/dev/sdd:/dev/sda 
ceph1:/dev/sde:/dev/sda ceph1:/dev/sdf:/dev/sda 
ceph1:/dev/sdg:/dev/sda ceph1:/dev/sdh:/dev/sda 
ceph2:/dev/sdb:/dev/sda ceph2:/dev/sdd:/dev/sda 
ceph2:/dev/sde:/dev/sda ceph2:/dev/sdf:/dev/sda 
ceph2:/dev/sdg:/dev/sda ceph2:/dev/sdh:/dev/sda


[ceph0][DEBUG ] connected to host: ceph0

[ceph0][DEBUG ] detect platform information from remote host

[ceph0][DEBUG ] detect machine type

[ceph_deploy.osd][INFO  ] Distro info: CentOS 6.4 Final

[ceph_deploy.osd][DEBUG ] Deploying osd to ceph0

[ceph0][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf

[ceph0][INFO  ] Running command: udevadm trigger 
--subsystem-match=block --action=add


[ceph_deploy.osd][DEBUG ] Preparing host ceph0 disk /dev/sdb journal 
/dev/sda activate True


[ceph0][INFO  ] Running command: ceph-disk-prepare --fs-type xfs 
--cluster ceph -- /dev/sdb /dev/sda


[ceph0][ERROR ] WARNING:ceph-disk:OSD will not be hot-swappable if 
journal is not the same device as the osd data


[ceph0][ERROR ] Warning: WARNING: the kernel failed to re-read the 
partition table on /dev/sda (Device or resource busy).  As a result, 
it may not reflect all of your changes until after reboot.


[ceph0][ERROR ] BLKPG: Device or resource busy

[ceph0][ERROR ] error adding partition 1

[ceph0][DEBUG ] The operation has completed successfully.

[ceph0][DEBUG ] The operation has completed successfully.

[ceph0][DEBUG ] meta-data=/dev/sdb1  isize=2048   
agcount=4, agsize=61047597 blks


[ceph0][DEBUG ]  =   sectsz=512   attr=2, 
projid32bit=0


[ceph0][DEBUG ] data =   bsize=4096   
blocks=244190385, imaxpct=25


[ceph0][DEBUG ]  =   sunit=0  swidth=0 blks

[ceph0][DEBUG ] naming   =version 2  bsize=4096   ascii-ci=0

[ceph0][DEBUG ] log  =internal log   bsize=4096   
blocks=119233, version=2


[ceph0][DEBUG ]  =   sectsz=512   sunit=0 blks, 
lazy-count=1


[ceph0][DEBUG ] realtime =none   extsz=4096   blocks=0, 
rtextents=0


[ceph0][DEBUG ] The operation has completed successfully.

[ceph0][INFO  ] Running command: udevadm trigger 
--subsystem-match=block --action=add


[ceph_deploy.osd][DEBUG ] Host ceph0 is now ready for osd use.

[ceph0][DEBUG ] connected to host: ceph0

[ceph0][DEBUG ] detect platform information from remote host

[ceph0][DEBUG ] detect machine type

[ceph_deploy.osd][INFO  ] Distro info: CentOS 6.4 Final

[ceph_deploy.osd][DEBUG ] Preparing host ceph0 disk /dev/sdd journal 
/dev/sda activate True


[ceph0][INFO  ] Running command: ceph-disk-prepare --fs-type xfs 
--cluster

[ceph-users] Ephemeral RBD with Havana and Dumpling

2013-11-12 Thread Dmitry Borodaenko
I can get ephemeral storage for Nova to work with RBD backend, but I
don't understand why it only works with the admin cephx user? With a
different user starting a VM fails, even if I set its caps to 'allow
*'.

Here's what I have in nova.conf:
libvirt_images_type=rbd
libvirt_images_rbd_pool=images
rbd_secret_uuid=fd9a11cc-6995-10d7-feb4-d338d73a4399
rbd_user=images

The secret UUID is defined following the same steps as for Cinder and Glance:
http://ceph.com/docs/master/rbd/libvirt/

BTW rbd_user option doesn't seem to be documented anywhere, is that a
documentation bug?

And here's what 'ceph auth list' tells me about my cephx users:

client.admin
key: AQCoSX1SmIo0AxAAnz3NffHCMZxyvpz65vgRDg==
caps: [mds] allow
caps: [mon] allow *
caps: [osd] allow *
client.images
key: AQC1hYJS0LQhDhAAn51jxI2XhMaLDSmssKjK+g==
caps: [mds] allow
caps: [mon] allow *
caps: [osd] allow *
client.volumes
key: AQALSn1ScKruMhAAeSETeatPLxTOVdMIt10uRg==
caps: [mon] allow r
caps: [osd] allow class-read object_prefix rbd_children, allow
rwx pool=volumes, allow rx pool=images

Setting rbd_user to images or volumes doesn't work.

What am I missing?

Thanks,

-- 
Dmitry Borodaenko
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ephemeral RBD with Havana and Dumpling

2013-11-12 Thread Dmitry Borodaenko
And to answer my own question, I was missing a meaningful error
message: what the ObjectNotFound exception I got from librados didn't
tell me was that I didn't have the images keyring file in /etc/ceph/
on my compute node. After 'ceph auth get-or-create client.images >
/etc/ceph/ceph.client.images.keyring' and reverting images caps back
to original state, it all works!

On Tue, Nov 12, 2013 at 12:19 PM, Dmitry Borodaenko
 wrote:
> I can get ephemeral storage for Nova to work with RBD backend, but I
> don't understand why it only works with the admin cephx user? With a
> different user starting a VM fails, even if I set its caps to 'allow
> *'.
>
> Here's what I have in nova.conf:
> libvirt_images_type=rbd
> libvirt_images_rbd_pool=images
> rbd_secret_uuid=fd9a11cc-6995-10d7-feb4-d338d73a4399
> rbd_user=images
>
> The secret UUID is defined following the same steps as for Cinder and Glance:
> http://ceph.com/docs/master/rbd/libvirt/
>
> BTW rbd_user option doesn't seem to be documented anywhere, is that a
> documentation bug?
>
> And here's what 'ceph auth list' tells me about my cephx users:
>
> client.admin
> key: AQCoSX1SmIo0AxAAnz3NffHCMZxyvpz65vgRDg==
> caps: [mds] allow
> caps: [mon] allow *
> caps: [osd] allow *
> client.images
> key: AQC1hYJS0LQhDhAAn51jxI2XhMaLDSmssKjK+g==
> caps: [mds] allow
> caps: [mon] allow *
> caps: [osd] allow *
> client.volumes
> key: AQALSn1ScKruMhAAeSETeatPLxTOVdMIt10uRg==
> caps: [mon] allow r
> caps: [osd] allow class-read object_prefix rbd_children, allow
> rwx pool=volumes, allow rx pool=images
>
> Setting rbd_user to images or volumes doesn't work.
>
> What am I missing?
>
> Thanks,
>
> --
> Dmitry Borodaenko



-- 
Dmitry Borodaenko
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] near full osd

2013-11-12 Thread John Wilkins
We probably do need to go over it again and account for PG splitting.

On Fri, Nov 8, 2013 at 9:26 AM, Gregory Farnum  wrote:
> After you increase the number of PGs, *and* increase the "pgp_num" to do the
> rebalancing (this is all described in the docs; do a search), data will move
> around and the overloaded OSD will have less stuff on it. If it's actually
> marked as full, though, this becomes a bit trickier. Search the list
> archives for some instructions; I don't remember the best course to follow.
> -Greg
>
> On Friday, November 8, 2013, Kevin Weiler wrote:
>>
>> Thanks again Gregory!
>>
>> One more quick question. If I raise the amount of PGs for a pool, will
>> this REMOVE any data from the full OSD? Or will I have to take the OSD out
>> and put it back in to realize this benefit? Thanks!
>>
>>
>> --
>>
>> Kevin Weiler
>>
>> IT
>>
>>
>>
>> IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL
>> 60606 | http://imc-chicago.com/
>>
>> Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail:
>> kevin.wei...@imc-chicago.com
>>
>>
>> From: Gregory Farnum 
>> Date: Friday, November 8, 2013 11:00 AM
>> To: Kevin Weiler 
>> Cc: "Aronesty, Erik" , Greg Chavez
>> , "ceph-users@lists.ceph.com"
>> 
>> Subject: Re: [ceph-users] near full osd
>>
>> It's not a hard value; you should adjust based on the size of your pools
>> (many of then are quite small when used with RGW, for instance). But in
>> general it is better to have more than fewer, and if you want to check you
>> can look at the sizes of each PG (ceph pg dump) and increase the counts for
>> pools with wide variability-Greg
>>
>> On Friday, November 8, 2013, Kevin Weiler wrote:
>>
>> Thanks Gregory,
>>
>> One point that was a bit unclear in documentation is whether or not this
>> equation for PGs applies to a single pool, or the entirety of pools.
>> Meaning, if I calculate 3000 PGs, should each pool have 3000 PGs or should
>> all the pools ADD UP to 3000 PGs? Thanks!
>>
>> --
>>
>> Kevin Weiler
>>
>> IT
>>
>>
>> IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL
>> 60606 | http://imc-chicago.com/
>>
>> Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail:
>> kevin.wei...@imc-chicago.com
>>
>>
>>
>>
>>
>>
>>
>> On 11/7/13 9:59 PM, "Gregory Farnum"  wrote:
>>
>> >It sounds like maybe your PG counts on your pools are too low and so
>> >you're just getting a bad balance. If that's the case, you can
>> >increase the PG count with "ceph osd pool  set pgnum > >value>".
>> >
>> >OSDs should get data approximately equal to /> >weights>, so higher weights get more data and all its associated
>> >traffic.
>> >-Greg
>> >Software Engineer #42 @ http://inktank.com | http://ceph.com
>> >
>> >
>> >On Tue, Nov 5, 2013 at 8:30 AM, Kevin Weiler
>> > wrote:
>> >> All of the disks in my cluster are identical and therefore all have the
>> >>same
>> >> weight (each drive is 2TB and the automatically generated weight is
>> >>1.82 for
>> >> each one).
>> >>
>> >> Would the procedure here be to reduce the weight, let it rebal, and
>> >>then put
>> >> the weight back to where it was?
>> >>
>> >>
>> >> --
>> >>
>> >> Kevin Weiler
>> >>
>> >> IT
>> >>
>> >>
>> >>
>> >> IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL
>> >>60606
>> >> | http://imc-chicago.com/
>> >>
>> >> Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail:
>> >> kevin.wei...@imc-chicago.com
>> >>
>> >>
>> >> From: , Erik 
>> >> Date: Tuesday, November 5, 2013 10:27 AM
>> >> To: Greg Chavez , Kevin Weiler
>> >> 
>> >> Cc: "ceph-users@lists.ceph.com" 
>> >> Subject: RE: [ceph-users] near full osd
>> >>
>> >> If there¹s an underperforming disk, why on earth would more data be put
>> >>on
>> >> it?  You¹d think it would be lessŠ.   I would think an overperforming
>> >>disk
>> >> should (desirably) cause that case,right?
>> >>
>> >>
>> >>
>> >> From: ceph-users-boun...@lists.ceph.com
>> >> [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Greg Chavez
>> >> Sent: Tuesday, November 05, 2013 11:20 AM
>> >> To: Kevin Weiler
>> >> Cc: ceph-users@lists.ceph.com
>> >> Subject: Re: [ceph-users] near full osd
>> >>
>> >>
>> >>
>> >> Kevin, in my experience that usually indicates a bad or underperforming
>> >> disk, or a too-high priority.  Try running "ceph osd crush reweight
>> >>osd.<##>
>> >> 1.0.  If that doesn't do the trick, you may want to just out that guy.
>> >>
>> >>
>> >>
>> >> I don't think the crush algorithm guarantees balancing things out in
>> >>the way
>> >> you're expecting.
>> >>
>> >>
>> >>
>> >> --Greg
>> >>
>> >> On Tue, Nov 5, 2013 at 11:11 AM, Kevin Weiler
>> >>
>> >> wrote:
>> >>
>> >> Hi guys,
>> >>
>> >>
>> >>
>> >> I have an OSD in my cluster that is near full at 90%, but we're using a
>> >> little less than half the available storage in the cluster. Shouldn't
>> >>this
>> >> be balanced out?
>> >>
>> >>
>> >>
>> >> --
>> >
>>
>>
>> 
>>
>> The information in this e-mail is intended only for the person o

Re: [ceph-users] near full osd

2013-11-12 Thread Samuel Just
I think we removed the experimental warning in cuttlefish.  It
probably wouldn't hurt to do it in bobtail particularly if you test it
extensively on a test cluster first.  However, we didn't do extensive
testing on it until cuttlefish.  I would upgrade to cuttlefish
(actually, dumpling or emperor, now) first.  Also, please note that in
any version, pg split causes massive data movement.
-Sam

On Mon, Nov 11, 2013 at 7:04 AM, Oliver Francke  wrote:
> Hi Greg,
>
> we are in a similar situation with a huge disbalance, so some of our 28
> OSD's are about 40%, whereas some are "near full" 84%.
> Default is 8, we have a default with 32, but for some pools where customers
> raised their VM-hd's quickly to 1TB and more in sum, I think this is where
> the problems come from?!
>
> For some other reason we are still running good'ol' bobtail, and in the lab
> I tried to force increase via "--allow-experimental-feature" with
> 0.56.7-3...
> It's working, but how experimental is it for production?
>
> Thnx in advance,
>
> Oliver.
>
>
> On 11/08/2013 06:26 PM, Gregory Farnum wrote:
>
> After you increase the number of PGs, *and* increase the "pgp_num" to do the
> rebalancing (this is all described in the docs; do a search), data will move
> around and the overloaded OSD will have less stuff on it. If it's actually
> marked as full, though, this becomes a bit trickier. Search the list
> archives for some instructions; I don't remember the best course to follow.
> -Greg
>
> On Friday, November 8, 2013, Kevin Weiler wrote:
>>
>> Thanks again Gregory!
>>
>> One more quick question. If I raise the amount of PGs for a pool, will
>> this REMOVE any data from the full OSD? Or will I have to take the OSD out
>> and put it back in to realize this benefit? Thanks!
>>
>>
>> --
>>
>> Kevin Weiler
>>
>> IT
>>
>>
>>
>> IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL
>> 60606 | http://imc-chicago.com/
>>
>> Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail:
>> kevin.wei...@imc-chicago.com
>>
>>
>> From: Gregory Farnum 
>> Date: Friday, November 8, 2013 11:00 AM
>> To: Kevin Weiler 
>> Cc: "Aronesty, Erik" , Greg Chavez
>> , "ceph-users@lists.ceph.com"
>> 
>> Subject: Re: [ceph-users] near full osd
>>
>> It's not a hard value; you should adjust based on the size of your pools
>> (many of then are quite small when used with RGW, for instance). But in
>> general it is better to have more than fewer, and if you want to check you
>> can look at the sizes of each PG (ceph pg dump) and increase the counts for
>> pools with wide variability-Greg
>>
>> On Friday, November 8, 2013, Kevin Weiler wrote:
>>
>> Thanks Gregory,
>>
>> One point that was a bit unclear in documentation is whether or not this
>> equation for PGs applies to a single pool, or the entirety of pools.
>> Meaning, if I calculate 3000 PGs, should each pool have 3000 PGs or should
>> all the pools ADD UP to 3000 PGs? Thanks!
>>
>> --
>>
>> Kevin Weiler
>>
>> IT
>>
>>
>> IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL
>> 60606 | http://imc-chicago.com/
>>
>> Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail:
>> kevin.wei...@imc-chicago.com
>>
>>
>>
>>
>>
>>
>>
>> On 11/7/13 9:59 PM, "Gregory Farnum"  wrote:
>>
>> >It sounds like maybe your PG counts on your pools are too low and so
>> >you're just getting a bad balance. If that's the case, you can
>> >increase the PG count with "ceph osd pool  set pgnum > >value>".
>> >
>> >OSDs should get data approximately equal to /> >weights>, so higher weights get more data and all its associated
>> >traffic.
>> >-Greg
>> >Software Engineer #42 @ http://inktank.com | http://ceph.com
>> >
>> >
>> >On Tue, Nov 5, 2013 at 8:30 AM, Kevin Weiler
>> > wrote:
>> >> All of the disks in my cluster are identical and therefore all have the
>> >>same
>> >> weight (each drive is 2TB and the automatically generated weight is
>> >>1.82 for
>> >> each one).
>> >>
>> >> Would the procedure here be to reduce the weight, let it rebal, and
>> >>then put
>> >> the weight back to where it was?
>> >>
>> >>
>> >> --
>> >>
>> >> Kevin Weiler
>> >>
>> >> IT
>> >>
>> >>
>> >>
>> >> IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL
>> >>60606
>> >> | http://imc-chicago.com/
>> >>
>> >> Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail:
>> >> kevin.wei...@imc-chicago.com
>> >>
>> >>
>> >> From: , Erik 
>> >> Date: Tuesday, November 5, 2013 10:27 AM
>> >> To: Greg Chavez , Kevin Weiler
>> >> 
>> >> Cc: "ceph-users@lists.ceph.com" 
>> >> Subject: RE: [ceph-users] near full osd
>> >>
>> >> If there¹s an underperforming disk, why on earth would more data be put
>> >>on
>> >> it?  You¹d think it would be lessŠ.   I would think an overperforming
>> >>disk
>> >> should (desirably) cause that case,right?
>> >>
>> >>
>> >>
>> >> From: ceph-users-boun...@lists.ceph.com
>> >> [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Greg Chavez
>> >> Sent: Tuesday, November 05, 2013 11:20 AM
>> >> To: Kevi

[ceph-users] Recover from corrupted journals

2013-11-12 Thread Bryan Stillwell
While updating my cluster to use a 2K block size for XFS, I've run
into a couple OSDs failing to start because of corrupted journals:

=== osd.1 ===
   -10> 2013-11-12 13:40:35.388177 7f030458a7a0  1
filestore(/var/lib/ceph/osd/ceph-1) mount detected xfs
-9> 2013-11-12 13:40:35.388194 7f030458a7a0  1
filestore(/var/lib/ceph/osd/ceph-1)  disabling 'filestore replica
fadvise' due to known issues with fadvise(DONTNEED) on xfs
-8> 2013-11-12 13:40:49.735893 7f030458a7a0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features:
FIEMAP ioctl is supported and appears to work
-7> 2013-11-12 13:40:49.735955 7f030458a7a0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features:
FIEMAP ioctl is disabled via 'filestore fiemap' config option
-6> 2013-11-12 13:40:49.778879 7f030458a7a0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features:
syscall(SYS_syncfs, fd) fully supported
-5> 2013-11-12 13:41:02.512202 7f030458a7a0  0
filestore(/var/lib/ceph/osd/ceph-1) mount: enabling WRITEAHEAD journal
mode: checkpoint is not enabled
-4> 2013-11-12 13:41:05.932177 7f030458a7a0  2 journal open
/var/lib/ceph/osd/ceph-1/journal fsid
f7bde53e-458a-4398-a949-770648ddc414 fs_op_seq 2973368
-3> 2013-11-12 13:41:05.964093 7f030458a7a0  1 journal _open
/var/lib/ceph/osd/ceph-1/journal fd 20: 1072693248 bytes, block size
4096 bytes, directio = 1, aio = 1
-2> 2013-11-12 13:41:05.987641 7f030458a7a0  2 journal read_entry
361586688 : seq 2973370 55428 bytes
-1> 2013-11-12 13:41:05.988024 7f030458a7a0 -1 journal Unable to
read past sequence 2973369 but header indicates the journal has
committed up through 2980190, journal is corrupt
 0> 2013-11-12 13:41:06.070833 7f030458a7a0 -1 os/FileJournal.cc:
In function 'bool FileJournal::read_entry(ceph::bufferlist&,
uint64_t&, bool*)' thread 7f030458a7a0 time 2013-11-12 13:41:05.988054
os/FileJournal.cc: 1697: FAILED assert(0)

 ceph version 0.72 (5832e2603c7db5d40b433d0953408993a9b7c217)
 1: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&,
bool*)+0xa46) [0x6d9ab6]
 2: (JournalingObjectStore::journal_replay(unsigned long)+0x325) [0x865835]
 3: (FileStore::mount()+0x2db0) [0x70e330]
 4: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x608dba]
 5: (OSD::convertfs(std::string const&, std::string const&)+0x49) [0x6097c9]
 6: (main()+0x3190) [0x5c65d0]
 7: (__libc_start_main()+0xfd) [0x3ee0e1ecdd]
 8: /usr/bin/ceph-osd() [0x5c3089]


=== osd.4 ===
   -10> 2013-11-11 16:31:52.697736 7fefe710e7a0  1
filestore(/var/lib/ceph/osd/ceph-4) mount detected xfs
-9> 2013-11-11 16:31:52.697764 7fefe710e7a0  1
filestore(/var/lib/ceph/osd/ceph-4)  disabling 'filestore replica
fadvise' due to known issues with fadvise(DONTNEED) on xfs
-8> 2013-11-11 16:32:06.301437 7fefe710e7a0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features:
FIEMAP ioctl is supported and appears to work
-7> 2013-11-11 16:32:06.301478 7fefe710e7a0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features:
FIEMAP ioctl is disabled via 'filestore fiemap' config option
-6> 2013-11-11 16:32:06.321094 7fefe710e7a0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features:
syscall(SYS_syncfs, fd) fully supported
-5> 2013-11-11 16:32:06.642899 7fefe710e7a0  0
filestore(/var/lib/ceph/osd/ceph-4) mount: enabling WRITEAHEAD journal
mode: checkpoint is not enabled
-4> 2013-11-11 16:32:10.047982 7fefe710e7a0  2 journal open
/var/lib/ceph/osd/ceph-4/journal fsid
1c68cdc3-4ba1-4711-86a2-517d32b352fa fs_op_seq 2964169
-3> 2013-11-11 16:32:10.062596 7fefe710e7a0  1 journal _open
/var/lib/ceph/osd/ceph-4/journal fd 21: 1072693248 bytes, block size
4096 bytes, directio = 1, aio = 1
-2> 2013-11-11 16:32:10.132954 7fefe710e7a0  2 journal read_entry
993447936 : seq 2964171 8007 bytes
-1> 2013-11-11 16:32:10.133125 7fefe710e7a0 -1 journal Unable to
read past sequence 2964170 but header indicates the journal has
committed up through 2967854, journal is corrupt
 0> 2013-11-11 16:32:10.135432 7fefe710e7a0 -1 os/FileJournal.cc:
In function 'bool FileJournal::read_entry(ceph::bufferlist&,
uint64_t&, bool*)' thread 7fefe710e7a0 time 2013-11-11 16:32:10.133149
os/FileJournal.cc: 1697: FAILED assert(0)

 ceph version 0.72 (5832e2603c7db5d40b433d0953408993a9b7c217)
 1: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&,
bool*)+0xa46) [0x6d9ab6]
 2: (JournalingObjectStore::journal_replay(unsigned long)+0x325) [0x865835]
 3: (FileStore::mount()+0x2db0) [0x70e330]
 4: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x608dba]
 5: (OSD::convertfs(std::string const&, std::string const&)+0x49) [0x6097c9]
 6: (main()+0x3190) [0x5c65d0]
 7: (__libc_start_main()+0xfd) [0x3ee0e1ecdd]
 8: /usr/bin/ceph-osd() [0x5c3089]


What's the best way to recover from this situation?

Thanks,
Bryan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-

[ceph-users] Questions/comments on using ZFS for OSDs

2013-11-12 Thread Eric Eastman
I built Ceph version 0.72 with --with-libzfs on Ubuntu 1304 after 
installing ZFS

from th ppa:zfs-native/stable repository. The ZFS version is v0.6.2-1

I do have a few questions and comments on Ceph using ZFS backed OSDs

As ceph-deploy does not show support for ZFS, I used the instructions 
at:

http://ceph.com/docs/master/rados/operations/add-or-rm-osds/
and hand created a new OSD on an existing Ceph system. I guest that I 
needed to build a zpool out of a disk, and then create a ZFS file 
system that mounted to  /var/lib/ceph/osd/ceph-X, where X was the 
number given when I ran the ceph osd create command.  As I am testing 
on a VM, I created 2 new disks, one 2GB (/dev/sde) for journal and one 
32GB (/dev/sdd) for data. To setup the system for ZFS based OSDs, I 
first added to all my ceph.conf files:


   filestore zfs_snap = 1
   journal_aio = 0
   journal_dio = 0

I then created the OSD with the commands:

# ceph osd create
4
# parted -s /dev/sdd mklabel gpt mkpart -- -- 1 \-1
# parted -s /dev/sde mklabel gpt mkpart -- -- 1 \-1
# zpool create sdd /dev/sdd
# mkdir /var/lib/ceph/osd/ceph-4
# zfs create -o mountpoint=/var/lib/ceph/osd/ceph-4 sdd/ceph-4
# ceph-osd  -i 4 --mkfs --mkkey --osd-journal=/dev/sde1 --mkjournal
# ceph auth add osd.4 osd 'allow *' mon 'allow rwx' -i 
/var/lib/ceph/osd/ceph-4/keyring


I then decompiled the crush map, added osd.4, and recompiled the map, 
and set Ceph to use the new crush map.


When I started the osd.4 with:

# start ceph-osd id=4

It failed to start, as the ceph osd log file indicated the journal was 
missing:
 mount failed to open journal /var/lib/ceph/osd/ceph-4/journal: (2) 
No such file or directory


So I manually created a link named journal to /dev/sde1 and created the 
journal_uuid file.  Should ceph-osd have done this step?  Is there 
anything else I may of missed?


With limited testing, the ZFS backed OSD seems to function correctly.

I was wondering if there are any ZFS file system options that should be 
set for better performance or data safety.


It would be nice if ceph-deploy would handle ZFS.

Lastly, I want to thank Yan, Zheng and all the rest who worked on this 
project.


Eric

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Questions/comments on using ZFS for OSDs

2013-11-12 Thread asomers
On Tue, Nov 12, 2013 at 3:43 PM, Eric Eastman  wrote:
> I built Ceph version 0.72 with --with-libzfs on Ubuntu 1304 after installing
> ZFS
> from th ppa:zfs-native/stable repository. The ZFS version is v0.6.2-1
>
> I do have a few questions and comments on Ceph using ZFS backed OSDs
>
> As ceph-deploy does not show support for ZFS, I used the instructions at:
> http://ceph.com/docs/master/rados/operations/add-or-rm-osds/
> and hand created a new OSD on an existing Ceph system. I guest that I needed
> to build a zpool out of a disk, and then create a ZFS file system that
> mounted to  /var/lib/ceph/osd/ceph-X, where X was the number given when I
> ran the ceph osd create command.  As I am testing on a VM, I created 2 new
> disks, one 2GB (/dev/sde) for journal and one 32GB (/dev/sdd) for data. To
> setup the system for ZFS based OSDs, I first added to all my ceph.conf
> files:
>
>filestore zfs_snap = 1
>journal_aio = 0
>journal_dio = 0
>
> I then created the OSD with the commands:
>
> # ceph osd create
> 4
> # parted -s /dev/sdd mklabel gpt mkpart -- -- 1 \-1
> # parted -s /dev/sde mklabel gpt mkpart -- -- 1 \-1
> # zpool create sdd /dev/sdd

Since you are using the entire disk for your pool, you don't need a
GPT label.  You can eliminate the parted commands.

> # mkdir /var/lib/ceph/osd/ceph-4
> # zfs create -o mountpoint=/var/lib/ceph/osd/ceph-4 sdd/ceph-4
> # ceph-osd  -i 4 --mkfs --mkkey --osd-journal=/dev/sde1 --mkjournal
> # ceph auth add osd.4 osd 'allow *' mon 'allow rwx' -i
> /var/lib/ceph/osd/ceph-4/keyring
>
> I then decompiled the crush map, added osd.4, and recompiled the map, and
> set Ceph to use the new crush map.
>
> When I started the osd.4 with:
>
> # start ceph-osd id=4
>
> It failed to start, as the ceph osd log file indicated the journal was
> missing:
>  mount failed to open journal /var/lib/ceph/osd/ceph-4/journal: (2) No
> such file or directory
>
> So I manually created a link named journal to /dev/sde1 and created the
> journal_uuid file.  Should ceph-osd have done this step?  Is there anything
> else I may of missed?
>
> With limited testing, the ZFS backed OSD seems to function correctly.
>
> I was wondering if there are any ZFS file system options that should be set
> for better performance or data safety.
>
> It would be nice if ceph-deploy would handle ZFS.
>
> Lastly, I want to thank Yan, Zheng and all the rest who worked on this
> project.
>
> Eric
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Questions/comments on using ZFS for OSDs

2013-11-12 Thread Mark Nelson

On 11/12/2013 04:43 PM, Eric Eastman wrote:

I built Ceph version 0.72 with --with-libzfs on Ubuntu 1304 after
installing ZFS
from th ppa:zfs-native/stable repository. The ZFS version is v0.6.2-1

I do have a few questions and comments on Ceph using ZFS backed OSDs

As ceph-deploy does not show support for ZFS, I used the instructions at:
http://ceph.com/docs/master/rados/operations/add-or-rm-osds/
and hand created a new OSD on an existing Ceph system. I guest that I
needed to build a zpool out of a disk, and then create a ZFS file system
that mounted to  /var/lib/ceph/osd/ceph-X, where X was the number given
when I ran the ceph osd create command.  As I am testing on a VM, I
created 2 new disks, one 2GB (/dev/sde) for journal and one 32GB
(/dev/sdd) for data. To setup the system for ZFS based OSDs, I first
added to all my ceph.conf files:

filestore zfs_snap = 1
journal_aio = 0
journal_dio = 0

I then created the OSD with the commands:

# ceph osd create
4
# parted -s /dev/sdd mklabel gpt mkpart -- -- 1 \-1
# parted -s /dev/sde mklabel gpt mkpart -- -- 1 \-1
# zpool create sdd /dev/sdd
# mkdir /var/lib/ceph/osd/ceph-4
# zfs create -o mountpoint=/var/lib/ceph/osd/ceph-4 sdd/ceph-4
# ceph-osd  -i 4 --mkfs --mkkey --osd-journal=/dev/sde1 --mkjournal
# ceph auth add osd.4 osd 'allow *' mon 'allow rwx' -i
/var/lib/ceph/osd/ceph-4/keyring

I then decompiled the crush map, added osd.4, and recompiled the map,
and set Ceph to use the new crush map.

When I started the osd.4 with:

# start ceph-osd id=4

It failed to start, as the ceph osd log file indicated the journal was
missing:
  mount failed to open journal /var/lib/ceph/osd/ceph-4/journal: (2)
No such file or directory

So I manually created a link named journal to /dev/sde1 and created the
journal_uuid file.  Should ceph-osd have done this step?  Is there
anything else I may of missed?

With limited testing, the ZFS backed OSD seems to function correctly.

I was wondering if there are any ZFS file system options that should be
set for better performance or data safety.


You may want to try using SA xattrs.  This resulted in a measurable 
performance improvement when I was testing Ceph on ZFS last spring.




It would be nice if ceph-deploy would handle ZFS.

Lastly, I want to thank Yan, Zheng and all the rest who worked on this
project.

Eric

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Kernel Panic / RBD Instability

2013-11-12 Thread Uwe Grohnwaldt
Hi,

we're experiencing the same problem. We have a cluster with 6 machines and 60 
OSDs (Supercmiro 2 HE 24 disks max, LSI controller). We have three R300 as 
monitor nodes and two more R300 as iscsi-targets. We are using targetcli, too. 
I don't need to say we have a cluster, public and iscsi-network. each on 
separate switches. Our operating system is Ubuntu 13.10 (with ceph from 
ubuntu-repos).

We mapped the blockdevices (4TB each) and exported them from /dev/rbd/rbd/. We 
have the same problem that our iSCSI nodes are getting a kernelpanic when a 
backfill or (deep)scrub starts or we have a flaky disk. The last point we are 
working against with one RAID1 for each osd. The other problem is, that the 
iscsi machine gets a kernelpanic when the IO is really slow or doesn't work 
within some seconds. We looked around a bit and found an attribute: 
task_timeout which can be set for block devices but not for rbd-devices. Our 
explanation for this behavior:

1. iscsi works normal
2. an osd gets offline or a whole node gets offline
3. the cluster needs some seconds to get responsive again (it takes over 5 
seconds - maybe tuning can help here?)
4. recovery process starts, e.g. dead disk, dead node and the cluster gets more 
load and the responsetimes get slower (tuning can help here again, too?)
5. we get lio/targetcli IO errors which goes up to our hypervisors and break 
filesystems in virtual machines. Moreover the lio-machines get a kernel panic 
while waiting. 

As I already wrote, we can't set task_timeout to work around this behavior. So 
we tried stgt (http://ceph.com/dev-notes/adding-support-for-rbd-to-stgt/ and 
http://ceph.com/dev-notes/updates-to-ceph-tgt-iscsi-support/). It works much 
better with rbd-backend, but now we have problems with VMware ESX-machines.

It's incredible slow to write on the iscsi target. Even with tuning it isn't 
possible to get a fast installation. We needed two to three hours to install a 
CentOS 6 VM.

Maybe this can help you to track down your problems. At the moment we are 
searching for a solution to get a ceph cluster with iscsi exports for our 
VMware environment.

Mit freundlichen Grüßen / Best Regards,
--
Uwe Grohnwaldt

- Original Message -
> From: "Gregory Farnum" 
> To: "James Wilkins" 
> Cc: ceph-users@lists.ceph.com
> Sent: Freitag, 8. November 2013 06:23:00
> Subject: Re: [ceph-users] Kernel Panic / RBD Instability
> 
> Well, as you've noted you're getting some slow requests on the OSDs
> when they turn back on; and then the iSCSI gateway is panicking
> (probably because the block device write request is just hanging).
> We've gotten prior reports that iSCSI is a lot more sensitive to a
> few
> slow requests than most use cases, and OSDs coming back in can cause
> some slow requests, but if it's a common case for you then there's
> probably something that can be done to optimize that recovery. Have
> you checked into what's blocking the slow operations or why the PGs
> are taking so long to get ready?
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
> 
> On Tue, Nov 5, 2013 at 1:33 AM, James Wilkins
>  wrote:
> > Hello,
> >
> > Wondering if anyone else has come over an issue we're having with
> > our POC CEPH Cluster at the moment.
> >
> > Some details about its setup;
> >
> > 6 x Dell R720 (20 x 1TB Drives, 4 xSSD CacheCade), 4 x 10GB Nics
> > 4 x Generic white label server (24 x 2 4TB Disk Raid-0 ), 4 x 10GB
> > Nics
> > 3 x Dell R620 - Acting as ISCSI Heads (targetcli / Linux kernel
> > ISCSI) - 4 x 10GB Nics.  An RBD device is mounted and exported via
> > targetcli, this is then mounted on a client device to push backup
> > data.
> >
> > All machines are running Ubuntu 12.04.3 LTS and ceph 0.67.4
> >
> > Machines are split over two racks (distinct layer 2 domains) using
> > a leaf/spine model and we use ECMP/quagga on the ISCSI heads to
> > reach the CEPH Cluster.
> >
> > Crush map has racks defined to spread data over 2 racks -  I've
> > attached the ceph.conf
> >
> > The cluster performs great normally, and we only have issues when
> > simulating rack failure.
> >
> > The issue comes when the following steps are taken
> >
> > o) Initiate load against the cluster (backups going via ISCSI)
> > o) ceph osd set noout
> > o) Reboot 2 x Generic Servers / 3 x Dell Servers (basically all the
> > nodes in 1 Rack)
> > o) Cluster goes degraded, as expected
> >
> >   cluster 55dcf929-fca5-49fe-99d0-324a19afd5b4
> >health HEALTH_WARN 7056 pgs degraded; 282 pgs stale; 2842 pgs
> >stuck unclean; recovery 1286582/2700870 degraded (47.636%);
> >108/216 in osds are down; noout flag(s) set
> >monmap e3: 5 mons at
> >
> > {fh-ceph01-mon-01=172.17.12.224:6789/0,fh-ceph01-mon-02=172.17.12.225:6789/0,fh-ceph01-mon-03=172.17.11.224:6789/0,fh-ceph01-mon-04=172.17.11.225:6789/0,fh-ceph01-mon-05=172.17.12.226:6789/0},
> >election epoch 74, quorum 0,1,2,3,4
> >
> > fh-ceph01-mon-01,fh-ceph01-mon-02,fh-ceph01-mon-03,fh-ce

Re: [ceph-users] Ephemeral RBD with Havana and Dumpling

2013-11-12 Thread Dinu Vlad
Out of curiosity - can you live-migrate instances with this setup? 



On Nov 12, 2013, at 10:38 PM, Dmitry Borodaenko  
wrote:

> And to answer my own question, I was missing a meaningful error
> message: what the ObjectNotFound exception I got from librados didn't
> tell me was that I didn't have the images keyring file in /etc/ceph/
> on my compute node. After 'ceph auth get-or-create client.images >
> /etc/ceph/ceph.client.images.keyring' and reverting images caps back
> to original state, it all works!
> 
> On Tue, Nov 12, 2013 at 12:19 PM, Dmitry Borodaenko
>  wrote:
>> I can get ephemeral storage for Nova to work with RBD backend, but I
>> don't understand why it only works with the admin cephx user? With a
>> different user starting a VM fails, even if I set its caps to 'allow
>> *'.
>> 
>> Here's what I have in nova.conf:
>> libvirt_images_type=rbd
>> libvirt_images_rbd_pool=images
>> rbd_secret_uuid=fd9a11cc-6995-10d7-feb4-d338d73a4399
>> rbd_user=images
>> 
>> The secret UUID is defined following the same steps as for Cinder and Glance:
>> http://ceph.com/docs/master/rbd/libvirt/
>> 
>> BTW rbd_user option doesn't seem to be documented anywhere, is that a
>> documentation bug?
>> 
>> And here's what 'ceph auth list' tells me about my cephx users:
>> 
>> client.admin
>>key: AQCoSX1SmIo0AxAAnz3NffHCMZxyvpz65vgRDg==
>>caps: [mds] allow
>>caps: [mon] allow *
>>caps: [osd] allow *
>> client.images
>>key: AQC1hYJS0LQhDhAAn51jxI2XhMaLDSmssKjK+g==
>>caps: [mds] allow
>>caps: [mon] allow *
>>caps: [osd] allow *
>> client.volumes
>>key: AQALSn1ScKruMhAAeSETeatPLxTOVdMIt10uRg==
>>caps: [mon] allow r
>>caps: [osd] allow class-read object_prefix rbd_children, allow
>> rwx pool=volumes, allow rx pool=images
>> 
>> Setting rbd_user to images or volumes doesn't work.
>> 
>> What am I missing?
>> 
>> Thanks,
>> 
>> --
>> Dmitry Borodaenko
> 
> 
> 
> -- 
> Dmitry Borodaenko
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ephemeral RBD with Havana and Dumpling

2013-11-12 Thread Dmitry Borodaenko
Still working on it, watch this space :)

On Tue, Nov 12, 2013 at 3:44 PM, Dinu Vlad  wrote:
> Out of curiosity - can you live-migrate instances with this setup?
>
>
>
> On Nov 12, 2013, at 10:38 PM, Dmitry Borodaenko  
> wrote:
>
>> And to answer my own question, I was missing a meaningful error
>> message: what the ObjectNotFound exception I got from librados didn't
>> tell me was that I didn't have the images keyring file in /etc/ceph/
>> on my compute node. After 'ceph auth get-or-create client.images >
>> /etc/ceph/ceph.client.images.keyring' and reverting images caps back
>> to original state, it all works!
>>
>> On Tue, Nov 12, 2013 at 12:19 PM, Dmitry Borodaenko
>>  wrote:
>>> I can get ephemeral storage for Nova to work with RBD backend, but I
>>> don't understand why it only works with the admin cephx user? With a
>>> different user starting a VM fails, even if I set its caps to 'allow
>>> *'.
>>>
>>> Here's what I have in nova.conf:
>>> libvirt_images_type=rbd
>>> libvirt_images_rbd_pool=images
>>> rbd_secret_uuid=fd9a11cc-6995-10d7-feb4-d338d73a4399
>>> rbd_user=images
>>>
>>> The secret UUID is defined following the same steps as for Cinder and 
>>> Glance:
>>> http://ceph.com/docs/master/rbd/libvirt/
>>>
>>> BTW rbd_user option doesn't seem to be documented anywhere, is that a
>>> documentation bug?
>>>
>>> And here's what 'ceph auth list' tells me about my cephx users:
>>>
>>> client.admin
>>>key: AQCoSX1SmIo0AxAAnz3NffHCMZxyvpz65vgRDg==
>>>caps: [mds] allow
>>>caps: [mon] allow *
>>>caps: [osd] allow *
>>> client.images
>>>key: AQC1hYJS0LQhDhAAn51jxI2XhMaLDSmssKjK+g==
>>>caps: [mds] allow
>>>caps: [mon] allow *
>>>caps: [osd] allow *
>>> client.volumes
>>>key: AQALSn1ScKruMhAAeSETeatPLxTOVdMIt10uRg==
>>>caps: [mon] allow r
>>>caps: [osd] allow class-read object_prefix rbd_children, allow
>>> rwx pool=volumes, allow rx pool=images
>>>
>>> Setting rbd_user to images or volumes doesn't work.
>>>
>>> What am I missing?
>>>
>>> Thanks,
>>>
>>> --
>>> Dmitry Borodaenko
>>
>>
>>
>> --
>> Dmitry Borodaenko
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Dmitry Borodaenko
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HDD bad sector, pg inconsistent, no object remapping

2013-11-12 Thread David Zafman

Since the disk is failing and you have 2 other copies I would take osd.0 down.  
This means that ceph will not attempt to read the bad disk either for clients 
or to make another copy of the data:

* Not sure about the syntax of this for the version of ceph you are running
ceph osd down 0

Mark it “out” which will immediately trigger recovery to create more copies of 
the data with the remaining OSDs.
ceph osd out 0

You can now finish the process of removing the osd by looking at these 
instructions:

http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual

David Zafman
Senior Developer
http://www.inktank.com

On Nov 12, 2013, at 3:16 AM, Mihály Árva-Tóth 
 wrote:

> Hello,
> 
> I have 3 node, with 3 OSD in each node. I'm using .rgw.buckets pool with 3 
> replica. One of my HDD (osd.0) has just bad sectors, when I try to read an 
> object from OSD direct, I get Input/output errror. dmesg:
> 
> [1214525.670065] mpt2sas0: log_info(0x3108): originator(PL), code(0x08), 
> sub_code(0x)
> [1214525.670072] mpt2sas0: log_info(0x3108): originator(PL), code(0x08), 
> sub_code(0x)
> [1214525.670100] sd 0:0:2:0: [sdc] Unhandled sense code
> [1214525.670104] sd 0:0:2:0: [sdc]  
> [1214525.670107] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [1214525.670110] sd 0:0:2:0: [sdc]  
> [1214525.670112] Sense Key : Medium Error [current] 
> [1214525.670117] Info fld=0x60c8f21
> [1214525.670120] sd 0:0:2:0: [sdc]  
> [1214525.670123] Add. Sense: Unrecovered read error
> [1214525.670126] sd 0:0:2:0: [sdc] CDB: 
> [1214525.670128] Read(16): 88 00 00 00 00 00 06 0c 8f 20 00 00 00 08 00 00
> 
> Okay I known need to replace HDD.
> 
> Fragment of ceph -s  output:
>   pgmap v922039: 856 pgs: 855 active+clean, 1 active+clean+inconsistent;
> 
> ceph pg dump | grep inconsistent
> 
> 11.15d  25443   0   0   0   6185091790  30013001
> active+clean+inconsistent   2013-11-06 02:30:45.23416.
> 
> ceph pg map 11.15d
> 
> osdmap e1600 pg 11.15d (11.15d) -> up [0,8,3] acting [0,8,3]
> 
> pg repair or deep-scrub can not fix this issue. But if I understand 
> correctly, osd has to known it can not retrieve object from osd.0 and need to 
> be replicate an another osd because there is no 3 working replicas now.
> 
> Thank you,
> Mihaly
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] No monitor sockets after upgrading to Emperor

2013-11-12 Thread Joao Eduardo Luis

On 11/12/2013 03:07 PM, Berant Lemmenes wrote:

I just restarted an OSD node and none of the admin sockets showed up on
reboot (though it joined the cluster fine and all OSDs are happy. The
node is a Ubuntu 12.04.3 system originally deployed via ceph-deploy on
dumpling.

The only thing that stands out to me is the failure on lock_fsid and the
error converting store message.

Here are the snip from OSD 19 of a full reboot starting with the
shutdown complete entry, and going until all the reconnect messages.


This looks an awful lot like you started another instance of an OSD with 
the same ID while another was running.  I'll walk you through the log 
lines that point me towards this conclusion.  Would still be weird if 
the admin sockets vanished because of that, so maybe that's a different 
issue.  Are you able to reproduce the admin socket issue often?


Walking through:


2013-11-12 09:44:00.757576 7fb8a8e24780  1 -- 192.168.200.54:6819/23261
 shutdown complete.


Shutdown, check.  OSD restarts.


2013-11-12 09:47:05.843425 7f7918e9d780  0 ceph version 0.72
(5832e2603c7db5d40b433d0953408993a9b7c217), process ceph-osd, pid 1734
2013-11-12 09:47:05.892704 7f7918e9d780  1
filestore(/var/lib/ceph/osd/ceph-19) mount detected xfs
2013-11-12 09:47:05.892718 7f7918e9d780  1
filestore(/var/lib/ceph/osd/ceph-19)  disabling 'filestore replica
fadvise' due to known issues with fadvise(DONTNEED) on xfs
2013-11-12 09:47:05.944312 7f7918e9d780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features:
FIEMAP ioctl is supported and appears to work
2013-11-12 09:47:05.944327 7f7918e9d780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features:
FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-11-12 09:47:05.944743 7f7918e9d780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features:
syncfs(2) syscall fully supported (by glibc and kernel)
2013-11-12 09:47:06.258005 7f7918e9d780  0
filestore(/var/lib/ceph/osd/ceph-19) mount: enabling WRITEAHEAD journal
mode: checkpoint is not enabled
2013-11-12 09:47:07.567405 7f7918e9d780  1 journal _open
/var/lib/ceph/osd/ceph-19/journal fd 19: 10239344640 bytes, block size
4096 bytes, directio = 1, aio = 1
2013-11-12 09:47:07.570098 7f7918e9d780  1 journal _open
/var/lib/ceph/osd/ceph-19/journal fd 19: 10239344640 bytes, block size
4096 bytes, directio = 1, aio = 1
2013-11-12 09:47:07.570352 7f7918e9d780  1 journal close
/var/lib/ceph/osd/ceph-19/journal
2013-11-12 09:47:07.571215 7f7918e9d780  1
filestore(/var/lib/ceph/osd/ceph-19) mount detected xfs
2013-11-12 09:47:07.572742 7f7918e9d780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features:
FIEMAP ioctl is supported and appears to work
2013-11-12 09:47:07.572750 7f7918e9d780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features:
FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-11-12 09:47:07.573234 7f7918e9d780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features:
syncfs(2) syscall fully supported (by glibc and kernel)
2013-11-12 09:47:07.574879 7f7918e9d780  0
filestore(/var/lib/ceph/osd/ceph-19) mount: enabling WRITEAHEAD journal
mode: checkpoint is not enabled
2013-11-12 09:47:07.577043 7f7918e9d780  1 journal _open
/var/lib/ceph/osd/ceph-19/journal fd 23: 10239344640 bytes, block size
4096 bytes, directio = 1, aio = 1
2013-11-12 09:47:07.578649 7f7918e9d780  1 journal _open
/var/lib/ceph/osd/ceph-19/journal fd 23: 10239344640 bytes, block size
4096 bytes, directio = 1, aio = 1
2013-11-12 09:47:07.680531 7f7918e9d780  0 
cls/hello/cls_hello.cc:271: loading cls_hello


OSD is running.  Another instance starts running; next line is the 
relevant bit that shows that.



2013-11-12 09:47:09.670813 7f8151b5f780  0 ceph version 0.72
(5832e2603c7db5d40b433d0953408993a9b7c217), process ceph-osd, pid 2769
2013-11-12 09:47:09.673789 7f8151b5f780  0
filestore(/var/lib/ceph/osd/ceph-19) lock_fsid failed to lock
/var/lib/ceph/osd/ceph-19/fsid, is another ceph-osd still running? (11)
Resource temporarily unavailable


This last line tells us that ceph-osd believes another instance is 
running, so you should first find out whether there's actually another 
instance being run somewhere, somehow.  How did you start these daemons?



2013-11-12 09:47:09.673804 7f8151b5f780 -1
filestore(/var/lib/ceph/osd/ceph-19) FileStore::mount: lock_fsid failed
2013-11-12 09:47:09.673919 7f8151b5f780 -1  ** ERROR: error converting
store /var/lib/ceph/osd/ceph-19: (16) Device or resource busy
2013-11-12 09:47:14.169305 7f78fd548700  0 -- 10.200.1.54:6802/1734
 >> 10.200.1.51:6800/13263
 pipe(0x1e48c80 sd=42 :55275 s=2 pgs=5530
cs=1 l=0 c=0x1eae2c0).fault, initiating reconnect
2013-11-12 09:47:14.169444 7f78fd346700  0 -- 10.200.1.54:6802/1734
 >> 10.200.1.57:6804/8226
 pipe(0xc1ed500 sd=

Re: [ceph-users] ceph-deploy: osd creating hung with one ssd disk as shared journal

2013-11-12 Thread Tim Zhang
Hi Michael,
you are right, my system is installed on disk sdc, and sda is the journal
disk to be shared.
This is the output of  partx -v /dev/sda, didn't see anything unusual:
device /dev/sda: start 0 size 117231408
gpt: 2 slices
# 1:  2048-  2099199 (  2097152 sectors,   1073 MB)
# 2:   2099200-  4196351 (  2097152 sectors,   1073 MB)
dos: 0 slices


2013/11/13 Michael 

>  Sorry, just spotted you're mounting on sdc. Can you chuck out a partx -v
> /dev/sda to see if there's anything odd about the data currently on there?
>
> -Michael
>
>
> On 12/11/2013 18:22, Michael wrote:
>
> As long as there's room on the SSD for the partitioner it'll just use the
> conf value for osd journal size to section it up as it adds OSD's (I
> generally use the "ceph-deploy osd create srv:data:journal e.g.
> srv-12:/dev/sdb:/dev/sde" format when adding disks).
> Does it being /dev/sda mean you're putting your journal onto an already
> partitioned and in use by the OS SSD?
>
> -Michael
>
> On 12/11/2013 18:09, Gruher, Joseph R wrote:
>
>  I didn’t think you could specify the journal in this manner (just
> pointing multiple OSDs on the same host all to journal /dev/sda).  Don’t
> you either need to partition the SSD and point each SSD to a separate
> partition, or format and mount the SSD and each OSD will use a unique file
> on the mount?  I’ve always created a separate partition on the SSD for each
> journal.
>
>
>
> Preparing cluster ceph disks ceph0:/dev/sdb:/dev/sda
> ceph0:/dev/sdd:/dev/sda ceph0:/dev/sde:/dev/sda ceph0:/dev/sdf:/dev/sda
> ceph0:/dev/sdg:/dev/sda ceph0:/dev/sdh:/dev/sda
>
>
>
> *From:* ceph-users-boun...@lists.ceph.com [
> mailto:ceph-users-boun...@lists.ceph.com]
> *On Behalf Of *Tim Zhang
> *Sent:* Tuesday, November 12, 2013 2:20 AM
> *To:* ceph-users@lists.ceph.com
> *Subject:* [ceph-users] ceph-deploy: osd creating hung with one ssd disk
> as shared journal
>
>
>
> Hi guys,
>
> I use ceph-deploy to manage my cluster, but I get failed while creating
> the OSD, the process seems to hang up at creating first osd. By the way,
> SELinux is disabled, and my ceph-disk is patched according to the page:
> http://www.spinics.net/lists/ceph-users/msg03258.html
>
> can you guys give me some advise?
>
> (1) the output of ceph-deploy is:
>
> Invoked (1.3.1): /usr/bin/ceph-deploy osd create ceph0:sdb:sda
> ceph0:sdd:sda ceph0:sde:sda ceph0:sdf:sda ceph0:sdg:sda ceph0:sdh:sda
> ceph1:sdb:sda ceph1:sdd:sda ceph1:sde:sda ceph1:sdf:sda ceph1:sdg:sda
> ceph1:sdh:sda ceph2:sdb:sda ceph2:sdd:sda ceph2:sde:sda ceph2:sdf:sda
> ceph2:sdg:sda ceph2:sdh:sda
>
> [ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks
> ceph0:/dev/sdb:/dev/sda ceph0:/dev/sdd:/dev/sda ceph0:/dev/sde:/dev/sda
> ceph0:/dev/sdf:/dev/sda ceph0:/dev/sdg:/dev/sda ceph0:/dev/sdh:/dev/sda
> ceph1:/dev/sdb:/dev/sda ceph1:/dev/sdd:/dev/sda ceph1:/dev/sde:/dev/sda
> ceph1:/dev/sdf:/dev/sda ceph1:/dev/sdg:/dev/sda ceph1:/dev/sdh:/dev/sda
> ceph2:/dev/sdb:/dev/sda ceph2:/dev/sdd:/dev/sda ceph2:/dev/sde:/dev/sda
> ceph2:/dev/sdf:/dev/sda ceph2:/dev/sdg:/dev/sda ceph2:/dev/sdh:/dev/sda
>
> [ceph0][DEBUG ] connected to host: ceph0
>
> [ceph0][DEBUG ] detect platform information from remote host
>
> [ceph0][DEBUG ] detect machine type
>
> [ceph_deploy.osd][INFO  ] Distro info: CentOS 6.4 Final
>
> [ceph_deploy.osd][DEBUG ] Deploying osd to ceph0
>
> [ceph0][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
>
> [ceph0][INFO  ] Running command: udevadm trigger --subsystem-match=block
> --action=add
>
> [ceph_deploy.osd][DEBUG ] Preparing host ceph0 disk /dev/sdb journal
> /dev/sda activate True
>
> [ceph0][INFO  ] Running command: ceph-disk-prepare --fs-type xfs --cluster
> ceph -- /dev/sdb /dev/sda
>
> [ceph0][ERROR ] WARNING:ceph-disk:OSD will not be hot-swappable if journal
> is not the same device as the osd data
>
> [ceph0][ERROR ] Warning: WARNING: the kernel failed to re-read the
> partition table on /dev/sda (Device or resource busy).  As a result, it may
> not reflect all of your changes until after reboot.
>
> [ceph0][ERROR ] BLKPG: Device or resource busy
>
> [ceph0][ERROR ] error adding partition 1
>
> [ceph0][DEBUG ] The operation has completed successfully.
>
> [ceph0][DEBUG ] The operation has completed successfully.
>
> [ceph0][DEBUG ] meta-data=/dev/sdb1  isize=2048   agcount=4,
> agsize=61047597 blks
>
> [ceph0][DEBUG ]  =   sectsz=512   attr=2,
> projid32bit=0
>
> [ceph0][DEBUG ] data =   bsize=4096
> blocks=244190385, imaxpct=25
>
> [ceph0][DEBUG ]  =   sunit=0  swidth=0 blks
>
> [ceph0][DEBUG ] naming   =version 2  bsize=4096   ascii-ci=0
>
> [ceph0][DEBUG ] log  =internal log   bsize=4096
> blocks=119233, version=2
>
> [ceph0][DEBUG ]  =   sectsz=512   sunit=0
> blks, lazy-count=1
>
> [ceph0][DEBUG ] realtime =none   extsz=4096   blocks=0,
> rtextents=0
>
> [ceph0][D

Re: [ceph-users] No monitor sockets after upgrading to Emperor

2013-11-12 Thread Berant Lemmenes
On Tue, Nov 12, 2013 at 7:28 PM, Joao Eduardo Luis wrote:

>
> This looks an awful lot like you started another instance of an OSD with
> the same ID while another was running.  I'll walk you through the log lines
> that point me towards this conclusion.  Would still be weird if the admin
> sockets vanished because of that, so maybe that's a different issue.  Are
> you able to reproduce the admin socket issue often?
>
> Walking through:
>

Thanks for taking the time to walk through these logs, I appreciate the
explanation.

2013-11-12 09:47:09.670813 7f8151b5f780  0 ceph version 0.72
>> (5832e2603c7db5d40b433d0953408993a9b7c217), process ceph-osd, pid 2769
>> 2013-11-12 09:47:09.673789 7f8151b5f780  0
>> filestore(/var/lib/ceph/osd/ceph-19) lock_fsid failed to lock
>> /var/lib/ceph/osd/ceph-19/fsid, is another ceph-osd still running? (11)
>> Resource temporarily unavailable
>>
>
> This last line tells us that ceph-osd believes another instance is
> running, so you should first find out whether there's actually another
> instance being run somewhere, somehow.  How did you start these daemons?
>

That proved to be the crux of it, both upstart and the Sys V init scripts
were trying to start the ceph daemons. Looking in /etc/rc2.d there are
symlinks from S20ceph to ../init.d/ceph

Upstart thought it was controlling things - doing an 'initctl list | grep
ceph' would show the correct PIDs, and 'service ceph status' thought they
were not running.

So that would seem to inidcate that sys V was trying to start it first, and
upstart was the one that had started the instance that generated those logs.

The part that doesn't make sense is that is if the Sys V init script was
starting before upstart, why wouldn't it be the one that was writing to
/var/log/ceph/?

After running 'update-rc.d ceph disable', the admin sockets were present
after a system reboot.

I wonder, was the Sys V init scripts being enabled a ceph-deploy artifact
or a issue with the packages?

Thanks for pointing me in the right direction!

Thanks,
Berant
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Questions/comments on using ZFS for OSDs

2013-11-12 Thread Yan, Zheng
On Wed, Nov 13, 2013 at 6:43 AM, Eric Eastman  wrote:
> I built Ceph version 0.72 with --with-libzfs on Ubuntu 1304 after installing
> ZFS
> from th ppa:zfs-native/stable repository. The ZFS version is v0.6.2-1
>
> I do have a few questions and comments on Ceph using ZFS backed OSDs
>
> As ceph-deploy does not show support for ZFS, I used the instructions at:
> http://ceph.com/docs/master/rados/operations/add-or-rm-osds/
> and hand created a new OSD on an existing Ceph system. I guest that I needed
> to build a zpool out of a disk, and then create a ZFS file system that
> mounted to  /var/lib/ceph/osd/ceph-X, where X was the number given when I
> ran the ceph osd create command.  As I am testing on a VM, I created 2 new
> disks, one 2GB (/dev/sde) for journal and one 32GB (/dev/sdd) for data. To
> setup the system for ZFS based OSDs, I first added to all my ceph.conf
> files:
>
>filestore zfs_snap = 1
>journal_aio = 0
>journal_dio = 0

no need to disable journal dio/aio if the journal is not in ZFS.

Regards
Yan, Zheng

>
> I then created the OSD with the commands:
>
> # ceph osd create
> 4
> # parted -s /dev/sdd mklabel gpt mkpart -- -- 1 \-1
> # parted -s /dev/sde mklabel gpt mkpart -- -- 1 \-1
> # zpool create sdd /dev/sdd
> # mkdir /var/lib/ceph/osd/ceph-4
> # zfs create -o mountpoint=/var/lib/ceph/osd/ceph-4 sdd/ceph-4
> # ceph-osd  -i 4 --mkfs --mkkey --osd-journal=/dev/sde1 --mkjournal
> # ceph auth add osd.4 osd 'allow *' mon 'allow rwx' -i
> /var/lib/ceph/osd/ceph-4/keyring
>
> I then decompiled the crush map, added osd.4, and recompiled the map, and
> set Ceph to use the new crush map.
>
> When I started the osd.4 with:
>
> # start ceph-osd id=4
>
> It failed to start, as the ceph osd log file indicated the journal was
> missing:
>  mount failed to open journal /var/lib/ceph/osd/ceph-4/journal: (2) No
> such file or directory
>
> So I manually created a link named journal to /dev/sde1 and created the
> journal_uuid file.  Should ceph-osd have done this step?  Is there anything
> else I may of missed?
>
> With limited testing, the ZFS backed OSD seems to function correctly.
>
> I was wondering if there are any ZFS file system options that should be set
> for better performance or data safety.
>
> It would be nice if ceph-deploy would handle ZFS.
>
> Lastly, I want to thank Yan, Zheng and all the rest who worked on this
> project.
>
> Eric
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] how to load load custom methods (rados_exec)

2013-11-12 Thread
Hi all!
  I try to use the rados_exec methods, it allows librados users to call the 
custom methods !
my ceph version is 0.62。 It is worked for the class cls_rbd,  for it is alerdy 
build and load into the ceph class(/usr/local/lib/rados-class). but I do not 
konw how to build and load a custom methods.

for example, cls_crypto.cc ,which lay in ceph/src/ , have not bulid and 
load into the ceph。 how could I use the rados_exec call this method?

   the loadclass.sh , which I download form github, can load the methods in to 
the ceph !and how to use it ! 
#./loadclass.sh ceph-0.62/src/cls_crypto.cc
  mn: ceph-0.62/src/cls_crypto.cc : File Format mot recngced!

 Any points would be much appreciated!!!


thinks ,
peng







___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com