Re: [ceph-users] Ceph for "home lab" / hobbyist use?

2019-09-10 Thread Hector Martin
I run Ceph on both a home server and a personal offsite backup server 
(both single-host setups). It's definitely feasible and comes with a lot 
of advantages over traditional RAID and ZFS and the like. The main 
disadvantages are performance overhead and resource consumption.


On 07/09/2019 06.16, William Ferrell wrote:

They're about $50 each, can boot from MicroSD or eMMC flash (basically
an SSD with a custom connector), and have one SATA port. They have
8-core 32-bit CPUs, 2GB of RAM and a gigabit ethernet port. Four of
them (including disks) can run off a single 12V/8A power adapter
(basically 100 watts per set of 4). The obvious appeal is price, plus
they're stackable so they'd be easy to hide away in a closet.

Is it feasible for these to work as OSDs at all? The Ceph hardware
recommendations page suggests OSDs need 1GB per TB of space, so does
this mean these wouldn't be suitable with, say, a 4TB or 8TB disk? Or
would they work, but just more slowly?


2GB seems tight, but *should* work if you're literally running an OSD 
and only an OSD on the thing.


I use a 32GB RAM server to run 64TB worth of raw storage (8x8TB SATA 
disks), plus mon/mds, plus a bunch of unrelated applications and 
servers, routing, and even a desktop UI (I'll soon be splitting off some 
of these duties, since this box has grown into way too much of a kitchen 
sink). It used to be 16GB when I first moved to Ceph, and that juuust 
about worked but it was tight. So the 1GB/1TB recommendation is ample, 
1GB/2TB works well, and 1GB/4TB is tight.


I configure my OSDs for a 1.7GB memory target, so that should work on 
your 2GB RAM board, but it doesn't give you much headroom for increased 
consumption during recovery. To be safe I'd set them up with a target of 
1.2GB or so on a board like yours.
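
The knob for that is osd_memory_target (value in bytes). A minimal sketch, 
assuming a release with the centralized config store (Mimic or later); the 
exact value is up to you:

    ceph config set osd osd_memory_target 1288490188   # ~1.2 GiB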


I would recommend keeping your PG count low and relying on the Ceph 
balancer to even out your disk usage. Keep in mind that the true number 
of PGs is your PG count multiplied by your pool width: that's your 
replica count for replicated pools, or your erasure code k+m width for 
EC pools. I use 8 PGs for my metadata pool (x3 replication = 24 PGs, 3 
per OSD) and 64 PGs for my data pool (x7 for the RS 5,2 profile = 448 
PGs, 56 per OSD). All my data is on CephFS on my home setup.
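
For concreteness, a rough sketch of what creating pools with those counts 
looks like (profile and pool names made up; on a single host the failure 
domain has to be the OSD, and CephFS/RBD data on EC needs overwrites enabled):

    ceph osd erasure-code-profile set rs_5_2 k=5 m=2 crush-failure-domain=osd
    ceph osd pool create cephfs_metadata 8 8 replicated
    ceph osd pool create cephfs_data 64 64 erasure rs_5_2
    ceph osd pool set cephfs_data allow_ec_overwrites true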


If you use EC, then given that your memory is tight, you might want to 
drop that a bit, e.g. target something like 24 PGs per OSD. For 
utilization balance, as long as your overall PG count (multiplied by 
pool width) is a multiple of your OSD count, you should be able to 
achieve a "perfect" distribution when using the balancer (otherwise 
you'll be off by +/- 1 PG, and the more PGs you have, the smaller that 
imbalance is).
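
If it helps, turning the balancer on in upmap mode is just something like 
this (upmap requires all clients to be Luminous or newer):

    ceph mgr module enable balancer   # on older releases; always on in newer ones
    ceph balancer mode upmap
    ceph balancer on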



Pushing my luck further (assuming the HC2 can handle OSD duties at
all), is that enough muscle to run the monitor and/or metadata
servers? Should monitors and MDS's be run separately, or can/should
they piggyback on hosts running OSDs?


The MDS will need RAM to cache your metadata (and you want it to, 
because it makes performance *way* better). I would definitely keep it 
well away from such tight OSDs. In fact the MDS in my setup uses more 
RAM than any given OSD, about 2GB or so (the cache size is also 
configurable). If you have fewer, larger files, then resource consumption 
will be lower (I have a mixed workload with most of the space taken up 
by large files, but also a few trees full of tiny ones, totaling 
several million files). You might get away with dedicating a separate 
identical board to the MDS. Maybe consider multi-mds, but I've found 
that during recovery it adds some phases that eat up even more RAM, so 
that might not be a good idea.
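
The cache knob is mds_cache_memory_limit; a sketch of capping it (note that 
the actual MDS RSS will sit somewhat above whatever limit you set):

    ceph config set mds mds_cache_memory_limit 1073741824   # 1 GiB of cache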


The mon isn't as bad, but I wouldn't push it and try to co-host it on 
such limited hosts. Mine's sitting at roughly 1GB resident.


In either case, keep around some larger x86 box that has the software 
installed and that you can plug some disks into. In the worst case, if 
you end up in an out-of-memory loop, you can move e.g. the MDS or some 
OSDs to that machine and bring the cluster back to health.



I'd be perfectly happy with a setup like this even if it could only
achieve speeds in the 20-30MB/sec range.


I can't speak for small ARM boards, but my performance (on a quad-core 
Haswell i5 hosting everything) is somewhere around half of what I'd get 
on a RAID6 over the same storage, using ~equivalent RS 5.2 encoding and 
dm-crypt (AES-NI) under the OSDs. Since you'd be running a single OSD 
per host, I imagine you should be able to get reasonable aggregate 
performance out of the whole thing, but I've never tried a setup like that.


I'm actually considering this kind of thing in the future (moving from 
one monolithic server to a more cluster-like setup) but it's just an 
idea for now.


--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub

[ceph-users] ceph fs with backtrace damage

2019-09-10 Thread Fyodor Ustinov
Hi!

After MDS scrub I got error:
1 MDSs report damaged metadata

#ceph tell mds.0 damage ls 

[
{
"damage_type": "backtrace",
"id": 712325338,
"ino": 1099526730308,
"path": "/erant/smb/public/docs/3. Zvity/1. Prodazhi/~$Data-center 
2019.08.xlsx"
},
{
"damage_type": "backtrace",
"id": 1513472891,
"ino": 1099526730309,
"path": "/erant/smb/public/docs/3. Zvity/2. 
Menedzhery/2019/2019_09/Berezovskyh/Berezovskyh_2019_09_10.xlsx"
}
]

What is a "backtrace" damage?
What caused it?
How to fix it?

Help me, please!

WBR,
Fyodor.


Re: [ceph-users] regularly 'no space left on device' when deleting on cephfs

2019-09-10 Thread Kenneth Waegeman

Hi Paul, all,

Thanks! But I can't figure out how to debug the purge queue. When I 
check it, I get these numbers:


[root@mds02 ~]# ceph daemon mds.mds02 perf dump | grep -E 'purge|pq'
    "purge_queue": {
    "pq_executing_ops": 0,
    "pq_executing": 0,
    "pq_executed": 469026

[root@mds03 ~]# ceph daemon mds.mds03 perf dump | grep -E 'purge|pq'
    "purge_queue": {
    "pq_executing_ops": 0,
    "pq_executing": 0,
    "pq_executed": 0

Even after more than 10 minutes these numbers are still the same, and 
I'm still not able to delete anything.


What bothers me most is that while I can't delete anything, the ceph 
cluster still reports healthy - no warnings, and there is nothing in the 
MDS logs (running ceph 13.2.6).
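
For reference, the purge-related settings currently in effect can be dumped 
with something like this (option names may vary by release):

    ceph daemon mds.mds02 config show | grep -i purge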


Thanks again!

Kenneth

On 06/09/2019 16:21, Paul Emmerich wrote:

Yeah, getting an ENOSPC error code on deletion is a little bit unintuitive,
but what it means is: the purge queue is full.
You've already told the MDS to purge faster.

Not sure how to tell it to increase the maximum backlog for
deletes/purges, but you should be able to find something with the
search term "purge queue". :)


Paul




Re: [ceph-users] reproducible rbd-nbd crashes

2019-09-10 Thread Marc Schöchlin
Hello Mike,


Sigh, I set this yesterday on my system ("sysctl vm.dirty_background_ratio=0") 
and got another crash last night :-(

I have now restarted the system and applied all of the commands mentioned in 
your last mail:

sysctl vm.dirty_background_ratio=0
sysctl vm.dirty_ratio=0
sysctl vm.vfs_cache_pressure=0

Let's see if that helps
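
(To keep these across reboots, a sketch: drop them into a sysctl.d file and 
reload, e.g.

    # /etc/sysctl.d/90-rbd-nbd-workaround.conf
    vm.dirty_background_ratio = 0
    vm.dirty_ratio = 0
    vm.vfs_cache_pressure = 0

and then run "sysctl --system".)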

Regards

Marc


On 03.09.19 at 04:41, Mike Christie wrote:
> On 09/02/2019 06:20 AM, Marc Schöchlin wrote:
>> Hello Mike,
>>
>> I am having a quick look at this while on vacation because my coworker
>> reports daily, continuous crashes ;-)
>> Any updates here (i am aware that this is not very easy to fix)?
> I am still working on it. It basically requires rbd-nbd to be rewritten so
> that it preallocates the memory it uses for IO, and where it can't (like when
> doing network IO) it requires adding an interface to tell the kernel not to
> use allocation flags that can cause disk IO back onto the device.
>
> There are some workarounds like adding more memory and setting the vm
> values. For the latter, it seems that if you set:
>
> vm.dirty_background_ratio = 0 then it looks like it avoids the problem
> because the kernel will immediately start to write dirty pages from the
> background worker threads, so we do not end up later needing to write
> out pages from the rbd-nbd thread to free up memory.
>
> or
>
> vm.dirty_ratio = 0 then it looks like it avoids the problem because the
> kernel will just write out the data right away, similar to above, but
> it's normally going to be written out from the thread that you are
> running your test from.
>
> and this seems optional and can result in other problems:
>
> vm.vfs_cache_pressure = 0 then for at least XFS it looks like we avoid
> one of the immediate problems where allocations would always cause the
> inode caches to be reclaimed and that memory to be written out to the
> device. For EXT4, I did not see a similar issue.
>
>> I think the severity of this problem (currently "minor") is not
>> commensurate with its consequences.
>>
>> This reproducible problem can cause:
>>
>>   * random service outage
>>   * data corruption
>>   * long recovery procedures on huge filesystems
>>
>> Is it adequate to increase the severity to major or critical?
>>
>> What might be the reason that rbd-nbd runs very reliably on my Xen
>> servers as a storage repository?
>> (see https://github.com/vico-research-and-consulting/RBDSR/tree/v2.0 -
>> hundreds of devices, high workload)
>>
>> Regards
>> Marc
>>
>> On 15.08.19 at 20:07, Marc Schöchlin wrote:
>>> Hello Mike,
>>>
>>> On 15.08.19 at 19:57, Mike Christie wrote:
> Don't waste your time. I found a way to replicate it now.
>
 Just a quick update.

 Looks like we are trying to allocate memory in the IO path in a way that
 can swing back on us, so we can end up locking up. You are probably not
 hitting this with krbd in your setup because normally it's preallocating
 structs, using flags like GFP_NOIO, etc. For rbd-nbd, we cannot
 preallocate some structs and cannot control the allocation flags for
 some operations initiated from userspace, so it's possible to hit this
 every IO. I can replicate this now in a second just doing a cp -r.

 It's not going to be a simple fix. We have had a similar issue for
 storage daemons like iscsid and multipathd since they were created. It's
 less likely to hit with them because you only hit the paths where they
 cannot control memory allocation behavior during recovery.

 I am looking into some things now.
>>> Great to hear, that the problem is now identified.
>>>
>>> As described, I'm on vacation; if you need anything after September 8th we 
>>> can probably invest some time to test upcoming fixes.
>>>
>>> Regards
>>> Marc
>>>
>>>
>> -- 
>> GPG encryption available: 0x670DCBEC/pool.sks-keyservers.net
>>


Re: [ceph-users] reproducible rbd-nbd crashes

2019-09-10 Thread Marc Schöchlin
Hello Mike,

As described, I set all the settings.

Unfortunately it also crashed with these settings :-(

Regards
Marc

[Tue Sep 10 12:25:56 2019] Btrfs loaded, crc32c=crc32c-intel
[Tue Sep 10 12:25:57 2019] EXT4-fs (dm-0): mounted filesystem with ordered data 
mode. Opts: (null)
[Tue Sep 10 12:25:59 2019] systemd[1]: systemd 237 running in system mode. 
(+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP 
+GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 
default-hierarchy=hybrid)
[Tue Sep 10 12:25:59 2019] systemd[1]: Detected virtualization xen.
[Tue Sep 10 12:25:59 2019] systemd[1]: Detected architecture x86-64.
[Tue Sep 10 12:25:59 2019] systemd[1]: Set hostname to .
[Tue Sep 10 12:26:01 2019] systemd[1]: Started ntp-systemd-netif.path.
[Tue Sep 10 12:26:01 2019] systemd[1]: Created slice System Slice.
[Tue Sep 10 12:26:01 2019] systemd[1]: Listening on udev Kernel Socket.
[Tue Sep 10 12:26:01 2019] systemd[1]: Created slice 
system-serial\x2dgetty.slice.
[Tue Sep 10 12:26:01 2019] systemd[1]: Listening on Journal Socket.
[Tue Sep 10 12:26:01 2019] systemd[1]: Mounting POSIX Message Queue File 
System...
[Tue Sep 10 12:26:01 2019] RPC: Registered named UNIX socket transport module.
[Tue Sep 10 12:26:01 2019] RPC: Registered udp transport module.
[Tue Sep 10 12:26:01 2019] RPC: Registered tcp transport module.
[Tue Sep 10 12:26:01 2019] RPC: Registered tcp NFSv4.1 backchannel transport 
module.
[Tue Sep 10 12:26:01 2019] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro
[Tue Sep 10 12:26:01 2019] Loading iSCSI transport class v2.0-870.
[Tue Sep 10 12:26:01 2019] iscsi: registered transport (tcp)
[Tue Sep 10 12:26:01 2019] systemd-journald[497]: Received request to flush 
runtime journal from PID 1
[Tue Sep 10 12:26:01 2019] Installing knfsd (copyright (C) 1996 
o...@monad.swb.de).
[Tue Sep 10 12:26:01 2019] iscsi: registered transport (iser)
[Tue Sep 10 12:26:01 2019] systemd-journald[497]: File 
/var/log/journal/cef15a6d1b80c9fbcb31a3a65aec21ad/system.journal corrupted or 
uncleanly shut down, renaming and replacing.
[Tue Sep 10 12:26:04 2019] EXT4-fs (dm-1): mounted filesystem with ordered data 
mode. Opts: (null)
[Tue Sep 10 12:26:05 2019] EXT4-fs (xvda1): mounted filesystem with ordered 
data mode. Opts: (null)
[Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.659:2): 
apparmor="STATUS" operation="profile_load" profile="unconfined" 
name="/usr/bin/lxc-start" pid=902 comm="apparmor_parser"
[Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.675:3): 
apparmor="STATUS" operation="profile_load" profile="unconfined" 
name="/usr/bin/man" pid=904 comm="apparmor_parser"
[Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.675:4): 
apparmor="STATUS" operation="profile_load" profile="unconfined" 
name="man_filter" pid=904 comm="apparmor_parser"
[Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.675:5): 
apparmor="STATUS" operation="profile_load" profile="unconfined" 
name="man_groff" pid=904 comm="apparmor_parser"
[Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.687:6): 
apparmor="STATUS" operation="profile_load" profile="unconfined" 
name="lxc-container-default" pid=900 comm="apparmor_parser"
[Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.687:7): 
apparmor="STATUS" operation="profile_load" profile="unconfined" 
name="lxc-container-default-cgns" pid=900 comm="apparmor_parser"
[Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.687:8): 
apparmor="STATUS" operation="profile_load" profile="unconfined" 
name="lxc-container-default-with-mounting" pid=900 comm="apparmor_parser"
[Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.687:9): 
apparmor="STATUS" operation="profile_load" profile="unconfined" 
name="lxc-container-default-with-nesting" pid=900 comm="apparmor_parser"
[Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.723:10): 
apparmor="STATUS" operation="profile_load" profile="unconfined" 
name="/usr/lib/snapd/snap-confine" pid=905 comm="apparmor_parser"
[Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.723:11): 
apparmor="STATUS" operation="profile_load" profile="unconfined" 
name="/usr/lib/snapd/snap-confine//mount-namespace-capture-helper" pid=905 
comm="apparmor_parser"
[Tue Sep 10 12:26:06 2019] new mount options do not match the existing 
superblock, will be ignored
[Tue Sep 10 12:26:09 2019] SGI XFS with ACLs, security attributes, realtime, no 
debug enabled
[Tue Sep 10 12:26:09 2019] XFS (nbd0): Mounting V5 Filesystem
[Tue Sep 10 12:26:11 2019] XFS (nbd0): Starting recovery (logdev: internal)
[Tue Sep 10 12:26:12 2019] XFS (nbd0): Ending recovery (logdev: internal)
[Tue Sep 10 12:26:12 2019] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 
state recovery directory
[Tue Sep 10 12:26:12 2019] NFSD: starting 90-second grace period (net f0f8)
[Tue Sep 10 14:45:04 2019] block nbd0: Connection timed out
[Tue Sep 10 14:45:04 2019] block nbd0: shu

Re: [ceph-users] reproducible rbd-nbd crashes

2019-09-10 Thread Jason Dillaman
On Tue, Sep 10, 2019 at 9:46 AM Marc Schöchlin  wrote:
>
> Hello Mike,
>
> as described i set all the settings.
>
> Unfortunately it crashed also with these settings :-(
>
> Regards
> Marc
>
> [Tue Sep 10 12:25:56 2019] Btrfs loaded, crc32c=crc32c-intel
> [Tue Sep 10 12:25:57 2019] EXT4-fs (dm-0): mounted filesystem with ordered 
> data mode. Opts: (null)
> [Tue Sep 10 12:25:59 2019] systemd[1]: systemd 237 running in system mode. 
> (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP 
> +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN 
> -PCRE2 default-hierarchy=hybrid)
> [Tue Sep 10 12:25:59 2019] systemd[1]: Detected virtualization xen.
> [Tue Sep 10 12:25:59 2019] systemd[1]: Detected architecture x86-64.
> [Tue Sep 10 12:25:59 2019] systemd[1]: Set hostname to .
> [Tue Sep 10 12:26:01 2019] systemd[1]: Started ntp-systemd-netif.path.
> [Tue Sep 10 12:26:01 2019] systemd[1]: Created slice System Slice.
> [Tue Sep 10 12:26:01 2019] systemd[1]: Listening on udev Kernel Socket.
> [Tue Sep 10 12:26:01 2019] systemd[1]: Created slice 
> system-serial\x2dgetty.slice.
> [Tue Sep 10 12:26:01 2019] systemd[1]: Listening on Journal Socket.
> [Tue Sep 10 12:26:01 2019] systemd[1]: Mounting POSIX Message Queue File 
> System...
> [Tue Sep 10 12:26:01 2019] RPC: Registered named UNIX socket transport module.
> [Tue Sep 10 12:26:01 2019] RPC: Registered udp transport module.
> [Tue Sep 10 12:26:01 2019] RPC: Registered tcp transport module.
> [Tue Sep 10 12:26:01 2019] RPC: Registered tcp NFSv4.1 backchannel transport 
> module.
> [Tue Sep 10 12:26:01 2019] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro
> [Tue Sep 10 12:26:01 2019] Loading iSCSI transport class v2.0-870.
> [Tue Sep 10 12:26:01 2019] iscsi: registered transport (tcp)
> [Tue Sep 10 12:26:01 2019] systemd-journald[497]: Received request to flush 
> runtime journal from PID 1
> [Tue Sep 10 12:26:01 2019] Installing knfsd (copyright (C) 1996 
> o...@monad.swb.de).
> [Tue Sep 10 12:26:01 2019] iscsi: registered transport (iser)
> [Tue Sep 10 12:26:01 2019] systemd-journald[497]: File 
> /var/log/journal/cef15a6d1b80c9fbcb31a3a65aec21ad/system.journal corrupted or 
> uncleanly shut down, renaming and replacing.
> [Tue Sep 10 12:26:04 2019] EXT4-fs (dm-1): mounted filesystem with ordered 
> data mode. Opts: (null)
> [Tue Sep 10 12:26:05 2019] EXT4-fs (xvda1): mounted filesystem with ordered 
> data mode. Opts: (null)
> [Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.659:2): 
> apparmor="STATUS" operation="profile_load" profile="unconfined" 
> name="/usr/bin/lxc-start" pid=902 comm="apparmor_parser"
> [Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.675:3): 
> apparmor="STATUS" operation="profile_load" profile="unconfined" 
> name="/usr/bin/man" pid=904 comm="apparmor_parser"
> [Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.675:4): 
> apparmor="STATUS" operation="profile_load" profile="unconfined" 
> name="man_filter" pid=904 comm="apparmor_parser"
> [Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.675:5): 
> apparmor="STATUS" operation="profile_load" profile="unconfined" 
> name="man_groff" pid=904 comm="apparmor_parser"
> [Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.687:6): 
> apparmor="STATUS" operation="profile_load" profile="unconfined" 
> name="lxc-container-default" pid=900 comm="apparmor_parser"
> [Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.687:7): 
> apparmor="STATUS" operation="profile_load" profile="unconfined" 
> name="lxc-container-default-cgns" pid=900 comm="apparmor_parser"
> [Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.687:8): 
> apparmor="STATUS" operation="profile_load" profile="unconfined" 
> name="lxc-container-default-with-mounting" pid=900 comm="apparmor_parser"
> [Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.687:9): 
> apparmor="STATUS" operation="profile_load" profile="unconfined" 
> name="lxc-container-default-with-nesting" pid=900 comm="apparmor_parser"
> [Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.723:10): 
> apparmor="STATUS" operation="profile_load" profile="unconfined" 
> name="/usr/lib/snapd/snap-confine" pid=905 comm="apparmor_parser"
> [Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.723:11): 
> apparmor="STATUS" operation="profile_load" profile="unconfined" 
> name="/usr/lib/snapd/snap-confine//mount-namespace-capture-helper" pid=905 
> comm="apparmor_parser"
> [Tue Sep 10 12:26:06 2019] new mount options do not match the existing 
> superblock, will be ignored
> [Tue Sep 10 12:26:09 2019] SGI XFS with ACLs, security attributes, realtime, 
> no debug enabled
> [Tue Sep 10 12:26:09 2019] XFS (nbd0): Mounting V5 Filesystem
> [Tue Sep 10 12:26:11 2019] XFS (nbd0): Starting recovery (logdev: internal)
> [Tue Sep 10 12:26:12 2019] XFS (nbd0): Ending recovery (logdev: internal)
> [Tue Sep 10 12:26:12 2019] NFSD: Using /var/lib/nfs/v4recov

[ceph-users] Ceph RBD Mirroring

2019-09-10 Thread Oliver Freyermuth

Dear Cephalopodians,

I have two questions about RBD mirroring.

1) I can not get it to work - my setup is:

   - One cluster holding the live RBD volumes and snapshots, in pool "rbd", cluster name 
"ceph",
 running latest Mimic.
 I ran "rbd mirror pool enable rbd pool" on that cluster and created a cephx user 
"rbd_mirror" with (is there a better way?):
 ceph auth get-or-create client.rbd_mirror mon 'allow r' osd 'allow 
class-read object_prefix rbd_children, allow pool rbd r' -o 
ceph.client.rbd_mirror.keyring --cluster ceph
 In that pool, two images have the journaling feature activated, all others 
have it disabled still (so I would expect these two to be mirrored).
 
   - Another (empty) cluster running latest Nautilus, cluster name "ceph", pool "rbd".

 I've used the dashboard to activate mirroring for the RBD pool, and then added a peer with 
cluster name "ceph-virt", cephx-ID "rbd_mirror", filled in the mons and key 
created above.
 I've then run:
 ceph auth get-or-create client.rbd_mirror_backup mon 'allow r' osd 'allow 
class-read object_prefix rbd_children, allow pool rbd rwx' -o 
client.rbd_mirror_backup.keyring --cluster ceph
 and deployed that key on the rbd-mirror machine, and started the service 
with:
 systemctl start ceph-rbd-mirror@rbd_mirror_backup.service

  After this, everything looks fine:
   # rbd mirror pool info
 Mode: pool
 Peers:
  UUID NAME  CLIENT
  XXX  ceph-virt client.rbd_mirror

  The service also seems to start fine, but logs show (debug rbd_mirror=20):

  rbd::mirror::ClusterWatcher:0x5575e2a7d390 resolve_peer_config_keys: 
retrieving config-key: pool_id=2, pool_name=rbd, peer_uuid=XXX
  rbd::mirror::Mirror: 0x5575e29c7240 update_pool_replayers: enter
  rbd::mirror::Mirror: 0x5575e29c7240 update_pool_replayers: restarting failed 
pool replayer for uuid: XXX cluster: ceph-virt client: client.rbd_mirror
  rbd::mirror::PoolReplayer: 0x5575e2a7da20 init: replaying for uuid: 
XXX cluster: ceph-virt client: client.rbd_mirror
  rbd::mirror::PoolReplayer: 0x5575e2a7da20 init_rados: error connecting to 
remote peer uuid: XXX cluster: ceph-virt client: client.rbd_mirror: 
(95) Operation not supported
  rbd::mirror::ServiceDaemon: 0x5575e29c8d70 add_or_update_callout: pool_id=2, 
callout_id=2, callout_level=error, text=unable to connect to remote cluster

I already tried storing the ceph.client.rbd_mirror.keyring (i.e. from the 
cluster with the live images) on the rbd-mirror machine explicitly (i.e. not 
only in mon config storage),
and after doing that:
 rbd -m mon_ip_of_ceph_virt_cluster --id=rbd_mirror ls
works fine. So it's not a connectivity issue. Maybe a permission issue? Or did 
I miss something?

Any idea what "operation not supported" means?
It's unclear to me whether things should work well using Mimic with Nautilus, 
and enabling pool mirroring but only having journaling on for two images is a 
supported case.

2) Since there is a performance drawback (about 2x) for journaling, is it also 
possible to only mirror snapshots, and leave the live volumes alone?
   This would cover the common backup usecase before deferred mirroring is 
implemented (or is it there already?).

Cheers and thanks in advance,
Oliver





Re: [ceph-users] Ceph RBD Mirroring

2019-09-10 Thread Jason Dillaman
On Tue, Sep 10, 2019 at 12:25 PM Oliver Freyermuth
 wrote:
>
> Dear Cephalopodians,
>
> I have two questions about RBD mirroring.
>
> 1) I can not get it to work - my setup is:
>
> - One cluster holding the live RBD volumes and snapshots, in pool "rbd", 
> cluster name "ceph",
>   running latest Mimic.
>   I ran "rbd mirror pool enable rbd pool" on that cluster and created a 
> cephx user "rbd_mirror" with (is there a better way?):
>   ceph auth get-or-create client.rbd_mirror mon 'allow r' osd 'allow 
> class-read object_prefix rbd_children, allow pool rbd r' -o 
> ceph.client.rbd_mirror.keyring --cluster ceph
>   In that pool, two images have the journaling feature activated, all 
> others have it disabled still (so I would expect these two to be mirrored).

You can just use "mon 'profile rbd' osd 'profile rbd'" for the caps --
but you definitely need more than read-only permissions to the remote
cluster since it needs to be able to create snapshots of remote images
and update/trim the image journals.
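
For example, something along these lines should do it (a sketch; adjust the 
user name to whatever you created):

    ceph auth caps client.rbd_mirror mon 'profile rbd' osd 'profile rbd'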

> - Another (empty) cluster running latest Nautilus, cluster name "ceph", 
> pool "rbd".
>   I've used the dashboard to activate mirroring for the RBD pool, and 
> then added a peer with cluster name "ceph-virt", cephx-ID "rbd_mirror", 
> filled in the mons and key created above.
>   I've then run:
>   ceph auth get-or-create client.rbd_mirror_backup mon 'allow r' osd 
> 'allow class-read object_prefix rbd_children, allow pool rbd rwx' -o 
> client.rbd_mirror_backup.keyring --cluster ceph
>   and deployed that key on the rbd-mirror machine, and started the 
> service with:

Please use "mon 'profile rbd-mirror' osd 'profile rbd'" for your caps [1].

>   systemctl start ceph-rbd-mirror@rbd_mirror_backup.service
>
>After this, everything looks fine:
> # rbd mirror pool info
>   Mode: pool
>   Peers:
>UUID NAME  CLIENT
>XXX  ceph-virt client.rbd_mirror
>
>The service also seems to start fine, but logs show (debug rbd_mirror=20):
>
>rbd::mirror::ClusterWatcher:0x5575e2a7d390 resolve_peer_config_keys: 
> retrieving config-key: pool_id=2, pool_name=rbd, peer_uuid=XXX
>rbd::mirror::Mirror: 0x5575e29c7240 update_pool_replayers: enter
>rbd::mirror::Mirror: 0x5575e29c7240 update_pool_replayers: restarting 
> failed pool replayer for uuid: XXX cluster: ceph-virt client: 
> client.rbd_mirror
>rbd::mirror::PoolReplayer: 0x5575e2a7da20 init: replaying for uuid: 
> XXX cluster: ceph-virt client: client.rbd_mirror
>rbd::mirror::PoolReplayer: 0x5575e2a7da20 init_rados: error connecting to 
> remote peer uuid: XXX cluster: ceph-virt client: client.rbd_mirror: 
> (95) Operation not supported
>rbd::mirror::ServiceDaemon: 0x5575e29c8d70 add_or_update_callout: 
> pool_id=2, callout_id=2, callout_level=error, text=unable to connect to 
> remote cluster

If it's still broken after fixing your caps above, perhaps increase
debugging for "rados", "monc", "auth", and "ms" to see if you can
determine the source of the op not supported error.

> I already tried storing the ceph.client.rbd_mirror.keyring (i.e. from the 
> cluster with the live images) on the rbd-mirror machine explicitly (i.e. not 
> only in mon config storage),
> and after doing that:
>   rbd -m mon_ip_of_ceph_virt_cluster --id=rbd_mirror ls
> works fine. So it's not a connectivity issue. Maybe a permission issue? Or 
> did I miss something?
>
> Any idea what "operation not supported" means?
> It's unclear to me whether things should work well using Mimic with Nautilus, 
> and enabling pool mirroring but only having journaling on for two images is a 
> supported case.

Yes and yes.

> 2) Since there is a performance drawback (about 2x) for journaling, is it 
> also possible to only mirror snapshots, and leave the live volumes alone?
> This would cover the common backup usecase before deferred mirroring is 
> implemented (or is it there already?).

This is in-development right now and will hopefully land for the
Octopus release.

> Cheers and thanks in advance,
> Oliver
>

[1] https://docs.ceph.com/docs/master/rbd/rbd-mirroring/#rbd-mirror-daemon

--
Jason


Re: [ceph-users] Ceph RBD Mirroring

2019-09-10 Thread Oliver Freyermuth
Dear Jason,

On 2019-09-10 18:50, Jason Dillaman wrote:
> On Tue, Sep 10, 2019 at 12:25 PM Oliver Freyermuth
>  wrote:
>>
>> Dear Cephalopodians,
>>
>> I have two questions about RBD mirroring.
>>
>> 1) I can not get it to work - my setup is:
>>
>> - One cluster holding the live RBD volumes and snapshots, in pool "rbd", 
>> cluster name "ceph",
>>   running latest Mimic.
>>   I ran "rbd mirror pool enable rbd pool" on that cluster and created a 
>> cephx user "rbd_mirror" with (is there a better way?):
>>   ceph auth get-or-create client.rbd_mirror mon 'allow r' osd 'allow 
>> class-read object_prefix rbd_children, allow pool rbd r' -o 
>> ceph.client.rbd_mirror.keyring --cluster ceph
>>   In that pool, two images have the journaling feature activated, all 
>> others have it disabled still (so I would expect these two to be mirrored).
> 
> You can just use "mon 'profile rbd' osd 'profile rbd'" for the caps --
> but you definitely need more than read-only permissions to the remote
> cluster since it needs to be able to create snapshots of remote images
> and update/trim the image journals.

these profiles really make life a lot easier. I should have thought of them 
rather than "guessing" a potentially good configuration... 

> 
>> - Another (empty) cluster running latest Nautilus, cluster name "ceph", 
>> pool "rbd".
>>   I've used the dashboard to activate mirroring for the RBD pool, and 
>> then added a peer with cluster name "ceph-virt", cephx-ID "rbd_mirror", 
>> filled in the mons and key created above.
>>   I've then run:
>>   ceph auth get-or-create client.rbd_mirror_backup mon 'allow r' osd 
>> 'allow class-read object_prefix rbd_children, allow pool rbd rwx' -o 
>> client.rbd_mirror_backup.keyring --cluster ceph
>>   and deployed that key on the rbd-mirror machine, and started the 
>> service with:
> 
> Please use "mon 'profile rbd-mirror' osd 'profile rbd'" for your caps [1].

That did the trick (in combination with the above)! 
Again a case of PEBKAC: I should have read the documentation until the end, 
clearly my fault. 

It works well now, even though it seems to run a bit slow (~35 MB/s for the 
initial sync when everything is 1 GBit/s), 
but that may also be caused by a combination of some very limited hardware on 
the receiving end (which will be scaled up in the future). 
A single host with 6 disks, replica 3 and a RAID controller which can only do 
RAID0 and not JBOD is certainly not ideal, so commit latency may explain the 
low bandwidth.

> 
>>   systemctl start ceph-rbd-mirror@rbd_mirror_backup.service
>>
>>After this, everything looks fine:
>> # rbd mirror pool info
>>   Mode: pool
>>   Peers:
>>UUID NAME  CLIENT
>>XXX  ceph-virt client.rbd_mirror
>>
>>The service also seems to start fine, but logs show (debug rbd_mirror=20):
>>
>>rbd::mirror::ClusterWatcher:0x5575e2a7d390 resolve_peer_config_keys: 
>> retrieving config-key: pool_id=2, pool_name=rbd, peer_uuid=XXX
>>rbd::mirror::Mirror: 0x5575e29c7240 update_pool_replayers: enter
>>rbd::mirror::Mirror: 0x5575e29c7240 update_pool_replayers: restarting 
>> failed pool replayer for uuid: XXX cluster: ceph-virt client: 
>> client.rbd_mirror
>>rbd::mirror::PoolReplayer: 0x5575e2a7da20 init: replaying for uuid: 
>> XXX cluster: ceph-virt client: client.rbd_mirror
>>rbd::mirror::PoolReplayer: 0x5575e2a7da20 init_rados: error connecting to 
>> remote peer uuid: XXX cluster: ceph-virt client: client.rbd_mirror: 
>> (95) Operation not supported
>>rbd::mirror::ServiceDaemon: 0x5575e29c8d70 add_or_update_callout: 
>> pool_id=2, callout_id=2, callout_level=error, text=unable to connect to 
>> remote cluster
> 
> If it's still broken after fixing your caps above, perhaps increase
> debugging for "rados", "monc", "auth", and "ms" to see if you can
> determine the source of the op not supported error.
> 
>> I already tried storing the ceph.client.rbd_mirror.keyring (i.e. from the 
>> cluster with the live images) on the rbd-mirror machine explicitly (i.e. not 
>> only in mon config storage),
>> and after doing that:
>>   rbd -m mon_ip_of_ceph_virt_cluster --id=rbd_mirror ls
>> works fine. So it's not a connectivity issue. Maybe a permission issue? Or 
>> did I miss something?
>>
>> Any idea what "operation not supported" means?
>> It's unclear to me whether things should work well using Mimic with 
>> Nautilus, and enabling pool mirroring but only having journaling on for two 
>> images is a supported case.
> 
> Yes and yes.
> 
>> 2) Since there is a performance drawback (about 2x) for journaling, is it 
>> also possible to only mirror snapshots, and leave the live volumes alone?
>> This would cover the common backup usecase before deferred mirroring is 
>> implemented (or is it there already?).
> 
> This is in-development right now and will hopef

Re: [ceph-users] Ceph RBD Mirroring

2019-09-10 Thread Jason Dillaman
On Tue, Sep 10, 2019 at 2:08 PM Oliver Freyermuth
 wrote:
>
> Dear Jason,
>
> On 2019-09-10 18:50, Jason Dillaman wrote:
> > On Tue, Sep 10, 2019 at 12:25 PM Oliver Freyermuth
> >  wrote:
> >>
> >> Dear Cephalopodians,
> >>
> >> I have two questions about RBD mirroring.
> >>
> >> 1) I can not get it to work - my setup is:
> >>
> >> - One cluster holding the live RBD volumes and snapshots, in pool 
> >> "rbd", cluster name "ceph",
> >>   running latest Mimic.
> >>   I ran "rbd mirror pool enable rbd pool" on that cluster and created 
> >> a cephx user "rbd_mirror" with (is there a better way?):
> >>   ceph auth get-or-create client.rbd_mirror mon 'allow r' osd 'allow 
> >> class-read object_prefix rbd_children, allow pool rbd r' -o 
> >> ceph.client.rbd_mirror.keyring --cluster ceph
> >>   In that pool, two images have the journaling feature activated, all 
> >> others have it disabled still (so I would expect these two to be mirrored).
> >
> > You can just use "mon 'profile rbd' osd 'profile rbd'" for the caps --
> > but you definitely need more than read-only permissions to the remote
> > cluster since it needs to be able to create snapshots of remote images
> > and update/trim the image journals.
>
> these profiles really make life a lot easier. I should have thought of them 
> rather than "guessing" a potentially good configuration...
>
> >
> >> - Another (empty) cluster running latest Nautilus, cluster name 
> >> "ceph", pool "rbd".
> >>   I've used the dashboard to activate mirroring for the RBD pool, and 
> >> then added a peer with cluster name "ceph-virt", cephx-ID "rbd_mirror", 
> >> filled in the mons and key created above.
> >>   I've then run:
> >>   ceph auth get-or-create client.rbd_mirror_backup mon 'allow r' osd 
> >> 'allow class-read object_prefix rbd_children, allow pool rbd rwx' -o 
> >> client.rbd_mirror_backup.keyring --cluster ceph
> >>   and deployed that key on the rbd-mirror machine, and started the 
> >> service with:
> >
> > Please use "mon 'profile rbd-mirror' osd 'profile rbd'" for your caps [1].
>
> That did the trick (in combination with the above)!
> Again a case of PEBKAC: I should have read the documentation until the end, 
> clearly my fault.
>
> It works well now, even though it seems to run a bit slow (~35 MB/s for the 
> initial sync when everything is 1 GBit/s),
> but that may also be caused by combination of some very limited hardware on 
> the receiving end (which will be scaled up in the future).
> A single host with 6 disks, replica 3 and a RAID controller which can only do 
> RAID0 and not JBOD is certainly not ideal, so commit latency may cause this 
> slow bandwidth.

You could try increasing "rbd_concurrent_management_ops" from the
default of 10 ops to something higher to attempt to account for the
latency. However, I wouldn't expect near-line speed w/ RBD mirroring.
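
e.g. something like this on the rbd-mirror host (value purely illustrative):

    ceph config set client rbd_concurrent_management_ops 20

or put it in the [client] section of that host's ceph.conf.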

> >
> >>   systemctl start ceph-rbd-mirror@rbd_mirror_backup.service
> >>
> >>After this, everything looks fine:
> >> # rbd mirror pool info
> >>   Mode: pool
> >>   Peers:
> >>UUID NAME  CLIENT
> >>XXX  ceph-virt client.rbd_mirror
> >>
> >>The service also seems to start fine, but logs show (debug 
> >> rbd_mirror=20):
> >>
> >>rbd::mirror::ClusterWatcher:0x5575e2a7d390 resolve_peer_config_keys: 
> >> retrieving config-key: pool_id=2, pool_name=rbd, peer_uuid=XXX
> >>rbd::mirror::Mirror: 0x5575e29c7240 update_pool_replayers: enter
> >>rbd::mirror::Mirror: 0x5575e29c7240 update_pool_replayers: restarting 
> >> failed pool replayer for uuid: XXX cluster: ceph-virt client: 
> >> client.rbd_mirror
> >>rbd::mirror::PoolReplayer: 0x5575e2a7da20 init: replaying for uuid: 
> >> XXX cluster: ceph-virt client: client.rbd_mirror
> >>rbd::mirror::PoolReplayer: 0x5575e2a7da20 init_rados: error connecting 
> >> to remote peer uuid: XXX cluster: ceph-virt client: 
> >> client.rbd_mirror: (95) Operation not supported
> >>rbd::mirror::ServiceDaemon: 0x5575e29c8d70 add_or_update_callout: 
> >> pool_id=2, callout_id=2, callout_level=error, text=unable to connect to 
> >> remote cluster
> >
> > If it's still broken after fixing your caps above, perhaps increase
> > debugging for "rados", "monc", "auth", and "ms" to see if you can
> > determine the source of the op not supported error.
> >
> >> I already tried storing the ceph.client.rbd_mirror.keyring (i.e. from the 
> >> cluster with the live images) on the rbd-mirror machine explicitly (i.e. 
> >> not only in mon config storage),
> >> and after doing that:
> >>   rbd -m mon_ip_of_ceph_virt_cluster --id=rbd_mirror ls
> >> works fine. So it's not a connectivity issue. Maybe a permission issue? Or 
> >> did I miss something?
> >>
> >> Any idea what "operation not supported" means?
> >> It's unclear to me whether things should work well using Mimic with 
> >> Nautil

Re: [ceph-users] Ceph RBD Mirroring

2019-09-10 Thread Oliver Freyermuth
Dear Jason,

On 2019-09-10 23:04, Jason Dillaman wrote:
> On Tue, Sep 10, 2019 at 2:08 PM Oliver Freyermuth
>  wrote:
>>
>> Dear Jason,
>>
>> On 2019-09-10 18:50, Jason Dillaman wrote:
>>> On Tue, Sep 10, 2019 at 12:25 PM Oliver Freyermuth
>>>  wrote:

 Dear Cephalopodians,

 I have two questions about RBD mirroring.

 1) I can not get it to work - my setup is:

 - One cluster holding the live RBD volumes and snapshots, in pool 
 "rbd", cluster name "ceph",
   running latest Mimic.
   I ran "rbd mirror pool enable rbd pool" on that cluster and created 
 a cephx user "rbd_mirror" with (is there a better way?):
   ceph auth get-or-create client.rbd_mirror mon 'allow r' osd 'allow 
 class-read object_prefix rbd_children, allow pool rbd r' -o 
 ceph.client.rbd_mirror.keyring --cluster ceph
   In that pool, two images have the journaling feature activated, all 
 others have it disabled still (so I would expect these two to be mirrored).
>>>
>>> You can just use "mon 'profile rbd' osd 'profile rbd'" for the caps --
>>> but you definitely need more than read-only permissions to the remote
>>> cluster since it needs to be able to create snapshots of remote images
>>> and update/trim the image journals.
>>
>> these profiles really make life a lot easier. I should have thought of them 
>> rather than "guessing" a potentially good configuration...
>>
>>>
 - Another (empty) cluster running latest Nautilus, cluster name 
 "ceph", pool "rbd".
   I've used the dashboard to activate mirroring for the RBD pool, and 
 then added a peer with cluster name "ceph-virt", cephx-ID "rbd_mirror", 
 filled in the mons and key created above.
   I've then run:
   ceph auth get-or-create client.rbd_mirror_backup mon 'allow r' osd 
 'allow class-read object_prefix rbd_children, allow pool rbd rwx' -o 
 client.rbd_mirror_backup.keyring --cluster ceph
   and deployed that key on the rbd-mirror machine, and started the 
 service with:
>>>
>>> Please use "mon 'profile rbd-mirror' osd 'profile rbd'" for your caps [1].
>>
>> That did the trick (in combination with the above)!
>> Again a case of PEBKAC: I should have read the documentation until the end, 
>> clearly my fault.
>>
>> It works well now, even though it seems to run a bit slow (~35 MB/s for the 
>> initial sync when everything is 1 GBit/s),
>> but that may also be caused by combination of some very limited hardware on 
>> the receiving end (which will be scaled up in the future).
>> A single host with 6 disks, replica 3 and a RAID controller which can only 
>> do RAID0 and not JBOD is certainly not ideal, so commit latency may cause 
>> this slow bandwidth.
> 
> You could try increasing "rbd_concurrent_management_ops" from the
> default of 10 ops to something higher to attempt to account for the
> latency. However, I wouldn't expect near-line speed w/ RBD mirroring.

Thanks - I will play with this option once we have more storage available in 
the target pool ;-). 

> 
>>>
   systemctl start ceph-rbd-mirror@rbd_mirror_backup.service

After this, everything looks fine:
 # rbd mirror pool info
   Mode: pool
   Peers:
UUID NAME  CLIENT
XXX  ceph-virt client.rbd_mirror

The service also seems to start fine, but logs show (debug 
 rbd_mirror=20):

rbd::mirror::ClusterWatcher:0x5575e2a7d390 resolve_peer_config_keys: 
 retrieving config-key: pool_id=2, pool_name=rbd, peer_uuid=XXX
rbd::mirror::Mirror: 0x5575e29c7240 update_pool_replayers: enter
rbd::mirror::Mirror: 0x5575e29c7240 update_pool_replayers: restarting 
 failed pool replayer for uuid: XXX cluster: ceph-virt client: 
 client.rbd_mirror
rbd::mirror::PoolReplayer: 0x5575e2a7da20 init: replaying for uuid: 
 XXX cluster: ceph-virt client: client.rbd_mirror
rbd::mirror::PoolReplayer: 0x5575e2a7da20 init_rados: error connecting 
 to remote peer uuid: XXX cluster: ceph-virt client: 
 client.rbd_mirror: (95) Operation not supported
rbd::mirror::ServiceDaemon: 0x5575e29c8d70 add_or_update_callout: 
 pool_id=2, callout_id=2, callout_level=error, text=unable to connect to 
 remote cluster
>>>
>>> If it's still broken after fixing your caps above, perhaps increase
>>> debugging for "rados", "monc", "auth", and "ms" to see if you can
>>> determine the source of the op not supported error.
>>>
 I already tried storing the ceph.client.rbd_mirror.keyring (i.e. from the 
 cluster with the live images) on the rbd-mirror machine explicitly (i.e. 
 not only in mon config storage),
 and after doing that:
   rbd -m mon_ip_of_ceph_virt_cluster --id=rbd_mirror ls
 works fine. So it's not a connectivity issue. Maybe a

[ceph-users] Using same name for rgw / beast web front end

2019-09-10 Thread Eric Choi
Hi there, we have been using Ceph for a few years now, and it's only now that
I've noticed we have been using the same name for all RGW hosts. As a result,
when you run ceph -s you see:

rgw: 1 daemon active (..)

despite having more than 10 RGW hosts.

* What are the side effects of doing this? Is this a no-no? I can see how
the metrics (ceph daemon ... perf dump) could be wrong; are the metrics
tracked independently per host?

* My second question (maybe this should be a separate email!) is about the
comparison between Beast and Civetweb.  We only recently upgraded to
Nautilus, so Beast became available as an option to us.  I couldn't find
any blogs / docs comparing these two frontends.  Is there any recommended
reading, or could someone give me an overview?
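
(For context, the kind of per-gateway change we'd be making is basically just
the frontend line in each instance's ceph.conf section; section name and port
below are made up:)

    [client.rgw.host01]
    rgw_frontends = "beast port=7480"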

Much appreciated!



-- 

Eric Choi
Senior Software Engineer 2 | Core Platform