[ceph-users] Backup & Restore?

2014-04-02 Thread Robert Sander
Hi, what are the options to consistently back up and restore data out of a ceph cluster? - RBDs can be snapshotted. - Data on RBDs used inside VMs can be backed up using tools from the guest. - CephFS data can be backed up using rsync or similar tools. What about object data in other pools? Ther
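For the RBD case, a minimal snapshot-and-export sketch (the pool and image names are placeholders, and the guest should be quiesced first if possible):

    # freeze or flush the guest, then take a point-in-time snapshot
    rbd snap create rbd/vm-disk@backup-20140402
    # copy the snapshot out of the cluster
    rbd export rbd/vm-disk@backup-20140402 /backups/vm-disk-20140402.img
    # drop the snapshot once the export has been verified
    rbd snap rm rbd/vm-disk@backup-20140402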

Re: [ceph-users] rbd map error - numerical result out of range

2014-04-02 Thread Tom
Hi again Ilya, No, no snapshots in this case. It's a brand new RBD that I've created. Cheers. Tom. On 01/04/14 16:08, Ilya Dryomov wrote: On Tue, Apr 1, 2014 at 6:55 PM, Tom wrote: Thanks for the reply. Ceph is version 0.73-1precise, and the kernel release is 3.11.9-031109-generic. also

Re: [ceph-users] Backup & Restore?

2014-04-02 Thread Karan Singh
Hi Robert, Thanks for raising this question; backup and restore options have always been interesting to discuss. I too have a connected question for Inktank. —> Is there any work going on to support Ceph clusters being backed up by the enterprise *proprietary* backup solutions available today ***

Re: [ceph-users] rbd map error - numerical result out of range

2014-04-02 Thread Ilya Dryomov
On Wed, Apr 2, 2014 at 11:28 AM, Tom wrote: > Hi again Ilya, > > No, no snapshots in this case. It's a brand new RBD that I've created. Is this format 1 or format 2 image? Can you paste the contents of /proc/devices and /proc/partitions somewhere? Also, can you unmap a few of those 15 images t
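For anyone following the thread, the image format and the current kernel mappings can be checked with something along these lines (pool and image names are placeholders):

    # shows whether the image is format 1 or format 2, plus order and features
    rbd info rbd/test-image
    # list images currently mapped through the kernel client
    rbd showmapped
    # unmap one of them, as Ilya suggests
    rbd unmap /dev/rbd0
    # kernel-side tables Ilya asked for
    cat /proc/devices
    cat /proc/partitions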

[ceph-users] OpenStack + Ceph Integration

2014-04-02 Thread Tomokazu HIRAI
I integrated Ceph + OpenStack following this document: https://ceph.com/docs/master/rbd/rbd-openstack/ I could upload an image to Glance on the Ceph cluster, but I cannot create any volume with Cinder. The error messages are the same as in this URL: http://comments.gmane.org/gmane.comp.file-systems.ceph.user/764

Re: [ceph-users] radosgw multipart-uploaded downloads fail

2014-04-02 Thread Benedikt Fraunhofer
Hi Yehuda, I tried your patch and it feels fine, except you might need some special handling for those already corrupt uploads, as trying to delete them gets radosgw into an endless loop and high cpu usage: 2014-04-02 11:03:15.045627 7fbf157d2700 0 RGWObjManifest::operator++(): result: ofs=3355443

Re: [ceph-users] ceph 0.78 mon and mds crashing (bus error)

2014-04-02 Thread Kenneth Waegeman
- Message from Gregory Farnum - Date: Tue, 1 Apr 2014 09:03:17 -0700 From: Gregory Farnum Subject: Re: [ceph-users] ceph 0.78 mon and mds crashing (bus error) To: "Yan, Zheng" Cc: Kenneth Waegeman , ceph-users On Tue, Apr 1, 2014 at 7:12 AM, Yan, Zheng wrote

Re: [ceph-users] ceph 0.78 mon and mds crashing (bus error)

2014-04-02 Thread Gregory Farnum
Hmm. I guess I'd look at: 1) How big and what shape the filesystem is. Do you have some extremely large directory that the MDS keeps trying to load and then dump? 2) Use tcmalloc's heap analyzer to see where all the memory is being allocated. 3) Look through the logs for when the beacon fails (the

Re: [ceph-users] MDS crash when client goes to sleep

2014-04-02 Thread Gregory Farnum
A *clean* shutdown? That sounds like a different issue; hjcho616's issue only happens when a client wakes back up again. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Wed, Apr 2, 2014 at 6:34 AM, Florent B wrote: > Can someone confirm that this issue is also in Emperor re

Re: [ceph-users] radosgw multipart-uploaded downloads fail

2014-04-02 Thread Yehuda Sadeh
On Wed, Apr 2, 2014 at 2:08 AM, Benedikt Fraunhofer wrote: > Hi Yehuda, > > i tried your patch and it feels fine, > except you might need some special handling for those already corrupt uploads, > as trying to delete them gets radosgw in an endless loop and high cpu usage: The problem was with th

Re: [ceph-users] Setting root directory in fstab with Fuse

2014-04-02 Thread Gregory Farnum
It's been a while, but I think you need to use the long form "client_mountpoint" config option here instead. If you search the list archives it'll probably turn up; this is basically the only reason we ever discuss "-r". ;) Software Engineer #42 @ http://inktank.com | http://ceph.com On Wed, Apr
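If that is the option Greg means, a sketch of what it might look like in ceph.conf (the section and the path are assumptions, not something verified against this setup):

    [client]
        # long-form equivalent of ceph-fuse -r /some/subdir
        client_mountpoint = /some/subdir

With that in place, the fuse.ceph fstab entry should not need an -r-style argument at all.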

Re: [ceph-users] ceph 0.78 mon and mds crashing (bus error)

2014-04-02 Thread Stijn De Weirdt
Hi Gregory, (I'm a colleague of Kenneth) 1) How big and what shape the filesystem is. Do you have some extremely large directory that the MDS keeps trying to load and then dump? Any way to extract this from the MDS without having to start it? As it was an rsync operation, I can try to locate po

Re: [ceph-users] cephx key for CephFS access only

2014-04-02 Thread Travis Rhoden
Thanks for the response Greg. Unfortunately, I appear to be missing something. If I use my "cephfs" key with these perms: client.cephfs key: caps: [mds] allow rwx caps: [mon] allow r caps: [osd] allow rwx pool=data This is what happens when I mount: # ceph-fuse -k /etc/ceph/ce

Re: [ceph-users] cephx key for CephFS access only

2014-04-02 Thread Gregory Farnum
Hrm, I don't remember. Let me know which permutation works and we can dig into it. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Wed, Apr 2, 2014 at 9:00 AM, Travis Rhoden wrote: > Thanks for the response Greg. > > Unfortunately, I appear to be missing something. If I us

Re: [ceph-users] cephx key for CephFS access only

2014-04-02 Thread Travis Rhoden
Ah, I figured it out. My original key worked, but I needed to use the --id option with ceph-fuse to tell it to use the cephfs user rather than the admin user. Tailing the log on my monitor pointed out that it was logging in with client.admin, but providing the key for client.cephfs. So, final wo
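So the working invocation looks roughly like this (the monitor address and mount point are placeholders):

    # mount as client.cephfs instead of the default client.admin
    ceph-fuse --id cephfs -k /etc/ceph/ceph.client.cephfs.keyring -m mon-host:6789 /mnt/cephfs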

Re: [ceph-users] ceph 0.78 mon and mds crashing (bus error)

2014-04-02 Thread Stijn De Weirdt
Hi, 1) How big and what shape the filesystem is. Do you have some extremely large directory that the MDS keeps trying to load and then dump? Any way to extract this from the MDS without having to start it? As it was an rsync operation, I can try to locate possible candidates on the source filesy

Re: [ceph-users] ceph 0.78 mon and mds crashing (bus error)

2014-04-02 Thread Gregory Farnum
Did you see http://ceph.com/docs/master/rados/troubleshooting/memory-profiling? That should have what you need to get started, although you'll also need to learn the basics of using the heap analysis tools elsewhere. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Wed, Apr 2
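The commands on that page, adapted here for an MDS (the daemon id "a" and the dump path are placeholders; a sketch to get started, not verified against 0.78):

    ceph tell mds.a heap start_profiler   # begin writing heap profiles
    ceph tell mds.a heap dump             # dump the current heap usage
    ceph tell mds.a heap stats            # print tcmalloc's view of memory
    ceph tell mds.a heap stop_profiler
    # analyse a dump against the daemon binary with pprof from gperftools
    pprof --text /usr/bin/ceph-mds /var/log/ceph/mds.a.profile.0001.heap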

Re: [ceph-users] Multi-site Implementation

2014-04-02 Thread Craig Lewis
I assume you're talking about "Option Two: MULTI-SITE OBJECT STORAGE WITH FEDERATED GATEWAYS", from Inktank's http://info.inktank.com/multisite_options_with_inktank_ceph_enterprise There are still some options. Each zone has a master and one (or more) replicas. You can only write to the ma

[ceph-users] write speed issue on RBD image

2014-04-02 Thread Russell E. Glaue
Can someone recommend some testing I can do to further investigate why this slow-disk-write issue in the VM OS is occurring? It seems the issue, details below, is perhaps related to the VM OS running on the RADOS images in Ceph. Issue: I have a handful (like 10) of VMs running that, when

Re: [ceph-users] OpenStack + Ceph Integration

2014-04-02 Thread Sebastien Han
The section should be [client.keyring] with a keyring = entry pointing at the key file, then restart cinder-volume afterwards. Sébastien Han Cloud Engineer "Always give 100%. Unless you're giving blood.” Phone: +33 (0)1 49 70 99 72 Mail: sebastien@enovance.com Address : 11 bis, rue Roquépine - 75008 Paris Web : www.eno
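A sketch of the kind of ceph.conf stanza being referred to, assuming the Cinder cephx user is called "volumes" (the user name and path are assumptions; match them to rbd_user in cinder.conf):

    [client.volumes]
        keyring = /etc/ceph/ceph.client.volumes.keyring

Restart cinder-volume afterwards so it re-reads the configuration.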

Re: [ceph-users] write speed issue on RBD image

2014-04-02 Thread Russell E. Glaue
Correction: When I wrote "Here I provide the test results of two VMs that are running on the same Ceph host, using disk images from the same ceph pool, and were cloned from the same RADOS snapshot." I really meant: "Here I provide the test results of two VMs that are running on the same Ceph hos

Re: [ceph-users] ceph 0.78 mon and mds crashing (bus error)

2014-04-02 Thread Stijn De Weirdt
Wow, kudos for integrating this in Ceph. More projects should do it like that! Anyway, attached is a gzipped ps file. Heap is at 4.4GB, top reports 6.5GB mem usage. Care to point out what to look for? I'll send a new one when the usage starts to cause swapping. Thanks, Stijn On 0

Re: [ceph-users] Backup & Restore?

2014-04-02 Thread Craig Lewis
The short answer is "no". The longer answer is "it depends". The most concise discussion I've seen is Inktank's Multi-site option whitepaper: http://info.inktank.com/multisite_options_with_inktank_ceph_enterprise That white paper only addresses RBD backups (using snapshots) and RadosGW backu
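For the RBD side, an incremental snapshot-based scheme along the lines of that whitepaper might look like this (pool, image, and snapshot names are placeholders):

    # initial full copy from a snapshot
    rbd snap create rbd/vm-disk@base
    rbd export rbd/vm-disk@base /backups/vm-disk-base.img
    # later: ship only the blocks that changed since @base
    rbd snap create rbd/vm-disk@daily-1
    rbd export-diff --from-snap base rbd/vm-disk@daily-1 /backups/vm-disk-base-to-daily-1.diff
    # the diff can be replayed onto a copy elsewhere with rbd import-diff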

[ceph-users] CentOS radosgw-agent cannot be installed

2014-04-02 Thread Georgios Dimitrakakis
When I try to install the CentOS radosgw-agent on a CentOS 6.5 machine I get the following: [root@ceph1 centos]# yum install radosgw-agent Loaded plugins: fastestmirror, priorities Loading mirror speeds from cached hostfile * base: ftp.riken.jp * epel: ftp.jaist.ac.jp * extras: ftp.riken.jp
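One thing worth checking is whether any of the configured repos carry radosgw-agent at all; a repo file along these lines may help, but treat the release name and URLs below as assumptions to verify against ceph.com for the release in use:

    # /etc/yum.repos.d/ceph-noarch.repo  (hypothetical example)
    [ceph-noarch]
    name=Ceph noarch packages
    baseurl=http://ceph.com/rpm-emperor/el6/noarch
    enabled=1
    gpgcheck=1
    gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc

Then run yum clean all and retry yum install radosgw-agent.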

[ceph-users] Cancel a scrub?

2014-04-02 Thread Craig Lewis
Is there any way to cancel a scrub on a PG? I have an OSD that's recovering, and there's a single PG left waiting: 2014-04-02 13:15:39.868994 mon.0 [INF] pgmap v5322756: 2592 pgs: 2589 active+clean, 1 active+recovery_wait, 2 active+clean+scrubbing+deep; 15066 GB data, 30527 GB used, 29061 GB /

Re: [ceph-users] Cancel a scrub?

2014-04-02 Thread Sage Weil
On Wed, 2 Apr 2014, Craig Lewis wrote: > Is there any way to cancel a scrub on a PG? > > > I have an OSD that's recovering, and there's a single PG left waiting: > 2014-04-02 13:15:39.868994 mon.0 [INF] pgmap v5322756: 2592 pgs: 2589 > active+clean, 1 active+recovery_wait, 2 active+clean+scrubbin
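For reference, the scrub flags mentioned in this thread; they stop new scrubs from being scheduled, and (as Craig notes below) an in-flight scrub only goes away when the primary OSD restarts:

    ceph osd set noscrub
    ceph osd set nodeep-scrub
    # ... let recovery finish, then re-enable scrubbing
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub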

Re: [ceph-users] write speed issue on RBD image

2014-04-02 Thread German Anders
Did you try those dd statements with oflag=direct? Like: dd if=/dev/zero of=disk-test bs=1048576 count=512 oflag=direct; dd if=disk-test of=/dev/null bs=1048576 oflag=direct; /bin/rm disk-test That way you are bypassing the host cache and waiting for the ACK to first go straight to the
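One nit on the read pass: oflag=direct only applies to the output file, so when reading the test file back it is iflag=direct that bypasses the page cache. A corrected pair would be:

    # write 512 MiB with O_DIRECT on the output file
    dd if=/dev/zero of=disk-test bs=1048576 count=512 oflag=direct
    # read it back with O_DIRECT on the input file
    dd if=disk-test of=/dev/null bs=1048576 iflag=direct
    rm -f disk-test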

Re: [ceph-users] ceph 0.78 mon and mds crashing (bus error)

2014-04-02 Thread Stijn De Weirdt
An updated version with 20GB heap (top reports 24GB used). Would starting more MDS daemons and setting them active help? Is there any guideline paper wrt the impact of multiple active MDS daemons and the number of files in the tree? Or some rule of thumb wrt memory, number of active MDS daemons and files/directories i

Re: [ceph-users] Cancel a scrub?

2014-04-02 Thread Craig Lewis
Thanks! I knew about noscrub, but I didn't realize that the flapping would cancel a scrub in progress. So the scrub doesn't appear to be the reason it wasn't recovering. After a flap, it goes into: 2014-04-02 14:11:09.776810 mon.0 [INF] pgmap v5323181: 2592 pgs: 2591 active+clean, 1 active

[ceph-users] Cleaning up; data usage, snap-shots, auth users

2014-04-02 Thread Jonathan Gowar
Hi, I have a small 8TB testing cluster. During testing I've used 94G. But I have since removed pools and images from Ceph, so I shouldn't be using any space, yet the 94G usage remains. How can I reclaim the old used space? Also, this: ceph@ceph-admin:~$ rbd rm 6fa36869-4afe-485a-90a3-93fba

[ceph-users] heartbeat_map is_healthy had timed out after 15

2014-04-02 Thread Craig Lewis
I'm seeing one OSD spamming its log with 2014-04-02 16:49:21.547339 7f5cc6c5d700 1 heartbeat_map is_healthy 'OSD::op_tp thread 0x7f5cc3456700' had timed out after 15 It starts about 30 seconds after the OSD daemon is started. It continues until 2014-04-02 16:48:57.526925 7f0e5a683700 1 hea

Re: [ceph-users] write speed issue on RBD image

2014-04-02 Thread German Anders
So the real 'fast' performance was 100MB/s? Or did you get some improved numbers? I'm looking for a cluster that could provide at least 600-700MB/s of throughput per thread. Could you try these dd runs and see what the results are?: dd if=/dev/zero of=./$RANDOM bs=4k count=22 oflag=direct dd

Re: [ceph-users] ceph 0.78 mon and mds crashing (bus error)

2014-04-02 Thread Yan, Zheng
Which version of the kernel client did you use? Please send out the contents of the client node's /sys/kernel/debug/ceph/*/caps when the MDS uses lots of memory. Regards Yan, Zheng On Thu, Apr 3, 2014 at 2:58 AM, Stijn De Weirdt wrote: > wow. kudos for integrating this in ceph. more projects should do it like >

Re: [ceph-users] Cleaning up; data usage, snap-shots, auth users

2014-04-02 Thread Jean-Charles Lopez
Hi, From what is pasted, your remove failed, so make sure you purge the snapshots and then remove the rbd image. For the user removal, as explained in www.ceph.com/docs or ceph auth help, just issue ceph auth del {user}. JC On Wednesday, April 2, 2014, Jonathan Gowar wrote: > Hi, > >I have a small 8TB test
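In command form (pool, image, and user names are placeholders):

    # remove every snapshot of the image, then the image itself
    rbd snap purge <pool>/<image>
    rbd rm <pool>/<image>
    # remove a cephx user
    ceph auth del client.<name>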

[ceph-users] out then rm / just rm an OSD?

2014-04-02 Thread Chad William Seys
Hi All, Slide 19 of Ceph at CERN presentation http://www.slideshare.net/Inktank_Ceph/scaling-ceph-at-cern says that when removing an OSD from Ceph it is faster to just "ceph osd crush rm " rather than marking the osd as "out", waiting for data migration, and then "rm" the OSD. The reason the
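For context, the conventional removal sequence looks roughly like this (osd.12 is a placeholder); the slide's argument, as I read it, is that marking the OSD out and waiting triggers one rebalance and removing it from CRUSH afterwards triggers a second, whereas going straight to the CRUSH removal re-places the data only once:

    ceph osd out 12                  # the pre-drain step the slide says you can skip
    service ceph stop osd.12         # stop the daemon on its host (init syntax varies by distro)
    ceph osd crush remove osd.12     # remove from the CRUSH map; data re-places here
    ceph auth del osd.12             # drop its cephx key
    ceph osd rm 12                   # remove it from the OSD map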

Re: [ceph-users] Largest Production Ceph Cluster

2014-04-02 Thread Christian Balzer
On Tue, 1 Apr 2014 14:18:51 + Dan Van Der Ster wrote: [snip] > > > > http://www.slideshare.net/Inktank_Ceph/scaling-ceph-at-cern > > [snap] In that slide it says that replacing failed OSDs is automated via puppet. I'm very curious on how exactly this happens, as in: Do you just fire up a "sp

Re: [ceph-users] ceph 0.78 mon and mds crashing (bus error)

2014-04-02 Thread Stijn De Weirdt
Hi, latest pprof output attached. This is not a kernel client; this is ceph-fuse on EL6. Starting the MDS without any ceph-fuse mounts works without issue. Mounting ceph-fuse afterwards also works fine. Simple filesystem operations work as expected. We'll check the state of the fuse mount via f