Hi,
what are the options for consistently backing up and restoring
data out of a Ceph cluster?
- RBDs can be snapshotted.
- Data on RBDs used inside VMs can be backed up using tools from the guest.
- CephFS data can be backed up using rsync or similar tools.
What about object data in other pools?
Ther
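For the RBD and plain-pool cases above, a rough command-line sketch (the pool, image,
and path names here are made up, and the per-object loop is only practical for small
pools with simple object names):

# point-in-time copy of an RBD image
rbd snap create rbd/myimage@backup-20140402
rbd export rbd/myimage@backup-20140402 /backups/myimage-20140402.img

# crude object-by-object export of a plain RADOS pool
rados -p mypool ls | while read -r obj; do
    rados -p mypool get "$obj" "/backups/mypool/$obj"
done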
Hi again Ilya,
No, no snapshots in this case. It's a brand new RBD that I've created.
Cheers. Tom.
On 01/04/14 16:08, Ilya Dryomov wrote:
On Tue, Apr 1, 2014 at 6:55 PM, Tom wrote:
Thanks for the reply.
Ceph is version 0.73-1precise, and the kernel release is
3.11.9-031109-generic.
also
Hi Robert
Thanks for raising this question; backup and restore options have always been
interesting to discuss. I too have a related question for Inktank:
-> Is there any work going on to support backing up a Ceph cluster with the
enterprise *proprietary* backup solutions available today?
***
On Wed, Apr 2, 2014 at 11:28 AM, Tom wrote:
> Hi again Ilya,
>
> No, no snapshots in this case. It's a brand new RBD that I've created.
Is this a format 1 or a format 2 image? Can you paste the contents of
/proc/devices and /proc/partitions somewhere?
Also, can you unmap a few of those 15 images t
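For anyone following along, the details Ilya is asking about can be gathered
roughly like this (the image spec is a placeholder):

rbd info rbd/myimage                  # reports "format: 1" or "format: 2"
cat /proc/devices /proc/partitions
rbd showmapped                        # mapped images and their /dev/rbd* devices
rbd unmap /dev/rbd0                   # unmap one of them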
I integrated Ceph + OpenStack following this document:
https://ceph.com/docs/master/rbd/rbd-openstack/
I could upload an image to Glance on the Ceph cluster, but I cannot create any
volume with Cinder.
The error messages are the same as at this URL:
http://comments.gmane.org/gmane.comp.file-systems.ceph.user/764
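In case it helps to compare, a rough checklist based on the rbd-openstack guide
linked above (the client.cinder name, pool names, and caps below are the guide's
examples, not necessarily what this deployment uses):

# cinder needs its own cephx user, and the keyring must be readable on the
# node running cinder-volume
ceph auth get-or-create client.cinder mon 'allow r' \
    osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rx pool=images' \
    -o /etc/ceph/ceph.client.cinder.keyring

# relevant cinder.conf settings for the RBD backend in this era:
#   volume_driver = cinder.volume.drivers.rbd.RBDDriver
#   rbd_pool = volumes
#   rbd_user = cinder
#   rbd_secret_uuid = <the libvirt secret UUID>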
Hi Yehuda,
I tried your patch and it feels fine,
except you might need some special handling for those already-corrupt uploads,
as trying to delete them gets radosgw into an endless loop and high CPU usage:
2014-04-02 11:03:15.045627 7fbf157d2700 0
RGWObjManifest::operator++(): result: ofs=3355443
- Message from Gregory Farnum -
Date: Tue, 1 Apr 2014 09:03:17 -0700
From: Gregory Farnum
Subject: Re: [ceph-users] ceph 0.78 mon and mds crashing (bus error)
To: "Yan, Zheng"
Cc: Kenneth Waegeman , ceph-users
On Tue, Apr 1, 2014 at 7:12 AM, Yan, Zheng wrote
Hmm. I guess I'd look at:
1) How big and what shape the filesystem is. Do you have some
extremely large directory that the MDS keeps trying to load and then
dump?
2) Use tcmalloc's heap analyzer to see where all the memory is being allocated.
3) Look through the logs for when the beacon fails (the
A *clean* shutdown? That sounds like a different issue; hjcho616's
issue only happens when a client wakes back up again.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Wed, Apr 2, 2014 at 6:34 AM, Florent B wrote:
> Can someone confirm that this issue is also in Emperor re
On Wed, Apr 2, 2014 at 2:08 AM, Benedikt Fraunhofer
wrote:
> Hi Yehuda,
>
> i tried your patch and it feels fine,
> except you might need some special handling for those already corrupt uploads,
> as trying to delete them gets radosgw in an endless loop and high cpu usage:
The problem was with th
It's been a while, but I think you need to use the long form
"client_mountpoint" config option here instead. If you search the list
archives it'll probably turn up; this is basically the only reason we
ever discuss "-r". ;)
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Wed, Apr
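If memory serves, the long form can be given either on the command line or in
ceph.conf; the mount path below is purely illustrative:

# as a command-line config override
ceph-fuse --client_mountpoint=/some/subdir /mnt/cephfs

# or in ceph.conf
#   [client]
#       client mountpoint = /some/subdir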
Hi Gregory,
(I'm a colleague of Kenneth.)
1) How big and what shape the filesystem is. Do you have some
extremely large directory that the MDS keeps trying to load and then
dump?
Any way to extract this from the MDS without having to start it? As it
was an rsync operation, I can try to locate po
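One generic way to hunt for oversized directories on the rsync source (the path
is a placeholder; this just counts direct entries per directory):

find /path/to/rsync/source -xdev -type d -print0 |
  while IFS= read -r -d '' d; do
    printf '%s\t%s\n' "$(ls -1A "$d" | wc -l)" "$d"
  done | sort -rn | head -10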
Thanks for the response Greg.
Unfortunately, I appear to be missing something. If I use my "cephfs" key
with these perms:
client.cephfs
key:
caps: [mds] allow rwx
caps: [mon] allow r
caps: [osd] allow rwx pool=data
This is what happens when I mount:
# ceph-fuse -k /etc/ceph/ce
Hrm, I don't remember. Let me know which permutation works and we can
dig into it.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Wed, Apr 2, 2014 at 9:00 AM, Travis Rhoden wrote:
> Thanks for the response Greg.
>
> Unfortunately, I appear to be missing something. If I us
Ah, I figured it out. My original key worked, but I needed to use the --id
option with ceph-fuse to tell it to use the cephfs user rather than the
admin user. Tailing the log on my monitor pointed out that it was logging
in with client.admin, but providing the key for client.cephfs.
So, final wo
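For the archives, the working invocation presumably looks something like this
(the monitor address and paths are placeholders):

ceph-fuse --id cephfs -k /etc/ceph/ceph.client.cephfs.keyring -m mon-host:6789 /mnt/cephfs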
hi,
1) How big and what shape the filesystem is. Do you have some
extremely large directory that the MDS keeps trying to load and then
dump?
Any way to extract this from the MDS without having to start it? As it
was an rsync operation, I can try to locate possible candidates on the
source filesy
Did you see http://ceph.com/docs/master/rados/troubleshooting/memory-profiling?
That should have what you need to get started, although you'll also
need to learn the basics of using the heap analysis tools elsewhere.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Wed, Apr 2
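A condensed sketch of the workflow on that page, applied to an MDS (the daemon
name 'a', the dump location, and the pprof binary name are assumptions that vary
by setup and distro):

ceph tell mds.a heap start_profiler   # begin collecting tcmalloc heap profiles
ceph tell mds.a heap dump             # write a .heap file alongside the daemon's logs
ceph tell mds.a heap stats            # quick summary on the command line
# then analyse a dump, e.g. (the tool may be installed as google-pprof)
pprof --text /usr/bin/ceph-mds /var/log/ceph/mds.a.profile.0001.heap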
I assume you're talking about "Option Two: MULTI-SITE OBJECT STORAGE
WITH FEDERATED GATEWAYS", from Inktank's
http://info.inktank.com/multisite_options_with_inktank_ceph_enterprise
There are still some options. Each zone has a master and one (or more)
replicas. You can only write to the ma
Can someone recommend some testing I can do to further investigate why this
issue with slow-disk-write in the VM OS is occurring?
It seems the issue, detailed below, is perhaps related to the VM OS running on
the RADOS images in Ceph.
Issue:
I have a handful (like 10) of VMs running that, when
The section should be
[client.keyring]
keyring =
Then restart cinder-volume afterwards.
Sébastien Han
Cloud Engineer
"Always give 100%. Unless you're giving blood.”
Phone: +33 (0)1 49 70 99 72
Mail: sebastien@enovance.com
Address : 11 bis, rue Roquépine - 75008 Paris
Web : www.eno
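Spelling out the shape of that snippet (the client.cinder name and keyring path
are only the usual convention from the rbd-openstack guide; adjust to whichever
cephx user cinder actually runs as):

cat >> /etc/ceph/ceph.conf <<'EOF'
[client.cinder]
    keyring = /etc/ceph/ceph.client.cinder.keyring
EOF
service cinder-volume restart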
Correction:
When I wrote "Here I provide the test results of two VMs that are running on
the same Ceph host, using disk images from the same ceph pool, and were cloned
from the same RADOS snapshot."
I really meant: "Here I provide the test results of two VMs that are running on
the same Ceph hos
Wow, kudos for integrating this in Ceph. More projects should do it like
that!
Anyway, attached is a gzipped ps file. The heap is at 4.4GB; top reports
6.5GB memory usage.
Care to point out what to look for? I'll send a new one when the usage
starts to cause swapping.
Thanks,
Stijn
On 0
The short answer is "no". The longer answer is "it depends". The most
concise discussion I've seen is Inktank's Multi-site option whitepaper:
http://info.inktank.com/multisite_options_with_inktank_ceph_enterprise
That white paper only addresses RBD backups (using snapshots) and
RadosGW backu
When I try to install the CentOS radosgw-agent on a CentOS 6.5 machine
I get the following:
[root@ceph1 centos]# yum install radosgw-agent
Loaded plugins: fastestmirror, priorities
Loading mirror speeds from cached hostfile
* base: ftp.riken.jp
* epel: ftp.jaist.ac.jp
* extras: ftp.riken.jp
Is there any way to cancel a scrub on a PG?
I have an OSD that's recovering, and there's a single PG left waiting:
2014-04-02 13:15:39.868994 mon.0 [INF] pgmap v5322756: 2592 pgs: 2589
active+clean, 1 active+recovery_wait, 2 active+clean+scrubbing+deep;
15066 GB data, 30527 GB used, 29061 GB /
On Wed, 2 Apr 2014, Craig Lewis wrote:
> Is there any way to cancel a scrub on a PG?
>
>
> I have an OSD that's recovering, and there's a single PG left waiting:
> 2014-04-02 13:15:39.868994 mon.0 [INF] pgmap v5322756: 2592 pgs: 2589
> active+clean, 1 active+recovery_wait, 2 active+clean+scrubbin
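The truncated reply above presumably points at the scrub flags (a follow-up later
in this digest mentions noscrub); for reference, they look like this:

ceph osd set noscrub          # stop scheduling new scrubs
ceph osd set nodeep-scrub     # likewise for deep scrubs
# ...and once recovery has finished:
ceph osd unset noscrub
ceph osd unset nodeep-scrub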
Did you try those dd commands with oflag=direct? Like:
dd if=/dev/zero of=disk-test bs=1048576 count=512 oflag=direct; dd
if=disk-test of=/dev/null bs=1048576 iflag=direct; /bin/rm disk-test
That way you are bypassing the host cache and waiting for the ACK to
first go straight to the
An updated version with a 20GB heap (top reports 24GB used).
Would starting more MDSes and setting them active help? Is there any
guideline paper on the impact of multiple active MDSes and the number of
files in the tree? Or some rule of thumb regarding memory, the number of active
MDSes and files/directories i
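For what it's worth, the knob for additional active MDS daemons in this era is
max_mds (multiple active MDSes were still considered experimental at the time):

ceph mds set_max_mds 2    # allow two active MDS ranks, given enough running daemons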
Thanks!
I knew about noscrub, but I didn't realize that the flapping would
cancel a scrub in progress.
So the scrub doesn't appear to be the reason it wasn't recovering.
After a flap, it goes into:
2014-04-02 14:11:09.776810 mon.0 [INF] pgmap v5323181: 2592 pgs: 2591
active+clean, 1 active
Hi,
I have a small 8TB testing cluster. During testing I've used 94G.
But I have since removed the pools and images from Ceph, so I shouldn't be
using any space; yet the 94G usage remains. How can I reclaim the old
used space?
Also, there's this:
ceph@ceph-admin:~$ rbd rm 6fa36869-4afe-485a-90a3-93fba
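A few things worth checking before concluding that space has leaked (the pool
name 'rbd' is just an example):

ceph df         # global and per-pool usage
rados df        # object counts per pool
rbd ls -l rbd   # any images or snapshots still present in the pool?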
I'm seeing one OSD spamming its log with
2014-04-02 16:49:21.547339 7f5cc6c5d700 1 heartbeat_map is_healthy
'OSD::op_tp thread 0x7f5cc3456700' had timed out after 15
It starts about 30 seconds after the OSD daemon is started. It
continues until
2014-04-02 16:48:57.526925 7f0e5a683700 1 hea
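Two things that can be inspected on the node hosting that OSD (osd.12 is a
placeholder id; osd_op_thread_timeout is the option behind the '15' above):

ceph daemon osd.12 dump_ops_in_flight               # what the op threads are stuck on
ceph daemon osd.12 config get osd_op_thread_timeout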
So the real 'fast' performance was 100MB/s? Or did you get some improved
numbers? I'm looking for a cluster that can provide me at least
600-700MB/s of throughput per thread. Could you try these dd commands and see
what the results are?:
dd if=/dev/zero of=./$RANDOM bs=4k count=22 oflag=direct
dd
Which version of the kernel client did you use? Please send the contents of
the client node's /sys/kernel/debug/ceph/*/caps when the MDS uses lots of
memory.
Regards
Yan, Zheng
On Thu, Apr 3, 2014 at 2:58 AM, Stijn De Weirdt wrote:
> wow. kudos for integrating this in ceph. more projects should do it like
>
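For reference, the requested information would be collected on a kernel client
roughly like this:

uname -r
cat /sys/kernel/debug/ceph/*/caps    # needs debugfs mounted at /sys/kernel/debug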
Hi
From what is pasted, your remove failed, so make sure you purge the snapshots
and then remove the rbd image.
For the user removal, as explained in www.ceph.com/docs or in 'ceph auth help',
just issue 'ceph auth del {user}'.
JC
On Wednesday, April 2, 2014, Jonathan Gowar wrote:
> Hi,
>
>I have a small 8TB test
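Spelled out, the suggestion above is roughly the following (pool, image, and user
names are placeholders):

rbd snap purge rbd/myimage    # remove all snapshots of the image first
rbd rm rbd/myimage
ceph auth del client.myuser   # drop the now-unused cephx user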
Hi All,
Slide 19 of Ceph at CERN presentation
http://www.slideshare.net/Inktank_Ceph/scaling-ceph-at-cern
says that when removing an OSD from Ceph it is faster to
just "ceph osd crush rm " rather than marking the
osd as "out", waiting for data migration, and then "rm" the
OSD.
The reason the
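As I understand the comparison, the two sequences look roughly like this (osd.12
is a placeholder; the slide's point is that draining first causes data to move
twice, once on 'out' and again when the CRUSH weight disappears):

# gentle removal: drain first, then remove
ceph osd out 12
# ...wait for recovery to finish, then:
ceph osd crush remove osd.12
ceph auth del osd.12
ceph osd rm 12

# the shortcut from the slide: drop it from CRUSH straight away (one rebalance)
ceph osd crush remove osd.12
ceph auth del osd.12
ceph osd rm 12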
On Tue, 1 Apr 2014 14:18:51 + Dan Van Der Ster wrote:
[snip]
> >
> > http://www.slideshare.net/Inktank_Ceph/scaling-ceph-at-cern
> >
[snap]
In that slide it says that replacing failed OSDs is automated via Puppet.
I'm very curious about how exactly this happens, as in:
Do you just fire up a "sp
Hi,
the latest pprof output is attached.
This is not a kernel client; this is ceph-fuse on EL6. Starting the MDS
without any ceph-fuse mounts works without issue. Mounting ceph-fuse
afterwards also works fine. Simple filesystem operations work as expected.
We'll check the state of the fuse mount via f