[ceph-users] How To Properly Failover a HA Setup

2019-01-21 Thread Charles Tassell
Hello Everyone,   I've got a 3 node Jewel cluster setup, and I think I'm missing something.  When I want to take one of my nodes down for maintenance (kernel upgrades or the like) all of my clients (running the kernel module for the cephfs filesystem) hang for a couple of minutes before the r
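
A minimal sketch of the usual pre-maintenance steps on the node being taken down (daemon names here are examples, not taken from the thread):

  # keep CRUSH from rebalancing data while the node is offline
  ceph osd set noout
  # if the node hosts the active MDS, fail it over to a standby first
  ceph mds fail node1
  # ... do the maintenance and reboot, then allow rebalancing again
  ceph osd unset noout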

Re: [ceph-users] MDS performance issue

2019-01-21 Thread Yan, Zheng
On Mon, Jan 21, 2019 at 12:12 PM Albert Yue wrote: > > Hi Yan Zheng, > > 1. mds cache limit is set to 64GB > 2. we get the size of meta data pool by running `ceph df` and saw meta data > pool just used 200MB space. > That's very strange. One file uses about 1k metadata storage. 560M files should
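
For reference, a hedged sketch of the checks being discussed (the MDS name is an example):

  # metadata pool usage
  ceph df
  # configured cache limit and current cache usage on the active MDS
  ceph daemon mds.node1 config get mds_cache_memory_limit
  ceph daemon mds.node1 cache status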

Re: [ceph-users] How To Properly Failover a HA Setup

2019-01-21 Thread Robert Sander
On 21.01.19 09:22, Charles Tassell wrote: > Hello Everyone, > >   I've got a 3 node Jewel cluster setup, and I think I'm missing > something.  When I want to take one of my nodes down for maintenance > (kernel upgrades or the like) all of my clients (running the kernel > module for the cephfs

Re: [ceph-users] How To Properly Failover a HA Setup

2019-01-21 Thread Marc Roos
I think his downtime is coming from the MDS failover, which takes a while in my case too. But I am not using the cephfs that much yet. -Original Message- From: Robert Sander [mailto:r.san...@heinlein-support.de] Sent: 21 January 2019 10:05 To: ceph-users@lists.ceph.com Subject: Re: [

[ceph-users] RadosGW replication and failover issues

2019-01-21 Thread Ronnie Lazar
Hi, We are running the following radosgw( luminous 12.2.8) replications scenario. 1) We have 2 clusters, each running a radosgw, Cluster1 defined as master, and Cluster2 as slave. 2) We create a number of bucket with objects via master and slave 3) We shutdown the Cluster1 4) We execute failover on
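
For reference, promoting the surviving zone to master is normally done along these lines (a sketch; the zone name is an example):

  # on a radosgw host in Cluster2
  radosgw-admin zone modify --rgw-zone=zone2 --master --default
  radosgw-admin period update --commit
  # restart the gateways so they serve the new period
  systemctl restart ceph-radosgw.target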

Re: [ceph-users] monitor cephfs mount io's

2019-01-21 Thread Marc Roos
Hi Mohamad, How do you do that client side? I currently have two kernel mounts. -Original Message- From: Mohamad Gebai [mailto:mge...@suse.de] Sent: 17 January 2019 15:57 To: Marc Roos; ceph-users Subject: Re: [ceph-users] monitor cephfs mount io's You can do that either st

Re: [ceph-users] Process stuck in D+ on cephfs mount

2019-01-21 Thread Marc Roos
I will do that next time. Do you know by any chance if using 'timeout' could prevent this? From the manual I get that timeout will send a signal like HUP or KILL, in that case it will be not so different. -Original Message- From: Yan, Zheng [mailto:uker...@gmail.com] Sent: 21 Janua

[ceph-users] process stuck in D state on cephfs kernel mount

2019-01-21 Thread Marc Roos
This weekend I had a process stuck in D state writing to a cephfs kernel mount, causing the load of the server to go to 80 (normally around 1) and forcing me to reboot it. I think this problem is related to the networking between this vm and the ceph nodes. Rsync also sometimes complains about a broke

Re: [ceph-users] Ceph in OSPF environment

2019-01-21 Thread Max Krasilnikov
Hello! Sun, Jan 20, 2019 at 09:07:35PM +, robbat2 wrote: > On Sun, Jan 20, 2019 at 09:05:10PM +, Max Krasilnikov wrote: > > > Just checking, since it isn't mentioned here: Did you explicitly add > > > public_network+cluster_network as empty variables? > > > > > > Trace the code in the
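
A sketch of the ceph.conf settings the quoted question refers to, as used in routed/OSPF setups where each daemon binds to a loopback address announced by the routing protocol (addresses are examples):

  [global]
  # explicitly empty so daemons don't try to pick an interface by subnet
  public_network =
  cluster_network =

  [osd.0]
  public_addr = 192.0.2.11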

[ceph-users] RBD client hangs

2019-01-21 Thread ST Wong (ITSC)
Hi, we're trying mimic on a VM farm. It consists of 4 OSD hosts (8 OSDs) and 3 MONs. We tried mounting as RBD and CephFS (fuse and kernel mount) on different clients without problems. Then one day we performed a failover test and stopped one of the OSDs. Not sure if it's related, but after that test

Re: [ceph-users] process stuck in D state on cephfs kernel mount

2019-01-21 Thread Stijn De Weirdt
hi marc, > - how to prevent the D state process from accumulating so much load? you can't. in linux, uninterruptible tasks themselves count as "load"; this does not mean you e.g. ran out of cpu resources. stijn > > Thanks,
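
For reference, a quick sketch for seeing which uninterruptible tasks are behind the load figure:

  # tasks currently in uninterruptible sleep (state D) and what they wait on
  ps -eo state,pid,wchan:32,cmd | awk '$1 == "D"'
  # the load averages themselves
  cat /proc/loadavg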

Re: [ceph-users] RBD client hangs

2019-01-21 Thread Ilya Dryomov
On Mon, Jan 21, 2019 at 11:43 AM ST Wong (ITSC) wrote: > > Hi, we’re trying mimic on an VM farm. It consists 4 OSD hosts (8 OSDs) and 3 > MON. We tried mounting as RBD and CephFS (fuse and kernel mount) on > different clients without problem. Is this an upgraded or a fresh cluster? > > Th

Re: [ceph-users] How To Properly Failover a HA Setup

2019-01-21 Thread David C
It could also be the kernel client versions; what are you running? I remember older kernel clients didn't always deal with recovery scenarios very well. On Mon, Jan 21, 2019 at 9:18 AM Marc Roos wrote: > > > I think his downtime is coming from the mds failover, that takes a while > in my case to

Re: [ceph-users] Ceph in OSPF environment

2019-01-21 Thread Max Krasilnikov
Good day! Mon, Jan 21, 2019 at 10:42:58AM +, pseudo wrote: > > On Sun, Jan 20, 2019 at 09:05:10PM +, Max Krasilnikov wrote: > > > > Just checking, since it isn't mentioned here: Did you explicitly add > > > > public_network+cluster_network as empty variables? > > > > > > > > Trace

Re: [ceph-users] Ceph in OSPF environment

2019-01-21 Thread Burkhard Linke
Hi, I'm curious... what is the advantage of OSPF in your setup over e.g. LACP bonding both links? Regards, Burkhard

Re: [ceph-users] Ceph in OSPF environment

2019-01-21 Thread Serkan Çoban
If ToR switches are L3 then you can not use LACP. On Mon, Jan 21, 2019 at 4:02 PM Burkhard Linke wrote: > > Hi, > > > I'm curious.what is the advantage of OSPF in your setup over e.g. > LACP bonding both links? > > > Regards, > > Burkhard

Re: [ceph-users] Problem with OSDs

2019-01-21 Thread Alfredo Deza
On Sun, Jan 20, 2019 at 11:30 PM Brian Topping wrote: > > Hi all, looks like I might have pooched something. Between the two nodes I > have, I moved all the PGs to one machine, reformatted the other machine, > rebuilt that machine, and moved the PGs back. In both cases, I did this by > taking t

Re: [ceph-users] Bluestore 32bit max_object_size limit

2019-01-21 Thread Igor Fedotov
On 1/18/2019 6:33 PM, KEVIN MICHAEL HRPCEK wrote: On 1/18/19 7:26 AM, Igor Fedotov wrote: Hi Kevin, On 1/17/2019 10:50 PM, KEVIN MICHAEL HRPCEK wrote: Hey, I recall reading about this somewhere but I can't find it in the docs or list archive and confirmation from a dev or someone who kn
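
For reference, the limit under discussion surfaces as an OSD config option and can be read from a running daemon (a sketch; the OSD id is an example). The thread's point is that BlueStore stores per-object offsets/lengths as 32-bit values, and the default cap is 128 MiB since Luminous:

  ceph daemon osd.0 config get osd_max_object_size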

Re: [ceph-users] Ceph in OSPF environment

2019-01-21 Thread Simon Leinen
Burkhard Linke writes: > I'm curious.what is the advantage of OSPF in your setup over > e.g. LACP bonding both links? Good question! Some people (including myself) are uncomfortable with LACP (in particular "MLAG", i.e. port aggregation across multiple chassis), and with fancy L2 setups in gen

[ceph-users] Additional meta data attributes for rgw user?

2019-01-21 Thread Benjeman Meekhof
Hi all, I'm looking to keep some extra meta-data associated with radosgw users created by radosgw-admin. I saw in the output of 'radosgw-admin metadata get user:someuser' there is an 'attrs' structure that looked promising. However, it seems to be strict about what it accepts, so I wonder if that'
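
A sketch of the metadata interface described above (user and file names are examples):

  # dump the stored user record, including the 'attrs' section
  radosgw-admin metadata get user:someuser > someuser.json
  # edit the JSON, then write it back
  radosgw-admin metadata put user:someuser < someuser.json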

Re: [ceph-users] Process stuck in D+ on cephfs mount

2019-01-21 Thread Yan, Zheng
On Mon, Jan 21, 2019 at 5:41 PM Marc Roos wrote: > > > > I will do that next time. Do you know by any chance if using 'timeout' > could prevent this? From the manual I get that timeout will send a > signal like HUP or KILL, in that case it will be not so different. > > no, there is no config for
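
For reference, timeout(1) only delivers a signal, and a task blocked in uninterruptible (D) sleep will not act on one until the stuck I/O completes, so it would not have helped here. A sketch of the kind of invocation being considered:

  # send SIGKILL if the copy has not finished after 10 minutes;
  # a process stuck in D state still won't exit until its I/O returns
  timeout --signal=KILL 10m rsync -a /src/ /mnt/cephfs/dst/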

[ceph-users] Using Ceph central backup storage - Best practice creating pools

2019-01-21 Thread Thomas
Hi, my use case for Ceph is serving as central backup storage. This means I will back up multiple databases in the Ceph storage cluster. This is my question: What is the best practice for creating pools & images? Should I create multiple pools, meaning one pool per database? Or should I create a single
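
A minimal sketch of the single-pool layout being weighed here, with one RBD image per database (pool/image names and sizes are examples):

  ceph osd pool create backups 128
  ceph osd pool application enable backups rbd
  rbd create backups/db1 --size 500G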

Re: [ceph-users] MDS performance issue

2019-01-21 Thread Yan, Zheng
On Mon, Jan 21, 2019 at 11:16 AM Albert Yue wrote: > > Dear Ceph Users, > > We have set up a cephFS cluster with 6 osd machines, each with 16 8TB > hard disks. Ceph version is luminous 12.2.5. We created one data pool with > these hard disks and created another meta data pool with 3 SSDs. We create

[ceph-users] osd deployment: DB/WAL links

2019-01-21 Thread Vladimir Prokofev
Hello list. Today while redeploying an OSD I noticed that the links to DB/WAL devices point to the partitions themselves, not to the partition UUIDs as they did before. I think that changed with the latest ceph-deploy. I'm using 12.2.2 on my mon/osd nodes. ceph-deploy is 2.0.1 on the admin node. All node
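
A quick way to compare the two symlink styles on an OSD host (the OSD id is an example):

  # does block.db point at a raw partition node or a stable by-partuuid path?
  ls -l /var/lib/ceph/osd/ceph-0/block.db
  ls -l /dev/disk/by-partuuid/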

Re: [ceph-users] MDS performance issue

2019-01-21 Thread Marc Roos
How can you see that the cache is filling up and you need to execute "echo 2 > /proc/sys/vm/drop_caches"? -Original Message- From: Yan, Zheng [mailto:uker...@gmail.com] Sent: 21 January 2019 15:50 To: Albert Yue Cc: ceph-users Subject: Re: [ceph-users] MDS performance issue On Mon,

Re: [ceph-users] [Ceph-announce] Ceph tech talk tomorrow: NooBaa data platform for distributed hybrid clouds

2019-01-21 Thread Mike Perez
Hey all, Here's the tech talk recording: https://www.youtube.com/watch?v=uW6NvsYFX-s -- Mike Perez (thingee) On Wed, Jan 16, 2019 at 4:01 PM Sage Weil wrote: > > Hi everyone, > > First, this is a reminder that there is a Tech Talk tomorrow from Guy > Margalit about NooBaa, a multi-cloud object

Re: [ceph-users] quick questions about a 5-node homelab setup

2019-01-21 Thread Janne Johansson
On Fri, 18 Jan 2019 at 12:42, Robert Sander wrote: > > Assuming BlueStore is too fat for my crappy nodes, do I need to go to > FileStore? If yes, then with xfs as the file system? Journal on the SSD as > a directory, then? > > Journal for FileStore is also a block device. It can be a file

Re: [ceph-users] MDS performance issue

2019-01-21 Thread Alexandre DERUMIER
>>How can you see that the cache is filling up and you need to execute >>"echo 2 > /proc/sys/vm/drop_caches"? You can monitor the number of ceph dentries in slabinfo. Here is a small script I'm running from cron. #!/bin/bash if pidof -o %PPID -x "dropcephinodecache.sh">/dev/null; then echo "Pro
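
A minimal sketch along the same lines as the quoted cron script (run as root; the threshold is arbitrary, and ceph_dentry_info is the slab cache name the kernel client uses for its dentries):

  #!/bin/bash
  # number of active CephFS dentry objects in the kernel slab cache
  count=$(awk '/^ceph_dentry_info/ {print $2}' /proc/slabinfo)
  # drop dentries and inodes once the cache grows past the threshold
  if [ "${count:-0}" -gt 1000000 ]; then
      echo 2 > /proc/sys/vm/drop_caches
  fi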

[ceph-users] Cephalocon Barcelona 2019 Early Bird Registration Now Available!

2019-01-21 Thread Mike Perez
Hey everyone, Cephalocon Barcelona 2019 early bird registration is now available through February 15th. After that rates go up, so please register now to lock in your discounted ticket. https://ceph.com/cephalocon/barcelona-2019/ As a reminder, the CFP will close February 1st. If you need assist

Re: [ceph-users] [Ceph-ansible] [ceph-ansible]Failure at TASK [ceph-osd : activate osd(s) when device is a disk]

2019-01-21 Thread Cody
Hi Sebastien, Thank you for following up on this. I have resolved the issue by zapping all the disks and switching to the LVM scenario. I will open an issue on GitHub if I ever run into the same problem again later. Thanks and Cheers, Cody On Mon, Jan 21, 2019 at 4:23 AM Sebastien Han wrote: >

Re: [ceph-users] quick questions about a 5-node homelab setup

2019-01-21 Thread Brian Topping
> On Jan 18, 2019, at 3:48 AM, Eugen Leitl wrote: > > > (Crossposting this from Reddit /r/ceph , since likely to have more technical > audience present here). > > I've scrounged up 5 old Atom Supermicro nodes and would like to run them > 365/7 for limited production as RBD with Bluestore (ide

Re: [ceph-users] Problem with OSDs

2019-01-21 Thread Brian Topping
> On Jan 21, 2019, at 6:47 AM, Alfredo Deza wrote: > > When creating an OSD, ceph-volume will capture the ID and the FSID and > use these to create a systemd unit. When the system boots, it queries > LVM for devices that match that ID/FSID information. Thanks Alfredo, I see that now. The name co
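
For reference, the ID/FSID tags mentioned above can be inspected directly (a sketch):

  # ceph-volume's view of the logical volumes and their tags
  ceph-volume lvm list
  # the raw LVM tags queried at boot (ceph.osd_id, ceph.osd_fsid, ...)
  lvs -o lv_name,vg_name,lv_tags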

Re: [ceph-users] process stuck in D state on cephfs kernel mount

2019-01-21 Thread Brad Hubbard
http://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html should still be current enough and makes good reading on the subject. On Mon, Jan 21, 2019 at 8:46 PM Stijn De Weirdt wrote: > > hi marc, > > > - how to prevent the D state process to accumulate so much load? > you can't. in lin

Re: [ceph-users] MDS performance issue

2019-01-21 Thread Gregory Farnum
On Mon, Jan 21, 2019 at 12:52 AM Yan, Zheng wrote: > On Mon, Jan 21, 2019 at 12:12 PM Albert Yue > wrote: > > > > Hi Yan Zheng, > > > > 1. mds cache limit is set to 64GB > > 2. we get the size of meta data pool by running `ceph df` and saw meta > data pool just used 200MB space. > > > > That's v

Re: [ceph-users] RBD client hangs

2019-01-21 Thread ST Wong (ITSC)
Hi, > Is this an upgraded or a fresh cluster? It's a fresh cluster. > Does client.acapp1 have the permission to blacklist other clients? You can > check with "ceph auth get client.acapp1". No, it's our first Ceph cluster with basic setup for testing, without any blacklist implemented.
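
A sketch of the permission check and the usual fix (the pool name is an example; the rbd profiles include the blacklist permission a client needs to clean up a dead client's lock):

  ceph auth get client.acapp1
  ceph auth caps client.acapp1 mon 'profile rbd' osd 'profile rbd pool=rbd'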

Re: [ceph-users] MDS performance issue

2019-01-21 Thread Albert Yue
Hi Yan Zheng, In your opinion, can we resolve this issue by moving the MDS to a machine with 512GB or 1TB of memory? On Mon, Jan 21, 2019 at 10:49 PM Yan, Zheng wrote: > On Mon, Jan 21, 2019 at 11:16 AM Albert Yue > wrote: > > > > Dear Ceph Users, > > > > We have set up a cephFS cluster with 6 osd machines,