Hello, I'm just writing to say that after more than a week the server is still working without problems and the OSDs are no longer being marked down erroneously. In my tests the webpage stops working for less than a minute when I stop an OSD, so the failover is working fine.

Greetings and thanks for all your help!!
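For anyone who wants to reproduce this kind of failover test, a rough sketch of the procedure (the OSD id is only a placeholder and the unit name assumes a systemd-based install):

  # in one terminal, watch the cluster react
  ceph -w

  # on an OSD node, stop one OSD daemon
  sudo systemctl stop ceph-osd@0

  # check when the OSD is marked down and the PGs become active again
  ceph osd tree
  ceph -s

  # bring the OSD back when done
  sudo systemctl start ceph-osd@0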
2017-06-15 19:04 GMT+02:00 Daniel Carrasco <d.carra...@i2tic.com>:
> Hello, thanks for the info.
> I'll give it a try tomorrow. In one of my tests I got the messages you mention (wrongfully marked down), but I've lowered other options and now it's fine. For now the OSDs are not reporting down messages even under a high-load test, but I'll check the logs tomorrow to confirm.
> Most of the time the server is used read-only and the load is not high, so an OSD being marked down for a few seconds is not a big problem (at least I think the recovery traffic is low, because it only has to check that the PGs are present on both OSDs).
> Greetings and thanks again!
>
> 2017-06-15 18:13 GMT+02:00 David Turner <drakonst...@gmail.com>:
>> osd_heartbeat_grace is how many seconds can pass since an osd last received a successful heartbeat response from another osd before it tells the mons that that osd is down. This is one you may want to lower from its default value of 20 seconds.
>> mon_osd_min_down_reporters is how many osds need to report an osd as down before the mons will mark it as down. I recommend setting this to N+1, where N is how many osds you have in a node or failure domain. If you end up with a network problem and you have 1 osd node that can talk to the mons, but not to the other osd nodes, then you will end up with that one node marking the entire cluster down while the rest of the cluster marks that node down. If your min_down_reporters is N+1, then 1 node cannot mark down the rest of the cluster. The default setting is 1 so that small test clusters can mark down osds, but if you have 3+ nodes, you should set it to N+1 if you can. Setting it to more than 2 nodes is equally problematic. However, if you just want things to report as fast as possible, leaving this at 1 might still be optimal for getting an osd marked down sooner.
>> The downside to lowering these settings is that if OSDs are getting marked down just for running slowly, they will re-assert themselves to the mons and end up causing backfilling and peering for no really good reason. You'll want to monitor your cluster for OSDs being marked down for a few seconds before marking themselves back up. You can see this in the OSD logs, where the OSD says it was wrongfully marked down in one line and in the next it tells the mons it is actually up.
>>
>> On Thu, Jun 15, 2017 at 10:44 AM Daniel Carrasco <d.carra...@i2tic.com> wrote:
>>> I forgot to say that after upgrading the machine's RAM to 4 GB, the OSD daemon has started using only about 5% (about 200 MB). It's like magic, and now I have about 3.2 GB of free RAM.
>>> Greetings!!
>>>
>>> 2017-06-15 15:08 GMT+02:00 Daniel Carrasco <d.carra...@i2tic.com>:
>>>> Finally, the problem was W3 Total Cache, which seems unable to handle HA: when the master Redis host is down, it stops working without trying the slave.
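For reference, the two settings David describes above would look roughly like this in ceph.conf; the values below are only illustrative and are not the ones used here:

  [global]
  # default is 20 seconds; lower it so a dead OSD is reported sooner
  osd heartbeat grace = 10
  # N+1, where N is the number of OSDs per node (with 1 OSD per node, 2)
  mon osd min down reporters = 2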
>>>> I've added some options to make it faster to detect a down OSD, and the page is back online in about 40 seconds:
>>>>
>>>> [global]
>>>> fsid = Hidden
>>>> mon_initial_members = alantra_fs-01, alantra_fs-02, alantra_fs-03
>>>> mon_host = 10.20.1.109,10.20.1.97,10.20.1.216
>>>> auth_cluster_required = cephx
>>>> auth_service_required = cephx
>>>> auth_client_required = cephx
>>>> osd mon heartbeat interval = 5
>>>> osd mon report interval max = 10
>>>> mon osd report timeout = 15
>>>> osd fast fail on connection refused = True
>>>>
>>>> public network = 10.20.1.0/24
>>>> osd pool default size = 2
>>>>
>>>> Greetings and thanks for all your help.
>>>>
>>>> 2017-06-14 23:09 GMT+02:00 David Turner <drakonst...@gmail.com>:
>>>>> I've used the kernel client and the ceph-fuse driver for mapping the cephfs volume. I didn't notice any network hiccups while failing over, but I was reading large files during my tests (and live), and some caching may have hidden network hiccups for my use case.
>>>>> Going back to the memory potentially being a problem: Ceph has a tendency to use 2-3x more memory while it's in a degraded state than when everything is health_ok. Always plan to over-provision your memory to account for a minimum of 2x. I've seen clusters stuck in an OOM-killer death spiral because it kept killing OSDs for running out of memory, which caused more peering and backfilling, ... which caused more OSDs to be killed by the OOM killer.
>>>>>
>>>>> On Wed, Jun 14, 2017 at 5:01 PM Daniel Carrasco <d.carra...@i2tic.com> wrote:
>>>>>> It's strange, because on my test cluster (three nodes, two of them with OSDs, and all of them with MON and MDS) I've configured the size to 2 and min_size to 1, I've restarted all nodes one by one, and the client only loses the connection for about 5 seconds until it connects to the other MDS.
>>>>>> Are you using the ceph-fuse client or the kernel client? I forgot to say that I'm using Debian 8.
>>>>>> Anyway, maybe the problem was what I said before: the clients' connections to that node started to fail, but the node was not officially down. And it wasn't a client problem, because it happened on both clients and on my monitoring service at the same time.
>>>>>> Right now I'm not at the office, so I can't post the config file; I'll send it tomorrow. Anyway, it's the basic file generated by ceph-deploy plus the client network and min_size configuration, just like my test config.
>>>>>> Thanks!!, and greetings!!
>>>>>>
>>>>>> On Jun 14, 2017 at 10:38 PM, "David Turner" <drakonst...@gmail.com> wrote:
>>>>>> I have 3 ceph nodes, size 3, min_size 2, and I can restart them all one at a time to do ceph and kernel upgrades. The VMs running on ceph, the clients accessing MDS, etc. all keep working fine without any problem during these restarts. What is your full ceph configuration? There must be something not quite right in there.
>>>>>>
>>>>>> On Wed, Jun 14, 2017 at 4:26 PM Daniel Carrasco <d.carra...@i2tic.com> wrote:
>>>>>>>
>>>>>>> On Jun 14, 2017 at 10:08 PM, "David Turner" <drakonst...@gmail.com> wrote:
>>>>>>> Not just the min_size of your cephfs data pool, but also your cephfs_metadata pool.
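For reference, checking and raising min_size on both CephFS pools is just two pool settings; this sketch assumes the default pool names cephfs_data and cephfs_metadata, so adjust them if yours differ:

  # show the current values
  ceph osd pool get cephfs_data min_size
  ceph osd pool get cephfs_metadata min_size

  # with size = 3, min_size = 2 keeps I/O going as long as at least 2 copies are available
  ceph osd pool set cephfs_data min_size 2
  ceph osd pool set cephfs_metadata min_size 2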
>>>>>>> Both were at 1. I don't know why, because I don't remember having changed the min_size, and the cluster has had 3 OSDs from the beginning (I did change it on another cluster for testing purposes, but I don't remember changing it on this one). I've changed both to 2, but only after the failure.
>>>>>>> About the size, I use 50 GB because it's for a single webpage and I don't need more space.
>>>>>>> I'll try to increase the memory to 3 GB.
>>>>>>> Greetings!!
>>>>>>>
>>>>>>> On Wed, Jun 14, 2017 at 4:07 PM David Turner <drakonst...@gmail.com> wrote:
>>>>>>>> Ceph recommends 1GB of RAM for every 1TB of OSD space. Your 2GB nodes are definitely on the low end. 50GB OSDs... I don't know what that will require, but since you're running the mon and mds on the same node, I'd still say that 2GB is low. The Ceph OSD daemon using 1GB of RAM is not surprising, even at that size.
>>>>>>>> When you say you increased the size of the pools to 3, what did you do to the min_size? Is that still set to 2?
>>>>>>>>
>>>>>>>> On Wed, Jun 14, 2017 at 3:17 PM Daniel Carrasco <d.carra...@i2tic.com> wrote:
>>>>>>>>> Finally I've created three nodes, I've increased the size of the pools to 3 and I've created 3 MDS daemons (active, standby, standby).
>>>>>>>>> Today the server decided to fail and I noticed that failover is not working... The ceph -s command showed everything as OK, but the clients weren't able to connect, and I had to restart the failing node and reconnect the clients manually to make it work again (even though I think the active MDS was on another node).
>>>>>>>>> I don't know if maybe it's because the server was not fully down and only some connections were failing. I'll do some tests to see.
>>>>>>>>> Another question: how much memory does a node need to work? I have nodes with 2 GB of RAM (one MDS, one MON and one OSD each), and they show high memory usage (more than 1 GB on the OSD). The OSD size is 50 GB and it contains less than 3 GB of data.
>>>>>>>>> Thanks, and Greetings!!
>>>>>>>>>
>>>>>>>>> 2017-06-12 23:33 GMT+02:00 Mazzystr <mazzy...@gmail.com>:
>>>>>>>>>> Since your app is an Apache / PHP app, is it possible for you to reconfigure the app to use an S3 module rather than a POSIX open/file()? Then with Ceph you could drop CephFS and configure the Civetweb S3 gateway. You can have "active-active" endpoints with round-robin DNS or an F5 or something. You would also have to repopulate the objects into the rados pools.
>>>>>>>>>> Also increase that size parameter to 3. ;-)
>>>>>>>>>> It's a lot of work for active-active, but the whole stack will be much more resilient (coming from someone with a ClearCase / NFS / stale-file-handles-up-the-wazoo background).
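For anyone considering Mazzystr's S3 route, a minimal RGW/Civetweb section might look roughly like this in ceph.conf; the daemon name below just reuses one of the node names above, the port is the Civetweb default, and the RGW keyring and pools still have to be created separately:

  [client.rgw.alantra_fs-01]
  host = alantra_fs-01
  rgw frontends = "civetweb port=7480"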
>>>>>>>>>> On Mon, Jun 12, 2017 at 10:41 AM, Daniel Carrasco <d.carra...@i2tic.com> wrote:
>>>>>>>>>>> 2017-06-12 16:10 GMT+02:00 David Turner <drakonst...@gmail.com>:
>>>>>>>>>>>> I have an incredibly light-weight cephfs configuration. I set up an MDS on each mon (3 total), and have 9TB of data in cephfs. This data only has 1 client that reads a few files at a time. I haven't noticed any downtime when it fails over to a standby MDS. So it definitely depends on your workload as to how a failover will affect your environment.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jun 12, 2017 at 9:59 AM John Petrini <jpetr...@coredial.com> wrote:
>>>>>>>>>>>>> We use the following in our ceph.conf for MDS failover. We're running one active and one standby. Last time it failed over there was about 2 minutes of downtime before the mounts started responding again, but it did recover gracefully.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [mds]
>>>>>>>>>>>>> max_mds = 1
>>>>>>>>>>>>> mds_standby_for_rank = 0
>>>>>>>>>>>>> mds_standby_replay = true
>>>>>>>>>>>>>
>>>>>>>>>>>>> John Petrini
>>>>>>>>>>>
>>>>>>>>>>> Thanks to both. Right now I'm working on that because I need a very fast failover. For now the tests give me a very fast response when an OSD fails (about 5 seconds), but a very slow response when the main MDS fails (I haven't measured the real time, but it wasn't working for quite a while). Maybe it was because I created the other MDS after mounting, because I've done some tests just before sending this email and now it looks very fast (I haven't noticed the downtime).
>>>>>>>>>>> Greetings!!
--
_________________________________________
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
_________________________________________
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com