Hello,

I'm just writing to say that after more than a week the server is still working
without problems and the OSDs are not being marked down erroneously. In my tests
the webpage stops working for less than a minute when I stop an OSD, so the
failover is working fine.

Greetings and thanks for all your help!!

2017-06-15 19:04 GMT+02:00 Daniel Carrasco <d.carra...@i2tic.com>:

> Hello, thanks for the info.
>
> I'll give it a try tomorrow. In one of my tests I got the messages you mention
> (wrongly marked), but I've lowered other options and now it's fine. For
> now the OSDs are not reporting down messages even under a high-load test,
> but I'll check the logs tomorrow to confirm.
>
> Most of the time the server is used read-only and the load is not high, so if
> an OSD is marked down for a few seconds it's not a big problem (at least I
> think the recovery traffic is low, because it only has to check that the PGs
> are in both OSDs).
>
> Greetings and thanks again!
>
> 2017-06-15 18:13 GMT+02:00 David Turner <drakonst...@gmail.com>:
>
>> osd_heartbeat_grace is a setting for how many seconds since the last time
>> an osd received a successful response from another osd before telling the
>> mons that it's down.  This is one you may want to lower from its default
>> value of 20 seconds.
>>
>> mon_osd_min_down_reporters is a setting for how many osds need to report
>> an osd as down before the mons will mark it as down.  I recommend setting
>> this to N+1 where N is how many osds you have in a node or failure domain.
>> If you end up with a network problem and you have 1 osd node that can talk
>> to the mons, but not the other osd nodes, then you will end up with that
>> one node marking the entire cluster down while the rest of the cluster
>> marks that node down. If your min_down_reporters is N+1, then 1 node cannot
>> mark down the rest of the cluster.  The default setting is 1 so that small
>> test clusters can mark down osds, but if you have 3+ nodes, you should set
>> it to N+1 if you can.  Setting it higher than two nodes' worth of OSDs is just as
>> problematic.  However, if you just want things reported as fast as
>> possible, leaving this at 1 might still be optimal for getting an OSD marked
>> down sooner.
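>>
>> As a concrete illustration, a minimal ceph.conf sketch with both options
>> (the values are only examples to show where they go, not recommendations;
>> the option names are the ones discussed above):
>>
>> [global]
>> # Seconds without a heartbeat reply before a peer reports an OSD to the
>> # mons (default 20). Lowering it speeds up detection but risks false
>> # positives under heavy load.
>> osd heartbeat grace = 10
>>
>> # Number of distinct OSDs that must report a peer down before the mons
>> # mark it down. With e.g. 4 OSDs per node, N+1 = 5 keeps one isolated
>> # node from marking the rest of the cluster down.
>> mon osd min down reporters = 5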
>>
>> The downside to lowering these settings is that if OSDs get marked
>> down just for running slowly, they will re-assert themselves to the mons
>> and end up causing backfilling and peering for no good reason.
>> You'll want to monitor your cluster for OSDs being marked down for a few
>> seconds before marking themselves back up.  You can see this in the OSD
>> logs where the OSD says it was wrongfully marked down in one line and then
>> the next is where it tells the mons it is actually up.
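>>
>> A rough way to spot that pattern in the logs (just a sketch; the log path
>> and the exact message wording can vary by release and distro):
>>
>> # count, per OSD log file, how often an OSD complained about being
>> # wrongly marked down before re-asserting itself
>> grep -ci "wrongly marked" /var/log/ceph/ceph-osd.*.log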
>>
>> On Thu, Jun 15, 2017 at 10:44 AM Daniel Carrasco <d.carra...@i2tic.com>
>> wrote:
>>
>>> I forgot to say that after upgrading the machine's RAM to 4GB, the OSD
>>> daemon has started to use only about 5% (around 200MB). It's like magic, and now
>>> I've got about 3.2GB of free RAM.
>>>
>>> Greetings!!
>>>
>>> 2017-06-15 15:08 GMT+02:00 Daniel Carrasco <d.carra...@i2tic.com>:
>>>
>>>> Finally, the problem was W3 Total Cache, which seems unable to handle
>>>> HA: when the master Redis host is down, it stops working without
>>>> trying the slave.
>>>>
>>>> I've added some options to detect a down OSD faster, and the page is
>>>> back online in about 40s.
>>>>
>>>> [global]
>>>> fsid = Hidden
>>>> mon_initial_members = alantra_fs-01, alantra_fs-02, alantra_fs-03
>>>> mon_host = 10.20.1.109,10.20.1.97,10.20.1.216
>>>> auth_cluster_required = cephx
>>>> auth_service_required = cephx
>>>> auth_client_required = cephx
>>>> osd mon heartbeat interval = 5
>>>> osd mon report interval max = 10
>>>> mon osd report timeout = 15
>>>> osd fast fail on connection refused = True
>>>>
>>>> public network = 10.20.1.0/24
>>>> osd pool default size = 2
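>>>>
>>>> In case it helps anyone, a sketch of how I understand options like these
>>>> can also be changed at runtime without restarting the daemons (the option
>>>> name and the osd.0 id below are just placeholders taken from the file
>>>> above):
>>>>
>>>> # inject into all running OSDs (not persisted across restarts)
>>>> ceph tell osd.* injectargs '--osd_mon_heartbeat_interval 5'
>>>> # check the running value on one daemon via its admin socket
>>>> ceph daemon osd.0 config get osd_mon_heartbeat_interval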
>>>>
>>>>
>>>> Greetings and thanks for all your help.
>>>>
>>>> 2017-06-14 23:09 GMT+02:00 David Turner <drakonst...@gmail.com>:
>>>>
>>>>> I've used the kernel client and the ceph-fuse driver for mapping the
>>>>> cephfs volume.  I didn't notice any network hiccups while failing over, 
>>>>> but
>>>>> I was reading large files during my tests (and live) and some caching may
>>>>> have hidden network hiccups in my use case.
>>>>>
>>>>> Going back to the memory potentially being a problem.  Ceph has a
>>>>> tendency to start using 2-3x more memory while it's in a degraded state as
>>>>> opposed to when everything is health_ok.  Always plan for 
>>>>> over-provisioning
>>>>> your memory to account for a minimum of 2x.  I've seen clusters stuck in 
>>>>> an
>>>>> OOM-killer death spiral: it kept killing OSDs for running out of
>>>>> memory, which caused more peering and backfilling, which caused more
>>>>> OSDs to be killed by the OOM killer.
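>>>>>
>>>>> A quick, generic way to keep an eye on per-daemon memory while the
>>>>> cluster is degraded (nothing Ceph-specific, just a sketch with standard
>>>>> tools):
>>>>>
>>>>> # resident memory (KB) of every ceph-osd process, largest first
>>>>> ps -C ceph-osd -o pid,rss,cmd --sort=-rss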
>>>>>
>>>>> On Wed, Jun 14, 2017 at 5:01 PM Daniel Carrasco <d.carra...@i2tic.com>
>>>>> wrote:
>>>>>
>>>>>> It's strange, because on my test cluster (three nodes, two of them with
>>>>>> OSDs, and all with MON and MDS), I've configured size to 2 and
>>>>>> min_size to 1, I've restarted all the nodes one by one, and the client only
>>>>>> loses the connection for about 5 seconds until it connects to another MDS.
>>>>>>
>>>>>> Are you using the ceph-fuse client or the kernel client?
>>>>>> I forgot to say that I'm using Debian 8.
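>>>>>>
>>>>>> For reference, the two mount paths I mean, roughly (the mount point and
>>>>>> secret file are placeholders; the monitor IP is from my config):
>>>>>>
>>>>>> # kernel client
>>>>>> mount -t ceph 10.20.1.109:6789:/ /mnt/cephfs \
>>>>>>     -o name=admin,secretfile=/etc/ceph/admin.secret
>>>>>> # FUSE client
>>>>>> ceph-fuse -m 10.20.1.109:6789 /mnt/cephfs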
>>>>>>
>>>>>> Anyway, maybe the problem was what I said before: the clients'
>>>>>> connections to that node started to fail, but the node was not officially
>>>>>> down. And it wasn't a client problem, because it happened on both clients
>>>>>> and on my monitoring service at the same time.
>>>>>>
>>>>>> Right now I'm not at the office, so I can't post the config file.
>>>>>> I'll send it tomorrow.
>>>>>> Anyway, it's the basic file generated by ceph-deploy, with the client
>>>>>> network and min_size configured, just like my test config.
>>>>>>
>>>>>> Thanks!!, and greetings!!
>>>>>>
>>>>>> On 14 Jun 2017, 10:38 p.m., "David Turner" <drakonst...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> I have 3 ceph nodes, size 3, min_size 2, and I can restart them all 1
>>>>>> at a time to do Ceph and kernel upgrades.  The VMs running out of Ceph,
>>>>>> the clients accessing the MDS, etc. all keep working fine without any problems
>>>>>> during these restarts.  What is your full ceph configuration?  There must
>>>>>> be something not quite right in there.
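>>>>>>
>>>>>> For what it's worth, the rough shape of that rolling-restart routine
>>>>>> (assuming systemd-based packages; the unit name is the usual one, but
>>>>>> treat it as an assumption):
>>>>>>
>>>>>> ceph osd set noout                  # avoid rebalancing while a node is down
>>>>>> systemctl restart ceph-osd.target   # on the node being upgraded
>>>>>> ceph -s                             # wait for all PGs active+clean again
>>>>>> ceph osd unset noout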
>>>>>>
>>>>>> On Wed, Jun 14, 2017 at 4:26 PM Daniel Carrasco <d.carra...@i2tic.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 14 Jun 2017, 10:08 p.m., "David Turner" <drakonst...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Not just the min_size of your cephfs data pool, but also your
>>>>>>> cephfs_metadata pool.
>>>>>>>
>>>>>>>
>>>>>>> Both were at 1. I don't know why, because I don't remember having
>>>>>>> changed the min_size, and the cluster has had 3 OSDs from the beginning (I
>>>>>>> did change it on another cluster for testing purposes, but I don't remember
>>>>>>> changing it on this one). I've changed both to 2, but only after the failure.
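>>>>>>>
>>>>>>> For the record, the change I mean is something like this (the pool
>>>>>>> names are whatever your CephFS data and metadata pools are called;
>>>>>>> cephfs_data / cephfs_metadata is just the common default):
>>>>>>>
>>>>>>> ceph osd pool set cephfs_data min_size 2
>>>>>>> ceph osd pool set cephfs_metadata min_size 2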
>>>>>>>
>>>>>>> About the size, I use 50GB because it's for a single webpage and I
>>>>>>> don't need more space.
>>>>>>>
>>>>>>> I'll try to increase the memory to 3GB.
>>>>>>>
>>>>>>> Greetings!!
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jun 14, 2017 at 4:07 PM David Turner <drakonst...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Ceph recommends 1GB of RAM for every 1TB of OSD space.  Your 2GB
>>>>>>>> nodes are definitely on the low end.  50GB OSDs... I don't know what 
>>>>>>>> that
>>>>>>>> will require, but where you're running the mon and mds on the same 
>>>>>>>> node,
>>>>>>>> I'd still say that 2GB is low.  The Ceph OSD daemon using 1GB of RAM 
>>>>>>>> is not
>>>>>>>> surprising, even at that size.
>>>>>>>>
>>>>>>>> When you say you increased the size of the pools to 3, what did you
>>>>>>>> do to the min_size?  Is that still set to 2?
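>>>>>>>>
>>>>>>>> A quick way to check both values per pool, if it helps:
>>>>>>>>
>>>>>>>> ceph osd dump | grep pool   # each pool line shows "size X min_size Y"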
>>>>>>>>
>>>>>>>> On Wed, Jun 14, 2017 at 3:17 PM Daniel Carrasco <
>>>>>>>> d.carra...@i2tic.com> wrote:
>>>>>>>>
>>>>>>>>> Finally I've created three nodes, I've increased the size of pools
>>>>>>>>> to 3 and I've created 3 MDS (active, standby, standby).
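>>>>>>>>>
>>>>>>>>> To confirm the layout, I'm checking it roughly like this:
>>>>>>>>>
>>>>>>>>> ceph mds stat   # should show one active MDS and two standbys
>>>>>>>>> ceph -s         # overall health, including mon/osd/mds state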
>>>>>>>>>
>>>>>>>>> Today the server decided to fail and I noticed that failover is not
>>>>>>>>> working... The ceph -s command showed everything as OK, but the clients
>>>>>>>>> weren't able to connect and I had to restart the failing node and
>>>>>>>>> reconnect the clients manually to make it work again (even though I
>>>>>>>>> think the active MDS was on another node).
>>>>>>>>>
>>>>>>>>> I don't know if maybe it's because the server was not fully down
>>>>>>>>> and only some connections were failing. I'll do some tests to see.
>>>>>>>>>
>>>>>>>>> Another question: how much memory does a node need to work? I have
>>>>>>>>> nodes with 2GB of RAM (one MDS, one MON and one OSD each), and they
>>>>>>>>> have a high memory usage (more than 1GB on the OSD).
>>>>>>>>> The OSD size is 50GB and the data it contains is less than 3GB.
>>>>>>>>>
>>>>>>>>> Thanks, and Greetings!!
>>>>>>>>>
>>>>>>>>> 2017-06-12 23:33 GMT+02:00 Mazzystr <mazzy...@gmail.com>:
>>>>>>>>>
>>>>>>>>>> Since your app is an Apache / PHP app, is it possible for you to
>>>>>>>>>> reconfigure it to use an S3 module rather than a POSIX open()/file()?
>>>>>>>>>> Then on the Ceph side drop CephFS and configure the Civetweb S3
>>>>>>>>>> gateway? You can have "active-active" endpoints with round-robin DNS
>>>>>>>>>> or an F5 or something. You would also have to repopulate the objects
>>>>>>>>>> into the RADOS pools.
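>>>>>>>>>>
>>>>>>>>>> Very roughly, the gateway side would look something like this in
>>>>>>>>>> ceph.conf (a sketch only; the section name, host, port and keyring
>>>>>>>>>> path are all assumptions to be adapted):
>>>>>>>>>>
>>>>>>>>>> [client.rgw.gw1]
>>>>>>>>>> host = gw1
>>>>>>>>>> rgw_frontends = "civetweb port=7480"
>>>>>>>>>> keyring = /var/lib/ceph/radosgw/ceph-rgw.gw1/keyring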
>>>>>>>>>>
>>>>>>>>>> Also increase that size parameter to 3.  ;-)
>>>>>>>>>>
>>>>>>>>>> Lots of work for active-active, but the whole stack will be much
>>>>>>>>>> more resilient (coming from someone with a ClearCase / NFS /
>>>>>>>>>> stale-file-handles-up-the-wazoo background).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 12, 2017 at 10:41 AM, Daniel Carrasco <
>>>>>>>>>> d.carra...@i2tic.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> 2017-06-12 16:10 GMT+02:00 David Turner <drakonst...@gmail.com>:
>>>>>>>>>>>
>>>>>>>>>>>> I have an incredibly light-weight cephfs configuration.  I set
>>>>>>>>>>>> up an MDS on each mon (3 total), and have 9TB of data in cephfs.  
>>>>>>>>>>>> This data
>>>>>>>>>>>> only has 1 client that reads a few files at a time.  I haven't 
>>>>>>>>>>>> noticed any
>>>>>>>>>>>> downtime when it fails over to a standby MDS.  So it definitely 
>>>>>>>>>>>> depends on
>>>>>>>>>>>> your workload as to how a failover will affect your environment.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jun 12, 2017 at 9:59 AM John Petrini <
>>>>>>>>>>>> jpetr...@coredial.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> We use the following in our ceph.conf for MDS failover. We're
>>>>>>>>>>>>> running one active and one standby. Last time it failed over 
>>>>>>>>>>>>> there was
>>>>>>>>>>>>> about 2 minutes of downtime before the mounts started responding 
>>>>>>>>>>>>> again but
>>>>>>>>>>>>> it did recover gracefully.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [mds]
>>>>>>>>>>>>> max_mds = 1
>>>>>>>>>>>>> mds_standby_for_rank = 0
>>>>>>>>>>>>> mds_standby_replay = true
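>>>>>>>>>>>>>
>>>>>>>>>>>>> With standby_replay the standby keeps replaying the active MDS's
>>>>>>>>>>>>> journal, which is what should make the takeover reasonably quick. A
>>>>>>>>>>>>> quick way to see the current roles (output format varies by release):
>>>>>>>>>>>>>
>>>>>>>>>>>>> ceph mds stat   # shows which daemon is active and which are standby / standby-replay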
>>>>>>>>>>>>>
>>>>>>>>>>>>> ___
>>>>>>>>>>>>>
>>>>>>>>>>>>> John Petrini
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks to both.
>>>>>>>>>>> Just now I'm working on that because I need a very fast failover.
>>>>>>>>>>> For now the tests give me a very fast response when an OSD fails
>>>>>>>>>>> (about 5 seconds), but a very slow response when the main MDS fails
>>>>>>>>>>> (I haven't measured the real time, but it was not working for a long
>>>>>>>>>>> time). Maybe it was because I created the other MDS after mounting,
>>>>>>>>>>> because I've done some tests just before sending this email and now it
>>>>>>>>>>> looks very fast (I haven't noticed the downtime).
>>>>>>>>>>>
>>>>>>>>>>> Greetings!!
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>
>
>



-- 
_________________________________________

      Daniel Carrasco Marín
      Ingeniería para la Innovación i2TIC, S.L.
      Tlf:  +34 911 12 32 84 Ext: 223
      www.i2tic.com
_________________________________________
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
