Things *mostly* work if hosts on the same network have different MTUs, at
least with TCP, because each end advertises an MSS derived from its own MTU
and the smaller value effectively wins for that connection.  UDP will still
break, but large UDP packets are less common.
You don't want to run that way for very long, but there's no need for an
atomic MTU swap.
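
If you want to see what a given connection actually settled on, here is a
minimal sketch (assuming a Linux host and Python 3; the address and port
below are placeholders, not anything from this thread):

    import socket

    # Connect somewhere and read back the MSS the kernel is using for this
    # TCP connection (Linux reports it via TCP_MAXSEG after connect()).
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(("192.0.2.10", 6789))   # placeholder peer
    mss = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG)
    print("MSS for this connection:", mss)   # typically 1448-1460 behind a 1500 MTU
    s.close()

With mismatched host MTUs you should still see the smaller side's value
here, which is why plain TCP keeps limping along.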

What *really* screws things up is when the host MTU is bigger than the
switch MTU.
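
You can demonstrate that failure mode with a don't-fragment probe.  A
minimal sketch, assuming Linux and Python 3 (the socket option values are
hardcoded from <linux/in.h>, and the peer address is a placeholder):

    import socket

    # Send one large UDP datagram with the DF bit set and no local
    # fragmentation, to see whether a jumbo-sized packet can leave the host.
    IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
    IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)

    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
    try:
        s.sendto(b"x" * 8000, ("192.0.2.20", 9))   # placeholder peer
        print("kernel sent an 8000-byte datagram unfragmented")
    except OSError as exc:
        print("refused locally, interface MTU is too small:", exc)

The nasty part is the first branch: with a 9000-byte host MTU the send
succeeds, the switch silently drops the frame, and nothing on the sender
ever reports an error.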

On Tue, Apr 14, 2015 at 1:42 AM Martin Millnert <mar...@millnert.se> wrote:

> On Tue, Mar 31, 2015 at 10:44:51PM +0300, koukou73gr wrote:
> > On 03/31/2015 09:23 PM, Sage Weil wrote:
> > >
> > >It's nothing specific to peering (or ceph).  The symptom we've seen is
> > >just that bytes stop passing across a TCP connection, usually when there
> > >are some largish messages being sent.  The ping/heartbeat messages get
> > >through because they are small and we disable nagle, so they never end
> > >up in large frames.
> >
> > Is there any special route one should take in order to transition a
> > live cluster to use jumbo frames and avoid such pitfalls with OSD
> > peering?
>
> 1. Configure the entire switch infrastructure for jumbo frames.
> 2. Enable config versioning of the switch infrastructure.
> 3. Bonus points: Monitor config changes of the switch infrastructure.
> 4. Run a ping test using e.g. fping from each node to every other node,
> with large frames (see the sketch after this list).
> 5. Bonus points: Set up such a test in some monitoring infrastructure.
> 6. Once you trust the config (and monitoring), up all the nodes' MTUs
> to jumbo size, simultaneously.  This is the critical step and perhaps
> it could be further perfected.  Ideally you would like an atomic
> MTU-upgrade command for the entire cluster.
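>
> Something like this could drive step 4 (only a sketch: the node list,
> payload size, and fping options are assumptions, and -b/-M need a
> reasonably recent fping):
>
>     #!/usr/bin/env python3
>     # Full-mesh jumbo-frame reachability check driven by fping.
>     import subprocess
>     import sys
>
>     NODES = ["node1", "node2", "node3"]   # hypothetical cluster hosts
>     SIZE = 8972                           # 9000 MTU - 20 (IP) - 8 (ICMP)
>
>     # -q: summary only, -M: set Don't Fragment, -b: ICMP payload size.
>     result = subprocess.run(
>         ["fping", "-q", "-M", "-b", str(SIZE)] + NODES,
>         capture_output=True, text=True)
>     print(result.stderr, end="")          # fping writes the summary to stderr
>     sys.exit(result.returncode)           # non-zero if any host failed
>
> Run it from every node (step 5 would be wiring exactly this into your
> monitoring) before you touch the host MTUs in step 6.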
>
> /M
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
