On Tue, Aug 8, 2017 at 12:42 AM, Moacir Ferreira <moacirferre...@hotmail.com> wrote:
> Fabrice,
>
> If you choose to have jumbo frames all over, then when traffic goes
> outside of your jumbo-frames-enabled network it will have to be
> fragmented back down to the destination MTU. Most datacenters provide
> services to the outside world where the MTU is 1500 bytes. In that case
> you will hurt performance, because your router will be doing the
> fragmentation. So I would always use jumbo frames in the datacenter for
> east/west traffic and standard 1500-byte frames for north/south traffic.

I doubt this would happen with modern TCP/IP stacks, for TCP connections.
They will most likely adjust to the path MTU using PMTUD. Of course, this
does not always work (it depends on the hardware en route). UDP packets
might fail miserably too (get dropped), depending on the hardware en
route, but UDP traffic (and specifically large UDP packets) is not that
common these days. Nevertheless, I don't see a huge advantage in enabling
this for north/south traffic, TBH, and the mysterious, random
traffic-drop issues it may cause are not worth it.
Y.
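For what it's worth, PMTUD is easy to test before trusting it. A minimal
sketch from a Linux host, assuming a hypothetical peer name (8972 bytes of
ICMP payload + 28 bytes of IP/ICMP headers = a 9000-byte packet):

    # Send jumbo-sized pings with the DF (don't fragment) bit set;
    # replies prove the path carries 9000-byte packets end to end.
    ping -M do -s 8972 -c 3 peer.example.com

    # Walk the path and report the smallest MTU seen along it.
    tracepath peer.example.com

If ping reports "Message too long" or the packets silently vanish,
something along the path is still at 1500 and PMTUD is what the stack
will have to rely on.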
> Moacir
>
> ------------------------------
>
> Message: 1
> Date: Mon, 7 Aug 2017 21:50:36 +0200
> From: Fabrice Bacchella <fabrice.bacche...@orange.fr>
> Subject: Re: [ovirt-users] Good practices
>
>> Moacir: Yes! This is another reason to have separate networks for
>> north/south and east/west. That way I can use the standard MTU on the
>> 10Gb NICs and jumbo frames on the file-move 40Gb NICs.
>
> Why not jumbo frames everywhere?
>
> ------------------------------
>
> Message: 2
> Date: Mon, 7 Aug 2017 16:52:40 -0300
> From: FERNANDO FREDIANI <fernando.fredi...@upx.com>
> Subject: Re: [ovirt-users] Good practices
>
> What you mentioned is a specific case, not the generic situation. The
> main point is that RAID 5 or 6 hurts write performance compared with
> writing to only 2 given disks at a time. That was the comparison made.
>
> Fernando
>
> On 07/08/2017 16:49, Fabrice Bacchella wrote:
>>> Le 7 août 2017 à 17:41, FERNANDO FREDIANI <fernando.fredi...@upx.com> a écrit :
>>>
>>> Yet another downside of having a RAID (especially RAID 5 or 6) is
>>> that it considerably reduces write speed: each group of disks ends
>>> up with the write speed of a single disk, because all the disks in
>>> the group have to wait for each other before a write completes.
>>
>> That's not true if you have a medium- to high-end hardware RAID
>> controller. For example, HP Smart Array controllers come with a flash
>> cache of about 1 or 2 GB that hides that from the OS.
>
> ------------------------------
>
> Message: 3
> Date: Mon, 7 Aug 2017 22:05:19 +0200
> From: Erekle Magradze <erekle.magra...@recogizer.de>
> Subject: Re: [ovirt-users] Good practices
>
> Hi Fernando,
>
> So let's go through the following scenarios:
>
> 1. You have two servers (replication factor 2), i.e. two bricks per
> volume. In this case it is strongly recommended to have an arbiter
> node: the metadata storage that guarantees you avoid split-brain. The
> arbiter doesn't even need a disk with lots of space; a tiny SSD is
> enough, but it must be hosted on a separate server. The advantage of
> such a setup is that you don't need RAID 1 for each brick; the
> metadata is stored on the arbiter node and brick replacement is easy.
>
> 2. You have an odd number of bricks (say 3, i.e. replication factor 3)
> in your volume, and you created neither an arbiter node nor a quorum
> configuration. In this case the entire load of keeping the volume
> consistent rests on all 3 servers: each of them is important, each
> brick contains key information, and they need to cross-check each
> other (that's what people usually do on their first try of Gluster :)).
> Here replacing a brick is a big pain, and RAID 1 is a good option to
> have. The disadvantage is losing the space and not having the JBOD
> option; the advantage is that you don't have to have an additional
> arbiter node.
>
> 3. You have an odd number of bricks and a configured arbiter node. In
> this case you can easily go with JBOD; still, a good practice would be
> to have RAID 1 for the arbiter disks (tiny 128GB SSDs are perfectly
> sufficient for volumes tens of TBs in size).
>
> That's basically it. The rest about reliability and setup scenarios
> you can find in the Gluster documentation; especially look for the
> quorum and arbiter node configs and options.
>
> Cheers
>
> Erekle
>
> P.S. The good practice I was mentioning relates mostly to operating
> Gluster, not to installation or deployment, i.e. not to the conceptual
> understanding of Gluster (conceptually it's a JBOD system).
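To make the arbiter scenarios concrete, a minimal sketch of creating a
replica-3 volume with a metadata-only arbiter brick (host names and brick
paths are hypothetical):

    # Every third brick in the list becomes the arbiter: it stores file
    # metadata only, so a small SSD is enough.
    gluster volume create datavol replica 3 arbiter 1 \
        srv1:/gluster/brick1 srv2:/gluster/brick1 srv3:/gluster/arbiter1
    gluster volume start datavol

    # Check the quorum settings the documentation recommends reviewing.
    gluster volume get datavol cluster.quorum-type
    gluster volume get datavol cluster.server-quorum-type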
> On 08/07/2017 05:41 PM, FERNANDO FREDIANI wrote:
>>
>> Thanks for the clarification, Erekle.
>>
>> However, I am surprised by this way of operating GlusterFS, as it
>> adds another layer of complexity to the system (either a hardware or
>> a software RAID) underneath the Gluster config and increases the
>> system's overall cost.
>>
>> An important point to consider: in a RAID configuration you already
>> have space "wasted" to build redundancy (RAID 1, 5, or 6). When you
>> then run GlusterFS on top of several RAIDs, the data is replicated
>> again, so the same data consumes space once within each disk group
>> and once more across the RAIDs, depending on your Gluster
>> configuration (with RAID 1 underneath, the same data is replicated
>> 4 times).
>>
>> Yet another downside of having a RAID (especially RAID 5 or 6) is
>> that it considerably reduces write speed, as each group of disks ends
>> up with the write speed of a single disk: all the other disks in the
>> group have to wait for each other to write as well.
>>
>> So if Gluster already replicates the data, why does replacing a brick
>> create the big pain you mentioned? The data is replicated somewhere
>> else and can still be retrieved, both to serve clients and to
>> reconstruct the equivalent disk when it is replaced.
>>
>> Fernando
>>
>> On 07/08/2017 10:26, Erekle Magradze wrote:
>>>
>>> Hi Fernando,
>>>
>>> Here is my experience: if you use a particular hard drive as a brick
>>> for a Gluster volume and it dies, i.e. it becomes inaccessible, it
>>> is a huge hassle to discard that brick and exchange it for another
>>> one, since Gluster keeps trying to access the broken brick, and that
>>> caused (at least for me) a big pain. It is therefore better to have
>>> a RAID as the brick, i.e. RAID 1 (mirroring) for each brick. In that
>>> case, if a disk is down you can easily exchange it and rebuild the
>>> RAID without going offline, i.e. without switching off the volume to
>>> do brick manipulations and switching it back on.
>>>
>>> Cheers
>>>
>>> Erekle
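For reference, the brick swap itself is a single command once a
replacement disk is mounted; the pain described above is mostly in
Gluster's behavior while the brick is dead and in the healing afterwards.
A sketch with hypothetical names:

    # Point the volume at the new brick; self-heal then copies the data
    # back from the surviving replicas.
    gluster volume replace-brick datavol \
        srv2:/gluster/brick1 srv2:/gluster/brick1-new commit force

    # Watch the heal backlog drain before declaring victory.
    gluster volume heal datavol info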
>>> On 08/07/2017 03:04 PM, FERNANDO FREDIANI wrote:
>>>>
>>>> For any RAID 5 or 6 configuration I normally follow a simple golden
>>>> rule which has given good results so far:
>>>> - up to 4 disks: RAID 5
>>>> - 5 or more disks: RAID 6
>>>>
>>>> However, I never really understood the recommendation to use any
>>>> RAID with GlusterFS. I always thought that GlusterFS likes to work
>>>> in JBOD mode and control the disks (bricks) directly, so you can
>>>> create whatever distribution rule you wish, and if a single disk
>>>> fails you just replace it and the data is obviously replicated from
>>>> another one. The only downside of using it this way is that the
>>>> replication data will flow across all servers, but that is not much
>>>> of an issue.
>>>>
>>>> Can anyone elaborate on RAID + GlusterFS versus JBOD + GlusterFS?
>>>>
>>>> Thanks
>>>> Regards
>>>> Fernando
>>>>
>>>> On 07/08/2017 03:46, Devin Acosta wrote:
>>>>>
>>>>> Moacir,
>>>>>
>>>>> I have recently installed multiple Red Hat Virtualization hosts
>>>>> for several different companies, and have dealt with the Red Hat
>>>>> support team in depth about the optimal configuration for setting
>>>>> up GlusterFS most efficiently, and I wanted to share with you what
>>>>> I learned.
>>>>>
>>>>> In general the Red Hat Virtualization team frowns upon using each
>>>>> disk of the system as just a JBOD. Sure, there is some protection
>>>>> from having the data replicated; however, the recommendation is to
>>>>> use RAID 6 (preferred) or RAID 5, or RAID 1 at the very least.
>>>>>
>>>>> Here is the direct quote from Red Hat when I asked about RAID and
>>>>> bricks:
>>>>>
>>>>> "A typical Gluster configuration would use RAID underneath the
>>>>> bricks. RAID 6 is most typical as it gives you 2-disk failure
>>>>> protection, but RAID 5 could be used too. Once you have the RAIDed
>>>>> bricks, you'd then apply the desired replication on top of that.
>>>>> The most popular way of doing this would be distributed replicated
>>>>> with 2x replication. In general you'll get better performance with
>>>>> larger bricks. 12 drives is often a sweet spot. Another option
>>>>> would be to create a separate tier using all SSDs."
>>>>>
>>>>> For SSD tiering, from my understanding, you would need 1 NVMe
>>>>> drive in each server, or 4 SSDs per server for the hot tier (the
>>>>> hot tier needs to be distributed-replicated if not using NVMe). So
>>>>> with only 1 SSD drive in each server, I'd suggest looking into the
>>>>> NVMe option.
>>>>>
>>>>> Since you're using only 3 servers, what I'd probably suggest is
>>>>> 2 replicas + an arbiter node. This setup doesn't actually require
>>>>> the 3rd server to have big drives at all, as it only stores
>>>>> metadata about the files and not a full copy.
>>>>>
>>>>> Please see the attached document that Red Hat gave me for more
>>>>> information on this. I hope it helps you.
>>>>>
>>>>> --
>>>>> Devin Acosta, RHCA, RHVCA
>>>>> Red Hat Certified Architect
>>>>>
>>>>> On August 6, 2017 at 7:29:29 PM, Moacir Ferreira
>>>>> (moacirferre...@hotmail.com) wrote:
>>>>>
>>>>>> I am willing to assemble an oVirt "pod" made of 3 servers, each
>>>>>> with 2 CPU sockets of 12 cores, 256GB RAM, 7 10K HDDs, and 1 SSD.
>>>>>> The idea is to use GlusterFS to provide HA for the VMs. The 3
>>>>>> servers have a dual 40Gb NIC and a dual 10Gb NIC. So my intention
>>>>>> is to create a loop, like a server triangle, using the 40Gb NICs
>>>>>> for access to the virtualization files (the VMs' .qcow2) and for
>>>>>> moving VMs around the pod (east/west traffic), while using the
>>>>>> 10Gb interfaces for serving the outside world (north/south
>>>>>> traffic).
>>>>>>
>>>>>> This said, my first question is: how should I deploy GlusterFS in
>>>>>> such an oVirt scenario? My questions are:
>>>>>>
>>>>>> 1 - Should I create 3 RAIDs (e.g. RAID 5), one on each oVirt
>>>>>> node, and then create a GlusterFS using them?
>>>>>>
>>>>>> 2 - Or should I instead create a JBOD array made of all the
>>>>>> servers' disks?
>>>>>>
>>>>>> 3 - What is the best Gluster configuration to provide HA while
>>>>>> not consuming too much disk space?
>>>>>>
>>>>>> 4 - Does an oVirt hypervisor pod like the one I am planning to
>>>>>> build, and its virtualization environment, benefit from tiering
>>>>>> when using an SSD disk? And will Gluster do it by default, or do
>>>>>> I have to configure it?
>>>>>>
>>>>>> At the bottom line, what is good practice for using GlusterFS in
>>>>>> small pods for enterprises?
>>>>>>
>>>>>> Your opinion/feedback will be really appreciated!
>>>>>>
>>>>>> Moacir
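On question 4: Gluster does not tier by default; a hot tier has to be
attached to an existing volume explicitly, and the SSD bricks must match
the volume's replica count. A hedged sketch (hypothetical names, tiering
syntax as of Gluster 3.x):

    # Attach a replicated SSD hot tier; Gluster then promotes hot files
    # to the SSD bricks and demotes cold ones over time.
    gluster volume tier datavol attach replica 3 \
        srv1:/ssd/hot1 srv2:/ssd/hot1 srv3:/ssd/hot1

    # Watch promotion/demotion counters.
    gluster volume tier datavol status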
> --
> Dr. rer. nat. Erekle Magradze
> Lead Big Data Engineering & DevOps
> Recogizer Group GmbH, Rheinwerkallee 2, 53227 Bonn
> E-Mail: erekle.magra...@recogizer.de
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users