Re: [CentOS] SOS: Production VM not starting!
On 11/12/2012 4:33 AM, Markus Falb wrote:
> I had a look at your screenshot. Output stops at the moment init is
> taking over. I suspect that console output is going elsewhere, maybe to
> a serial console. That way it could well be that the machine is doing
> something but you just can not see it.
>
> My first bet would have been a fsck

Thanks, I think you are probably right. This VM has a large (virtual) data hard disk, and I found that it was mounted (in /etc/fstab) with the autocheck option enabled. To avoid this problem in the future, I changed the last two fstab fields to "0 0", which disables the boot-time fsck.

I had already suspected an automatic fsck might be the cause, but in past cases (with other VMs) the check was visible in the virtual console, while in this case it apparently was not. However, I did not find any record of fsck checks during boot in /var/log/messages.

Thanks again.

Regards,
Nick
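For reference, it is the sixth field in /etc/fstab that controls the boot-time check. A sketch, with an illustrative device and mount point:

# /etc/fstab: <device> <mount> <type> <options> <dump> <fsck pass>
# A non-zero sixth field makes fsck check the filesystem at boot;
# "0 0" disables both the dump flag and the boot-time check.
/dev/vdb1  /data  ext3  defaults  0 0    # was "1 2" before the change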
Re: [CentOS] SOS: Production VM not starting!
On 12/12/2012 7:37 AM, Gordon Messmer wrote:
> On 12/10/2012 05:01 PM, Nikolaos Milas wrote:
>> I still wonder what caused that delay.
> What does "getenforce" output? It sort of looks like you went from an
> SELinux-disabled configuration to an enforcing or permissive
> configuration and required a relabel.

Thank you for helping find the cause of this behavior. SELinux was always disabled (and still is) on that VM:

# getenforce
Disabled

Any other ideas would be appreciated.

Regards,
Nick
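A quick way to rule out a pending relabel is to check for the flag file that triggers one at boot, and the persistent mode setting. A sketch using the standard SELinux paths:

# A full relabel is forced at the next boot if this flag file exists:
ls -l /.autorelabel
# The persistent mode; for this VM it should read SELINUX=disabled:
grep '^SELINUX=' /etc/selinux/config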
[CentOS] fixing partition alignment ?
Hi there,

I've discovered that most of the hard drives used in our cluster got misaligned partitions, thus crippling performance. Is there any way to fix that without having to delete/recreate properly aligned partitions, then reformat and refill the disks? I'd be glad not to have to move several tens of TB disk by disk :D (most disks are JBOD, as we're using a fault-tolerant network FS, moosefs not to name it). What I found so far wasn't helpful, unfortunately.

Drives are ext4, driven by C6 x86_64.

Thanks,
Laurent.
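In case it helps with the audit, here is one way to spot the misaligned partitions. A sketch, assuming 4 KiB physical sectors, where a start sector divisible by 8 (in 512-byte units) is aligned:

# List partition start sectors; any start not divisible by 8 is misaligned:
fdisk -lu /dev/sdb
# Or, if your parted supports it (C6's parted 2.1 should), let it judge:
parted /dev/sdb align-check optimal 1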
Re: [CentOS] CONFIG_ARPD turned on in centosplus kernel.
On 05/17/2012 03:13 PM, Akemi Yagi wrote:
> On Thu, May 17, 2012 at 11:33 AM, Steve Clark wrote:
>> On 05/05/2012 04:45 PM, Akemi Yagi wrote:
>>
>> On Sat, May 5, 2012 at 12:40 PM, Steve Clark wrote:
>>
>> http://bugs.centos.org/view.php?id=5709
>>
>> I actually took the latest centosplus kernel srpm and got it going in my
>> environment (would like to have semi-official support though ;-) ). It
>> has been running stable for a couple of weeks now.
>>
>> I have 4 systems set up using it and opennhrp in a DMVPN hub and spoke
>> arrangement. I have 2 hubs with each of the spokes connected to each of
>> the hubs. The spokes would be units in the field while the hubs would be
>> at HQ in a failover environment using OSPF amongst them all.
>>
>> I've updated the RFE you filed with some info. Will be nice if you
>> also add the above note in there.
>>
>> Akemi
>>
>> Thanks Akemi,
>>
>> I am just preparing to test the new kernel.
> That's great. I closed the bug report thinking there would be no
> response. I will reopen it so that you can add your test result.
>
> Akemi

Hi Akemi,

I just downloaded the latest plus kernel and much to my chagrin CONFIG_ARPD is not set. Is there a reason this has been turned back off? Opennhrp will not work correctly with it off.

# egrep ARPD config-2.6.32-279.14.1.el6.centos.plus.i686
# CONFIG_ARPD is not set
Wed Dec 12 10:38:56 EST 2012
Z703108:/boot

# egrep ARPD config-2.6.32-220.17.1.el6.centos.plus.i686
CONFIG_ARPD=y
Wed Dec 12 10:39:27 EST 2012

Regards,
Steve

--
Stephen Clark
*NetWolves*
Director of Technology
Phone: 813-579-3200
Fax: 813-882-0209
Email: steve.cl...@netwolves.com
http://www.netwolves.com
Re: [CentOS] CONFIG_ARPD turned on in centosplus kernel.
On Wed, Dec 12, 2012 at 7:41 AM, Steve Clark wrote:
> [...]
>
> Hi Akemi,
>
> I just downloaded the latest plus kernel and much to my chagrin
> CONFIG_ARPD is not set. Is there a reason this has been turned back off?
> Opennhrp will not work correctly with it off.
>
> # egrep ARPD config-2.6.32-279.14.1.el6.centos.plus.i686
> # CONFIG_ARPD is not set
> Wed Dec 12 10:38:56 EST 2012
> Z703108:/boot
>
> # egrep ARPD config-2.6.32-220.17.1.el6.centos.plus.i686
> CONFIG_ARPD=y
> Wed Dec 12 10:39:27 EST 2012
>
> Regards,
> Steve

Hi Steve,

Sorry about that. I apparently missed it when doing the config files for CentOS 6.3, so all 6.3 cplus kernels have that option disabled. :(

I reopened bug 5709. Could you add a note there so we can track this issue? I will try to get this corrected in the next kernel update.

Akemi
Re: [CentOS] CONFIG_ARPD turned on in centosplus kernel.
On 12/12/2012 11:02 AM, Akemi Yagi wrote:
> [...]
>
> Hi Steve,
>
> Sorry about that. I apparently missed it when doing the config files for
> CentOS 6.3, so all 6.3 cplus kernels have that option disabled. :(
>
> I reopened bug 5709. Could you add a note there so we can track this
> issue? I will try to get this corrected in the next kernel update.
>
> Akemi

Hi Akemi,

I already added a note. Thanks a lot for doing this.

Regards,
Steve

--
Stephen Clark
*NetWolves*
Director of Technology
Phone: 813-579-3200
Fax: 813-882-0209
Email: steve.cl...@netwolves.com
http://www.netwolves.com
Re: [CentOS] SOS: Production VM not starting!
On 12.12.2012 11:51, Nikolaos Milas wrote:
> On 11/12/2012 4:33 AM, Markus Falb wrote:
>
>> I suspect that console output is going elsewhere, maybe to
>> a serial console. That way it could well be that the machine is doing
>> something but you just can not see it.
>>
>> My first bet would have been a fsck
> However, I did not find any record of fsck checks during boot in
> /var/log/messages.

You will never find fscks in /var/log/messages: fsck happens too early in the boot process, before syslog is running. There is a mechanism to log this early stuff, though. What you could have seen at the console while booting should also be in /var/log/boot.log. With CentOS 6 this works. Sadly, boot.log on my CentOS 5 machines is empty, and so will yours be.

https://bugzilla.redhat.com/show_bug.cgi?id=223446

--
Kind Regards,
Markus Falb
Re: [CentOS] SOS: Production VM not starting!
On 12/12/2012 7:35 PM, Markus Falb wrote:
> Sadly, boot.log on my CentOS 5 machines is empty, and so will yours be.

Yes, I had already checked; it's always zero size...

Thanks for your info.

Nick
Re: [CentOS] home directory server performance issues
On Tue, Dec 11, 2012 at 1:58 AM, Nicolas KOWALSKI wrote:
> On Mon, Dec 10, 2012 at 11:37:50AM -0600, Matt Garman wrote:
>> OS is CentOS 5.6, home directory partition is ext3, with options
>> "rw,data=journal,usrquota".
>
> Is the data=journal option really wanted here? Did you try with the
> other journalling modes available? I also think you are missing the
> noatime option here.

Short answer: I don't know. Intuitively, it seems like it's not the right thing. However, there are a number of articles out there [1] that say data=journal may improve performance dramatically in cases where there is both a lot of reading and writing. That's what a home directory server is to me: a lot of reading and writing.

However, I haven't seen any tool or mechanism for precisely quantifying when data=journal will improve performance; everyone just says "change it and test". Unfortunately, in my situation, I didn't have the luxury of testing, because things were unusable "now".

[1] for example: http://www.ibm.com/developerworks/linux/library/l-fs8/index.html
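Since data= cannot be changed on a simple remount, one alternative is to record it as the filesystem's default mount option in the superblock. A sketch, assuming the filesystem can be unmounted briefly (device and mount point are illustrative):

# Record data=journal as a default mount option (works for ext3/ext4);
# it takes effect at the next mount, so unmount first.
umount /home
tune2fs -o journal_data /dev/sdb1
mount /home
# Verify:
tune2fs -l /dev/sdb1 | grep 'Default mount options'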
Re: [CentOS] home directory server performance issues
On Tue, Dec 11, 2012 at 2:24 PM, Dan Young wrote:
> Just going to throw this out there. What is RPCNFSDCOUNT in
> /etc/sysconfig/nfs?

It was 64 (upped from the default of... 8 I think).
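For anyone following along, that knob lives in /etc/sysconfig/nfs and can also be changed on a running server. A sketch (the right count is workload-dependent):

# /etc/sysconfig/nfs
RPCNFSDCOUNT=64

# Pick up the new count:
service nfs restart
# Or resize the thread pool on the fly:
rpc.nfsd 64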
Re: [CentOS] home directory server performance issues
On Tue, Dec 11, 2012 at 4:01 PM, Steve Thompson wrote:
> This is in fact a very interesting question. The default value of
> RPCNFSDCOUNT (8) is in my opinion way too low for many kinds of NFS
> servers. My own setup has 7 NFS servers ranging from small ones (7 TB
> disk served) to larger ones (25 TB served), and there are about 1000
> client cores making use of this. After spending some time looking at NFS
> performance problems, I discovered that the number of nfsd's had to be
> much higher to prevent stalls. On the largest servers I now use 256-320
> nfsd's, and 64 nfsd's on the very smallest ones. Along with suitable
> adjustment of vm.dirty_ratio and vm.dirty_background_ratio, this makes a
> huge difference.

Could you perhaps elaborate a bit on your scenario? In particular, how much memory and how many CPU cores do the servers with the really high nfsd counts have? Is there a rule of thumb for nfsd counts relative to the system specs? Or, like so many IO tuning situations, is it just a matter of "test and see"?
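On the vm.dirty_* side, the idea is to start background writeback earlier and cap the amount of dirty page cache, so large NFS write bursts flush gradually instead of stalling everything at once. A sketch with example values, not recommendations:

# /etc/sysctl.conf
vm.dirty_background_ratio = 5   # start background writeback at 5% of RAM
vm.dirty_ratio = 10             # block writers once 10% of RAM is dirty

# Apply:
sysctl -p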
Re: [CentOS] home directory server performance issues
On Wed, Dec 12, 2012 at 12:29 AM, Gordon Messmer wrote:
> That may be difficult at this point, because you really want to start by
> measuring the number of IOPS. That's difficult to do if your
> applications demand more than your hardware currently provides.

Since my original posting, we temporarily moved the data from the CentOS 5 server to the CentOS 6 server. We rebuilt the original (slow) server with CentOS 6, then migrated the data back. So far (fingers crossed) so good. I'm running a constant "iostat -kx 30" and logging it to a file. Disk utilization is virtually always under 50%, with random spikes into the 90% range, but they are few and far between. So it now appears the hardware + software configuration can handle the load. But I still have the same question: how can I accurately *quantify* the kind of IO load these servers have? I.e., how do I measure IOPS?

> This might not be the result of your NFS server performance. You might
> actually be seeing bad performance in your directory service. What are
> you using for that service? LDAP? NIS? Are you running nscd or sssd
> on the clients?

Not using a directory service (manually synced passwd files, and Kerberos for authentication). Not running nscd or sssd.

> RAID 6 is good for $/GB, but bad for performance. If you find that your
> performance is bad, RAID10 will offer you a lot more IOPS.
>
> Mixing 15k drives with RAID-6 is probably unusual. Typically 15k drives
> are used when the system needs maximum IOPS, and RAID-6 is used when
> storage capacity is more important than performance.
>
> It's also unusual to see a RAID-6 array with a hot spare. You already
> have two disks of parity. At this point, your available storage
> capacity is only 600GB greater than a RAID-10 configuration, but your
> performance is MUCH worse.

I agree with all that. The problem is, there is a higher risk of storage failure with RAID-10 compared to RAID-6. We do have good, reliable *data* backups, but no real hardware backup. Our current service contract on the hardware is next business day. That's too much downtime to tolerate with this particular system.

As I typed that, I realized we technically do have a hardware backup: the other server I mentioned. But even the time to restore from backup would make a lot of people extremely unhappy.

How do most people handle this kind of scenario, i.e. not being able to afford a hardware failure for any significant length of time? Have a whole redundant system in place? I would have to "sell" the idea to management, and for that, I'd need to precisely quantify our situation (i.e. my initial question).

>> OS is CentOS 5.6, home directory partition is ext3, with options
>> "rw,data=journal,usrquota".
>
> data=journal actually offers better performance than the default in some
> workloads, but not all. You should try the default and see which is
> better. With a hardware RAID controller that has battery backed write
> cache, data=journal should not perform any better than the default, but
> probably not any worse.

Right, that was mentioned in another response. Unfortunately, I don't have the ability to test this. My only system is the real production system, and I can't afford the interruption to the users while I fully unmount and remount the partition (the data= type can't be changed with a simple remount).

In general, it seems like a lot of IO tuning is "change parameter, then test". But (1) what test? It's hard to simulate a very random/unpredictable workload like user home directories, and (2) what to test on, when one only has the single production system? I wish there were more "analytic" tools where you could simply measure a number of attributes and, from there, derive the ideal settings and configuration parameters.

> If your drives are really 4k sectors, rather than the reported 512B,
> then they're not optimal and writes will suffer. The best policy is to
> start your first partition at 1M offset. parted should be aligning
> things well if it's updated, but if your partition sizes (in sectors)
> are divisible by 8, you should be in good shape.

It appears that CentOS 6 does the 1M offset by default; CentOS 5 definitely doesn't.

Anyway... as I suggested above, the problem appears to be resolved. But the "fix" was kind of a shotgun approach, i.e. I changed too many things at once to know exactly which specific item fixed the problem. I'm sure this will inevitably come up again at some point, so I'd still like to learn/understand more to better handle the situation next time.

Thanks!
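On the "how do I measure IOPS" question: the iostat -x output you are already logging carries the answer, since r/s and w/s are completed reads and writes per second, and their sum is the device's IOPS. A sketch that extracts it (the device name is illustrative, and strftime assumes gawk):

# Timestamped read+write IOPS for one device, every 30 seconds.
# In iostat -x output, r/s is field 4 and w/s is field 5 of a device line.
iostat -dxk 30 | awk '/^sdb /{ print strftime("%F %T"), "IOPS:", $4 + $5 }'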
Re: [CentOS] home directory server performance issues
On Wed, Dec 12, 2012 at 1:52 PM, Matt Garman wrote:
> I agree with all that. The problem is, there is a higher risk of storage
> failure with RAID-10 compared to RAID-6.

Does someone have the real odds here? I think the big risks are always that you have unnoticed bad sectors on the remaining mirror/parity drive when you lose a disk, or that you keep running long enough to develop them before replacing it.

> We do have good, reliable *data* backups, but no real hardware backup.
> Our current service contract on the hardware is next business day.
> That's too much downtime to tolerate with this particular system.
>
> As I typed that, I realized we technically do have a hardware backup:
> the other server I mentioned. But even the time to restore from backup
> would make a lot of people extremely unhappy.
>
> How do most people handle this kind of scenario, i.e. not being able to
> afford a hardware failure for any significant length of time? Have a
> whole redundant system in place? I would have to "sell" the idea to
> management, and for that, I'd need to precisely quantify our situation
> (i.e. my initial question).

The simple-minded approach is to have a spare chassis and some spare drives to match your critical boxes. The most likely things to go are the drives, so all you have to do is rebuild the RAID. In the less likely event of a chassis failure, you can swap the drives into a spare a lot faster than copying the data. You only need a few spares to cover the likely failures across many production boxes, though storage servers might be a special case with a different chassis type.

You are still going to have some downtime with this approach, though - and it works best where you have operations staff on site to do the swaps. Also, you need to test it to be sure you understand what you have to change to make the system come up with new NICs, etc.

--
Les Mikesell
lesmikes...@gmail.com
Re: [CentOS] home directory server performance issues
On 12/12/2012 12:16 PM, Les Mikesell wrote:
> On Wed, Dec 12, 2012 at 1:52 PM, Matt Garman wrote:
>> I agree with all that. The problem is, there is a higher risk of storage
>> failure with RAID-10 compared to RAID-6.
>
> Does someone have the real odds here? I think the big risks are
> always that you have unnoticed bad sectors on the remaining
> mirror/parity drive when you lose a disk or that you keep running long
> enough to develop them before replacing it.

A decent RAID system does periodic 'scrubs' where, in the background (when otherwise idle), it reads all the disks and verifies the RAID. Any marginal sectors should get detected and remapped at this point.
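With Linux software RAID, such a scrub can also be kicked off by hand (CentOS ships a raid-check cron job in the mdadm package that does this weekly). A sketch with an illustrative array name:

# Start a background verify pass; progress appears in /proc/mdstat.
echo check > /sys/block/md0/md/sync_action
cat /proc/mdstat
# After it finishes, a non-zero mismatch count means trouble:
cat /sys/block/md0/md/mismatch_cnt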
Re: [CentOS] home directory server performance issues
On Wed, 12 Dec 2012, Matt Garman wrote:
> Could you perhaps elaborate a bit on your scenario? In particular, how
> much memory and how many CPU cores do the servers with the really high
> nfsd counts have? Is there a rule of thumb for nfsd counts relative to
> the system specs? Or, like so many IO tuning situations, is it just a
> matter of "test and see"?

My NFS servers that run 256 nfsd's have four cores (Xeon, 3.16 GHz) and 16 GB memory, with three incoming network segments on which the clients live (each of which is a dual bonded GbE link). I don't know of any rule of thumb; indeed, I am using 256 nfsd's at the moment because that is the nature of the current workload. It might be different in a few months' time, especially as we add more clients. Indeed, I started with 64 nfsd's and kept adding more until the NFS stalls essentially stopped.

Steve
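One data point that helps with the "keep adding until the stalls stop" loop is the thread-usage statistics the kernel exports. A sketch (the histogram buckets are populated on EL5/EL6-era kernels):

# The "th" line shows the thread count, how often all threads were busy,
# then a histogram of how often 10%, 20%, ... of the threads were in use.
# Heavy counts in the last buckets suggest the pool is too small.
grep '^th' /proc/net/rpc/nfsd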
Re: [CentOS] home directory server performance issues
Matt Garman wrote:
> On Wed, Dec 12, 2012 at 12:29 AM, Gordon Messmer wrote:
> As I typed that, I realized we technically do have a hardware backup:
> the other server I mentioned. But even the time to restore from backup
> would make a lot of people extremely unhappy.
>
> How do most people handle this kind of scenario, i.e. not being able to
> afford a hardware failure for any significant length of time? Have a
> whole redundant system in place? I would have to "sell" the idea to
> management, and for that, I'd need to precisely quantify our situation
> (i.e. my initial question).

About selling it: ask them to consider what happens if one goes down... and "next day" service means someone shows up the next day (if you convince the OEM that you need on-site support). That does *not* guarantee that the server will be back up that next day. (There was a Dell box where we replaced the m/b three times - the second replacement was dead - and they finally just replaced the whole box because no one could figure out what was going wrong; that took two weeks or so.) *Then*, once it's up, you get to restore everything to production.

Try a tabletop exercise, as we have to do once a year, on what you would do in two or three scenarios, and guesstimate the time for each. That might scare management into buying more hardware. As long as they don't freeze your salary or lay someone off to pay for it...

mark
Re: [CentOS] home directory server performance issues
On Wed, Dec 12, 2012 at 2:24 PM, John R Pierce wrote:
>>> I agree with all that. The problem is, there is a higher risk of
>>> storage failure with RAID-10 compared to RAID-6.
>> Does someone have the real odds here? I think the big risks are
>> always that you have unnoticed bad sectors on the remaining
>> mirror/parity drive when you lose a disk or that you keep running long
>> enough to develop them before replacing it.
>
> A decent RAID system does periodic 'scrubs' where, in the background
> (when otherwise idle), it reads all the disks and verifies the RAID.
> Any marginal sectors should get detected and remapped at this point.

Yes, but if you are doing it in software, you need to make sure that the functionality is enabled and that you are getting notifications from smartmontools or something about the disk health - and with hardware RAID you need some sort of controller-specific monitor running to track the drive health.

--
Les Mikesell
lesmikes...@gmail.com
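On the notification side, a minimal smartmontools setup that mails on failing health checks might look like this - a sketch, with an illustrative mail address:

# /etc/smartd.conf
# Monitor all detected drives with the full set of checks (-a)
# and mail root when SMART reports a problem:
DEVICESCAN -a -m root@localhost

# Enable the daemon:
chkconfig smartd on
service smartd start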
Re: [CentOS] fixing partition alignment ?
On 12/12/2012 09:36 AM, Laurent Wandrebeck wrote:
> I've discovered that most of the hard drives used in our cluster got
> misaligned partitions, thus crippling performance. Is there any way to
> fix that without having to delete/recreate properly aligned partitions,
> then reformat and refill the disks?
> I'd be glad not to have to move several tens of TB disk by disk :D
> (most disks are JBOD, as we're using a fault-tolerant network FS,
> moosefs not to name it).

The data is going to have to be moved. There's just no way around it, and shifting it a few sectors one way or the other is going to be no faster, and a lot more dangerous, than copying it to a new partition. Copying to a different drive should be somewhat faster than copying to the same drive, since neither drive needs to share its controller with both the read and write streams.

--
Bob Nichols "NOSPAM" is really part of my email address. Do NOT delete it.
Re: [CentOS] fixing partition alignment ?
On Wed, Dec 12, 2012 at 9:36 AM, Laurent Wandrebeck wrote:
> I've discovered that most of the hard drives used in our cluster got
> misaligned partitions, thus crippling performance. Is there any way to
> fix that without having to delete/recreate properly aligned partitions,
> then reformat and refill the disks?
> I'd be glad not to have to move several tens of TB disk by disk :D
> (most disks are JBOD, as we're using a fault-tolerant network FS,
> moosefs not to name it).
> Drives are ext4, driven by C6 x86_64.

Hmmm, might be a fun test of your fault tolerance to remove disks one at a time and add them back empty with the partitions properly aligned. Maybe...

--
Les Mikesell
lesmikes...@gmail.com
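If you do go disk by disk, creating the replacement partition aligned is quick - a sketch with an illustrative device, starting the partition at 1 MiB (sector 2048), which is aligned for any common physical sector size:

parted -s /dev/sdb mklabel gpt
parted -s /dev/sdb mkpart primary 1MiB 100%
mkfs.ext4 /dev/sdb1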
Re: [CentOS] trying to get the debug version of httpd so I can use it in conjunction with gdb.
On 11.12.2012 10:15, Leon Fauster wrote:
> On 11.12.2012 at 03:24, Zippy Zeppoli wrote:
>> I am trying to get the debug version of httpd so I can use it in
>> conjunction with gdb. I am having a hard time getting them, and they
>> don't seem to be in the standard epel-debuginfo repository. What should
>> I do?
>
> http://debuginfo.centos.org/

Yes, and there is also a yum repo for it, but it is disabled by default:

$ yum --enablerepo=debug list httpd-debuginfo

--
Kind Regards,
Markus Falb
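For pulling the packages in, the debuginfo-install helper from yum-utils is also handy - a sketch; it resolves the debuginfo packages matching the installed httpd and its dependencies:

yum install yum-utils
debuginfo-install httpd

# gdb can then resolve httpd's symbols:
gdb /usr/sbin/httpd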