Savannah Hackers,

Two things are happening. To not totally "bury the lede"... We are getting an additional vcs1 VM for the cvs files.

The first problem is that previously all of the vcs files were rolled off of the old vcs system and onto the new nfs1 for storage, with the new vcs0 as the frontend server for it. All mostly good. Things appeared to work great for a while. But the "mostly" part was the problem. We then discovered that the www team was unable to update web pages. The file ACLs were not working from nfs1+vcs0.

Therefore I had to roll the CVS data back onto the old vcs+vcs0 combination again, where ACLs worked. All of the other data remains on nfs1. Only the CVS data, needing file ACLs, was bungeed back onto the old vcs system. And currently that is the only data requiring the old vcs system as a dependency.

The files of trouble are the web page cvs directories plus the web team's cvs womb project directory. (This is not a cvs problem but an access problem, and it just happens to be that it is cvs. If the files were in git then it would be the same problem for git.)

The thousand plus projects all have their web pages in cvs. Every project has a Unix project group. Each project member is a member of that project's Unix group and accesses their files by being in their project group. There are thousands of Unix project groups, one per project.

    vcs:/# ls -l /web | head
    total 15496
    drwxrwsr-x+ 4 root 3dldf       4096 Jan  1  2004 3dldf
    drwxrwsr-x+ 4 root 7pages      4096 Jan  8  2005 7pages
    drwxrwsr-x+ 4 root 8sync       4096 Mar  9  2016 8sync
    drwxrwsr-x+ 4 root 9box        4096 Jan  8  2005 9box
    drwxrwsr-x+ 4 root AutismTools 4096 Aug 26  2014 AutismTools
    drwxrwsr-x+ 4 root a2ps        4096 Dec 30  2003 a2ps
    drwxrwsr-x+ 4 root aasm        4096 Dec 30  2003 aasm
    drwxrwsr-x+ 4 root abcsh       4096 Aug  2  2004 abcsh
    drwxrwsr-x+ 4 root abdabi      4096 Dec 30  2003 abdabi
    ls: write error: Broken pipe

(And there is that pesky SIGPIPE being ignored problem too.)

Note that each project exists in its own group.

Now enter the GNU www team.
There are a dozen people in the www team, such as Therese <th_g>, who need to make global changes to all of the web files. How do they access those web pages? If the same access controls were applied to them then they would need to be in ten thousand groups. That is not practical.

Sylvain Beucler made this ChangeLog entry concerning the solution implemented for the www team.

    2006-05-10  Beuc

      * Allowing group 'www' to edit GNU projects' webpages: switched
      from the webgroup model to ACLs:

      perl -MSavane -e 'print join("\n", GetGroupList("(type=1 or type=3 or type=6) and status=\"A\"","unix_group_name"))' |
        while read i; do
          find $i/$i -type d -print0 |
            xargs -0 setfacl -m default:group:www:rwx -m group:www:rwx
        done

Note the "+" in the sample listing above. File ACLs are in effect. Looking at a sample of one:

    root@vcs0:~# getfacl /net/oldvcs/web/coreutils
    getfacl: Removing leading '/' from absolute path names
    # file: net/oldvcs/web/coreutils
    # owner: root
    # group: coreutils
    # flags: -s-
    user::rwx
    group::rwx
    group:www:rwx
    mask::rwx
    other::r-x
    default:user::rwx
    default:group::rwx
    default:group:www:rwx
    default:mask::rwx
    default:other::r-x

The file ACL gives additional access to the www group. With this configuration members of the www team can make changes to the web pages. Additionally they make use of the /sources/womb project too, apparently, and it needs the same file ACL configuration.

The directories of interest are:

    root@vcs0:~# ll /srv/cvs/
    total 0
    lrwxrwxrwx 1 root root 25 Apr 10 16:27 sources -> ../../net/vcs/cvs/sources
    lrwxrwxrwx 1 root root 20 Apr 10 17:41 web -> ../../net/oldvcs/web

    vcs:/# du -sh /web /sources /sources/womb
    23G     /web
    29G     /sources
    3.2M    /sources/womb

In /sources only /sources/womb uses file ACLs. But being located in /sources I moved them together. However all of /web uses file ACLs.

The difference seems to be NFSv3 between vcs0 and vcs versus NFSv4 between vcs0 and nfs1. File ACLs work with vcs0-vcs but fail with vcs0-nfs1.
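
For auditing which directories actually carry the www entries, a small text check on getfacl output might help. This is only a sketch: the function name has_group_acl is made up for illustration, it looks at getfacl text rather than the filesystem, and it expects the plain entry format shown in the coreutils listing above (an entry with a trailing "#effective:" comment would not match).

```shell
#!/bin/sh
# Hypothetical helper (not an existing Savannah script).
# Reads getfacl output on stdin and succeeds only if the named group has
# both an access entry and a default entry granting rwx.
has_group_acl() {
    awk -v g="$1" '
        $0 == "group:" g ":rwx"         { access = 1 }
        $0 == "default:group:" g ":rwx" { dflt = 1 }
        END { exit !(access && dflt) }
    '
}

# Sample input copied from the coreutils listing above.
sample='# file: net/oldvcs/web/coreutils
# owner: root
# group: coreutils
# flags: -s-
user::rwx
group::rwx
group:www:rwx
mask::rwx
other::r-x
default:user::rwx
default:group::rwx
default:group:www:rwx
default:mask::rwx
default:other::r-x'

if printf '%s\n' "$sample" | has_group_acl www; then
    echo "www ACL present"
else
    echo "www ACL missing"
fi
```

On a live system the input would presumably come from something like "getfacl /web/coreutils 2>/dev/null | has_group_acl www", run on the machine that holds the files, since the NFSv4 client view does not show these entries through getfacl.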

Here is a test showing it working and failing.

    root@vcs0:/# sudo -u th_g tee -a /net/oldvcs/web/test-project/CVSROOT/history < /dev/null
    root@vcs0:/# echo $?
    0

That shows a success. But this fails for the nfs1 copy.

    root@vcs0:/# sudo -u th_g tee -a /net/vcs/cvs/web/test-project/CVSROOT/history < /dev/null
    tee: /net/vcs/cvs/web/test-project/CVSROOT/history: Permission denied
    root@vcs0:/# echo $?
    1

Fails.

    root@nfs1:~# getfacl /srv/vcs/cvs/web/test-project/CVSROOT/history
    getfacl: Removing leading '/' from absolute path names
    # file: srv/vcs/cvs/web/test-project/CVSROOT/history
    # owner: root
    # group: test-project
    user::rw-
    group::rw-
    group:www:rw-
    mask::rw-
    other::r--

This shows that nfs1 thinks there are file ACLs. But on vcs0 these do not show up.

    root@vcs0:/# ll /net/vcs/cvs/web/test-project/CVSROOT/history
    -rw-rw-r-- 1 root test-project 257 Nov 14  2017 /net/vcs/cvs/web/test-project/CVSROOT/history

No "+".

    root@vcs0:/# getfacl /net/vcs/cvs/web/test-project/CVSROOT/history
    getfacl: Removing leading '/' from absolute path names
    # file: net/vcs/cvs/web/test-project/CVSROOT/history
    # owner: root
    # group: test-project
    user::rw-
    group::rw-
    other::r--

No file ACL. But... Look at this:

    root@vcs0:/# nfs4_getfacl /net/vcs/cvs/web/test-project/CVSROOT/history
    A::OWNER@:rwatTcCy
    A::GROUP@:rwatcy
    A:g:1018:rwatcy
    A::EVERYONE@:rtcy

    root@vcs0:/# getent group 1018
    www:x:1018:amachutechie,andriykopanytsia,araech,...,th_g,...

So the NFSv4 client seems to know about the file ACL, and the ACL seems to include the www group. But it does not work.

However, this fails on nfs1 itself too.

    root@nfs1:~# sudo -u th_g tee -a /srv/vcs/cvs/web/test-project/CVSROOT/history < /dev/null
    tee: /srv/vcs/cvs/web/test-project/CVSROOT/history: Permission denied

    root@nfs1:~# ll /srv/vcs/cvs/web/test-project/CVSROOT/history
    -rw-rw-r--+ 1 root test-project 257 Nov 14  2017 /srv/vcs/cvs/web/test-project/CVSROOT/history

But:

    root@nfs1:~# getfacl /srv/vcs/cvs/web/test-project/CVSROOT/history
    getfacl: Removing leading '/' from absolute path names
    # file: srv/vcs/cvs/web/test-project/CVSROOT/history
    # owner: root
    # group: test-project
    user::rw-
    group::rw-
    group:www:rw-
    mask::rw-
    other::r--

    root@nfs1:~# id th_g
    uid=123774(th_g) gid=1003(svusers) groups=1003(svusers),1018(www),1458(audio-video),2312(trans-coord),6337(www-fr)

So shouldn't that be working on nfs1 too? This is the point where I am stuck. WAT? Why isn't this working?

But on to the second problem, which is forcing the workaround. We are back on the old vcs and everything is working there. But that system really, really, *REALLY* must be obsoleted. It has severe OS problems. It won't reboot on its own. Time to retire it. And that system is the last VM on the underlying host. That hardware needs to be repurposed. And there is the upcoming datacenter move, which is also applying time pressure. So I must apply a workaround for this in order to keep moving forward.

We will still need to figure out the file ACL problem, because eventually this will be a need for the git repositories too.

And this is enough for this message. I'll continue in the next message.

Bob