On Mon, Jun 12, 2017 at 9:54 PM, Sam Ruby <ru...@intertwingly.net> wrote:
> On Mon, Jun 12, 2017 at 9:44 PM, John D. Ament <johndam...@apache.org> wrote:
>> On Mon, Jun 12, 2017 at 9:24 PM Sam Ruby <ru...@intertwingly.net> wrote:
>>
>>> On Mon, Jun 12, 2017 at 9:06 PM, Sam Ruby <ru...@intertwingly.net> wrote:
>>> > On Mon, Jun 12, 2017 at 7:59 PM, John D. Ament <johndam...@apache.org> wrote:
>>> >> On Mon, Jun 12, 2017 at 7:55 PM Sam Ruby <ru...@intertwingly.net> wrote:
>>> >>
>>> >>> On Mon, Jun 12, 2017 at 7:44 PM, <johndam...@apache.org> wrote:
>>> >>> > ---
>>> >>> >  lib/whimsy/asf/svn.rb | 11 +++++++++++
>>> >>> >  www/roster/public_podlings.rb | 7 ++++++-
>>> >>> >  2 files changed, 17 insertions(+), 1 deletion(-)
>>> >>> >
>>> >>> > diff --git a/lib/whimsy/asf/svn.rb b/lib/whimsy/asf/svn.rb
>>> >>> > index 134609c..64a596e 100644
>>> >>> > --- a/lib/whimsy/asf/svn.rb
>>> >>> > +++ b/lib/whimsy/asf/svn.rb
>>> >>> > @@ -141,6 +141,17 @@ module ASF
>>> >>> >        return revision, content
>>> >>> >      end
>>> >>> >
>>> >>> > +    def self.updateSimple(path)
>>> >>> > +      cmd = ['svn', 'update', path, '--non-interactive']
>>> >>>
>>> >>> This will undoubtedly fail as the $apache::user (www-data) does not
>>> >>> have write access to those directories.
>>> >>
>>> >> Err so should we run cron as whimsysvn ?
>>> >
>>> > That's indeed possible, but then it probably can't write to the
>>> > web directory.
>>> >
>>> > Also from reading, bad things can happen if two processes are updating
>>> > the same directory at the same time. This can be fixed via file
>>> > locking. My gitpubsub logic solves this by running the puppet agent
>>> > itself, and puppet ensures that there is only one agent running at one
>>> > time.
>>> >
>>> > I learned all this the hard way on the original whimsy_vm where
>>> > directories often got 'wedged' and needed manual intervention for
>>> > cleanup. That's why I instituted a hard separation between what can
>>> > be updated in each process.
>>>
>>> Adding to my answer: this decision (which can be changed if that's what
>>> we collectively want to do) was to prefer slightly stale data over
>>> data that (at best) might occasionally stop updating, and (at worst)
>>> can become corrupt.
>>>
>>> The /srv/svn files update every 10 minutes. For most purposes, that
>>> is fast enough.
>>>
>>> Programs like the board agenda tool, the secretary mail tool, and now
>>> the roster take great care to update svn in separate tmp directories.
>>>
>> This is a very valuable piece of information. My main concern isn't roster
>> but instead the podlings information.
>>
>> Shane and I were jokingly talking about this on hipchat - we should switch
>> all of this to be pubsub. I'm more convinced that this is correct.
>
> You would still need to use flock(*) or equivalent, but definitely doable.
>
> The code for pubsub is basically the same for svn as it is for git.
> The only real difference is that the notification is 'commit' instead
> of 'push'.
>
> https://github.com/apache/whimsy/blob/master/tools/pubsub.rb
>
> The other thing to be aware of is that pubsub is only available for
> publicly readable sources. So things like foundation and documents
> can't be done this way.
>
>> Where's the logic that clones/svn's in a tmp directory?
>
> Plenty of places. Here is one:
>
> https://github.com/apache/whimsy/blob/master/www/roster/views/actions/ppmc.json.rb#L71
>
> "git grep tmpdir" to find more.
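To make the flock suggestion concrete, here is a minimal sketch of serializing updates behind an advisory lock. The helper name `with_exclusive_lock`, the lock file path, and the checkout path in the usage comment are all made up for illustration; none of this is existing whimsy code.

```ruby
# Hypothetical helper (not part of whimsy): run the block while holding an
# exclusive advisory lock on lock_path, so that only one cooperating process
# at a time touches the shared working copy.
def with_exclusive_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0644) do |lock|
    lock.flock(File::LOCK_EX) # blocks until any other holder releases
    begin
      yield
    ensure
      lock.flock(File::LOCK_UN) # release even if the block raises
    end
  end
end

# Usage sketch; both paths below are assumptions:
# with_exclusive_lock('/srv/svn/.update.lock') do
#   system('svn', 'update', '/srv/svn/some-checkout', '--non-interactive')
# end
```

Note that flock is advisory: it only helps if every process that updates the directory goes through the same lock file.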
Another thought that should at least work for the podlings.xml case:

podlings_xml = `svn cat https://svn.apache.org/repos/asf/incubator/public/trunk/content/podlings.xml`

No flock. No temp dirs. No chance of wedging/corrupting existing directories.

>>> - Sam Ruby
>
> (*) https://ruby-doc.org/core-2.4.0/File.html#method-i-flock

- Sam Ruby
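One caveat with the backtick form of `svn cat`: if the command fails, backticks quietly return whatever was printed to stdout (possibly nothing). A rough sketch of a variant that raises instead, using Open3 from the standard library; the helper name `capture!` and the constant `PODLINGS_URL` are invented for this example:

```ruby
require 'open3'

# Hypothetical helper: run a command without going through a shell and
# return its stdout, raising if the command exits with a non-zero status.
def capture!(*cmd)
  out, err, status = Open3.capture3(*cmd)
  raise "#{cmd.first} failed: #{err.strip}" unless status.success?
  out
end

PODLINGS_URL = 'https://svn.apache.org/repos/asf/incubator/public/trunk/content/podlings.xml'

# Usage sketch (needs the svn client and network access):
# podlings_xml = capture!('svn', 'cat', PODLINGS_URL, '--non-interactive')
```

Still no flock and no temp dirs, since nothing is written to a working copy.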