On Mon, Jun 12, 2017 at 9:54 PM, Sam Ruby <ru...@intertwingly.net> wrote:
> On Mon, Jun 12, 2017 at 9:44 PM, John D. Ament <johndam...@apache.org> wrote:
>> On Mon, Jun 12, 2017 at 9:24 PM Sam Ruby <ru...@intertwingly.net> wrote:
>>
>>> On Mon, Jun 12, 2017 at 9:06 PM, Sam Ruby <ru...@intertwingly.net> wrote:
>>> > On Mon, Jun 12, 2017 at 7:59 PM, John D. Ament <johndam...@apache.org> wrote:
>>> >> On Mon, Jun 12, 2017 at 7:55 PM Sam Ruby <ru...@intertwingly.net> wrote:
>>> >>
>>> >>> On Mon, Jun 12, 2017 at 7:44 PM,  <johndam...@apache.org> wrote:
>>> >>> > ---
>>> >>> >  lib/whimsy/asf/svn.rb         | 11 +++++++++++
>>> >>> >  www/roster/public_podlings.rb |  7 ++++++-
>>> >>> >  2 files changed, 17 insertions(+), 1 deletion(-)
>>> >>> >
>>> >>> > diff --git a/lib/whimsy/asf/svn.rb b/lib/whimsy/asf/svn.rb
>>> >>> > index 134609c..64a596e 100644
>>> >>> > --- a/lib/whimsy/asf/svn.rb
>>> >>> > +++ b/lib/whimsy/asf/svn.rb
>>> >>> > @@ -141,6 +141,17 @@ module ASF
>>> >>> >        return revision, content
>>> >>> >      end
>>> >>> >
>>> >>> > +    def self.updateSimple(path)
>>> >>> > +      cmd = ['svn', 'update', path, '--non-interactive']
>>> >>>
>>> >>> This will undoubtedly fail as the $apache::user (www-data) does not
>>> >>> have write access to those directories.
>>> >>
>>> >> Err so should we run cron as whimsysvn ?
>>> >
>>> > That's indeed possible, but then it probably can't write to the web
>>> > directory.
>>> >
>>> > Also from reading, bad things can happen if two processes are updating
>>> > the same directory at the same time.  This can be fixed via file
>>> > locking.  My gitpubsub logic solves this by running the puppet agent
>>> > itself, and puppet ensures that there is only one agent running at one
>>> > time.
>>> >
>>> > I learned all this the hard way on the original whimsy_vm where
>>> > directories often got 'wedged' and needed manual intervention for
>>> > cleanup.  That's why I instituted a hard separation between what can
>>> > be updated in each process.
>>>
>>> Adding to my answer: this decision (which can be changed if that is what
>>> we collectively want to do) was to prefer slightly stale data over
>>> data that (at best) might occasionally stop updating, and (at worst)
>>> can become corrupt.
>>>
>>> The /srv/svn files update every 10 minutes.  For most purposes, that
>>> is fast enough.
>>>
>>> Programs like the board agenda tool, the secretary mail tool, and now
>>> the roster take great care to update svn in separate tmp directories.
>>>
>> This is a very valuable piece of information.  My main concern isn't roster
>> but instead the podlings information.
>>
>> Shane and I were jokingly talking about this on hipchat - we should switch
>> all of this to be pubsub.  I'm more convinced that this is correct.
>
> You would still need to use flock(*) or equivalent, but definitely doable.
>
> The code for pubsub is basically the same for svn as it is for git.
> The only real difference is that the notification is 'commit' instead
> of 'push'.
>
> https://github.com/apache/whimsy/blob/master/tools/pubsub.rb
>
> The other thing to be aware of is that pubsub is only available for
> publicly readable sources.  So things like foundation and documents
> can't be done this way.
>
>> Where's the logic that clones/svn's in a tmp directory?
>
> Plenty of places.  Here is one:
>
> https://github.com/apache/whimsy/blob/master/www/roster/views/actions/ppmc.json.rb#L71
>
> "git grep tmpdir" to find more.
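
For reference, the general shape of the tmp directory approach is roughly
the following. This is a minimal sketch and not the actual whimsy code:
the helper name with_tmp_checkout and the svn: keyword (there only so a
different executable can be substituted) are made up for illustration.

```ruby
require 'tmpdir'
require 'open3'

# Hypothetical helper: check out into a throwaway directory, hand that
# directory to the caller's block, and remove it afterwards.  Nothing
# under /srv/svn is ever touched.
def with_tmp_checkout(url, svn: 'svn')
  Dir.mktmpdir do |tmpdir|
    # --depth files fetches only the files directly under url
    _out, err, status = Open3.capture3(
      svn, 'checkout', '--non-interactive', '--depth', 'files', url, tmpdir
    )
    raise "svn checkout failed: #{err.strip}" unless status.success?

    yield tmpdir
  end # Dir.mktmpdir deletes the directory when the block exits
end
```

Since every call gets its own directory, two requests can't wedge each
other; the price is a fresh checkout each time.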

Another thought that should at least work for the podlings.xml case:

podlings_xml = `svn cat https://svn.apache.org/repos/asf/incubator/public/trunk/content/podlings.xml`

No flock.  No temp dirs.  No chance of wedging/corrupting existing directories.
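
If we go this way, it is probably worth checking the exit status so that
a failed fetch doesn't quietly become an empty string. A rough sketch,
assuming a helper named svn_cat (hypothetical, as is its svn: keyword,
which is only there so the executable can be swapped out):

```ruby
require 'open3'

PODLINGS_URL =
  'https://svn.apache.org/repos/asf/incubator/public/trunk/content/podlings.xml'

# Hypothetical helper: read one file straight from the repository,
# without creating or updating any working copy.
def svn_cat(url, svn: 'svn')
  out, err, status = Open3.capture3(svn, 'cat', '--non-interactive', url)
  raise "svn cat failed: #{err.strip}" unless status.success?

  out
end

# Usage (needs network access and an svn client on the PATH):
#   podlings_xml = svn_cat(PODLINGS_URL)
```

The argument list form also avoids going through a shell, which the
backtick version does.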

>>> - Sam Ruby
>
> (*) https://ruby-doc.org/core-2.4.0/File.html#method-i-flock
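
For the flock route, something along these lines should be enough to keep
two updates from running against the same directory at once. This is only
a sketch: with_update_lock and the lock file path in the usage comment
are made-up names, not anything that exists in whimsy today.

```ruby
# Hypothetical helper: run the block while holding an exclusive lock on
# lockfile, so only one process at a time gets past this point.
def with_update_lock(lockfile)
  File.open(lockfile, File::RDWR | File::CREAT, 0o644) do |lock|
    lock.flock(File::LOCK_EX) # blocks until any other holder releases
    yield
  end # closing the file releases the lock
end

# Usage (assumed path; path is whatever working copy is being updated):
#   with_update_lock('/tmp/whimsy-svn-update.lock') do
#     system 'svn', 'update', '--non-interactive', path
#   end
```

The lock is advisory, so it only helps if every process that updates the
directory goes through the same lock file.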

- Sam Ruby
