Re: How to plug PDF scraping into Secretary workbench

2019-11-26 Thread sebb
Thanks a lot! On Tue, 26 Nov 2019 at 17:07, Sam Ruby wrote: > On Tue, Nov 26, 2019 at 10:16 AM sebb wrote: > > > > I have committed some code to extract the form data from ICLAs. > > > > For example: > > > > https://whimsy.apache.org/secretary/icla-parse/mm/hash/icla.pdf > > > > It would be

Git checkouts could use depth 1

2019-11-26 Thread sebb
Seems to me that the git checkouts don't need full history, and could be checked out with depth 1. In the case of letsencrypt, the full checkout takes nearly 9s and depth=1 takes almost 3s. The ratio of sizes on disk is similar. Any objections? S.
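
A minimal sketch of what a depth-1 checkout looks like when driven from a script. It only builds and runs the standard `git clone --depth` command; the function names, the URL and the destination directory are hypothetical and are not taken from the Whimsy code base.

```python
import subprocess


def shallow_clone_command(url, dest, depth=1):
    """Build the argv for a git clone that keeps only the last `depth` commits."""
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    return ["git", "clone", "--depth", str(depth), url, dest]


def shallow_clone(url, dest, depth=1):
    """Run the shallow clone; raises CalledProcessError if git fails."""
    subprocess.run(shallow_clone_command(url, dest, depth), check=True)
```

For example, `shallow_clone("https://example.org/repo.git", "repo")` (a placeholder URL) would fetch only the most recent commit, which is where the time and disk savings mentioned above would come from.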

Re: Proposal for reducing directory checkouts by using svn ls

2019-11-26 Thread Sam Ruby
On Tue, Nov 26, 2019 at 12:23 PM sebb wrote: > > Update repositories.yml to use depth: empty for the relevant directory trees > The regular update cronjob will thus ensure that the directory has got the > current revision. > > When a listing is needed, Whimsy checks the revision of the checkout an

Proposal for reducing directory checkouts by using svn ls

2019-11-26 Thread sebb
Update repositories.yml to use depth: empty for the relevant directory trees. The regular update cronjob will thus ensure that the directory has got the current revision. When a listing is needed, Whimsy checks the revision of the checkout and the revision in the listing file. If there is a mismat
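
The revision check described above can be sketched roughly as follows. This is an illustration only: the message is cut off before it says what happens on a mismatch, so the sketch assumes the listing is simply regenerated. `Listing`, `current_listing` and the `fetch` callback (a stand-in for running `svn ls`) are hypothetical names, not Whimsy APIs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Listing:
    revision: int        # revision the listing was generated at
    entries: List[str]   # names returned by the directory listing


def current_listing(checkout_revision: int,
                    cached: Optional[Listing],
                    fetch: Callable[[], List[str]]) -> Listing:
    """Return a listing that matches the checkout revision.

    The cached listing is reused when its revision equals the checkout's.
    Otherwise (assumption) the listing is regenerated by calling `fetch`.
    """
    if cached is not None and cached.revision == checkout_revision:
        return cached
    return Listing(revision=checkout_revision, entries=fetch())
```

With this shape the expensive listing call only runs when the cronjob has actually moved the checkout to a new revision.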

Re: Does Whimsy need to have a copy of Bills?

2019-11-26 Thread Sam Ruby
On Tue, Nov 26, 2019 at 12:06 PM sebb wrote: > > My takeaway is that using 'svn ls' without recursion is OK. +1 > I've just done a check on member_apps and I get: > > $ time svn up member_apps/ > Updating 'member_apps': > At revision 93890. > > real 0m1.573s > user 0m0.055s > sys 0m0.027s > > $

Re: How to plug PDF scraping into Secretary workbench

2019-11-26 Thread Sam Ruby
On Tue, Nov 26, 2019 at 10:16 AM sebb wrote: > > I have committed some code to extract the form data from ICLAs. > > For example: > > https://whimsy.apache.org/secretary/icla-parse/mm/hash/icla.pdf > > It would be useful if this could somehow be plugged into the workbench. > For example when a

Re: Does Whimsy need to have a copy of Bills?

2019-11-26 Thread sebb
My takeaway is that using 'svn ls' without recursion is OK. I've just done a check on member_apps and I get: $ time svn up member_apps/ Updating 'member_apps': At revision 93890. real 0m1.573s user 0m0.055s sys 0m0.027s $ time svn ls https://svn.apache.org/repos/private/documents/member_apps |

Re: ldap-map - is this still useful?

2019-11-26 Thread Shane Curcuru
sebb wrote on 2019-11-25 5:49AM EST: > The ldap-map.json file is updated by > https://whimsy.apache.org/committers/ldap-map > This is flagged as Alpha code. > > The JSON file does not appear to be used anywhere else, so I wonder if it > is still needed? That was a temporary tool I wrote to let co

Re: How to plug PDF scraping into Secretary workbench

2019-11-26 Thread sebb
On Tue, 26 Nov 2019 at 15:21, Dave Fisher wrote: > Have you looked at Apache Tika? > > [This is tangential to my query. The Whimsy host does not currently include a JRE, so I did not look at Java solutions. The code now exists, and works well enough.] I would still have the same issue with Tika:

Re: How to plug PDF scraping into Secretary workbench

2019-11-26 Thread Dave Fisher
Have you looked at Apache Tika? Sent from my iPhone > On Nov 26, 2019, at 9:16 AM, sebb wrote: > > I have committed some code to extract the form data from ICLAs. > > For example: > > https://whimsy.apache.org/secretary/icla-parse/mm/hash/icla.pdf > > It would be useful if this could so

How to plug PDF scraping into Secretary workbench

2019-11-26 Thread sebb
I have committed some code to extract the form data from ICLAs. For example: https://whimsy.apache.org/secretary/icla-parse/mm/hash/icla.pdf It would be useful if this could somehow be plugged into the workbench. For example when a PDF is classified as an ICLA. However I cannot work out how

Introducing setupmymac!

2019-11-26 Thread Sam Ruby
I've automated setting up a Mac to run Whimsy, either locally or on Docker, and tested the instructions after wiping a machine and reinstalling macOS Catalina on it. I encourage everybody to give it a try: https://github.com/apache/whimsy/blob/master/SETUPMYMAC.MD - Sam Ruby