Re: How to plug PDF scraping into Secretary workbench

2019-11-26 Thread sebb
Thanks a lot!
On Tue, 26 Nov 2019 at 17:07, Sam Ruby wrote:
> On Tue, Nov 26, 2019 at 10:16 AM sebb wrote:
> >
> > I have committed some code to extract the form data from ICLAs.
> >
> > For example:
> >
> > https://whimsy.apache.org/secretary/icla-parse/mm/hash/icla.pdf
> >
> > It would be

Re: How to plug PDF scraping into Secretary workbench

2019-11-26 Thread Sam Ruby
On Tue, Nov 26, 2019 at 10:16 AM sebb wrote:
>
> I have committed some code to extract the form data from ICLAs.
>
> For example:
>
> https://whimsy.apache.org/secretary/icla-parse/mm/hash/icla.pdf
>
> It would be useful if this could somehow be plugged into the workbench.
> For example when a

Re: How to plug PDF scraping into Secretary workbench

2019-11-26 Thread sebb
On Tue, 26 Nov 2019 at 15:21, Dave Fisher wrote:
> Have you looked at Apache Tika?
>
[This is tangential to my query. The Whimsy host does not currently include a JRE, so I did not look at Java solutions. The code now exists, and works well enough.]
I would still have the same issue with Tika:

Re: How to plug PDF scraping into Secretary workbench

2019-11-26 Thread Dave Fisher
Have you looked at Apache Tika?
Sent from my iPhone
> On Nov 26, 2019, at 9:16 AM, sebb wrote:
>
> I have committed some code to extract the form data from ICLAs.
>
> For example:
>
> https://whimsy.apache.org/secretary/icla-parse/mm/hash/icla.pdf
>
> It would be useful if this could so

How to plug PDF scraping into Secretary workbench

2019-11-26 Thread sebb
I have committed some code to extract the form data from ICLAs. For example:
https://whimsy.apache.org/secretary/icla-parse/mm/hash/icla.pdf
It would be useful if this could somehow be plugged into the workbench, for example when a PDF is classified as an ICLA. However, I cannot work out how