Have you looked at Apache Tika?
> On Nov 26, 2019, at 9:16 AM, sebb <seb...@gmail.com> wrote:
>
> I have committed some code to extract the form data from ICLAs.
>
> For example:
>
> https://whimsy.apache.org/secretary/icla-parse/yyyymm/hash/icla.pdf
>
> It would be useful if this could somehow be plugged into the workbench,
> for example when a PDF is classified as an ICLA.
>
> However, I cannot work out how to do this.
>
> S.