Hi Manuel, thanks for the details. I think Christian Haider's framework should be able to read PDFs.
Stef

On Thu, Nov 2, 2017 at 8:33 PM, Manuel Leuenberger <leuenber...@inf.unibe.ch> wrote:
> Hi Stef,
>
> The PDF integration consists of three parts:
>
> 1. CERMINE (https://github.com/CeON/CERMINE) is fed with the PDF and outputs
> metadata as BibTeX and a structured XML (title, authors, affiliations,
> abstract, keywords, references, …). This is not perfect, but way better than
> any other metadata extractor I could find.
> 2. From the metadata I generate hyperlinks that are anchored in the PDF by a
> text key. pdf-linker (https://github.com/maenu/pdf-linker) then searches for
> the anchors in the PDF text using heuristics, because PDF has a document
> model that is primarily intended for rendering and printing, not for
> processing. The hyperlinks are then inserted using the awesome Apache PDFBox
> (https://pdfbox.apache.org/).
> 3. Those hyperlinks point to a URI like
> “pharo://handle/clickReference.in.?args=1&args=2”, which represents
> reference 1 in paper 2. Now comes the magic part: the OS allows you to
> register custom handlers for custom URI schemes like pharo://. For that I
> created a simple Objective-C app that handles the event and passes it on as
> an HTTP message to a server running in Pharo
> (https://github.com/maenu/PharoUriScheme). The OS will even start the
> application if it is not yet running.
>
> While the custom URI scheme approach is super powerful, it has critical
> drawbacks. Any application can request to be the receiver of a URI scheme,
> just as browsers are for http://. Especially on mobile devices with limited
> access to the OS, this opens up an attack point for malware apps that
> replicate original apps using schemes like facebook:// and eavesdrop on all
> interactions. If an original app transmits any unencrypted secrets or user
> data encoded in those URIs, malware can easily intercept them without the
> user noticing the leak.
> I guess this is the reason why many PDF viewers support only the standard
> http:// and mailto: schemes. For example, macOS Preview just gives an audible
> beep when I click on a pharo:// link, and Chrome's viewer doesn't even bother
> giving any feedback. Only Adobe Acrobat allows you to relax its security
> settings to make them work (how could it be anyone other than Adobe, when
> it's a security issue? ;).
>
> I finished basic packaging today and will continue with some READMEs and a
> nearly-all-in-one distribution tomorrow. I'll keep you posted in this
> thread.
>
> Cheers,
> Manuel
>
> On 2 Nov 2017, at 18:08, Stephane Ducasse <stepharo.s...@gmail.com> wrote:
>
>> Hi Manuel,
>>
>> this is super cool :)
>> Could you describe how you did the PDF integration?
>> And yes, please package it :)
>> I want to try it.
>>
>> Stef
>>
>> On Wed, Nov 1, 2017 at 10:16 PM, Manuel Leuenberger
>> <leuenber...@inf.unibe.ch> wrote:
>>
>>> Hi everyone,
>>>
>>> Over the last few weeks I have been experimenting with my own take on
>>> literature research. For me, the corpus of scientific papers forms an
>>> interconnected graph, not the plain lists and tables we keep in our
>>> bibliographies. So here is the first prototype: it has Google Scholar
>>> integration for search, can fetch PDFs from IEEE and ACM, and extracts
>>> metadata from PDFs, and all of this results in hyperlinked PDFs!
>>>
>>> See a demo here: https://youtu.be/EcK3Pt_WnEw
>>> Also slides from the SCG seminar here:
>>> http://scg.unibe.ch/download/softwarecomposition/2017-10-31-Leuenberger-ILE.pdf
>>>
>>> I plan on packaging it, so that those who are interested can check it out
>>> themselves (help wanted!). Currently, it only works on macOS.
>>>
>>> What do you think of my approach? Which use cases should be added?
>>>
>>> Cheers,
>>> Manuel
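The third step of the PDF integration, where a click on a pharo:// link ends up as an HTTP message to Pharo, is the least obvious part, so here is a minimal Python sketch of the two translations involved. It assumes the URI layout from Manuel's example, pharo://handle/clickReference.in.?args=1&args=2, and reads each dot in the path as the colon of a Smalltalk keyword selector (clickReference:in:). It also assumes the helper app forwards the whole URI unchanged to a local HTTP endpoint; the port 8080, the /handle path, the uri query parameter and all function names are hypothetical and not taken from PharoUriScheme. Registering the scheme with the OS (done by the Objective-C app) is not shown.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical local endpoint; the real PharoUriScheme server may use another port/path.
LOCAL_ENDPOINT = "http://localhost:8080/handle"


def parse_pharo_uri(uri: str) -> tuple[str, list[str]]:
    """Split a pharo:// URI into a keyword selector and its arguments.

    Assumed layout: pharo://handle/<selector>?args=..&args=..,
    where each '.' in the path stands for a ':' of a Smalltalk keyword selector.
    """
    parts = urlsplit(uri)
    if parts.scheme != "pharo":
        raise ValueError(f"not a pharo:// URI: {uri!r}")
    # "/clickReference.in." -> "clickReference:in:"
    selector = parts.path.lstrip("/").replace(".", ":")
    # parse_qs keeps repeated keys in order: "args=1&args=2" -> ["1", "2"]
    args = parse_qs(parts.query).get("args", [])
    return selector, args


def to_local_http(uri: str) -> str:
    """Helper-app side (hypothetical): wrap the whole pharo:// URI in a local HTTP URL."""
    # urlencode percent-encodes ':', '/', '?', '=' and '&' so the URI survives as one value
    return f"{LOCAL_ENDPOINT}?{urlencode({'uri': uri})}"


def from_local_http(url: str) -> tuple[str, list[str]]:
    """Pharo side (hypothetical): unwrap the forwarded URI and parse it."""
    forwarded = parse_qs(urlsplit(url).query)["uri"][0]
    return parse_pharo_uri(forwarded)


if __name__ == "__main__":
    link = "pharo://handle/clickReference.in.?args=1&args=2"
    request_url = to_local_http(link)
    print(request_url)
    print(from_local_http(request_url))  # ('clickReference:in:', ['1', '2'])
```

Forwarding the URI unchanged keeps the helper app trivial and leaves all interpretation to Pharo. Note that the arguments arrive as plain strings, and anything encoded in such a URI is visible to whichever app has claimed the scheme, which is exactly the leak Manuel describes.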