Hi Manuel

thanks for the details. I think that Christian Haider's framework
should be able to read PDFs.

Stef

On Thu, Nov 2, 2017 at 8:33 PM, Manuel Leuenberger
<leuenber...@inf.unibe.ch> wrote:
> Hi Stef,
>
> The PDF integration consists of three parts:
>
> 1. CERMINE (https://github.com/CeON/CERMINE) is fed with the PDF and outputs
> metadata as BibTeX and as structured XML (title, authors, affiliations,
> abstract, keywords, references, …). This is not perfect, but far better than
> any other metadata extractor I could find.
> 2. From the metadata I generate hyperlinks that are anchored in the PDF by a
> text key. pdf-linker (https://github.com/maenu/pdf-linker) then searches for
> the anchors in the PDF text using heuristics, since PDF's document model is
> primarily intended for rendering and printing, not for processing. The
> hyperlinks are then inserted using the awesome Apache PDFBox
> (https://pdfbox.apache.org/).
> 3. Those hyperlinks point to a URI like
> “pharo://handle/clickReference.in.?args=1&args=2”, which represents reference 1
> in paper 2. Now comes the magic part: the OS allows you to register
> custom handlers for custom URI schemes like pharo://. For that I created a
> simple Objective-C app that handles the event and passes it on as an HTTP
> message to a server running in Pharo
> (https://github.com/maenu/PharoUriScheme). The OS will even start the
> application if it is not yet running.
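Step 2 above has pdf-linker look up the text anchors "using heuristics", because text extracted from a PDF is full of line breaks, hyphenation and layout whitespace. pdf-linker's actual heuristics are not described in this thread; the Python sketch below only illustrates the general idea under assumed rules (join words hyphenated across line breaks, collapse whitespace, case-fold) and keeps an offset map so that a match can be traced back to the raw extracted text, which is where a link annotation would have to be placed. `normalize_with_map` and `find_anchor` are hypothetical names, not pdf-linker's API.

```python
import re

# Assumption: a hyphen right before a line break splits one word ("meta-\ndata").
_LINE_BREAK_HYPHEN = re.compile(r"-[ \t]*\r?\n[ \t]*")


def normalize_with_map(text):
    """Normalize extracted PDF text and remember where each character came from.

    Returns (normalized, offsets), where offsets[k] is the index in `text`
    of the character that produced normalized[k].
    """
    out, offsets = [], []
    i = 0
    while i < len(text):
        m = _LINE_BREAK_HYPHEN.match(text, i)
        if m:
            # Drop hyphen and line break: "meta-\ndata" -> "metadata".
            # Caveat: this also joins genuine compounds such as "well-\nknown".
            i = m.end()
            continue
        ch = text[i]
        if ch.isspace():
            # Collapse any run of spaces, tabs and newlines into a single space.
            if out and out[-1] != " ":
                out.append(" ")
                offsets.append(i)
        else:
            # casefold() may expand one character into several (e.g. "ß" -> "ss").
            for c in ch.casefold():
                out.append(c)
                offsets.append(i)
        i += 1
    return "".join(out), offsets


def find_anchor(text, key):
    """Return the (start, end) span of `key` in the raw `text`, or None."""
    norm_text, offsets = normalize_with_map(text)
    norm_key = normalize_with_map(key)[0].strip()
    if not norm_key:
        return None
    k = norm_text.find(norm_key)
    if k < 0:
        return None
    # Map the first and last matched characters back to raw-text positions.
    return offsets[k], offsets[k + len(norm_key) - 1] + 1
```

For example, `find_anchor("a system for meta-\ndata extraction.", "Metadata Extraction")` returns a span covering `meta-\ndata extraction` in the raw text. A real linker would also need fuzzier matching (ligatures, OCR errors, reordered text runs), which this sketch leaves out.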
>
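The URI from step 3 can be taken apart with Python's standard `urllib.parse`. This is a minimal sketch, not the PharoUriScheme implementation (the real helper is an Objective-C app). It assumes that the dots in `clickReference.in.` stand in for the colons of the Pharo keyword selector `clickReference:in:`, that the repeated `args` parameters are positional arguments, and that the helper forwards the event to a Pharo HTTP server at the hypothetical address `http://localhost:8080`; the actual message format may differ.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical address of the HTTP server running inside Pharo.
PHARO_SERVER = "http://localhost:8080"


def parse_pharo_uri(uri):
    """Split a pharo:// URI into (handler, selector, args).

    Assumes dots in the path replace the colons of a keyword selector,
    e.g. "clickReference.in." -> "clickReference:in:".
    """
    parts = urlsplit(uri)
    if parts.scheme != "pharo":
        raise ValueError(f"not a pharo:// URI: {uri!r}")
    selector = parts.path.lstrip("/").replace(".", ":")
    # parse_qs collects repeated keys into a list, in order of appearance.
    args = parse_qs(parts.query).get("args", [])
    return parts.netloc, selector, args


def to_local_http(uri, base=PHARO_SERVER):
    """Rewrite pharo://<handler>/<path>?<query> as an HTTP URL on `base`."""
    parts = urlsplit(uri)
    if parts.scheme != "pharo":
        raise ValueError(f"not a pharo:// URI: {uri!r}")
    # Keep the handler name as the first path segment so the server can dispatch on it.
    url = f"{base}/{parts.netloc}{parts.path}"
    return f"{url}?{parts.query}" if parts.query else url


if __name__ == "__main__":
    link = "pharo://handle/clickReference.in.?args=1&args=2"
    print(parse_pharo_uri(link))  # ('handle', 'clickReference:in:', ['1', '2'])
    print(to_local_http(link))    # http://localhost:8080/handle/clickReference.in.?args=1&args=2
```

Forwarding the path and query unchanged keeps the helper app trivial and leaves all interpretation to the Pharo side. It also means anything placed in the query travels in clear text to whichever app has claimed the scheme, which is exactly the risk discussed in the next paragraph.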
> While the custom URI scheme approach is super powerful, it has critical
> drawbacks. Any application can request to be the receiver of a URI scheme,
> just as browsers are for http://. Especially on mobile devices with limited
> access to the OS, this opens up an attack vector: malware apps can
> impersonate original apps that use schemes like facebook:// and
> eavesdrop on all interactions. If an original app transmits any unencrypted
> secrets or user data encoded in those URIs, malware can easily intercept them
> without the user noticing the leak. I guess this is why many PDF
> viewers only support the standard http:// and mailto: schemes. E.g., macOS
> Preview just gives an audible beep when I click on a pharo:// link, and Chrome’s
> viewer doesn’t even bother giving any feedback. Only Adobe Acrobat lets
> you relax its security settings to make them work (who else but Adobe,
> when it’s a security issue? ;).
>
> I finished basic packaging today and will continue with some READMEs and a
> nearly all-in-one distribution tomorrow. I’ll keep you posted in this
> thread.
>
> Cheers,
> Manuel
>
> On 2 Nov 2017, at 18:08, Stephane Ducasse <stepharo.s...@gmail.com> wrote:
>
> Hi Manuel
>
> this is super cool :)
> Could you describe how you did the PDF integration?
> And yes please package it :)
> I want to try it.
>
> Stef
>
> On Wed, Nov 1, 2017 at 10:16 PM, Manuel Leuenberger
> <leuenber...@inf.unibe.ch> wrote:
>
> Hi everyone,
>
> I have been experimenting over the last few weeks with my take on literature
> research. For me, the corpus of scientific papers forms an interconnected
> graph, not the plain lists and tables we keep in our bibliographies. So,
> here is the first prototype: it has Google Scholar integration for search,
> can fetch PDFs from IEEE and ACM, and extracts metadata from PDFs. All of this
> results in hyperlinked PDFs!
>
> See a demo here: https://youtu.be/EcK3Pt_WnEw
> Also slides from the SCG seminar here:
> http://scg.unibe.ch/download/softwarecomposition/2017-10-31-Leuenberger-ILE.pdf
>
> I plan on packaging it, so that those who are interested can check it out
> themselves (help wanted!). Currently, it only works on macOS.
>
> What do you think of my approach? Which use cases should be added?
>
> Cheers,
> Manuel
>
>
>
