The other great tool for processing PDFs is PDFBox. It’s a Java JAR file, so 
you need to have a reasonably recent JVM installed.

https://pdfbox.apache.org/

I’m still using version 2.
Here are the command-line tools:
https://pdfbox.apache.org/2.0/commandline.html


I have a series of scripts for various types of manipulation and text 
extraction.

It is worth noting that people who naïvely believe that text extraction from a 
PDF is simple will get burnt if they do not check the results. PDF files are 
not obliged to store their text in reading order. Mostly they do, until they 
don’t.
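
For anyone who wants to try this before I post anything, here is a minimal sketch of one way to check extraction order. It is not one of my scripts. It assumes the PDFBox 2.x app jar and its ExtractText tool; PDFBOX_JAR, extract_both and same_order are names made up for this example. It extracts the text twice, once in the order the file stores it and once with -sort (which orders text by position on the page), so the two results can be compared.

```shell
# Hypothetical helpers, not the author's scripts.
# PDFBOX_JAR is an assumed variable: the path to your pdfbox-app 2.x jar.
JAVA="${JAVA:-java}"

# extract_both INPUT.pdf BASENAME
# Writes BASENAME.raw.txt (order as stored in the file) and
# BASENAME.sorted.txt (text sorted by position, via ExtractText -sort).
extract_both() {
    if [ -z "${PDFBOX_JAR:-}" ]; then
        echo "set PDFBOX_JAR to the path of your pdfbox-app 2.x jar" >&2
        return 1
    fi
    "$JAVA" -jar "$PDFBOX_JAR" ExtractText "$1" "$2.raw.txt" &&
    "$JAVA" -jar "$PDFBOX_JAR" ExtractText -sort "$1" "$2.sorted.txt"
}

# same_order FILE_A FILE_B
# Succeeds only when the two extractions are byte-for-byte identical.
same_order() {
    cmp -s "$1" "$2"
}
```

Typical use would be something like

    extract_both statement.pdf statement
    same_order statement.raw.txt statement.sorted.txt || echo "order differs"

A difference is only a hint that the stored order and the page layout disagree, and identical output does not prove the text is right, so the results still need to be checked by eye.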

I can post my scripts if anyone is interested.

There are also a number of other tools for Linux that I access on my Mac 
through MacPorts.

God bless you.

—
Peter West
p...@pbw.id.au
“The kingdom of heaven is like treasure hidden in a field, which a man found 
and covered up. Then in his joy he goes and sells all that he has and buys that 
field.”

> On 7 Aug 2022, at 2:52 am, Tom Browder <tom.brow...@gmail.com> wrote:
> 
> On Sat, Aug 6, 2022 at 11:43 AM Glenn Fowler <gfowl...@outlook.com> wrote:
>> 
>> My scripts are in PowerShell. For GhostScript I'm just using CLI:
> 
> Thanks, Glenn, that's close to what I've found for Linux:
> 
>    $ gs -sDEVICE=txtwrite -o output.txt input.pdf
> 
> It just needs some tweaking and post-conversion parsing (very bank
> specific). I'll see how my current PDF statements look after text
> conversion.
> 
> But I'll also keep looking at YNAB for a more general solution.
> 
> Cheers!
> 
> -Tom
> _______________________________________________
> gnucash-user mailing list
> gnucash-user@gnucash.org
> To update your subscription preferences or to unsubscribe:
> https://lists.gnucash.org/mailman/listinfo/gnucash-user
> -----
> Please remember to CC this list on all your replies.
> You can do this by using Reply-To-List or Reply-All.
