Both the page index and the character index are clamped (to the number of pages and to the number of characters on the page, respectively), so you could set both to very high numbers to select through to the end of the document. Adding character counts to the documentPages property might be useful here too.
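
For anyone who wants to see what that clamping implies, here is a rough model in Python. It is not LiveCode and not the widget's API: the four-part range (start page, start character, end page, end character), the 1-based indexing, the `clamp_range` helper and the per-page character counts are all assumptions made for illustration. The point is only that out-of-range values get pulled back to the last page and to the last character on that page.

```python
def clamp(value, low, high):
    """Pull value back into the inclusive range [low, high]."""
    return max(low, min(value, high))


def clamp_range(chars_per_page, start_page, start_char, end_page, end_char):
    """Model of clamping a (page, character) range to a document.

    chars_per_page is a hypothetical list of character counts, one per page.
    Pages and characters are assumed to be 1-based, and every page is assumed
    to hold at least one character.
    """
    page_count = len(chars_per_page)
    # Clamp the page indices first, because the character limit depends on
    # which page each end of the range finally lands on.
    first_page = clamp(start_page, 1, page_count)
    last_page = clamp(end_page, 1, page_count)
    first_char = clamp(start_char, 1, chars_per_page[first_page - 1])
    last_char = clamp(end_char, 1, chars_per_page[last_page - 1])
    return first_page, first_char, last_page, last_char


# A made-up three-page document: asking for "page one, character one" through
# absurdly large end values resolves to the last character of the last page.
print(clamp_range([1200, 950, 430], 1, 1, 10**9, 10**9))  # (1, 1, 3, 430)
```

In this model you do not need to know the page or character counts in advance to cover the whole document, which is why very high end values are enough. Whether the widget also accepts negative indices such as -1 is not covered here.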
Cheers

Monte

> On 13 Dec 2021, at 11:17 am, Paul Dupuis via use-livecode
> <use-livecode@lists.runrev.com> wrote:
>
> Thank you Monte,
>
> We've just started to make a map from XPDF APIs to the PDF Widget APIs, so
> I'll make sure that gets done soon and add any missing capabilities as
> requests to the LC Quality Center.
>
> With regard to the hilitedRange and hilitedRangeText properties, can you just
> advise on the correct use to get a PDF's text? I.e., can you use a range of 1
> to -1 to get the whole document text or would that just be the current page
> text?
>
> Thanks in advance,
>
>
> On 12/12/2021 6:49 PM, Monte Goulding via use-livecode wrote:
>> Hi Folks
>>
>> Currently you can extract text in the widget by setting the hilitedRange and
>> getting the hilitedRangeText. It wouldn’t be that hard to add extracted text
>> to the documentPages property. The PDF widget was built to meet the
>> requirements for a client rather than to match the features of XPDF so it’s
>> worthwhile for anyone still using XPDF to take the time to audit their use
>> and see if there are any extra features required. If so please create
>> feature requests for them. While XPDF will continue to function we intend to
>> stop including it in LiveCode.
>>
>> Cheers
>>
>> Monte
>>
>>> On 12 Dec 2021, at 12:27 am, Paul Dupuis via use-livecode
>>> <use-livecode@lists.runrev.com> wrote:
>>>
>>> I suspect it is for backward compatibility.
>>>
>>> When I turned over the XPDF external to Livecode, I asked that they
>>> maintain it for a couple years. I had expected we'd migrate our apps to the
>>> PDF widget by then, but business factors mean we're only now just starting
>>> a migration.
>>>
>>> That's why I jumped in on this thread - we HAVE to have the ability to
>>> extract text and images from the PDF widget (as you can with the External)
>>> - to migrate to the Widget.
>>>
>>> I suspect many other commercial developers who used the External still have
>>> active code using it that they have not migrated yet OR the issue of the
>>> undocumented (or, even worse, missing) properties of the widget most likely
>>> would have been raised before now.
>>>
>>> To migrate, all the commands and functions of the External need to be
>>> mapped to the properties of the Widget. We have probably a couple hundred
>>> calls to the External in our code all of which need to be mapped, updated,
>>> and tested - so no trivial task.
>>>
>>>
>>> On 12/11/2021 6:50 AM, matthias rebbe via use-livecode wrote:
>>>> Ah, I thought you were referring only to XPDF.
>>>> Btw. do you have an idea why both the XPDF external and the PDF widget are
>>>> maintained? Wouldn't it make sense to have only one pdf solution included?
>>>> Or am I missing something?
>>>>
>>>> Regards,
>>>> Matthias
>>>>
>>>>
>>>>> On 11 Dec 2021, at 02:01, Paul Dupuis via use-livecode
>>>>> <use-livecode@lists.runrev.com> wrote:
>>>>>
>>>>> Yes, I am familiar with the XPDF external (based on Google's PDFium
>>>>> library), having designed it and paid Monte to code it and then turned it
>>>>> over to LiveCode.
>>>>>
>>>>> I was referring to the PDF Widget (also based on Google's PDFium), which
>>>>> should have a comparable property for fetching the text of a page. The LC
>>>>> dictionary does not list any property for returning the page text, so I
>>>>> assume that is a Dictionary/Documentation error and that Monte can tell
>>>>> us the correct property of the PDF widget that will return the text of a
>>>>> page.
>>>>>
>>>>>
>>>>> On 12/10/2021 7:05 PM, matthias rebbe via use-livecode wrote:
>>>>>> Paul,
>>>>>>
>>>>>> here on macOS the dictionary of LC 10 DP1 definitely lists the function
>>>>>> XPDFViewer_Text(viewerName, pageNumber).
>>>>>> Btw. checking this showed me that this function seems to be deprecated
>>>>>> and instead the command
>>>>>> XPDFViewer_Unicode viewerName, pageNumber, variableName
>>>>>> should be used.
>>>>>>
>>>>>>
>>>>>>> On 10 Dec 2021, at 23:22, Paul Dupuis via use-livecode
>>>>>>> <use-livecode@lists.runrev.com> wrote:
>>>>>>>
>>>>>>> There must be an undocumented property for the text of a page - there
>>>>>>> was a function to return the full text of a page in the External (XPDF)
>>>>>>> and to get the full text of the PDF file, you just stepped through the
>>>>>>> pages (1..N) getting and concatenating the page text.
>>>>>>>
>>>>>>> Monte? LC 10.0.0 Dictionary does not list a property for the page text.
>>>>>>>
>>>>>>>
>>>>>>> On 12/10/2021 4:46 PM, Torsten Holmer via use-livecode wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have a PDF file with text and pictures, but I just want the text.
>>>>>>>>
>>>>>>>> I can do it manually with Ctrl-A and Ctrl-Copy by viewing the file
>>>>>>>> with Preview on macOS.
>>>>>>>>
>>>>>>>> I have a business licence and want to use the PDF widget but I cannot
>>>>>>>> find a way to do it.
>>>>>>>>
>>>>>>>> Can someone help me out?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Torsten

_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode