Text extraction and clip area

2016-12-01 Thread Andrea Vacondio
Hi, I had a couple of issues with text extraction and I tried to dig a bit into the code. As far as I can see the "current clipping area" is never used during text extraction, is this correct? My issue is with a form xobject where the bounding box clips out part of the text but that text is returne

RE: Text extraction and clip area

2016-12-01 Thread fx YAN BING
Hi, this is Yan from Japan. I'm also a user of PDFBox. I haven't clearly understood your problem. Do you want to process the contents inside a form? I can give you some sample code used in my project. It uses PDFStreamEngine to get form objects in a PDF. I hope it can help you. -Original Me

Re: Text extraction and clip area

2016-12-01 Thread Andrea Vacondio
Hi Yan, thanks for the answer but that is not my issue. What I'm saying is that I think the PDFBox components responsible for extracting text (PDFTextStripper and PDFTextStripperByArea) don't consider the current clipping area the way PageDrawer does, or at least that's how it looks to me, so I'm

Licensing of PDF box

2016-12-01 Thread Bhushan More
Hi. I have two queries related to licensing of the PDFBox library for use in an Android app. 1) Is this library free for commercial use in an Android app? 2) PDFBox internally uses some font files like Arial. If we use this lib and the related fonts, will it cause any font file licensing issue? Thanks, Bhush

Re: Licensing of PDF box

2016-12-01 Thread Tilman Hausherr
On 01.12.2016 at 18:54, Bhushan More wrote: Hi. I have two queries related to licensing of the PDFBox library for use in an Android app. 1) Is this library free for commercial use in an Android app? Read it here: https://www.apache.org/licenses/LICENSE-2.0 Although you won't go anywhere with Apache

question on getting location of bookmark

2016-12-01 Thread howardg
I want to find the x,y position of a bookmark in the PDF to be able to insert an image at that location. I can get the bookmark with the code below, looping through to find the one I need with getTitle: PDOutlineItem current = outline.getFirstChild(); But how would I use PDOutlineItem to

Re: Text extraction and clip area

2016-12-01 Thread Tilman Hausherr
On 01.12.2016 at 10:01, Andrea Vacondio wrote: Hi, I had a couple of issues with text extraction and I tried to dig a bit into the code. As far as I can see the "current clipping area" is never used during text extraction, is this correct? Yes. -
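
Since the reply confirms that text extraction ignores the clip, one workaround is to filter extracted text against a clip rectangle after the fact. The following is only a minimal sketch under stated assumptions: ClipAwareTextFilter, PositionedText and visibleText are hypothetical names, not PDFBox API, and the clip is simplified to a single rectangle in the same coordinate space as the text origins (a real form XObject bounding box would first have to be transformed into page space).

```java
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of PDFBox: drops text whose origin lies outside a clip rectangle. */
class ClipAwareTextFilter {

    /** A run of text together with its origin (assumed to be in page coordinates). */
    static final class PositionedText {
        final String text;
        final Point2D origin;

        PositionedText(String text, double x, double y) {
            this.text = text;
            this.origin = new Point2D.Double(x, y);
        }
    }

    /**
     * Returns the text of every run whose origin falls inside the clip.
     * Simplification: only the origin point is tested, not the full glyph bounds.
     */
    static List<String> visibleText(List<PositionedText> runs, Rectangle2D clip) {
        List<String> visible = new ArrayList<>();
        for (PositionedText run : runs) {
            if (clip.contains(run.origin)) {
                visible.add(run.text);
            }
        }
        return visible;
    }
}
```

In practice the runs would have to be collected during extraction together with the clip that was active when each one was shown; how to obtain that from the extraction classes is not covered in this thread.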

Re: question on getting location of bookmark

2016-12-01 Thread Tilman Hausherr
On 01.12.2016 at 21:00, howa...@tlcc.com wrote: I want to find the x,y position of a bookmark in the PDF to be able to insert an image at that location. I can get the bookmark with the code below, looping through to find the one I need with getTitle: PDOutlineItem current = outline.getFir

Re: question on getting location of bookmark

2016-12-01 Thread howardg
Thanks, can you elaborate? The API doc for getDestination shows that it returns PDDestination, and I don't see any methods that return the coordinates. Howard Howard Greenberg, CPA, CISA IBM Certified Application Developer/Instructor - IBM Notes and Domino The Learning Continuum Company, Ltd. 888

Re: question on getting location of bookmark

2016-12-01 Thread Tilman Hausherr
On 01.12.2016 at 21:12, howa...@tlcc.com wrote: Thanks, can you elaborate? The API doc for getDestination shows that it returns PDDestination, and I don't see any methods that return the coordinates. There are 7 derived classes. That's why I mentioned that there are several types. For example, PD

Re: question on getting location of bookmark

2016-12-01 Thread howardg
Thanks, I am missing something... My code uses getDestination() and casts the result to PDPageXYZDestination. However, the dest is null. I must be missing something? PDDocumentOutline outline = pdfDoc.getDocumentCatalog().getDocumentOutline(); if( outline != null ){

Re: question on getting location of bookmark

2016-12-01 Thread Tilman Hausherr
On 01.12.2016 at 21:38, howa...@tlcc.com wrote: Thanks, I am missing something... My code uses getDestination() and casts the result to PDPageXYZDestination. However, the dest is null. I must be missing something? There's another possibility, that there is an /A entry, i.e. an action. Call getAction().

Re: question on getting location of bookmark

2016-12-01 Thread howardg
Thanks, I changed my code as shown below. However, I can't cast the destination to PDPageXYZDestination; it appears to be a PDPageFitWidthDestination. This class does not have getLeft(), and the page number returns -1. PDDocumentOutline outli

Re: question on getting location of bookmark

2016-12-01 Thread Tilman Hausherr
On 01.12.2016 at 22:08, howa...@tlcc.com wrote: Thanks, I changed my code as shown below. However, I can't cast the destination to PDPageXYZDestination; it appears to be a PDPageFitWidthDestination. This class does not have getLeft(), and the page number returns -1. That

Re: question on getting location of bookmark

2016-12-01 Thread howardg
Sorry for not getting this, but I don't understand how I would insert an image without getting the left coordinate? Howard From: Tilman Hausherr To: users@pdfbox.apache.org Date: 12/01/2016 04:24 PM Subject: Re: question on getting location of bookmark On 01.12.2016 at 22:08

Re: question on getting location of bookmark

2016-12-01 Thread Tilman Hausherr
On 01.12.2016 at 22:28, howa...@tlcc.com wrote: Sorry for not getting this, but I don't understand how I would insert an image without getting the left coordinate? Choose a coordinate yourself. The problem is that you expected that a bookmark would always point to a specific location. It ain'

Licensing of PDbox for Android

2016-12-01 Thread Bhushan More
Hi. I have two queries related to licensing of the PDFBox library for use in an Android app. 1) Is this library free for commercial use in an Android app? 2) PDFBox internally uses some font files like Arial. If we use this lib and the related fonts, will it cause any font file licensing issue? Thanks, Bhusha