
Tilman Hausherr commented on TIKA-1533:

The links no longer work, but this is a known problem. To solve it, we would 
need an algorithm that divides each page into text blocks and stores that 
information as "beads" in the PDF (beads are the PDF construct for marking text 
blocks, and we already support reading them). Such algorithms exist, since OCR 
tools perform this kind of page segmentation.

My quick thought would be to take the shapes of the glyphs, enlarge each one 
slightly, join all overlapping shapes while keeping only their outer outlines, 
count how many joined shapes result, and compute a rectangular bound for each.
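
A minimal sketch of that idea, under simplifying assumptions: each glyph is 
approximated by its bounding rectangle rather than its real outline, "joining" 
is done by grouping rectangles whose enlarged versions overlap, and the class 
name TextBlockSketch, the method findBlocks and the margin parameter are all 
hypothetical (none of them exist in Tika or PDFBox). It only finds the blocks; 
writing them back into the PDF as beads is not shown.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Tika or PDFBox. */
final class TextBlockSketch {

    /**
     * Groups glyph boxes into text blocks and returns one bounding
     * rectangle per block, ordered by the first glyph of each block.
     */
    static List<Rectangle2D> findBlocks(List<Rectangle2D> glyphs, double margin) {
        int n = glyphs.size();

        // Step 1: make every shape a bit larger.
        List<Rectangle2D> inflated = new ArrayList<>(n);
        for (Rectangle2D g : glyphs) {
            inflated.add(new Rectangle2D.Double(
                    g.getX() - margin, g.getY() - margin,
                    g.getWidth() + 2 * margin, g.getHeight() + 2 * margin));
        }

        // Step 2: join shapes that now overlap (union-find, O(n^2) pair check).
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (inflated.get(i).intersects(inflated.get(j))) {
                    parent[find(parent, i)] = find(parent, j);
                }
            }
        }

        // Step 3: one rectangular bound per joined group, computed from the
        // original (not enlarged) glyph boxes.
        Map<Integer, Rectangle2D> bounds = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            bounds.merge(find(parent, i), glyphs.get(i), Rectangle2D::createUnion);
        }
        return new ArrayList<>(bounds.values());
    }

    /** Returns the representative of i's group, compressing the path. */
    private static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]];
            i = parent[i];
        }
        return i;
    }

    public static void main(String[] args) {
        // Made-up glyph boxes: three in a left column, two in a right column.
        List<Rectangle2D> glyphs = Arrays.<Rectangle2D>asList(
                new Rectangle2D.Double(0, 0, 10, 10),
                new Rectangle2D.Double(12, 0, 10, 10),
                new Rectangle2D.Double(0, 12, 10, 10),
                new Rectangle2D.Double(100, 0, 10, 10),
                new Rectangle2D.Double(112, 0, 10, 10));
        for (Rectangle2D block : findBlocks(glyphs, 2.0)) {
            System.out.println(block);
        }
    }
}
```

The margin decides what counts as one block: it has to exceed the gaps between 
glyphs and lines within a column but stay below half the gutter between 
columns, so in practice it would probably have to be derived from the font size 
on the page. The pairwise overlap check is quadratic in the number of glyphs; 
a real implementation would likely need a spatial index.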

> PDF parse failing to capture right order of text (2 columns)
> ------------------------------------------------------------
>                 Key: TIKA-1533
>                 URL: https://issues.apache.org/jira/browse/TIKA-1533
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.6, 1.7
>         Environment: Java 8, Mac OS X
>            Reporter: Tamara
>            Priority: Major
> When I convert a document with two columns, the order of the columns is 
> inverted in the text file. I only noticed because it is an index list. 
> The problem starts on page 303; in the converted text, look for 362. 
> The second file has the same problem on page 341.
> I tried setSortByPosition(true), and the columns got scrambled.
> I tried copying and pasting from the PDF preview, and the copied text comes 
> out in the correct order.
> I also tried PDFXStream, and it parses the text in the right order.
> Here are the files in which I have seen the issue:
> http://www.sbu.se/upload/Publikationer/Content0/1/Autismspektrumtillst%C3%A5nd_fulltext.pdf
> http://www.sbu.se/upload/publikationer/content0/1/forstamningssyndrom_fulltext.pdf
> The problem is in this sequence, in the first file:
> 362 A u t i s m s p e k t r u m t i l l s tå n d –  d i Ag n o s t i k  o c h 
> i n s At s e r ,  
> vå r d e n s  o r g A n i s At i o n o c h pAt i e n t e n s  d e l A k t i g 
> h e t
> autism and mental retardation: a case study 
> of 14 siblings from five families. J Pediatr 
> Nurs 2007;22:410-8.
> 9. Stoner JB, Angell ME. Parent perspec-
> tives on role engagement: An investigation  
> of parents of children with ASD and their 
> self-reported roles with education profes-
> sionals. Focus Autism Other Dev Disabl 
> 2006;21:177-89.
> 10. Stoner JB, Angell ME, House JJ,  
> Bock SJ. Transitions: Perspectives from 
> parents of young children with Autism 
> Spectrum Disorder (ASD). J Dev Phys 
> Disabil 2007;19:23-39.
> 11. Kuhaneck HM, Burroughs T, Wright J, 
> Lemanczyk T, Darragh AR. A qualitative 
> study of coping in mothers of children with 
> an autism spectrum disorder. Phys Occup 
> Ther Pediatr 2010;30:340-50.
> 12. Sivberg B. Coping strategies and paren-
> tal attitudes, a comparison of parents with 
> children with autistic spectrum disorders 
> and parents with non-autistic children.  
> Int J Circumpolar Health 2002;61  
> Suppl 2:36-50.
> 13. Antonovsky A. Hälsans mysterium. 
> Stockholm: Natur och Kultur. ISBN  
> 91-27-02193-9; 1991.
> 14. Antonvsky A. Hälsans mysterium: 
> Stockholm: Natur och Kultur; 2005.
> 15. Shu BC. Quality of life of family care- 
> givers of children with autism: The moth-
> er’s perspective. Autism 2009;13:81-91.
> 16. Cullen LA, Barlow JH, Cushway D. 
> Positive touch, the implications for parents 
> Referenser
> 1. Clarke J, van Amerom G. Asperger’s 
> syndrome: differences between parents’ 
> understanding and those diagnosed. Soc 
> Work Health Care 2008;46:85-106.
> 2. Tobias A. Supporting students with 
> autistic spectrum disorder (ASD) at 
> secondary school: A parent and student 
> perspective. Educational Psychology in 
> Practice 2009;25:151-65.
> 3. Müller E, Schuler A, Yates GB. Social 
> challenges and supports from the perspec-
> tive of individuals with Asperger syndrome 
> and other autism spectrum disabilities. 
> Autism 2008;12:173-90.
> 4. Carbone PS, Behl DD, Azor V,  
> Murphy NA. The medical home for  
> children with autism spectrum disorders:  
> parent and pediatrician perspectives.  
> J Autism Dev Disord 2010;40:317-24.
> 5. Benderix Y, Nordstrom B, Sivberg B. 
> Parents’ experience of having a child with 
> autism and learning disabilities living in  
> a group home: a case study. Autism 2006; 
> 10:629-41.
> 6. Renty J, Roeyers H. Satisfaction with 
> formal support and education for children 
> with autism spectrum disorder: the voices 
> of the parents. Child Care Health Dev 
> 2006;32:371-85.
> 7. Farrugia D. Exploring stigma: medical  
> knowledge and the stigmatisation of parents  
> of children diagnosed with autism spec- 
> trum disorder. Sociol Health Illn 2009; 
> 31:1011-27.
> 8. Benderix Y, Sivberg B. Siblings’ experi-
> ences of having a brother or sister with 
> ----------------------------------------------------------
> Second File: pg 341 or 344
> Postnatal Depression Scale, and assessment 
> of risk factors for postnatal depression. J 
> Affect Disord 2003;76:151-6.
> 66. Adouard F, Glangeaud-Freudenthal 
> NM, Golse B. Validation of the Edinburgh 
> postnatal depression scale (EPDS) in a 
> sample of women with high-risk pregnan-
> cies in France. Arch Womens Ment Health 
> 2005;8:89-95.
> 67. Bunevicius A, Kusminskas L,  
> Bunevicius R. Validation of the Lithuanian  
> version of the Edinburgh Postnatal Depres- 
> sion Scale. Medicina (Kaunas) 2009;45: 
> 544-8.
> 68. Chaudron LH, Szilagyi PG, Tang W, 
> Anson E, Talbot NL, Wadkins HI, et al. 
> Accuracy of depression screening tools for 
> identifying postpartum depression among 
> urban mothers. Pediatrics 2010;125:e609-
> 17. Epub 2010 Feb 15.
> 69. Matthey S, Barnett B, Kavanagh DJ, 
> Howie P. Validation of the Edinburgh 
> Postnatal Depression Scale for men, and 
> comparison of item endorsement with their 
> partners. J Affect Disord 2001;64:175-84.
> 70. Rowe HJ, Fisher JR, Loh WM. The  
> Edinburgh Postnatal Depression Scale 
> detects but does not distinguish anxiety  
> disorders from depression in mothers of  
> infants. Arch Womens Ment Health 2008; 
> 11:103-8.
> 71. Boyce PM, Stubbs J, Todd AL. The  
> Edinburgh Postnatal Depression Scale: 
> Validation for an Australian sample.  
> Aust N Z J Psychiatry 1993;27:472-476.
> 72. Leonardoua AA, Zervas YM,  
> Papageorgiou CC, Marks MN, Tsartsara 
> EC, Antsaklis A, et al. Validation of the 
> Edinburgh Postnatal Depression Scale 
> Postnatal Depression Scale: validation in 
> a Norwegian community sample. Nord J 
> Psychiatry 2001;55:113-7.
> 59. Navarro P, Ascaso C, Garcia-Esteve 
> L, Aguado J, Torres A, Martin-Santos R. 
> Postnatal psychiatric morbidity: a vali- 
> dation study of the GHQ-12 and the  
> EPDS as screening tools. General  
> Hospital Psychiatry 2007;29:1-7.
> 60. Phillips J, Charles M, Sharpe L,  
> Matthey S. Validation of the subscales  
> of the Edinburgh Postnatal Depression 
> Scale in a sample of women with unsettled 
> infants. J Affect Disord 2009;118:101-12.
> 61. Beck CT, Gable RK. Comparative ana- 
> lysis of the performance of the Postpartum  
> Depression Screening Scale with two other  
> depression instruments. Nurs Res 2001;50: 
> 242-50.
> 62. Bunevicius A, Kusminskas L, Pop VJ, 
> Pedersen CA, Bunevicius R. Screening for 
> antenatal depression with the Edinburgh 
> Depression Scale. J Psychosom Obstet 
> Gynaecol 2009;30:238-43.
> 63. Aydin N, Inandi T, Yigit A,  
> Hodoglugil NN. Validation of the Turkish 
> version of the Edinburgh Postnatal Depres-
> sion Scale among women within their first 
> postpartum year. Soc Psychiatry Psychiatr 
> Epidemiol 2004;39:483-6.
> 64. Garcia-Esteve L, Ascaso C, Ojuel J,  
> Navarro P. Validation of the Edinburgh 
> Postnatal Depression Scale (EPDS) in  
> Spanish mothers. J Affect Disord 2003; 
> 75:71-6.
> 65. Berle JO, Aarre TF, Mykletun A,  
> Dahl AA, Holsten F. Screening for  
> postnatal depression. Validation of the  
> Norwegian version of the Edinburgh 
