Hi,

I tried fine-tuning Tika with a configuration file and loading that file in the program where I parse the text, but it didn't work: I am still getting the same result. Basically, I want my program (which uses Tika for parsing) to treat any data it is given as plain text and nothing else.
Could you please suggest how I can solve this?

-Kashif

On Sun, Mar 17, 2024 at 10:23 PM Tilman Hausherr <[email protected]> wrote:

> Hi,
>
> The best would of course be that you don't make it look as if your text
> files are something else.
>
> The second best: fine tune the tika configuration
> https://tika.apache.org/2.9.1/configuring.html
>
> Tilman
>
> On 17.03.2024 17:46, Kashif Khan wrote:
>
> Do you think it is an issue to be fixed? And also, is there a workaround
> for this to work?
>
> On Sun, Mar 17, 2024, 5:03 PM Tilman Hausherr <[email protected]>
> wrote:
>
>> The first one is recognized as image/x-portable-graymap because "P2" is a
>> magic number for that type.
>>
>> "P1" is a magic number for image/x-portable-bitmap.
>>
>> Tilman
>>
>> On 16.03.2024 12:37, Kashif Khan wrote:
>>
>> Hello Tim/Forum,
>>
>> While I am trying to parse the below content the result is null/empty:
>> *"P2P He has Asthma"*
>> OR
>> *"P18-8610 He has Asthma"*
>> OR
>> *"P2P Scheduled as He had breathing issues *for the last* 1 year."*
>>
>> Whereas, the below gets parsed without any issues:
>> *"He has Asthma"*
>> *"Appointment Scheduled as He had breathing issues for last 1 year."*
>>
>> Could you please help in understand the exact issue and help with the
>> resolution?
>>
>> -Kashif Khan
>> [email protected]
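
To make Tilman's explanation in the quoted thread concrete, here is a minimal, self-contained Java sketch of prefix-based ("magic number") detection. It is a toy and not Tika's actual detector: the class name MagicPrefixDemo, the detect method, and the two-entry table are made up for illustration, and Tika's real detection rules may involve further conditions. It only shows why text that happens to start with "P1" or "P2" can be mistaken for an image.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy detector, NOT Tika code: it only knows the two
// magic prefixes mentioned in this thread.
class MagicPrefixDemo {

    private static final Map<String, String> MAGIC = new LinkedHashMap<>();
    static {
        MAGIC.put("P1", "image/x-portable-bitmap");
        MAGIC.put("P2", "image/x-portable-graymap");
    }

    // Looks only at the first two bytes; anything unknown falls back to text/plain.
    public static String detect(byte[] data) {
        int n = Math.min(data.length, 2);
        String head = new String(data, 0, n, StandardCharsets.US_ASCII);
        return MAGIC.getOrDefault(head, "text/plain");
    }

    public static void main(String[] args) {
        String[] samples = {
            "P2P He has Asthma",
            "P18-8610 He has Asthma",
            "He has Asthma"
        };
        for (String s : samples) {
            byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
            System.out.println(detect(bytes) + " <- \"" + s + "\"");
        }
    }
}
```

In this toy, the first two samples are classified as images purely because of their first two characters, while the third falls through to text/plain. That matches the pattern reported in the thread, where only the inputs beginning with "P1" or "P2" came back empty.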
