"ignored"?
This would strip out the attr_ fields so they wouldn't even be
indexed...if you don't want them.
As for the HTML file, it looks like Tika is failing to strip out the
style section. Try running the file alone with tika-app: java -jar
tika-app.jar -t inputfile.html. If you are finding the noise there,
please open an issue on our JIRA:
https://issues.
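
For anyone who wants to script that check rather than eyeball the output, here is a minimal sketch. It assumes java is on the PATH and uses the same jar and file names as the command above. The helper names and the "looks like CSS" regex are hypothetical, a rough heuristic and not part of Tika.

```python
import re
import subprocess

# Hypothetical heuristic: a line that looks like a CSS rule,
# e.g. "p { margin: 0; }". This is not anything Tika provides.
CSS_RULE = re.compile(r"[^{}]*\{[^{}]*:[^{}]*\}")


def tika_text_command(jar_path, input_path):
    """Build the tika-app command line; -t asks for plain-text output."""
    return ["java", "-jar", jar_path, "-t", input_path]


def css_like_lines(text):
    """Return the lines of extracted text that look like leftover CSS rules."""
    return [line for line in text.splitlines() if CSS_RULE.search(line)]


def extract_text(jar_path, input_path):
    """Run tika-app on one file and return its standard output as text."""
    result = subprocess.run(
        tika_text_command(jar_path, input_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling css_like_lines(extract_text("tika-app.jar", "inputfile.html")) and getting a non-empty list would suggest the style section is leaking into the text on the Tika side, before Solr is involved.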
ilar. Not very helpful, I know.
Regards,
Alex.
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/
On 27 May 2016 at 23:48, Simon Blandford wrote:
Hi Timothy,
Thanks for responding.
java -jar tika-app-1.13.jar -t
"/home/user/Documents
Hi,
I am using Solr 6.0 on Ubuntu 14.04.
I am ending up with loads of junk in the text body. It starts like,
The JSON entry output of a search result shows the indexed text starting
with...
body_txt_en: " stream_size 36499 X-Parsed-By
org.apache.tika.parser.DefaultParser X-Parsed-By"
An