The Apache Tika project is pleased to announce the release of Apache Tika 4.0.0-beta-1. The release contents have been pushed out to the main Apache release site and to the Maven Central sync.
Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.

Apache Tika 4.0.0-beta-1 includes a switch to Markdown as the default content handler, new runnable zip distributions for tika-app, tika-server and tika-eval (with drop-in pf4j pipes plugins) in place of shaded jars, and a new maxPages option in PDFParserConfig to cap PDF page processing. This release also includes dependency upgrades, including Jetty 12.x and CXF 4.1.x.

Details can be found in the changes file:
https://www.apache.org/dist/tika/4.0.0-beta-1/CHANGES-4.0.0-beta-1.txt

and on our draft 4.x docs site:
https://tika.apache.org/docs/4.0.0-SNAPSHOT/

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika will be available shortly in binary form or for use with Maven 2 from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

When downloading, please remember to verify the downloads against their signatures, using the public keys found at:
https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community
