Re: Tika 2.0?

2017-09-11 Thread Bob Paulin
Just so it's clear, are we going to: 1) rename the 2.0 branch over to master, or 2) re-apply the changes on master? I recall Chris' preference was 1, which would be quicker. However, there are very likely missed patches. 2 will be more time-consuming, but it would be more likely to include all the

Re: Tika 2.0?

2017-09-11 Thread Chris Mattmann
+1000 On 9/11/17, 12:03 PM, "Allison, Timothy B." wrote: Y, well, I didn't say _which_ September... Given my limited availability to work on this in Sept and POI's decision to move to Java 1.8, I propose releasing Tika 1.17 after the release of POI 3.17 and PDFBox 2.0.8. This w

Re: Integrating Tika with Apache Beam

2017-09-11 Thread Mattmann, Chris A (3010)
Amazing work, thank you Sergey!! ++ Chris Mattmann, Ph.D. Principal Data Scientist, Engineering Administrative Office (3010) Manager, NSF & Open Source Projects Formulation and Development Offices (8212) NASA Jet Propulsion La

RE: Tika 2.0?

2017-09-11 Thread Allison, Timothy B.
Y, well, I didn't say _which_ September... Given my limited availability to work on this in Sept and POI's decision to move to Java 1.8, I propose releasing Tika 1.17 after the release of POI 3.17 and PDFBox 2.0.8. This would be the last version of Tika at the Java 1.7 level, and then we bump

Re: Integrating Tika with Apache Beam

2017-09-11 Thread Sergey Beryozkin
Hi Tim, Thanks. The code, especially the part dealing with adapting the Tika events to the Beam pipeline, will most likely need to be improved :-). I've tried to make sure it can all be configured as much as possible (point to the location of the TikaConfig if needed, etc.), but it's only a start... I

RE: Integrating Tika with Apache Beam

2017-09-11 Thread Allison, Timothy B.
What great news! Thank you, Sergey!!! -----Original Message----- From: Sergey Beryozkin [mailto:sberyoz...@gmail.com] Sent: Monday, September 11, 2017 9:18 AM To: Allison, Timothy B. ; dev@tika.apache.org Subject: Re: Integrating Tika with Apache Beam Hi Tim, All, It took some time, but fina

Re: Integrating Tika with Apache Beam

2017-09-11 Thread Sergey Beryozkin
Hi Tim, All, It took some time, but the Beam TikaIO component is finally in Beam's 2.2.0-SNAPSHOT master: https://github.com/apache/beam/tree/master/sdks/java/io/tika I've created a basic project which can help with running it quickly: https://github.com/sberyozkin/beamTikaExample One can just b