PJ, I invite you to join and comment on the Tika lists. We are already working
on standards in a number of the areas below, going beyond some of
the basic things you cite. For example, we are already doing sentiment analysis,
deep learning, and other NLP, and have these integrated into Tika as part of a
broader ecosystem.

 

Feel free to join the discussion at d...@tika.apache.org. 

 

You can also read more about it under Advanced Content Integration here:

 

https://wiki.apache.org/tika/#Advanced_Content_Extraction_with_Tika_-_Integration

 

Also look at NER, Object Detection, Text Captioning and Computer Vision.

 

Regarding participation in this committee at ECM, I’m definitely interested 
if it’s worthwhile.

 

Chris Mattmann

 

 

 

From: Piergiorgio Lucidi <piergior...@apache.org>
Reply-To: "dev@community.apache.org" <dev@community.apache.org>
Date: Tuesday, May 22, 2018 at 4:30 PM
To: "dev@community.apache.org" <dev@community.apache.org>
Subject: ASF involvement in the new ECM Standard Committee

 

Hi,

 

I'm directly involved in the new committee dedicated to designing the new
white papers about the ECM / Content Services guidelines and toolkits. The
main goal of these documents is to suggest best practices, guidelines and,
starting from this year, Open Source technology stacks to use in the
enterprise context.

 

Over the last three years I contributed to the AIIM committee
with Betsy Fanning, but now we will have a new home with a new team.
Yesterday I had a very interesting discussion with Robert Blatt about the
new direction to follow for the next development. The Open Source topic
will be the most relevant one in the next iteration of our work, and we are
discussing a potential white paper totally dedicated to the Open
Source alternatives in the market.

 

Even if I'm currently contributing as an individual in this committee, it
seems that we could be involved as a foundation in this project. I think
that it could be a good opportunity to spread our brand through
collaborations like this. We know the best practices, approaches and technology
stacks, and we have a huge amount of experience, skills and projects.

 

I'm wondering if the ASF has ever been involved in this kind of
contribution, or if there could be any problem with our involvement in this in
terms of brand. I have to ask for more details about this program, but in the
meantime I would like to receive some feedback from you. I'm also asking
because Robert Blatt is very interested in involving us officially in the
program.

 

I would like to thank Shane for sharing the framework published by Mozilla
some days ago in our ComDev room on HipChat.
Mozilla presented a very interesting report, also adding some technology
stacks:
https://blog.mozilla.org/blog/2018/05/15/whats-your-open-source-strategy-here-are-10-answers/

 

Specifically, we are talking about areas such as Content, Search and
Capture. Even if OCR is not present in our projects, we have some native
integrations, for example with Tesseract in Tika. It would be interesting to
understand which Apache projects can be combined with external libraries to
build a custom Capture Services solution.

 

For example, considering the involvement of Tesseract, the proposal could be
the following:

 

   - Apache ManifoldCF for crawling any source content repository (API ->
   contents as images or PDF)
   - Apache PDFBox for extracting images from PDF
   - Apache ManifoldCF for injecting contents in Solr
   - Tesseract for extracting text from images (configured inside Apache
   Tika)
   - Apache Solr for indexing extracted text
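
To make the data flow in the list above concrete, here is a minimal sketch in plain Python. It does not call ManifoldCF, PDFBox, Tika, Tesseract or Solr; every function and the `Document` type are hypothetical stand-ins named after the role each project would play, and the exact point at which OCR runs relative to the injection into Solr is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Document:
    """Hypothetical stand-in for a crawled item (not a real ManifoldCF type)."""
    doc_id: str
    kind: str            # "pdf" or "image"
    payload: List[str]   # placeholder for binary content: one string per image


def crawl(repository: Iterable[Document]) -> Iterable[Document]:
    """Crawling role (ManifoldCF in the proposal): walk the source repository."""
    yield from repository


def extract_images(doc: Document) -> List[str]:
    """Image extraction role (PDFBox in the proposal).

    A PDF yields its embedded images; a plain image is passed through as-is.
    """
    return list(doc.payload)


def ocr(image: str) -> str:
    """OCR role (Tesseract configured inside Tika in the proposal).

    Placeholder only: pretends the "image" already carries its text.
    """
    return image.strip().lower()


def index(search_index: Dict[str, str], doc_id: str, text: str) -> None:
    """Indexing role (Solr in the proposal): store text under the document id."""
    search_index[doc_id] = text


def run_pipeline(repository: Iterable[Document]) -> Dict[str, str]:
    """Chain the stages: crawl -> extract images -> OCR -> index."""
    search_index: Dict[str, str] = {}
    for doc in crawl(repository):
        text = " ".join(ocr(image) for image in extract_images(doc))
        index(search_index, doc.doc_id, text)
    return search_index


if __name__ == "__main__":
    repo = [
        Document("invoice-1", "pdf", ["Page One ", " Page Two"]),
        Document("scan-7", "image", ["RECEIPT"]),
    ]
    print(run_pipeline(repo))
```

The only point of the sketch is that each stage has a single responsibility and a narrow interface, so any stand-in could later be swapped for the real project without touching the others.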

 

We could also try to design a section totally dedicated to the Apache
technology stacks:

 

   - Apache Content Services (Jackrabbit, ...)
   - Apache Search Services (Lucene, Solr, ManifoldCF)
   - Apache Semantic Services (UIMA, Stanbol, ...)
   - Apache BigData Services (Hadoop, ...)
   - Apache DevOps Services (Mesos, ...)
   - Apache Libraries Services (Commons, ...)
   - ... and so on :-P

 

This potential work can be useful internally for us to create our new
Apache brochures dedicated to specific areas of our proposal.
I'm not talking about something that is focused only on
technologies, but also on best practices, approaches and a good path to
natural adoption.

 

I'm trying to understand if contributing on one side (ECM Standards) can
help me to design and improve our Apache brochures.
On the other hand, the Apache areas can also be useful for the new white
papers.

 

Please let me know what you think.

Thank you.

 

Cheers,

PJ

 

-- 

Piergiorgio

 
