Hi,

I'm directly involved in the new committee that is designing the new
white papers on ECM / Content Services guidelines and toolkits. The
main goal of these documents is to suggest best practices, guidelines and,
starting from this year, Open Source technology stacks for use in the
enterprise context.

I have spent the last three years contributing to the AIIM committee
with Betsy Fanning, but now we will have a new home with a new team.
Yesterday I had a very interesting discussion with Robert Blatt about the
direction to follow for the next round of work. Open Source will be the
most relevant topic in the next iteration of our work, and we are
discussing a potential white paper entirely dedicated to the Open
Source alternatives in the market.

Even though I'm currently contributing to this committee as an individual,
it seems that we could be involved in this project as a foundation. I think
it could be a good opportunity to spread our brand through collaborations
like this as well. We know best practices, approaches and technology
stacks in which we have a huge amount of experience, skills and projects.

I'm wondering whether the ASF has ever been involved in this kind of
contribution, or whether our involvement could cause any problem in
terms of brand. I have to ask for more details about this program, but in
the meantime I would like to get some feedback from you. I'm also asking
because Robert Blatt is very interested in involving us officially in the
program.

I would like to thank Shane for sharing the framework published by Mozilla
some days ago in our ComDev room on HipChat.
Mozilla published a very interesting report that also covers some
technology stacks:
https://blog.mozilla.org/blog/2018/05/15/whats-your-open-source-strategy-here-are-10-answers/

Specifically, we are talking about areas such as Content, Search and
Capture. Even though OCR is not present in our projects, we have some
native integrations, for example with Tesseract in Tika. It would be
interesting to understand which Apache projects can be combined with
external libraries to build a custom Capture Services solution.

For example, with Tesseract involved, the proposal could be the
following:

   - Apache ManifoldCF for crawling any source content repository (API ->
   content as images or PDFs)
   - Apache PDFBox for extracting images from PDFs
   - Apache ManifoldCF for injecting content into Solr
   - Tesseract for extracting text from images (configured inside Apache
   Tika)
   - Apache Solr for indexing extracted text
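
To make the flow of these steps a bit more concrete, here is a rough
Python sketch. It does not call any of the real projects: every name in it
(Document, run_capture_pipeline and the three callables) is hypothetical,
and each callable only marks the place where PDFBox, Tika/Tesseract or
Solr would plug in. It also collapses the ManifoldCF hand-off to Solr and
the indexing itself into a single `index` step, which is my simplification.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Document:
    """A crawled item; the fields are illustrative, not a real schema."""
    uri: str
    payload: bytes  # raw PDF or image bytes handed over by the crawler
    images: List[bytes] = field(default_factory=list)
    text: str = ""


def run_capture_pipeline(
    crawled: Iterable[Document],
    extract_images: Callable[[bytes], List[bytes]],  # placeholder for PDFBox
    ocr: Callable[[bytes], str],                     # placeholder for Tika + Tesseract
    index: Callable[[str, str], None],               # placeholder for Solr
) -> int:
    """Run each crawled document through the stages; return how many were indexed."""
    count = 0
    for doc in crawled:
        doc.images = extract_images(doc.payload)
        doc.text = "\n".join(ocr(image) for image in doc.images)
        index(doc.uri, doc.text)
        count += 1
    return count


# Stand-in stages so the sketch runs without any external service.
fake_index: Dict[str, str] = {}
indexed = run_capture_pipeline(
    [Document("repo://scan-1", b"page one")],
    extract_images=lambda payload: [payload],
    ocr=lambda image: image.decode("ascii").upper(),
    index=fake_index.__setitem__,
)
```

Keeping the stages as plain callables is only meant to show that each one
could be replaced independently; a real setup would wire the actual
connectors together through their own configuration.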

We could also try to design a section entirely dedicated to the Apache
technology stacks:

   - Apache Content Services (Jackrabbit, ...)
   - Apache Search Services (Lucene, Solr, ManifoldCF)
   - Apache Semantic Services (UIMA, Stanbol, ...)
   - Apache BigData Services (Hadoop, ...)
   - Apache DevOps Services (Mesos, ...)
   - Apache Libraries Services (Commons, ...)
   - ... and so on :-P

This potential work could also be useful internally, to create new
Apache brochures dedicated to specific areas of our offering.
I'm not talking about something focused only on technologies, but also
on best practices, approaches and a good path to natural adoption.

I'm trying to understand whether contributing on one side (ECM standards)
can help me design and improve our Apache brochures.
On the other hand, the Apache areas could also be useful for the new white
papers.

Please let me know what you think.
Thank you.

Cheers,
PJ

-- 
Piergiorgio
