Hi,

I'm directly involved in the new committee dedicated to designing the new white papers on ECM / Content Services guidelines and toolkits. The main goal of these documents is to suggest best practices, guidelines and, starting from this year, Open Source technology stacks for use in the enterprise context.
For the last three years I have contributed to the AIIM committee with Betsy Fanning, but now we will have a new home with a new team. Yesterday I had a very interesting discussion with Robert Blatt about the direction to follow for the next round of development. Open Source will be the most relevant topic in the next iteration of our work, and we are discussing a potential white paper entirely dedicated to the Open Source alternatives on the market.

Even though I'm currently contributing to this committee as an individual, it seems that we could be involved in this project as a foundation. I think it could be a good opportunity to promote our brand through collaborations like this one. We know the best practices, approaches and technology stacks, and we have a huge amount of experience, skills and projects in these areas.

I'm wondering whether the ASF has ever been involved in this kind of contribution, and whether our involvement could cause any problem in terms of brand. I have to ask for more details about this program, but in the meantime I would like to get some feedback from you. I'm also asking because Robert Blatt is very interested in involving us officially in the program.

I would like to thank Shane for sharing, in our ComDev room on HipChat, the framework published by Mozilla some days ago. Mozilla presents a very interesting report that also covers some technology stacks:
https://blog.mozilla.org/blog/2018/05/15/whats-your-open-source-strategy-here-are-10-answers/

Specifically, we are talking about areas such as Content, Search and Capture. Even though none of our projects provides OCR, we have some native integrations, for example with Tesseract in Tika. It would be interesting to understand which Apache projects can be combined with external libraries to build a custom Capture Services solution.
For example, if Tesseract is involved, the proposal could be the following:
- Apache ManifoldCF for crawling any source content repository (API -> contents as images or PDF)
- Apache PDFBox for extracting images from PDF
- Apache ManifoldCF for injecting contents into Solr
- Tesseract for extracting text from images (configured inside Apache Tika)
- Apache Solr for indexing the extracted text

We could also try to design a section entirely dedicated to the Apache technology stacks:
- Apache Content Services (Jackrabbit, ...)
- Apache Search Services (Lucene, Solr, ManifoldCF)
- Apache Semantic Services (UIMA, Stanbol, ...)
- Apache BigData Services (Hadoop, ...)
- Apache DevOps Services (Mesos, ...)
- Apache Libraries Services (Commons, ...)
- ... and so on :-P

This potential work could also be useful internally, to create new Apache brochures dedicated to specific areas of our offering. I'm not talking about something focused only on technologies, but also on best practices, approaches and a good path towards natural adoption.

I'm trying to understand whether contributing on one side (ECM standards) can help me design and improve our Apache brochures. On the other hand, the Apache areas could also be useful for the new white papers.

Please let me know what you think.

Thank you.

Cheers,
PJ

--
Piergiorgio