Hi Chris, first of all thank you so much for your feedback :)
I'm definitely interested in following Apache Tika closely, not only for this opportunity but also because I may have specific capabilities to implement for some of our clients :-P
I didn't know anything about your involvement in other standards, so this is awesome!

Inside Apache ManifoldCF we have a Tika Transformer that we can use in the pipeline when we start a scheduled job against a content repository for an indexing or migration process.
I know something about Tika because I typically also work on Solr, but following the mailing list will probably give me a wider view of it.

I'll ask if we can bring more people into the committee, and I think that your contribution could be extremely valuable.
I'll let you know about any updates on this.

Cheers,
PJ

2018-05-23 2:07 GMT+02:00 Chris Mattmann <mattm...@apache.org>:

> PJ I invite you to join and comment on the Tika lists. We already are
> working on standards in a number of the areas below, including even beyond
> some of the basic things you cite. For example we are already doing
> Sentiment Analysis, Deep Learning, and other NLP and have these integrated
> into Tika as part of a broader ecosystem.
>
> Feel free to join the discussion at d...@tika.apache.org.
>
> You can also read more about it under Advanced Content Integration here:
>
> https://wiki.apache.org/tika/#Advanced_Content_Extraction_with_Tika_-_Integration
>
> Look also at NER, Object Detection, Text Captioning and Computer Vision.
>
> Regarding participation in this committee at ECM, I’m definitely
> interested if it’s worthwhile.
>
> Chris Mattmann
>
> From: Piergiorgio Lucidi <piergior...@apache.org>
> Reply-To: "dev@community.apache.org" <dev@community.apache.org>
> Date: Tuesday, May 22, 2018 at 4:30 PM
> To: "dev@community.apache.org" <dev@community.apache.org>
> Subject: ASF involvement in the new ECM Standard Committee
>
> > Hi,
> >
> > I'm directly involved in the new committee dedicated to designing the new
> > white papers about the ECM / Content Services guidelines and toolkits. The
> > main goal of these documents is to suggest best practices, guidelines and,
> > starting from this year, Open Source technology stacks to use in the
> > enterprise context.
> >
> > I worked during the last three years contributing to the AIIM committee
> > with Betsy Fanning but now we will have a new home with a new team.
> > Yesterday I had a very interesting discussion with Robert Blatt about the
> > new direction to follow for the next development. The Open Source topic
> > will be the most relevant one in the next iteration of our work and we are
> > discussing a potential white paper totally dedicated to the Open Source
> > alternatives in the market.
> >
> > Even if I'm currently contributing as an individual in this committee, it
> > seems that we could be involved as a foundation in this project. I think
> > that it could be a good opportunity to spread our brand through
> > collaborations like this. We know best practices, approaches and technology
> > stacks where we have a huge amount of experience, skills and projects.
> >
> > I'm wondering if the ASF has ever been involved in this kind of
> > contribution, or if there could be any problem with our involvement in
> > terms of brand. I have to ask for more details about this program, but in
> > the meantime I would like to receive some feedback from you. I'm asking
> > also because Robert Blatt is very interested in involving us officially in
> > the program.
> >
> > I would like to thank Shane for sharing the framework published by Mozilla
> > some days ago in our ComDev room on HipChat.
> > Mozilla presented a very interesting report that also includes some
> > technology stacks:
> > https://blog.mozilla.org/blog/2018/05/15/whats-your-open-source-strategy-here-are-10-answers/
> >
> > Specifically, we are talking about areas such as Content, Search and
> > Capture. Even if OCR is not present in our projects, we have some native
> > integrations, for example with Tesseract in Tika. It would be interesting
> > to understand which Apache projects can be combined with external libraries
> > to build a custom Capture Services solution.
> >
> > For example, considering the involvement of Tesseract, the proposal could
> > be the following:
> >
> > - Apache ManifoldCF for crawling any source content repository (API ->
> >   contents as images or PDF)
> > - Apache PDFBox for extracting images from PDF
> > - Apache ManifoldCF for injecting contents in Solr
> > - Tesseract for extracting text from images (configured inside Apache
> >   Tika)
> > - Apache Solr for indexing extracted text
> >
> > We could also try to design a section totally dedicated to the Apache
> > technology stacks:
> >
> > - Apache Content Services (JackRabbit, ...)
> > - Apache Search Services (Lucene, Solr, ManifoldCF)
> > - Apache Semantic Services (UIMA, Stanbol, ...)
> > - Apache BigData Services (Hadoop, ...)
> > - Apache DevOps Services (Mesos, ...)
> > - Apache Libraries Services (Commons, ...)
> > - ... and so on :-P
> >
> > This potential work can be useful internally for us to create our new
> > Apache brochures dedicated to specific areas of our proposal.
> > I'm not talking about something that is focused only on technologies, but
> > also on best practices, approaches and the right path for a natural
> > adoption.
> >
> > I'm trying to understand if contributing on one side (ECM Standards) can
> > help me to design and improve our Apache brochures.
> > On the other hand, the Apache areas can also be useful for the new white
> > papers.
> >
> > Please let me know what you think.
> > Thank you.
> >
> > Cheers,
> > PJ
> >
> > --
> > Piergiorgio

--
Piergiorgio Lucidi
Open Source Evangelist and Digital Transformation Specialist
Member / Mentor / PMC Member / Committer @ The Apache Software Foundation
Community Star / Wiki Gardener / Global Forum Moderator @ Alfresco
Author and Technical Reviewer @ Packt Publishing
Technical Advisory Group Member @ Microsoft
Top Community Contributor @ Crafter
Project Leader / Committer @ JBoss
https://www.open4dev.com