Hi Konstantin,

Thanks for the thoughtful and detailed writeup.
And yes, +1 to all 5 top-level suggestions.

-- Ken

> On Mar 29, 2017, at 10:39am, Konstantin Gribov <gros...@gmail.com> wrote:
>
> Hi, folks.
>
> Currently we have something like contribution guide parts in several places
> (I thought about [1] and [2], and Chris also mentioned [3]) covering
> different facets of contributing to Apache Tika.
>
> One thing which makes me upset is that we have a very inconsistent codebase
> with different styles, formatting, and dependency management. This seems
> inevitable at some stage of any popular open source project developed by
> many contributors. But we can make it more consistent, with only moderate
> effort needed to maintain the status quo afterwards.
>
> I propose:
>
> 1. make one source of truth for the contribution guide and then
>    automatically mirror it to README.md/CONTRIBUTING.md for GitHub, publish
>    it on tika.a.o, etc.;
> 2. add info about logging in tika-core and other packages to this
>    contribution guide to make all contributions consistent with current
>    policy (with examples of how logging should be used in different modules):
>    1. JUL in tika-core;
>    2. SLF4J in a `private static final Logger LOG` field in all other
>       modules;
>    3. allow a logging backend (log4j) in tests (e.g. for tuning log
>       levels for upstream libraries) and in standalone applications (e.g.
>       to support the `--quiet` and `--verbose` CLI options);
>    4. document logging configuration for the case where the OSGi bundle is
>       used;
> 3. add info about dependency handling (e.g. the no-additional-deps policy
>    for tika-core, exclusion of commons-logging/commons-logging-api/log4j
>    from dependencies, etc.);
> 4. integrate the checkstyle plugin [5], [6] into the Maven build so that
>    contributors can easily check that their code conforms to a simple
>    initial policy (4-space indent, no TABs, spaces before opening braces,
>    spaces after if/else/try/catch/finally, Egyptian-style braces);
> 5. add documentation about checkstyle [5] configuration in IDEs to
>    simplify its usage (I can write one for JetBrains IDEA at least).
>
> The main point is to bring the Tika codebase to a more consistent and clear
> state, simplify its maintenance, and make it easier for contributors to
> make clean and pretty patches. The checkstyle configuration should be as
> simple as it can be, so that refactoring the existing code to it is
> realistic.
>
> Also, these items should be integrated gradually, step by step.
>
> What do you think, folks?
> Would it be a good thing for Tika and its community?
> Would it bring any serious challenges that I've forgotten?
>
> [1]: http://tika.apache.org/contribute.html
> [2]: https://wiki.apache.org/tika/DeveloperResources
> [3]: https://github.com/apache/tika/#contributing-via-github
> [4]: https://issues.apache.org/jira/browse/TIKA-2316 tracking issue
> [5]: http://checkstyle.sourceforge.net/
> [6]: https://maven.apache.org/plugins/maven-checkstyle-plugin/
>
> --
>
> Best regards,
> Konstantin Gribov

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
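
P.S. Since item 2 asks for examples of how logging should be used, here is a rough sketch of what the tika-core (JUL) case might look like. The class `CoreLoggingSketch` and the method `parseLength` are made up for illustration and are not existing Tika code. The SLF4J variant for other modules is shown only as a comment so that the snippet compiles without extra dependencies. Formatting follows the style proposed in item 4 (4-space indent, Egyptian-style braces).

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical class for illustration only; not part of Tika.
final class CoreLoggingSketch {

    // tika-core style: java.util.logging (JUL) only, so no extra dependency is needed.
    private static final Logger LOG = Logger.getLogger(CoreLoggingSketch.class.getName());

    // In any other module, the proposal calls for an SLF4J field instead, e.g.:
    //   private static final org.slf4j.Logger LOG =
    //           org.slf4j.LoggerFactory.getLogger(SomeParser.class);
    // (SomeParser is a placeholder name.)

    private CoreLoggingSketch() {
    }

    // Parses a declared length; logs a warning and returns -1 if the value is malformed.
    static long parseLength(String raw) {
        try {
            return Long.parseLong(raw.trim());
        } catch (NumberFormatException e) {
            // JUL uses MessageFormat-style placeholders ({0}) rather than SLF4J's {}.
            LOG.log(Level.WARNING, "Ignoring malformed length: {0}", raw);
            return -1L;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseLength(" 42 "));
        System.out.println(parseLength("oops"));
    }
}
```

The guide could place a snippet like this next to its SLF4J counterpart, so contributors can see at a glance which API belongs in which module.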