Hi Konstantin,

Thanks for the thoughtful and detailed writeup.

And yes, +1 to all 5 top-level suggestions.

— Ken

> On Mar 29, 2017, at 10:39am, Konstantin Gribov <gros...@gmail.com> wrote:
> 
> Hi, folks.
> 
> Currently, parts of a contribution guide are scattered across several places
> (I was thinking of [1] and [2], and Chris also mentioned [3]), each covering
> different facets of contributing to Apache Tika.
> 
> One thing that bothers me is that our codebase is very inconsistent in
> style, formatting and dependency management. This seems inevitable at some
> stage of any popular open source project developed by many contributors,
> but we can make it more consistent with moderate effort and then maintain
> that state afterwards.
> 
> I propose:
> 
>   1. make one source of truth for the contribution guide and then
>   automatically mirror it to README.md/CONTRIBUTING.md on GitHub, publish it
>   on tika.a.o, etc.;
>   2. add info about logging in tika-core and other packages to this
>   contribution guide to make all contributions consistent with the current
>   policy (with examples of how logging should be used in different modules):
>      1. JUL in tika-core;
>      2. SLF4J, via a `private static final Logger LOG` field, in all other
>      modules;
>      3. Allow using a logging backend (log4j) in tests (e.g. for tuning log
>      levels of upstream libraries) and in standalone applications (e.g. to
>      support `--quiet` and `--verbose` CLI flags);
>      4. Document logging configuration for when the OSGi bundle is used;
>   3. add info about dependency handling (e.g. the no-additional-dependencies
>   policy for tika-core, exclusion of commons-logging/commons-logging-api/log4j
>   from dependencies, etc.);
>   4. integrate the checkstyle plugin [5], [6] into the Maven build so that
>   contributors can easily check that their code conforms to a simple initial
>   policy (4-space indent, no TABs, spaces before opening braces, spaces
>   after if/else/try/catch/finally, Egyptian-style braces);
>   5. add documentation about configuring checkstyle [5] in IDEs to simplify
>   its usage (I can write one for JetBrains IDEA at least).
> 
> The main goals are to bring the Tika codebase to a more consistent and clear
> state, simplify its maintenance, and make it easier for contributors to
> submit clean and pretty patches. The checkstyle configuration should be as
> simple as possible so that refactoring existing code to match it stays
> realistic.
> 
> Also, these items should be integrated gradually, step by step.
> 
> What do you think, folks?
> Would it be a good thing for Tika and its community?
> Would it bring any serious challenges that I've forgotten?
> 
> [1]: http://tika.apache.org/contribute.html
> [2]: https://wiki.apache.org/tika/DeveloperResources
> [3]: https://github.com/apache/tika/#contributing-via-github
> [4]: https://issues.apache.org/jira/browse/TIKA-2316 tracking issue
> [5]: http://checkstyle.sourceforge.net/
> [6]: https://maven.apache.org/plugins/maven-checkstyle-plugin/
> 
> 
> 
> -- 
> 
> Best regards,
> Konstantin Gribov

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr