Hello all, I would like to start off with an introduction. My name is Andrew Evans. I have 3 years of programming and development experience in Java, Scala, Python, PostgreSQL, Spring (Boot, REST), etc. I am also working on a startup that aims to bring more power to medium-sized datasets and to build mobile applications that use text and numeric data together to make better predictions.
Between this and full-time work, I have started several open source projects (currently BSD 2-Clause licensed) which could greatly benefit the community and empower everyone using big data. They provide a simplified pipeline for ETL and acquisition, as well as a Scala/Java version of Fabric for simplified system administration. I could really use some help making the following projects better, and I have full SRS and SDS documents available.

OpenETL - A pipeline built around Pentaho that adds data quality assurance and some other basics such as initial SQL importing, communications, file system management, and large document parsing as needed.
https://github.com/asevans48/OpenETL

Acquisition Tools - A set of tools for acquiring and parsing data from any source, over networks or via file systems, with an aim of also including images and NLP. The current system is parallelizable and threadable, with a few tools to improve acquisition and initial intake.
https://github.com/asevans48/AcquisitionTools

ScalaFabric - Actually much broader, but still fairly simple. It includes wrappers around the AWS SDK and Mesos SDK, as well as interaction with the REST templates for Marathon and Chronos using Apache HttpComponents. A pipeline is in place to allow entire clusters to be generated from a single line of code and serialized classes or JSON objects, using FasterXML at the moment.
https://github.com/asevans48/ScalaFabric

Potentially, all three could be wrapped into a single environment, with the last providing Carte or acquisition node support. I have the program set up to be able to support multiple clusters.

If anyone is interested in helping, please let me know. Even a fork of one or more of the projects would be nice. I would be happy to send the SRS, SDS, and other docs over and get you integrated into the Scrum board at SeeNowDo. It is also possible to generate Javadocs from the code. I do dream of one day making all three Apache-level projects.
Thank you for your time,

Andrew Evans
Java Dev @ Hygenics Data, LLC
Co-Founder and Dev @ SimplrTek, LLC and its subsidiaries