Yikes, apologies for the formatting. It looked fine in Gmail when I sent it, alas.
I must let the proposers respond to the technical questions, but I think I can make the general observation that would-be contributors proposing and performing work on new and better Apache ecosystem integrations would be excellent for the health of the new podling and the ecosystem at large.

> On May 14, 2016, at 5:32 PM, Roman Shaposhnik <ro...@shaposhnik.org> wrote:
>
> Super excited to see this proposal! This will finally allow us to have
> an ASF-managed backend for next-generation data-driven apps that I see
> emerging quite rapidly.
>
> The proposal looks great to me (although I'd recommend calling out Scala
> as an implementation language more prominently, since it may attract
> additional developers with affinity to it).
>
> I do have two questions about technology:
> 1. Do you think it would be possible to leverage Apache Beam (incubating)
>    for abstracting away dependency on execution frameworks? My
>    understanding is that PredictionIO currently only runs on Spark.
> 2. Is an integration with Apache Zeppelin possible?
>
> Thanks,
> Roman.
>
>> On Fri, May 13, 2016 at 1:41 PM, Andrew Purtell <apurt...@apache.org> wrote:
>> Greetings,
>>
>> It is my pleasure to propose the PredictionIO project for incubation at
>> the Apache Software Foundation.
>>
>> PredictionIO is a popular open source Machine Learning Server built on
>> top of a state-of-the-art open source stack, including several Apache
>> technologies, that enables developers to manage and deploy
>> production-ready predictive services for various kinds of machine
>> learning tasks, with more than 400 production deployments around the
>> world and a growing contributor community.
>>
>> The text of the proposal is included below and is also available at
>> https://wiki.apache.org/incubator/PredictionIO
>>
>> Best regards,
>> Andrew Purtell
>>
>>
>> = PredictionIO Proposal =
>>
>> === Abstract ===
>> PredictionIO is an open source Machine Learning Server built on top of a
>> state-of-the-art open source stack that enables developers to manage and
>> deploy production-ready predictive services for various kinds of machine
>> learning tasks.
>>
>> === Proposal ===
>> The PredictionIO platform consists of the following components:
>>
>> * PredictionIO framework - provides the machine learning stack for
>> building, evaluating and deploying engines with machine learning
>> algorithms. It uses Apache Spark for processing.
>>
>> * Event Server - the machine learning analytics layer for unifying events
>> from multiple platforms. It can use Apache HBase or any JDBC backend
>> as its data store.
>>
>> The PredictionIO community also maintains a Template Gallery, a place to
>> publish and download (free or proprietary) engine templates for different
>> types of machine learning applications, which is a complementary part of
>> the project. At this point we exclude the Template Gallery from the
>> proposal, as it has a separate set of contributors and we’re not familiar
>> with an Apache-approved mechanism to maintain such a gallery.
>>
>> You can find the Template Gallery at https://templates.prediction.io/
>>
>> === Background ===
>> PredictionIO was started with a mission to democratize and bring machine
>> learning to the masses.
>>
>> Machine learning has traditionally been a luxury for big companies like
>> Google, Facebook, and Netflix. There are ML libraries and tools lying
>> around the internet, but putting them all together as a production-ready
>> infrastructure is a very resource-intensive task that is barely within
>> reach of individuals or small businesses.
>>
>> PredictionIO is a production-ready, full stack machine learning system that
>> allows organizations of any scale to quickly deploy machine learning
>> capabilities. It comes with official and community-contributed machine
>> learning engine templates that are easy to customize.
>>
>> === Rationale ===
>> As the usage of and number of contributors to PredictionIO have grown
>> larger and more diverse, we have sought an independent framework for the
>> project to keep thriving. We believe the Apache foundation is a great fit.
>> Joining Apache would ensure that tried and true processes and procedures
>> are in place for the growing number of organizations interested in
>> contributing to PredictionIO. PredictionIO is also a good fit for the
>> Apache foundation. PredictionIO was built on top of several Apache
>> projects (HBase, Spark, Hadoop). We are familiar with the Apache process
>> and believe that the democratic and meritocratic nature of the foundation
>> aligns with the project goals.
>>
>> === Initial Goals ===
>> The initial milestones will be to move the existing codebase to Apache and
>> integrate with the Apache development process. Once this is accomplished,
>> we plan for incremental development and releases that follow the Apache
>> guidelines, as well as growing our developer and user communities.
>>
>> === Current Status ===
>> PredictionIO has undergone nine minor releases and many patches.
>> PredictionIO is being used in production by Salesforce.com as well as many
>> other organizations and apps. The PredictionIO codebase is currently
>> hosted at GitHub, which will form the basis of the Apache git repository.
>>
>> ==== Meritocracy ====
>> We plan to invest in supporting a meritocracy. We will discuss the
>> requirements in an open forum. We intend to invite additional developers
>> to participate. We will encourage and monitor community participation so
>> that privileges can be extended to those who contribute.
>>
>> ==== Community ====
>> Acceptance into the Apache foundation would bolster the already strong
>> user and developer community around PredictionIO. That community includes
>> many contributors from various other companies, and an active mailing list
>> composed of hundreds of users.
>>
>> ==== Core Developers ====
>> The core developers of our project are listed in our contributors and
>> initial PPMC below. Though many are employed at Salesforce.com, there are
>> also engineers from ActionML, and independent developers.
>>
>> === Alignment ===
>> The ASF is the natural choice to host the PredictionIO project as its goal
>> is democratizing Machine Learning by making it more easily accessible to
>> every user/developer. PredictionIO is built on top of several top-level
>> Apache projects as outlined above.
>>
>> === Known Risks ===
>>
>> ==== Orphaned products ====
>> PredictionIO has a solid and growing community. It is deployed in
>> production environments by companies of all sizes to run various kinds of
>> predictive engines.
>>
>> In addition to contributing to the PredictionIO framework, the community
>> is also actively contributing new engines to the Template Gallery as well
>> as SDKs and documentation for the project. Salesforce is committed to
>> utilizing and advancing the PredictionIO code base and supporting its user
>> community.
>>
>> ==== Inexperience with Open Source ====
>> PredictionIO has existed as a healthy open source project for almost two
>> years and is the most starred Scala project on GitHub. All of the proposed
>> committers have contributed to ASF and Linux Foundation open source
>> projects. Several current committers on Apache projects and Apache Members
>> are involved in this proposal and intend to provide mentorship.
>>
>> ==== Homogeneous Developers ====
>> The initial list of committers includes developers from several
>> institutions, including Salesforce, ActionML, Channel4, USC as well as
>> unaffiliated developers.
>>
>> ==== Reliance on Salaried Developers ====
>> Like most open source projects, PredictionIO receives substantial support
>> from salaried developers. PredictionIO development is partially supported
>> by Salesforce.com, but there are many contributors from various other
>> companies, and an active mailing list composed of hundreds of users. We
>> will continue our efforts to ensure that stewardship of the project is
>> independent of salaried developers by meritocratically promoting those
>> contributors to committers.
>>
>> ==== Relationships with Other Apache Products ====
>> PredictionIO relies heavily on top-level Apache projects such as Apache
>> Spark, HBase and Hadoop. However, it brings distinct functionality,
>> rather than just an abstraction - Machine Learning in a plug-and-play
>> fashion.
>>
>> Compared to Apache Mahout, which focuses on the development of a wide
>> variety of algorithms, PredictionIO offers a platform to manage the whole
>> machine learning workflow, including data collection, data preparation,
>> modeling, deployment and management of predictive services in production
>> environments.
>>
>> ==== An Excessive Fascination with the Apache Brand ====
>> PredictionIO is already a widely known open source project. This proposal
>> is not for the purpose of generating publicity. Rather, the primary
>> benefits of joining Apache are those outlined in the Rationale section.
>>
>> === Documentation ===
>> PredictionIO boasts rich and live documentation, which is included in the
>> code repo (docs/manual directory), built with Middleman, and publicly
>> hosted at https://docs.prediction.io
>>
>> === Initial Source and Intellectual Property Submission Plan ===
>> Currently, the PredictionIO codebase is distributed under the Apache 2.0
>> License and hosted on GitHub: https://github.com/PredictionIO/PredictionIO
>>
>> === External Dependencies ===
>> PredictionIO has the following external dependencies:
>> * Apache Hadoop 2.4.0 (optional, required only if YARN and HDFS are needed)
>> * Apache Spark 1.3.0 for Hadoop 2.4
>> * Java SE Development Kit 8
>> * and one of the following sets:
>>
>>   * PostgreSQL 9.1
>>
>>   or
>>
>>   * MySQL 5.1
>>
>>   or
>>
>>   * Apache HBase 0.98.6
>>   * Elasticsearch 1.4.0
>>
>> Upon acceptance to the incubator, we would begin a thorough analysis of
>> all transitive dependencies to verify this information and introduce
>> license checking into the build and release process by integrating with
>> Apache RAT.
>>
>> === Cryptography ===
>> PredictionIO does not include cryptographic code. We utilize standard
>> JCE and JSSE APIs provided by the Java Runtime Environment.
>>
>> === Required Resources ===
>> We request that the following resources be created for the project to use:
>>
>> ==== Mailing lists ====
>>
>> predictionio-priv...@incubator.apache.org (with moderated subscriptions)
>>
>> predictionio-dev
>>
>> predictionio-user
>>
>> predictionio-commits
>>
>> We will migrate the existing PredictionIO mailing lists.
>>
>> ==== Git repository ====
>> The PredictionIO team would like to use Git for source control, due to our
>> current use of GitHub.
>>
>> git://git.apache.org/incubator-predictionio
>>
>> ==== Documentation ====
>> https://predictionio.incubator.apache.org/docs/
>>
>> ==== JIRA instance ====
>> PredictionIO currently uses the GitHub issue tracking system associated
>> with its repository: https://github.com/PredictionIO/PredictionIO/issues.
>> We will migrate to Apache JIRA.
>>
>> JIRA PREDICTIONIO
>> https://issues.apache.org/jira/browse/PREDICTIONIO
>>
>> ==== Other Resources ====
>> * TravisCI for builds and test running.
>>
>> * PredictionIO's documentation, included in the code repo (docs/manual
>> directory), is built with Middleman and publicly hosted at
>> https://docs.prediction.io
>>
>> * A blog to drive adoption and excitement at https://blog.prediction.io
>>
>> === Initial Committers ===
>>
>> * Pat Ferrell
>> * Tamas Jambor
>> * Justin Yip
>> * Xusen Yin
>> * Lee Moon Soo
>> * Donald Szeto
>> * Kenneth Chan
>> * Tom Chan
>> * Simon Chan
>> * Marco Vivero
>> * Matthew Tovbin
>> * Yevgeny Khodorkovsky
>> * Felipe Oliveira
>> * Vitaly Gordon
>>
>> === Affiliations ===
>>
>> * Pat Ferrell - ActionML
>> * Tamas Jambor - Channel4
>> * Justin Yip - independent
>> * Xusen Yin - USC
>> * Lee Moon Soo - NFLabs
>> * Donald Szeto - Salesforce
>> * Kenneth Chan - Salesforce
>> * Tom Chan - Salesforce
>> * Simon Chan - Salesforce
>> * Marco Vivero - Salesforce
>> * Matthew Tovbin - Salesforce
>> * Yevgeny Khodorkovsky - Salesforce
>> * Felipe Oliveira - Salesforce
>> * Vitaly Gordon - Salesforce
>>
>> === Sponsors ===
>>
>> ==== Champion ====
>>
>> Andrew Purtell <apurtell at apache dot org>
>>
>> ==== Nominated Mentors ====
>>
>> * Andrew Purtell <apurtell at apache dot org>
>> * James Taylor <jtaylor at apache dot org>
>> * Lars Hofhansl <larsh at apache dot org>
>> * Suneel Marthi <smarthi at apache dot org>
>> * Xiangrui Meng <meng at apache dot org>
>> * Luciano Resende <lresende at apache dot org>
>>
>> ==== Sponsoring Entity ====
>>
>> Apache Incubator PMC
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org