Yikes, apologies for the formatting. It looked fine in Gmail when I sent it,
alas.

I must let the proposers respond to the technical questions but I think I can 
make the general observation that would-be contributors proposing and 
performing work on new and better Apache ecosystem integrations would be 
excellent for the health of the new podling and the ecosystem at large. 


> On May 14, 2016, at 5:32 PM, Roman Shaposhnik <ro...@shaposhnik.org> wrote:
> 
> Super excited to see this proposal! This will finally allow us to have an
> ASF-managed backend for the next-generation data-driven apps that I see
> emerging quite rapidly.
> 
> The proposal looks great to me (although I'd recommend calling out Scala as
> the implementation language more prominently, since it may attract
> additional developers with an affinity for it).
> 
> I do have two questions about technology:
>   1. Do you think it would be possible to leverage Apache Beam (incubating)
>       for abstracting away the dependency on execution frameworks? My
>       understanding is that PredictionIO currently only runs on Spark.
>   2. Is an integration with Apache Zeppelin possible?
> 
> Thanks,
> Roman.
> 
>> On Fri, May 13, 2016 at 1:41 PM, Andrew Purtell <apurt...@apache.org> wrote:
>> Greetings,
>> 
>> It is my pleasure to propose the PredictionIO project for incubation at the
>> Apache Software Foundation.
>>
>> PredictionIO is a popular open source Machine Learning Server built on top
>> of a state-of-the-art open source stack, including several Apache
>> technologies, that enables developers to manage and deploy production-ready
>> predictive services for various kinds of machine learning tasks, with more
>> than 400 production deployments around the world and a growing contributor
>> community.
>> 
>> 
>> The text of the proposal is included below and is also available at
>> https://wiki.apache.org/incubator/PredictionIO
>> 
>> Best regards,
>> Andrew Purtell
>> 
>> 
>> = PredictionIO Proposal =
>> 
>> === Abstract ===
>> PredictionIO is an open source Machine Learning Server built on top of a
>> state-of-the-art open source stack that enables developers to manage and
>> deploy production-ready predictive services for various kinds of machine
>> learning tasks.
>> 
>> === Proposal ===
>> The PredictionIO platform consists of the following components:
>> 
>> * PredictionIO framework - provides the machine learning stack for
>> building, evaluating and deploying engines with machine learning
>> algorithms. It uses Apache Spark for processing.
>> 
>> * Event Server - the machine learning analytics layer for unifying events
>> from multiple platforms. It can use Apache HBase or any JDBC backend
>> as its data store.
>> 
>> The PredictionIO community also maintains a Template Gallery, a place to
>> publish and download (free or proprietary) engine templates for different
>> types of machine learning applications. It is a complementary part of the
>> project. At this point we exclude the Template Gallery from the proposal,
>> as it has a separate set of contributors and we’re not familiar with an
>> Apache-approved mechanism to maintain such a gallery.
>> 
>> You can find the Template Gallery at https://templates.prediction.io/
>> 
>> === Background ===
>> PredictionIO was started with a mission to democratize and bring machine
>> learning to the masses.
>> 
>> Machine learning has traditionally been a luxury for big companies like
>> Google, Facebook, and Netflix. There are ML libraries and tools lying
>> around the internet, but putting them all together as a
>> production-ready infrastructure is a very resource-intensive task that is
>> barely within reach of individuals or small businesses.
>> 
>> PredictionIO is a production-ready, full stack machine learning system that
>> allows organizations of any scale to quickly deploy machine learning
>> capabilities. It comes with official and community-contributed machine
>> learning engine templates that are easy to customize.
>> 
>> === Rationale ===
>> As the usage of PredictionIO and the number of its contributors have grown
>> larger and more diverse, we have sought an independent framework in which
>> the project can keep thriving. We believe the Apache foundation is a great
>> fit. Joining
>> Apache would ensure that tried and true processes and procedures are in
>> place for the growing number of organizations interested in contributing
>> to PredictionIO. PredictionIO is also a good fit for the Apache foundation.
>> PredictionIO was built on top of several Apache projects (HBase, Spark,
>> Hadoop). We are familiar with the Apache process and believe that the
>> democratic and meritocratic nature of the foundation aligns with the
>> project goals.
>> 
>> === Initial Goals ===
>> The initial milestones will be to move the existing codebase to Apache and
>> integrate with the Apache development process. Once this is accomplished,
>> we plan for incremental development and releases that follow the Apache
>> guidelines, as well as growing our developer and user communities.
>> 
>> === Current Status ===
>> PredictionIO has undergone nine minor releases and many patches.
>> PredictionIO is being used in production by Salesforce.com as well as many
>> other organizations and apps. The PredictionIO codebase is currently
>> hosted at GitHub, which will form the basis of the Apache git repository.
>> 
>> ==== Meritocracy ====
>> We plan to invest in supporting a meritocracy. We will discuss the
>> requirements in an open forum. We intend to invite additional developers
>> to participate. We will encourage and monitor community participation so
>> that privileges can be extended to those that contribute.
>> 
>> ==== Community ====
>> Acceptance into the Apache foundation would bolster the already strong
>> user and developer community around PredictionIO. That community includes
>> many contributors from various other companies, and an active mailing list
>> composed of hundreds of users.
>> 
>> ==== Core Developers ====
>> The core developers of our project are listed in our contributors and
>> initial PPMC below. Though many are employed at Salesforce.com, there are
>> also engineers from ActionML, and independent developers.
>> 
>> === Alignment ===
>> The ASF is the natural choice to host the PredictionIO project, as the
>> project's goal is to democratize Machine Learning by making it more easily
>> accessible to every user/developer. PredictionIO is built on top of several
>> top-level Apache projects as outlined above.
>> 
>> === Known Risks ===
>> 
>> ==== Orphaned products ====
>> PredictionIO has a solid and growing community. It is deployed in
>> production environments by companies of all sizes to run various kinds of
>> predictive engines.
>> 
>> In addition to contributing to the PredictionIO framework, the
>> community is also actively contributing new engines to the Template
>> Gallery as well as SDKs and documentation for the project. Salesforce is
>> committed to utilizing and advancing the PredictionIO code base and
>> supporting its user community.
>> 
>> ==== Inexperience with Open Source ====
>> PredictionIO has existed as a healthy open source project for almost two
>> years and is the most starred Scala project on GitHub. All of the proposed
>> committers have contributed to ASF and Linux Foundation open source
>> projects. Several current committers on Apache projects and Apache Members
>> are involved in this proposal and intend to provide mentorship.
>> 
>> ==== Homogeneous Developers ====
>> The initial list of committers includes developers from several
>> institutions, including Salesforce, ActionML, Channel4, USC as well as
>> unaffiliated developers.
>> 
>> ==== Reliance on Salaried Developers ====
>> Like most open source projects, PredictionIO receives substantial support
>> from salaried developers. PredictionIO development is partially supported
>> by Salesforce.com, but there are many contributors from various other
>> companies, and an active mailing list composed of hundreds of users. We
>> will continue our efforts to ensure that stewardship of the project is
>> independent of salaried developers by meritocratically promoting those
>> contributors to committers.
>> 
>> ==== Relationships with Other Apache Products ====
>> PredictionIO relies heavily on top-level Apache projects such as Apache
>> Spark, HBase and Hadoop. However, it brings distinct functionality
>> rather than just an abstraction: Machine Learning in a plug-and-play
>> fashion.
>> 
>> Compared to Apache Mahout, which focuses on the development of a wide
>> variety of algorithms, PredictionIO offers a platform to manage the whole
>> machine learning workflow, including data collection, data preparation,
>> modeling, deployment and management of predictive services in production
>> environments.
>> 
>> ==== An Excessive Fascination with the Apache Brand ====
>> PredictionIO is already a widely known open source project. This proposal
>> is not for the purpose of generating publicity. Rather, the primary
>> benefits to joining Apache are those outlined in the Rationale section.
>> 
>> === Documentation ===
>> PredictionIO boasts rich and live documentation, which is included in the
>> code repo (docs/manual directory), built with Middleman, and publicly
>> hosted at https://docs.prediction.io
>> 
>> === Initial Source and Intellectual Property Submission Plan ===
>> Currently, the PredictionIO codebase is distributed under the Apache 2.0
>> License and hosted on GitHub: https://github.com/PredictionIO/PredictionIO
>> 
>> === External Dependencies ===
>> PredictionIO has the following external dependencies:
>> * Apache Hadoop 2.4.0 (optional, required only if YARN and HDFS are needed)
>> * Apache Spark 1.3.0 for Hadoop 2.4
>> * Java SE Development Kit 8
>> * and one of the following sets:
>>   * PostgreSQL 9.1
>>
>>   or
>>
>>   * MySQL 5.1
>>
>>   or
>>
>>   * Apache HBase 0.98.6
>>   * Elasticsearch 1.4.0
>> 
>> Upon acceptance to the incubator, we would begin a thorough analysis of
>> all transitive dependencies to verify this information and introduce
>> license checking into the build and release process by integrating with
>> Apache RAT.
>> 
>> === Cryptography ===
>> PredictionIO does not include cryptographic code. We utilize standard
>> JCE and JSSE APIs provided by the Java Runtime Environment.
>> 
>> === Required Resources ===
>> We request that the following resources be created for the project to use:
>> 
>> ==== Mailing lists ====
>> 
>> predictionio-priv...@incubator.apache.org (with moderated subscriptions)
>> 
>> predictionio-dev
>> 
>> predictionio-user
>> 
>> predictionio-commits
>> 
>> We will migrate the existing PredictionIO mailing lists.
>> 
>> ==== Git repository ====
>> The PredictionIO team would like to use Git for source control, due to our
>> current use of GitHub.
>> 
>> git://git.apache.org/incubator-predictionio
>> 
>> ==== Documentation ====
>> https://predictionio.incubator.apache.org/docs/
>> 
>> ==== JIRA instance ====
>> PredictionIO currently uses the GitHub issue tracking system associated
>> with its repository: https://github.com/PredictionIO/PredictionIO/issues.
>> We will migrate to Apache JIRA.
>> 
>> JIRA PREDICTIONIO
>> https://issues.apache.org/jira/browse/PREDICTIONIO
>> 
>> ==== Other Resources ====
>> * TravisCI for builds and test running.
>> 
>> * PredictionIO's documentation, included in the code repo (docs/manual
>> directory), is built with Middleman and publicly hosted at
>> https://docs.prediction.io
>> 
>> * A blog to drive adoption and excitement at https://blog.prediction.io
>> 
>> === Initial Committers ===
>> 
>> * Pat Ferrell
>> 
>> * Tamas Jambor
>> 
>> * Justin Yip
>> 
>> * Xusen Yin
>> 
>> * Lee Moon Soo
>> 
>> * Donald Szeto
>> 
>> * Kenneth Chan
>> 
>> * Tom Chan
>> 
>> * Simon Chan
>> 
>> * Marco Vivero
>> 
>> * Matthew Tovbin
>> 
>> * Yevgeny Khodorkovsky
>> 
>> * Felipe Oliveira
>> 
>> * Vitaly Gordon
>> 
>> === Affiliations ===
>> 
>> * Pat Ferrell - ActionML
>> 
>> * Tamas Jambor - Channel4
>> 
>> * Justin Yip - independent
>> 
>> * Xusen Yin - USC
>> 
>> * Lee Moon Soo - NFLabs
>> 
>> * Donald Szeto - Salesforce
>> 
>> * Kenneth Chan - Salesforce
>> 
>> * Tom Chan - Salesforce
>> 
>> * Simon Chan - Salesforce
>> 
>> * Marco Vivero - Salesforce
>> 
>> * Matthew Tovbin - Salesforce
>> 
>> * Yevgeny Khodorkovsky - Salesforce
>> 
>> * Felipe Oliveira - Salesforce
>> 
>> * Vitaly Gordon - Salesforce
>> 
>> === Sponsors ===
>> 
>> ==== Champion ====
>> 
>> Andrew Purtell <apurtell at apache dot org>
>> 
>> ==== Nominated Mentors ====
>> 
>> * Andrew Purtell <apurtell at apache dot org>
>> 
>> * James Taylor <jtaylor at apache dot org>
>> 
>> * Lars Hofhansl <larsh at apache dot org>
>> 
>> * Suneel Marthi <smarthi at apache dot org>
>> 
>> * Xiangrui Meng <meng at apache dot org>
>> 
>> * Luciano Resende <lresende at apache dot org>
>> 
>> ==== Sponsoring Entity ====
>> 
>> Apache Incubator PMC
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
> 

