Re: Process for being part of the Incubator?

2022-09-21 Thread Pierce, Marlon
Hi Alan,

For starters, review these links to see if the ASF is a good fit for Open 
OnDemand:

* https://incubator.apache.org/
* https://incubator.apache.org/cookbook/
* https://apache.org/theapacheway/

A couple of key points from our experience:

* You’ll need to find a champion and mentors from the ASF community. Finding a 
good champion is key, and you’ll also want a diverse group of mentors. Mentors 
don’t need to have any direct interest in Open OnDemand. They’ll be coaching 
you on Apache governance processes for your community, plus they’ll bring 
diverse (non-university, for example) perspectives.

* Read the license agreement links in “Importing the Initial Code” in 
https://incubator.apache.org/cookbook/ carefully, along with any other legal 
and licensing documentation. You’ll need this to be reviewed and approved by 
your university lawyers. If any of the federal funding sources for OOD had an 
open source licensing requirement, this can really smooth the path (or at least 
it worked for us).

* It’s a tight schedule, but going to ApacheCon North America 
(https://www.apachecon.com/acna2022/) is a great way to get to know the 
community and attract some interest from potential champions and mentors.

Marlon



From: Chalker, Alan 
Date: Wednesday, September 21, 2022 at 2:13 AM
To: general@incubator.apache.org 
Subject: [External] Process for being part of the Incubator?
---

Hi:

I'm hoping this is the correct address to start a discussion about potentially 
becoming an Incubator project.

I'm co-PI on the Open OnDemand project 
(http://openondemand.org/),
 which is an open source project currently in use at nearly 400 research 
computing centers all over the world.  The project has been in development for 
about a decade now thanks to several NSF grants.  We are looking at submitting 
within the next month another proposal to the NSF's Pathways to Enable 
Open-Source Ecosystems (POSE) solicitation 
(https://www.nsf.gov/pubs/2022/nsf22572/nsf22572.htm),
 which is targeted at helping established open-source projects like ours 
transition into a robust 'ecosystem'.  A key element of that is putting in 
place things like governance, sustainability, and community engagement models.

I was made aware of the Apache Incubator because Airavata is one of your 
projects. While I realize Open OnDemand might fall a bit outside your typical 
project, what I'm really hoping to find is some sort of entity that could act 
as consultants, funded by a significant portion of the $1.5M budget available 
from the POSE program, to help with the overall process of transitioning Open 
OnDemand to a true open source ecosystem. My colleagues and I have no 
experience whatsoever in that, nor do we necessarily have sufficient time 
available to execute such tasks.

Is this something you all would be interested in or could provide some guidance 
on?  Thanks.



---
Alan Chalker, Ph.D.
al...@osc.edu
614-247-8672

My working hours may not be your working hours.  Please do not feel obligated 
to reply outside of your normal working hours.



[DISCUSS] Incubating Proposal for Datark

2022-09-21 Thread Yu Li
Hi All,

I would like to propose Datark [1] as a new Apache Incubator project. You
can find more details in the Datark proposal [2].

Datark is an intermediate (shuffle and spilled) data service for big data
compute engines (Apache Spark, Apache Flink, Apache Hive, etc.) to boost
performance, stability, and flexibility. It aims at enabling computing
engines to fully embrace the disaggregated architecture. In a lot of cases,
intermediate data depends on large local disks, and is often a major cause
of inefficiency, instability, and inflexibility in the lifecycle of a
distributed job. Datark solves the problems through the following core
designs:

1. Push-based shuffle plus partition data aggregation to turn random IO
access into sequential access.
2. FileSystem-like API to support writing spilled data.
3. Hierarchical storage from memory to DFS/object store to enable fast
cache and massive storage space.
4. Engine-agnostic APIs for easy integration with various engines.
5. Extended fault tolerance and data replication to increase reliability.
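
To illustrate design point 1 only, here is a minimal sketch of the general
push-based shuffle idea with per-partition aggregation. It is not Datark's
actual API: the ToyShuffleService class, its method names, and the in-memory
lists standing in for remote storage are all assumptions made for this
example.

```python
import zlib
from collections import defaultdict


class ToyShuffleService:
    """Hypothetical in-memory stand-in for a remote shuffle service."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        # One append-only buffer per partition, shared by all mappers.
        self._buffers = defaultdict(list)

    def partition_of(self, key: str) -> int:
        # Deterministic hash so every mapper routes a key the same way.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def push(self, key: str, value: int) -> None:
        # Mappers push records as they are produced; the service appends
        # them to the target partition's buffer.
        self._buffers[self.partition_of(key)].append((key, value))

    def read_partition(self, pid: int) -> list:
        # A reducer reads its whole partition as one sequential scan
        # instead of fetching small blocks from every mapper.
        return list(self._buffers.get(pid, []))


def run_word_count(mapper_inputs, num_partitions: int = 2) -> dict:
    service = ToyShuffleService(num_partitions)
    for words in mapper_inputs:  # each inner list plays one mapper
        for word in words:
            service.push(word, 1)
    totals = {}
    for pid in range(num_partitions):  # each pid plays one reducer
        for key, value in service.read_partition(pid):
            totals[key] = totals.get(key, 0) + value
    return totals
```

In this toy version, each reducer performs a single read of an aggregated
buffer. With M mappers and R reducers, a pull-based shuffle would instead
have each reducer fetch a small block from each of the M mapper outputs,
which is where the random IO comes from.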

Datark is currently used in production at Alibaba and many other companies,
serving petabytes of data per day. Beyond that, it has more open source
users, including Shopee, NetEase, Bilibili, BOSS, and Synnex. Most of these
users have made contributions to the project, forming an active community
with dozens of developers.

The proposed initial committers are interested in joining the ASF to
reinforce extensive collaboration and build a more vibrant community. We
believe the Datark project will provide tremendous value for the community
if it is introduced into the Apache Incubator.

I will serve as the champion for this project. Many thanks to our four other
mentors:

* Becket Qin (j...@apache.org)
* Duo Zhang (zhang...@apache.org)
* Lidong Dai (lidong...@apache.org)
* Willem Jiang (ningji...@apache.org)

FWIW, although the solutions differ, the issues Datark aims to resolve
overlap somewhat with those of Apache Uniffle (incubating) [3]. We noticed
this during the discussion phase of the Uniffle incubation (when we were
also preparing for incubation) and had an open and friendly discussion about
whether we could join forces [4]. In the end, we decided to develop
independently for the time being [5].

Looking forward to your feedback. Thanks.

Best Regards,
Yu

[1] https://github.com/alibaba/RemoteShuffleService
[2] https://cwiki.apache.org/confluence/display/INCUBATOR/DatarkProposal
[3] https://uniffle.apache.org/
[4] https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
[5] https://lists.apache.org/thread/pg8lzhzc1794x3yloqp169j0mdzqs3yw