Owen,

Given that the majority of this code was produced as work for hire, wouldn't it be good to get some sign-off from Cisco on this migration?
On Tue, Dec 1, 2015 at 3:55 AM, Owen O'Malley <omal...@apache.org> wrote:
> Hi all,
>
> We'd like to start a discussion proposing the creation of Metron as an
> incubator podling. The proposal is on the wiki here:
> https://wiki.apache.org/incubator/MetronProposal
>
> I would call your attention to the background section in particular. The
> condensed version is that the original code base (OpenSOC) was created by a
> company (Cisco) that put it on GitHub under the ALv2, but has not worked
> on it since. We posted a message
> <https://groups.google.com/d/msg/opensoc-support/rFlW2uSSvmU/Sw_cO-T2AAAJ>
> to the OpenSOC support group a month ago proposing a move to Apache and got
> a single positive response.
>
> The text of the proposal is included below for easy quoting during
> discussion.
>
> Thanks,
> Owen
>
> = Apache Metron Proposal =
>
> == Abstract ==
>
> The Metron project is an open source project dedicated to providing an
> extensible and scalable advanced security analytics tool. It has strong
> foundations in the Apache Hadoop ecosystem.
>
> == Proposal ==
>
> Metron integrates a variety of open source big data technologies in order
> to offer a centralized tool for security monitoring and analysis. Metron
> provides capabilities for log aggregation, full packet capture indexing,
> storage, advanced behavioral analytics, and data enrichment, while applying
> the most current threat-intelligence information to security telemetry
> within a single platform.
>
> Metron can be divided into 4 areas:
>
> 1. '''A mechanism to capture, store, and normalize any type of security
> telemetry at extremely high rates.''' Because security telemetry is
> constantly being generated, it requires a method for ingesting the data at
> high speeds and pushing it to various processing units for advanced
> computation and analytics.
> 1. '''Real-time processing and application of enrichments''' such as
> threat intelligence, geolocation, and DNS information to telemetry being
> collected. The immediate application of this information to incoming
> telemetry provides the context and situational awareness, as well as the
> “who” and “where” information that is critical for investigation.
> 1. '''Efficient information storage''' based on how the information will
> be used:
>    a. Logs and telemetry are stored such that they can be efficiently
>    mined and analyzed for concise security visibility.
>    a. The ability to extract and reconstruct full packets helps an analyst
>    answer questions such as who the true attacker was, what data was leaked,
>    and where that data was sent.
>    a. Long-term storage not only increases visibility over time, but also
>    enables advanced analytics, such as machine learning techniques, to be used
>    to create models on the information. Incoming data can then be scored
>    against these stored models for advanced anomaly detection.
> 1. '''An interface that gives a security investigator a centralized view
> of data and alerts passed through the system.''' Metron’s interface
> presents alert summaries with threat intelligence and enrichment data
> specific to that alert on a single page. Furthermore, advanced search
> capabilities and full packet extraction tools are presented to the analyst
> for investigation without the need to pivot into additional tools.
>
> Big data is a natural fit for powerful security analytics. The Metron
> framework integrates a number of elements from the Hadoop ecosystem to
> provide a scalable platform for security analytics, incorporating such
> functionality as full-packet capture, stream processing, batch processing,
> real-time search, and telemetry aggregation. With Metron, our goal is to
> tie big data into security analytics and drive towards an extensible
> centralized platform to effectively enable rapid detection and rapid
> response for advanced security threats.
>
> == Background ==
>
> OpenSOC was developed by Cisco over the last two years and pushed out to
> GitHub (https://github.com/OpenSOC/opensoc) under the ALv2. However,
> development was mostly closed and has largely stopped. As evidence of the
> inactivity, users have complained that pull requests go unanswered for
> long periods:
> https://groups.google.com/d/msg/opensoc-support/R2W-ZFux8Vk/Y-5tL-EmAAAJ.
> Finally, no public releases of OpenSOC have been made. From an Apache point
> of view, the current community is not viable.
>
> However, some of the project's developers have left Cisco and have
> found interest from several others who would like to work together to form
> an active and open community at Apache, starting from the current OpenSOC
> code base. A message to the current support group proposing a move to
> Apache got a single positive response:
> https://groups.google.com/d/msg/opensoc-support/rFlW2uSSvmU/09PIsWL4AAAJ
>
> Because Cisco is not currently interested in being involved, the project
> expects to change its name. The project would like to use Metron,
> although we will perform a podling name search to check for conflicts.
> Metron, meaning “measure,” is half of the Greek root of the word
> 'telemetry.' Metron is also a DC Comics character who “... wanders in
> search of greater knowledge beyond his own”.
>
>
> == Rationale ==
> Metron strives to move the state of the art in security analytics forward.
> We want to move away from the proprietary nature of legacy security point
> tools and develop an open platform where people can contribute and share
> datasets, machine learning models, telemetry parsers, sources of telemetry
> enrichment, and threat intelligence feeds. Cybersecurity is too large
> a problem for a single corporation to tackle on its own, and the current
> tooling is too fragmented and proprietary for us to be able to rally around
> a single tool or vendor.
>
> In addition to being open and facilitating advancement in security
> analytics, Metron has several advantages over a conventional Security
> Information and Event Management (SIEM) system.
>
> * Metron uses an all-open-source stack under the hood and runs on commodity
> hardware. This means Metron is much cheaper to run than the competition.
> In security, cost is a major factor because the cost of your
> countermeasure for monitoring and reacting to a threat should not exceed
> the cost of what is being protected. Driving down the cost of security
> makes it economical to monitor more assets, which means more
> secure data centers.
> * Metron, being in the open, allows additional vetting and scrutiny by
> the open source community for all of its components. This is a better
> model for a security-oriented tool than closed-source development. All
> problems should be flushed out and fixed in the open. The closed source
> competition does not have this kind of rigor, is motivated by marketing and
> sales, and thus does not inspire confidence when it comes to security.
> * Being Hadoop-based, Metron can process unprecedented volumes of
> streaming data via Apache Storm. When an organization is hit with malware
> or malicious behavior, this most commonly happens as part of a global
> malware campaign whose signatures are known and available from
> third-party threat intelligence feeds. The ability to take in all
> the feeds and reference them against every telemetry message processed by
> Metron in real time not only facilitates detection of such campaigns;
> it changes the economics for the “bad guys”. If attackers have to customize
> their malware for each target, these global attacks become a lot more
> expensive and no longer viable for them.
> * Metron strives to shift conventional SOC workflows away from being
> rules-driven toward a more data-driven approach that incorporates machine
> learning and a higher degree of automation and autonomous detection. The
> modern threat landscape is too dynamic to be manageable via static rules
> alone, which is what conventional SIEMs rely on. Rule bases tend to bloat
> and, if improperly maintained, turn into sources of false-positive
> alerts.
>
> The ability to analyze and model large volumes of data at rest and then
> push the resulting output into a stream processor is
> essential in disrupting the
>
> == Current Status ==
>
> As stated in the background section, the current community isn’t healthy,
> which is why we are proposing moving to the Apache Incubator. In this
> section, we describe the current state of the OpenSOC project.
>
> === Meritocracy ===
> OpenSOC development is controlled by Cisco, and pull requests are being
> ignored. The development list is private, and requests to join are rejected
> because there is no activity on it. The goal of moving to Apache is to form
> a meritocracy where a variety of individuals, regardless of their current
> employer, come together and work together. We understand that diversity,
> open development, and open governance are critical to being a successful
> Apache project.
>
> === Community ===
> The OpenSOC project is not responding to pull requests or making releases.
> The easiest solution would be to create a variety of forks of the project
> on GitHub, but that would further fracture the community and prevent it
> from reaching critical mass. Our preferred solution is to build a single
> large, diverse, and open community at Apache.
>
> === Core Developers ===
> The core developers of Metron are James Sirota, Charles Porter, and Mark
> Bittmann. None of them have experience running an open source project, but
> they are eager to learn.
>
> === Alignment ===
> The ASF is a natural host for Metron given that it is already the home of
> Hadoop, HBase, Hive, Storm, Kafka, Spark, and other emerging big data
> projects. Metron leverages many Apache open-source products. We are very
> interested in a place to develop our community and integrations with the
> other Apache big data projects.
>
> == Known Risks ==
>
> === Orphaned Products ===
>
> The current product developers are all salaried developers at a small
> number of companies, and thus there is a risk of becoming an orphaned
> product. However, the companies view Metron as very important to their
> product offerings and plan to ramp up their work in the space. The project
> is unique in the product space and thus has strong potential to become a
> sustainable community.
>
> === Inexperience with Open Source ===
> The vast majority of the developers are inexperienced with open source
> development and the Apache Way. One of the major hurdles to graduation from
> the Apache Incubator will be demonstrating that they have learned the
> Apache Way and are applying it to how the project is managed. Vinod Kumar
> Vavilapalli is an Apache Member and plans to work actively as a
> committer on the project. The other mentors will also help them
> learn as they progress.
>
> === Homogenous Developers ===
> The developers are employed by four diverse companies (B23, Hortonworks,
> Mantech, and Rackspace) and are distributed across the United States. We
> hope to attract additional diversity as an Apache project.
>
> === Reliance on Salaried Developers ===
> Metron is currently being developed exclusively by salaried developers, but
> the goal of coming to Apache is to form a much more diverse community of
> users and developers, including non-salaried developers.
>
> === Relationships with Other Apache Products ===
> Metron has a strong relationship with, and dependency on, Apache Flume,
> Hadoop, HBase, Hive, Kafka, Spark, and Storm. Being part of Apache’s
> Incubator community could enable closer collaboration among these projects
> as well as others.
>
> We note that although there is a superficial resemblance to Apache Eagle,
> which performs security analysis of Hadoop audit events, the projects are
> significantly different. In particular, Metron is focused on analyzing
> network packet traffic and thus has a very different scope and scale of
> events than Eagle.
>
> === An Excessive Fascination with the Apache Brand ===
>
> While the Apache brand is important, we are much more interested in finding
> a home for the project that encourages open development and open
> governance. We want to form the new community using the Apache Way with its
> strong focus on meritocracy, organizational independence, and open
> development.
>
> == Documentation ==
> The current information on the OpenSOC project is here:
> http://opensoc.github.io/
> A slide deck presenting background material is here:
> http://www.slideshare.net/JamesSirota/cisco-opensoc
>
> == Initial Source ==
> The initial code is on GitHub: http://opensoc.github.io/
>
> == External Dependencies ==
> Metron has the following external dependencies:
> * Apache Flume
> * Apache Hadoop
> * Apache HBase
> * Apache Hive
> * Apache Kafka
> * Apache Spark
> * Apache Storm
> * ElasticSearch
> * MySQL
>
> The project understands that it will need to support alternatives to MySQL
> that are licensed under an ALv2-compatible license.
>
> == Cryptography ==
> Metron will eventually support encryption on the wire, but this is not one
> of the initial goals, and we do not expect Metron to be a controlled export
> item due to the use of encryption. Metron supports, but does not require,
> the Kerberos authentication mechanism for accessing secured Hadoop services.
>
> == Required Resources ==
>
> === Mailing List ===
>
> * metron-private for private PMC discussions
> * metron-dev for developers
> * metron-commits for all commits
> * metron-users for all users
>
> === Version Control ===
> Git is the preferred source control system.
>
> === Issue Tracking ===
>
> * JIRA (METRON)
>
> === Other Resources ===
> The existing code already has unit tests, so we will make use of existing
> Apache continuous testing infrastructure. The resulting load should not be
> very large.
>
> == Initial Committers ==
> * Jim Baker <jim.baker at rackspace dot com>
> * Mark Bittmann <mark at b23 dot io>
> * Sheetal Dolas <sheetal at hortonworks dot com>
> * Discovery Gerdes <discovery.gerdes at rackspace dot com>
> * Andrew Hartnett <andrew.hartnett at rackspace dot com>
> * Dave Hirko <dave at b23 dot io>
> * Paul Kehrer <paul.kehrer at rackspace dot com>
> * Brad Kolarov <brad at b23 dot io>
> * Kiran Komaravolu <kkomaravolu at hortonworks dot com>
> * Ryan Merriman <rmerriman at hortonworks dot com>
> * Michael Perez <michael.perez at hortonworks dot com>
> * Charles Porter <Charles.Porter at mcs dot mantech dot com>
> * Sean Schulte <sean.schulte at rackspace dot com>
> * James Sirota <jsirota at hortonworks dot com>
> * Casey Stella <cstella at hortonworks dot com>
> * Bryan Taylor <bryan.taylor at rackspace dot com>
> * Ray Urciuoli <Ray.Urciuoli at mcs dot mantech dot com>
> * Vinod Kumar Vavilapalli <vinodkv at apache dot org>
> * George Vetticaden <gvetticaden at hortonworks dot com>
> * Oskar Zabik <oskar.zabik at rackspace dot com>
>
> == Affiliations ==
> The initial committers are employees of:
> * Jim Baker - Rackspace
> * Mark Bittmann - B23
> * Sheetal Dolas - Hortonworks
> * Discovery Gerdes - Rackspace
> * Andrew Hartnett - Rackspace
> * Dave Hirko - B23
> * Paul Kehrer - Rackspace
> * Brad Kolarov - B23
> * Kiran Komaravolu - Hortonworks
> * Ryan Merriman - Hortonworks
> * Michael Perez - Hortonworks
> * Charles Porter - Mantech
> * Sean Schulte - Rackspace
> * James Sirota - Hortonworks
> * Casey Stella - Hortonworks
> * Bryan Taylor - Rackspace
> * Ray Urciuoli - Mantech
> * Vinod Kumar Vavilapalli - Hortonworks
> * George Vetticaden - Hortonworks
> * Oskar Zabik - Rackspace
>
> == Sponsors ==
>
> === Champion ===
> * Owen O’Malley - Apache IPMC member
>
> === Nominated Mentors ===
> * Chris Mattmann <mattmann at apache dot org> - Apache IPMC member, NASA
> * Owen O’Malley <omalley at apache dot org> - Apache IPMC member, Hortonworks
> * Billie Rinaldi <billie at apache dot org> - Apache IPMC member, Hortonworks
> * Vinod Kumar Vavilapalli <vinodkv at apache dot org> - Apache IPMC member,
> Hortonworks
>
> === Sponsoring Entity ===
> We are requesting that the Incubator sponsor this project.
>