Thanks, Vasia, for starting the discussion.

I was expecting more changes from the recent discussion on restructuring
the project, in particular regarding the libraries. Gelly has always
collected algorithms and I have personally taken an algorithms-first
approach for contributions. Is that manageable and maintainable? I'd prefer
to see no limit to good contributions, and if necessary split the codebase
or the project.

If so, then a secondary goal is to make the algorithms user-accessible and
easier to review (especially at scale!). FLINK-4949 rewrites
flink-gelly-examples with modular inputs and algorithms, allows users to
run all existing algorithms, and makes it trivial to create a driver for
new algorithms (and to compare different implementations).

Regarding BipartiteGraphs, without algorithms or ideas for algorithms it's
not possible to review the structure of the open pull requests.

+1 to evaluating performance and promoting Flink!

Gelly has two shepherds whereas CEP and ML share one committer. New
algorithms in Gelly require new features in the Batch API (Gelly may also
start doing streaming, we're cool kids, too) so we need to find a process
for snuffing ideas early and for the right balance in dependence on core
committers' time. Take, for example, reworking the iteration scheduler to
allow for intermediate outputs and nested iterations: can this feature be
developed and reviewed within Gelly? Does it need the blessing of a Stephan
or Fabian? I'd like to see contributors and committers less dependent on
the core team and more autonomous.

Greg

On Fri, Feb 24, 2017 at 10:39 AM, Vasiliki Kalavri <
vasilikikala...@gmail.com> wrote:

> Hello squirrels,
>
> this is a discussion thread to organize the Gelly component development for
> release 1.3 and discuss longer-term plans for the library.
>
> I am hoping that with time-based releases, we can distribute the load for
> PR reviewing and make better use of our time, and also point contributors
> to "useful" tickets when they offer to help.
>
> I'm expecting the outcome of this discussion to be:
>
> (1) a set of open PRs to review and try merging for 1.3
> (2) a set of open JIRAs to work on before feature freeze
> (3) a set of JIRAs and PRs to reorganize/close
> (4) ideas on possible FLIPs
>
> Here's my initial take on things, i.e. features *I* see as important in the
> short-term. Feel free to add/remove/discuss:
>
> Release 1.3
> ==========
> - Bipartite graph support. Initial support has been added, but there are
> unreviewed PRs
> <https://github.com/apache/flink/pulls?utf8=%E2%9C%93&q=is%3Apr%20is%3Aopen%20bipartite%20>
> and there is no Scala API yet. It would be nice to organize this feature,
> decide what functionality we need and what functionality is already covered
> by the Graph type and have proper bipartite support for 1.3.
> - Driver improvements, i.e. #3294
> <https://github.com/apache/flink/pull/3294>
> - Algorithm improvements, #2733
> <https://github.com/apache/flink/pull/2733>
> - Affinity Propagation algorithm. This one has been developed using a bulk
> iteration plan and needs a review. The PR is #2885
> <https://github.com/apache/flink/pull/2885>.
> - Object reuse issues, FLINK-5890, FLINK-5891
> - Vertex-centric iteration improvement, i.e. FLINK-5127
>
>
> Roadmap
> ========
> Regarding longer-term plans, I see the following issues as still being
> relevant from the existing roadmap [1]:
> - Extending the iteration functionality to support algorithms more complex
> than value propagation, e.g. with nested loops
> - Partitioning methods
> - Partition-centric iterations
> - Performance evaluation
>
> These two lists are by no means complete or final and the goal of this
> thread is to see what the community is interested in, whether these
> features / additions make sense to be worked on, or what features are
> missing.
> So, please provide your feedback!
>
> Cheers,
> -V.
>
> [1]: https://cwiki.apache.org/confluence/display/FLINK/Flink+Gelly
>
