Re: MiNiFi commits for h2o

Marc Parisi Mon, 04 May 2020 08:39:46 -0700

Hi Joe,

When merging I did not use the NiFi processors to test as I already have
tooling around H20 driven ML in my home. While I don't admittedly use it
very often, I thought the python processors were quite cool and could
certainly be useful for others. My rationale is that the dependencies are
not built into source or binary for MiNiFi C++. I would agree on the Java
side that this would be unavoidable. While it would be preferential to have
the NiFi analogue I do not think it would be required when considering the
MiNiFi C++ processors; however, James should be the arbiter of that
decision.


While I disagree that the MiNiFi portion should be removed, I don't have a
strong argument to keep or remove the MiNiFI C++ code so I'd be happy to
revert if the community feels the need to.

Thanks,
Marc

On Mon, May 4, 2020 at 10:50 AM Joe Witt <[email protected]> wrote:

> Team
>
> The below two commits raise serious concerns.
>
> I want to be clear here and point out that h2o is cool.  Having such
> integration is a neat concept and idea and one that certainly warrants
> determination on how best to do so.
>
> My issue is with these two commits as it relates to licensing and
> maintenance.
>
> License:
> We should support vendors like this wanting to bring their capabilities
> into NiFi generally sure. But the licensing and mode of use is critical.
> In talking about this with the contributor for NiFi as well it is clear
> that at least some or important portions of this require the user to have a
> 'driverless ai license' so they can include their jar or build pipelines or
> whatnot.  Thus it isn't usable without that first. So it might be the case
> that this stuff is source dependent only on ASF compatible licenses in
> terms of source - but it certainly doesn't seem to be true in terms of
> binary dependencies. Where in the PR or JIRA is there any discussion or
> review of the licensing?  I see plenty from James to believe this needs to
> be reverted.  Any contrary info?
> https://github.com/apache/nifi/pull/4242#issuecomment-622654527
>
>
> Maintenance:
> We should find a way for vendors to offer extension points like this
> without having to take on the burden of maintenance. How can we possibly do
> this well?  We're learning this lesson the hard way in NiFi itself and this
> is why the registry is being formulated.
>
> JIRA/hygiene:
> https://issues.apache.org/jira/browse/MINIFICPP-1199
> and
> https://issues.apache.org/jira/browse/MINIFICPP-1201
> Both are open.  Yet we've merged commits that claim to be against each.
>
> I believe both of these commits need to be reverted as I do not believe the
> licensing considerations have been addressed.  I'd like to see discussion
> on the above maintenance concern as well but that is less pressing.
>
>
> Thanks
> Joe
>
>
> commit 7206c62240647520cf35649868d5d87903a256c2
> Author: James Medel <[email protected]>
> Date:   Wed Apr 29 12:38:04 2020 -0700
>
>     MINIFI-1201: Integrate H2O Driverless AI MSP in MiNFi (#766)
>
>     MINIFI-1201: Integrate H2O Driverless AI MSP in MiNFi (#766)
>
> commit 6e5f96518764df7791519c0ee625a94a207ddc69
> Author: James Medel <[email protected]>
> Date:   Wed Apr 29 12:37:00 2020 -0700
>
>     MINIFI-1199: Integrate H2O Driverless AI PSP in MiNiFi (#763)
>
>     * MINIFI-1199: Integrate H2O Driverless AI in MiNiFi
>
>     MiNiFi C++ and H2O Driverless AI Integration via Custom Python
> Processors:
>     Integrates MiNiFi C++ with H2O's Driverless AI by Using Driverless AI's
> Python Scoring Pipeline and MiNiFi's Custom Python Processors. Uses the
> Python Processors to execute the Python Scoring Pipeline scorer to do batch
> scoring and real-time scoring for one or more predicted labels on test data
> in the incoming flow file content. I would like to contribute my processors
> to MiNiFi C++ as a new feature.
>
>     * Update H2oPspScoreBatches processor
>
>     This update includes passing "index=False" to
> "batch_scores_df.to_string(index=False)". By updating this line of code, we
> tell pandas to not include the DataFrame index when converting the
> DataFrame to a string. The reason for this update is that we don't want
> this extra column pandas tries to add to our predictions frame, we only
> want the predictions. Thus, later when the predictions get saved to a csv
> file, it will only include the predictions.
>
>     * Moved ConvertDsToCsv to h2o base dir
>
>     Since this ConvertDsToCsv python processor is used by the
>     Python Scoring Pipeline and MOJO Scoring Pipeline processors,
>     I moved ConvertDsToCsv to h2o base dir for easier access to it.
>

Re: MiNiFi commits for h2o

Reply via email to