Hi Joe, When merging I did not use the NiFi processors to test as I already have tooling around H20 driven ML in my home. While I don't admittedly use it very often, I thought the python processors were quite cool and could certainly be useful for others. My rationale is that the dependencies are not built into source or binary for MiNiFi C++. I would agree on the Java side that this would be unavoidable. While it would be preferential to have the NiFi analogue I do not think it would be required when considering the MiNiFi C++ processors; however, James should be the arbiter of that decision.
While I disagree that the MiNiFi portion should be removed, I don't have a strong argument to keep or remove the MiNiFI C++ code so I'd be happy to revert if the community feels the need to. Thanks, Marc On Mon, May 4, 2020 at 10:50 AM Joe Witt <[email protected]> wrote: > Team > > The below two commits raise serious concerns. > > I want to be clear here and point out that h2o is cool. Having such > integration is a neat concept and idea and one that certainly warrants > determination on how best to do so. > > My issue is with these two commits as it relates to licensing and > maintenance. > > License: > We should support vendors like this wanting to bring their capabilities > into NiFi generally sure. But the licensing and mode of use is critical. > In talking about this with the contributor for NiFi as well it is clear > that at least some or important portions of this require the user to have a > 'driverless ai license' so they can include their jar or build pipelines or > whatnot. Thus it isn't usable without that first. So it might be the case > that this stuff is source dependent only on ASF compatible licenses in > terms of source - but it certainly doesn't seem to be true in terms of > binary dependencies. Where in the PR or JIRA is there any discussion or > review of the licensing? I see plenty from James to believe this needs to > be reverted. Any contrary info? > https://github.com/apache/nifi/pull/4242#issuecomment-622654527 > > > Maintenance: > We should find a way for vendors to offer extension points like this > without having to take on the burden of maintenance. How can we possibly do > this well? We're learning this lesson the hard way in NiFi itself and this > is why the registry is being formulated. > > JIRA/hygiene: > https://issues.apache.org/jira/browse/MINIFICPP-1199 > and > https://issues.apache.org/jira/browse/MINIFICPP-1201 > Both are open. Yet we've merged commits that claim to be against each. > > I believe both of these commits need to be reverted as I do not believe the > licensing considerations have been addressed. I'd like to see discussion > on the above maintenance concern as well but that is less pressing. > > > Thanks > Joe > > > commit 7206c62240647520cf35649868d5d87903a256c2 > Author: James Medel <[email protected]> > Date: Wed Apr 29 12:38:04 2020 -0700 > > MINIFI-1201: Integrate H2O Driverless AI MSP in MiNFi (#766) > > MINIFI-1201: Integrate H2O Driverless AI MSP in MiNFi (#766) > > commit 6e5f96518764df7791519c0ee625a94a207ddc69 > Author: James Medel <[email protected]> > Date: Wed Apr 29 12:37:00 2020 -0700 > > MINIFI-1199: Integrate H2O Driverless AI PSP in MiNiFi (#763) > > * MINIFI-1199: Integrate H2O Driverless AI in MiNiFi > > MiNiFi C++ and H2O Driverless AI Integration via Custom Python > Processors: > Integrates MiNiFi C++ with H2O's Driverless AI by Using Driverless AI's > Python Scoring Pipeline and MiNiFi's Custom Python Processors. Uses the > Python Processors to execute the Python Scoring Pipeline scorer to do batch > scoring and real-time scoring for one or more predicted labels on test data > in the incoming flow file content. I would like to contribute my processors > to MiNiFi C++ as a new feature. > > * Update H2oPspScoreBatches processor > > This update includes passing "index=False" to > "batch_scores_df.to_string(index=False)". By updating this line of code, we > tell pandas to not include the DataFrame index when converting the > DataFrame to a string. The reason for this update is that we don't want > this extra column pandas tries to add to our predictions frame, we only > want the predictions. Thus, later when the predictions get saved to a csv > file, it will only include the predictions. > > * Moved ConvertDsToCsv to h2o base dir > > Since this ConvertDsToCsv python processor is used by the > Python Scoring Pipeline and MOJO Scoring Pipeline processors, > I moved ConvertDsToCsv to h2o base dir for easier access to it. >
