GitHub user baolsen commented on the issue:
https://github.com/apache/nifi/pull/1615
Hi @bbende, my pleasure!
Thanks for the quick response.
The intention was to use the Get in another processor I am writing, which
only needs single-row lookups (I didn't realise at first that FetchHBaseRow
also does single-row lookups).
I had assumed that a Get would be more efficient than a Scan for fetching
single rows.
However, upon further reading it seems that the HBase client API uses a
Scan implementation for Gets as well.
https://www.cloudera.com/documentation/enterprise/5-4-x/topics/admin_hbase_scanning.html
There are some Stack Overflow questions about Get performing worse than
Scan, especially when the Scan uses a key prefix rather than a full rowkey.
https://www.quora.com/What-is-the-difference-between-get-and-scan-in-HBase
It's a little unclear which scenarios cause this performance difference, or
whether one approach is more performant in general, e.g. when the rowkey is a
full rowkey as in our case.
In summary, seeing as the HBase client API uses a Scanner under the hood
when doing a Get, there should be no real benefit to having a Get added to the
code (at least not without doing some practical benchmarks).
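To make the Get-versus-Scan distinction concrete, here is a minimal sketch that uses a plain java.util.TreeMap as a stand-in for a table sorted by rowkey. This is not HBase client code: the class RowLookupSketch and its methods are hypothetical names made up for illustration, and it says nothing about real HBase performance. It only shows that a scan whose start and stop keys are the same full rowkey returns the same row as a direct lookup, while a prefix scan can return several rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical toy model of a table kept sorted by rowkey (NOT the HBase API).
class RowLookupSketch {
    private final NavigableMap<String, String> rows = new TreeMap<>();

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // "Get": direct lookup of one full rowkey; null if the row is absent.
    String get(String rowKey) {
        return rows.get(rowKey);
    }

    // "Scan" bounded to one row: start and stop are the same key, both inclusive.
    List<String> scanSingleRow(String rowKey) {
        return new ArrayList<>(rows.subMap(rowKey, true, rowKey, true).values());
    }

    // "Scan" with a key prefix: walk the sorted keys from the prefix onwards
    // and stop at the first key that no longer starts with it.
    List<String> scanPrefix(String prefix) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : rows.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            out.add(e.getValue());
        }
        return out;
    }
}
```

With rows "user1", "user10" and "user2" loaded, get("user1") and scanSingleRow("user1") both yield only the "user1" row, whereas scanPrefix("user1") also picks up "user10". Whether the real client shows a measurable difference between the two paths would still need a benchmark against an actual cluster.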
I'll use the Scan for my processor instead since it already has the
functionality I need. Will add the processor as a separate PR.
Can I close this PR, or does that need to be done on your side?
Also, I see that the automated checks have failed (looks like other
components' tests). Is this something I should worry about for my next PR? :)