wa-ooo edited a comment on pull request #92:
URL: https://github.com/apache/sqoop/pull/92#issuecomment-800087918


   > Hi hong,
   >
   > I've reviewed your changes (both GitHub and issues.apache.org), but TBH, in
   > the current state I'm concerned about both the intention of the change and
   > its correctness.
   >
   > First of all:
   > Could you please provide a bit more detail on what performance gain you
   > expect from this change and how you measured it? Could you also provide an
   > automated test case that shows the effect of this gain and ensures we don't
   > lose it in the future?
   >
   > On the front of correctness:
   > SQOOP-3149 introduced the line you'd like to remove and, if I remember
   > correctly, absolutely intentionally. For this reason:
   > Could you please provide automated test cases which ensure that your change
   > won't undo the SQOOP-3149 changes (so we keep the current correctness around
   > NULL column updates)?
   >
   > Many thanks in advance,
   > Attila Szabo
   ----------
   Hi @maugly24,
   Thanks for reviewing this PR.
   Our production environment was upgraded from CDH-5.13.0 to CDH-6.3.2, and we
   found that the task of importing data from RDM into HBase took 3 to 4 hours
   longer on the 6.3.2 cluster (about 50 million records). The record output in
   the MR log was also much larger than in 5.13.0. The problem is hard to detect
   when importing small tables; the larger the data volume, the more significant
   the delay. So I compared the HBase import job in Sqoop between the two
   versions and found the problem here.
   I think this is an easy fix for HBase developers, so there is not much
   description in the issue.
   The change is also easy to understand: the Put is already added to the
   mutationList when it is initialized, so it does not need to be added again
   afterwards. Otherwise, the same Put is recorded repeatedly in the generated
   HFile.
   I looked at SQOOP-3149, and there is no explanation for this line.
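
The duplication described in the reply can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the Sqoop source: the class `DuplicatePutSketch`, its two methods, and the use of a plain `String` as a stand-in for an HBase `Put` are all hypothetical and exist purely for illustration. It assumes the pattern is "add the put once at creation, then add it again for every column".

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model, NOT the Sqoop implementation.
// A String row key stands in for an HBase Put (assumption for illustration).
class DuplicatePutSketch {

    // Suspected pattern: the put is queued at creation and again per column.
    static List<String> withRepeatedAdd(String rowKey, int columnCount) {
        List<String> mutationList = new ArrayList<>();
        String put = rowKey;              // stand-in for creating the Put
        mutationList.add(put);            // queued once when initialized
        for (int i = 0; i < columnCount; i++) {
            // ... the column value would be set on the put here ...
            mutationList.add(put);        // redundant: same put queued again
        }
        return mutationList;
    }

    // Proposed pattern: the put is queued exactly once.
    static List<String> withSingleAdd(String rowKey, int columnCount) {
        List<String> mutationList = new ArrayList<>();
        String put = rowKey;              // stand-in for creating the Put
        mutationList.add(put);            // queued once when initialized
        for (int i = 0; i < columnCount; i++) {
            // ... the column value would be set on the put here ...
            // no second add: the list already references this put
        }
        return mutationList;
    }

    public static void main(String[] args) {
        // For a row with 3 columns:
        System.out.println(withRepeatedAdd("row1", 3).size()); // prints 4
        System.out.println(withSingleAdd("row1", 3).size());   // prints 1
    }
}
```

In this model the number of queued mutations grows with the column count, which would be consistent with the larger record output seen in the MR log. Whether the real code path behaves this way still has to be confirmed against SQOOP-3149 and covered by the automated tests requested in the review.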


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

