> On Sept. 28, 2017, 6:16 p.m., Szabolcs Vasas wrote:
> > Hi Chris,
> > 
> > Thank you for your patch!
> > Generally it looks good to me however I am not that familiar with the 
> > mainframe world, can you please give us a bit more context about the 
> > datasets? As far as I understand from the code we have different data set 
> > types (e.g. sequential, partitioned) but is GDG and tape orthogonal 
> > properties to these?
> > I have seen that whitespace changes were introduced in many lines (they are 
> > marked with arrows and red bars in the review) can you please fix those as 
> > they make the review much harder?
> > 
> > Regards,
> > Szabolcs

Thanks for the review Szabolcs.

I'm not a mainframe expert myself, here's what I know from working with the 
mainframe team:-

Partitioned datasets - usually reside on disk only and not on tape, usually 
want to Sqoop all files in the folder. This was the default behaviour when this 
module was first written.
Sequential datasets - can reside on either disk or tape, usually just 1 file to 
Sqoop.
Generation Data Groups - can reside on either disk or tape, consists of 
multiple files, module only Sqoops the latest file. This dataset seems to 
behave like Syslog's log rolling.

In the above datasets, the FTP listing changes depending whether they reside on 
disk or tape. In the case of partitioned datasets, the FTP listing differs from 
Sequential and GDG. You can see samples for GDG and Sequential on disk and tape 
in TestMainframeFTPFileEntryParser, the partitioned datasets was before I 
started on the mainframe module so the default behaviour stays.

I have also cleaned up the whitespace in the latest revision, in any case, I 
have uploaded a new diff, hopefully it should address all of the feedback :)


> On Sept. 28, 2017, 6:16 p.m., Szabolcs Vasas wrote:
> > src/test/org/apache/sqoop/util/TestMainframeFTPClientUtils.java
> > Lines 355 (patched)
> > <https://reviews.apache.org/r/62653/diff/3/?file=1838806#file1838806line355>
> >
> >     I would verify here that the mockFTPClient.listFiles() is invoked since 
> > (AFAIK) in this test case we want to make sure that for partitioned 
> > datasets we use the default FTP list parsing.

Thanks. I have put in a mock in the unit test that will throw the exception if 
initiateListParsing is called.


- Chris


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62653/#review186571
-----------------------------------------------------------


On Sept. 28, 2017, 9:28 a.m., Chris Teoh wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62653/
> -----------------------------------------------------------
> 
> (Updated Sept. 28, 2017, 9:28 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3225
>     https://issues.apache.org/jira/browse/SQOOP-3225
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> There were some cases where the FTP listing parser was not used for mainframe 
> datasets, only the default was used. This caused some datasets to not be seen 
> by Sqoop as the default parser couldn't find them. This patch addresses this 
> behaviour where it is used to parse FTP listings for sequential and gdg 
> datasets on disk and tape, only partitioned datasets are excluded as their 
> FTP listing looks very different and is handled by the default FTP listing 
> parser.
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/util/MainframeFTPClientUtils.java f61b9838 
>   src/test/org/apache/sqoop/util/TestMainframeFTPClientUtils.java d87c75df 
> 
> 
> Diff: https://reviews.apache.org/r/62653/diff/3/
> 
> 
> Testing
> -------
> 
> Unit testing.
> Functional testing on a real mainframe with a previous combined patch.
> 
> 
> Thanks,
> 
> Chris Teoh
> 
>

Reply via email to