-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51142/#review147596
-----------------------------------------------------------



@lhaiesp: Your patch looks awesome. Happy to review again once you have 
addressed the comments. It will be great if you can add some unit test for 
HdfsSystemConsumer. Some of the documentation that you will have to include for 
this feature will be:
* Add newly introduced configs to configuration-table.html
* Add newly introduced metrics to metrics-table.html (Pending SAMZA-702 commit)
* Add a webpage for describing the behavior of HDFS systemconsumer (or more 
generically, consuming from Bounded input sources) and how to use the HDFS 
consumer

You can choose to keep the documentation as a part of this RB or create a 
follow-up JIRA for documentation and assign it to your self. Ideally, we don't 
want to have a lot of gap between code and documentation. 

Thanks for such a thorough work!


samza-hdfs/src/main/java/org/apache/samza/system/hdfs/HdfsSystemAdmin.java 
(line 51)
<https://reviews.apache.org/r/51142/#comment214793>

    nit: these config keys can be private or packages-specific



samza-hdfs/src/main/java/org/apache/samza/system/hdfs/HdfsSystemAdmin.java 
(line 54)
<https://reviews.apache.org/r/51142/#comment214797>

    nit: typo in variable name "CONFOG"



samza-hdfs/src/main/java/org/apache/samza/system/hdfs/HdfsSystemConsumer.java 
(line 149)
<https://reviews.apache.org/r/51142/#comment214803>

    Isn't numTotalEventsCounter a sum of all counters in numEventsCounter in 
the map ? Do we want to maintain a running sum?



samza-hdfs/src/main/java/org/apache/samza/system/hdfs/PartitionDescriptionUtil.java
 (line 32)
<https://reviews.apache.org/r/51142/#comment214794>

    nit: remove unused import



samza-hdfs/src/main/java/org/apache/samza/system/hdfs/PartitionDescriptionUtil.java
 (line 38)
<https://reviews.apache.org/r/51142/#comment214795>

    You can add a private default constructor to ensure that the class doesn't 
get instantiated.



samza-hdfs/src/main/java/org/apache/samza/system/hdfs/partitioner/DirectoryPartitioner.java
 (line 171)
<https://reviews.apache.org/r/51142/#comment214800>

    Question: Is the generateOldestOffset simply returning a string of "0" 
delimited by a comma? The number of "0" matches the number of files in the 
group?



samza-hdfs/src/main/java/org/apache/samza/system/hdfs/partitioner/HdfsFileSystemAdapter.java
 (line 39)
<https://reviews.apache.org/r/51142/#comment214796>

    nit: assigned and ununsed
    You can get rid of the constructor here as well.



samza-hdfs/src/test/java/org/apache/samza/system/hdfs/partitioner/TestDirectoryPartitioner.java
 (line 164)
<https://reviews.apache.org/r/51142/#comment214805>

    when using groupPattern, is it required for the name of the file represent 
the file length ? (in suffix string) Or did you just add it for testing?


- Navina Ramesh


On Aug. 29, 2016, 5:27 p.m., Hai Lu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51142/
> -----------------------------------------------------------
> 
> (Updated Aug. 29, 2016, 5:27 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Yi Pan (Data Infrastructure), and 
> Navina Ramesh.
> 
> 
> Bugs: SAMZA-967
>     https://issues.apache.org/jira/browse/SAMZA-967
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Add HDFS System Consumer: 
> 
> 1. System admin, partitioner
> 2. System consumer with metrics
> 
> 
> Diffs
> -----
> 
>   build.gradle 1d4eb74b1294318db8454631ddd0901596121ab2 
>   gradle/dependency-versions.gradle 47c71bfde027835682889407261d4798b629d214 
>   samza-hdfs/src/main/java/org/apache/samza/system/hdfs/HdfsSystemAdmin.java 
> PRE-CREATION 
>   
> samza-hdfs/src/main/java/org/apache/samza/system/hdfs/HdfsSystemConsumer.java 
> PRE-CREATION 
>   
> samza-hdfs/src/main/java/org/apache/samza/system/hdfs/PartitionDescriptionUtil.java
>  PRE-CREATION 
>   
> samza-hdfs/src/main/java/org/apache/samza/system/hdfs/partitioner/DirectoryPartitioner.java
>  PRE-CREATION 
>   
> samza-hdfs/src/main/java/org/apache/samza/system/hdfs/partitioner/FileSystemAdapter.java
>  PRE-CREATION 
>   
> samza-hdfs/src/main/java/org/apache/samza/system/hdfs/partitioner/HdfsFileSystemAdapter.java
>  PRE-CREATION 
>   
> samza-hdfs/src/main/java/org/apache/samza/system/hdfs/reader/AvroFileHdfsReader.java
>  PRE-CREATION 
>   
> samza-hdfs/src/main/java/org/apache/samza/system/hdfs/reader/HdfsReaderFactory.java
>  PRE-CREATION 
>   
> samza-hdfs/src/main/java/org/apache/samza/system/hdfs/reader/MultiFileHdfsReader.java
>  PRE-CREATION 
>   
> samza-hdfs/src/main/java/org/apache/samza/system/hdfs/reader/SingleFileHdfsReader.java
>  PRE-CREATION 
>   
> samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala 
> 92eb4472533db67dca01f075cb460581b4bdac0d 
>   
> samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala
>  ef3c20a097ddf2feecaf8b0ad4587ea4bf6570b7 
>   
> samza-hdfs/src/test/java/org/apache/samza/system/hdfs/TestPartitionDesctiptionUtil.java
>  PRE-CREATION 
>   
> samza-hdfs/src/test/java/org/apache/samza/system/hdfs/partitioner/TestDirectoryPartitioner.java
>  PRE-CREATION 
>   
> samza-hdfs/src/test/java/org/apache/samza/system/hdfs/partitioner/TestHdfsFileSystemAdapter.java
>  PRE-CREATION 
>   
> samza-hdfs/src/test/java/org/apache/samza/system/hdfs/reader/TestAvroFileHdfsReader.java
>  PRE-CREATION 
>   samza-hdfs/src/test/resources/partitioner/testfile01 PRE-CREATION 
>   samza-hdfs/src/test/resources/partitioner/testfile02 PRE-CREATION 
>   
> samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala
>  261310d03de204718621f601117f016da14841df 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/YarnJobFactory.scala 
> 4e328a5f8c2b496a71e36c106339b7af263c96c7 
> 
> Diff: https://reviews.apache.org/r/51142/diff/
> 
> 
> Testing
> -------
> 
> unit tests pass.
> 
> tested by writing a real hdfs samza job and deploying to hadoop cluster.
> 
> 
> Thanks,
> 
> Hai Lu
> 
>

Reply via email to