[ https://issues.apache.org/jira/browse/NIFI-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804625#comment-15804625 ]

ASF GitHub Bot commented on NIFI-2859:
--------------------------------------

Github user bbende commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1383#discussion_r94950985
  
    --- Diff: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/ListHDFS.java ---
    @@ -176,7 +176,7 @@ private HDFSListing deserialize(final String serializedState) throws JsonParseEx
     
             // Build a sorted map to determine the latest possible entries
             for (final FileStatus status : statuses) {
    -            if (status.getPath().getName().endsWith("_COPYING_")) {
    +            if (status.getPath().getName().endsWith("_COPYING_") || status.getPath().getName().startsWith(".")) {
    --- End diff ---
    
    I think we should try to be consistent with how ListFile works. It has a property that applies a regex to filter filenames, and the default value matches anything that doesn't start with a dot:
    
    ```
    public static final PropertyDescriptor FILE_FILTER = new PropertyDescriptor.Builder()
                .name("File Filter")
                .description("Only files whose names match the given regular expression will be picked up")
                .required(true)
                .defaultValue("[^\\.].*")
                .addValidator(StandardValidators.REGULAR_EXPRESSION_VALIDATOR)
                .build();
    ```
    This way the user can determine if they want dot files or not.
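To illustrate what that default regex does in practice, here is a minimal, self-contained sketch of applying it to a list of filenames. The class name `FileFilterSketch` and the `filter` helper are hypothetical; only the `[^\\.].*` pattern comes from ListFile's default above.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class FileFilterSketch {
    // ListFile's default File Filter regex: any name not starting with a dot
    static final Pattern FILE_FILTER = Pattern.compile("[^\\.].*");

    // Keep only names that fully match the filter pattern
    static List<String> filter(List<String> names) {
        return names.stream()
                .filter(n -> FILE_FILTER.matcher(n).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Dot files (like HDFS in-progress part files) are excluded
        System.out.println(filter(List.of("data.txt", ".hidden", "part-00000")));
    }
}
```

With the default value, in-progress files written with a leading dot are skipped, while a user who does want dot files can simply loosen the regex.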


> List + Fetch HDFS processors are reading part files from HDFS
> -------------------------------------------------------------
>
>                 Key: NIFI-2859
>                 URL: https://issues.apache.org/jira/browse/NIFI-2859
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 1.0.0
>            Reporter: Mahesh Nayak
>            Assignee: Pierre Villard
>
> 1. Create the following process groups:
>    GetFile --> PutHDFS --> PutFile
>    ListHDFS --> FetchHDFS --> PutFile
> 2. Start both process groups.
> 3. Write lots of files into HDFS so that ListHDFS keeps listing and FetchHDFS keeps fetching.
> 4. An exception is thrown because the processor reads the part file from the PutHDFS folder:
> {code:none}
> java.io.FileNotFoundException: File does not exist: /tmp/HDFSProcessorsTest_visjJMcHORUwigw/.ycnVSpBOzEaoTWk_7f37d5af-d4a4-4521-b60d-c3c11ae19669
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1860)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1831)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1744)
> {code}
> Note that the file is eventually copied to the output successfully, but at the same time some files end up in the failure/comms failure relationship.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)