[
https://issues.apache.org/jira/browse/NIFI-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804625#comment-15804625
]
ASF GitHub Bot commented on NIFI-2859:
--------------------------------------
Github user bbende commented on a diff in the pull request:
https://github.com/apache/nifi/pull/1383#discussion_r94950985
--- Diff:
nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/ListHDFS.java
---
@@ -176,7 +176,7 @@ private HDFSListing deserialize(final String serializedState) throws JsonParseEx
        // Build a sorted map to determine the latest possible entries
        for (final FileStatus status : statuses) {
-           if (status.getPath().getName().endsWith("_COPYING_")) {
+           if (status.getPath().getName().endsWith("_COPYING_") || status.getPath().getName().startsWith(".")) {
--- End diff --
I think we should try to be consistent with how ListFile works. It has a
property that applies a regex to filter filenames, and the default value
matches anything that doesn't start with a dot:
```
public static final PropertyDescriptor FILE_FILTER = new PropertyDescriptor.Builder()
        .name("File Filter")
        .description("Only files whose names match the given regular expression will be picked up")
        .required(true)
        .defaultValue("[^\\.].*")
        .addValidator(StandardValidators.REGULAR_EXPRESSION_VALIDATOR)
        .build();
```
This way the user can determine if they want dot files or not.
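To illustrate how that default filter behaves, here is a minimal standalone sketch (not NiFi code; the class name and sample filenames are made up for the example) showing the `[^\\.].*` regex keeping normal files while excluding dot-prefixed in-progress files like the one in the stack trace below:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class DotFileFilterDemo {
    public static void main(String[] args) {
        // Same default as ListFile's File Filter property:
        // match any name whose first character is not a dot.
        Pattern filter = Pattern.compile("[^\\.].*");

        List<String> names = List.of(
                "data.csv",
                ".ycnVSpBOzEaoTWk_in-progress",  // hypothetical temp/part file
                "report.txt",
                ".hidden");

        List<String> kept = names.stream()
                .filter(n -> filter.matcher(n).matches())
                .collect(Collectors.toList());

        System.out.println(kept); // [data.csv, report.txt]
    }
}
```

A user who does want dot files could simply change the property to a regex such as `.*`.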
> List + Fetch HDFS processors are reading part files from HDFS
> -------------------------------------------------------------
>
> Key: NIFI-2859
> URL: https://issues.apache.org/jira/browse/NIFI-2859
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 1.0.0
> Reporter: Mahesh Nayak
> Assignee: Pierre Villard
>
> 1. Create the following ProcessGroups:
>    GetFile --> PutHdfs --> PutFile
>    ListHDFS --> FetchHdfs --> PutFile
> 2. Now start both the ProcessGroups.
> 3. Write lots of files into HDFS so that ListHDFS keeps listing and FetchHdfs
> keeps fetching.
> 4. An exception is thrown because the processor reads the part file from the
> PutHdfs folder:
> {code:none}
> java.io.FileNotFoundException: File does not exist:
> /tmp/HDFSProcessorsTest_visjJMcHORUwigw/.ycnVSpBOzEaoTWk_7f37d5af-d4a4-4521-b60d-c3c11ae19669
> at
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
> at
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1860)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1831)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1744)
> {code}
> Note that eventually the file is copied to the output successfully, but at
> the same time there are some files in the failure/comms failure relationship
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)