Hi there, I was having a look at MiNiFi, and while the agent is truly an amazing addition to the NiFi family, playing with it brought back some terrible memories from the past... :-)
In particular, the way TailFile is only able to handle fully qualified file paths, instead of adopting a wildcard approach. This issue was raised as part of NIFI-1170, and now with MiNiFi going live I was wondering: should we revisit it?

The use case is fairly simple and commonplace. Many (if not all) log producers will create dynamically named files following pre-defined naming conventions, such as:

```
$ find /tmp/log
/tmp/log
/tmp/log/host1-2016-05-04.log
/tmp/log/firewall1
/tmp/log/firewall1/2016-05-04.log
/tmp/log/host3-2016-05-04.log
/tmp/log/host2-2016-05-04.log
/tmp/log/router1
/tmp/log/router1/2016-05-04.log
/tmp/log/host1-2016-05-02.log.gz
```

For users, the main advantage of such a structure is avoiding having to rotate the logs (and HUP or truncate files): instead, the log producer dynamically opens a new file and starts writing straight to it, simplifying the life of the processes relying on those files.

The file-matching strategy I had in mind is something simple, where a single capture group is used as the basis for the filename:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    static final String filenameRegEx =
            "/tmp/log/((?:[^/]+)?(?:/)?(?:host\\d)?(?:-)?\\d+-\\d+-\\d+\\.log)";
    static final Pattern p = Pattern.compile(filenameRegEx);

    public static void main(String[] args) throws IOException {
        final File tailDir = new File("/tmp/log/");
        listEntry(tailDir.toPath());
    }

    public static void listEntry(Path path) throws IOException {
        try (final DirectoryStream<Path> dirStream = Files.newDirectoryStream(path)) {
            for (final Path entry : dirStream) {
                if (Files.isDirectory(entry)) {
                    // Recurse into subdirectories (e.g. firewall1/, router1/).
                    listEntry(entry);
                } else if (Files.isRegularFile(entry)) {
                    Matcher m = p.matcher(entry.toString());
                    if (m.find()) {
                        System.out.println(m.group(1));
                    }
                }
            }
        }
    }
}
```

Resulting in:

```
host1-2016-05-04.log
firewall1/2016-05-04.log
host3-2016-05-04.log
host2-2016-05-04.log
router1/2016-05-04.log
host1-2016-05-02.log
```

While the file-matching logic itself is not particularly concerning, I wonder:

1.
Should we adjust TailFile, or create another processor (e.g. TailDir)?

2. I would also be keen to hear your suggestions on how to handle the parallel tailing of multiple files. From what I gather, TailFile will not tail an older file; instead it will either consume it fully (in case no state exists) or seek to the last known position, validate, and then consume the remaining data.

Cheers
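P.S. As an alternative to a hand-rolled regex, the wildcard approach could also lean on `java.nio.file`'s built-in glob support. A minimal sketch (the glob pattern here is my assumption: any `.log` file directly under `/tmp/log` or one subdirectory deeper, mirroring the layout above):

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class GlobMatch {
    // Hypothetical pattern: "*.log" at depth 1 or 2 below /tmp/log.
    // Glob "*" does not cross directory boundaries, so "{*,*/*}.log"
    // covers host1-2016-05-04.log and firewall1/2016-05-04.log alike.
    static final PathMatcher MATCHER = FileSystems.getDefault()
            .getPathMatcher("glob:/tmp/log/{*,*/*}.log");

    static boolean matches(String path) {
        return MATCHER.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        System.out.println(matches("/tmp/log/host1-2016-05-04.log"));
        System.out.println(matches("/tmp/log/firewall1/2016-05-04.log"));
        System.out.println(matches("/tmp/log/host1-2016-05-02.log.gz"));
    }
}
```

One design upside of globs over regexes for a would-be TailDir property: they are anchored by default, so a compressed rollover like `host1-2016-05-02.log.gz` is excluded without extra care, whereas an unanchored `find()` on the regex above still matches its `.log` prefix.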