Hi there, I was having a look at MiNiFi, and while the agent is truly an amazing addition to the NiFi family, playing with it brought back some terrible memories from the past... :-)
In particular, the way TailFile is only able to handle fully qualified file paths, instead of adopting a wildcard approach. This issue was raised as part of NIFI-1170, and now with MiNiFi going live I was wondering: should we revisit it?

The use case is fairly simple and commonplace. Many (if not all) log producers will create dynamically named files following pre-defined naming conventions, such as:

```
$ find /tmp/log
/tmp/log
/tmp/log/host1-2016-05-04.log
/tmp/log/firewall1
/tmp/log/firewall1/2016-05-04.log
/tmp/log/host3-2016-05-04.log
/tmp/log/host2-2016-05-04.log
/tmp/log/router1
/tmp/log/router1/2016-05-04.log
/tmp/log/host1-2016-05-02.log.gz
```

For users, the main advantage of such a structure is avoiding having to rotate the logs (and HUP or truncate files): instead, the log producer dynamically opens a new file and starts writing straight to it, simplifying the life of the processes relying on those files.

The file-matching strategy I had in mind is something simple, where a single capture group is used as the basis for the filename:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    static final String filenameRegEx =
            "/tmp/log/((?:[^/]+)?(?:/)?(?:host\\d)?(?:-)?\\d+-\\d+-\\d+\\.log)";
    static final Pattern p = Pattern.compile(filenameRegEx);

    public static void main(String[] args) throws IOException {
        final File tailDir = new File("/tmp/log/");
        listEntry(tailDir.toPath());
    }

    public static void listEntry(Path path) throws IOException {
        try (final DirectoryStream<Path> dirStream = Files.newDirectoryStream(path)) {
            for (final Path entry : dirStream) {
                if (Files.isDirectory(entry)) {
                    // Recurse into subdirectories (e.g. firewall1/, router1/).
                    listEntry(entry);
                } else if (Files.isRegularFile(entry)) {
                    Matcher m = p.matcher(entry.toString());
                    if (m.find()) {
                        System.out.println(m.group(1));
                    }
                }
            }
        }
    }
}
```

Resulting in:

```
host1-2016-05-04.log
firewall1/2016-05-04.log
host3-2016-05-04.log
host2-2016-05-04.log
router1/2016-05-04.log
host1-2016-05-02.log
```

While the file-matching logic itself is not particularly concerning, I wonder:

1.
Should we adjust TailFile, or create another processor (e.g. TailDir)?

2. I would also be keen to hear your suggestions on how to handle the parallel tailing of multiple files. From what I gather, TailFile will not tail an older file; instead it will either consume it fully (in case no state exists) or seek to the last known position, validate, and then consume the remaining data.

Cheers
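P.S. As an alternative to a hand-rolled regex, the wildcard approach could also lean on `java.nio.file`'s built-in glob support. A minimal sketch (the glob pattern here is my assumption: any `.log` file directly under `/tmp/log` or one subdirectory deeper, mirroring the layout above):

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class GlobMatch {
    // Hypothetical pattern: "*.log" at depth 1 or 2 below /tmp/log.
    // Glob "*" does not cross directory boundaries, so "{*,*/*}.log"
    // covers host1-2016-05-04.log and firewall1/2016-05-04.log alike.
    static final PathMatcher MATCHER = FileSystems.getDefault()
            .getPathMatcher("glob:/tmp/log/{*,*/*}.log");

    static boolean matches(String path) {
        return MATCHER.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        System.out.println(matches("/tmp/log/host1-2016-05-04.log"));
        System.out.println(matches("/tmp/log/firewall1/2016-05-04.log"));
        System.out.println(matches("/tmp/log/host1-2016-05-02.log.gz"));
    }
}
```

One design upside of globs over regexes for a would-be TailDir property: they are anchored by default, so a compressed rollover like `host1-2016-05-02.log.gz` is excluded without extra care, whereas an unanchored `find()` on the regex above still matches its `.log` prefix.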