Hi Lorenzo, what you could try to do is to derive your own InputFormat (extending FileInputFormat) where you set the field `unsplittable` to true. That way, an InputSplit is the whole file and you can handle the set of new rules as a single record.
Cheers, Till On Mon, Jun 29, 2020 at 3:52 PM Lorenzo Nicora <lorenzo.nic...@gmail.com> wrote: > Hi > > My streaming job uses a set of rules to process records from a stream. > The rule set is defined in simple flat files, one rule per line. > The rule set can change from time to time. A user will upload a new file > that must replace the old rule set completely. > > My problem is with reading and updating the rule set when I have a new one. > I cannot update single rules. I need the whole rule set to validate it and > build the internal representation to broadcast. > > I am reading the file with a *ContinuousFileReaderOperator* and > *InputFormat* (via env.readFile(...) and creating the internal > representation of the rule set I then broadcast. I get new files with > processingMode = PROCESS_CONTINUOUSLY > > How do I know when I have read ALL the records from a physical file, to > trigger validating and building the new Rule Set? > > I've been thinking about a processing-time trigger, waiting a reasonable > time after I read the first rule of a new file, but it does not look safe > if the user, for example, uploads two new files by mistake. > > Cheers > Lorenzo >