Hi,
I have a simple text file stored in HDFS which I use in a 
RichFilterFunction by way of the DistributedCache. The file is edited 
externally from time to time to add further lines. My FilterFunction also 
implements Runnable, and its run() method is scheduled with scheduleAtFixedRate 
on a ScheduledExecutorService; it reloads the file and stores the result in a 
List in the Filter class.
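A minimal sketch of the reload pattern described above, using only the JDK, may make the setup concrete. It is an illustration under stated assumptions, not the code from the job: the class name ReloadableWhitelist, the Callable-based loader, the HashSet in place of a List and the 60-second period are all hypothetical. In the Flink job the object would presumably be created in the RichFilterFunction's open() and closed in close(). The loader would then read the HDFS path directly (for example via a FileSystem's open(Path)) rather than the DistributedCache copy. Here a local file stands in for HDFS.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Whitelist refreshed at a fixed rate by a background thread (illustrative sketch). */
public class ReloadableWhitelist implements AutoCloseable {
    // Hypothetical loader: in a Flink job it would read the HDFS path directly,
    // not the DistributedCache copy made at job start.
    private final Callable<List<String>> loader;
    // Replaced wholesale on each reload; volatile so the filter thread sees the new set.
    private volatile Set<String> entries = Collections.emptySet();
    private ScheduledExecutorService scheduler;

    public ReloadableWhitelist(Callable<List<String>> loader) {
        this.loader = loader;
    }

    /** Loads once now, then reloads every periodSeconds on a daemon thread. */
    public void start(long periodSeconds) {
        reload();
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "whitelist-reloader");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleAtFixedRate(this::reload, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    /** Builds a fresh set from the loader; on any failure the previous set is kept. */
    public void reload() {
        try {
            entries = Collections.unmodifiableSet(new HashSet<>(loader.call()));
        } catch (Exception e) {
            // Catching here matters: an exception escaping a scheduleAtFixedRate
            // task suppresses every later execution.
            System.err.println("whitelist reload failed, keeping old entries: " + e);
        }
    }

    /** The check a filter(...) method would delegate to. */
    public boolean contains(String value) {
        return entries.contains(value);
    }

    @Override
    public void close() {
        if (scheduler != null) {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Local file stands in for the HDFS file in this sketch.
        Path file = Paths.get(args.length > 0 ? args[0] : "whitelist.txt");
        try (ReloadableWhitelist whitelist = new ReloadableWhitelist(() -> Files.readAllLines(file))) {
            whitelist.start(60);
            System.out.println("alice whitelisted: " + whitelist.contains("alice"));
        }
    }
}
```

Two details are deliberate. The set is swapped through a volatile reference, so the filter thread never sees a half-built collection. reload() also catches exceptions, so one failed read does not silently stop all future reloads.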

I have realized the error of my ways: the file that is reloaded is the 
cached copy in a temporary location on the node where this instance of the 
Filter class is running, not the file in HDFS itself (that copy was made when 
the Flink job started).

Can anyone suggest a solution to this? I think it is similar to the problem 
that the "Add Side Inputs in Flink" proposal [1] is trying to address, but that 
proposal is not finalized yet.
Can anyone see a problem with having a thread in the main body of my Flink 
program that reloads the HDFS file and re-registers the cache file as part of 
that reload, e.g.

env.registerCachedFile(properties.getProperty("whitelist.location"), WHITELIST);

That is, does this actually copy the file again from HDFS to the temporary 
location on each node? I think I would still need the existing scheduled reload 
inside my Filter function as well, because the process above would only push 
the HDFS file to the temporary location and would not refresh my List.

Any suggestions would be welcome.

Thanks
Conrad

[1] Add Side Inputs in Flink (proposal): 
https://docs.google.com/document/d/1hIgxi2Zchww_5fWUHLoYiXwSBXjv-M5eOv-MKQYN3m4/edit#heading=h.pqg5z6g0mjm7


