IGFS caches HDFS and, as with any cache, if the underlying store changes you 
can end up with a dirty read or an inconsistent view, or you end up having to 
poll the original source. The same challenges apply if you want to pre-cache 
new data added to the underlying store.

This has already been noted as a key issue for other tools, such as indexers 
and Oozie, and a solution called iNotify has already been implemented in HDFS 
under https://issues.apache.org/jira/browse/HDFS-6634 

The proposal here is that IGFS be extended to support updates from the 
underlying secondary file system. The intent is to first support the Hadoop 
file system via HDFS iNotify, so that IGFS can be kept up to date with changes 
to the underlying file system. A future idea is being able to configure IGFS to 
pre-cache new files in certain dirs, such as newly ingested data.
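
To make the idea a bit more concrete, here is a rough sketch of how change 
events could drive cache invalidation and pre-caching. Everything in it is 
hypothetical: the class IgfsInotifySketch, the FsEvent type and the Kind enum 
are stand-ins invented for illustration and are not Ignite or Hadoop APIs. The 
event kinds are only loosely modeled on the kinds of changes iNotify reports, 
and the "cache" is just a set of paths. In a real integration the events would 
come from the HDFS iNotify stream rather than being constructed by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch only: none of these types exist in Ignite or Hadoop.
class IgfsInotifySketch {
    // Change kinds, loosely modeled on what a notification stream might report.
    enum Kind { CREATE, CLOSE, APPEND, RENAME, UNLINK }

    // Minimal stand-in for a change event; dstPath is only used for RENAME.
    static final class FsEvent {
        final Kind kind;
        final String path;
        final String dstPath;

        FsEvent(Kind kind, String path, String dstPath) {
            this.kind = kind;
            this.path = path;
            this.dstPath = dstPath;
        }
    }

    // Paths currently held in the (pretend) cache.
    private final Set<String> cached = new TreeSet<>();
    // Directories whose new files should be loaded eagerly.
    private final List<String> preCacheDirs;

    IgfsInotifySketch(List<String> preCacheDirs) {
        this.preCacheDirs = new ArrayList<>(preCacheDirs);
    }

    void cache(String path) { cached.add(path); }

    boolean isCached(String path) { return cached.contains(path); }

    // True if path lies under one of the configured pre-cache directories.
    private boolean inPreCacheDir(String path) {
        for (String dir : preCacheDirs) {
            String prefix = dir.endsWith("/") ? dir : dir + "/";
            if (path.startsWith(prefix)) return true;
        }
        return false;
    }

    // Apply one change event. Only exact paths are handled here; a directory
    // rename or delete would also need to cover the entries beneath it.
    void onEvent(FsEvent e) {
        switch (e.kind) {
            case CREATE:
                // File may still be being written; wait for CLOSE.
                break;
            case CLOSE:
                // Content is final: drop any stale copy, reload if configured.
                cached.remove(e.path);
                if (inPreCacheDir(e.path)) cached.add(e.path);
                break;
            case APPEND:
            case UNLINK:
                // Cached copy is stale or the file is gone.
                cached.remove(e.path);
                break;
            case RENAME:
                cached.remove(e.path);
                cached.remove(e.dstPath);
                if (inPreCacheDir(e.dstPath)) cached.add(e.dstPath);
                break;
        }
    }

    public static void main(String[] args) {
        IgfsInotifySketch igfs = new IgfsInotifySketch(Arrays.asList("/ingest"));
        igfs.cache("/data/a");
        igfs.onEvent(new FsEvent(Kind.APPEND, "/data/a", null));
        igfs.onEvent(new FsEvent(Kind.CLOSE, "/ingest/new.csv", null));
        System.out.println("/data/a cached: " + igfs.isCached("/data/a"));
        System.out.println("/ingest/new.csv cached: " + igfs.isCached("/ingest/new.csv"));
    }
}
```

The sketch deliberately waits for the close event before pre-caching, on the 
assumption that a file which has only been created may still be partially 
written.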
