Because RemoveOrphanFilesAction uses Filesystem.list, the paths of files
found in the file system can include an authority based on the settings in
core-site.xml. That authority is attached when the files are listed, so it
does not necessarily match the authority in the paths stored in the metadata
tables: the URIs will have the same scheme and path but may have a different
authority. As a result, when Spark joins the two sets of paths by string
matching, the files found on the file system will not match the ones listed
in the metadata table, and the action will conclude that those files are no
longer required and delete them. In other words, every file that is listed
with a different authority gets removed.
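To illustrate the mismatch, here is a minimal pure-Python sketch of an
orphan check done as a string anti-join versus one that compares only scheme
and path. It is not Iceberg's or Spark's actual code; the function names and
the host names nn1/nn2 are hypothetical, chosen only to show the effect.

```python
from urllib.parse import urlsplit


def orphans_by_string(listed, referenced):
    """Files found on storage whose full URI string is not in the metadata.

    Models a string-matching anti-join: any difference in authority
    makes a live file look unreferenced.
    """
    referenced_set = set(referenced)
    return [p for p in listed if p not in referenced_set]


def scheme_and_path(uri):
    """Hypothetical normalization: keep scheme and path, drop the authority."""
    parts = urlsplit(uri)
    return (parts.scheme, parts.path)


def orphans_ignoring_authority(listed, referenced):
    """Same anti-join, but comparing only (scheme, path) pairs."""
    referenced_keys = {scheme_and_path(p) for p in referenced}
    return [p for p in listed if scheme_and_path(p) not in referenced_keys]


# Paths as returned by listing (authority from the current config).
listed = [
    "hdfs://nn2:8020/warehouse/db/t/data/a.parquet",
    "hdfs://nn2:8020/warehouse/db/t/data/b.parquet",
    "hdfs://nn2:8020/warehouse/db/t/data/stray.parquet",
]
# Paths as recorded in the metadata tables (authority at write time).
referenced = [
    "hdfs://nn1:8020/warehouse/db/t/data/a.parquet",
    "hdfs://nn1:8020/warehouse/db/t/data/b.parquet",
]

print(orphans_by_string(listed, referenced))           # all three files
print(orphans_ignoring_authority(listed, referenced))  # only stray.parquet
```

The string comparison flags the two live files as orphans along with the
genuinely stray one. Dropping the authority is shown only to make the
mismatch visible; it is not a safe general fix, since two different
file systems can share the same scheme and path.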

I believe this will only affect you if the authority changed between writing
the files and running RemoveOrphanFilesAction.
We are still investigating, but because of the potential for data loss I
thought it was important to share this with the dev list.

If your authority has not changed and will not change, there should be no
issues.

Thanks for your time,
Russ