Hi, I have a few tar files in a single folder in HDFS. Each tar file contains multiple files:
tar1:
- f1.txt
- f2.txt
tar2:
- f1.txt
- f2.txt
(Every tar file contains exactly the same number of files, with the same names.)
I am trying to find a way (Spark or Pig) to extract them so that each file
name gets its own folder, with one copy per tar file:
f1:
- tar1_f1.txt
- tar2_f1.txt
f2:
- tar1_f2.txt
- tar2_f2.txt
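
To make the mapping concrete, here is a rough Python sketch of the per-archive step I have in mind. It uses only the standard tarfile module. The function name regroup_members and the "folder/tarname_filename" output naming are my own assumptions for illustration, not something I have running on the cluster.

```python
import io
import posixpath
import tarfile


def regroup_members(tar_path, tar_bytes):
    """Yield (relative_output_path, content) for each regular file in one tar.

    Hypothetical helper: a member f1.txt inside tar1.tar maps to
    "f1/tar1_f1.txt".
    """
    # Archive name without directory or extension, e.g. "tar1"
    tar_name = posixpath.basename(tar_path).split(".")[0]
    # "r:*" lets tarfile detect plain or compressed archives
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links, etc.
            member_name = posixpath.basename(member.name)  # e.g. "f1.txt"
            folder = posixpath.splitext(member_name)[0]    # e.g. "f1"
            content = archive.extractfile(member).read()
            yield f"{folder}/{tar_name}_{member_name}", content
```

In PySpark I imagine feeding this from sc.binaryFiles, which gives (path, content) pairs, through flatMap. Writing each result back to HDFS under those names is the part I am not sure how to do.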
Any help?
--
Best Regards,
Ayan Guha
