The simplest way would be to call os.system with the HDFS rm command from the
driver, assuming the driver has HDFS connectivity (for example, it runs on a
gateway node). The executors have nothing to do with it.
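
Something along these lines might work as a rough sketch. It assumes the
`hdfs` CLI is on the driver's PATH, and it uses subprocess.run rather than
os.system so the path is passed without going through a shell. The helper
names and the example path are made up for illustration.

```python
import subprocess


def build_rm_command(path, recursive=False, skip_trash=False):
    """Build the argument list for `hdfs dfs -rm` (hypothetical helper)."""
    cmd = ["hdfs", "dfs", "-rm"]
    if recursive:
        cmd.append("-r")          # needed to remove directories
    if skip_trash:
        cmd.append("-skipTrash")  # delete immediately instead of moving to trash
    cmd.append(path)
    return cmd


def delete_hdfs_path(path, recursive=False, skip_trash=False):
    """Run the rm command on the driver; return True if it exited with 0."""
    # Passing a list (no shell) avoids quoting problems with unusual paths.
    result = subprocess.run(build_rm_command(path, recursive, skip_trash))
    return result.returncode == 0


# Example call from driver code (the path is hypothetical):
# delete_hdfs_path("/tmp/old_output", recursive=True)
```

This runs only in the driver process, so it should be called outside any
RDD transformation. If the `hdfs` binary is missing, subprocess.run raises
FileNotFoundError instead of returning a non-zero exit code.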
On 12 Jun 2015 08:57, "Siegfried Bilstein" wrote:
> I've seen plenty of examples for creating HDFS files from pyspark but
> haven't been able to figure out how to delete files from pyspark. Is there
> an API I am missing for filesystem management? Or should I be including the
> HDFS python modules?
> Thanks,
> Siegfried