I was getting the following error without it:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
No lease on /.gz.parquet (inode ): File does not exist. [Lease. Holder:
DFSClient_NONMAPREDUCE_, pendingcreates: 1]
I think that is due to a deadlock.
I am a bit curious: why is the synchronization on finalLock needed?
Thanks
> On Oct 23, 2015, at 8:25 AM, Anubhav Agarwal wrote:
>
> I have a Spark job that creates 6 million rows in RDDs. I convert the RDD
> into a DataFrame and write it to HDFS. Currently it takes 3 minutes to write
> it