relative path in DataFrameWriter and DataStreamWriter

Rozov, Vlad Thu, 09 Jan 2025 13:57:35 -0800

Hi,

I see a difference in how "path" is handled in DataFrameWriter.save(path) and 
DataStreamWriter.start(path) when a relative path (for example 
"test.parquet") is used to write parquet files (this possibly applies to other 
file formats as well). In the case of DataFrameWriter, the path is relative to 
the current working directory of the driver, which is what I would expect. In 
the case of DataStreamWriter, only _spark_metadata is written to the directory 
relative to the current working directory of the driver, while the parquet 
files are written to a directory relative to the executor's working directory. 
Is this a bug caused by the relative path being passed to the executor as is, 
or is the behavior by design? In the latter case, what is the rationale?
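
For reference, here is a minimal PySpark sketch of the two calls being 
compared. The helper names (absolute_uri, write_batch, write_stream) and the 
checkpoint_dir parameter are illustrative assumptions, not taken from an 
actual job, and the comments only restate the behavior observed above.

```python
from pathlib import Path


def absolute_uri(path: str) -> str:
    """Hypothetical helper: resolve `path` against the current working
    directory of the calling process (the driver, when called from driver
    code) and return it as a file: URI."""
    return Path(path).resolve().as_uri()


def write_batch(df, path: str) -> None:
    # DataFrameWriter.save(path): with a relative path, the parquet files
    # were observed to land relative to the driver's working directory.
    df.write.format("parquet").save(path)


def write_stream(stream_df, path: str, checkpoint_dir: str):
    # DataStreamWriter.start(path): with a relative path, _spark_metadata was
    # observed relative to the driver's working directory and the parquet
    # files relative to the executor's working directory.
    # The file sink needs a checkpoint location; checkpoint_dir is assumed.
    return (
        stream_df.writeStream.format("parquet")
        .option("checkpointLocation", checkpoint_dir)
        .start(path)
    )
```

Passing absolute_uri("test.parquet") instead of "test.parquet" to both 
functions should sidestep the difference in local tests, assuming the driver 
and the executors see the same file system.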


I do understand that using a relative path is not the best option, especially 
in distributed systems, but I think relative paths are still commonly used 
for testing and prototyping (and in examples).

Thank you,

Vlad
