from:"Richard Primera"

Spark Dataframe Writer _temporary directory

2018-01-28 Thread Richard Primera

In a situation where multiple workflows write different partitions of the same table. Example: 10 different processes are writing parquet or orc files for different partitions of the same table foo, at /staging/tables/foo/partition_field=1, /staging/tables/foo/partition_field=2, /staging/tables/fo

Partition Dataframe Using UDF On Partition Column

2017-12-27 Thread Richard Primera

Greetings. In version 1.6.0, is it possible to write a partitioned dataframe into parquet format using a UDF on the partition column? I'm using pyspark. Let's say I have a dataframe with column `date`, of type string or int, which contains values such as `20170825`. Is it possible to def