Re: DataStreamReader cleanSource option

2022-02-03 Thread Jungtaek Lim
Hi, could you please set the config "spark.sql.streaming.fileSource.cleaner.numThreads" to 0 and see whether it works? (NOTE: this will slow down your process since the cleaning phase will happen in the foreground. The default is background with 1 thread. You can try out more than 1 thread.) If it does …
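A minimal sketch of what that suggestion looks like in PySpark (only the config key and the value 0 come from the message; the app name is a placeholder):

# Sketch: set the file-source cleaner thread count to 0 so that the cleanup of
# processed source files runs in the foreground of each micro-batch instead of
# on a background thread pool (app name below is illustrative).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("clean-source-foreground")
    .config("spark.sql.streaming.fileSource.cleaner.numThreads", "0")
    .getOrCreate()
)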

Re: DataStreamReader cleanSource option

2022-01-27 Thread Mich Talebzadeh
Hi Gabriela, I don't know about the data lake, but this is about Spark Structured Streaming. Are both readStream and writeStream working OK? For example, can you do df.printSchema() after the read? It is advisable to wrap the logic inside try. This is an example of wrapping it: data_path = "file:// …
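The example above is cut off; a minimal sketch of the kind of try wrapping and printSchema sanity check Mich describes, with the path, format, and schema as placeholders rather than his actual code:

# Sketch: build the streaming DataFrame inside try/except and print its schema
# to confirm the read side is wired up before adding a writeStream.
import sys
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read-stream-check").getOrCreate()

data_path = "file:///tmp/landing"          # placeholder source directory

try:
    df = (
        spark.readStream
        .format("csv")
        .option("header", "true")
        .schema("id INT, value STRING")    # streaming file sources need an explicit schema
        .load(data_path)
    )
    df.printSchema()                       # sanity check after the read
except Exception as e:
    print(f"Failed to create streaming DataFrame: {e}", file=sys.stderr)
    sys.exit(1)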

DataStreamReader cleanSource option

2022-01-27 Thread Gabriela Dvořáková
Hi, I am writing to ask for advice regarding the cleanSource option of the DataStreamReader. I am using PySpark with Spark 3.1 via Azure Synapse. To my knowledge, the cleanSource option was introduced in Spark version 3. I've spent a significant amount of time trying to configure this option with both …
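For context, a hedged sketch of how cleanSource is typically wired up on a file-based streaming source in PySpark (option names per the Spark Structured Streaming documentation; the format, schema, and paths are placeholders, not Gabriela's configuration):

# Sketch: cleanSource accepts "archive", "delete", or "off" (the default);
# "archive" additionally requires sourceArchiveDir to say where processed
# source files should be moved.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("clean-source-example").getOrCreate()

df = (
    spark.readStream
    .format("json")
    .schema("id INT, payload STRING")
    .option("cleanSource", "archive")                    # or "delete" / "off"
    .option("sourceArchiveDir", "file:///tmp/archived")  # required when cleanSource=archive
    .load("file:///tmp/landing")
)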