Re: DataStreamReader cleanSource option

2022-02-03 Thread Jungtaek Lim
Hi, could you please set the config "spark.sql.streaming.fileSource.cleaner.numThreads" to 0 and see whether it works? (NOTE: this will slow down your process, since the cleaning phase will then happen in the foreground. The default is background cleaning with 1 thread; you can also try more than 1 thread.) If it does …
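A minimal sketch of that suggestion, assuming a local file source; the paths, format, and schema below are placeholders, not taken from the thread. It sets the cleaner thread count to 0 (foreground cleaning) and enables cleanSource on the file stream:

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StringType

    # numThreads = 0 moves source-file cleanup into the foreground of each micro-batch
    spark = (
        SparkSession.builder
        .appName("cleanSourceTest")
        .config("spark.sql.streaming.fileSource.cleaner.numThreads", "0")
        .getOrCreate()
    )

    schema = StructType().add("value", StringType())

    df = (
        spark.readStream
        .format("csv")
        .schema(schema)
        .option("cleanSource", "archive")                    # or "delete"
        .option("sourceArchiveDir", "/tmp/archived-input")   # required when cleanSource is "archive"
        .load("/tmp/streaming-input")                        # placeholder input directory
    )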

Re: DataStreamReader cleanSource option

2022-01-27 Thread Mich Talebzadeh
Hi Gabriela, I don't know about the data lake side, but this is about Spark Structured Streaming. Are both readStream and writeStream working OK? For example, can you do df.printSchema() after the read? It is advisable to wrap the logic inside a try/except block. This is an example of wrapping it: data_path = "file://
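A small sketch along those lines, assuming a local CSV source; the paths, format, and schema are illustrative placeholders, not taken from the thread. It reads the stream, prints the schema to confirm the read side is wired up, and wraps the whole thing in try/except:

    import sys
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StringType

    spark = SparkSession.builder.appName("readStreamCheck").getOrCreate()
    schema = StructType().add("value", StringType())

    try:
        data_path = "file:///tmp/streaming-input"   # placeholder path
        df = (
            spark.readStream
            .format("csv")
            .schema(schema)
            .load(data_path)
        )
        df.printSchema()   # verify the readStream side before starting the query

        query = (
            df.writeStream
            .format("console")
            .option("checkpointLocation", "file:///tmp/checkpoint")  # placeholder
            .start()
        )
        query.awaitTermination()
    except Exception as e:
        print(f"Streaming job failed: {e}", file=sys.stderr)
        sys.exit(1)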