Hi all, I am new to pyspark streaming and I was following a tutorial I found on the internet (https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/network_wordcount.py), but I replaced the data input with an S3 directory path:
lines = ssc.textFileStream("s3n://bucket/first/second/third1/")

When I run the code and upload a file to s3n://bucket/first/second/third1/ (for example s3n://bucket/first/second/third1/test1.txt), the file gets processed as expected.

Now I want the stream to listen to multiple directories and process files when they get uploaded to any of them, for example:

[s3n://bucket/first/second/third1/, s3n://bucket/first/second/third2/, s3n://bucket/first/second/third3/]

I tried a glob pattern similar to what sc.textFile accepts:

lines = ssc.textFileStream("s3n://bucket/first/second/*/")

but this didn't work. Can someone please explain how I could achieve my objective?

Thanks in advance!

in4maniac

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/listening-to-recursive-folder-structures-in-s3-using-pyspark-streaming-textFileStream-tp26247.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.