Hi all, 

I am new to PySpark Streaming, and I was following a tutorial I found on the
internet
(https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/network_wordcount.py).
But I replaced the data input with an S3 directory path:

lines = ssc.textFileStream("s3n://bucket/first/second/third1/")

When I run the code and upload a file to s3n://bucket/first/second/third1/
(such as s3n://bucket/first/second/third1/test1.txt), the file gets
processed as expected. 

Now I want it to listen to multiple directories and process files when they
are uploaded to any of them, for example:
[s3n://bucket/first/second/third1/, s3n://bucket/first/second/third2/ and
s3n://bucket/first/second/third3/]

I tried to use a glob pattern, similar to what sc.textFile accepts:

lines = ssc.textFileStream("s3n://bucket/first/second/*/")

But this didn't work. Can someone please explain how I can achieve this?

Thanks in advance!

in4maniac




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/listening-to-recursive-folder-structures-in-s3-using-pyspark-streaming-textFileStream-tp26247.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
