Hello, 

I have a question regarding a use case.
I have an ETL pipeline using Spark, and it works great.
I use CephFS mounted on all Spark nodes to store the data.
However, one problem I have is that bzip2-compressing the data and
transferring it from the source to the Spark storage takes really long.
I would like to be able to process the file as it's written, in chunks
of 100 MB.
Is something like that possible in plain Spark, or do I need to use
Spark Streaming? And if I use Spark Streaming, would that mean my
application has to run as a daemon on the Spark nodes?

Thank you for your help and ideas. 
Antoine 
