Dear Spark Community,

Is it possible to convert text files (.log or .txt files) into
SequenceFiles in Python?

Using PySpark I can create a parallelized RDD with
rdd = sc.parallelize([('key1', 1.0)]) and save it as a SequenceFile with
rdd.saveAsSequenceFile(path). But how can I put the whole content of my
text files into the value for 'key1'?
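
For reference, this is the minimal working example I have so far (the
output path is just a placeholder):

    from pyspark import SparkContext

    sc = SparkContext(appName="SequenceFileTest")

    # A single (key, value) pair, saved as a SequenceFile
    rdd = sc.parallelize([('key1', 1.0)])
    rdd.saveAsSequenceFile('/tmp/seqfile_out')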

I want a SequenceFile where the keys are the filenames of the text files
and the values are their contents.
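
To make the goal concrete, here is a sketch of what I am after (both
paths are placeholders). I came across sc.wholeTextFiles, which is
documented to return (filename, content) pairs, but I don't know whether
it is the right approach here:

    from pyspark import SparkContext

    sc = SparkContext(appName="TextToSequenceFile")

    # wholeTextFiles returns an RDD of (filename, file_content) pairs,
    # which is the key/value shape I am after
    pairs = sc.wholeTextFiles('/path/to/textfiles')

    # Save the pairs as a SequenceFile; PySpark converts the Python
    # strings to Hadoop Text writables
    pairs.saveAsSequenceFile('/path/to/output_seqfile')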

Thank you for any help!
Csaba
