Dear Spark community,

Is it possible to convert text files (.log or .txt) into SequenceFiles in Python?
Using PySpark I can create an RDD with rdd = sc.parallelize([('key1', 1.0)]) and save it as a SequenceFile with rdd.saveAsSequenceFile(path). But how can I put the whole content of a text file into the value for 'key1'? What I want is a SequenceFile where each key is the filename of a text file and each value is that file's content.

Thanks for any help!

Csaba