Dear Spark community,

Is it possible to convert text files (.log or .txt) into SequenceFiles in Python?
Using PySpark I can create an RDD with rdd = sc.parallelize([('key1', 1.0)]) and save it as a SequenceFile with rdd.saveAsSequenceFile(path). But how can I put the whole content of a text file into the value for 'key1'? What I want is a SequenceFile where each key is the filename of a text file and each value is that file's content.

Thanks for any help!

Csaba