Re: Processing .wav files in PySpark

2015-01-16 Thread Davies Liu
I don't think you can use textFile(), binaryFile(), or pickleFile() here; they expect a different format than WAV. You could get a list of paths for all the files, pass it to sc.parallelize(), and then call foreach():

    def process(path):
        # use subprocess to launch a process to do the job, read the stdout as result fil
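A rough sketch of that idea, in case it helps. Several things here are assumptions rather than part of the suggestion above: that the `hdfs` and `sox` command-line tools are installed on every worker, that streaming the file with `hdfs dfs -cat` into sox's stdin is acceptable, and the names `run_pipeline`, `process`, and `main`, which are made up for illustration. It uses map() plus collect() instead of foreach(), since foreach() does not return anything to the driver. The sox `stats` effect writes its report to stderr, so stderr is merged into the captured output.

```python
import subprocess


def run_pipeline(producer_cmd, consumer_cmd):
    """Pipe producer's stdout into consumer's stdin and return the
    consumer's combined stdout/stderr as text."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        consumer_cmd,
        stdin=producer.stdout,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so the stats report is captured
    )
    # Close our copy so the producer sees a broken pipe if the consumer exits early.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output.decode("utf-8", errors="replace")


def process(path):
    # Assumption: `hdfs` and `sox` are on the PATH of every worker node.
    text = run_pipeline(
        ["hdfs", "dfs", "-cat", path],           # stream the file out of HDFS
        ["sox", "-t", "wav", "-", "-n", "stats"],  # read WAV from stdin
    )
    return path, text


def main(paths):
    # Imported here so the helpers above can be tried without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext(appName="wav-stats")
    # One partition per file, so each task handles a single path.
    rdd = sc.parallelize(paths, max(1, len(paths)))
    for path, text in rdd.map(process).collect():
        print(path)
        print(text)
    sc.stop()
```

You would call main() with the list of HDFS paths, built however suits you. Only the path strings go through Spark; the audio bytes are read by the external process on each worker.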

Processing .wav files in PySpark

2015-01-16 Thread Venkat, Ankam
I need to process .wav files in PySpark. If the files are on the local file system, I am able to process them. Once I store them on HDFS, I run into issues. For example, I run the sox program on a wav file like this:

    sox ext2187854_03_27_2014.wav -n stats    <-- works fine
    sox hdfs://xxx:8020/