Re: Processing audio/video/images

2014-06-19 Thread jamal sasha
So.. Here is my experimental code to get a feel of it:

    def read_file(filename):
        with open(filename) as f:
            lines = [line for line in f]
        return lines

    files = ["/somepath/.../test1.txt", "sompath/.../test2.txt"]

test1.txt has: foo bar this is test1
test2.txt has: bar foo this is text2
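For anyone following along, here is a rough way to try read_file locally before involving Spark. It is only a sketch: the temporary directory, the demo helper, and the assumption that each sample file holds two lines are mine, not from the message above.

```python
import os
import tempfile


def read_file(filename):
    """Return the lines of a text file, trailing newlines included."""
    with open(filename) as f:
        return [line for line in f]


def demo():
    """Write the two sample files to a temp dir and read them back."""
    # Assumed contents: the message does not show where the line breaks fall.
    samples = {
        "test1.txt": "foo bar\nthis is test1\n",
        "test2.txt": "bar foo\nthis is text2\n",
    }
    with tempfile.TemporaryDirectory() as tmp:
        files = []
        for name, text in samples.items():
            path = os.path.join(tmp, name)
            with open(path, "w") as f:
                f.write(text)
            files.append(path)
        # One call per path, the same call shape a distributed map would use.
        return [read_file(p) for p in files]


if __name__ == "__main__":
    print(demo())
```

If the function behaves as expected on a plain list of paths, the same function can then be handed to Spark unchanged.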

Re: Processing audio/video/images

2014-06-02 Thread jamal sasha
Phoofff.. (Mind blown)... Thank you sir. This is awesome

On Mon, Jun 2, 2014 at 5:23 PM, Marcelo Vanzin wrote:
> The idea is simple. If you want to run something on a collection of
> files, do (in pseudo-python):
>
> def processSingleFile(path):
>     # Your code to process a file
>
> files = [ "

Re: Processing audio/video/images

2014-06-02 Thread Marcelo Vanzin
The idea is simple. If you want to run something on a collection of files, do (in pseudo-python):

    def processSingleFile(path):
        # Your code to process a file

    files = [ "file1", "file2" ]
    sc.parallelize(files).foreach(processSingleFile)

On Mon, Jun 2, 2014 at 5:16 PM, jamal sasha wrote:
> Hi M
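The pseudo-python above could be fleshed out roughly as follows. This is a sketch under assumptions, not Marcelo's actual code: process_files and its num_slices parameter are hypothetical names, sc is assumed to be an existing pyspark SparkContext, and every path is assumed to be readable from every worker (for example on a shared filesystem). It also swaps foreach for map plus collect so the per-file results come back to the driver; foreach would run the function only for its side effects.

```python
def process_single_file(path):
    """Placeholder per-file work: return the path and its size in bytes."""
    with open(path, "rb") as f:
        return path, len(f.read())


def process_files(sc, paths, num_slices=None):
    """Distribute the list of paths and run process_single_file on each one.

    sc is assumed to be an existing pyspark.SparkContext (not created here).
    """
    # Each element of the RDD is just a path string, not file contents.
    rdd = sc.parallelize(paths, num_slices)
    # map + collect returns one result per file to the driver.
    return rdd.map(process_single_file).collect()
```

With PySpark installed, something like `sc = SparkContext("local[2]", "files")` followed by `process_files(sc, files)` should exercise it on a single machine.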

Re: Processing audio/video/images

2014-06-02 Thread jamal sasha
Hi Marcelo,

Thanks for the response. I am not sure I understand. Can you elaborate a bit? So, for example, let's take a look at this example: http://pythonvision.org/basic-tutorial

    import mahotas
    from scipy import ndimage  # needed for gaussian_filter below

    dna = mahotas.imread('dna.jpeg')
    dnaf = ndimage.gaussian_filter(dna, 8)

But except dna.jpeg Let's say

Re: Processing audio/video/images

2014-06-02 Thread jamal sasha
Thanks. Let me go thru it.

On Mon, Jun 2, 2014 at 5:15 PM, Philip Ogren wrote:
> I asked a question related to Marcelo's answer a few months ago. The
> discussion there may be useful:
>
> http://apache-spark-user-list.1001560.n3.nabble.com/RDD-URI-td1054.html
>
> On 06/02/2014 06:09 PM, Mar

Re: Processing audio/video/images

2014-06-02 Thread Philip Ogren
I asked a question related to Marcelo's answer a few months ago. The discussion there may be useful:

http://apache-spark-user-list.1001560.n3.nabble.com/RDD-URI-td1054.html

On 06/02/2014 06:09 PM, Marcelo Vanzin wrote:
Hi Jamal, If what you want is to process lots of files in parallel, the b

Re: Processing audio/video/images

2014-06-02 Thread Marcelo Vanzin
Hi Jamal, If what you want is to process lots of files in parallel, the best approach is probably to load all file names into an array and parallelize that. Then each task will take a path as input and can process it however it wants. Or you could write the file list to a file, and then use sc.te
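As a rough illustration of "each task will take a path as input", the function a task runs on a non-text file might look like the sketch below. The name summarize_file and the choice of returning a size and SHA-256 digest are hypothetical stand-ins for real audio or image decoding; the only point is that the file is opened in binary mode inside the task.

```python
import hashlib


def summarize_file(path):
    """Read one file as raw bytes and return (path, size_in_bytes, sha256_hex).

    Hypothetical stand-in for real work such as decoding an mp3 or jpeg.
    """
    # Binary mode: audio and image data must not be decoded as text.
    with open(path, "rb") as f:
        data = f.read()
    return path, len(data), hashlib.sha256(data).hexdigest()
```

Because the function only needs a path string, it can be tested on a single local file first and then applied to the parallelized list of file names.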

Processing audio/video/images

2014-06-02 Thread jamal sasha
Hi,

How does one process data sources other than text? Let's say I have millions of mp3 (or jpeg) files and I want to use Spark to process them. How does one go about it? I have never been able to figure this out. Let's say I have this library in Python which works like the following: import audi