Thanks. I will try it!
On Mon, Mar 3, 2014 at 1:19 PM, Alonso Isidoro Roman <alons...@gmail.com> wrote:

> Hi, I am a beginner too, but as I have learned, Hadoop works better with
> big files, at least 64 MB, 128 MB, or even more. I think you need to
> aggregate all the files into one new big file. Then you must copy it to
> HDFS using this command:
>
> hadoop fs -put MYFILE /YOUR_ROUTE_ON_HDFS/MYFILE
>
> This simply copies MYFILE into the Hadoop Distributed File System.
>
> May I recommend what I have done? Go to BigDataUniversity.com and take
> the Hadoop Fundamentals I course. It is free and very well documented.
>
> Regards
>
> Alonso Isidoro Roman.
>
> My favorite quotes (today):
> "If debugging is the process of removing software bugs, then programming
> must be the process of putting them in..."
> - Edsger Dijkstra
>
> "If you pay peanuts you get monkeys"
>
>
> 2014-03-03 12:10 GMT+01:00 goi cto <goi....@gmail.com>:
>
>> Hi,
>>
>> I am sorry for the beginner's question, but...
>> I have Spark Java code which reads a file (c:\my-input.csv), processes
>> it, and writes an output file (my-output.csv).
>> Now I want to run it on Hadoop in a distributed environment.
>> 1) Should my input be one big file or separate smaller files?
>> 2) If we use smaller files, how does my code need to change to process
>> all of the input files?
>>
>> Will Hadoop just copy the files to different servers, or will it also
>> split their content among servers?
>>
>> Any example will be great!
>> --
>> Eran | CTO

--
Eran | CTO
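
To make the aggregate-then-upload step Alonso describes concrete, here is a
minimal command-line sketch. The file names and the /user/eran/input HDFS
directory are hypothetical placeholders:

    # merge the local CSV pieces into one big file (hypothetical names)
    cat part-0001.csv part-0002.csv > my-input.csv

    # create the target directory and copy the merged file into HDFS
    hadoop fs -mkdir /user/eran/input
    hadoop fs -put my-input.csv /user/eran/input/my-input.csv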
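
On question 2: Spark's textFile accepts a directory or a glob as well as a
single file, so reading many smaller files does not require enumerating them
in code. A minimal sketch using Spark's Java API, with hypothetical HDFS
paths and a placeholder transformation standing in for the real processing:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;

    public class MyJob {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("MyJob");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // textFile takes a single file, a directory, or a glob, so
            // many small input files can be read with one call
            JavaRDD<String> lines = sc.textFile("hdfs:///user/eran/input/*.csv");

            // placeholder: replace with the real per-line CSV processing
            JavaRDD<String> processed = lines.map(new Function<String, String>() {
                public String call(String line) {
                    return line;
                }
            });

            // writes a directory of part-NNNNN files, one per partition,
            // rather than a single my-output.csv
            processed.saveAsTextFile("hdfs:///user/eran/my-output");
            sc.stop();
        }
    }

As for the last question: HDFS does split large files into blocks (the
64 MB or 128 MB Alonso mentioned) and spreads those blocks across the
cluster's machines, which is what lets the workers read one big file in
parallel.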