The simplest way would be to merge the output files at the end of your job, like:
hadoop fs -getmerge /output/dir/on/hdfs/ /desired/local/output/file.txt

If you want to do it programmatically, then you can use the FileUtil.copyMerge API, like:

FileUtil.copyMerge(srcFs, new Path("/output-location"), dstFs, new Path("/merged-output"), true, conf, null);

where srcFs and dstFs are the source (HDFS) and destination FileSystems, and true tells it to delete the original directory after merging.

Thanks
Best Regards

On Sat, Feb 14, 2015 at 2:18 AM, Su She <suhsheka...@gmail.com> wrote:

> Thanks Akhil for the suggestion, it is now only giving me one part-xxxx.
> Is there any way I can just create a file rather than a directory? It
> doesn't seem like there is a saveAsTextFile option for
> JavaPairReceiverDStream.
>
> Also, for the copy/merge API, how would I add that to my Spark job?
>
> Thanks Akhil!
>
> Best,
>
> Su
>
> On Thu, Feb 12, 2015 at 11:51 PM, Akhil Das <ak...@sigmoidanalytics.com>
> wrote:
>
>> For a streaming application, it will create a new directory for every
>> batch and put the data in it. If you don't want multiple part-xxxx files
>> inside each directory, then you can do a repartition before the saveAs*
>> call:
>>
>> messages.repartition(1).saveAsHadoopFiles("hdfs://user/ec2-user/", "csv",
>> String.class, String.class, (Class) TextOutputFormat.class);
>>
>> Thanks
>> Best Regards
>>
>> On Fri, Feb 13, 2015 at 11:59 AM, Su She <suhsheka...@gmail.com> wrote:
>>
>>> Hello Everyone,
>>>
>>> I am writing simple word counts to HDFS using
>>> messages.saveAsHadoopFiles("hdfs://user/ec2-user/", "csv", String.class,
>>> String.class, (Class) TextOutputFormat.class);
>>>
>>> 1) However, every 2 seconds I am getting a new *directory* that is
>>> titled as a csv. So I'll have test.csv, which will be a directory that
>>> has two files inside of it called part-00000 and part-00001 (something
>>> like that). This obviously makes it very hard for me to read the data
>>> stored in the csv files. I am wondering if there is a better way to
>>> store the JavaPairReceiverDStream and JavaPairDStream?
>>>
>>> 2) I know there is a copy/merge Hadoop API for merging files... can
>>> this be done inside Java? I am not sure of the logic behind this API
>>> if I am using Spark Streaming, which is constantly making new files.
>>>
>>> Thanks a lot for the help!
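
For completeness, a minimal self-contained sketch of the copyMerge approach (this assumes Hadoop 2.x, where FileUtil.copyMerge still exists; the paths and class name below are hypothetical placeholders, not from the thread):

    // MergeStreamingOutput.java -- merge all part-xxxx files under an
    // HDFS output directory into a single file after the job finishes.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class MergeStreamingOutput {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // copyMerge(srcFS, srcDir, dstFS, dstFile, deleteSource, conf, addString):
            // concatenates every file under srcDir into dstFile. deleteSource=true
            // removes srcDir after a successful merge; addString (null here) is an
            // optional string appended after each merged file.
            FileUtil.copyMerge(fs, new Path("/output-location"),
                               fs, new Path("/merged-output/merged.csv"),
                               true, conf, null);
        }
    }

Since this is plain Hadoop FileSystem code, you can also call it from inside your Spark driver once the StreamingContext has been stopped, which answers how to add it to the job itself.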