It seems like a good idea to collect these questions. Stack Overflow is also a good place for "useful tricks".
On Fri, Jun 26, 2015 at 12:25 PM, Michele Bertoni <michele1.bert...@mail.polimi.it> wrote:
> Got it!
> I will try, thanks! :)
>
> What about writing a section on this in the programming guide?
> I found a couple of topics about the readers on the mailing list, so it
> seems it would be helpful.
>
> On 26 Jun 2015, at 12:21, Stephan Ewen <se...@apache.org> wrote:
>
> Sure, just override the "createInputSplits()" method. For each of your
> file paths, call "super.createInputSplits()", then combine the results
> into one array and return it.
>
> That should do it...
>
> On Fri, Jun 26, 2015 at 12:19 PM, Michele Bertoni <michele1.bert...@mail.polimi.it> wrote:
>
>> Hi Stephan, thanks for answering.
>> Right now I am using an extension of the DelimitedInputFormat; is there a
>> way to combine it with option 2?
>>
>> On 26 Jun 2015, at 12:17, Stephan Ewen <se...@apache.org> wrote:
>>
>> There are two ways you can realize that:
>>
>> 1) Create multiple sources and union them. This is easy, but probably a
>> bit less efficient.
>>
>> 2) Override the FileInputFormat's createInputSplits method to take the
>> union of the paths and create a list of all files and file splits that
>> will be read.
>>
>> Stephan
>>
>> On Fri, Jun 26, 2015 at 12:12 PM, Michele Bertoni <michele1.bert...@mail.polimi.it> wrote:
>>
>>> Hi everybody,
>>> is there a way to specify a list of URIs ("hdfs://file1", "hdfs://file2", …)
>>> and open them as separate files?
>>> I know I could open the entire directory, but I want to be able to select
>>> a subset of the files in the directory.
>>>
>>> Thanks
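A minimal sketch of option 2 as Stephan describes it: override createInputSplits(), call the parent implementation once per selected path, and concatenate the results into a single array. To keep it runnable without Flink on the classpath, SinglePathFormat, MultiPathFormat and ByteRangeSplit are hypothetical stand-ins, and file sizes come from an in-memory map instead of HDFS. In real code the parent would presumably be your DelimitedInputFormat subclass and the array would hold Flink's FileInputSplit objects; treat this as an illustration of the pattern, not as Flink's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a file split: one contiguous byte range of one file.
final class ByteRangeSplit {
    final String path;
    final long start;
    final long length;

    ByteRangeSplit(String path, long start, long length) {
        this.path = path;
        this.start = start;
        this.length = length;
    }

    @Override
    public String toString() {
        return path + " [" + start + ", " + (start + length) + ")";
    }
}

// Hypothetical single-path format. File sizes are looked up in an in-memory
// map (an assumption) instead of a real file system.
class SinglePathFormat {
    protected final Map<String, Long> fileSizes;
    protected String filePath;

    SinglePathFormat(Map<String, Long> fileSizes) {
        this.fileSizes = fileSizes;
    }

    void setFilePath(String filePath) {
        this.filePath = filePath;
    }

    // Cuts the current file into minNumSplits contiguous byte ranges
    // (fewer if the file has fewer bytes than that, but always at least one).
    ByteRangeSplit[] createInputSplits(int minNumSplits) {
        long size = fileSizes.get(filePath);
        int n = (int) Math.max(1, Math.min(minNumSplits, size));
        long chunk = size / n;
        List<ByteRangeSplit> splits = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            long start = i * chunk;
            // The last split absorbs the remainder so every byte is covered.
            long length = (i == n - 1) ? size - start : chunk;
            splits.add(new ByteRangeSplit(filePath, start, length));
        }
        return splits.toArray(new ByteRangeSplit[0]);
    }
}

// The pattern from the thread: override createInputSplits(), call the parent
// implementation once per selected path, and return the concatenation.
class MultiPathFormat extends SinglePathFormat {
    private final List<String> paths;

    MultiPathFormat(Map<String, Long> fileSizes, List<String> paths) {
        super(fileSizes);
        this.paths = new ArrayList<>(paths);
    }

    @Override
    ByteRangeSplit[] createInputSplits(int minNumSplits) {
        List<ByteRangeSplit> all = new ArrayList<>();
        for (String path : paths) {
            setFilePath(path); // point the parent logic at one file at a time
            all.addAll(Arrays.asList(super.createInputSplits(minNumSplits)));
        }
        return all.toArray(new ByteRangeSplit[0]);
    }
}
```

For example, with sizes {hdfs://file1: 10, hdfs://file2: 7, hdfs://file3: 100} and paths [hdfs://file1, hdfs://file2], createInputSplits(2) returns four splits, two per selected file, and none for hdfs://file3. Each per-path call receives the full minNumSplits, so the combined array can contain more splits than requested; since the parameter is a minimum, that should be acceptable.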