If you want to work without the placeholder, simply do: "env.createInput(new myDelimitedInputFormat(parser)(paths))"
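
For readers who want to try the pattern in isolation, here is a minimal sketch with no Flink dependency. It only models the idea discussed in this thread: the subclass keeps the list of paths itself and builds the splits by calling the parent's createInputSplits once per path, so no placeholder path is needed. Split, SingleFileFormat, MultiPathFormat and the file-size map are hypothetical stand-ins invented for illustration; they are not Flink classes, and the split arithmetic is a simplification, not Flink's actual logic.

```scala
// Hypothetical stand-in for a file split: (path, start offset, length).
final case class Split(path: String, start: Long, length: Long)

// Hypothetical stand-in for a single-path file input format.
// fileSizes fakes a file system: path -> size in bytes.
class SingleFileFormat(fileSizes: Map[String, Long]) {
  protected var filePath: String = ""

  def setFilePath(path: String): Unit = filePath = path

  // Simplified split logic: cut the current file into contiguous ranges.
  def createInputSplits(minNumSplits: Int): Array[Split] = {
    val size = fileSizes(filePath)
    val n = math.max(1, minNumSplits)
    val splitSize = math.max(1L, (size + n - 1) / n) // ceiling division
    (0L until size by splitSize)
      .map(start => Split(filePath, start, math.min(splitSize, size - start)))
      .toArray
  }
}

// The subclass stores the list of paths and combines the per-path splits.
class MultiPathFormat(fileSizes: Map[String, Long], paths: Seq[String])
    extends SingleFileFormat(fileSizes) {

  override def createInputSplits(minNumSplits: Int): Array[Split] =
    paths.flatMap { p =>
      setFilePath(p)                              // point the parent at one path
      super.createInputSplits(minNumSplits).toSeq // reuse the parent's logic
    }.toArray
}

object MultiPathDemo {
  def main(args: Array[String]): Unit = {
    val sizes = Map("hdfs://file1" -> 10L, "hdfs://file2" -> 4L)
    val format = new MultiPathFormat(sizes, Seq("hdfs://file1", "hdfs://file2"))
    format.createInputSplits(2).foreach(println)
  }
}
```

Note that in this model the inherited filePath ends up pointing at the last path in the list. That is harmless here because every split carries its own path, which is also why the placeholder passed to readFile appeared to be unused.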
The "createInputSplits()" method looks good.

Greetings,
Stephan

On Tue, Jul 14, 2015 at 11:42 PM, Michele Bertoni <michele1.bert...@mail.polimi.it> wrote:

> Ok thank you, now I solved it!
>
> The problem was in the env.readFile(myInputFormat, path) call:
> now that the path is actually a list of paths, what should I pass to it?
>
> I solved it this way:
>
> env.readFile(new myDelimitedInputFormat(parser)(paths), paths.head)
>
> where paths.head gives readFile a URL that is just a "placeholder" and
> seems never to be used, and the custom input format takes care of
> creating the splits out of the list of directories.
>
> I tried it and it works. Is this the correct way to do it? :)
>
> FYI, createInputSplits is implemented this way:
>
> override def createInputSplits(minNumSplits: Int) = {
>   // build the splits of each path in turn and combine them into one array
>   paths.flatMap((f) => {
>     super.setFilePath(f)
>     super.createInputSplits(minNumSplits)
>   }).toArray
> }
>
> where paths is a parameter of the input format constructor (as is
> the custom parser, as shown above).
>
> Do you think it would be useful if I opened a Stack Overflow post about it
> (maybe with the custom parser too)?
>
> Cheers,
> Michele
>
> On 14 Jul 2015, at 18:50, Stephan Ewen <se...@apache.org> wrote:
>
> For the approach that I outlined, you need to subclass the file input
> format.
>
> In that subclass, you store the list of URIs (in a new variable) and
> override the "createInputSplits()" method.
>
> Stephan
>
> On Tue, Jul 14, 2015 at 6:42 PM, Michele Bertoni <michele1.bert...@mail.polimi.it> wrote:
>
>> Hi Stephan, I started working on this today, but I am having a problem.
>>
>> Can you be a little more detailed about the procedure?
>> Actually, I don't understand how to give the input format the list of
>> URIs, since it will try to put it in a Path variable.
>>
>> createInputSplits does not receive the path but takes it from that
>> variable.
>>
>> Thanks,
>> Michele
>>
>> On 26 Jun 2015, at 12:28, Michele Bertoni <michele1.bert...@mail.polimi.it> wrote:
>>
>> Right!
>> Later I will post the question, quoting your answer with the solution :)
>>
>> On 26 Jun 2015, at 12:27, Stephan Ewen <se...@apache.org> wrote:
>>
>> Seems like a good idea to collect these questions.
>>
>> Stack Overflow is also a good place for "useful tricks"...
>>
>> On Fri, Jun 26, 2015 at 12:25 PM, Michele Bertoni <michele1.bert...@mail.polimi.it> wrote:
>>
>>> Got it!
>>> I will try, thanks! :)
>>>
>>> What about writing a section on it in the programming guide?
>>> I found a couple of topics about the readers on the mailing list; it
>>> seems it may be helpful.
>>>
>>> On 26 Jun 2015, at 12:21, Stephan Ewen <se...@apache.org> wrote:
>>>
>>> Sure, just override the "createInputSplits()" method. For each of
>>> your file paths, call "super.createInputSplits()" and then combine the
>>> results into one array that you return.
>>>
>>> That should do it...
>>>
>>> On Fri, Jun 26, 2015 at 12:19 PM, Michele Bertoni <michele1.bert...@mail.polimi.it> wrote:
>>>
>>>> Hi Stephan, thanks for answering.
>>>> Right now I am using an extension of the DelimitedInputFormat; is there
>>>> a way to merge it with option 2?
>>>>
>>>> On 26 Jun 2015, at 12:17, Stephan Ewen <se...@apache.org> wrote:
>>>>
>>>> There are two ways you can realize that:
>>>>
>>>> 1) Create multiple sources and union them. This is easy, but probably
>>>> a bit less efficient.
>>>>
>>>> 2) Override the FileInputFormat's createInputSplits method to take a
>>>> union of the paths and create a list of all files and file splits that
>>>> will be read.
>>>>
>>>> Stephan
>>>>
>>>> On Fri, Jun 26, 2015 at 12:12 PM, Michele Bertoni <michele1.bert...@mail.polimi.it> wrote:
>>>>
>>>>> Hi everybody,
>>>>> is there a way to specify a list of URIs ("hdfs://file1", "hdfs://file2", …)
>>>>> and open them as different files?
>>>>> I know I may open the entire directory, but I want to be able to
>>>>> select a subset of files in the directory.
>>>>>
>>>>> Thanks