You are right, the implementation needs a placeholder here. The placeholder can probably be a "fake path", like "file:///this/will/never/be/read/anyways", because you override the "createInputSplits()" method...
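To make the placeholder idea concrete, here is a minimal sketch. It does not use Flink's classes: "SketchFileInputFormat" is a hypothetical stand-in that mimics only the two behaviours discussed in this thread (a configure() that rejects a null path, and a createInputSplits() that reads the currently set path), and splits are modelled as plain strings. "MultiPathInputFormat" and "MultiPathDemo" are likewise names made up for illustration.

```scala
// Hypothetical stand-in for a file-based input format. NOT Flink's real class.
class SketchFileInputFormat {
  protected var filePath: String = null

  def setFilePath(path: String): Unit = { filePath = path }

  // Mimics the check described in the thread: a missing path is rejected.
  def configure(): Unit =
    if (filePath == null) throw new IllegalStateException("File path was not specified")

  // Assumption for the sketch: every file yields exactly minNumSplits splits,
  // each represented as the string "<path>#<index>".
  def createInputSplits(minNumSplits: Int): Array[String] =
    Array.tabulate(minNumSplits)(i => s"$filePath#$i")
}

// Subclass that stores the list of paths and builds the splits for all of them.
class MultiPathInputFormat(paths: Seq[String]) extends SketchFileInputFormat {
  // Placeholder so that configure() passes; no split is ever created from it.
  setFilePath("file:///this/will/never/be/read/anyways")

  override def createInputSplits(minNumSplits: Int): Array[String] =
    paths.flatMap { p =>
      super.setFilePath(p)                          // point the base class at one real path
      super.createInputSplits(minNumSplits).toSeq   // and let it compute that path's splits
    }.toArray
}

object MultiPathDemo {
  def main(args: Array[String]): Unit = {
    val format = new MultiPathInputFormat(Seq("hdfs://file1", "hdfs://file2"))
    format.configure()                              // succeeds thanks to the placeholder
    format.createInputSplits(2).foreach(println)    // two splits per path, four in total
  }
}
```

Note that in this sketch the stored path ends up being the last entry of the list once createInputSplits() has run, so anything that reads it afterwards sees a real path rather than the placeholder. Whether that matters in the real input format is something I have not checked.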
On Thu, Jul 16, 2015 at 12:03 AM, Michele Bertoni <michele1.bert...@mail.polimi.it> wrote:

> Uhm, it doesn’t seem to work: it calls the configure() method, which checks
> whether filePath is null and throws an exception.
> Actually, I set that field only in createInputSplits(), which runs some
> steps later.
>
> On 15 Jul 2015, at 13:16, Stephan Ewen <se...@apache.org> wrote:
>
> If you want to work without the placeholder, simply do:
> "env.createInput(new myDelimitedInputFormat(parser)(paths))"
>
> The "createInputSplits()" method looks good.
>
> Greetings,
> Stephan
>
> On Tue, Jul 14, 2015 at 11:42 PM, Michele Bertoni <michele1.bert...@mail.polimi.it> wrote:
>
>> OK, thank you, now I solved it!
>>
>> The problem was in the env.readFile(myInputFormat, path) call:
>> now that path is actually a list of paths, what should I pass to it?
>>
>> I solved it this way:
>>
>>   env.readFile(new myDelimitedInputFormat(parser)(paths), paths.head)
>>
>> where paths.head gives readFile a URL that is just a “placeholder” and
>> seems never to be used, and the custom input format takes care of
>> creating the splits out of the list of directories.
>>
>> I tried it and it works.
>> Is this the correct way to do it? :)
>>
>> FYI, createInputSplits is implemented this way:
>>
>>   override def createInputSplits(minNumSplits : Int) = {
>>     files.flatMap((f) => {
>>       super.setFilePath(f)
>>       super.createInputSplits(minNumSplits)
>>     }).toArray
>>   }
>>
>> where paths is a parameter of the input format constructor (as is the
>> custom parser, as shown above).
>>
>> Do you think it would be useful if I opened a Stack Overflow post about
>> it (maybe with the custom parser too)?
>>
>> Cheers,
>> Michele
>>
>> On 14 Jul 2015, at 18:50, Stephan Ewen <se...@apache.org> wrote:
>>
>> For the approach that I outlined, you need to subclass the file
>> input format.
>>
>> In that subclass, you store the list of URIs (in a new variable), and
>> override the "createInputSplits()" method.
>>
>> Stephan
>>
>> On Tue, Jul 14, 2015 at 6:42 PM, Michele Bertoni <michele1.bert...@mail.polimi.it> wrote:
>>
>>> Hi Stephan, I started working on this today, but I am having a problem.
>>>
>>> Can you describe the procedure in a little more detail?
>>> I don’t understand how to give the input format the list of URIs,
>>> since it will try to put it in a Path variable.
>>>
>>> createInputSplits does not receive the path as an argument; it takes
>>> the path from that variable.
>>>
>>> Thanks,
>>> Michele
>>>
>>> On 26 Jun 2015, at 12:28, Michele Bertoni <michele1.bert...@mail.polimi.it> wrote:
>>>
>>> Right!
>>> Later I will post the question and quote your answer with the solution :)
>>>
>>> On 26 Jun 2015, at 12:27, Stephan Ewen <se...@apache.org> wrote:
>>>
>>> Seems like a good idea to collect these questions.
>>>
>>> Stack Overflow is also a good place for "useful tricks"...
>>>
>>> On Fri, Jun 26, 2015 at 12:25 PM, Michele Bertoni <michele1.bert...@mail.polimi.it> wrote:
>>>
>>>> Got it!
>>>> I will try, thanks! :)
>>>>
>>>> What about writing a section on it in the programming guide?
>>>> I found a couple of topics about the readers on the mailing list, so
>>>> it seems it may be helpful.
>>>>
>>>> On 26 Jun 2015, at 12:21, Stephan Ewen <se...@apache.org> wrote:
>>>>
>>>> Sure, just override the "createInputSplits()" method. Call
>>>> "super.createInputSplits()" for each of your file paths and then
>>>> combine the results into one array that you return.
>>>>
>>>> That should do it...
>>>>
>>>> On Fri, Jun 26, 2015 at 12:19 PM, Michele Bertoni <michele1.bert...@mail.polimi.it> wrote:
>>>>
>>>>> Hi Stephan, thanks for answering.
>>>>> Right now I am using an extension of the DelimitedInputFormat; is
>>>>> there a way to combine it with option 2?
>>>>>
>>>>> On 26 Jun 2015, at 12:17, Stephan Ewen <se...@apache.org> wrote:
>>>>>
>>>>> There are two ways you can realize that:
>>>>>
>>>>> 1) Create multiple sources and union them. This is easy, but
>>>>> probably a bit less efficient.
>>>>>
>>>>> 2) Override the FileInputFormat's createInputSplits method to take a
>>>>> union of the paths to create a list of all files and file splits that
>>>>> will be read.
>>>>>
>>>>> Stephan
>>>>>
>>>>> On Fri, Jun 26, 2015 at 12:12 PM, Michele Bertoni <michele1.bert...@mail.polimi.it> wrote:
>>>>>
>>>>>> Hi everybody,
>>>>>> is there a way to specify a list of URIs ("hdfs://file1", "hdfs://file2", …)
>>>>>> and open them as different files?
>>>>>> I know I can open the entire directory, but I want to be able to
>>>>>> select a subset of the files in the directory.
>>>>>>
>>>>>> Thanks