You are right, the implementation needs a placeholder here.
The placeholder can probably be a "fake path", like
"file:///this/will/never/be/read/anyways", because you override the
"createInputSplits" method...
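
Here is a minimal pure-Scala model of why a never-read placeholder is enough. The classes `BaseFileFormat`, `MultiPathFormat` and `Split` are hypothetical stand-ins, not Flink APIs. `configure()` mimics the null-path check Michele describes in the quoted message, and the overridden `createInputSplits` swaps in each real path before delegating to the base splitter, so the placeholder is never used.

```scala
// Hypothetical model only: none of these classes are Flink APIs.
final case class Split(path: String, index: Int)

abstract class BaseFileFormat {
  protected var filePath: String = null

  def setFilePath(p: String): Unit = filePath = p

  // Mimics the check that fails when no path has been set yet.
  def configure(): Unit =
    if (filePath == null) throw new IllegalArgumentException("File path is null")

  // Default behaviour: split only the single configured path.
  def createInputSplits(minNumSplits: Int): Array[Split] =
    Array.tabulate(minNumSplits)(i => Split(filePath, i))
}

class MultiPathFormat(paths: Seq[String]) extends BaseFileFormat {
  // Placeholder that satisfies configure() but is never read, because
  // createInputSplits replaces it with each real path before splitting.
  setFilePath("file:///this/will/never/be/read/anyways")

  override def createInputSplits(minNumSplits: Int): Array[Split] =
    paths.flatMap { p =>
      setFilePath(p)
      super.createInputSplits(minNumSplits).toSeq
    }.toArray
}
```

With this model, `configure()` on a `MultiPathFormat` no longer throws, and none of the splits it produces refer to the placeholder path.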

On Thu, Jul 16, 2015 at 12:03 AM, Michele Bertoni <
michele1.bert...@mail.polimi.it> wrote:

>  Uhm, it doesn’t seem to work: it calls the configure() method, which checks
> whether filePath is null and throws an exception.
> Actually, I set that field only in createInputSplits(), which runs some
> steps later.
>
>
>
>  On 15 Jul 2015, at 13:16, Stephan Ewen <se...@apache.org>
> wrote:
>
>  If you want to work without the placeholder, simply do: "env.createInput(new
> myDelimitedInputFormat(parser)(paths))"
>
>  The "createInputSplits()" method looks good.
>
>  Greetings,
> Stephan
>
>
> On Tue, Jul 14, 2015 at 11:42 PM, Michele Bertoni <
> michele1.bert...@mail.polimi.it> wrote:
>
>> Ok thank you, now I solved it!
>>
>>
>>  The problem was in env.readFile(myInputFormat, path):
>>
>>  now that path is actually a list of paths, what should I pass to it?
>>
>>
>>
>>  I solved it this way:
>>
>>  env.readFile(new myDelimitedInputFormat(parser)(paths), paths.head)
>>
>>  where paths.head gives readFile a URL that is just a
>> “placeholder” and seems never to be used, while the custom input format takes
>> care of creating the splits out of the list of directories.
>>
>>  I tried it and it works.
>> Is this the correct way to do it? :)
>>
>>
>>
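>>  FYI, createInputSplits is implemented this way:
>>
>>  // requires: import org.apache.flink.core.fs.FileInputSplit
>>  // Split every path with the parent implementation and merge the results;
>>  // setFilePath mutates the format, so each super call sees the path just set.
>>  override def createInputSplits(minNumSplits: Int): Array[FileInputSplit] = {
>>    paths.flatMap { p =>
>>      super.setFilePath(p)
>>      super.createInputSplits(minNumSplits)
>>    }.toArray
>>  }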
>>
>>  where paths is a parameter of the input format constructor (as is
>> the custom parser shown above).
>>
>>  Do you think it would be useful if I opened a Stack Overflow post about it (maybe
>> with the custom parser too)?
>>
>>
>>
>>
>>  cheers
>>  michele
>>
>>
>>  On 14 Jul 2015, at 18:50, Stephan Ewen <se...@apache.org>
>> wrote:
>>
>>  For the approach that I outlined, you need to subclass the file
>> input format.
>>
>>  In that subclass, you store the list of URIs (in a new variable), and
>> override the "createInputSplits()" method.
>>
>>  Stephan
>>
>> On Tue, Jul 14, 2015 at 6:42 PM, Michele Bertoni <
>> michele1.bert...@mail.polimi.it> wrote:
>>
>>> Hi Stephan, I started working on this today, but I am having a problem.
>>>
>>>  Can you be a little more detailed about the procedure?
>>> Actually, I don’t understand how to give the input format the list of
>>> URIs, since it will try to put it into a Path variable.
>>>
>>>  createInputSplits does not receive the path; it takes the path from that
>>> variable.
>>>
>>>
>>>  Thanks,
>>> Michele
>>>
>>>
>>>  On 26 Jun 2015, at 12:28, Michele Bertoni <
>>> michele1.bert...@mail.polimi.it> wrote:
>>>
>>>  Right!
>>> Later I will post the question and quote your answer with the solution :)
>>>
>>>  On 26 Jun 2015, at 12:27, Stephan Ewen <se...@apache.org>
>>> wrote:
>>>
>>>  Seems like a good idea to collect these questions.
>>>
>>>  Stack Overflow is also a good place for "useful tricks"...
>>>
>>> On Fri, Jun 26, 2015 at 12:25 PM, Michele Bertoni <
>>> michele1.bert...@mail.polimi.it> wrote:
>>>
>>>> Got it!
>>>> I will try it, thanks! :)
>>>>
>>>>  What about writing a section on it in the programming guide?
>>>> I found a couple of topics about the readers on the mailing list, so it
>>>> seems it may be helpful.
>>>>
>>>>
>>>>
>>>>  On 26 Jun 2015, at 12:21, Stephan Ewen <se...@apache.org>
>>>> wrote:
>>>>
>>>>  Sure, just override the "createInputSplits()" method. Call
>>>> "super.createInputSplits()" for each of your file paths and then combine
>>>> the results into one array that you return.
>>>>
>>>>  That should do it...
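
The recipe above (one `super.createInputSplits()` call per path, with the results concatenated) can be modelled in plain Scala as a sketch. `FileSplit`, `splitFile` and `CombinedSplits` are hypothetical names, and the even byte-range division is a simplified stand-in for FileInputFormat's real split computation, which also takes block sizes and a minimum split size into account.

```scala
object CombinedSplits {
  // Hypothetical split type: a contiguous byte range of one file.
  final case class FileSplit(path: String, start: Long, length: Long)

  // Divide one file of `size` bytes into at most `numSplits` contiguous byte ranges.
  def splitFile(path: String, size: Long, numSplits: Int): Seq[FileSplit] = {
    val n = math.max(1, numSplits)
    val chunk = math.max(1L, (size + n - 1) / n) // ceiling division, at least 1 byte
    (0L until size by chunk).map(start => FileSplit(path, start, math.min(chunk, size - start)))
  }

  // One splitter call per path, all results concatenated into a single array.
  def createInputSplits(files: Seq[(String, Long)], minNumSplits: Int): Array[FileSplit] =
    files.flatMap { case (path, size) => splitFile(path, size, minNumSplits) }.toArray
}
```

For a 10-byte and a 4-byte file with `minNumSplits = 3`, this yields three ranges for the first file and two for the second, all in a single array.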
>>>>
>>>> On Fri, Jun 26, 2015 at 12:19 PM, Michele Bertoni <
>>>> michele1.bert...@mail.polimi.it> wrote:
>>>>
>>>>> Hi Stephan, thanks for answering.
>>>>> Right now I am using an extension of the DelimitedInputFormat; is
>>>>> there a way to combine it with option 2?
>>>>>
>>>>>
>>>>>
>>>>>  On 26 Jun 2015, at 12:17, Stephan Ewen <se...@apache.org>
>>>>> wrote:
>>>>>
>>>>>  There are two ways you can realize that:
>>>>>
>>>>>  1) Create multiple sources and union them. This is easy, but
>>>>> probably a bit less efficient.
>>>>>
>>>>>  2) Override the FileInputFormat's createInputSplits method to take the
>>>>> union of the paths and create a list of all files and file splits that will
>>>>> be read.
>>>>>
>>>>>  Stephan
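
Option 1 has roughly this shape: one source per path, with the results unioned at the end. In this hedged sketch a "source" is just a hypothetical function from a path to its records, and `UnionOfSources.unionAll` is an illustrative name. In an actual Flink job each path would get its own source and the resulting DataSets would be combined with `union`; the extra sources are presumably why this option is expected to be a bit less efficient.

```scala
object UnionOfSources {
  // Read every path with its own "source" and concatenate the results in path order.
  def unionAll[T](paths: Seq[String])(readOne: String => Seq[T]): Seq[T] = {
    require(paths.nonEmpty, "need at least one path")
    paths.map(readOne).reduce(_ ++ _)
  }
}
```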
>>>>>
>>>>>
>>>>> On Fri, Jun 26, 2015 at 12:12 PM, Michele Bertoni <
>>>>> michele1.bert...@mail.polimi.it> wrote:
>>>>>
>>>>>> Hi everybody,
>>>>>> is there a way to specify a list of URIs (“hdfs://file1”, “hdfs://file2”, …)
>>>>>> and open them as different files?
>>>>>> I know I can open the entire directory, but I want to be able to
>>>>>> select a subset of the files in the directory.
>>>>>>
>>>>>> thanks
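
Whichever option is used, the input is a list of selected paths. The sketch below builds such a list by filtering a directory listing against a set of wanted file names. It assumes the local filesystem through `java.nio.file` as a stand-in; for `hdfs://` URIs the listing would have to come from the Hadoop or Flink FileSystem API instead. `SelectFiles.select` is a hypothetical helper.

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

object SelectFiles {
  // Return the regular files directly inside `dir` whose names are in `wanted`,
  // sorted by file name so the order is deterministic.
  def select(dir: Path, wanted: Set[String]): List[Path] = {
    val stream = Files.list(dir) // must be closed to release the directory handle
    try {
      stream.iterator().asScala
        .filter(p => Files.isRegularFile(p) && wanted.contains(p.getFileName.toString))
        .toList
        .sortBy(_.getFileName.toString)
    } finally stream.close()
  }
}
```

Mapping the result with `_.toUri.toString` gives `file:` URIs that could then be handed to the input format as its list of paths.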
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
