For the approach that I outlined, you need to create a subclass of the file
input format.

In that subclass, you store the list of URIs (in a new variable), and
override the "createInputSplits()" method.
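A minimal sketch of this pattern in plain Java follows. It does not use Flink itself, so that it stays runnable without Flink on the classpath. `Split`, `SingleFileFormat` and `MultiFileFormat` are hypothetical stand-ins for `FileInputSplit`, `FileInputFormat` and your subclass, and file sizes come from an in-memory map rather than HDFS. The real class holds a `Path` object rather than a string. As described in the question below, the base class keeps one path in a field and `createInputSplits(int)` reads that field. The subclass therefore stores its own list of URIs, points the inherited field at each one in turn, calls `super.createInputSplits()`, and concatenates the results into one array.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for FileInputSplit: one byte range of one file.
final class Split {
    final int number;
    final String path;
    final long start;
    final long length;

    Split(int number, String path, long start, long length) {
        this.number = number;
        this.path = path;
        this.start = start;
        this.length = length;
    }

    @Override
    public String toString() {
        return "#" + number + " " + path + " [" + start + ", " + (start + length) + ")";
    }
}

// Hypothetical stand-in for FileInputFormat: it keeps a single path in a
// field, and createInputSplits() reads that field instead of taking a path.
class SingleFileFormat {
    protected String filePath;
    // Fake file system (path -> size in bytes) so the sketch runs anywhere.
    protected final Map<String, Long> fileSizes;

    SingleFileFormat(Map<String, Long> fileSizes) {
        this.fileSizes = fileSizes;
    }

    // Simplified splitting: cuts the file into up to minNumSplits roughly
    // equal byte ranges (the real class's rules are more involved).
    Split[] createInputSplits(int minNumSplits) {
        long size = fileSizes.get(filePath);
        long chunk = Math.max(1, (size + minNumSplits - 1) / minNumSplits);
        List<Split> splits = new ArrayList<>();
        for (long start = 0; start < size; start += chunk) {
            splits.add(new Split(splits.size(), filePath, start, Math.min(chunk, size - start)));
        }
        return splits.toArray(new Split[0]);
    }
}

// The subclass: the list of URIs lives in a new field, and the override
// runs the parent's splitting logic once per URI.
class MultiFileFormat extends SingleFileFormat {
    private final List<String> filePaths;

    MultiFileFormat(Map<String, Long> fileSizes, List<String> filePaths) {
        super(fileSizes);
        this.filePaths = new ArrayList<>(filePaths);
    }

    @Override
    Split[] createInputSplits(int minNumSplits) {
        List<Split> all = new ArrayList<>();
        for (String path : filePaths) {
            this.filePath = path; // point the inherited single-path field at this file
            for (Split s : super.createInputSplits(minNumSplits)) {
                // Each parent call numbers its splits from 0; renumber to keep them unique.
                all.add(new Split(all.size(), s.path, s.start, s.length));
            }
        }
        return all.toArray(new Split[0]);
    }
}

// Usage: select two of three files; file3 is never split or read.
class MultiFileFormatDemo {
    public static void main(String[] args) {
        Map<String, Long> sizes = Map.of(
                "hdfs://file1", 100L, "hdfs://file2", 30L, "hdfs://file3", 50L);
        MultiFileFormat format =
                new MultiFileFormat(sizes, List.of("hdfs://file1", "hdfs://file2"));
        for (Split split : format.createInputSplits(2)) {
            System.out.println(split);
        }
        // #0 hdfs://file1 [0, 50)
        // #1 hdfs://file1 [50, 100)
        // #2 hdfs://file2 [0, 15)
        // #3 hdfs://file2 [15, 30)
    }
}
```

The renumbering step is a precaution. Each call to the parent starts its split numbers at 0, so plain concatenation would produce duplicates. Whether the real Flink runtime actually requires unique split numbers is an assumption worth checking.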

Stephan

On Tue, Jul 14, 2015 at 6:42 PM, Michele Bertoni <
michele1.bert...@mail.polimi.it> wrote:

>  Hi Stephan, I started working on this today, but I am having a problem.
>
>  Could you describe the procedure in a little more detail?
> Actually, I don't understand how to give the input format the list of
> URIs, since it will try to put it into a single Path variable.
>
>  createInputSplits() does not receive the path as an argument; it takes
> the path from that variable.
>
>
>  Thanks,
> Michele
>
>
>  On 26 Jun 2015, at 12:28, Michele Bertoni <
> michele1.bert...@mail.polimi.it> wrote:
>
>  Right!
> Later I will post the question and quote your answer with the solution :)
>
>  On 26 Jun 2015, at 12:27, Stephan Ewen <se...@apache.org>
> wrote:
>
>  Seems like a good idea to collect these questions.
>
>  Stackoverflow is also a good place for "useful tricks"...
>
> On Fri, Jun 26, 2015 at 12:25 PM, Michele Bertoni <
> michele1.bert...@mail.polimi.it> wrote:
>
>> Got it!
>> I will try it, thanks! :)
>>
>>  What about writing a section about it in the programming guide?
>> I found a couple of topics about the readers on the mailing list, so it
>> seems it would be helpful.
>>
>>
>>
>>  On 26 Jun 2015, at 12:21, Stephan Ewen <se...@apache.org>
>> wrote:
>>
>>  Sure, just override the "createInputSplits()" method. For each of your
>> file paths, call "super.createInputSplits()", then combine the results
>> into one array and return it.
>>
>>  That should do it...
>>
>> On Fri, Jun 26, 2015 at 12:19 PM, Michele Bertoni <
>> michele1.bert...@mail.polimi.it> wrote:
>>
>>> Hi Stephan, thanks for answering.
>>> Right now I am using an extension of the DelimitedInputFormat; is there
>>> a way to combine it with option 2?
>>>
>>>
>>>
>>>  On 26 Jun 2015, at 12:17, Stephan Ewen <se...@apache.org>
>>> wrote:
>>>
>>>  There are two ways you can realize that:
>>>
>>>  1) Create multiple sources and union them. This is easy, but probably
>>> a bit less efficient.
>>>
>>>  2) Override the FileInputFormat's createInputSplits method to take the
>>> union of the paths, creating a list of all files and file splits that
>>> will be read.
>>>
>>>  Stephan
>>>
>>>
>>> On Fri, Jun 26, 2015 at 12:12 PM, Michele Bertoni <
>>> michele1.bert...@mail.polimi.it> wrote:
>>>
>>>> Hi everybody,
>>>> is there a way to specify a list of URIs (“hdfs://file1”, “hdfs://file2”, …)
>>>> and open them as different files?
>>>> I know I can open the entire directory, but I want to be able to select
>>>> a subset of the files in that directory.
>>>>
>>>> Thanks
>>>
>>>
>>>
>>>
>>
>>
>
>
>
