Hi Stephan, I started working on this today, but I have run into a problem.

Could you describe the procedure in a little more detail?
I don't understand how to pass the list of URIs to the input format, 
since it stores the path in a single Path variable.

createInputSplits() does not receive the path as an argument; it reads the 
path from that variable.


Thanks,
Michele


On 26 Jun 2015, at 12:28, Michele Bertoni 
<michele1.bert...@mail.polimi.it> wrote:

Right!
I will post the question later and quote your answer as the solution :)

On 26 Jun 2015, at 12:27, Stephan Ewen 
<se...@apache.org> wrote:

Seems like a good idea to collect these questions.

Stackoverflow is also a good place for "useful tricks"...

On Fri, Jun 26, 2015 at 12:25 PM, Michele Bertoni 
<michele1.bert...@mail.polimi.it> wrote:
Got it!
I will try it, thanks! :)

What about writing a section on this in the programming guide?
I found a couple of threads about the readers on the mailing list, so it 
seems it would be helpful.



On 26 Jun 2015, at 12:21, Stephan Ewen 
<se...@apache.org> wrote:

Sure, just override the "createInputSplits()" method. Call 
"super.createInputSplits()" once for each of your file paths and then combine 
the results into one array that you return.

That should do it...
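
A minimal sketch of that pattern, using hypothetical stand-in classes rather 
than the real Flink types: Split, SinglePathFormat and MultiPathFormat are 
invented for illustration, and only the idea of overriding createInputSplits() 
and calling super once per path comes from the description above. Because the 
method reads its path from a field instead of taking an argument, the subclass 
points that field at each file before the super call:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a file split: a split number plus the file it covers.
final class Split {
    final int number;
    final String path;

    Split(int number, String path) {
        this.number = number;
        this.path = path;
    }
}

// Hypothetical stand-in for an input format that keeps a single path in a field.
class SinglePathFormat {
    protected String filePath;

    // Takes no path argument; the path is read from the field above.
    Split[] createInputSplits(int minNumSplits) {
        int count = Math.max(1, minNumSplits);
        Split[] splits = new Split[count];
        for (int i = 0; i < count; i++) {
            splits[i] = new Split(i, filePath);
        }
        return splits;
    }
}

// Subclass that accepts several paths and reuses the single-path logic.
class MultiPathFormat extends SinglePathFormat {
    private final List<String> paths;

    MultiPathFormat(List<String> paths) {
        this.paths = new ArrayList<>(paths);
    }

    @Override
    Split[] createInputSplits(int minNumSplits) {
        List<Split> combined = new ArrayList<>();
        String original = filePath;
        try {
            for (String path : paths) {
                filePath = path; // point the inherited field at the next file
                for (Split s : super.createInputSplits(minNumSplits)) {
                    // Renumber so split numbers stay unique across files.
                    combined.add(new Split(combined.size(), s.path));
                }
            }
        } finally {
            filePath = original; // restore whatever was set before
        }
        return combined.toArray(new Split[0]);
    }
}

class MultiPathDemo {
    public static void main(String[] args) {
        MultiPathFormat format =
                new MultiPathFormat(Arrays.asList("hdfs://file1", "hdfs://file2"));
        for (Split s : format.createInputSplits(1)) {
            System.out.println(s.number + " -> " + s.path);
        }
    }
}
```

With the real FileInputFormat the same idea would presumably apply to its own 
path field and split type; those details are not modeled here and should be 
checked against the API before relying on them.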

On Fri, Jun 26, 2015 at 12:19 PM, Michele Bertoni 
<michele1.bert...@mail.polimi.it> wrote:
Hi Stephan, thanks for answering.
Right now I am using an extension of DelimitedInputFormat. Is there a way 
to combine it with option 2?



On 26 Jun 2015, at 12:17, Stephan Ewen 
<se...@apache.org> wrote:

There are two ways you can realize that:

1) Create multiple sources and union them. This is easy, but probably a bit 
less efficient.

2) Override the FileInputFormat's createInputSplits method to take a union of 
the paths and create a list of all files and file splits that will be read.

Stephan


On Fri, Jun 26, 2015 at 12:12 PM, Michele Bertoni 
<michele1.bert...@mail.polimi.it> wrote:
Hi everybody,
is there a way to specify a list of URIs ("hdfs://file1", "hdfs://file2", …) 
and open them as different files?
I know I can open the entire directory, but I want to be able to select a 
subset of the files in the directory.

thanks






