On 6/15/12, Ruslan Al-Fakikh <ruslan.al-fak...@jalent.ru> wrote:
> I didn't know InputFormat and LineReader could help, though I didn't
> look at them closely. I was thinking about implementing a
> Table-Generating Function (UDTF) if there is no an already implemented
> solution.
Both are possible: InputFormat and/or UD(T)F. It all depends on what you need. I actually use both: in the InputFormat I load lists of allowed values to check the data against, and in a UDF I query another database for values that are needed only in some queries.

Generally, I'd use an InputFormat when every job over a given table requires the additional data from the RDBMS. Conversely, when only a few jobs out of many require the RDBMS connection, I would use a UDF.

I think the difference in performance between the two is rather small, if any. A UDF is also easier to write, so it might be the "weapon of choice", at least if you don't already use a custom InputFormat.

Jan
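
To make the UDF option more concrete, here is a minimal sketch of the lookup logic such a UDF could wrap. It is plain Java so that it runs without Hive on the classpath. The class name CachedLookup, the loader function, and the in-memory cache are all assumptions made for illustration, not part of Hive or of the setup described above. In a real UDF the evaluate method would sit in a class extending org.apache.hadoop.hive.ql.exec.UDF, and the loader would run a query against the RDBMS instead of reading a map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch: per-key lookup with a small in-memory cache.
 * The loader stands in for a query against an external database.
 */
public class CachedLookup {
    // Stand-in for the database query (assumption for illustration).
    private final Function<String, String> loader;
    // Remembers results so each distinct key hits the loader only once.
    private final Map<String, String> cache = new HashMap<>();
    private int loaderCalls = 0;

    public CachedLookup(Function<String, String> loader) {
        this.loader = loader;
    }

    /** Called once per row; null input yields null, as is usual for NULL columns. */
    public String evaluate(String key) {
        if (key == null) {
            return null;
        }
        if (cache.containsKey(key)) {
            return cache.get(key);
        }
        loaderCalls++;
        String value = loader.apply(key);
        cache.put(key, value); // misses (null values) are cached as well
        return value;
    }

    public int getLoaderCalls() {
        return loaderCalls;
    }

    public static void main(String[] args) {
        // Fake "table" replacing the RDBMS for this demo.
        Map<String, String> fakeTable = new HashMap<>();
        fakeTable.put("US", "United States");
        fakeTable.put("CZ", "Czech Republic");

        CachedLookup lookup = new CachedLookup(fakeTable::get);
        for (String code : new String[] {"US", "CZ", "US", "XX"}) {
            System.out.println(code + " -> " + lookup.evaluate(code));
        }
        System.out.println("loader calls: " + lookup.getLoaderCalls());
    }
}
```

The cache is there because a UDF is invoked once per row, so an uncached version would issue one database query per row. Whether caching is appropriate depends on how many distinct keys there are and how fresh the values must be.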