Hi, I am trying to write a custom load UDF in Pig to load a video file. I am extending the class PigTextInputFormat so that I can use my own class to control how the input is split and to supply a custom RecordReader.
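For context on controlling the split: Hadoop's FileInputFormat (new mapreduce API), which as far as I know PigTextInputFormat extends through TextInputFormat, turns the value returned by computeSplitSize into byte ranges roughly as in the JDK-only sketch below. The formula and the 1.1 slop factor mirror Hadoop's source as I understand it. The class name, the main method and the 300 MB / 32 MB figures are illustrative assumptions, and no Hadoop code is called.

```java
import java.util.ArrayList;
import java.util.List;

// JDK-only sketch of how Hadoop's FileInputFormat (new mapreduce API) turns
// computeSplitSize(...) into byte ranges for one splittable file.
// Method names mirror Hadoop's, but nothing here calls Hadoop.
public class SplitSketch {
    // Hadoop lets the final split be up to 10% larger than splitSize
    // rather than emitting a tiny trailing split.
    static final double SPLIT_SLOP = 1.1;

    // Same formula as FileInputFormat.computeSplitSize.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Returns {startOffset, length} pairs covering a file of fileLength bytes.
    static List<long[]> getSplits(long fileLength, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        long bytesRemaining = fileLength;
        while ((double) bytesRemaining / splitSize > SPLIT_SLOP) {
            splits.add(new long[] {fileLength - bytesRemaining, splitSize});
            bytesRemaining -= splitSize;
        }
        if (bytesRemaining != 0) {
            splits.add(new long[] {fileLength - bytesRemaining, bytesRemaining});
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 128 * mb;
        long fileLength = 300 * mb;
        // Default-like min/max settings: the split size collapses to the block size.
        long defaultSize = computeSplitSize(blockSize, 1, Long.MAX_VALUE);
        // Lowering maxSize below the block size is what yields smaller splits.
        long cappedSize = computeSplitSize(blockSize, 1, 32 * mb);
        for (long size : new long[] {defaultSize, cappedSize}) {
            System.out.println("split size " + size / mb + " MB:");
            for (long[] s : getSplits(fileLength, size)) {
                System.out.println("  start=" + s[0] / mb + " MB, length=" + s[1] / mb + " MB");
            }
        }
    }
}
```

With default-like settings the 300 MB file yields 3 splits (the last one 44 MB); with maxSize capped at 32 MB it yields 10 (the last one 12 MB). If the loader's own splits look right but Pig still runs one map per block, one thing worth checking (not verified for this setup) is Pig's split combination, controlled by pig.splitCombination and pig.maxCombinedSplitSize, which can merge small splits back together.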
Because a video file is unstructured, I do not know where it will be split, or whether individual frames will cross the boundary between two splits. My questions are:

a) I want to split the file according to my own requirement, so I overrode computeSplitSize (and added a print statement in it). The method is being called, because my print statement shows up in the output, but the file is not split according to my return value; it is still split on the block size. Which method do I need to override to control the split?

b) If I let the data be split at the block size, the last record of my unstructured data crosses the boundary between two splits, and I supply my own RecordReader, do I have to write special code in that RecordReader to fetch the rest of the record from the next split, or will the framework handle this automatically?

Thanks and Regards,
Aniruddh
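On the boundary question, as far as I know the framework does not reassemble a record that straddles two splits; Hadoop's own LineRecordReader handles it by convention. A reader whose split does not start at offset 0 skips the partial record at its start, and every reader keeps reading past the end of its split to finish its last record (the HDFS input stream reads across block boundaries transparently). The JDK-only sketch below applies the same convention to a hypothetical format in which each frame starts with a marker byte that never appears inside frame data. Real video formats need their own way to locate the next frame start, so the marker, the class name and the helper methods are illustrative assumptions, not a Pig or Hadoop API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// JDK-only sketch of the split-boundary convention used by Hadoop's
// LineRecordReader, applied to a HYPOTHETICAL frame format: each record
// starts with MARKER, and MARKER never occurs inside a payload.
// A split [start, end) owns exactly the records whose first byte is in it.
public class BoundarySketch {
    static final byte MARKER = (byte) 0xFF; // hypothetical frame-start byte

    // Reads the records owned by split [start, end) of data (end <= data.length).
    static List<String> readSplit(byte[] data, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = start;
        // Skip the tail of a record that began in an earlier split.
        while (pos < data.length && data[pos] != MARKER) {
            pos++;
        }
        // Only records that START before `end` belong to this split.
        while (pos < end) {
            int next = pos + 1;
            // Read past `end` if needed, up to the next marker or end of file.
            while (next < data.length && data[next] != MARKER) {
                next++;
            }
            records.add(new String(data, pos + 1, next - pos - 1, StandardCharsets.US_ASCII));
            pos = next;
        }
        return records;
    }

    // Builds a byte stream of MARKER-prefixed frames.
    static byte[] encode(String... frames) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String frame : frames) {
            out.write(MARKER);
            byte[] payload = frame.getBytes(StandardCharsets.US_ASCII);
            out.write(payload, 0, payload.length);
        }
        return out.toByteArray();
    }

    // Runs one reader per fixed-size split and concatenates their output.
    static List<String> readAll(byte[] data, int splitSize) {
        List<String> all = new ArrayList<>();
        for (int start = 0; start < data.length; start += splitSize) {
            int end = Math.min(start + splitSize, data.length);
            all.addAll(readSplit(data, start, end));
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] data = encode("frame-A", "frame-BB", "frame-CCC");
        // Small splits force frames to straddle split boundaries.
        System.out.println(readAll(data, 5)); // prints [frame-A, frame-BB, frame-CCC]
    }
}
```

Because each frame is read only by the split that contains its first byte, any split size from 1 byte up to the whole file should return every frame exactly once. For a real container the "find the next frame start" step is the hard part and depends on the format.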
