Thanks! Regarding 1), the case where the filter step on a partition field contains a UDF: is the UDF not evaluated first, with its result then passed to the load function?
A separate question: in a LoadFunc, is there a way to get a reference to the logical query plan?

Thanks again.

On Thu, Mar 14, 2013 at 1:51 PM, Rohini Palaniswamy <rohini.adi...@gmail.com> wrote:
> Jeff,
>
> 1) It should not. If it does push, then it is a bug in Pig.
>
> 2) I think it should be fine.
>
> 3) Look at PColFilterExtractor and PartitionFilterOptimizer.
>
> Regards,
> Rohini
>
> On Thu, Mar 14, 2013 at 1:31 PM, Jeff Yuan <quaintena...@gmail.com> wrote:
>
>> I am writing a loader for a storage format which partitions by a
>> particular field in the record. I would like to implement something
>> which can push down filters on the partitioned field so that the
>> record reader does not need to read files that are outside the
>> filtered range. In the interface "LoadMetadata", the
>> "getPartitionKeys" and "setPartitionFilter" functions seem to support
>> what I need (where Pig should pass the filtering expression on the
>> declared partition keys to "setPartitionFilter"), but I have a couple
>> of questions. I'm going to reference the following example, where
>> timestamp is the partition key.
>>
>> a = load 'stored_data' using CustomLoader();
>> b = filter a by timestamp == CUSTOM_UDF(date, month);
>>
>> 1. Would partitioning work in this case, where the partition key filter
>> includes a UDF?
>>
>> 2. Does the filter statement need to be directly after the load
>> statement? What I mean is, if I declare a variable c between a and b
>> which does some other operation on a, will Pig pass the filter
>> expression of b when loading a?
>>
>> 3. Can you point out roughly where this "setPartitionFilter" function
>> is called in Pig code during the load process? I couldn't seem to find
>> it through a search of the Pig source.
>>
>> Thanks a lot!