Re: Finding records with a given prefix

2010-11-02 Thread Dmitriy Ryaboy
You don't really have to mess with that -- you can just have your UDF initialized with the prefix file location. So, your udf would have:

  private String prefixPath;

  // needed by Pig
  public MyUDF() {}

  // use this constructor
  public MyUDF(String path) { this.prefixPath = path; }

  // in the eval,
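
The message is cut off at this point in the archive, so what follows is not Dmitriy's code. It is only a minimal sketch of one way the stored path could be used: the prefix file is read once, on the first call, and then reused for every record. The class name LazyPrefixFilter, the method startsWithKnownPrefix, and the one-prefix-per-line file layout are all assumptions made for illustration. The sketch is plain Java that reads a local file; it does not extend Pig's EvalFunc and does not deal with HDFS or the distributed cache.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the UDF; a real Pig UDF would extend EvalFunc.
final class LazyPrefixFilter {
    private String prefixPath;
    private List<String> prefixes; // stays null until the first call

    // No-argument constructor, mirroring the one the message says Pig needs.
    LazyPrefixFilter() {}

    // Constructor that receives the prefix file location.
    LazyPrefixFilter(String path) {
        this.prefixPath = path;
    }

    // Returns true if name starts with any prefix listed in the file.
    boolean startsWithKnownPrefix(String name) throws IOException {
        if (prefixes == null) {
            prefixes = load(prefixPath); // read the file only once
        }
        if (name == null) {
            return false;
        }
        for (String prefix : prefixes) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Assumed file layout: one prefix per line, blank lines ignored.
    private static List<String> load(String path) throws IOException {
        if (path == null) {
            throw new IllegalStateException("no prefix file path was given");
        }
        List<String> result = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8)) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }
}
```

In a Pig script the constructor argument would normally be supplied through a DEFINE statement naming the UDF class and the path string.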

Re: Finding records with a given prefix

2010-11-02 Thread Joe Ciaramitaro
Thanks for the quick response. I have some follow-ups though :) -- Not quite as bad (computationally expensive) as a regular expression, just something that would allow me to check String.startsWith... but same basic idea. Prefixes is small enough to fit into memory, but it's not clear to me how

Re: Finding records with a given prefix

2010-11-02 Thread Alan Gates
Basically you want to join on a regular expression, correct? Unfortunately MapReduce (and thus Pig) is spectacularly bad at non-equijoins. Is 'prefixes' small enough to fit in memory? If so, you could write a UDF that loaded it into memory and did the comparison. This way the join woul
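
To make the in-memory comparison concrete, here is a minimal sketch of the check such a UDF might perform once the prefixes are loaded. The class PrefixMatcher and its matches method are hypothetical names, not part of Pig, and the code is deliberately standalone Java so it can be run without a Pig installation. It does a linear scan with String.startsWith, which should be adequate while the prefix list is small.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical helper holding the prefix list in memory; not a Pig class.
final class PrefixMatcher {
    private final List<String> prefixes;

    PrefixMatcher(Collection<String> prefixes) {
        // Copy so later changes to the caller's collection have no effect.
        this.prefixes = new ArrayList<>(prefixes);
    }

    // Returns true if name begins with at least one stored prefix.
    boolean matches(String name) {
        if (name == null) {
            return false;
        }
        for (String prefix : prefixes) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Because every record is tested against the whole list, the cost grows with the number of prefixes; a sorted structure or a trie would be the next step if the list became large.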

Finding records with a given prefix

2010-11-02 Thread Joe Ciaramitaro
Hi all, I have 2 data files: one which contains a number of records, and one which contains a number of prefixes.

  A = load 'data' AS (id, name);
  B = load 'prefixes' AS (prefix);

I'd like to pull records in A whose name begins with a prefix from B. The prefixes are of varying lengths. I've been scouring t