You don't really have to mess with that -- you can just have your UDF
initialized with the prefix file location.
So, your UDF would have:
private String prefixPath;
// needed by Pig
public MyUDF() {}
// use this constructor
public MyUDF(String path) {
    this.prefixPath = path;
}
// in the eval,
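The rest of the reply is cut off above, so what follows is not the author's code. It is only a minimal sketch of the general idea, assuming the prefix file is a plain text file with one prefix per line that can be read from a local path. The class name PrefixMatcher and its methods are hypothetical. In a real Pig UDF this logic would sit inside the eval function, and how the file actually reaches each task depends on your cluster setup.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Pig: keeps the prefix file path passed
// to the constructor and loads the prefixes the first time they are needed.
class PrefixMatcher {
    private final String prefixPath;
    private List<String> prefixes; // stays null until the first call to matches()

    PrefixMatcher(String prefixPath) {
        this.prefixPath = prefixPath;
    }

    // Assumption: one prefix per line, UTF-8, blank lines ignored.
    private void loadPrefixes() throws IOException {
        List<String> loaded = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get(prefixPath), StandardCharsets.UTF_8)) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                loaded.add(trimmed);
            }
        }
        prefixes = loaded;
    }

    // Returns true if name starts with at least one of the loaded prefixes.
    boolean matches(String name) throws IOException {
        if (name == null) {
            return false;
        }
        if (prefixes == null) {
            loadPrefixes(); // lazy load, done once per instance
        }
        for (String prefix : prefixes) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Loading lazily keeps the constructor cheap, so the file is only read where the comparison actually runs.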
Thanks for the quick response. I have some follow-ups though :) --
Not quite as bad (computationally expensive) as a regular expression, just
something that would allow me to check String.startsWith... but same basic idea.
Prefixes is small enough to fit into memory, but it's not clear to me how
Basically you want to join on a regular expression, correct?
Unfortunately, MapReduce (and thus Pig) is spectacularly bad at
non-equijoins. Is 'prefixes' small enough to fit in memory? If so, you
could write a UDF that loaded it into memory and did the comparison.
This way the join woul
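To illustrate the in-memory comparison suggested here, the sketch below filters (id, name) records against a prefix list that is already held in memory. It is plain Java rather than Pig code, and the class PrefixFilter and method filterByPrefix are made-up names for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the comparison step: keep each record whose
// name starts with any prefix. Records are {id, name} string pairs.
class PrefixFilter {
    static List<String[]> filterByPrefix(List<String[]> records, List<String> prefixes) {
        List<String[]> kept = new ArrayList<>();
        for (String[] record : records) {
            String name = record[1];
            if (name == null) {
                continue; // a record without a name cannot match
            }
            for (String prefix : prefixes) {
                if (name.startsWith(prefix)) {
                    kept.add(record);
                    break; // one matching prefix is enough; avoid duplicates
                }
            }
        }
        return kept;
    }
}
```

Each record is checked against every prefix, so the cost is roughly records times prefixes, which is usually acceptable only while the prefix list stays small.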
Hi all,
I have 2 data files. One which contains a number of records, and one which
contains a number of prefixes.
A = load 'data' AS (id, name);
B = load 'prefixes' AS (prefix);
I'd like to pull the records in A whose name begins with any prefix in B.
The prefixes are of varying lengths.
I've been scouring t