On 1 March 2011 17:56, Dmitriy Ryaboy <[email protected]> wrote:
> Hi Dan,
> iirc, registering a jar does not put it on the Pig client classpath, it just
> tells Pig to ship the jar. You want to put it on the PIG_CLASSPATH before
> you invoke pig.

Perfect, that was exactly it. It's running now :)
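
For the archive, a minimal sketch of the workaround, assuming a POSIX shell
and that the jar sits in the current directory. The jar name is the one from
my original mail; the script name extract_urls.pig is hypothetical.

```shell
# Jar name as used in the REGISTER line of the script.
JAR="$PWD/twitter-text-1.3.1-plus-tv.notube.TwitterExtractor.jar"

# Prepend the jar to PIG_CLASSPATH, keeping any value that is already set,
# so the Pig client JVM can load tv.notube.TwitterExtractor when it parses DEFINE.
export PIG_CLASSPATH="$JAR${PIG_CLASSPATH:+:$PIG_CLASSPATH}"
echo "PIG_CLASSPATH=$PIG_CLASSPATH"

# Then invoke Pig from the same shell, for example:
#   pig extract_urls.pig
```

The REGISTER line stays in the script, since (per Dmitriy's note) that is
what tells Pig to ship the jar with the job.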

Would it make sense for REGISTER to augment the classpath? Or maybe
better, for the error message to mention the role of PIG_CLASSPATH?

cheers,

Dan

> On Tue, Mar 1, 2011 at 5:57 AM, Dan Brickley <[email protected]> wrote:
>>
>> I'm trying to use InvokeForString to call a simple static method that
>> wraps http://mzsanford.github.com/twitter-text-java/docs/api/index.html
>> https://github.com/twitter/twitter-text-java ... specifically the
>> Extractor class extractURLs method.  In fact since the logical result
>> is a list of URLs perhaps I should be writing proper Pig-centric
>> wrapper that returns a tuple, but for now I thought a stringified list
>> would be ok for my immediate purposes. That purpose being pulling out
>> all the URLs from a corpus of tweets, so we can expand the bit.ly and
>> other short urls...
>>
>> So - I built the extra class (src below) and packaged it inside the
>> twitter-text jar, and verify it's in there and usable as follows:
>>
>> danbri$ java -cp
>> twitter-text-1.3.1-plus-tv.notube.TwitterExtractor.jar
>> tv.notube.TwitterExtractor "hello http://example.com/
>> http://example.org/ world"
>> URLs: [http://example.com/, http://example.org/]
>>
>> Then from the same directory, I try to run this as a Pig job:
>>
>> tw06 = load '/user/danbri/twitter/tweets2009-06.tab.txt.lzo' AS (
>> when: chararray, who: chararray, msg: chararray);
>> REGISTER twitter-text-1.3.1-plus-tv.notube.TwitterExtractor.jar;
>> DEFINE ExtractURLs InvokeForString('tv.notube.TwitterExtractor.urls',
>> 'String');
>> urls = FOREACH tw06 GENERATE ExtractURLs(msg);
>> x = SAMPLE urls 0.001;
>> dump x;
>>
>> ...but we don't get past InvokeForString,
>>
>> 2011-03-01 14:50:31,033 [main] ERROR org.apache.pig.tools.grunt.Grunt
>> - ERROR 1000: Error during parsing. could not instantiate
>> 'InvokeForString' with arguments '[tv.notube.TwitterExtractor.urls,
>> String]'
>> Details at logfile: /home/danbri/twitter/pig_1298987430385.log
>> ...->
>> Caused by: java.lang.reflect.InvocationTargetException
>> Caused by: java.lang.ClassNotFoundException: tv.notube.TwitterExtractor
>>
>> I checked that Pig is finding the jar by mis-spelling the filename in
>> the "REGISTER" line (which as expected causes things to fail earlier).
>> I also double-checked that the class is in the jar:
>> danbri$ jar -tvf
>> twitter-text-1.3.1-plus-tv.notube.TwitterExtractor.jar | grep tv
>>     0 Tue Mar 01 12:03:04 CET 2011 tv/
>>     0 Tue Mar 01 12:03:04 CET 2011 tv/notube/
>>  1114 Tue Mar 01 13:40:30 CET 2011 tv/notube/TwitterExtractor.class
>>
>> ...so I'm finding myself stuck. I'm sure the answer is staring me in
>> the face, but I can't see it. Perhaps I should just do things properly
>> with "extends EvalFunc<String>" and return the tuples separately
>> anyway...
>>
>> Thanks for any pointers,
>>
>> Dan
>>
>>
>> package tv.notube;
>> import com.twitter.Extractor;
>> import java.util.List;
>> class TwitterExtractor {
>>
>>   public static void main(String[] args) {
>>     String in = args[0];
>>     System.out.println("URLs: " + urls(in));
>>   }
>>
>>   public static String urls(String tweet) {
>>     Extractor ex = new Extractor();
>>     List<String> urls = ex.extractURLs(tweet);
>>     return urls.toString();
>>   }
>> }
>
>
