patches accepted :-)

D
On Tue, Mar 1, 2011 at 10:02 AM, Dan Brickley <[email protected]> wrote:
> On 1 March 2011 17:56, Dmitriy Ryaboy <[email protected]> wrote:
> > Hi Dan,
> > iirc, registering a jar does not put it on the Pig client classpath, it just
> > tells Pig to ship the jar. You want to put it on the PIG_CLASSPATH before
> > you invoke pig.
>
> Perfect, that was exactly it. It's running now :)
>
> Would it make sense for REGISTER to augment the classpath? Or maybe
> better, for the error message to mention the role of PIG_CLASSPATH?
>
> cheers,
>
> Dan
>
> > On Tue, Mar 1, 2011 at 5:57 AM, Dan Brickley <[email protected]> wrote:
> >>
> >> I'm trying to use InvokeForString to call a simple static method that
> >> wraps http://mzsanford.github.com/twitter-text-java/docs/api/index.html
> >> https://github.com/twitter/twitter-text-java ... specifically the
> >> Extractor class extractURLs method. In fact since the logical result
> >> is a list of URLs perhaps I should be writing proper Pig-centric
> >> wrapper that returns a tuple, but for now I thought a stringified list
> >> would be ok for my immediate purposes. That purpose being pulling out
> >> all the URLs from a corpus of tweets, so we can expand the bit.ly and
> >> other short urls...
> >>
> >> So - I built the extra class (src below) and packaged it inside the
> >> twitter-text jar, and verify it's in there and usable as follows:
> >>
> >> danbri$ java -cp
> >> twitter-text-1.3.1-plus-tv.notube.TwitterExtractor.jar
> >> tv.notube.TwitterExtractor "hello http://example.com/
> >> http://example.org/ world"
> >> URLs: [http://example.com/, http://example.org/]
> >>
> >> Then from the same directory, I try run this as a Pig job:
> >>
> >> tw06 = load '/user/danbri/twitter/tweets2009-06.tab.txt.lzo' AS (
> >> when: chararray, who: chararray, msg: chararray);
> >> REGISTER twitter-text-1.3.1-plus-tv.notube.TwitterExtractor.jar;
> >> DEFINE ExtractURLs InvokeForString('tv.notube.TwitterExtractor.urls',
> >> 'String');
> >> urls = FOREACH tw06 GENERATE ExtractURLs(msg);
> >> x = SAMPLE urls 0.001;
> >> dump x;
> >>
> >> ...but we don't get past InvokeForString,
> >>
> >> 2011-03-01 14:50:31,033 [main] ERROR org.apache.pig.tools.grunt.Grunt
> >> - ERROR 1000: Error during parsing. could not instantiate
> >> 'InvokeForString' with arguments '[tv.notube.TwitterExtractor.urls,
> >> String]'
> >> Details at logfile: /home/danbri/twitter/pig_1298987430385.log
> >> ...->
> >> Caused by: java.lang.reflect.InvocationTargetException
> >> Caused by: java.lang.ClassNotFoundException: tv.notube.TwitterExtractor
> >>
> >> I checked that Pig is finding the jar by mis-spelling the filename in
> >> the "REGISTER" line (which as expected causes things to fail earlier).
> >> Also double-check that the class is in the jar,
> >> danbri$ jar -tvf
> >> twitter-text-1.3.1-plus-tv.notube.TwitterExtractor.jar | grep tv
> >>      0 Tue Mar 01 12:03:04 CET 2011 tv/
> >>      0 Tue Mar 01 12:03:04 CET 2011 tv/notube/
> >>   1114 Tue Mar 01 13:40:30 CET 2011 tv/notube/TwitterExtractor.class
> >>
> >> ...so I'm finding myself stuck. I'm sure the answer is staring me in
> >> the face, but I can't see it.
> >> Perhaps I should just do things properly
> >> with "extends EvalFunc<String>" and return the tuples separately
> >> anyway...
> >>
> >> Thanks for any pointers,
> >>
> >> Dan
> >>
> >>
> >> package tv.notube;
> >> import com.twitter.Extractor;
> >> import java.util.List;
> >> class TwitterExtractor {
> >>
> >>     public static void main (String[] args) {
> >>         String in = args[0];
> >>         System.out.println("URLs: " + urls(in));
> >>     }
> >>
> >>     public static String urls(String tweet) {
> >>         Extractor ex = new Extractor();
> >>         List urls = ex.extractURLs(tweet);
> >>         String o = urls.toString();
> >>         return o;
> >>     }
> >> }
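
For anyone landing on this thread later, the fix Dmitriy describes (put the jar on PIG_CLASSPATH before invoking pig, in addition to the REGISTER line) might look roughly like the sketch below. The jar name is the one from the thread and is assumed to sit in the current directory; extract_urls.pig is a hypothetical file name for the script quoted above, not something from the original messages.

```shell
#!/bin/sh
# Jar name taken from the thread; assumed to be in the current directory.
JAR="$PWD/twitter-text-1.3.1-plus-tv.notube.TwitterExtractor.jar"

# Prepend the jar to PIG_CLASSPATH, keeping any entries that are already set.
PIG_CLASSPATH="$JAR${PIG_CLASSPATH:+:$PIG_CLASSPATH}"
export PIG_CLASSPATH

# extract_urls.pig is a hypothetical file holding the Pig script from the thread.
# It should keep its REGISTER line so the jar is still shipped with the job.
if command -v pig >/dev/null 2>&1 && [ -f extract_urls.pig ]; then
    pig extract_urls.pig
else
    # Without pig or the script available, just show what would be used.
    echo "PIG_CLASSPATH=$PIG_CLASSPATH"
fi
```

The REGISTER line and PIG_CLASSPATH do different jobs here: per the reply above, REGISTER only tells Pig to ship the jar, while PIG_CLASSPATH is what lets the client resolve tv.notube.TwitterExtractor when the DEFINE is parsed.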
