Re: Pig FILTER with INDEXOF not working

2011-04-22 Thread Aniket Mokashi
I think the fix is to change tuple.set(0, new DataByteArray(url)); to tuple.set(0, url); Thanks, Aniket On Fri, April 22, 2011 8:30 pm, Steve Watt wrote: > Richard, if you're coming to OSCON or Hadoop Summit, please let me know > so I can buy you a beer. Thanks for the help. This now works with the >

Re: Pig FILTER with INDEXOF not working

2011-04-22 Thread Dmitriy Ryaboy
If the expected return type of your loader is (String, String, String), you should just put Strings into the tuple (no conversion to DataByteArrays) and report your schema to Pig via an implementation of LoadMetadata.getSchema(). D On Fri, Apr 22, 2011 at 5:30 PM, Steve Watt wrote: > Richard, if
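A minimal sketch of why the field type matters here, assuming (this is not confirmed against the Pig source) that a string function casts its first argument to String and returns null when the cast fails. ByteArrayField and indexOf below are hypothetical stand-ins written for illustration; they are not Pig's DataByteArray or its INDEXOF implementation.

```java
import java.nio.charset.StandardCharsets;

public class IndexOfCastSketch {

    // Hypothetical stand-in for a byte-array wrapper such as DataByteArray:
    // it carries the same characters as the String, but it is not a String.
    static final class ByteArrayField {
        private final byte[] bytes;

        ByteArrayField(String s) {
            this.bytes = s.getBytes(StandardCharsets.UTF_8);
        }
    }

    // Hypothetical stand-in for a string function that assumes a String field.
    // Assumption: a failed cast is swallowed and reported as null.
    static Integer indexOf(Object field, String search) {
        try {
            String str = (String) field; // ClassCastException for ByteArrayField
            return str.indexOf(search);
        } catch (RuntimeException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String url = "http://www.yahoo.com/news";

        // Wrapped field: the cast fails, so the result is null.
        System.out.println(indexOf(new ByteArrayField(url), "yahoo")); // null

        // Plain String field: the search succeeds.
        System.out.println(indexOf(url, "yahoo")); // 11
    }
}
```

If the function does behave this way, a comparison such as (null >= 0) would not evaluate to true, so a FILTER would drop every record, which would match the symptom described in this thread.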

Re: Pig FILTER with INDEXOF not working

2011-04-22 Thread Steve Watt
Richard, if you're coming to OSCON or Hadoop Summit, please let me know so I can buy you a beer. Thanks for the help. This now works with the excite log using PigStorage(); It is however still not working with my custom LoadFunc and data. For reference, I am using Pig 0.8. I have written a cus

Re: Looking up two fields in a relation with another relation

2011-04-22 Thread Mridul Muralidharan
Hi Daniel, I did test to see that it was fixed, and the description (as in the jira) did not directly seem to apply to this issue (when I did a cursory search) - hence the query. Since the columns were getting re-aliased (and after a join in one case), I was not expecting initial aliase

Re: Pig FILTER with INDEXOF not working

2011-04-22 Thread Richard Ding
raw = LOAD 'tutorial/excite.log' USING PigStorage('\t') AS (user, time, query:chararray);
queries = FILTER raw BY (INDEXOF(query,'yahoo') >= 0);
dump queries;
On 4/22/11 2:25 PM, "Steve Watt" wrote: Hi Folks I've done a load of a dataset and I am attempting to filter out unwanted records by c
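For readers less familiar with the idiom, the filter condition in the script above can be sketched in plain Java: keep a row only when the index of the search string in the query field is zero or greater. This is an illustration of the predicate only, not of how Pig executes it; the class name, the filter helper, and the sample rows are made up for the example and are not taken from excite.log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilterByIndexOfSketch {

    // Keep tab-separated rows whose third field (the query) contains the needle.
    // indexOf returns -1 when the needle is absent, so ">= 0" means "contains".
    static List<String> filter(List<String> rows, String needle) {
        List<String> kept = new ArrayList<>();
        for (String row : rows) {
            String[] fields = row.split("\t", -1);
            if (fields.length >= 3 && fields[2].indexOf(needle) >= 0) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample rows in (user, time, query) order.
        List<String> rows = Arrays.asList(
                "u1\t970916105432\tyahoo mail",
                "u2\t970916105501\tweather forecast",
                "u3\t970916105533\twww.yahoo.com");

        for (String row : filter(rows, "yahoo")) {
            System.out.println(row);
        }
    }
}
```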

Pig FILTER with INDEXOF not working

2011-04-22 Thread Steve Watt
Hi Folks I've done a load of a dataset and I am attempting to filter out unwanted records by checking that one of my tuple fields contains a particular string. I've distilled this issue down to the sample excite.log that ships with Pig for easy recreation. I've read through the INDEXOF code and I

Re: Looking up two fields in a relation with another relation

2011-04-22 Thread Daniel Dai
Hi, Mridul, Sorry I was confused when you said "alias re-use" :). PIG-1705 happens if the same column is eventually used twice in a relation. Here in z {m::k, m::v, y::aa, y::data}, both m::k and y::aa can be traced back to m.k. I did try PIG-1705 and verified that is the cause. The patch is n

Re: Getting the total Mapred time

2011-04-22 Thread Dmitriy Ryaboy
I may be misunderstanding what you are asking. The tricky part is measuring MR time *without* wait time, which one cannot control (it depends mostly on the size and utilization level of your cluster). This tricky bit is what PigStats helps you with. If you just want to measure the full time, includ

Re: Question about bags and UDFs

2011-04-22 Thread Mark Laczin
Follow-up question: how do you add it to the cache in a Pig script, and once it's in there can you access it from the UDF using regular Java file I/O? That is, is it as simple as saying: copyFromLocal $localFilePath udfFile.txt DEFINE someudf org.someudf CACHE('udfFile.txt#udfFile.txt'); And the

Re: Question about bags and UDFs

2011-04-22 Thread Mark Laczin
I think I may have to go with your second option - but thanks for the info, I'll keep an eye on 0.9.0. On Thu, Apr 21, 2011 at 4:16 PM, Alan Gates wrote: > Starting with Pig 0.9 (not yet released but you can build it off the > branch) a UDF can specify a file to put in the distributed cache. Yo

Re: Looking up two fields in a relation with another relation

2011-04-22 Thread Mridul Muralidharan
Alias vs relation difference. The bug is about an alias issue, not a relation, iirc. Everything comes from a limited number of relations which are loaded anyway :-) - Mridul On Friday 22 April 2011 06:40 AM, Jianyong Dai wrote: m is actually reused. z is joining two relations both stemming from m.