Re: Load Hive query result with array field into pig

2014-03-21 Thread Jeff Storey
Sorry for another post on this thread. I had an error in my Pig script: it used the wrong Unicode character to split on. Using STRSPLIT worked well. On Fri, Mar 21, 2014 at 8:46 AM, Jeff Storey wrote: > Correction - it looks like the query uses \u002 to separate array elements > and \u
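
The splitting step described in this message can be illustrated outside Pig. The following is a minimal Python sketch, not the poster's script. It assumes the file was written with Hive's default text delimiters (\u0001 between fields, \u0002 between array elements); the snippet above is cut off before it names the second delimiter, so treat both constants as assumptions. The function name and the sample line are hypothetical.

```python
# Assumption: Hive default text delimiters. Adjust if the table or query
# declares a different ROW FORMAT.
FIELD_SEP = "\u0001"  # separates top-level fields (assumed)
ITEM_SEP = "\u0002"   # separates elements inside an array field (assumed)


def parse_hive_line(line, array_columns=(0,)):
    """Split one line of Hive text output into fields.

    Columns whose index is in array_columns are further split into
    a list of elements; all other columns are kept as plain strings.
    """
    fields = line.rstrip("\n").split(FIELD_SEP)
    return [
        field.split(ITEM_SEP) if index in array_columns else field
        for index, field in enumerate(fields)
    ]


# Hypothetical sample row: an array of three elements, then a string field.
sample = "element1\u0002element2\u0002element3\u0001anotherfield\n"
parsed = parse_hive_line(sample)
print(parsed)
```

In Pig, the analogous step is to load the array column as a chararray and apply STRSPLIT with the element delimiter, which is what the message reports as working.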

Re: Load Hive query result with array field into pig

2014-03-21 Thread Jeff Storey
,element2,element3)anotherfield This loads properly when I use LOAD '/my/tsvfile' USING PigStorage('\t') AS (elements:tuple(),afield:chararray); On Fri, Mar 21, 2014 at 8:38 AM, Jeff Storey wrote: > I'm executing a Hive query in which one of the fields is an array and

Load Hive query result with array field into pig

2014-03-21 Thread Jeff Storey
I'm executing a Hive query in which one of the fields is an array and writing it to a file using: INSERT OVERWRITE '/path/to/output' SELECT ... This query works well. I would like to load this data into Pig, but I'm not quite sure how to get the array properly into Pig. My output file from the query do

Re: Improving self join time

2014-03-20 Thread Jeff Storey
stop there though? > Doesn't the outer query fetch the ids of the tags that the inner query > identified? > > > > On Thu, Mar 20, 2014 at 9:54 AM, Jeff Storey wrote: > >> I don't think this quite fits here. I think the inner query will give me >> a

Re: Improving self join time

2014-03-20 Thread Jeff Storey
values
> select
>   count(*) as cnt,
>   value
> from
>   foo
> group by
>   value
> having
>   count(*) > 1
> ) z
> join foo a on (a.value = z.value)
> ;
>
> table foo is your table elements
>
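
The quoted suggestion (aggregate first to find values that occur more than once, then join back to the table to fetch the matching ids) can be tried on a toy table. This is a minimal sketch using Python's built-in sqlite3 as a stand-in for Hive, so it says nothing about Hive's performance. The sample rows and the outer select list are assumptions, since the start of the quoted query is cut off in this snippet.

```python
import sqlite3

# In-memory stand-in for the table; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "b"), (6, "a")],
)

# Inner query: values that appear more than once.
# Outer join: fetch every id that carries one of those values.
# The outer select list is assumed; the original is truncated.
query = """
SELECT a.id, a.value, z.cnt
FROM (
    SELECT COUNT(*) AS cnt, value
    FROM foo
    GROUP BY value
    HAVING COUNT(*) > 1
) z
JOIN foo a ON (a.value = z.value)
ORDER BY a.value, a.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The point of the pre-aggregation is that the join only touches values known to be duplicated, rather than pairing every row with every other row that shares its value.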

Improving self join time

2014-03-20 Thread Jeff Storey
I have a table with 10 million rows and 2 columns - id (int) and element (string). I am trying to do a self join that finds any ids where the element values are the same, and my query looks like: select e1.id, e1.tag, e2.id as id2, e2.tag as tag2 from elements e1 JOIN elements e2 on e1.element = e