Sorry for another post on this thread. I had an error in my pigscript that
had the wrong unicode character to split on. Using STRSPLIT worked well.
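For anyone who hits the same thing: Hive's default separator between array elements is the non-printing character \u0002 (Ctrl-B), which is easy to confuse with \u0001 (Ctrl-A, the default field separator). A quick Python sketch of the splitting logic — the sample value below is made up, not from the original script:

```python
# Sketch of the delimiter issue: Hive's default separator for
# array elements is the non-printing character \u0002 (Ctrl-B).
# The sample value below is made up.
raw = "element1\u0002element2\u0002element3"

# Splitting on the wrong character returns the whole string unchanged;
# splitting on \u0002 recovers the array items.
wrong = raw.split("\u0001")
right = raw.split("\u0002")

print(wrong)  # one unsplit chunk
print(right)  # ['element1', 'element2', 'element3']
```

In Pig the equivalent split should be something like STRSPLIT(field, '\\u0002'), which matches what ended up working here.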
On Fri, Mar 21, 2014 at 8:46 AM, Jeff Storey wrote:
> Correction - it looks like the query uses \u002 to separate array elements
> and \u
,element2,element3)anotherfield
This loads properly when I use LOAD '/my/tsvfile' USING PigStorage('\t') AS
(elements:tuple(),afield:chararray);
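In case it helps anyone else reading the same layout: a rough Python equivalent of what that LOAD line does, assuming the tuple field is serialized as (a,b,c) followed by a tab-separated second field — the sample line is made up:

```python
# Rough Python equivalent of
#   LOAD ... USING PigStorage('\t') AS (elements:tuple(), afield:chararray);
# assuming the tuple is serialized as (a,b,c). The sample line is made up.
line = "(element1,element2,element3)\tanotherfield"

# PigStorage('\t') splits fields on tabs; the tuple() schema then
# interprets the parenthesized, comma-separated first field.
tuple_text, afield = line.split("\t")
elements = tuple(tuple_text.strip("()").split(","))

print(elements)  # ('element1', 'element2', 'element3')
print(afield)    # anotherfield
```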
On Fri, Mar 21, 2014 at 8:38 AM, Jeff Storey wrote:
> I'm executing a hive query in which one of the fields is an array and
I'm executing a hive query in which one of the fields is an array, and I'm
writing it to a file using:
INSERT OVERWRITE DIRECTORY '/path/to/output' SELECT ...
This query works well. I would like to load this data into pig, but I'm not
quite sure how to get the array properly into pig.
My output file from the query do
stop there though?
> Doesn't the outer query fetch the ids of the tags that the inner query
> identified?
>
> On Thu, Mar 20, 2014 at 9:54 AM, Jeff Storey wrote:
>
>> I don't think this quite fits here... I think the inner query will give me a
values
> select
>   count(*) as cnt,
>   value
> from foo
> group by value
> having count(*) > 1
> ) z
> join foo a on (a.value = z.value)
> ;
>
> (table foo stands for your table, elements)
>
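The suggested query is plain SQL apart from the table name, so it can be sanity-checked outside Hive. A sqlite3 stand-in — the sample rows are made up, and the outer select list is an assumption, since the start of the quoted query was cut off:

```python
import sqlite3

# sqlite3 stand-in for the suggested Hive query. The outer SELECT is
# assumed (the start of the quoted query was cut off); the sample
# rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("create table foo (id integer, value text)")
conn.executemany("insert into foo values (?, ?)",
                 [(1, "a"), (2, "b"), (3, "a"), (4, "c")])

rows = conn.execute("""
    select a.id, z.value
    from (
        select count(*) as cnt, value
        from foo
        group by value
        having count(*) > 1
    ) z
    join foo a on (a.value = z.value)
    order by a.id
""").fetchall()

# Only ids whose value occurs more than once come back.
print(rows)  # [(1, 'a'), (3, 'a')]
```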
I have a table with 10 million rows and 2 columns - id (int) and element
(string). I am trying to do a self join that finds any ids where the
element values are the same, and my query looks like:
select e1.id, e1.element, e2.id as id2, e2.element as element2 from elements
e1 JOIN elements e2 on e1.element = e2.element
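For what it's worth, the same self join can be tried in sqlite3 (sample rows made up). Note the e1.id < e2.id filter is my addition, not part of the quoted query: without it every row also matches itself, and each pair comes back twice.

```python
import sqlite3

# sqlite3 stand-in for the self join; column names follow the table
# described above (id int, element text). Sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("create table elements (id integer, element text)")
conn.executemany("insert into elements values (?, ?)",
                 [(1, "a"), (2, "b"), (3, "a")])

rows = conn.execute("""
    select e1.id, e1.element, e2.id as id2, e2.element as element2
    from elements e1
    join elements e2 on e1.element = e2.element
    where e1.id < e2.id   -- my addition: drop self-matches and mirrored pairs
""").fetchall()

print(rows)  # [(1, 'a', 3, 'a')]
```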