If you really only need to consider adjacent rows, it might be easier to write a UDF or use streaming. In either case your code remembers the last record seen and, when the current record should be joined with it, emits a new combined record.
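Here is a minimal sketch of the streaming variant, not a tested production script. It assumes the rows arrive on stdin as tab-separated `id`, `name`, `offset`, already sorted by offset, and that "no more than 1 character apart" means the next offset is at most one position past the end of the previous name. The function name `merge_adjacent` and the `max_gap` parameter are made up for illustration.

```python
import sys


def merge_adjacent(rows, max_gap=1):
    """Yield (fullname, offset) for each pair of adjacent rows that belong together.

    rows is an iterable of (name, offset) tuples sorted by offset.
    Assumption: two rows are combined when the second one starts at most
    max_gap characters after the end of the first one.
    """
    prev = None  # the last record seen that has not been merged yet
    for name, offset in rows:
        if prev is not None:
            prev_name, prev_offset = prev
            gap = offset - (prev_offset + len(prev_name))
            if 0 <= gap <= max_gap:
                # Emit the combined record, keeping the offset of the first part.
                yield (prev_name + " " + name, prev_offset)
                prev = None  # both rows are consumed
                continue
        prev = (name, offset)


def parse(lines):
    """Turn tab-separated 'id, name, offset' lines into (name, offset) tuples."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        yield (fields[1], int(fields[2]))


if __name__ == "__main__":
    for fullname, offset in merge_adjacent(parse(sys.stdin)):
        print("%s\t%d" % (fullname, offset))
```

A script like this could be plugged in through Hive's TRANSFORM ... USING clause, provided all rows reach a single script instance in offset order. It emits one row per matched pair; collecting the offsets per full name into a list would be a separate aggregation step afterwards.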

On Sat, Feb 2, 2013 at 1:21 PM, Martijn van Leeuwen <icodesh...@gmail.com> wrote:

> Hi all,
>
> I am new to Apache Hive and I am doing some tests to see if it fits my needs.
> One of the questions I have is whether it is possible to "peek" at the next
> row in order to find out if the values should be combined. Let me explain
> with an example.
>
> Let's say my data looks like this:
>
> Id  name     offset
> 1   Jan      100
> 2   Janssen  104
> 3   Klaas    150
> 4   Jan      160
> 5   Janssen  164
>
> And my output to another table should be this:
>
> Id  fullname     offsets
> 1   Jan Janssen  [ 100, 160 ]
>
> I would like to combine the name values from two rows where the offsets of
> the two rows are no more than 1 character apart.
>
> Is this type of data manipulation possible, and if it is, could someone
> point me in the right direction, hopefully with some explanation?
>
> Kind regards
> Martijn

--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330