You could use collect_set() and GROUP BY. That wouldn't preserve order, though, and because it returns a set, repeated visits to the same page would collapse into one entry.
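
To make the trade-off concrete, here is a rough sketch in plain Python (not HiveQL) that mimics what a set aggregate per user gives you, next to what an order-preserving version would need. The sample rows, the integer visit times, and both function names are made up for illustration; this is not how Hive implements collect_set(), only a model of the result.

```python
from collections import defaultdict

# Hypothetical sample rows: (user_id, visit_time, current_page, previous_page).
# Integer visit times stand in for user_visiting_time.
ROWS = [
    ("user1", 3, "page3", "page2"),
    ("user1", 1, "page1", "page0"),
    ("user2", 2, "page5", "page4"),
    ("user1", 2, "page2", "page1"),
    ("user2", 1, "page4", "page0"),
]


def pages_as_set(rows):
    """Model GROUP BY user_id with a set aggregate: no order, no duplicates."""
    grouped = defaultdict(set)
    for user_id, _visit_time, current_page, _previous_page in rows:
        grouped[user_id].add(current_page)
    return dict(grouped)


def pages_in_visit_order(rows):
    """Sort by (user, visit time) first, then collect into a list per user."""
    grouped = defaultdict(list)
    for user_id, _visit_time, current_page, _previous_page in sorted(
        rows, key=lambda row: (row[0], row[1])
    ):
        grouped[user_id].append(current_page)
    return dict(grouped)


if __name__ == "__main__":
    print(pages_as_set(ROWS))
    print(pages_in_visit_order(ROWS))
```

The second function only gets the path right because the rows are sorted by visit time before being collected, which is the step a plain set aggregate does not give you.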
Phil.

On Oct 31, 2012 9:18 PM, "qiaoresearcher" <qiaoresearc...@gmail.com> wrote:
> Hi all,
>
> here is the question. Assume we have a table like:
>
> ------------------------------------------------------------------------------------
> user_id || user_visiting_time || user_current_web_page || user_previous_web_page
> user 1     time (1,1)            page 1                   page 0
> user 1     time (1,2)            page 2                   page 1
> user 1     time (1,3)            page 3                   page 2
> .....      ......                ....                     ....
> user n     time (n,1)            page 1                   page 0
> user n     time (n,2)            page 2                   page 1
> user n     time (n,3)            page 3                   page 2
>
> that is, in each row, we know the current web page that user is viewing,
> and we know the previous web page the user coming from
>
> now we want to generate a list for each user that recorded the complete
> path the user is taking:
> i.e., how can we use hive to generate output like:
>
> ------------------------------------------------------------------------------------
> user 1 : page 1 page 2 page 3 page 4 .......... (till reach the beginning page of user 1)
> user 2: page 1 page 2 page 3 page 4 page 5 ....... ( till reach the beginning page of user 2)
> the web pages viewed by user 1 and user 2 might be different.
>
> can we generate this using hive?
>
> thanks,