It wouldn't retrieve the user's path in a single string, but you could simply select the user id and current page, ordered by the timestamp.
It would require a second step to turn it into the single string path, so that might be a deal-breaker.

--Tom

On Wed, Oct 31, 2012 at 3:32 PM, Philip Tromans <philip.j.trom...@gmail.com> wrote:
> You could use collect_set() and GROUP BY. That wouldn't preserve order
> though.
>
> Phil.
>
> On Oct 31, 2012 9:18 PM, "qiaoresearcher" <qiaoresearc...@gmail.com> wrote:
>>
>> Hi all,
>>
>> here is the question. Assume we have a table like:
>>
>> ------------------------------------------------------------------------------------
>> user_id || user_visiting_time || user_current_web_page || user_previous_web_page
>> user 1     time (1,1)            page 1                   page 0
>> user 1     time (1,2)            page 2                   page 1
>> user 1     time (1,3)            page 3                   page 2
>> .....      ......                ....                     ....
>> user n     time (n,1)            page 1                   page 0
>> user n     time (n,2)            page 2                   page 1
>> user n     time (n,3)            page 3                   page 2
>>
>> that is, in each row, we know the current web page that user is viewing,
>> and we know the previous web page the user coming from
>>
>> now we want to generate a list for each user that recorded the complete
>> path the user is taking:
>> i.e., how can we use hive to generate output like:
>>
>> ------------------------------------------------------------------------------------
>> user 1 : page 1 page 2 page 3 page 4 .......... (till reach the beginning page of user 1)
>> user 2: page 1 page 2 page 3 page 4 page 5 ....... ( till reach the beginning page of user 2)
>> the web pages viewed by user 1 and user 2 might be different.
>>
>> can we generate this using hive?
>>
>> thanks,
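
For what it's worth, the "second step" from the select-and-order suggestion could look roughly like the Python sketch below. It assumes the rows come back from Hive as (user id, visiting time, current page) tuples; the table name `visits` in the comment, the sample rows, and the helper name `build_paths` are all made up for illustration and are not from the original question.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows standing in for the result of a Hive query such as:
#   SELECT user_id, user_visiting_time, user_current_web_page
#   FROM visits            -- "visits" is an assumed table name
#   ORDER BY user_id, user_visiting_time;
rows = [
    ("user 1", 1, "page 1"),
    ("user 2", 1, "page 1"),
    ("user 1", 3, "page 3"),
    ("user 1", 2, "page 2"),
    ("user 2", 2, "page 5"),
]


def build_paths(rows):
    """Return {user_id: "page a page b ..."} with pages in visit-time order."""
    # Sort defensively by (user_id, visit_time) in case the rows arrive unordered.
    ordered = sorted(rows, key=itemgetter(0, 1))
    paths = {}
    # groupby only merges adjacent rows, which is why the sort above matters.
    for user_id, visits in groupby(ordered, key=itemgetter(0)):
        paths[user_id] = " ".join(page for _, _, page in visits)
    return paths


paths = build_paths(rows)
for user_id, path in paths.items():
    print(f"{user_id}: {path}")
```

Unlike collect_set(), this keeps repeated pages and preserves the visit order, at the cost of doing the concatenation outside Hive.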