I don't think there is a better way to implement your query using the standard SQL/Hive.
A python reducer (or a java UDF) is the way to go. I don't think clustering would help since there is no way to specify what you want in HiveQL alone. igor decide.com On Thu, Jan 26, 2012 at 3:23 AM, <mdefoinplatel....@orange.com> wrote: > Dear all,**** > > I am struggling with a Hive query where I am trying to get the last value > for a column.**** > > ** ** > > Let say I have a table T with three columns: user_id, time, colour and I > want to know for each user_id what is its last colour value.**** > > ** ** > > At the moment I am using the following (naïve) query:**** > > ** ** > > SELECT T1.user_id, T1.lastTime, T2.colour FROM (**** > > SELECT user_id, max(Time) AS lastTime FROM T GROUP BY > user_id ) T1**** > > JOIN T T2 ON (T1.user_id=T2.user_id AND T2.time=T1.lastTime)* > *** > > **** > > I am not happy with this query of course since I have to read the table T > twice. **** > > ** ** > > So I have three questions:**** > > **1) **Is there a way to re-write this query in a more efficient way > ?**** > > ** ** > > **2) **Wouldn’t that be better to stream the output of a > distributed by on user_id into a python script and print out only one row > per user_id (assuming sorted by user_id and time) ?**** > > ** ** > > ** ** > > **3) **Couldn’t I create the table T with cluster by on user_id and > sorted by time at the first place and then use this structure to extract > the rows I am interested in ?**** > > ** ** > > ** ** > > Many thanks for your help,**** > > Michael**** > > _________________________________________________________________________________________________________________________ > > Ce message et ses pieces jointes peuvent contenir des informations > confidentielles ou privilegiees et ne doivent donc > pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu > ce message par erreur, veuillez le signaler > a l'expediteur et le detruire ainsi que les pieces jointes. Les messages > electroniques etant susceptibles d'alteration, > France Telecom - Orange decline toute responsabilite si ce message a ete > altere, deforme ou falsifie. Merci > > This message and its attachments may contain confidential or privileged > information that may be protected by law; > they should not be distributed, used or copied without authorization. > If you have received this email in error, please notify the sender and delete > this message and its attachments. > As emails may be altered, France Telecom - Orange shall not be liable if this > message was modified, changed or falsified. > Thank you. > >