Hive version: 0.10.0

hive> from testpoker select transform(ldate,ltime,threadid,gameid,userid,pid,roundbet,fold,allin,cardtype,cards,chipwon) using 'calcpoker.py' as ldate,gameid,userid,pid,win,fold,allin,cardtype,cards;
03/13/13 1009 185690475 8639 0 1 0 -1 NULL NULL NULL NULL NULL NULL NULL NULL NULL
03/13/13 1009 187270278 92030 0 1 0 -1 NULL NULL NULL NULL NULL NULL NULL NULL NULL
03/13/13 1009 184151687 8639 0 1 0 -1 NULL NULL NULL NULL NULL NULL NULL NULL NULL
03/13/13 1009 186012530 8593 1 0 1 7 8|21|16|42|39 NULL NULL NULL NULL NULL NULL NULL NULL
03/13/13 1009 180286243 92041 0 1 0 -1 NULL NULL NULL NULL NULL NULL NULL NULL NULL

The last 8 NULLs in each row are data I did not expect. I don't know where they come from or how to get rid of them. Please give me some advice.

Thanks.
Andy

-----
python file:

[hbase@h46 hive-0.10.0]$ cat calcpoker.py
#!/usr/bin/env python
# coding:utf8
import sys
import datetime

def calcwin():
    for line in sys.stdin:
        #line = line.strip()
        (ldate,ltime,threadid,gameid,userid,pid,roundbet,fold,allin,cardtype,cards,chipwon)=line.strip().split()
        win = '0'
        if fold=='1':
            print '%s %s %s %s %s %s %s %s %s'%(ldate,gameid,userid,pid,win,fold,allin,cardtype,cards)
            continue
        cw = []
        if chipwon == "NULL":
            print '%s %s %s %s %s %s %s %s %s'%(ldate,gameid,userid,pid,win,fold,allin,cardtype,cards)
            continue
        #print "userid win ",userid
        cw=chipwon.split('|')
        chipwonv=0
        roundbetv=int(roundbet)
        for v in cw:
            chipwonv += int(v.split(':')[1])
        #print "chipwonv:%d,roundbet:%d"%(chipwonv,roundbetv)
        if chipwonv > roundbetv:
            win = '1'
        #print ' '.join([ldate,ltime,threadid,gameid,userid,pid,roundbet,fold,allin,cardtype,cards,chipwon])
        print ' '.join([ldate,gameid,userid,pid,win,fold,allin,cardtype,cards])
        #print '%s %s %s %s %s %s %s %s %s'%(ldate,gameid,userid,pid,win,fold,allin,cardtype,cards)

calcwin()

When I test the script outside of MapReduce, it works fine:

[hbase@h46 hive-0.10.0]$ ./calcpoker.py
03/13/13 14:59:51 00000ab4 1009 185690475 8639 240 1 0 -1 NULL NULL
03/13/13 14:59:51 00000cb4 1009 187270278 92030 600 1 0 -1 NULL NULL
03/13/13 14:59:52 000003d8 1009 184151687 8639 600 1 0 -1 NULL NULL
03/13/13 14:59:52 00000ba8 1009 186012530 8593 154135 0 1 7 8|21|16|42|39 0:73250|1:60500|2:100135
03/13/13 14:59:52 00000a88 1009 180286243 92041 100 1 0 -1 NULL NULL
03/13/13 1009 185690475 8639 0 1 0 -1 NULL
03/13/13 1009 187270278 92030 0 1 0 -1 NULL
03/13/13 1009 184151687 8639 0 1 0 -1 NULL
03/13/13 1009 186012530 8593 1 0 1 7 8|21|16|42|39
03/13/13 1009 180286243 92041 0 1 0 -1 NULL

The first five lines are the source data on HDFS. The last five lines are the result that calcpoker.py printed.
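
One likely cause, assuming the query relies on Hive's default TRANSFORM row format: Hive splits each line the script writes on tab characters, so a space-joined line would land entirely in the first column (ldate) and the remaining eight declared columns would come back as NULL. Below is a minimal sketch of the same win/fold logic that emits tab-separated fields instead. The names `format_row` and `main` are illustrative and not part of the original script; the input is still split on any whitespace, as in calcpoker.py, and the blank-line check is an extra assumption.

```python
#!/usr/bin/env python
# coding: utf8
# Sketch only: same win/fold logic as calcpoker.py, but the output fields
# are joined with tabs, which Hive's default TRANSFORM row format expects.
import sys


def format_row(line):
    """Turn one 12-field input line into one 9-field tab-separated row."""
    (ldate, ltime, threadid, gameid, userid, pid,
     roundbet, fold, allin, cardtype, cards, chipwon) = line.strip().split()
    win = '0'
    # A player can only win if they did not fold and chipwon has data.
    if fold != '1' and chipwon != 'NULL':
        # chipwon looks like "0:73250|1:60500|2:100135"; sum the amounts.
        total = sum(int(part.split(':')[1]) for part in chipwon.split('|'))
        if total > int(roundbet):
            win = '1'
    return '\t'.join([ldate, gameid, userid, pid, win,
                      fold, allin, cardtype, cards])


def main():
    for line in sys.stdin:
        if line.strip():  # skip blank lines (not handled in the original)
            sys.stdout.write(format_row(line) + '\n')


if __name__ == '__main__':
    main()
```

If this is indeed the cause, each of the nine fields should then map to its own column in the `as` clause rather than all of them ending up in ldate.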