We are considering whether Hive is the best choice for "sessionizing" a set of 
data given the following parameters:

* Input data set: a series of records with userID, startTimestamp, 
endTimestamp, recordType, etc.

* Output data set: the same records (no aggregation) with an added 
SessionId. The SessionId is based on the time difference between the 
endTimestamp of the previous record and the startTimestamp of the current 
record, plus other criteria of the form 
current.recordType = previous.recordType. As long as consecutive records meet 
the sessionization criteria, they all have the same SessionId appended.
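
To make the rule above concrete, here is a minimal sketch in Python of the per-user logic. It is an illustration under stated assumptions, not a Hive or MapReduce implementation: the 30-minute gap threshold, the field names, and the "userID-counter" SessionId format are all hypothetical and do not come from the requirements above. The inner loop is roughly what a reducer would do if records were grouped by userID and sorted by start time.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter

# Assumed threshold (30 minutes); the real cutoff is not stated in the post.
MAX_GAP_SECONDS = 30 * 60


@dataclass
class Record:
    user_id: str
    start_ts: int        # epoch seconds (assumed representation)
    end_ts: int          # epoch seconds (assumed representation)
    record_type: str
    session_id: str = ""


def sessionize(records, max_gap=MAX_GAP_SECONDS):
    """Return the records ordered by user and start time, each tagged
    with a session_id. Note: the input Record objects are updated in place."""
    ordered = sorted(records, key=attrgetter("user_id", "start_ts"))
    out = []
    for user_id, group in groupby(ordered, key=attrgetter("user_id")):
        session_no = 0
        prev = None
        for rec in group:
            # Start a new session on the first record for a user, when the
            # gap since the previous record's end is too large, or when the
            # record type changes.
            if (prev is None
                    or rec.start_ts - prev.end_ts > max_gap
                    or rec.record_type != prev.record_type):
                session_no += 1
            rec.session_id = f"{user_id}-{session_no}"  # hypothetical ID format
            out.append(rec)
            prev = rec
    return out
```

The point of the sketch is that each record's SessionId depends on the record before it, so the data has to be processed in order within each user rather than row by row in isolation.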

Briefly, based on my analysis, it appears that this problem would be better 
suited to MapReduce using Java, but I would be interested in hearing from those 
with more experience in this area.

J. B. Rawlings
