We are considering whether Hive is the best choice for "sessionizing" a set of data given the following parameters:
* Input data set: A series of records with userID, startTimestamp, endTimestamp, recordType, etc.
* Output data set: The same records (no aggregation) with an added SessionId. The SessionId is based on the time difference between the endTimestamp of the previous record and the startTimestamp of the current record, plus other criteria of the type current.recordType = previous.recordType.

As long as a series of consecutive records meets the criteria for sessionization, they would all have the same SessionId appended to each record.

In brief, based on my analysis this problem appears better suited to MapReduce using Java, but I would be interested in hearing from those with more experience in this area.

J. B. Rawlings
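
To make the rule above concrete, here is a minimal single-machine sketch in Python of the logic as described. It is only an illustration under stated assumptions, not a Hive or MapReduce implementation: the field names, the 30-minute gap threshold (`max_gap_seconds`), the integer timestamps, and the `userID-counter` format of the session id are all hypothetical choices that the description does not specify.

```python
from dataclasses import dataclass, replace
from itertools import groupby
from operator import attrgetter
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    user_id: str
    start_ts: int          # assumed: epoch seconds
    end_ts: int            # assumed: epoch seconds
    record_type: str
    session_id: Optional[str] = None


def sessionize(records: List[Record], max_gap_seconds: int = 1800) -> List[Record]:
    """Return the same records (no aggregation) with session_id filled in.

    A record continues the previous record's session when both hold:
      * start_ts - previous.end_ts <= max_gap_seconds (assumed threshold)
      * record_type == previous.record_type
    Otherwise it opens a new session for that user.
    """
    # Order by user, then by start time, so each user's records are contiguous.
    ordered = sorted(records, key=attrgetter("user_id", "start_ts"))
    result = []
    for user_id, group in groupby(ordered, key=attrgetter("user_id")):
        counter = 0
        prev = None
        for rec in group:
            if prev is not None:
                within_gap = rec.start_ts - prev.end_ts <= max_gap_seconds
                same_type = rec.record_type == prev.record_type
                if not (within_gap and same_type):
                    counter += 1  # criteria broken: start a new session
            result.append(replace(rec, session_id=f"{user_id}-{counter}"))
            prev = rec
    return result


if __name__ == "__main__":
    sample = [
        Record("u1", 0, 10, "a"),
        Record("u1", 20, 30, "a"),
        Record("u1", 5000, 5010, "a"),
        Record("u1", 5020, 5030, "b"),
        Record("u2", 0, 5, "a"),
    ]
    for rec in sessionize(sample):
        print(rec)
```

The inner loop is a sequential scan over one user's records in start-time order. In a MapReduce job, the same scan could plausibly run inside a reducer keyed on userID, provided the values arrive sorted by start timestamp.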