I think the date actually does matter in this case. Can you think of this problem as a regression problem? Given the two coordinates, date and price, we are trying to find a minimal number of line segments that best fit the data. What I mean by "best fit" is exactly the criterion you are looking for: the line segments are chosen so that the prices in each date range stay close to the average price of that range (the average is the line segment representing the date range). A problem called sequence segmentation might be of use to you. See this paper for a detailed discussion of a DP solution:
http://scholar.google.com.tr/scholar?cluster=806851568786670722&hl=tr&as_sdt=2000

On Nov 10, 5:03 pm, leoV <[email protected]> wrote:
> The dates are arbitrary subsets, not necessarily consecutive. You can
> see it as an initial set with dates and prices, that need to be
> published in a limited number of columns, with from-until date. The
> offset from the individual price values referenced to the calculated
> average of a period must be minimized.
>
> leo
>
> On 9 nov, 20:59, Gene <[email protected]> wrote:
>
> > Should the dates in the classes be consecutive? Or are the classes
> > arbitrary subsets?
>
> > On Nov 9, 11:04 am, leoV <[email protected]> wrote:
>
> > > I need to classify a number (1000 to 10000) of key-value pairs in a
> > > maximum number of classes (usually 25 to 50) in such a way that the
> > > deviations to the average of each class are minimized.
> > > The keys are unique dates and the value is a decimal numeric value.
> > > Any hints or samples or references to methods?
>
> > > thx
>
> > > leo

--
You received this message because you are subscribed to the Google Groups "Algorithm Geeks" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en.
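
P.S. To make the DP idea concrete, here is a minimal sketch of sequence segmentation in Python. It is not taken from the paper; it rests on assumptions of mine: each class is a contiguous date range, "deviation" means the sum of squared differences from the class average, and the names segment_prices and publish_columns are made up for illustration. It runs in O(k * n^2) time, so for 10000 points and 50 classes you would probably want a faster variant or a compiled language.

```python
from datetime import date


def segment_prices(prices, k):
    """Split `prices` (already ordered by date) into k contiguous segments,
    minimizing the total squared deviation from each segment's mean.
    Assumption: squared error is the deviation measure.
    Returns (total_cost, [(start, end), ...]) with `end` exclusive."""
    n = len(prices)
    if n == 0:
        return 0.0, []
    k = min(k, n)  # more segments than points is never useful

    # Prefix sums of p and p^2 give the cost of any segment in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, p in enumerate(prices):
        s1[i + 1] = s1[i] + p
        s2[i + 1] = s2[i] + p * p

    def cost(i, j):
        # Sum of squared deviations from the mean of prices[i:j].
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[m][j]: minimal cost of covering prices[:j] with exactly m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = best[m - 1][i] + cost(i, j)
                if c < best[m][j]:
                    best[m][j] = c
                    cut[m][j] = i

    # Extra segments never increase the cost, so exactly k is optimal here.
    bounds = []
    j = n
    for m in range(k, 0, -1):
        i = cut[m][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    return best[k][n], bounds


def publish_columns(pairs, k):
    """Turn (date, price) pairs into (from_date, until_date, average) rows."""
    pairs = sorted(pairs)  # keys are unique dates, so this orders by date
    prices = [p for _, p in pairs]
    _, bounds = segment_prices(prices, k)
    return [
        (pairs[i][0], pairs[j - 1][0], sum(prices[i:j]) / (j - i))
        for i, j in bounds
    ]


# Made-up sample data: six days, two obvious price levels.
sample = [(date(2011, 11, d), 10.0 if d <= 3 else 20.0) for d in range(1, 7)]
for row in publish_columns(sample, 2):
    print(row)
```

Each returned row is one published column: the first and last date of the range plus the average price over it. If absolute rather than squared deviation is what you need, only cost() has to change (it would use the segment median, and the prefix-sum trick no longer applies).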
