I have about 27,000 survey responses from about 150 bus routes, each with up to 
roughly 100 stops. I've recorded the total Ons and Offs at each stop on each bus 
run, as well as the stop pair (On stop, Off stop) that each survey response 
corresponds to.
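For one route and direction, the recorded stop pairs can be tallied into an On-by-Off count table. Only cells where the Off stop comes after the On stop are possible, so just the upper triangle (the "half table") is used. The sketch below assumes stops are indexed 0..n-1 in travel order. The function names are hypothetical.

```python
from collections import Counter

def build_half_table(responses, n_stops):
    """Count survey responses by (On stop, Off stop) for one route and direction.

    Assumes stops are indexed 0..n_stops-1 in travel order, so a valid trip has
    on < off. Cells with off <= on are structurally impossible and stay 0.
    """
    counts = Counter()
    for on, off in responses:
        if not (0 <= on < off < n_stops):
            raise ValueError(f"invalid stop pair: ({on}, {off})")
        counts[(on, off)] += 1
    table = [[0] * n_stops for _ in range(n_stops)]
    for (on, off), c in counts.items():
        table[on][off] = c
    return table

def empty_feasible_cells(table):
    """List the possible (on < off) stop pairs that received no responses."""
    n = len(table)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if table[i][j] == 0]
```

For example, `build_half_table([(0, 2), (0, 2), (1, 3)], 4)` puts 2 in cell (0, 2) and 1 in cell (1, 3). Four of the six possible pairs are left empty, which shows how quickly the table thins out.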

I want to create weights based on the On and Off stop for each line and 
direction. This produces a very sparse "half table" (responses by From/To stop) 
to rake. Is there a good "mechanical" method for combining Ons and Offs into 
groups to help "fill out the table"? I want to be sensitive to "distance 
travelled" when combining pairs. That is, when grouping Ons and Offs I want to 
minimize the range of "stations traversed" (really the range of time on the bus) 
within each group.
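One mechanical way to group On/Off cells by time on the bus is to sort them by ride time and close a group as soon as it holds a minimum number of responses. This sketch assumes a ride time in minutes is available for each stop pair, for example from the schedule. It is a greedy heuristic and does not guarantee the smallest possible range. Empty cells are grouped along with their ride-time neighbours so they can share weight. `group_by_ride_time` and `min_count` are hypothetical names.

```python
def group_by_ride_time(cells, min_count):
    """Greedily group On/Off cells so each group has >= min_count responses.

    cells: list of (on, off, ride_minutes, n_responses); ride_minutes is assumed
    to be known for every stop pair. Sorting by ride time and closing a group as
    soon as it is large enough keeps each group's ride-time range small.
    """
    ordered = sorted(cells, key=lambda c: c[2])
    groups, current, total = [], [], 0
    for cell in ordered:
        current.append(cell)
        total += cell[3]
        if total >= min_count:
            groups.append(current)
            current, total = [], 0
    if current:
        if groups:
            groups[-1].extend(current)  # fold a short tail into the last group
        else:
            groups.append(current)  # too few responses overall; caller should check
    return groups

def ride_time_range(group):
    """Spread of ride times (minutes) within one group."""
    times = [c[2] for c in group]
    return max(times) - min(times)
```

With five cells and `min_count=3`, the two shortest trips (3 and 4 minutes) form one group with a 1-minute range. The remaining cells form a second group spanning 5 to 12 minutes. Checking `ride_time_range` on each group is one simple way to flag groups that are too mixed.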

One way to avoid heavily aggregating the data would be to create 
"pseudo-responses" for the missing interchanges (stop pairs with no responses) 
and seed the weight table with very small values before raking. After raking, I 
would rescale the weights of the actual responses to account for the weight 
dropped along with the pseudo-responses.
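The pseudo-response idea can be sketched as follows. Raking of the half table is done by iterative proportional fitting (IPF), alternately scaling rows to the stop On totals and columns to the stop Off totals. Empty possible cells are seeded with a small value, those cells are dropped after raking, and the real cells are rescaled to the grand total. The sketch assumes the On and Off totals sum to the same number, and that the last stop has no Ons and the first stop no Offs. The names and the default `seed` are hypothetical.

```python
def rake_with_pseudo(table, on_totals, off_totals, seed=1e-3, max_iter=500, tol=1e-9):
    """Rake (IPF) an upper-triangular On/Off table, then drop pseudo-responses.

    table[i][j]: responses boarding at stop i and alighting at stop j (i < j).
    Empty feasible cells are seeded with `seed` (pseudo-responses); cells with
    j <= i stay 0. Returns (weights, pseudo_share): the raked weights with pseudo
    cells removed and real cells rescaled to the grand total, plus the share of
    the raked total that the pseudo cells had absorbed.
    """
    n = len(table)
    w = [[0.0] * n for _ in range(n)]
    pseudo = set()
    for i in range(n):
        for j in range(i + 1, n):
            if table[i][j] > 0:
                w[i][j] = float(table[i][j])
            else:
                w[i][j] = seed
                pseudo.add((i, j))

    for _ in range(max_iter):
        for i in range(n):  # scale each row to its On total
            s = sum(w[i])
            if s > 0:
                w[i] = [x * on_totals[i] / s for x in w[i]]
        for j in range(n):  # scale each column to its Off total
            s = sum(w[i][j] for i in range(n))
            if s > 0:
                for i in range(n):
                    w[i][j] *= off_totals[j] / s
        # Columns now match exactly; stop once the rows match as well.
        if max(abs(sum(w[i]) - on_totals[i]) for i in range(n)) < tol:
            break

    grand = sum(map(sum, w))
    pseudo_mass = sum(w[i][j] for i, j in pseudo)
    real_mass = grand - pseudo_mass
    if real_mass <= 0:
        raise ValueError("no actual responses to carry the weight")
    scale = grand / real_mass
    weights = [[0.0 if (i, j) in pseudo else w[i][j] * scale for j in range(n)]
               for i in range(n)]
    return weights, pseudo_mass / grand
```

In a three-stop example (Ons 10, 5, 0; Offs 0, 6, 9) with responses only for pairs (0, 1) and (1, 2), the seeded pair (0, 2) ends up holding 4 of the 15 boardings after raking. So a tiny seed does not guarantee a tiny share, and `pseudo_share` is worth checking. Also note that the uniform rescale preserves the grand total but not the individual stop On/Off totals.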

Is there any prior art I could review? There are a huge number of groupings to 
be done, so I'm hoping for an algorithm or process that can group the stops 
automatically and report when it fails to find enough "near neighbors" to 
aggregate.
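A simple automatic grouping that also reports failures is to walk the stops in travel order and merge consecutive stops until the run has enough responses. A run that cannot reach the threshold within a maximum time span is reported instead. This is a greedy sketch, run separately on the On counts and the Off counts. It assumes a scheduled time to each stop is available. `segment_stops`, `min_responses`, and `max_span` are hypothetical names.

```python
def segment_stops(stop_counts, stop_times, min_responses, max_span):
    """Greedily merge consecutive stops into segments with enough responses.

    stop_counts[k]: survey responses boarding (or alighting) at stop k.
    stop_times[k]: assumed scheduled minutes from the start of the run to stop k.
    Returns (segments, failures), both lists of stop-index lists; a failure is a
    run of stops that could not reach min_responses within max_span minutes.
    """
    segments, failures = [], []
    current, total = [], 0
    for k, (count, t) in enumerate(zip(stop_counts, stop_times)):
        # Adding stop k would stretch the open run past max_span before it
        # reached min_responses: report that run and start a new one at stop k.
        if current and t - stop_times[current[0]] > max_span:
            failures.append(current)
            current, total = [], 0
        current.append(k)
        total += count
        if total >= min_responses:
            segments.append(current)
            current, total = [], 0
    if current:
        last = segments[-1] if segments else None
        if (last and last[-1] + 1 == current[0]
                and stop_times[current[-1]] - stop_times[last[0]] <= max_span):
            last.extend(current)  # fold a short tail into the adjacent segment
        else:
            failures.append(current)
    return segments, failures
```

For example, with counts `[3, 0, 1, 1, 0, 0, 5, 0]`, stop times `[0, 2, 4, 6, 20, 40, 42, 44]`, `min_responses=2` and `max_span=8`, stop 4 is isolated by a 20-minute gap and is reported as a failure. The other stops form the segments `[0]`, `[1, 2, 3]` and `[5, 6, 7]`.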



Thanks in advance,


Robert Farley
LACMTA
1 Gateway Plaza
Los Angeles, CA 90012-2952
(213)922-2532



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
