Apologies in advance for the long email. If it's any consolation, it took even longer to write.
Background ========== In previous meetings [1][2], an adhoc group of debconf people talked about the ranking process for travel grants. Our rough concensus was (plagiarizing myself): 1) Financial need was at least as important as contributions in ranking people for travel sponsorship 2) Only financial need by itself is not sufficient for sponsorship. I propose formalising this as follows. Each applicant has a level of financial need (self assessment, see summit dropdown) and a score for contribution (subjective by the bursaries committee, hopefully following some to be decided guidelines; this score will be based on the "contributions to Debian" and "contributions to DebConf" fields in summit [3]). Ranking ======= If (a.need >= b.need) and (a.contribution > b.contribution) then it seems clear that 'a' should funded before 'b'. So rather than trying cleverly break ties in other cases, my proposal is to just apply apply this rule, to construct a directed graph with arcs a->b if a should be funded before 'b'. We first fund all the people with no incoming arcs, remove them from the graph, and repeat [4]. Example data, script ==================== This scheme doesn't lend itself well to implementing in a spreadsheet, so I wrote a simple python script to implement it. To see the prettier output, you can run python travel-rank.py --format=dot full.dat | dot -Tpdf > foo.pdf In case that is too much effort, I also attach the pdf file The data file is anonymised data from a previous year [1], to give some idea of how this would work in practice. Since we did not have numbers for financial need that year, I essentially made them up (actually I computed them according to a silly heuristic, but you may as well assume they are random). The file format of "full.dat" is ident, need, score, requested_amount # Comments, Further improvements a) This is quite different than sorting lexicograpically; you can see this because the test data is actually given in lexicographically sorted order, first by need, then by contribution. b) There is no relationship between the two ranking scales; I haven't played with a smaller scale for contributions, but as along as it preserves the same ordering as in this data (i.e. we don't create ties by rounding), the results should be the same. c) It would be reasonable to include lower thresholds for both values, either as a preprocessing step or as part of the ranking script. d) the output graph includes incremental cost and running total for each group e) The output graph leaves out many edges in order to increase readability. Only edges from one level to the next are included, since this is enough to enforce the ranking. f) You can, and probably should sanity check the graph against the test data, by following a path from the top level. Bugs happen. g) this is all experimental, and even if we agree to follow this approach, we should agree to trust the bursaries team to override the results if they don't make sense. h) In case we need to fund a partial "cohort", we need some way to order the cohort. We could sort that cohort according to contribution score. E.g. if our budget is between 5000 and 6500, we would fund 19 but not 7. Or maybe a lottery, e.g. random permutation would be more fair. Congratulations. You either read the whole thing, or skipped to the end. [1]: http://meetbot.debian.net/debconf-team/2015/debconf-team.2015-01-08-20.01.html [2]: http://article.gmane.org/gmane.linux.debian.conference.team/12279 [3]: This idea of combining the two fields subjectively is new in this proposal, and comes out of working with cate to try and streamline the application process. [4]: this is a kind of "Topological Sort", for those interested. [5]: in particular the scores don't match the amounts requested by that person.
#!/usr/bin/python # Copyright 2015 David Bremner <brem...@debian.org> # Licensed under the WTFPL v2 import sys From decimal import * import argparse import copy class person: def __init__(self, ident, need, score, cost): self.ident = ident self.need = need self.score = score self.cost = cost # this function defines if 'self' should be definitely # ranked after 'other' def __lt__(self, other): return (self.need <= other.need) and \ (self.score < other.score) parser = argparse.ArgumentParser() parser.add_argument('--format', metavar = 'fmt', default = 'text', choices = ['text','dot'], help='{text, dot} (default text)') parser.add_argument('input', metavar = 'file', help = 'csv file') args=parser.parse_args() num_nodes = 0 nodes = {} # input format is # int, number, number, number with open(args.input,'r') as file: for line in file: num_nodes += 1 strs = line.split(',') ident = int(strs[0]) numbers = [ Decimal(s) for s in strs[1:] ] nodes[ident] = person(ident, *numbers) # we keep track of *inward* arcs, i.e. all nodes that # dominate a current node. arcs = {} for ident in nodes.iterkeys(): arcs[ident] = set() for other in nodes.iterkeys(): if nodes[ident] < nodes[other]: arcs[ident].add(other) # we make copies here because our ranking algorithm is # destructive remaining = nodes.copy() parents = copy.deepcopy(arcs) levels = {} count = 0 while len(remaining) != 0: # find nodes (people) not dominated by any remaining nodes. sources = set() for node in remaining: if len(parents[node]) == 0: sources.add(node) levels[count]=sources count += 1; # remove the most recently found set of sources from the # graph, and update the parent information of other nodes for node in sources: remaining.pop(node) for other_node in remaining: parents[other_node].discard(node) # From here on, it's all about generating pretty output total_cost = 0 if args.format == 'dot': print('digraph ranking {'); for index in levels.iterkeys(): level_cost=0 print('subgraph { rank = "same";') for ident in levels[index]: level_cost += nodes[ident].cost print("{0:d};".format(ident)) total_cost += level_cost; print 'level{0:d} [shape="record", label="{1:f}|{2:f}"];'.format(index,level_cost,total_cost) print('}') if index+1 in levels: print ('level{0:d} -> level{1:d};'.format(index,index+1)) for current in levels[index]: for child in levels[index+1]: if current in arcs[child]: print("{0:d} -> {1:d};".format(current,child)); print('}'); else: # format == text for index in levels.iterkeys(): print("{%s}\n" % ", ".join(str(x) for x in sorted(levels[index])))
full.dat
Description: Binary data
test.pdf
Description: Adobe PDF document
signature.asc
Description: PGP signature
_______________________________________________ Debconf-team mailing list Debconf-team@lists.debconf.org http://lists.debconf.org/mailman/listinfo/debconf-team