[Debconf-team] proposal for combining ranking factors for travel grants

David Bremner Sat, 21 Feb 2015 05:49:47 -0800

Apologies in advance for the long email. If it's any consolation, it
took even longer to write.


Background
==========

In previous meetings [1][2], an adhoc group of debconf people talked
about the ranking process for travel grants. Our rough concensus was
(plagiarizing myself):

 1) Financial need was at least as important as contributions in ranking
 people for travel sponsorship
 2) Only financial need by itself is not sufficient for sponsorship.

I propose formalising this as follows. Each applicant has a level of
financial need (self assessment, see summit dropdown) and a score for
contribution (subjective by the bursaries committee, hopefully following
some to be decided guidelines; this score will be based on the
"contributions to Debian" and "contributions to DebConf" fields in
summit [3]).

Ranking
=======

If 

         (a.need >= b.need)  and (a.contribution > b.contribution)

then it seems clear that 'a' should funded before 'b'. So rather than
trying cleverly break ties in other cases, my proposal is to just apply
apply this rule, to construct a directed graph with arcs a->b if a should
be funded before 'b'.  We first fund all the people with no incoming
arcs, remove them from the graph, and repeat [4].

Example data, script
====================

This scheme doesn't lend itself well to implementing in a spreadsheet,
so I wrote a simple python script to implement it.

To see the prettier output, you can run

python travel-rank.py --format=dot full.dat | dot -Tpdf > foo.pdf

In case that is too much effort, I also attach the pdf file
The data file is anonymised data from a previous year [1], to give some
idea of how this would work in practice.  Since we did not have
numbers for financial need that year, I essentially made them up
(actually I computed them according to a silly heuristic, but you may as
well assume they are random). The file format of "full.dat" is

ident, need, score, requested_amount

# Comments, Further improvements

a) This is quite different than sorting lexicograpically; you can see
   this because the test data is actually given in lexicographically
   sorted order, first by need, then by contribution.

b) There is no relationship between the two ranking scales; I haven't
   played with a smaller scale for contributions, but as along as it
   preserves the same ordering as in this data (i.e. we don't create
   ties by rounding), the results should be the same.

c) It would be reasonable to include lower thresholds for both values, either
   as a preprocessing step or as part of the ranking script.

d) the output graph includes incremental cost and running total for each
   group

e) The output graph leaves out many edges in order to increase
   readability. Only edges from one level to the next are included,
   since this is enough to enforce the ranking.

f) You can, and probably should sanity check the graph against the test
   data, by following a path from the top level. Bugs happen.

g) this is all experimental, and even if we agree to follow this
   approach, we should agree to trust the bursaries team to override the
   results if they don't make sense.

h) In case we need to fund a partial "cohort", we need some way to order
   the cohort. We could sort that cohort according to contribution
   score. E.g. if our budget is between 5000 and 6500, we would fund 19
   but not 7. Or maybe a lottery, e.g. random permutation would be more
   fair.
   
Congratulations. You either read the whole thing, or skipped to the end.

[1]: 
http://meetbot.debian.net/debconf-team/2015/debconf-team.2015-01-08-20.01.html
[2]: http://article.gmane.org/gmane.linux.debian.conference.team/12279
[3]: This idea of combining the two fields subjectively is new in this
     proposal, and comes out of working with cate to try and streamline the
     application process.
[4]: this is a kind of "Topological Sort", for those interested.
[5]: in particular the scores don't match the amounts requested by
     that person.

#!/usr/bin/python
# Copyright 2015 David Bremner <brem...@debian.org>
# Licensed under the WTFPL v2

import sys
From decimal import *
import argparse
import copy

class person:
    def __init__(self, ident, need, score, cost):
        self.ident = ident
        self.need = need
        self.score = score
        self.cost = cost

    # this function defines if 'self' should be definitely
    # ranked after 'other'
    
    def __lt__(self, other):
        return (self.need <= other.need) and \
            (self.score < other.score)

parser = argparse.ArgumentParser()

parser.add_argument('--format', metavar = 'fmt',
                    default = 'text',
                    choices = ['text','dot'],
                    help='{text, dot} (default text)')

parser.add_argument('input', metavar = 'file',
                    help = 'csv file')

args=parser.parse_args()

num_nodes = 0

nodes = {}

# input format is
#   int, number, number, number
with open(args.input,'r') as file:
    for line in file:
        num_nodes += 1
        strs = line.split(',')
        ident = int(strs[0])
        numbers = [ Decimal(s) for s in strs[1:] ]
        nodes[ident] = person(ident, *numbers)

# we keep track of *inward* arcs, i.e. all nodes that
# dominate a current node. 
arcs = {}

for ident in nodes.iterkeys():
    arcs[ident] = set()
    for other in nodes.iterkeys():
        if nodes[ident] < nodes[other]:
            arcs[ident].add(other)

# we make copies here because our ranking algorithm is
# destructive

remaining = nodes.copy()
parents = copy.deepcopy(arcs)

levels = {}
count = 0

while len(remaining) != 0:

    # find nodes (people) not dominated by any remaining nodes.
    sources = set()
    for node in remaining:
        if len(parents[node]) == 0:
            sources.add(node)

    levels[count]=sources
    count += 1;

    # remove the most recently found set of sources from the
    # graph, and update the parent information of other nodes
    
    for node in sources:
        remaining.pop(node)
        for other_node in remaining:
            parents[other_node].discard(node)

# From here on, it's all about generating pretty output
total_cost = 0
if args.format == 'dot':
    print('digraph ranking {');
    for index in levels.iterkeys():
        level_cost=0
        print('subgraph { rank = "same";')
        for ident in levels[index]:
            level_cost += nodes[ident].cost
            print("{0:d};".format(ident))
        total_cost += level_cost;
        print 'level{0:d} [shape="record", label="{1:f}|{2:f}"];'.format(index,level_cost,total_cost)

        print('}')
        if index+1 in levels:
            print ('level{0:d} -> level{1:d};'.format(index,index+1))
            for current in levels[index]:
                for child in levels[index+1]:
                    if current in arcs[child]:
                        print("{0:d} -> {1:d};".format(current,child));


    print('}');

else:
    # format == text
    for index in levels.iterkeys():
        print("{%s}\n" % ", ".join(str(x) for x in sorted(levels[index])))

full.dat
Description: Binary data

test.pdf
Description: Adobe PDF document

signature.asc
Description: PGP signature

_______________________________________________
Debconf-team mailing list
Debconf-team@lists.debconf.org
http://lists.debconf.org/mailman/listinfo/debconf-team

[Debconf-team] proposal for combining ranking factors for travel grants

Reply via email to