On Sun, Feb 09, 2020 at 07:01:05PM -0700, Sean Whitton wrote:
> One key problem with the current workflow is that it makes it very
> difficult to avoid reviewing identical files more than once. That would
> be a big improvement.
(I was just talking with Michael about this several minutes ago.)

Just leaking a part of my WIP work. My core data structure looks like this:

    {path: [hash, stamp, username, status, annotation]}

The "hash" field is a salted hash, calculated like this:

    hash(data=read(path), salt=read(neighbor_license()))

This data structure is a fine-grained (per-path) "accept/reject" record. Each path is a node in a tree, and the "status" of any subtree can be computed automatically from its descendant nodes.

When a package enters NEW again, files with matching hashes will automatically reuse the last status assigned by a human reviewer, where the status is either "accept" or "reject".

There are still many other ways I can reduce the time humans spend on review and improve efficiency.
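To make the per-path record and the hash-based reuse concrete, here is a minimal Python sketch under several assumptions that the message does not spell out: the hash is SHA-256 with a length-prefixed salt; the hypothetical `neighbor_license()` returns the nearest file named `LICENSE` or `COPYING` in the same or an ancestor directory; the record is a dataclass rather than a plain list; the package is an in-memory `{path: bytes}` dict instead of files on disk; and a subtree is "reject" if any descendant is rejected, "accept" if all are accepted, otherwise "pending". `Entry`, `salted_hash`, `tree_status` and `carry_over` are illustrative names, not the actual WIP code.

```python
import hashlib
import posixpath
from dataclasses import dataclass
from typing import Dict

# Hypothetical: which file names count as the "neighbor license" is an
# assumption; the original message does not define neighbor_license().
LICENSE_NAMES = ("LICENSE", "COPYING")


@dataclass
class Entry:
    """One per-path record: [hash, stamp, username, status, annotation]."""
    hash: str
    stamp: int        # when the status was assigned (e.g. Unix time)
    username: str     # reviewer who assigned it ("" if nobody yet)
    status: str       # "accept", "reject" or "pending"
    annotation: str = ""


def neighbor_license(path: str, files: Dict[str, bytes]) -> bytes:
    """Contents of the nearest license file in path's directory or an
    ancestor directory, or b"" if there is none (assumed behaviour)."""
    directory = posixpath.dirname(path)
    while True:
        for name in LICENSE_NAMES:
            candidate = posixpath.join(directory, name)
            if candidate in files:
                return files[candidate]
        if directory == "":
            return b""
        directory = posixpath.dirname(directory)


def salted_hash(path: str, files: Dict[str, bytes]) -> str:
    """hash(data=read(path), salt=read(neighbor_license())), using SHA-256."""
    salt = neighbor_license(path, files)
    h = hashlib.sha256()
    # Length-prefix the salt so bytes cannot shift between salt and data.
    h.update(len(salt).to_bytes(8, "big"))
    h.update(salt)
    h.update(files[path])
    return h.hexdigest()


def tree_status(prefix: str, records: Dict[str, Entry]) -> str:
    """Status of a subtree from its descendants (assumed rule): any reject
    gives reject, all accept gives accept, anything else is pending."""
    statuses = [e.status for p, e in records.items()
                if prefix == "" or p == prefix or p.startswith(prefix + "/")]
    if "reject" in statuses:
        return "reject"
    if statuses and all(s == "accept" for s in statuses):
        return "accept"
    return "pending"


def carry_over(files: Dict[str, bytes], previous: Dict[str, Entry],
               stamp: int) -> Dict[str, Entry]:
    """Records for a package re-entering NEW: reuse the latest human
    decision for every file whose salted hash was reviewed before."""
    latest: Dict[str, Entry] = {}
    for entry in previous.values():
        if entry.status in ("accept", "reject"):
            seen = latest.get(entry.hash)
            if seen is None or entry.stamp > seen.stamp:
                latest[entry.hash] = entry
    records: Dict[str, Entry] = {}
    for path in files:
        digest = salted_hash(path, files)
        prior = latest.get(digest)
        if prior is not None:
            records[path] = Entry(digest, prior.stamp, prior.username,
                                  prior.status, prior.annotation)
        else:
            records[path] = Entry(digest, stamp, "", "pending")
    return records


# Usage: a first upload is reviewed by hand, then a new version arrives.
v1 = {"LICENSE": b"MIT", "src/a.c": b"int a;", "src/b.c": b"int b;"}
old = carry_over(v1, {}, stamp=1)
for entry in old.values():
    entry.status, entry.username = "accept", "alice"
old["src/b.c"].status = "reject"

v2 = {**v1, "src/b.c": b"int b2;"}    # only src/b.c changed
new = carry_over(v2, old, stamp=2)
print(new["src/a.c"].status, new["src/b.c"].status, tree_status("src", new))
# prints: accept pending pending
```

Indexing earlier decisions by hash rather than by path means an identical file is reused even if it moved. Because the license text is the salt, changing the nearby license changes every affected hash, so those files fall back to "pending" and need a fresh human look.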