Hi Mojca,

On Mon, Apr 15, 2019 at 9:46 PM Mojca Miklavec <mo...@macports.org> wrote:
> Given the current state of the app with sufficient complexity, I
> believe that it would be wise to introduce some unit tests to be able
> to extensively test what happens with data you import, and to prevent
> / detect any breakages in the future.
>

Thank you. Since I am currently working on parsing maintainers, I began by testing the maintainers only. It helped me make significant improvements to the code that extracts the maintainers (added to the pull request: https://github.com/macports-gsoc/macports-gsoc-2019-webapp/pull/1).

[update: this file has changed further since I updated the pull request; the logic remains the same, only the JSON object structure has changed]

I ran the tests and got the desired results. I will show the final code and results in around 24 hours, once I am done with my viva voce and extra classes, but below I discuss the approach. Sorry if this is not the right way or the presentation is not fine.

I created five ports:

1. portA  maintainers {@github gmail.com:test1}
2. portB  maintainers {@github gmail.com:test2}     SAME GITHUB, DIFFERENT EMAIL
3. portC  maintainers {@newgithub gmail.com:test2}  SAME EMAIL, DIFFERENT GITHUB
4. portD  maintainers {gmail.com:test2}             EMAIL REPEATED WITHOUT GITHUB
5. portE  maintainers {@github}                     GITHUB REPEATED WITHOUT EMAIL

I received 3 unique GitHub and email pairs (according to the logic [1]) and I am treating each one as a different maintainer.

  [
    { "github": "github", "name": "test1", "domain": "gmail.com" },
    { "github": "github", "name": "test2", "domain": "gmail.com" },
    { "name": "test2", "domain": "gmail.com", "github": "newgithub" }
  ]

Now, to each maintainer I added all the ports whose GitHub handle, email, or both match that maintainer's:
  [
    {
      "model": "ports.Maintainer",
      "pk": 0,
      "fields": {
        "github": "github",
        "name": "test1",
        "domain": "gmail.com",
        "ports": [ [ "portA", "portB", "portD" ] ]
      }
    },
    {
      "model": "ports.Maintainer",
      "pk": 1,
      "fields": {
        "github": "github",
        "name": "test2",
        "domain": "gmail.com",
        "ports": [ [ "portA", "portB", "portD", "portC", "portE" ] ]
      }
    },
    {
      "model": "ports.Maintainer",
      "pk": 2,
      "fields": {
        "name": "test2",
        "domain": "gmail.com",
        "github": "newgithub",
        "ports": [ [ "portE", "portB", "portC" ] ]
      }
    }
  ]

For querying, we can now use either the email or the GitHub handle and show all the ports of every maintainer that matches. This should not break because of any inconsistency in the maintainer details.

But there is one disadvantage: on the port detail page we will now show x maintainers if the same maintainer provided x different pairs of GitHub handle and email. However, this disadvantage might prove helpful in getting rid of the inconsistencies.

Thank you

[1] Currently I am using the following logic for adding maintainers (comparing with the already parsed maintainers):
- If neither the email nor the GitHub handle is repeated: CREATE NEW
- If both the email and the GitHub handle are repeated: SKIP
- If the email is repeated and the GitHub handle (provided) is not: CREATE NEW with inconsistency flag
- If the GitHub handle is repeated and the email address (provided) is not: CREATE NEW with inconsistency flag
- If the GitHub handle is repeated and the email is not provided: SKIP
- If the email address is repeated and the GitHub handle is not provided: SKIP
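
The rules in [1] can be sketched roughly as below. This is only an illustration, not the code from the pull request: the names `Maintainer`, `parse_entry` and `add_maintainer` are hypothetical, the parser handles only the entry forms used in the five test ports, and "repeated" is assumed to mean "already present on any previously parsed maintainer".

```python
from typing import List, NamedTuple, Optional, Tuple


class Maintainer(NamedTuple):
    # Hypothetical record type, not the model used in the web app.
    github: Optional[str]
    email: Optional[str]        # "name@domain", rebuilt from "domain:name"
    inconsistent: bool = False  # set when only one of the two values repeats


def parse_entry(entry: str) -> Tuple[Optional[str], Optional[str]]:
    """Parse an entry such as '@github gmail.com:test1' into (github, email).

    Assumption: only the '@handle' and 'domain:name' forms from the test
    ports are recognised; any other token is ignored.
    """
    github = email = None
    for token in entry.split():
        if token.startswith("@"):
            github = token[1:]
        elif ":" in token:
            domain, name = token.split(":", 1)
            email = f"{name}@{domain}"
    return github, email


def add_maintainer(parsed: List[Maintainer],
                   github: Optional[str],
                   email: Optional[str]) -> List[Maintainer]:
    """Apply the rules from [1] against the already parsed maintainers."""
    if github is None and email is None:
        return parsed  # nothing usable in this entry

    seen_github = github is not None and any(m.github == github for m in parsed)
    seen_email = email is not None and any(m.email == email for m in parsed)

    if not seen_github and not seen_email:
        parsed.append(Maintainer(github, email))        # CREATE NEW
    elif seen_github and seen_email:
        pass                                            # SKIP
    elif seen_email and github is not None:
        parsed.append(Maintainer(github, email, True))  # new GitHub, old email
    elif seen_github and email is not None:
        parsed.append(Maintainer(github, email, True))  # new email, old GitHub
    # Otherwise the repeated value came without its counterpart: SKIP
    return parsed


# The five test ports from above, in order portA .. portE.
entries = [
    "@github gmail.com:test1",
    "@github gmail.com:test2",
    "@newgithub gmail.com:test2",
    "gmail.com:test2",
    "@github",
]
maintainers: List[Maintainer] = []
for entry in entries:
    add_maintainer(maintainers, *parse_entry(entry))

print(len(maintainers))  # 3, the three unique pairs listed above
```

With this reading of the rules, portD and portE create no new maintainer, and the entries created for portB and portC carry the inconsistency flag.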