[11:24] <vr34> Hi!
[11:24] <vr34> This is Vaishnavi, outreachy applicant
[11:25] <vr34> i found a python implementation for the jwz threading algo
[11:25] <jgbarah> Hi, Vaishnavi
[11:25] <vr34> https://github.com/akuchling/jwzthreading/blob/master/jwzthreading.py
[11:25] <vr34> so what this does is allots a message id for each of the threads, right?
[11:26] <jgbarah> That's good. Use it as an inspiration if that suits you. But you need to write your code...
[11:26] <vr34> Oh okay
[11:27] <jgbarah> However, since this is a microtask, no problem if you start with a version which uses this code
[11:27] <vr34> what i have done till now is written a script that parses cmd line args and parses and uploads mbox json docs to es
[11:27] <jgbarah> the main problem for using it *as such* is that very likely it is suboptimal, since it assumes you have access to all messages
[11:28] <vr34> now what's left is - use the threading algo to get message ids and add this to the json documents and then upload again, am i right?
[11:28] <jgbarah> which in our case is not real, since you have them in the database, and the idea would be to minimize traffic with it
[11:28] <jgbarah> Yes, that is
[11:28] <jgbarah> So, if you want, try to do it in two phases:
[11:29] <jgbarah> in one, you can use the code you found. Forget about efficiency, and just make it work
[11:29] <jgbarah> In a second one, you can check if you can improve performance by using your own code.
[11:29] <jgbarah> The first one will tell about how you reuse code, which is important
[11:30] <jgbarah> The second one would tell about how you code the algorithm in a certain scenario
[11:30] <jgbarah> Both are important...
[11:30] <vr34> okay, got it!
[11:30] <jgbarah> To be transparent to others pursuing this project, please send a message to the mailing list,
[11:30] <jgbarah> pointing to this implementation you found, and this conversation, please.
[11:30] <jgbarah> A log of it would be enough.
[11:30] <vr34> i had also sent you a mail with a link to my github repo
[11:31] <vr34> Yes sure, will do
[11:31] <jgbarah> Of course, the fact that you looked for, and found, that implementation, will be credited to you
[11:31] <jgbarah> I saw it (the message) but still didn't look at the code. Thanks.
[11:32] <jgbarah> Are you stumbling on any blocker?
[11:32] <vr34> sure, thanks, i'll mail you if i have any further queries
[11:33] <vr34> i haven't yet started implementing the algo.. will definitely let you know when i have issues.. thanks a lot!
[11:33] <jgbarah> Good. Thanks! Please, keep me updated.
[11:33] <vr34> Sure.
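
For reference, here is a rough sketch of the thread-annotation step discussed in the log above. It is not the full JWZ algorithm and not the code from the jwzthreading repository: it only follows the References / In-Reply-To headers up to the earliest known ancestor and stores that ancestor's Message-ID on each document. The field names "Message-ID", "In-Reply-To" and "References" are assumed to match the headers in the JSON documents, and "thread_id" is a hypothetical name for the new field. Like the first phase mentioned in the conversation, it assumes all messages are available in memory.

```python
import re

# Message-IDs appear in headers as <something@host>.
MSG_ID = re.compile(r"<[^<>\s]+>")


def parent_of(msg):
    """Return the Message-ID this message replies to, or None.

    The last entry of References is preferred; In-Reply-To is the fallback.
    """
    refs = MSG_ID.findall(msg.get("References") or "")
    if refs:
        return refs[-1]
    reply = MSG_ID.findall(msg.get("In-Reply-To") or "")
    return reply[0] if reply else None


def annotate_threads(messages):
    """Add a 'thread_id' key (hypothetical field name) to every message dict.

    The thread id is the Message-ID reached by walking parent links upwards
    until no further parent is known. If the root message itself is missing
    from the input, its id (taken from the reply headers) is still used.
    """
    parents = {m["Message-ID"].strip(): parent_of(m) for m in messages}

    for m in messages:
        current = m["Message-ID"].strip()
        seen = {current}  # guard so a reference loop cannot hang the walk
        while True:
            parent = parents.get(current)
            if parent is None or parent in seen:
                break
            seen.add(parent)
            current = parent
        m["thread_id"] = current
    return messages


if __name__ == "__main__":
    docs = [
        {"Message-ID": "<a@x>"},
        {"Message-ID": "<b@x>", "In-Reply-To": "<a@x>"},
        {"Message-ID": "<c@x>", "References": "<a@x> <b@x>"},
    ]
    for doc in annotate_threads(docs):
        print(doc["Message-ID"], "->", doc["thread_id"])
```

The annotated documents could then be uploaded again with the extra field. Subject-based grouping and the placeholder containers that JWZ uses for missing messages are left out of this sketch.
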

On Fri, Apr 14, 2017 at 11:06 AM, Vaishnavi Ramesh Jayaraman <vaishnavi.ur...@gmail.com> wrote:
> Hi,
>
> I have applied to Outreachy for the project - Xen Code Review Dashboard
> and based on Jesus' suggestions I have made an initial contribution (There
> are more changes to be made which I am still working on.)
>
> Link to the contribution -
> https://github.com/vrameshj/dashboard/blob/master/tests.py
>
> I have created a script that accepts the mbox link as a command line
> argument, parses it and uploads the JSON documents that are obtained as an
> output from Perceval to ElasticSearch. The results can be queried too.
>
> I am currently working on annotating the threads with their message ids.
>
> Also, below is the timeline of the work I plan to accomplish:-
>
> Month 1 - The work during the first month would be centered on getting
> extensive information both from mailing lists and git repositories using
> Perceval, and then storing it in ElasticSearch.
> Month 2 - During the second month, scripts would have to be ported to use
> ElasticSearch data instead of SQL.
> Month 3 - The task would be to improve the dashboard in Kibana. Various
> visualizations like pie charts, bar charts and histograms could be added to
> help understand the logs better.
>
> Thanks
> Vaishnavi
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel