Hi Jesus!

I appreciate the info on the unicode error. I might have missed it, but I also asked about the general microtask specifications. Here was my original inquiry:

> And to clarify, my understanding is that the final result of this task
> is an index of Xen data, with two types: commits and messages.
> Each commit document should contain its original information
> from git, plus the name of the branch it was developed in. And
> should only the mbox messages which appear to be associated
> with a specific commit exist in the final index? Is there some
> key information in messages that is supposed to indicate the
> association of a given commit with a git branch? I would be
> grateful if you could specify the end goal a little more. :D
Yeah, so overall I'm not sure I understand the relationship of branches to the mailing list messages. Is this to be a simple string parsing task wherein I should scan the message body for the word "branch"? (I am guessing not ;P) I will be happy to get back to developing once I better grasp the goal! :)

Thanks!

Heather

On Sun, Apr 16, 2017 at 4:23 PM, Jesus M. Gonzalez-Barahona <j...@bitergia.com> wrote:
> On Thu, 2017-04-13 at 00:47 -0700, Heather Booker wrote:
> > Hi,
> >
> > I submitted an application for this code review dashboard and
> > would love to keep working on the microtask once I get some
> > more info. :)
>
> Great! I answered your message, could you progress with the task?
>
> > I also came up with a general idea of how the project might be
> > split up - any feedback on this would be welcome! I wrote:
> >
> > "As said by Jesus, the big picture of this project will be porting
> > everything behind the current code review dashboard to use
> > Grimoire Lab tools, from the current state of using
> > MetricsGrimoire and custom scripts. I expect this would involve
> > Perceval for analyzing data, and Grimoire Elk may be useful in
> > further stages, or may be too general - this is something I would
> > wish to explore.
> > This project will also involve a migration from SQL to Elasticsearch
> > - because I believe the relevant data is mostly / all available in
> > places online, I am unsure whether this would need to be a direct
> > migration. However, looking at the current SQL setup would be
> > beneficial to understanding the desired format of the Elasticsearch
> > indexes.
> > I would love to dive into this project and have 3 main parts -
> > getting data into ES, turning it into dashboard displays, and then
> > fine tuning and perhaps augmenting the dashboard to improve its
> > usefulness.
> > Getting data into ES may seem simple but I believe that once it
> > needs to be used for the dashboard, many realizations will pop up
> > - thus I’d like to leave maybe 2-3 weeks for that first step, 6-7
> > weeks for the visualizations (which will include querying the data),
> > and the final 3 weeks for touch ups and improvements."
>
> The plan could be sound, but would need some tweaks, once your skills
> in Python are clear, which could be the main blocker for the first
> stages.
>
> > Does this sound like an accurate summary and reasonable timeline?
> > And I am guessing that from Jesus's involvement with the threads
> > that Jesus would be the mentor, is that correct? :)
>
> Yes, I would be ;-)
>
> Jesus.
>
> > Thanks!
> >
> > Heather
> >
> >
> > On Sun, Apr 9, 2017 at 9:50 PM, Heather Booker <heather.j.booker@gmail.com> wrote:
> > > Hi Jesus,
> > >
> > > While using the Elasticsearch python library
> > > (https://elasticsearch-py.readthedocs.io/en/master/) to add mbox
> > > messages to an index, I would get a UnicodeEncodeError:
> > > "'utf-8' codec can't encode character '\udca0' in position 767:
> > > surrogates not allowed".
> > >
> > > Investigating in Grimoire elk
> > > https://github.com/grimoirelab/GrimoireELK/blob/96b00bc682485976104a6825ca63ae08639deacc/grimoire_elk/elk/mbox.py#L200
> > > seems to show that perhaps that tool instead uses Latin-1 encoding,
> > > but I found that to then produce a serialization error (their custom
> > > error message: "Unable to serialize %r (type: %s)"). I suppose this
> > > is because now it's bytes; of course, converting back to string
> > > after encoding just cycles back to the first error.
> > >
> > > As somewhat of a Python newbie I don't really know how to tackle
> > > this! My thought atm is to splice the offending character out
> > > of the message.
> > >
> > > And to clarify, my understanding is that the final result of this
> > > task is an index of Xen data, with two types: commits and messages.
> > > Each commit document should contain its original information
> > > from git, plus the name of the branch it was developed in. And
> > > should only the mbox messages which appear to be associated
> > > with a specific commit exist in the final index? Is there some
> > > key information in messages that is supposed to indicate the
> > > association of a given commit with a git branch? I would be
> > > grateful if you could specify the end goal a little more. :D
> > >
> > > Thanks so much!
> > >
> > > Heather
> > >
> > >
> > > On Sat, Apr 8, 2017 at 10:02 AM, Jesus M. Gonzalez-Barahona <jgb@bitergia.com> wrote:
> > > > On Fri, 2017-04-07 at 15:49 -0700, Heather Booker wrote:
> > > > > Hi Jesus,
> > > > >
> > > > > Thanks for your reply!
> > > > >
> > > > > So about the task, instructions say after analyzing mboxes with
> > > > > Perceval to "store the resulting raw index in ElasticSearch" -
> > > > > what does raw index mean?
> > > >
> > > > In this context, I mean "storing the JSON documents produced by
> > > > Perceval in an ElasticSearch index, as such". ElasticSearch stores
> > > > JSON documents, so it is just uploading the output of Perceval
> > > > to it.
> > > >
> > > > > In terms of figuring out the elasticsearch structure, do I want
> > > > > an index (xen-devel mbox) with a type (message) and each object
> > > > > from the perceval output to be one document? Or should it be
> > > > > more fine-grained?
> > > >
> > > > Exactly.
> > > >
> > > > Saludos,
> > > >
> > > > Jesus.
> > > >
> > > > > Cheers,
> > > > >
> > > > > Heather
> > > > >
> > > > > On Thu, Apr 6, 2017 at 7:05 AM, Jesus M. Gonzalez-Barahona <jgb@bitergia.com> wrote:
> > > > > > On Wed, 2017-04-05 at 16:43 -0700, Heather Booker wrote:
> > > > > > > Hi!
> > > > > > >
> > > > > > > I'd love to work on the Code Review Dashboard project for
> > > > > > > this round of Outreachy.
> > > > > >
> > > > > > Great!!
> > > > > >
> > > > > > > Are the steps outlined here
> > > > > > > http://markmail.org/message/7adkmords3imkswd still the first
> > > > > > > contribution you'd like to see?
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > > So is this a project that has been worked on in previous
> > > > > > > rounds of GSOC/Outreachy also?
> > > > > > > If so is there a place to find links to the previous
> > > > > > > participants blogs? :)
> > > > > >
> > > > > > No. We had one participation at some point, but couldn't even
> > > > > > start for personal reasons. There are some people considering
> > > > > > working on this for this next round of Outreachy, however.
> > > > > > You'll see their messages in this mailing list.
> > > > > >
> > > > > > > Should questions about how the specifications/completion of
> > > > > > > the microtask be addressed to IRC or this list? If IRC, which
> > > > > > > channel - #xen-opw or #metrics-grimoire? On that note, I'm
> > > > > > > curious why #metrics-grimoire is the listed channel on the
> > > > > > > project page - are main contributors involved in both
> > > > > > > projects? Or is it just because the Xen dashboard doesn't
> > > > > > > have a channel?
> > > > > >
> > > > > > The code review is for the Xen project, but it is done with (I
> > > > > > mean, the software used for it is) GrimoireLab, which for
> > > > > > historical reasons uses the #metrics-grimoire channel. That's
> > > > > > why it is likely that you find somebody from the project there.
> > > > > >
> > > > > > If you have questions, and find me around in IRC, please ping me.
> > > > > > If I'm not available, please send an email message.
> > > > > >
> > > > > > Saludos,
> > > > > >
> > > > > > Jesus.
> > > > > >
> > > > > > > Thanks!
> > > > > > >
> > > > > > > Heather
> > > > > > > _______________________________________________
> > > > > > > Xen-devel mailing list
> > > > > > > Xen-devel@lists.xen.org
> > > > > > > https://lists.xen.org/xen-devel
> > > > > > --
> > > > > > Bitergia: http://bitergia.com
> > > > > > /me at Twitter: https://twitter.com/jgbarah
> > > > >
> > > > > _______________________________________________
> > > > > Xen-devel mailing list
> > > > > Xen-devel@lists.xen.org
> > > > > https://lists.xen.org/xen-devel
> > > > --
> > > > Bitergia: http://bitergia.com
> > > > /me at Twitter: https://twitter.com/jgbarah
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > https://lists.xen.org/xen-devel
> --
> Bitergia: http://bitergia.com
> /me at Twitter: https://twitter.com/jgbarah
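The UnicodeEncodeError quoted above ("surrogates not allowed") might be handled without splicing characters out by hand: every string in a Perceval item can be cleaned before indexing. The following is a minimal Python sketch, not the approach used by Perceval or GrimoireELK. It assumes the lone surrogate U+DCA0 was produced by Python's 'surrogateescape' error handler, which maps an undecodable byte 0xA0 to U+DCA0. The helpers scrub_surrogates and scrub_item are hypothetical names.

```python
def scrub_surrogates(text: str) -> str:
    """Return a copy of text that can safely be encoded as UTF-8.

    Assumption: lone surrogates such as U+DCA0 came from Python's
    'surrogateescape' error handler (undecodable byte 0xXX -> U+DCXX).
    """
    try:
        # Turn U+DC80..U+DCFF back into the raw bytes 0x80..0xFF;
        # every other character is encoded as ordinary UTF-8.
        raw = text.encode("utf-8", errors="surrogateescape")
    except UnicodeEncodeError:
        # Surrogates outside U+DC80..U+DCFF did not come from
        # 'surrogateescape', so just replace them with '?'.
        return text.encode("utf-8", errors="replace").decode("utf-8")
    # Re-decode, substituting U+FFFD for any byte that is not valid UTF-8.
    return raw.decode("utf-8", errors="replace")


def scrub_item(value):
    """Recursively apply scrub_surrogates to every string in a JSON-like item."""
    if isinstance(value, str):
        return scrub_surrogates(value)
    if isinstance(value, dict):
        return {scrub_item(k): scrub_item(v) for k, v in value.items()}
    if isinstance(value, list):
        return [scrub_item(v) for v in value]
    # Numbers, booleans and None are left untouched.
    return value


if __name__ == "__main__":
    message = {"data": {"body": {"plain": "non\udca0breaking"}}}
    clean = scrub_item(message)
    # Encoding to UTF-8 no longer raises "surrogates not allowed".
    print(clean["data"]["body"]["plain"].encode("utf-8"))
```

Replacing the byte with U+FFFD loses the original character. If a message is known to be entirely in a single legacy charset such as Latin-1, decoding the recovered bytes with that charset would preserve it instead (0xA0 is a no-break space in Latin-1).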
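Jesus's description of the raw index above can be sketched as a generator of bulk-indexing actions: each JSON document produced by Perceval is uploaded to ElasticSearch as is, one document per message, in an index with a "message" type. The sketch rests on three assumptions. First, the Perceval output is read as one JSON document per line. Second, each item carries Perceval's "uuid" field. Third, the index name xen-devel-mbox, like the helpers read_json_lines and raw_index_actions, is hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def read_json_lines(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Parse one JSON document per non-empty line (assumed input layout)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def raw_index_actions(
    items: Iterable[Dict[str, Any]],
    index: str = "xen-devel-mbox",  # hypothetical index name
    doc_type: str = "message",
) -> Iterator[Dict[str, Any]]:
    """Wrap each Perceval item, unchanged, in a bulk-indexing action."""
    for item in items:
        yield {
            "_index": index,
            "_type": doc_type,    # mapping types exist in ES 5.x; deprecated later
            "_id": item["uuid"],  # assumes Perceval's 'uuid' field is present
            "_source": item,      # the whole Perceval document, stored as such
        }


if __name__ == "__main__":
    sample = ['{"uuid": "1a2b", "data": {"Subject": "[PATCH] example"}}']
    for action in raw_index_actions(read_json_lines(sample)):
        print(action["_index"], action["_id"])
```

With elasticsearch-py, these actions could then be sent with something like `helpers.bulk(Elasticsearch(), raw_index_actions(items))`, assuming the cluster is reachable with the client's default settings. Using the item's uuid as the document `_id` means that re-running the upload overwrites existing documents rather than duplicating them.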