On Wed, Jan 29, 2014 at 11:52 AM, Ben Nemec <openst...@nemebean.com> wrote:
> Okay, I think you've convinced me. Specific comments below. > > -Ben > > On 2014-01-29 07:05, Doug Hellmann wrote: > > > On Tue, Jan 28, 2014 at 8:47 PM, Ben Nemec <openst...@nemebean.com> wrote: > >> On 2014-01-27 11:42, Doug Hellmann wrote: >> >> We have a blueprint open for separating translated log messages into >> different domains so the translation team can prioritize them differently >> (focusing on errors and warnings before debug messages, for example) [1]. >> Some concerns were raised related to the review [2], and I would like to >> address those in this thread and see if we can reach consensus about how to >> proceed. >> The implementation in [2] provides a set of new marker functions similar >> to _(), one for each log level (we have _LE, LW, _LI, _LD, etc.). These >> would be used in conjunction with _(), and reserved for log messages. >> Exceptions, API messages, and other user-facing messages all would still be >> marked for translation with _() and would (I assume) receive the highest >> priority work from the translation team. >> When the string extraction CI job is updated, we will have one "main" >> catalog for each app or library, and additional catalogs for the log >> levels. Those show up in transifex separately, but will be named in a way >> that they are obviously related. Each translation team will be able to >> decide, based on the requirements of their users, how to set priorities for >> translating the different catalogs. >> Existing strings being sent to the log and marked with _() will be >> removed from the main catalog and moved to the appropriate >> log-level-specific catalog when their marker function is changed. My >> understanding is that transifex is smart enough to recognize the same >> string from more than one source, and to suggest previous translations when >> it sees the same text. This should make it easier for the translation teams >> to "catch up" by reusing the translations they have already done, in the >> new catalogs. >> One concern that was raised was the need to mark all of the log messages >> by hand. I investigated using extraction patterns like "LOG.debug(" and >> "LOG.info(", but because of the way the translation actually works >> internally we cannot do that. There are a few related reasons. >> In other applications, the function _() translates a string at the point >> where it is invoked, and returns a new string object. OpenStack has a >> requirement that messages be translated multiple times, whether in the API >> or the LOG (there is already support for logging in more than one language, >> to different log files). This requirement means we delay the translation >> operation until right before the string is output, at which time we know >> the target language. We could update the log functions to create Message >> objects dynamically, except... >> Each app or library that uses the translation code will need its own >> "domain" for the message catalogs. We get around that right now by not >> translating many messages from the libraries, but that's obviously not what >> we want long term (we at least want exceptions translated). If we had a >> special version of a logger in oslo.log that knew how to create Message >> objects for the format strings used in logging (the first argument to >> LOG.debug for example), it would also have to know what translation domain >> to use so the proper catalog could be loaded. The wrapper functions defined >> in the patch [2] include this information, and can be updated to be >> application or library specific when oslo.log eventually becomes its own >> library. >> Further, as part of moving the logging code from oslo-incubator to >> oslo.log, and making our logging something we can use from other OpenStack >> libraries, we are trying to change the implementation of the logging code >> so it is no longer necessary to create loggers with our special wrapper >> function. That would mean that oslo.log will be a library for *configuring* >> logging, but the actual log calls can be handled with Python's standard >> library, eliminating a dependency between new libraries and oslo.log. (This >> is a longer, and separate, discussion, but I mention it here as backround. >> We don't want to change the API of the logger in oslo.log because we don't >> want to be using it directly in the first place.) >> Another concern raised was the use of a prefix _L for these functions, >> since it ties the priority definitions to "logs." I chose that prefix as an >> explicit indicate that these *are* just for logs. I am not associating any >> actual priority with them. The translators want us to move the log messages >> out of the main catalog. Having them all in separate catalogs is a >> refinement that gives them what they want -- some translators don't care >> about log messages at all, some only care about errors, etc. We decided >> that the translators should set priorities, and we would make that possible >> by separating the catalogs into logical groups. Everything marked with _() >> will still go into the main catalog, but beyond that it isn't up to the >> developers to indicate "priority" for translations. >> The alternative approach of using babel translator comments would, under >> other circumstances, help because each message could have some indication >> of its relative importance. However, it does not meet the requirement that >> the translators (and not the developers) set those priorities. It also >> doesn't help the translators because the main catalog does not shrink to >> hold only the user-facing messages. So the comments might be useful in >> addition to this proposed change, but they doesn't solve the original >> problem. >> If we all agree on the approach, I think the patches already in progress >> should be pretty easy to land in the incubator. The next step is to update >> the CI jobs that extract the messages and interact with transifex. After >> that, changes to the applications and existing libraries are likely to take >> longer, and could be done in batches. They may not happen until the next >> cycle, but I would like to have the infrastructure in place by the end of >> this one. >> Feedback? >> Doug >> [1] >> https://blueprints.launchpad.net/oslo/+spec/log-messages-translation-domain >> [2] https://review.openstack.org/#/c/65518/ >> >> I guess my thoughts are still largely the same as on the original >> review. This is already going to be an additional burden on developers and >> reviewers (who love i18n so much already ;-) and ideally I'd prefer that we >> be a little less granular with our designations. Something like _IMPORTANT >> and _OPTIONAL instead of separate translation domains for each individual >> log level. Maybe that can't get the translation load down to a manageable >> level though. I'm kind of guessing on that point. >> > > We did consider something like that at the summit, IIRC. However, we > wanted to leave the job of setting the priority for doing the translation > up to the translators, rather than the developers, because the priorities > vary by language. Using designators that match the log output level lowers > the review burden, because you don't have to think about the importance of > translation, only whether or not the translator tag matches the log > function. > > > Hmm, hadn't thought about it that way, but it does actually make more > work for reviewers. I guess that means I'm good with the 1:1 log > level:translation domain mapping. :-) > > I wonder if we could add something into our log wrappers to check that > Message domains match the log level in use. It wouldn't be able to catch > everything, but maybe we could turn it on in the gate and at least verify > anything that gets logged during those runs. Something to consider once > we've implemented this, I guess. > I like that. I found a few cases in the incubator where we have something like: msg = _('some text: %s') % args LOG.alevel(msg) raise Exception(msg) So we do have some known cases where the levels won't match. Because the message is used in an exception, I left it wrapped in _() for now, but we can think about whether that's the best approach once we have the basic pieces all working. > >> For reference, I grepped the nova source to see how many times we're >> logging at each of the different levels. It's a very rough estimate since >> I'm sure I'm missing some things and there are almost certainly some dupes, >> but I would expect it to be relatively close to reality. Here were the >> results: >> >> [fedora@openstack nova]$ grep -ri log.error | wc -l >> 190 >> [fedora@openstack nova]$ grep -ri log.warn | wc -l >> 286 >> [fedora@openstack nova]$ grep -ri log.info | wc -l >> 254 >> [fedora@openstack nova]$ grep -ri log.debug | wc -l >> 849 >> >> It seems like debug is the low-hanging fruit here - getting rid of that >> eliminates more translations than the rest of the log levels combined >> (since it looks like Nova is translating the vast majority of their debug >> messages). I don't know if that's helpful (enough) though. >> > I'm not sure either. Daisy, would it solve your team's needs if we just > removed translation markers from debug log messages and left everything in > the same catalog? It's not what we talked about at the summit, but maybe > it's an alternative? > > > A lot of my motivation for getting these numbers was finding a > "simpler" way to break down translation domains, but since I seem to have > changed my mind on that I'm not as hung up on this. If we can accomplish > what we need by dropping debug translations that would be great, but since > those numbers don't include non-log translations I'm guessing it won't be > enough. Still interested to hear from Daisy though. > Yes, I'd like to make sure our plans meet their needs. We could take both approaches -- not translate debug messages at all, and mark the others based on their level. > >> I suppose my biggest concern is getting reviewers to buy in to whatever >> we do. It's going to be some additional workload for them since we likely >> can't enforce this through a hacking rule, and some people basically refuse >> to touch anything to do with translation as it is. It's also one more >> hurdle for new contributors since it's a non-standard way of handling >> translation. And, as I noted on the review, it's almost certainly going to >> get out of sync over time as people adjust log message priorities and >> such. Maybe those are all issues we just have to accept, but they are >> issues. >> > I expect we'll need to set some project-wide standards, as Sean is doing > with the meanings of the various log levels. > >> >> Oh, one other thing I wanted to ask about was what the status of >> Transifex is as far as OpenStack is concerned. My understanding was that >> we were looking for alternatives because Transifex had pretty much >> abandoned their open source version. Does that have any impact on this? >> > If we replace it, we will replace it with another tool. The file formats > are standardized, so I wouldn't expect a tool change at that level to > affect our decision on this question. > > > Fair enough. Handling this gracefully would just become a requirement > on any new tool we adopted. > Agreed. Doug > > Doug > >> >> Anyway, it's getting late and my driveway won't shovel itself, so those >> are my slightly rambling thoughts on this. :-) >> >> -Ben >> >> _______________________________________________ >> OpenStack-dev mailing list >> OpenStack-dev@lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev >> >> > >
_______________________________________________ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev