Hello!

Possibly i'm missing something really obvious here. But ...

If i have a date-time string of the kind specified in RFC 1123, like this:

Tue, 12 Aug 2008 20:48:59 -0700

Can i turn that into a seconds-since-the-epoch time using the standard time module without jumping through substantial hoops?

Apart from the timezone, this can be parsed using time.strptime with the format:

%a, %d %b %Y %H:%M:%S
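As a small illustration of that step, here is a sketch that feeds only the date-and-time portion of the example string to time.strptime. The variable names are made up for the example; the numeric offset has simply been left off the input.

```python
import time

# Date-and-time portion of the example, with the numeric offset left off.
raw = "Tue, 12 Aug 2008 20:48:59"

tm = time.strptime(raw, "%a, %d %b %Y %H:%M:%S")

# The first six fields carry the calendar values; tm_isdst is -1 because
# the format contains no timezone directive.
print(tm[:6], tm.tm_isdst)
```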

You can stick a %Z on the end for the timezone, but that parses timezone names ('BST', 'EDT'), not numeric specifiers. Also, it doesn't actually parse anything, it just requires that the timezone that's in the string matches your local timezone.

Okay, no problem, so you use a regexp to split off the timezone specifier, parse that yourself, then parse the raw time with strptime.
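One way the split might look, as a rough sketch: the helper name split_zone and the regular expression are assumptions for illustration, and only the numeric +HHMM / -HHMM form is handled. The offset is returned in seconds east of UTC, so -0700 becomes -25200.

```python
import re

# Everything up to the last whitespace, then a sign and four digits.
ZONE_RE = re.compile(r"^(.*)\s+([+-])(\d{2})(\d{2})$")

def split_zone(stamp):
    """Split 'date-time +HHMM' into (date-time string, offset in seconds)."""
    m = ZONE_RE.match(stamp.strip())
    if m is None:
        raise ValueError("no numeric timezone in %r" % (stamp,))
    raw, sign, hh, mm = m.groups()
    offset = int(hh) * 3600 + int(mm) * 60
    if sign == "-":
        offset = -offset
    return raw, offset

raw, offset = split_zone("Tue, 12 Aug 2008 20:48:59 -0700")
```

The raw part can then be handed to time.strptime with the format above.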

Now you just need to adjust the parsed time for the timezone. Now, from strptime, you get a struct_time, and that doesn't have room for a timezone (although it does have room for a daylight saving time flag), so you can't add the timezone in before you convert to seconds-since-the-epoch.

Okay, so convert the struct_time to seconds-since-the-epoch as if it were UTC, then apply the timezone correction. Converting a struct_time to seconds-since-the-epoch is done with mktime, right? Wrong! That does the conversion *in your local timezone*. There's no way to tell it to use any specific timezone, not even just UTC.

So how do you do this?

Can we convert from struct_time to seconds-since-the-epoch by hand? Well, the hours, minutes and seconds are pretty easy, but dealing with the date means doing some hairy calculations with leap years, which are doable but way more effort than i thought i'd be expending on parsing the date format found in every single email in the world.
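For a sense of what the by-hand route involves, here is a sketch of the leap-year arithmetic. The function names are invented for this example; it assumes the proleptic Gregorian calendar, ignores leap seconds, and treats the struct_time fields as UTC.

```python
# Days elapsed before the first of each month in a non-leap year.
DAYS_BEFORE_MONTH = [0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334]

def is_leap(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def days_before_year(year):
    """Days from 1 Jan of year 1 up to 1 Jan of the given year."""
    y = year - 1
    return y * 365 + y // 4 - y // 100 + y // 400

def timegm_by_hand(tm):
    """Seconds since the epoch for a time tuple interpreted as UTC."""
    year, month, day, hour, minute, second = tm[:6]
    days = days_before_year(year) - days_before_year(1970)
    days += DAYS_BEFORE_MONTH[month - 1]
    if month > 2 and is_leap(year):
        days += 1  # 29 February has already passed this year
    days += day - 1
    return ((days * 24 + hour) * 60 + minute) * 60 + second
```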

Can we pretend the struct_time is a local time, convert it to seconds-since-the-epoch, then adjust it by whatever our current timezone is to get true seconds-since-the-epoch, *then* apply the parsed timezone? I think so:

import time

def mktime_utc(tm):
        "Return what mktime would return if we were in the UTC timezone"
        return time.mktime(tm) - time.timezone

Then:

def mktime_zoned(tm, tz):
        "Return what mktime would return if we were in the timezone given by tz"
        return mktime_utc(tm) - tz

The only problem there is that mktime_utc doesn't deal with DST: if tm is a date for which DST would be in effect for the local timezone, then we need to subtract time.altzone, not time.timezone. strptime doesn't fill in the dst flag, as far as i can see, so we have to round-trip via mktime/localtime:

def isDST(tm):
        # struct_time names its DST field tm_isdst
        tm2 = time.localtime(time.mktime(tm))
        assert (tm2.tm_isdst != -1)
        return bool(tm2.tm_isdst)

def timezone(tm):
        if (isDST(tm)):
                return time.altzone
        else:
                return time.timezone

mktime_utc then becomes:

def mktime_utc(tm):
        return time.mktime(tm) - timezone(tm)

And you can of course inline that and eliminate a redundant call to mktime:

def mktime_utc(tm):
        t = time.mktime(tm)
        # struct_time names its DST field tm_isdst
        isdst = time.localtime(t).tm_isdst
        assert (isdst != -1)
        if (isdst):
                tz = time.altzone
        else:
                tz = time.timezone
        return t - tz
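Putting the pieces together, an end-to-end version might look like the sketch below. parse_rfc1123, FORMAT and ZONE_RE are hypothetical names, and only numeric offsets are handled. It also assumes that time.timezone and time.altzone describe the local zone's rules at the parsed date, which may not hold for zones whose rules have changed over the years.

```python
import re
import time

FORMAT = "%a, %d %b %Y %H:%M:%S"
ZONE_RE = re.compile(r"^(.*)\s+([+-])(\d{2})(\d{2})$")

def mktime_utc(tm):
    """Return what mktime would return if the local zone were UTC."""
    t = time.mktime(tm)
    # Ask localtime whether DST applied at that moment, then undo
    # whichever local offset mktime used.
    if time.localtime(t).tm_isdst > 0:
        return t - time.altzone
    return t - time.timezone

def parse_rfc1123(stamp):
    """Convert e.g. 'Tue, 12 Aug 2008 20:48:59 -0700' to epoch seconds."""
    m = ZONE_RE.match(stamp.strip())
    if m is None:
        raise ValueError("no numeric timezone in %r" % (stamp,))
    raw, sign, hh, mm = m.groups()
    offset = int(hh) * 3600 + int(mm) * 60  # seconds east of UTC
    if sign == "-":
        offset = -offset
    return mktime_utc(time.strptime(raw, FORMAT)) - offset

seconds = parse_rfc1123("Tue, 12 Aug 2008 20:48:59 -0700")
```

For the example string this should correspond to 2008-08-13 03:48:59 UTC.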

So, firstly, does that work? Answer: i've tested it a bit, and yes.

Secondly, do you really have to do this just to parse a date with a timezone? If so, that's ridiculous.

tom

--
102     FX 6 (goblins)
--
http://mail.python.org/mailman/listinfo/python-list
