Re: Matplotlib X-axis timezone trouble

2015-06-30 Thread Chris Angelico
On Tue, Jun 30, 2015 at 2:49 PM, Peter Pearson
 wrote:
> Time zones teem with sneaky software problems, and so does daylight-saving
> time, so this problem might strain my brain.  Maybe it's going to turn
> out that my expectations are unreasonable . . . as in, "Well, smarty pants,
> how do you want the axis labelled when the abscissas straddle the
> beginning of daylight-saving time?"  I'll research and digest.

That's entirely possible. Even more so if you go across some other
civil time change - if you look at the history of timezones in tzdata,
there's no end of messes as different places adopted standard time,
redefined standard time, and unified with someone else's standard
time. And some of that happened relatively recently.

UTC is much easier for this kind of thing. Especially if the
granularity of your data lets you ignore leap seconds.
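To illustrate the point: keeping timestamps as UTC-aware datetimes in Python sidesteps DST entirely (a minimal sketch, not tied to matplotlib):

```python
from datetime import datetime, timezone

# Store and compute with UTC-aware datetimes; convert to local time
# only when rendering axis labels or log lines, if at all.
ts = datetime(2015, 6, 30, 14, 49, tzinfo=timezone.utc)
print(ts.isoformat())       # 2015-06-30T14:49:00+00:00
print(int(ts.timestamp()))  # seconds since the epoch; no DST jumps
```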

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Scapy and MGCP

2015-06-30 Thread Devaki Chokshi (dchokshi)
Hello,

As per the reply received, I have begun to use scapy for MGCP. 

I started off by reading a .pcap file with MGCP packets. 

For example:

from scapy.all import *
from scapy.utils import *
from scapy.layers.mgcp import *

mgcp_pkts = rdpcap("my-mgcp-capture.pcap")

However, rdpcap() is returning all MGCP packets as UDP packets. The output is 
something like:



All 262 UDP packets are actually MGCP packets. 

I would have thought rdpcap() would understand the MGCP packet format. 

Would appreciate help on how I can make scapy read MGCP packets from a .pcap 
file.

Thank you
Devaki

-Original Message-
From: Laura Creighton [mailto:l...@openend.se] 
Sent: Tuesday, June 23, 2015 7:47 AM
To: Devaki Chokshi (dchokshi)
Cc: python-list@python.org; sht...@python.org; l...@openend.se
Subject: Re: MGCP Server implementation

In a message of Mon, 22 Jun 2015 20:42:21 -, "Devaki Chokshi (dchokshi)" writes:
>Hello,
>
>I have a use case where a SIP voice call will be passing through an MGCP 
>gateway.
>
>Is there a python implementation that simulates MGCP gateway/server?
>
>Thank you
>Devaki Chokshi
>
>-- 
>https://mail.python.org/mailman/listinfo/python-list
>

I think http://www.secdev.org/projects/scapy/ will do this for you.
In particular
https://code.google.com/p/pythonxy/source/browse/src/python/_scapy/PURELIB/scapy/layers/mgcp.py?repo=xy-27&name=v2.7.2.1&r=c4a08530c8ff74fd6078507743ce9982675aea8c

But I have only used Scapy for other protocols, so I am not certain.

Laura
-- 
https://mail.python.org/mailman/listinfo/python-list


Segmentation Fault with multiple interpreters

2015-06-30 Thread Florian Rüchel
Hey there,

I'm not sure this is the correct list considering the level of internal
Python knowledge it likely requires. If I should take this to another
list, please let me know.

I have written an application that links against libpython and starts
multiple interpreters within one thread. Beyond that it actually links
against libpython2.7 *and* libpython3.4 (though I doubt that makes a
difference here). In addition to launching my own interpreters I created
a custom C module that can be called from Python (and will be loaded
during my own init function).

The problem is a bit weird and not easy to reproduce. There are two
types of instances launched, both running the same C code but diverging
during interpreter runtime. The problem happens only if I start 8
instances of type A and at least one of type B. All instances of type A
start before any instance of type B starts. On the first start of type B
the program segfaults. Decreasing the number to 7 yields normal execution.

I could now go along and post a lot of code here and explain the complex
setup but I think it is much more helpful if I can get hints on how to
debug this myself. An excerpt of the traceback (full can be found here:
http://pastebin.com/unAaG6qi)

#0  0x7f922096cd6a in visit_decref (op=0x7f921d801518
<__hoisted_globals+216>, data=0x0) at Modules/gcmodule.c:360
#1  0x7f92208aadf2 in meth_traverse (m=0x7f921d4f0d40,
visit=0x7f922096cd52 , arg=0x0) at Objects/methodobject.c:166
#2  0x7f922096ce2c in subtract_refs (containers=0x7f9220c108e0
) at Modules/gcmodule.c:385
#3  0x7f922096dac9 in collect (generation=0x2) at Modules/gcmodule.c:925
#4  0x7f922096de0a in collect_generations () at Modules/gcmodule.c:1050
#5  0x7f922096e939 in _PyObject_GC_Malloc (basicsize=0x20) at
Modules/gcmodule.c:1511
#6  0x7f922096e9d9 in _PyObject_GC_NewVar (tp=0x7f9220bf1540
, nitems=0x1) at Modules/gcmodule.c:1531
#7  0x7f92208c7681 in PyTuple_New (size=0x1) at Objects/tupleobject.c:90
...
#58 0x7f9220953ff7 in initsite () at Python/pythonrun.c:726
#59 0x7f9220953d94 in Py_NewInterpreter () at Python/pythonrun.c:621
#60 0x7f921d60024b in prepare_interpreter (argc=0x9, argv=0x387ee70,
m=0x3861e80) at
/home/javex/Thesis/src/shadow-plugin-extras/python/src/python.c:19
#61 0x7f921d5ff925 in python_new (argc=0x9, argv=0x387ee70,
log=0x421420 <_thread_interface_log>) at
/home/javex/Thesis/src/shadow-plugin-extras/python/src/python.c:160

with the exact point of segfault being this:

355 /* A traversal callback for subtract_refs. */
356 static int
357 visit_decref(PyObject *op, void *data)
358 {
359 assert(op != NULL);
360 if (PyObject_IS_GC(op)) { <-- segfault here
361 PyGC_Head *gc = AS_GC(op);
362 /* We're only interested in gc_refs for objects in the
363  * generation being collected, which can be recognized
364  * because only they have positive gc_refs.

With "op" being:
gdb$ print *op
$4 = {ob_refcnt = 0x1, ob_type = 0x0}

Now I vaguely remember having seen this error before. It *might* have
been that I was passing back Py_None without increasing its reference
count (while it was later decref'ed). That case is now fixed, and I no
longer return Py_None anywhere else.

So the big question is: How do I find out what is happening here? From
what I can gather it looks like GC is cleaning up so I guess I have a
refcount wrong. But which? Where? Can I debug this?

If you think this might help in any case, my source code is also
available:
https://github.com/Javex/shadow-plugin-extras/tree/master/python/src
Here the files python-logging.c, python.c and python-plugin.c are in use
with python.c having most of the magic and python-logging.c being my own
module. Beyond that there's only python code, no more C of my own.

Thanks in advance for any help you can provide. Kind regards,
Florian
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Linear time baseconversion

2015-06-30 Thread jonas . thornvall
Den tisdag 30 juni 2015 kl. 02:09:17 UTC+2 skrev Ben Bacarisse:
> Ian Kelly  writes:
> 
> > On Mon, Jun 29, 2015 at 4:56 PM, Ian Kelly  wrote:
> >> On Mon, Jun 29, 2015 at 4:39 PM,   wrote:
> >>> http://jt.node365.se/baseconversion8.html
> 
> > By the way, I think you have a bug. I did my time testing by
> > concatenating the default input number to itself several times. At
> > 5184 decimal digits, which was the last case I tried, the 58th output
> > digit was 111, after which point the remaining 672 output digits
> > are all 12665464, which seemed rather improbable.
> 
> Yes, it's a bug.  Because it's off-topic, I won't say more here but I
> posted a simpler test case to comp.lang.javascript just now (1000
> converted to base 100).  I would not normally reply here, but I wanted
> to acknowledge your post as prompting me to look for other cases.
> Followup-to: set.
> 
> -- 
> Ben.

Thank you for finding the bug, Ben. It turned out I did basically the same thing 
two times. One of the times was the more general case, and that turned out to do 
the correct thing.

jt.node365.se/baseconversion9.html

It still bugs out on very big numbers if the base is outside integer scope.
I am very keen on suggestions regarding the logic to make it faster.
I will read the base into an array holding bignums, and then it should work for 
any base and any number, as far as I can understand. 
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Linear time baseconversion

2015-06-30 Thread Christian Gollwitzer

Am 30.06.15 um 10:52 schrieb jonas.thornv...@gmail.com:

It still bug out on very big numbers if base outside integer scope.
I am very keen on suggestions regarding the logic to make it faster.


Concerning the algorithmic complexity, it can't be faster than square 
time in the number of digits N. Baseconversion needs to do a sequence of 
division operations, where every operation gives you one digit in the new 
base. The number of digits in the new base is proportional to the number 
of digits in the old base (the ratio is log b1/log b2). Therefore it 
will be O(N^2).
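A toy sketch of the division-based scheme Christian describes (not the poster's algorithm) makes the cost visible: one big-int division per output digit.

```python
def to_base(n, b):
    """Convert a non-negative int to a list of base-b digits."""
    # Repeated division: each divmod yields one digit of the new base.
    # For an N-digit input that is O(N) big-int divisions of O(N) cost
    # each, hence the O(N^2) total described above.
    digits = []
    while n:
        n, r = divmod(n, b)
        digits.append(r)
    return digits[::-1] or [0]

print(to_base(1000, 100))  # [10, 0] -- Ben's test case from upthread
```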


Christian


--
https://mail.python.org/mailman/listinfo/python-list


Re: enumerate XML tags (keys that will become headers) along with text (values) and write to CSV in one row (as opposed to "stacked" values with one header)

2015-06-30 Thread Robert Kern

On 2015-06-30 01:54, Denis McMahon wrote:

On Sun, 28 Jun 2015 17:07:00 -0700, Ned Batchelder wrote:


On Sunday, June 28, 2015 at 5:02:19 PM UTC-4, Denis McMahon wrote:




<things>
   <thing>string 3</thing>
   <thing>string 2</thing>
   <thing>string 1</thing>
</things>




Each <thing> is just a member of the collection <things>; the xml does
not contain sufficient information to state that <things> is an ordered
collection containing a specific sequence of <thing> elements.


You are right that XML does not specify that <things> is an ordered
collection.
But XML does preserve the order of the children.  There are many XML
schemas that rely on XML's order-preserving nature.


But what we *don't* know is whether the order of the umpteen identical
tags in the XML has any significance in terms of the original data,
although the OP seems intent on assigning some significance to that order
without any basis for doing so.

Consider the following tuple:

t = (tuplemember_1, tuplemember_2, ..., tuplemember_n)

Can we safely assume that if the tuple is ever converted to xml, either
now or at some future date using whatever the code implementation is
then, that the order of the items will be preserved:


<tuple>
   <item>tuplemember_1</item>
   <item>tuplemember_2</item>
   ...
   <item>tuplemember_n</item>
</tuple>



Barring bugs, yes!


And if we're reading that xml structure at some point in the future, is
it safe to assume that the tuple members are in the same order in the xml
as they were in the original tuple?


Yes! Any conforming XML implementation will preserve the order.
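For instance, the standard library's ElementTree yields children in document order (a quick sketch using the strings from the example earlier in the thread):

```python
import xml.etree.ElementTree as ET

doc = ("<things><thing>string 3</thing>"
       "<thing>string 2</thing>"
       "<thing>string 1</thing></things>")
root = ET.fromstring(doc)
# iterating an element yields its children in document order
print([t.text for t in root])
```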


For sanity <item> should have an attribute specifying the sequence of the
item in its tuple.


While it may make you more comfortable, it's hardly a requirement for sanity.

I think you had a point in your first paragraph here, but you are obscuring it 
with FUD. The problem is not whether unadorned XML elements can be used to 
represent an ordered collection. They can and are, frequently, without any 
problem because XML elements are intrinsically ordered.


The real problem that you almost get around to articulating is that XML elements 
can *also* be used to represent unordered collections simply by ignoring the 
(preserved) order of the elements. And if you are completely blind as to the 
schema as the OP apparently is, and you are simply given a load of XML and told 
to do "something" with it, you don't know if any given collection is meant to be 
ordered or unordered. Of course, the only sensible thing to do is just preserve 
the order given to you as that is what the semantics of XML requires of you in 
the absence of a schema that says otherwise. You can always disregard the order 
later.


That said, if the data is regular enough to actually be restructured into a 
table (i.e. if <things> always has the same number of children, etc.), 
then it probably does represent an ordered collection. If it's variable, then 
putting it into a table structure probably doesn't make any sense regardless of 
ordering issues.


--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco

--
https://mail.python.org/mailman/listinfo/python-list


get Coursera video download link via program

2015-06-30 Thread iMath
I want to extract Coursera video download links programmatically (mainly in 
Python) from pages like these:

https://www.coursera.org/learn/human-computer-interaction/lecture/s4rFQ/the-interaction-design-specialization

https://www.coursera.org/learn/calculus1/lecture/IYGhT/why-is-calculus-going-to-be-so-much-fun

After reading a lot of articles about this, I still cannot find a way to extract 
the video download link programmatically. Can anyone offer a step-by-step 
solution for extracting the video download link? Thanks!

P.S. I know about this project:
https://github.com/coursera-dl/coursera
but the code is so complex that I gave up on it.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Linear time baseconversion

2015-06-30 Thread jonas . thornvall
Den tisdag 30 juni 2015 kl. 11:08:01 UTC+2 skrev Christian Gollwitzer:
> Am 30.06.15 um 10:52 schrieb jonas.thornv...@gmail.com:
> > It still bug out on very big numbers if base outside integer scope.
> > I am very keen on suggestions regarding the logic to make it faster.
> 
> Concerning the algorithmic complexity, it can't be faster than square 
> time in the number of digits N. Baseconversion needs to do a sequence of 
> division operations, where every operation gves you one digit in the new 
> base. The number of digits in the new base is proportional to the number 
> of digits in the old base (the ratio is log b1/log b2). Therefore it 
> will be O(N^2).
> 
>   Christian

There is not a single division (except for division by 2, used in the 
half-interval search).
This algorithm only uses multiplication and a modified GCD search algorithm.

http://jt.node365.se/baseconversion9.html
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: get Coursera video download link via program

2015-06-30 Thread iMath
I know the two packages BeautifulSoup and requests may help, and I am also able 
to log in to Coursera via requests; the difficulty is how to find the video 
download link in the page.
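One stdlib-only sketch for the "find the link in the page" step: scan fetched HTML for video/source tags. The HTML below is hypothetical, and real Coursera pages build the player with JavaScript, so the link may not appear in the static HTML at all.

```python
from html.parser import HTMLParser

class VideoSrcFinder(HTMLParser):
    """Collect src attributes of <video>/<source> tags in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in ('video', 'source'):
            d = dict(attrs)
            if 'src' in d:
                self.links.append(d['src'])

# hypothetical page content; in practice this would come from requests
page = '<video><source src="https://example.com/lecture.mp4"></video>'
finder = VideoSrcFinder()
finder.feed(page)
print(finder.links)  # ['https://example.com/lecture.mp4']
```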
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Linear time baseconversion

2015-06-30 Thread jonas . thornvall
Den tisdag 30 juni 2015 kl. 11:08:01 UTC+2 skrev Christian Gollwitzer:
> Am 30.06.15 um 10:52 schrieb jonas.thornv...@gmail.com:
> > It still bug out on very big numbers if base outside integer scope.
> > I am very keen on suggestions regarding the logic to make it faster.
> 
> Concerning the algorithmic complexity, it can't be faster than square 
> time in the number of digits N. Baseconversion needs to do a sequence of 
> division operations, where every operation gves you one digit in the new 
> base. The number of digits in the new base is proportional to the number 
> of digits in the old base (the ratio is log b1/log b2). Therefore it 
> will be O(N^2).
> 
>   Christian

Any new digit will be found in SQRT(base) comparisons. 
The average case will be (SQRT(base)*(SQRT(base)+1))/2.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Linear time baseconversion

2015-06-30 Thread jonas . thornvall
Den tisdag 30 juni 2015 kl. 11:35:06 UTC+2 skrev jonas.t...@gmail.com:
> Den tisdag 30 juni 2015 kl. 11:08:01 UTC+2 skrev Christian Gollwitzer:
> > Am 30.06.15 um 10:52 schrieb jonas.thornv...@gmail.com:
> > > It still bug out on very big numbers if base outside integer scope.
> > > I am very keen on suggestions regarding the logic to make it faster.
> > 
> > Concerning the algorithmic complexity, it can't be faster than square 
> > time in the number of digits N. Baseconversion needs to do a sequence of 
> > division operations, where every operation gves you one digit in the new 
> > base. The number of digits in the new base is proportional to the number 
> > of digits in the old base (the ratio is log b1/log b2). Therefore it 
> > will be O(N^2).
> > 
> > Christian
> 
> Any new digit will be found in SQRT(base) comparissons. 
> Averaged case will be in (SQRT(base)*(SQRT(base)+1))/2

Well, that averaging was baloney. How do I write that the digit will be found,
on average, in values from 1 to SQRT(base)?
-- 
https://mail.python.org/mailman/listinfo/python-list


EuroPython 2015: Call for On-site Volunteers

2015-06-30 Thread M.-A. Lemburg
EuroPython is organized and run by volunteers from the Python
community, but we’re only a few and we will need more help to make the
conference run smoothly.

We need your help!
------------------

We will need help with the conference and registration desk, giving
out the swag bags and t-shirts, session chairing, entrance control,
set up and tear down, etc.

Perks for Volunteers
--------------------

In addition to endless fame and glory as an official EuroPython
Volunteer, we have also added a few real-life perks for you:

 * We will grant each volunteer a compensation of EUR 22 per shift

 * Volunteers will be eligible for student house rooms we have
   available and can use their compensation to pay for these

 * Get an awesome EuroPython Volunteer T-Shirt that you can keep and
   show off to your friends :-)

Register as Volunteer
---------------------

Please see our EuroPython Volunteers page for details and the
registration form:

https://ep2015.europython.eu/en/registration/volunteers/

If you have questions, please write to our helpd...@europython.eu.

Hope to see you in Bilbao :-)

Enjoy,
--
EuroPython 2015 Team
http://ep2015.europython.eu/
http://www.europython-society.org/

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-30 Thread Jon Ribbens
On 2015-06-30, Steven D'Aprano  wrote:
> On Tue, 30 Jun 2015 06:52 am, Randall Smith wrote:
>> Not sure why you posted the link.  The crc32 checksum is just to check
>> for possible filesystem corruption.  The system does periodic data
>> corruption checks.  BTRFS uses crc32 checksums also.  Please explain.
>
> The file system can trust that anything writing to a file is allowed to
> write to it, in doesn't have to defend against malicious writes. As I
> understand it, your application does.
>
> Here is the attack scenario I have in mind:
>
> - you write a file to my computer, and neglect to encrypt it;

Eh? The game is over right there. I don't trust you, and yet
I have just given you my private data, unencrypted. Checksums
don't even come into it, we have failed utterly at step 1.

> - since you are using CRC, it is quite easy for me to ensure the 
>   checksums match after inserting malware;

No, you have yet *again* misunderstood the difference between the
client and the server.

> I was wrong: cryptographically strong ciphers are generally NOT resistant to
> what I described as a preimage attack. If the key leaks, using AES won't
> save you: an attacker with access to the key can produce a ciphertext that
> decrypts to the malware of his choice, regardless of whether you use
> AES-256 or rot-13. There may be other encryption methods which don't suffer
> from that, but he doesn't know of them off the top of his head.

lol. I suspected as much. You and Johannes were even more wrong than
was already obvious.

> The other threat I mentioned is that the receiver will read the content of
> the file. For that, a strong cipher is much to be preferred over a weak
> one, and it needs to be encrypted by the sending end, not the receiving
> end. (If the receiving end does it, it has to keep the key so it can
> decrypt before sending back, which means the computer's owner can just grab
> the key and read the files.)

Yes, that is utterly basic.
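(For readers following along: the distinction being argued about -- CRCs detect accidents, keyed MACs resist tampering -- can be sketched with the standard library. The key below is hypothetical.)

```python
import hashlib
import hmac
import zlib

data = b"stored block"

# CRC32 guards against accidental corruption only: anyone who can
# modify the data can recompute a matching checksum.
checksum = zlib.crc32(data) & 0xffffffff

# An HMAC can only be recomputed by a holder of the key, so a
# malicious host cannot forge it for tampered data.
key = b"key held by the data owner"  # hypothetical key
tag = hmac.new(key, data, hashlib.sha256).hexdigest()
print(len(tag))  # 64 hex characters for SHA-256
```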
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: enumerate XML tags (keys that will become headers) along with text (values) and write to CSV in one row (as opposed to "stacked" values with one header)

2015-06-30 Thread Chris Angelico
On Tue, Jun 30, 2015 at 11:26 AM, Dennis Lee Bieber
 wrote:
> My system takes something like three hours just to generate a 500GB
> backup (one partition each week -- I have a 4TB backup drive with only
> 740GB free; the other drives are only half full or I'd need an 8TB backup).
> And that's using a compiled backup program -- I'd hate to consider what
> Python would require to backup the partition.

Probably about the same, actually. In my experience, there's often
very little speed difference between a straight 'dd' from one
partition to another (say, making a disk image prior to data recovery)
and doing more complicated work (say, archiving and compressing).
Until you actually manage to saturate your CPU with the workload
(video editing or something), the time is most likely to be dominated
by the disk platters.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Linear time baseconversion

2015-06-30 Thread jonas . thornvall
Den tisdag 30 juni 2015 kl. 11:43:55 UTC+2 skrev jonas.t...@gmail.com:
> Den tisdag 30 juni 2015 kl. 11:35:06 UTC+2 skrev jonas.t...@gmail.com:
> > Den tisdag 30 juni 2015 kl. 11:08:01 UTC+2 skrev Christian Gollwitzer:
> > > Am 30.06.15 um 10:52 schrieb jonas.thornv...@gmail.com:
> > > > It still bug out on very big numbers if base outside integer scope.
> > > > I am very keen on suggestions regarding the logic to make it faster.
> > > 
> > > Concerning the algorithmic complexity, it can't be faster than square 
> > > time in the number of digits N. Baseconversion needs to do a sequence of 
> > > division operations, where every operation gves you one digit in the new 
> > > base. The number of digits in the new base is proportional to the number 
> > > of digits in the old base (the ratio is log b1/log b2). Therefore it 
> > > will be O(N^2).
> > > 
> > >   Christian
> > 
> > Any new digit will be found in SQRT(base) comparissons. 
> > Averaged case will be in (SQRT(base)*(SQRT(base)+1))/2
> 
> Well that averaging was balloney. How do i write that the digit will be found 
> in.
> Average values from 1 to SQRT(base)?

Regarding the time: it seems doubling the digits quadruples the time. Is that 
still linear, or?
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Linear time baseconversion

2015-06-30 Thread jonas . thornvall
Den tisdag 30 juni 2015 kl. 15:22:44 UTC+2 skrev jonas.t...@gmail.com:
> Den tisdag 30 juni 2015 kl. 11:43:55 UTC+2 skrev jonas.t...@gmail.com:
> > Den tisdag 30 juni 2015 kl. 11:35:06 UTC+2 skrev jonas.t...@gmail.com:
> > > Den tisdag 30 juni 2015 kl. 11:08:01 UTC+2 skrev Christian Gollwitzer:
> > > > Am 30.06.15 um 10:52 schrieb jonas.thornv...@gmail.com:
> > > > > It still bug out on very big numbers if base outside integer scope.
> > > > > I am very keen on suggestions regarding the logic to make it faster.
> > > > 
> > > > Concerning the algorithmic complexity, it can't be faster than square 
> > > > time in the number of digits N. Baseconversion needs to do a sequence 
> > > > of 
> > > > division operations, where every operation gves you one digit in the 
> > > > new 
> > > > base. The number of digits in the new base is proportional to the 
> > > > number 
> > > > of digits in the old base (the ratio is log b1/log b2). Therefore it 
> > > > will be O(N^2).
> > > > 
> > > > Christian
> > > 
> > > Any new digit will be found in SQRT(base) comparissons. 
> > > Averaged case will be in (SQRT(base)*(SQRT(base)+1))/2
> > 
> > Well that averaging was balloney. How do i write that the digit will be 
> > found in.
> > Average values from 1 to SQRT(base)?
> 
> Regarding the time it seem to double the digits quadruple the time. And that 
> is still linear or?

2x seems linear to me?
-- 
https://mail.python.org/mailman/listinfo/python-list


Parsing logfile with multi-line loglines, separated by timestamp?

2015-06-30 Thread Victor Hooi
Hi,

I'm trying to parse iostat -xt output using Python. The quirk with iostat is 
that the output for each second runs over multiple lines. For example:

06/30/2015 03:09:17 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.03    0.00    0.03    0.00    0.00   99.94

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdap1            0.00     0.04    0.02    0.07     0.30     3.28    81.37     0.00   29.83    2.74   38.30   0.47   0.00
xvdb              0.00     0.00    0.00    0.00     0.00     0.00    11.62     0.00    0.23    0.19    2.13   0.16   0.00
xvdf              0.00     0.00    0.00    0.00     0.00     0.00    10.29     0.00    0.41    0.41    0.73   0.38   0.00
xvdg              0.00     0.00    0.00    0.00     0.00     0.00     9.12     0.00    0.36    0.35    1.20   0.34   0.00
xvdh              0.00     0.00    0.00    0.00     0.00     0.00    33.35     0.00    1.39    0.41    8.91   0.39   0.00
dm-0              0.00     0.00    0.00    0.00     0.00     0.00    11.66     0.00    0.46    0.46    0.00   0.37   0.00

06/30/2015 03:09:18 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.50    0.00    0.00   99.50

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdap1            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdb              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdf              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdg              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdh              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

06/30/2015 03:09:19 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.50    0.00    0.00   99.50

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdap1            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdb              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdf              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdg              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdh              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

Essentially I need to parse the output in "chunks", where each chunk is 
separated by a timestamp.

I was looking at itertools.groupby(), but that doesn't seem to quite do what I 
want here - it seems more for grouping lines, where each is united by a common 
key, or something that you can use a function to check for.
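For what it's worth, groupby can be coaxed into this with a stateful key object that bumps a counter on each timestamp line (a sketch, one of several possible approaches; the timestamp regex assumes the format shown above):

```python
import itertools
import re

TS = re.compile(r'\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} [AP]M')

class ChunkKey:
    """Stateful groupby key: a new group starts at each timestamp line."""
    def __init__(self):
        self.count = 0

    def __call__(self, line):
        if TS.match(line.strip()):
            self.count += 1
        return self.count

lines = ["06/30/2015 03:09:17 PM", "avg-cpu ...", "xvdb ...",
         "06/30/2015 03:09:18 PM", "avg-cpu ..."]
for key, chunk in itertools.groupby(lines, key=ChunkKey()):
    print(key, list(chunk))
```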

Another thought was something like:

for line in f:
    if line.count("/") == 2 and line.count(":") == 2:
        current_time = datetime.strptime(line.strip(), '%m/%d/%y %H:%M:%S')
        while line.count("/") != 2 and line.count(":") != 2:
            print(line)
            continue

But that didn't quite seem to work.

Is there a Pythonic way of parsing the above iostat output, and break it into 
chunks split by the timestamp?

Cheers,
Victor
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Linear time baseconversion

2015-06-30 Thread Ian Kelly
On Tue, Jun 30, 2015 at 8:13 AM,   wrote:
>> Regarding the time it seem to double the digits quadruple the time. And that 
>> is still linear or?
>
> 2x seem linear to me?

That's not linear, nor is it "2x". If doubling the size of the input
quadruples the time, then doubling the size of the input twice (i.e.
quadrupling the size of the input) results in the time being increased
by a factor of 16. Double it three times, and the time taken is
increased by a factor of 64. That's quadratic behavior.

If the algorithm were actually linear, then a constant factor of "2x"
wouldn't matter when calculating the growth. Doubling the size of the
input would simply double the time taken.
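The distinction can be checked numerically with toy cost functions (constant factors cancel out of the doubling ratio):

```python
# If T(n) = c*n**2, doubling n multiplies T by 4 whatever c is,
# whereas a linear T(n) = c*n only doubles.
quadratic = lambda n: 3 * n ** 2
linear = lambda n: 3 * n

print(quadratic(2000) / quadratic(1000))  # 4.0
print(linear(2000) / linear(1000))        # 2.0
```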

Of course, this is just an empirical estimation of the asymptotic
complexity of the algorithm, not a proof. Maybe after 10,000 digits it
actually does become linear, although I doubt it. Proving it would
require analysis of the code, and I'm not willing to dig into 500
lines of Javascript just for the sake of it.
-- 
https://mail.python.org/mailman/listinfo/python-list


Python 3 resuma a file download

2015-06-30 Thread zljubisic
Hi,

I would like to download a file (http://video.hrt.hr/2906/otv296.mp4)

If the connection is OK, I can download the file with:

import urllib.request
urllib.request.urlretrieve(remote_file, local_file)

Sometimes when I am connected to a weak wireless network (not mine) I get a 
WinError 10054 exception (Windows 7).

When that happens, I would like to resume the download instead of redoing 
everything from the very beginning.

How to do that?

I read about the Range header and chunks, but this server doesn't seem to send 
any such headers.

What options do I have with this particular file?
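A hedged sketch of the Range-based approach: it only truly resumes when the server honors the Range header (replying 206); a server that ignores it sends the whole file again, and the sketch falls back to a full re-download. The function name is hypothetical.

```python
import os
import urllib.request

def resume_download(url, local_file, chunk_size=64 * 1024):
    # Ask for the bytes we don't have yet. A 206 reply means the
    # server honored the Range header and we can append; anything
    # else means we got the whole file and must start over.
    start = os.path.getsize(local_file) if os.path.exists(local_file) else 0
    req = urllib.request.Request(url, headers={'Range': 'bytes=%d-' % start})
    with urllib.request.urlopen(req) as resp:
        mode = 'ab' if resp.getcode() == 206 else 'wb'
        with open(local_file, mode) as f:
            while True:
                chunk = resp.read(chunk_size)
                if not chunk:
                    break
                f.write(chunk)
```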

Regards.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Linear time baseconversion

2015-06-30 Thread Ian Kelly
On Tue, Jun 30, 2015 at 9:40 AM, Ian Kelly  wrote:
> On Tue, Jun 30, 2015 at 3:07 AM, Christian Gollwitzer  wrote:
>> Am 30.06.15 um 10:52 schrieb jonas.thornv...@gmail.com:
>>>
>>> It still bug out on very big numbers if base outside integer scope.
>>> I am very keen on suggestions regarding the logic to make it faster.
>>
>>
>> Concerning the algorithmic complexity, it can't be faster than square time
>> in the number of digits N. Baseconversion needs to do a sequence of division
>> operations, where every operation gves you one digit in the new base. The
>> number of digits in the new base is proportional to the number of digits in
>> the old base (the ratio is log b1/log b2). Therefore it will be O(N^2).
>
> I don't think that's true. Here's a linear hexadecimal to binary function:
>
> >>> def hextobin(value):
> ...     digits = {'0': '0000', '1': '0001', '2': '0010', '3': '0011',
> ...               '4': '0100', '5': '0101', '6': '0110', '7': '0111',
> ...               '8': '1000', '9': '1001', 'A': '1010', 'B': '1011',
> ...               'C': '1100', 'D': '1101', 'E': '1110', 'F': '1111'}
> ...     return ''.join(digits[d.upper()] for d in value)
> ...
> >>> hextobin('3f')
> '00111111'
>
> I believe this approach can be extended to arbitrary bases with some
> effort, although for converting arbitrary base b1 to b2, you would
> need up to b2 different mappings if b1 and b2 are relatively prime.

Actually, I think you need up to b1 * b2 mappings, as you're
effectively building a state machine with b1 * b2 states. The mappings
can be pre-computed, however, so actually running the state machine
would then just be a linear algorithm.
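The simplest case of this digit-mapping idea needs no state machine at all: whenever one base is a power of the other, each source digit expands to a fixed block of target digits, so conversion is a single linear pass (a toy sketch for octal to binary):

```python
# Each octal digit maps to exactly three binary digits.
OCT_TO_BIN = {str(d): format(d, '03b') for d in range(8)}

def octtobin(value):
    """Linear-time octal-string to binary-string conversion."""
    return ''.join(OCT_TO_BIN[d] for d in value)

print(octtobin('17'))  # '001111'
```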
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Parsing logfile with multi-line loglines, separated by timestamp?

2015-06-30 Thread Skip Montanaro
Maybe define a class which wraps a file-like object. Its next() method (or
is it __next__() method?) can just buffer up lines starting with one which
successfully parses as a timestamp, accumulates all the rest, until a blank
line or EOF is seen, then return that, either as a list of strings, one
massive string, or some higher level representation (presumably an instance
of another class) which represents one "paragraph" of iostat output.

Skip


On Tue, Jun 30, 2015 at 10:24 AM, Victor Hooi  wrote:

> Hi,
>
> I'm trying to parse iostat -xt output using Python. The quirk with iostat
> is that the output for each second runs over multiple lines. For example:
>
> 06/30/2015 03:09:17 PM
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.03    0.00    0.03    0.00    0.00   99.94
>
> Device: rrqm/s wrqm/s  r/s  w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
> xvdap1    0.00   0.04 0.02 0.07  0.30  3.28    81.37     0.00 29.83    2.74   38.30  0.47  0.00
> xvdb      0.00   0.00 0.00 0.00  0.00  0.00    11.62     0.00  0.23    0.19    2.13  0.16  0.00
> xvdf      0.00   0.00 0.00 0.00  0.00  0.00    10.29     0.00  0.41    0.41    0.73  0.38  0.00
> xvdg      0.00   0.00 0.00 0.00  0.00  0.00     9.12     0.00  0.36    0.35    1.20  0.34  0.00
> xvdh      0.00   0.00 0.00 0.00  0.00  0.00    33.35     0.00  1.39    0.41    8.91  0.39  0.00
> dm-0      0.00   0.00 0.00 0.00  0.00  0.00    11.66     0.00  0.46    0.46    0.00  0.37  0.00
>
> 06/30/2015 03:09:18 PM
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.00    0.00    0.50    0.00    0.00   99.50
>
> Device: rrqm/s wrqm/s  r/s  w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
> xvdap1    0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
> xvdb      0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
> xvdf      0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
> xvdg      0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
> xvdh      0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
> dm-0      0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
>
> 06/30/2015 03:09:19 PM
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.00    0.00    0.50    0.00    0.00   99.50
>
> Device: rrqm/s wrqm/s  r/s  w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
> xvdap1    0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
> xvdb      0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
> xvdf      0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
> xvdg      0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
> xvdh      0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
> dm-0      0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
>
> Essentially I need to parse the output in "chunks", where each chunk is
> separated by a timestamp.
>
> I was looking at itertools.groupby(), but that doesn't seem to quite do
> what I want here - it seems more for grouping lines, where each is united
> by a common key, or something that you can use a function to check for.
>
> Another thought was something like:
>
> for line in f:
>     if line.count("/") == 2 and line.count(":") == 2:
>         current_time = datetime.strptime(line.strip(), '%m/%d/%y %H:%M:%S')
>     while line.count("/") != 2 and line.count(":") != 2:
>         print(line)
>         continue
>
> But that didn't quite seem to work.
>
> Is there a Pythonic way of parsing the above iostat output, and break it
> into chunks split by the timestamp?
>
> Cheers,
> Victor
> --
> https://mail.python.org/mailman/listinfo/python-list
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Parsing logfile with multi-line loglines, separated by timestamp?

2015-06-30 Thread Chris Angelico
On Wed, Jul 1, 2015 at 1:47 AM, Skip Montanaro  wrote:
> Maybe define a class which wraps a file-like object. Its next() method (or
> is it __next__() method?) can just buffer up lines starting with one which
> successfully parses as a timestamp, accumulates all the rest, until a blank
> line or EOF is seen, then return that, either as a list of strings, one
> massive string, or some higher level representation (presumably an instance
> of another class) which represents one "paragraph" of iostat output.

next() in Py2, __next__() in Py3. But I'd do it, instead, as a
generator - that takes care of all the details, and you can simply
yield useful information whenever you have it. Something like this
(untested):

import datetime

def parse_iostat(lines):
    """Parse lines of iostat information, yielding ... something

    lines should be an iterable yielding separate lines of output
    """
    block = None
    for line in lines:
        line = line.strip()
        try:
            tm = datetime.datetime.strptime(line, "%m/%d/%Y %I:%M:%S %p")
            if block: yield block
            block = [tm]
        except ValueError:
            # Not a new timestamp, so add it to the existing block (ignoring
            # any header lines seen before the first timestamp)
            if block is not None: block.append(line)
    if block: yield block

This is a fairly classic line-parsing generator. You can pass it a
file-like object, a list of strings, or anything else that it can
iterate over; it'll yield some sort of aggregate object representing
each time's block. In this case, all it does is append strings to a
list, so this will result in a series of lists of strings, each one
representing a single timestamp; you can parse the other lines in any
way you like and aggregate useful data. Usage would be something like
this:

with open("logfile") as f:
    for block in parse_iostat(f):
        # do stuff with block

This will work quite happily with an ongoing stream, too, so if you're
working with a pipe from a currently-running process, it'll pick stuff
up just fine. (However, since it uses the timestamp as its signature,
it won't yield anything till it gets the *next* timestamp. If the
blank line is sufficient to denote the end of a block, you could
change the loop to look for that instead.)
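The blank-line variant of the loop could look like this (a sketch; note the sample output also has blank lines *within* each timestamped block, so this yields the sub-sections rather than whole timestamp blocks):

```python
def parse_iostat_by_blank(lines):
    """Yield blocks of non-blank lines, using blank lines as separators."""
    block = []
    for line in lines:
        line = line.strip()
        if not line:
            if block:          # a blank line closes the current block
                yield block
                block = []
        else:
            block.append(line)
    if block:                  # flush the final block at EOF
        yield block
```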

Hope that helps!

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Linear time baseconversion

2015-06-30 Thread Ian Kelly
On Tue, Jun 30, 2015 at 3:07 AM, Christian Gollwitzer  wrote:
> Am 30.06.15 um 10:52 schrieb jonas.thornv...@gmail.com:
>>
>> It still bug out on very big numbers if base outside integer scope.
>> I am very keen on suggestions regarding the logic to make it faster.
>
>
> Concerning the algorithmic complexity, it can't be faster than square time
> in the number of digits N. Baseconversion needs to do a sequence of division
> operations, where every operation gives you one digit in the new base. The
> number of digits in the new base is proportional to the number of digits in
> the old base (the ratio is log b1/log b2). Therefore it will be O(N^2).

I don't think that's true. Here's a linear hexadecimal to binary function:

>>> def hextobin(value):
... digits = {'0': '0000', '1': '0001', '2': '0010', '3': '0011',
... '4': '0100', '5': '0101', '6': '0110', '7': '0111',
... '8': '1000', '9': '1001', 'A': '1010', 'B': '1011',
... 'C': '1100', 'D': '1101', 'E': '1110', 'F': '1111'}
... return ''.join(digits[d.upper()] for d in value)
...
>>> hextobin('3f')
'00111111'

I believe this approach can be extended to arbitrary bases with some
effort, although for converting arbitrary base b1 to b2, you would
need up to b2 different mappings if b1 and b2 are relatively prime.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Linear time baseconversion

2015-06-30 Thread Chris Angelico
On Wed, Jul 1, 2015 at 1:45 AM, Ian Kelly  wrote:
> On Tue, Jun 30, 2015 at 9:40 AM, Ian Kelly  wrote:
>> On Tue, Jun 30, 2015 at 3:07 AM, Christian Gollwitzer  
>> wrote:
>>> Am 30.06.15 um 10:52 schrieb jonas.thornv...@gmail.com:

 It still bug out on very big numbers if base outside integer scope.
 I am very keen on suggestions regarding the logic to make it faster.
>>>
>>>
>>> Concerning the algorithmic complexity, it can't be faster than square time
>>> in the number of digits N. Baseconversion needs to do a sequence of division
>>> operations, where every operation gives you one digit in the new base. The
>>> number of digits in the new base is proportional to the number of digits in
>>> the old base (the ratio is log b1/log b2). Therefore it will be O(N^2).
>>
>> I don't think that's true. Here's a linear hexadecimal to binary function:
>>
> def hextobin(value):
>> ... digits = {'0': '0000', '1': '0001', '2': '0010', '3': '0011',
>> ... '4': '0100', '5': '0101', '6': '0110', '7': '0111',
>> ... '8': '1000', '9': '1001', 'A': '1010', 'B': '1011',
>> ... 'C': '1100', 'D': '1101', 'E': '1110', 'F': '1111'}
>> ... return ''.join(digits[d.upper()] for d in value)
>> ...
> hextobin('3f')
>> '00111111'
>>
>> I believe this approach can be extended to arbitrary bases with some
>> effort, although for converting arbitrary base b1 to b2, you would
>> need up to b2 different mappings if b1 and b2 are relatively prime.
>
> Actually, I think you need up to b1 * b2 mappings, as you're
> effectively building a state machine with b1 * b2 states. The mappings
> can be pre-computed, however, so actually running the state machine
> would then just be a linear algorithm.

Arbitrary base conversion has to be stateful. You can take a shortcut
like this when the bases are related (eg binary, octal, hexadecimal),
but otherwise, you need the division. Consider what happens when you
convert the binary digit "1" to decimal, and then follow it with
varying numbers of zeros:

1, 2, 4, 8, 16, 32, 64,... 32768, 65536, 131072,... 1048576,... 1073741824,...

You can certainly do some useful analyses on the last digits (they'll
never be anything other than 2, 4, 8, 6, except for the special case
of 1 itself), but there's a lot of variation in the intermediate
digits.
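The division-based algorithm under discussion is the classic one; a minimal sketch:

```python
def to_base(n, base):
    """Convert a non-negative int to a digit list via repeated division.

    Each division by `base` yields one digit, least significant first,
    so the digit list comes out reversed at the end. This is the O(N^2)
    method for big numbers, since each division touches every digit.
    """
    if n == 0:
        return [0]
    digits = []
    while n:
        n, r = divmod(n, base)
        digits.append(r)
    return digits[::-1]   # most significant digit first
```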

When there's a simple ratio between the bases, it's fairly
straight-forward to convert a few digits at a time. Converting base
256 into base 64, for instance, can be done by taking three digits and
yielding four. But within that, you would still need a complete table
of all sixteen million possibilities, if you want to do the lookup
table. And that only works when there is that kind of relationship.
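For the related-base case, the three-digits-in, four-digits-out grouping can be done with bit arithmetic instead of the sixteen-million-entry table (a hypothetical helper; it assumes the input length is a multiple of 3 and ignores the padding a real Base64 encoder needs):

```python
def base256_to_base64(data):
    """Convert base-256 digits (bytes) to a list of base-64 digits.

    Three base-256 digits (24 bits) map to exactly four base-64 digits,
    so each group converts independently -- which is what makes the
    conversion linear in the number of digits.
    """
    out = []
    for i in range(0, len(data), 3):
        a, b, c = data[i], data[i + 1], data[i + 2]   # ints in Python 3
        n = (a << 16) | (b << 8) | c                  # one 24-bit group
        out.extend([n >> 18, (n >> 12) & 63, (n >> 6) & 63, n & 63])
    return out
```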

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Linear time baseconversion

2015-06-30 Thread Michael Torrie
Do you have some Python code to show us?

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Linear time baseconversion

2015-06-30 Thread jonas . thornvall
Den tisdag 30 juni 2015 kl. 18:12:46 UTC+2 skrev Michael Torrie:
> Do you have some Python code to show us?

No, I just thought you would find the digit search algorithm interesting.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Linear time baseconversion

2015-06-30 Thread Ian Kelly
On Tue, Jun 30, 2015 at 10:10 AM, Chris Angelico  wrote:
> When there's a simple ratio between the bases, it's fairly
> straight-forward to convert a few digits at a time. Converting base
> 256 into base 64, for instance, can be done by taking three digits and
> yielding four. But within that, you would still need a complete table
> of all sixteen million possibilities, if you want to do the lookup
> table. And that only works when there is that kind of relationship.

You're right. I was thinking that for base 5 to base 7, for instance,
one could read digits in groups of 7, but that doesn't work out; you
can't map any discrete number of base 5 digits to a corresponding
number of base 7 digits.
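A quick way to see why no fixed grouping works: by unique factorization, no power of 5 equals a power of 7, so no whole number of base-5 digits ever spans exactly a whole number of base-7 digits.

```python
# Search for 5**m == 7**n in a generous range; unique factorization
# says there are no solutions, so the list stays empty.
collisions = [(m, n) for m in range(1, 30) for n in range(1, 30) if 5**m == 7**n]
print(collisions)  # -> []
```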
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: enumerate XML tags (keys that will become headers) along with text (values) and write to CSV in one row (as opposed to "stacked" values with one header)

2015-06-30 Thread Marko Rauhamaa
Robert Kern :

>> <tuple>
>>    <item>tuplemember_1</item>
>>    <item>tuplemember_2</item>
>>    ...
>>    <item>tuplemember_n</item>
>> </tuple>
>
> [...]
>
> Yes! Any conforming XML implementation will preserve the order.

And not only the order: the newlines and other whitespace around the
<item>s are also preserved as children of <tuple>.

> And if you are completely blind as to the schema as the OP apparently
> is, and you are simply given a load of XML and told to do "something"
> with it, you don't know if any given collection is meant to be ordered
> or unordered. Of course, the only sensible thing to do is just
> preserve the order given to you as that is what the semantics of XML
> requires of you in the absence of a schema that says otherwise.

XML is an unfortunate creation. You cannot fully parse it without
knowing the schema (or document type). For example, these constructs
might or might not be equivalent:

&aring;

å

That is because &...; entities are defined in the document type.

Similarly, these two constructs might or might not be equivalent:

<item a=" x "/>

<item a="x"/>

By default, they would be, but the document type can declare whitespace
significant around an attribute value.

Finally, these two constructs might or might not be equivalent:

<item>7</item>
<item>   7   </item>

By default, they wouldn't be, but the document type can declare
whitespace *insignificant* around element contents.
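The element-content case is easy to check with ElementTree, which has no DTD to consult and so must keep the whitespace (the <item> tag here is just an illustration):

```python
import xml.etree.ElementTree as ET

# With no document type available, a conforming parser preserves
# element-content whitespace, so these two are not equivalent:
a = ET.fromstring("<item>7</item>")
b = ET.fromstring("<item>   7   </item>")
assert a.text == "7"
assert b.text == "   7   "   # whitespace kept, so the texts differ
```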

Sigh, S-expressions would have been a much better universal data format.


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Matplotlib X-axis timezone trouble

2015-06-30 Thread Peter Pearson
On Tue, 30 Jun 2015 17:01:15 +1000, Chris Angelico  wrote:
> On Tue, Jun 30, 2015 at 2:49 PM, Peter Pearson
> wrote:
>> Time zones teem with sneaky software problems, and so does daylight-saving
>> time, so this problem might strain my brain.  Maybe it's going to turn
>> out that my expectations are unreasonable . . . as in, "Well, smarty pants,
>> how do you want the axis labelled when the abscissas straddle the
>> beginning of daylight-saving time?"  I'll research and digest.
>
> That's entirely possible. Even more so if you go across some other
> civil time change - if you look at the history of timezones in tzdata,
> there's no end of messes as different places adopted standard time,
> redefined standard time, and unified with someone else's standard
> time. And some of that happened relatively recently.
>
> UTC is much easier for this kind of thing. Especially if the
> granularity of your data lets you ignore leap seconds.

Hear, hear!  I'm an enthusiastic fan of UTC.  But I need to interact
with nontechnical people who will say, "Hey, I logged that widget at
9AM, why do you show it as 4PM?"  Another funny thing that happens in
the pure-UTC world is my software gets confused by the way Frank
abruptly shifts his logging schedule by an hour ... twice a year ...
when DST starts or stops.

I'm just glad I don't have to worry about the distinctions among 
UTC, GMT, TAI, and UT1.

-- 
To email me, substitute nowhere->runbox, invalid->com.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Matplotlib X-axis timezone trouble

2015-06-30 Thread Chris Angelico
On Wed, Jul 1, 2015 at 2:42 AM, Peter Pearson  wrote:
> I'm just glad I don't have to worry about the distinctions among
> UTC, GMT, TAI, and UT1.

Fortunately, that's often the case. GMT can be ignored, and the other
three differ by fewer seconds than most humans ever care about. If
you're scheduling a meeting, for instance, nobody's going to care if
your clocks differ by 36 seconds, which is as much as TAI and UTC will
differ in a few hours (there's a leap second scheduled for midnight at
the end of June 30th this year, and as my clock is now showing July,
it'll be happening soon). The difference between UTC and UT1 is
insignificant to virtually all humans, given that it's never more than
a single second; it's only high-precision calculations that ever need
concern themselves with that. So, yep! I share your gladness.
Distinctions exist but most of us don't have to worry about them.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-30 Thread Randall Smith

On 06/29/2015 10:00 PM, Steven D'Aprano wrote:

On Tue, 30 Jun 2015 06:52 am, Randall Smith wrote:


Not sure why you posted the link.  The crc32 checksum is just to check
for possible filesystem corruption.  The system does periodic data
corruption checks.  BTRFS uses crc32 checksums also.  Please explain.


The file system can trust that anything writing to a file is allowed to
write to it, it doesn't have to defend against malicious writes. As I
understand it, your application does.

Here is the attack scenario I have in mind:

- you write a file to my computer, and neglect to encrypt it;
- and record the checksum for later;
- I insert malware into your file;
- you retrieve the file from me;
- if the checksum matches what you have on record, you accept the file;
- since you are using CRC, it is quite easy for me to ensure the
   checksums match after inserting malware;
- and I have now successfully injected malware into your computer.

I'm making an assumption here -- I assume that the sender records a checksum
for uploaded files so that when they get something back again they can tell
whether or not it is the same content they uploaded.


Yes.  The client software computes sha256 checksums.
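For illustration, a typical way to compute such a checksum over a large file without reading it all into memory looks like this (a generic sketch, not the application's actual code):

```python
import hashlib

def file_sha256(path, chunk_size=1 << 16):
    """Stream a file through SHA-256 in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```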



* * *

By the way, regarding the use of a substitution cipher, I spoke to the
crypto guy at work, and "preimage attack" is not quite the right
terminology, since that's normally used in the context of hash functions.
It's almost a "known ciphertext attack", but not quite, since that
terminology refers to guessing the key from the ciphertext.

I was wrong: cryptographically strong ciphers are generally NOT resistant to
what I described as a preimage attack. If the key leaks, using AES won't
save you: an attacker with access to the key can produce a ciphertext that
decrypts to the malware of his choice, regardless of whether you use
AES-256 or rot-13. There may be other encryption methods which don't suffer
from that, but he doesn't know of them off the top of his head.

His comment was, "don't leak the key".


I'm pretty sure all encryption hinges on guarding the key.



The other threat I mentioned is that the receiver will read the content of
the file. For that, a strong cipher is much to be preferred over a weak
one, and it needs to be encrypted by the sending end, not the receiving
end. (If the receiving end does it, it has to keep the key so it can
decrypt before sending back, which means the computer's owner can just grab
the key and read the files.)




And again, that's why the client (data owner) uses AES.


--
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-30 Thread Randall Smith

On 06/29/2015 03:49 PM, Jon Ribbens wrote:

On 2015-06-29, Randall Smith  wrote:

Same reason newer filesystems like BTRFS use checkusms (BTRFS uses
CRC32).  The storage machine runs periodic file integrity checks.  It
has no control over the underlying filesystem.


True, but presumably neither does it have anything it can do to
rectify the situation if it finds a problem, and the client will
have to keep its own secure hash of its file anyway. (Unless I suppose
the server actually can request a new copy from the client or another
server if it finds a corrupt file?)



Yes.  The storage servers are monitored for integrity.  They can request 
a new copy, though frequent corruption results in the server being 
marked as unreliable.


-Randall

--
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-30 Thread Steven D'Aprano
On Tue, 30 Jun 2015 10:19 pm, Jon Ribbens wrote:

> On 2015-06-30, Steven D'Aprano  wrote:
>> On Tue, 30 Jun 2015 06:52 am, Randall Smith wrote:
>>> Not sure why you posted the link.  The crc32 checksum is just to check
>>> for possible filesystem corruption.  The system does periodic data
>>> corruption checks.  BTRFS uses crc32 checksums also.  Please explain.
>>
>> The file system can trust that anything writing to a file is allowed to
>> write to it, it doesn't have to defend against malicious writes. As I
>> understand it, your application does.
>>
>> Here is the attack scenario I have in mind:
>>
>> - you write a file to my computer, and neglect to encrypt it;
> 
> Eh? The game is over right there. I don't trust you, and yet
> I have just given you my private data, unencrypted.

Yes. That is exactly the problem. If the application doesn't encrypt the
data for me, *it isn't going to happen*. We are in violent agreement that
the sender needs to encrypt the data.

I think Randall has been somewhat less than clear about what the application
actually does and how it works. He probably thinks he doesn't need to
explain, that its none of our business, and wishes we'd just shut up about
it. That's his right.

It's also my right to discuss the possible security implications of some
hypothetical peer-to-peer dropbox-like application which may, or may not,
be similar to Randall's application. Whether Randall learns anything from
that discussion, or just tunes it out, is irrelevant. I've already learned
at least one thing from this discussion, so as far as I'm concerned it's a
win.

Randall has suggested that encryption is optional. It isn't clear whether he
means there is an option to turn encryption off, or whether he means I can
hack the application and disable it, or write my own application. I don't
expect him to be responsible for rogue applications that have been hacked
or written independently, which (out of malice or stupidity) don't encrypt
the uploaded data. But I think that it is foolish to support an unencrypted
mode of operation.

It's not unreasonable to raise this issue. The default state of security
among IT professionals is something worse than awful:

https://medium.com/message/everything-is-broken-81e5f33a24e1

One of Australia's largest ISPs recently was hacked, and guess how they
stored their customer's passwords? Yes, you got it: in plain text. There is
no security mistake so foolish that IT professionals won't make it.


> Checksums don't even come into it, we have failed utterly at step 1.

*shrug*

You're right. But having failed at step 1, there are multiple attacks that
can follow. The first attack is the obvious one: the ability to read the
unencrypted data.

If you can trick me into turning encryption off (say, you use a social
engineering attack on me and convince me to delete "the virus crypto.py"),
then I might inadvertently upload unencrypted data to you. Or maybe you
find an attack on the application that can fool it into dropping down to
unencrypted mode. If there's no unencrypted mode in the first place, that's
much harder.

Earlier, Chris suggested that the application might choose to import the
crypto module, and if it's not available, just keep working without
encryption. This hypothetical attack demonstrates that this would be a
mistake. It's hard for an attacker to convince a naive user to open up the
application source code and edit the code. It's easier to convince them to
delete a file.

Or, the application just has a bug in it. It accidentally flips the sense of
the "use encryption" flag. That's a failure mode that simply cannot occur
if there is no such flag in the first place.

If our attacker has managed to disable encryption in the sender's
application, then they can not only read my data, but tamper with it. These
are *separate attacks* with the same underlying cause. I can mitigate one
without mitigating the other.

We can mitigate against the second attack by using a cryptographically
strong hash function to detect tampering. These *are* resistant to preimage
attacks. If I give you a SHA512 checksum, there is no known practical
method to generate a file with that same checksum. If I give you a CRC
checksum, you can.
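The forgeability of CRCs follows from their linearity over GF(2), which is easy to demonstrate (a sketch of the property, not a full forgery attack):

```python
import zlib

a = b"A" * 16
b = b"B" * 16
zero = b"\x00" * 16
xored = bytes(x ^ y for x, y in zip(a, b))

# For equal-length inputs, CRC-32 is affine over GF(2): XORing two
# messages XORs their CRCs, up to a length-dependent constant that
# crc(zero) cancels. This structure lets an attacker compute exactly
# which bits to flip to hit any target CRC -- something that is
# computationally infeasible for SHA-256 or SHA-512.
assert zlib.crc32(xored) == zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(zero)
```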

(Naturally the checksum has to be under the sender's control. If the
receiver has the checksum and the data, they can just replace the checksum
with one of their choosing.)

That's a separate issue from detecting non-malicious data corruption,
although of course a SHA512 checksum will detect that as well.


>> - since you are using CRC, it is quite easy for me to ensure the
>>   checksums match after inserting malware;
> 
> No, you have yet *again* misunderstood the difference between the
> client and the server.

This was described as a peer-to-peer application. You even stated that it
was a "pretty obvious" use-case, a "peer-to-peer dropbox". So is it
peer-to-peer or client-server?

In any case, since Randall has refused to go into specific deta

Re: Pure Python Data Mangling or Encrypting

2015-06-30 Thread Chris Angelico
On Wed, Jul 1, 2015 at 4:17 AM, Steven D'Aprano  wrote:
> If you can trick me into turning encryption off (say, you use a social
> engineering attack on me and convince me to delete "the virus crypto.py"),
> then I might inadvertently upload unencrypted data to you. Or maybe you
> find an attack on the application that can fool it into dropping down to
> unencrypted mode. If there's no unencrypted mode in the first place, that's
> much harder.
>
> Earlier, Chris suggested that the application might choose to import the
> crypto module, and if it's not available, just keep working without
> encryption. This hypothetical attack demonstrates that this would be a
> mistake. It's hard for an attacker to convince a naive user to open up the
> application source code and edit the code. It's easier to convince them to
> delete a file.

And I'm sure Steven knows about this, but if anyone else isn't
convinced that this is a serious vulnerability, look into various
forms of downgrade attack, such as the recent POODLE. Security doesn't
exist if an attacker can convince your program to turn it off without
your knowledge.

>>> - since you are using CRC, it is quite easy for me to ensure the
>>>   checksums match after inserting malware;
>>
>> No, you have yet *again* misunderstood the difference between the
>> client and the server.
>
> This was described as a peer-to-peer application. You even stated that it
> was a "pretty obvious" use-case, a "peer-to-peer dropbox". So is it
> peer-to-peer or client-server?

I've never managed to get any sort of grasp of what this application
actually *is*, but "peer-to-peer Dropbox" is certainly something that
it *might be*. It could be simultaneously peer-to-peer from the
human's point of view, and client-server from the application's -
imagine BitTorrent protocol, but where one end connects to a socket
that the other's listening on, and the active socket always pushes
data to the passive socket. (With BitTorrent, it's truly symmetrical -
doesn't matter who listens and who connects. But imagine if it weren't
that way.) From the software's point of view, it has two distinct
modes: server, in which it listens on a socket and receives data, and
client, in which it connects to other people's sockets and sends data.
As such, the "server" mode is the only one that receives untrusted
data from another user and stores it on the hard disk.

But this is just one theory of what the program *might* be, based on
what I've gathered in this thread. Or rather, it's a vague theory of
something that's mostly plausible, without necessarily even being
useful.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-30 Thread Jon Ribbens
On 2015-06-30, Steven D'Aprano  wrote:
> On Tue, 30 Jun 2015 10:19 pm, Jon Ribbens wrote:
>> Eh? The game is over right there. I don't trust you, and yet
>> I have just given you my private data, unencrypted.
>
> Yes. That is exactly the problem. If the application doesn't encrypt the
> data for me, *it isn't going to happen*. We are in violent agreement that
> the sender needs to encrypt the data.

It's a good thing that he's said it will then.

> Randall has suggested that encryption is optional.

No he hasn't. You just keep creatively misreading what he says, for
some reason.

> It's not unreasonable to raise this issue.

It is unreasonable to raise it over and over again however,
especially when there's no reason at all to think it's relevant,
and nothing has changed from the last time you raised it.

> We can mitigate against the second attack by using a cryptographically
> strong hash function to detect tampering.

Not on the server you can't. If the attacker can edit the files he can
edit the hashes too.

> These *are* resistant to preimage attacks. If I give you a SHA512
> checksum, there is no known practical method to generate a file with
> that same checksum. If I give you a CRC checksum, you can.

Randall didn't suggest any usage of CRCs where preimage attacks are
relevant. You just made that bit up.

>>> - since you are using CRC, it is quite easy for me to ensure the
>>>   checksums match after inserting malware;
>> 
>> No, you have yet *again* misunderstood the difference between the
>> client and the server.
>
> This was described as a peer-to-peer application. You even stated that it
> was a "pretty obvious" use-case, a "peer-to-peer dropbox". So is it
> peer-to-peer or client-server?

Both. It sounds a bit like there are clients which upload files
to a cloud of servers which are peers of each other. But seriously,
is this the source of all your confusion? Even if all the nodes
are pure "peers" (which it doesn't sound like they are), any
particular file will still have a source node which is therefore
the "client" for that file. You're trying to draw a hard distinction
where there is none.

>> lol. I suspected as much. You and Johannes were even more wrong than
>> was already obvious.
>
> You "suspected as much"? Such a pity you didn't speak up earlier and
> explain that cryptographic ciphers aren't generally resistant to
> preimage attacks.

I think you're misusing that phrase. But taking what you meant,
I suspected it was true (would they be reistant, after all?)
but I couldn't be bothered to check because the whole "crypto" bit
was a complete red-herring in the first place. The original discussion
wasn't about crypto, all the discussion about that was only because
you and Johannes wrongly insisted it was necessary.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-30 Thread Steven D'Aprano
On Wed, 1 Jul 2015 03:39 am, Randall Smith wrote:

> On 06/29/2015 10:00 PM, Steven D'Aprano wrote:

>> I'm making an assumption here -- I assume that the sender records a
>> checksum for uploaded files so that when they get something back again
>> they can tell whether or not it is the same content they uploaded.
> 
> Yes.  The client software computes sha256 checksums.

Thanks for clarifying.


[...]
>> His comment was, "don't leak the key".
> 
> I'm pretty sure all encryption hinges on guarding the key.

That would be Kerckhoffs' Principle, also known as Shannon's Maxim.

I don't think there has been much research into keeping at least *some*
security even when keys have been compromised, apart from as it relates to
two-factor authentication. (Assume that other people know the password to
your bank account. They can read your balance, but they can't steal your
money unless they first steal your phone or RSA token.)

In the past, and still today among people who don't understand Kerckhoffs'
principle, people have tried to keep the cipher secret and not have a key
at all. E.g. atbash, or caesar cipher, which once upon a time were cutting
edge ciphers, as laughably insecure as they are today. If the method was
compromised, all was lost. 

Today, if the key is compromised, all is lost. Is it possible that there are
ciphers that are resistant to discovery of the key? Obviously if you know
the key you can read encrypted messages, that's what the key is for, but
there are scenarios where you would want security to degrade gracefully
instead of in a brittle all-or-nothing manner:

- even if the attacker can read my messages, he cannot tamper with 
  them or write new ones as me.

(I'm pretty sure that, for example, the military would consider it horrible
if the enemy could listen in on their communications, but *even worse* if
the enemy could send false orders that appear to be legitimate.)

Sixty years ago, the idea of having a separate encryption key that you keep
secret and a decryption key that you can give out to everyone (public key
encryption) probably would have seemed ridiculous too.



-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-30 Thread Chris Angelico
On Wed, Jul 1, 2015 at 4:59 AM, Steven D'Aprano  wrote:
> Today, if the key is compromised, all is lost. Is it possible that there are
> ciphers that are resistant to discovery of the key? Obviously if you know
> the key you can read encrypted messages, that's what the key is for, but
> there are scenarios where you would want security to degrade gracefully
> instead of in a brittle all-or-nothing manner:
>
> - even if the attacker can read my messages, he cannot tamper with
>   them or write new ones as me.
>
> (I'm pretty sure that, for example, the military would consider it horrible
> if the enemy could listen in on their communications, but *even worse* if
> the enemy could send false orders that appear to be legitimate.)

That would be accomplished by a two-fold enveloping of signing and
encrypting. If I sign something using my private key, then encrypt it
using your public key, someone who's compromised your private key
could snoop and read the message, but couldn't forge a message from
me. Of course, that just means there are lots more secrets to worry
about getting compromised.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Linear time baseconversion

2015-06-30 Thread Michael Torrie
On 06/30/2015 10:24 AM, jonas.thornv...@gmail.com wrote:
> Den tisdag 30 juni 2015 kl. 18:12:46 UTC+2 skrev Michael Torrie:
>> Do you have some Python code to show us?
> 
> No, I just thought you would find the digit search algorithm interesting.

Yeah it is interesting, although I didn't really see an explanation of
your algorithm in your posts.  Did I miss it?
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Linear time baseconversion

2015-06-30 Thread Christian Gollwitzer

Am 30.06.15 um 17:40 schrieb Ian Kelly:

On Tue, Jun 30, 2015 at 3:07 AM, Christian Gollwitzer  wrote:

Concerning the algorithmic complexity, it can't be faster than quadratic time
in the number of digits N. Base conversion needs to do a sequence of division
operations, where every operation gives you one digit in the new base. The
number of digits in the new base is proportional to the number of digits in
the old base (the ratio is log b1/log b2). Therefore it will be O(N^2).


I don't think that's true. Here's a linear hexadecimal to binary function:


>>> def hextobin(value):
...     digits = {'0': '0000', '1': '0001', '2': '0010', '3': '0011',
...               '4': '0100', '5': '0101', '6': '0110', '7': '0111',
...               '8': '1000', '9': '1001', 'A': '1010', 'B': '1011',
...               'C': '1100', 'D': '1101', 'E': '1110', 'F': '1111'}
...     return ''.join(digits[d.upper()] for d in value)
...
>>> hextobin('3f')
'00111111'

I believe this approach can be extended to arbitrary bases with some
effort, although for converting arbitrary base b1 to b2, you would
need up to b2 different mappings if b1 and b2 are relatively prime.



OK. Show it for bases 2 and 3. It will just be a table of 6 entries, no?

Actually, you showed a very special case, for conversion from base b^4 
to base b. I'm pretty convinced it is not possible for the general case.
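
For reference, the general-case algorithm both posts assume (one division per output digit, which is where the quadratic bound for big-integer input comes from) can be sketched as:

```python
def to_base(n: int, base: int) -> str:
    """Convert a non-negative int to a digit string via repeated division.

    Each divmod peels off one digit in the target base; with an N-digit
    input each division costs O(N), giving O(N^2) overall.
    """
    if not 2 <= base <= 36:
        raise ValueError("base must be in 2..36")
    if n == 0:
        return "0"
    alphabet = "0123456789abcdefghijklmnopqrstuvwxyz"
    digits = []
    while n:
        n, r = divmod(n, base)
        digits.append(alphabet[r])
    return "".join(reversed(digits))
```

No lookup table is involved here, which is why it works for any pair of bases, relatively prime or not.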


Christian

--
https://mail.python.org/mailman/listinfo/python-list


Re: Linear time baseconversion

2015-06-30 Thread Christian Gollwitzer

Am 30.06.15 um 18:34 schrieb Ian Kelly:

On Tue, Jun 30, 2015 at 10:10 AM, Chris Angelico  wrote:

When there's a simple ratio between the bases, it's fairly
straight-forward to convert a few digits at a time. Converting base
256 into base 64, for instance, can be done by taking three digits and
yielding four. But within that, you would still need a complete table
of all sixteen million possibilities, if you want to do the lookup
table. And that only works when there is that kind of relationship.


You're right. I was thinking that for base 5 to base 7, for instance,
one could read digits in groups of 7, but that doesn't work out; you
can't map any discrete number of base 5 digits to a corresponding
number of base 7 digits.


Yes, because there are no positive integers m and n for which 5^m = 7^n. This 
gives more-or-less the proof that the algorithm must be O(N^2) - at least that 
is my feeling. Fun fact, though: you can convert pi to hexadecimal base 
without computing the preceding digits:


https://en.wikipedia.org/wiki/Bailey%E2%80%93Borwein%E2%80%93Plouffe_formula
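
A minimal sketch of the BBP digit-extraction idea from that article (the standard float-precision variant, reliable only for the first few thousand digits):

```python
def _bbp_series(j: int, n: int) -> float:
    """Fractional part of 16^n * sum_k 1/(16^k * (8k+j)), via modular pow."""
    s = 0.0
    for k in range(n + 1):
        s = (s + pow(16, n - k, 8 * k + j) / (8 * k + j)) % 1.0
    t, k = 0.0, n + 1
    while True:  # rapidly converging tail of the series
        term = 16.0 ** (n - k) / (8 * k + j)
        if t + term == t:
            return (s + t) % 1.0
        t += term
        k += 1

def pi_hex_digit(n: int) -> str:
    """n-th hexadecimal digit of pi after the point (0-indexed)."""
    x = (4 * _bbp_series(1, n) - 2 * _bbp_series(4, n)
         - _bbp_series(5, n) - _bbp_series(6, n)) % 1.0
    return "%x" % int(x * 16)
```

Since pi = 3.243F6A88..., joining the first few digits reproduces "243f6a", each digit computed without the ones before it.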

Christian

--
https://mail.python.org/mailman/listinfo/python-list


Re: How to debug TypeError: required field "lineno" missing from expr?

2015-06-30 Thread Mark Lawrence

On 29/06/2015 03:44, Steven D'Aprano wrote:

On Mon, 29 Jun 2015 11:14 am, Mark Lawrence wrote:


Purely as an exercise I've been converting Grant Jenks' pypatt[1] from
2.7 to 3.4.  I've managed to sort out most of the required changes by
checking on what I can see with an AST pretty printer[2].  So it's
rather frustrating to have the compile stage throw the error given in
the subject line.


Now now Mark, you know how this works. Where's your COPY and PASTED
traceback? Where's the Short, Self Contained, Correct Example?

http://sscce.org/

Our crystal ball is in no better state than yours :-)


The traceback was effectively that in the subject line.  I'd have 
provided a SSCCE if I'd had the faintest idea of how to produce one, 
which I didn't have a couple of days ago.






The code has a call to the ast fix_missing_locations function.  I'm
aware of the Armin Ronacher tweet[3] stating


"TypeError: required field "lineno" missing from stmt" — no, what you
actually mean is "tuple is not a statement'.


but I don't think this is the problem.


That's what I dislike about Twitter. There's no context, and no way to tell
what Armin is talking about. I've never seen this error, and cannot imagine
how to get it, or what he means by "tuple is not a statement" for that
matter. A tuple can be a statement:

if condition:
 (1, 2)  # Construct a tuple, then throw it away.



Still I've clearly managed to
screw something up somewhere along the line, so any bright ideas as to
how I can proceed?

Further would it be worth raising an enhancement request to get some
better diagnostics from the built-in compile function,


If you think it is worth waiting months or years before you can actually use
those enhanced diagnostics, sure. What sort of diagnostics?

Peering into my crystal ball, I see that you're maybe generating some code
dynamically, then running something like:

 code = compile(the_text, something, something)

and that's giving you an error. What happens if you do this?

 print(the_text)
 code = compile(the_text, something, something)


Can you extract the offending line that way? What happens if you copy and
paste the_text into a Python file, and then try to import or run that file?



I managed to find the problem with the aid of 
https://github.com/simonpercivall/astunparse which allowed me to compare 
the 2.7 output with the 3.4 output and pinpoint precisely where I'd gone wrong.


Thanks anyway for your reply and all the others, it certainly got my 
brain engaged in a logical manner.  I might not be an expert on Python 
ASTs but I've certainly enjoyed playing with them.  All I've got to do 
now is correct the next problem in the pipeline.  Maybe I'll bore you 
all with that tomorrow :)


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list


Re: Pure Python Data Mangling or Encrypting

2015-06-30 Thread Jon Ribbens
On 2015-06-30, Steven D'Aprano  wrote:
> I don't think there has been much research into keeping at least *some*
> security even when keys have been compromised, apart from as it relates to
> two-factor authentication.

That's because "the key" is all the secret part. If an attacker knows
the algorithm, and the key, and the ciphertext, then *by definition*
all is lost. If you mean keeping the algorithm secret too then that's
just considered bad crypto.

> In the past, and still today among people who don't understand Kerckhoffs'
> principle, people have tried to keep the cipher secret and not have a key
> at all. E.g. atbash, or caesar cipher, which once upon a time were cutting
> edge ciphers, as laughably insecure as they are today. If the method was
> compromised, all was lost. 

Caesar cipher has a key. It's just very small, so is easy to guess.

> Today, if the key is compromised, all is lost. Is it possible that there are
> ciphers that are resistant to discovery of the key? Obviously if you know
> the key you can read encrypted messages, that's what the key is for, but
> there are scenarios where you would want security to degrade gracefully
> instead of in a brittle all-or-nothing manner:
>
> - even if the attacker can read my messages, he cannot tamper with 
>   them or write new ones as me.

I suppose that could be achieved by having separate encryption and
signing keys, but you could do the same but better by encrypting
with multiple algorithms. It's not an unstudied area:
https://en.wikipedia.org/wiki/Multiple_encryption
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python 3 resuma a file download

2015-06-30 Thread Cameron Simpson

On 30Jun2015 08:34, zljubi...@gmail.com  wrote:

I would like to download a file (http://video.hrt.hr/2906/otv296.mp4)
If the connection is OK, I can download the file with:

import urllib.request
urllib.request.urlretrieve(remote_file, local_file)

Sometimes when I am connected on a weak wireless (not mine) network I get a 
WinError 10054 exception (Windows 7).

When it happens, I would like to resume download instead of doing everything 
from very beginning.

How to do that?

I read about Range header and chunks, but this server doesn't have any headers.

What options do I have with this particular file?


You need to use a Range: header. I don't know what you mean when you say "this 
server doesn't have any headers". All HTTP requests and responses use headers.  
Possibly you mean your code isn't setting any headers.


What you need to do is separate your call to urlretrieve into a call to 
construct a Request object, add a Range header, then fetch the URL using the 
Request object, appending the results (if successful) to the end of your local 
file.
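
A sketch of that, as a hypothetical helper; it assumes the server honours Range requests (a server that ignores the header replies 200 rather than 206 Partial Content, in which case we restart from scratch):

```python
import os
import urllib.request

def resume_download(url, local_file, chunk_size=65536):
    """Resume (or start) a download by appending to local_file."""
    start = os.path.getsize(local_file) if os.path.exists(local_file) else 0
    req = urllib.request.Request(url)
    if start:
        # Ask the server for everything from byte `start` onwards.
        req.add_header("Range", "bytes=%d-" % start)
    with urllib.request.urlopen(req) as resp:
        if start and resp.status != 206:
            # Server ignored the Range header; start over from byte 0.
            start = 0
        with open(local_file, "ab" if start else "wb") as out:
            while True:
                chunk = resp.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

Wrap the call in a try/except for the connection-reset error and loop; each retry picks up from wherever the local file ends.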


If you go to:

 
https://docs.python.org/3/library/urllib.request.html#urllib.request.urlretrieve

and scroll up you will find example code doing that kind of thing in the 
examples above.


Cheers,
Cameron Simpson 

The British Interplanetary Society? How many planets are members then?
   - G. Robb
--
https://mail.python.org/mailman/listinfo/python-list


Re: Parsing logfile with multi-line loglines, separated by timestamp?

2015-06-30 Thread Victor Hooi
Aha, cool, that's a good idea =) - it seems I should spend some time getting to 
know generators/iterators.

Also, sorry if this is basic, but once I have the "block" list itself, what is 
the best way to parse each relevant line?

In this case, the first line is a timestamp, the next two lines are system 
stats, and then a newline, and then one line for each block device.

I could just hardcode in the lines, but that seems ugly:

  for block in parse_iostat(f):
      for i, line in enumerate(block):
          if i == 0:
              print("timestamp is {}".format(line))
          elif i == 1 or i == 2:
              print("system stats: {}".format(line))
          elif i >= 4:
              print("disk stats: {}".format(line))

Is there a prettier or more Pythonic way of doing this?

Thanks,
Victor

On Wednesday, 1 July 2015 02:03:01 UTC+10, Chris Angelico  wrote:
> On Wed, Jul 1, 2015 at 1:47 AM, Skip Montanaro  
> wrote:
> > Maybe define a class which wraps a file-like object. Its next() method (or
> > is it __next__() method?) can just buffer up lines starting with one which
> > successfully parses as a timestamp, accumulates all the rest, until a blank
> > line or EOF is seen, then return that, either as a list of strings, one
> > massive string, or some higher level representation (presumably an instance
> > of another class) which represents one "paragraph" of iostat output.
> 
> next() in Py2, __next__() in Py3. But I'd do it, instead, as a
> generator - that takes care of all the details, and you can simply
> yield useful information whenever you have it. Something like this
> (untested):
> 
> import datetime
> 
> def parse_iostat(lines):
>     """Parse lines of iostat information, yielding ... something
> 
>     lines should be an iterable yielding separate lines of output
>     """
>     block = None
>     for line in lines:
>         line = line.strip()
>         try:
>             tm = datetime.datetime.strptime(line, "%m/%d/%Y %I:%M:%S %p")
>             if block: yield block
>             block = [tm]
>         except ValueError:
>             # It's not a new timestamp, so add it to the existing block
>             # (ignoring any lines before the first timestamp)
>             if block is not None:
>                 block.append(line)
>     if block: yield block
> 
> This is a fairly classic line-parsing generator. You can pass it a
> file-like object, a list of strings, or anything else that it can
> iterate over; it'll yield some sort of aggregate object representing
> each time's block. In this case, all it does is append strings to a
> list, so this will result in a series of lists of strings, each one
> representing a single timestamp; you can parse the other lines in any
> way you like and aggregate useful data. Usage would be something like
> this:
> 
> with open("logfile") as f:
> for block in parse_iostat(f):
> # do stuff with block
> 
> This will work quite happily with an ongoing stream, too, so if you're
> working with a pipe from a currently-running process, it'll pick stuff
> up just fine. (However, since it uses the timestamp as its signature,
> it won't yield anything till it gets the *next* timestamp. If the
> blank line is sufficient to denote the end of a block, you could
> change the loop to look for that instead.)
> 
> Hope that helps!
> 
> ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Parsing logfile with multi-line loglines, separated by timestamp?

2015-06-30 Thread Chris Angelico
On Wed, Jul 1, 2015 at 2:06 PM, Victor Hooi  wrote:
> Aha, cool, that's a good idea =) - it seems I should spend some time getting 
> to know generators/iterators.
>
> Also, sorry if this is basic, but once I have the "block" list itself, what 
> is the best way to parse each relevant line?
>
> In this case, the first line is a timestamp, the next two lines are system 
> stats, and then a newline, and then one line for each block device.
>
> I could just hardcode in the lines, but that seems ugly:
>
>   for block in parse_iostat(f):
>       for i, line in enumerate(block):
>           if i == 0:
>               print("timestamp is {}".format(line))
>           elif i == 1 or i == 2:
>               print("system stats: {}".format(line))
>           elif i >= 4:
>               print("disk stats: {}".format(line))
>
> Is there a prettier or more Pythonic way of doing this?

This is where you get into the nitty-gritty of writing a text parser.
Most of the work is in figuring out exactly what pieces of information
matter to you. I recommend putting most of the work into the
parse_iostat() function, and then yielding some really nice tidy
package that can be interpreted conveniently.
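
For instance (assuming the block layout Victor described: the timestamp object first, two system-stat lines, a blank line, then one line per disk; the dict keys here are made up):

```python
def summarize_block(block):
    """Turn one parse_iostat() block into a dict.

    Assumed layout: [timestamp, sysstat, sysstat, '', disk, disk, ...].
    """
    return {
        "timestamp": block[0],
        "system": block[1:3],
        "disks": [line for line in block[4:] if line],  # drop blank lines
    }
```

Then the caller just reads named fields instead of matching on enumerate() indices.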

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Matplotlib X-axis timezone trouble

2015-06-30 Thread Peter Pearson
On 30 Jun 2015 00:56:26 GMT, Peter Pearson  wrote:
> The following code produces a plot with a line running from (9:30, 0) to
> (10:30, 1), not from (8:30, 0) to (9:30, 1) as I desire.
>
> If I use timezone None instead of pacific, the plot is as desired, but
> of course that doesn't solve the general problem of which this is a
> much-reduced example.
>
> If I use timezone US/Central, I get the same (bad) plot.
>
> import matplotlib.pyplot as plt
> import datetime
> import pytz
> pacific = pytz.timezone("US/Pacific")
> fig = plt.figure()
> plt.plot([datetime.datetime(2014, 10, 7, 8, 30, tzinfo=pacific),
>   datetime.datetime(2014, 10, 7, 9, 30, tzinfo=pacific)],
>  [0,1], marker="o", color="green")
> fig.autofmt_xdate()
> plt.show()
>
> Does anybody know why this shift is occurring?  Is Matplotlib
> confused about what timezone to use in labeling the axis?  How
> would I tell it what timezone to use (preferably explicitly in
> the code, not in matplotlibrc)?

Progress report:

I might be wrong in blaming the axis formatting.  It looks as if the
datetimes themselves are being created wrong.

https://docs.python.org/2/library/datetime.html gives an
example like this:

>>> # Daylight Saving Time
>>> dt1 = datetime(2006, 11, 21, 16, 30, tzinfo=gmt1)
>>> dt1.dst()
datetime.timedelta(0)
>>> dt1.utcoffset()
datetime.timedelta(0, 3600)
>>> dt2 = datetime(2006, 6, 14, 13, 0, tzinfo=gmt1)
>>> dt2.dst()
datetime.timedelta(0, 3600)
>>> dt2.utcoffset()
datetime.timedelta(0, 7200)

... implying that adjustment for DST is made during the datetime constructor.
But look:

>>> from datetime import datetime
>>> import pytz
>>> pacific = pytz.timezone("US/Pacific")
>>> dt1 = datetime(2006, 11, 21, 16, 30, tzinfo=pacific) # no DST
>>> dt2 = datetime(2006, 6, 14, 13, 0, tzinfo=pacific)   # yes DST
>>> dt1.dst()
datetime.timedelta(0)
>>> dt2.dst()
datetime.timedelta(0)
>>> dt1.utcoffset()
datetime.timedelta(-1, 57600)
>>> dt2.utcoffset()
datetime.timedelta(-1, 57600)

The dst() values are equal, and the utcoffset() values are equal, even
though one datetime is during DST and the other is not -- exactly the
opposite of the example.

The debugging tool pdb can't step into datetime.datetime(), so I'm
kinda stuck here.
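
For what it's worth, this is a documented pytz quirk rather than a datetime bug: a pytz zone passed via tzinfo= is attached with its base offset (local mean time, for US/Pacific) and never gets DST-adjusted; pytz expects pacific.localize(naive_dt) instead. The standard library's newer zoneinfo module (Python 3.9+) behaves the way the docs example implies; a sketch, assuming a system tz database is available:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo  # Python 3.9+

pacific = ZoneInfo("America/Los_Angeles")  # canonical name for US/Pacific

dt1 = datetime(2006, 11, 21, 16, 30, tzinfo=pacific)  # no DST
dt2 = datetime(2006, 6, 14, 13, 0, tzinfo=pacific)    # yes DST

print(dt1.dst(), dt1.utcoffset())  # standard time, UTC-8
print(dt2.dst(), dt2.utcoffset())  # daylight time, UTC-7
```

With pytz the equivalent spelling is pacific.localize(datetime(2006, 6, 14, 13, 0)), which gives the same DST-aware offsets.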

-- 
To email me, substitute nowhere->runbox, invalid->com.
-- 
https://mail.python.org/mailman/listinfo/python-list