date:20150625

Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Steven D'Aprano

On Thursday 25 June 2015 14:07, Steven D'Aprano wrote:

>> You got it.  I didn't want to explain any more than necessary.  But yes,
>> the recipient just stores the data for the end-user.
> 
> Trust me. That's not all they are doing.

Hmm, sorry, that's a glib answer.

What I meant to say is, you can't *trust* that this is all they are doing, 
not unless all your users are within a single organisation where everyone 
trusts everyone else.

Obviously some people are more trustworthy, or less inquisitive, than 
others. But you don't know which ones are which.

-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Looking for people who are using Hypothesis and are willing to say so

2015-06-25 Thread David MacIver

Actually one of the things that's helped the most in the course of
designing Hypothesis is the realisation that types are something of a red
herring for this sort of testing. Thinking purely in terms of custom
generators helps a lot, because it means you can do things like specify
size bounds on lists, integers, etc. as well as map and filter over the
resulting data (e.g. lists(integers().map(lambda x: x * 2), min_size=1,
max_size=10) gives lists of length between 1 and 10 containing only even
integers). So in this regard the design of Hypothesis should be considered
more closely related to the Erlang QuickCheck than to the Haskell one
(although I'd only ever used statically typed QuickChecks before writing
Hypothesis).

In particular, Hypothesis's strategies are best thought of as descriptions
of how to produce data for an argument rather than as types - you can't check
whether a given value is producible from a given strategy.
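To make the "generators, not types" point concrete, here is a toy model in plain Python. It is only an illustration under my own assumptions, not how Hypothesis is implemented: the Strategy class and its draw method are invented names, although integers(), lists(), map and filter mirror the calls in the example above. A strategy here is just a function from a random source to a value, which is also why there is no way to ask it whether a given value could have come out of it.

```python
import random


class Strategy:
    """Toy data generator with map/filter; not Hypothesis's real API."""

    def __init__(self, draw_fn):
        self._draw_fn = draw_fn  # takes a random.Random, returns a value

    def draw(self, rng):
        return self._draw_fn(rng)

    def map(self, f):
        # Transform every generated value.
        return Strategy(lambda rng: f(self.draw(rng)))

    def filter(self, pred, max_tries=1000):
        # Redraw until the predicate holds, giving up after max_tries.
        def draw_fn(rng):
            for _ in range(max_tries):
                value = self.draw(rng)
                if pred(value):
                    return value
            raise ValueError("filter rejected every candidate")
        return Strategy(draw_fn)


def integers(lo=-1000, hi=1000):
    return Strategy(lambda rng: rng.randint(lo, hi))


def lists(elements, min_size=0, max_size=10):
    def draw_fn(rng):
        size = rng.randint(min_size, max_size)  # size bounds live in the generator
        return [elements.draw(rng) for _ in range(size)]
    return Strategy(draw_fn)
```

With this, lists(integers().map(lambda x: x * 2), min_size=1, max_size=10).draw(random.Random(0)) returns a list of 1 to 10 even integers. No type annotation could express "even" or "at most 10 items", which is the flexibility David describes.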

I was originally thinking it would be worth writing some auto-derivation
functionality using the new Python 3 type annotations, but I actually
don't think it would be useful. You lose far too much flexibility.

On 24 June 2015 at 23:13, Paul Rubin  wrote:

> David MacIver  writes:
> > Author of Hypothesis here. (If you don't know what Hypothesis is, you're
> > probably not the target audience for this email but you should totally
> > check it out: https://hypothesis.readthedocs.org/
>
> Oh very cool: a QuickCheck-like unit test library.  I heard of something
> like that for Python recently, that might or might not have been
> Hypothesis.  I certainly plan to try it out.  The original QuickCheck
> (for Haskell) used the static type signatures on the functions under
> test to know what test cases to generate, but Erlang QuickCheck has had
> some good successes, including finding some subtle bugs during
> development in the HAMT (Clojure-like hash array mapped trie)
> implementation just released with Erlang/OTP 18.0 this week.
>
> I see Hypothesis uses decorators that look sort of like Erlang Dialyzer
> annotations, so that can help with test cases. Maybe later it could use
> Python 3 type annotations, though I think those are still much less
> precise than Dialyzer or Haskell types.
> --
> https://mail.python.org/mailman/listinfo/python-list
>
-- 
https://mail.python.org/mailman/listinfo/python-list

windows and file names > 256 bytes

2015-06-25 Thread Albert-Jan Roskam

Hi,

Consider the following calls, where very_long_path is more than 256 bytes:
[1] os.mkdir(very_long_path)
[2] os.path.getsize(very_long_path)
[3] shutil.rmtree(very_long_path)

I am using Python 2.7, and [1] and [2] fail under Windows XP; [3] fails
under Win7 (not sure about XP). They throw “WindowsError: [Error 206] The
filename or extension is too long”. This happens even when I use the "special"
notations \\?\c:\dir\file or \\?\UNC\server\share\file, e.g.
os.path.getsize("\\\\?\\" + "c:\\dir\\file")
(Oddly, os.path.getsize(os.path.join("\\\\?\\", "c:\\dir\\file")) will
truncate the prefix.)

My questions:
1. How can I get the file size of very long paths under XP?
2. Is this a bug in Python? I would prefer if Python dealt with the gory 
details of Windows' silly behavior.

Regards,
Albert-Jan



-- 
https://mail.python.org/mailman/listinfo/python-list

Re: windows and file names > 256 bytes

2015-06-25 Thread Chris Angelico

On Thu, Jun 25, 2015 at 7:16 PM, Steven D'Aprano
 wrote:
>> 2. Is this a bug in Python? I would prefer if Python dealt with the gory
>> details of Windows' silly behavior.
>
> I would say that it is a bug that it doesn't work with extended-length paths
> (those starting with \\?\) but may or may not be a bug with regular paths.

I'd go further and say that the OP is right in expecting Python to
deal with the gory details. Would it break anything for Python to
prepend \\?\ to all file names before giving them to certain APIs?
Then the current behaviour of stripping off that prefix would be fine.

Are there any times when you *don't* want Windows to use the
extended-length path?

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: windows and file names > 256 bytes

2015-06-25 Thread Steven D'Aprano

On Thursday 25 June 2015 18:00, Albert-Jan Roskam wrote:

> Hi,
> 
> Consider the following calls, where very_long_path is more than 256 bytes:
> [1] os.mkdir(very_long_path)
> [2] os.path.getsize(very_long_path)
> [3] shutil.rmtree(very_long_path)
> 
> I am using Python 2.7 and [1] and [2] fail under Windows XP [3] fails
> under Win7 (not sure about XP). It throws: “WindowsError: [Error 206] The
> filename or extension is too long” 

I don't think this is a bug. It seems to be a limitation of Windows.

https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247%28v=vs.85%29.aspx#maxpath

> This is even when I use the "special"
> notations \\?\c:\dir\file or \\?\UNC\server\share\file, e.g.
> os.path.getsize("\\\\?\\" + "c:\\dir\\file")

However, that may be a bug.

What happens if you use a Unicode string?

path = u"\\\\?\\c:\\a\\very\\long\\path"
os.mkdir(path)


Can you open an existing file?

open(u"\\\\?\\c:\\a\\very\\long\\path\\file.txt")



> (Oddly, os.path.getsize(os.path.join("\\\\?\\", "c:\\dir\\file")) will
> truncate the prefix)

That's worth reporting as a bug.
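One way to see the truncation without a Windows machine is the ntpath module, which is what os.path refers to on Windows and which can be imported on any platform. The os.path.join documentation says that when a component is an absolute path (on Windows, one with both a drive and a root), all previous components are discarded. That may be all that is happening here. Building the string by concatenation keeps the prefix.

```python
import ntpath  # os.path on Windows; importable on any OS

prefix = "\\\\?\\"            # the extended-length prefix, four characters
target = "c:\\dir\\file"

joined = ntpath.join(prefix, target)   # "c:" plus a root makes target absolute
concatenated = prefix + target         # nothing is discarded

print(joined)        # c:\dir\file
print(concatenated)  # \\?\c:\dir\file
```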


> My questions:
> 1. How can I get the file size of very long paths under XP?

If all else fails:

import os

last = os.getcwd()
try:
    os.chdir('C:/a/very/long')
    os.chdir('path/with/many')
    os.chdir('nested/folders')
    # relative path: a leading slash would start again from the drive root
    size = os.path.getsize('and/even/more/file.txt')
finally:
    os.chdir(last)
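The same last-resort trick can be wrapped as a reusable function. This is a sketch: getsize_via_chdir is a hypothetical name, and the function only keeps each individual call short. Windows may still enforce its own limit on how long the current directory can be, so it is worth testing on XP before relying on it.

```python
import os


def getsize_via_chdir(start, *relative_steps):
    """Hypothetical helper: stat a file by walking to it one piece at a time.

    Change into `start`, then into each relative directory step in turn,
    and return the size of the final step (a relative file name). The
    original working directory is always restored.
    """
    *dirs, filename = relative_steps
    previous = os.getcwd()
    try:
        os.chdir(start)
        for step in dirs:
            os.chdir(step)  # each call sees only a short relative path
        return os.path.getsize(filename)
    finally:
        os.chdir(previous)
```

For the example above, that would be getsize_via_chdir('C:/a/very/long', 'path/with/many', 'nested/folders', 'and/even/more/file.txt').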


> 2. Is this a bug in Python? I would prefer if Python dealt with the gory
> details of Windows' silly behavior.

I would say that it is a bug that it doesn't work with extended-length paths 
(those starting with \\?\) but may or may not be a bug with regular paths.


-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Steven D'Aprano

On Thursday 25 June 2015 14:27, Devin Jeanpierre wrote:

> On Wed, Jun 24, 2015 at 9:07 PM, Steven D'Aprano 
> wrote:
>> But just sticking to the three above, the first one is partially
>> mitigated by allowing virus scanners to scan the data, but that implies
>> that the owner of the storage machine can spy on the files. So you have a
>> conflict here.
> 
> If it's encrypted malware, and you can't decrypt it, there's no threat.

If the *only* threat is that the sender will send malware, you can mitigate
that by dropping the file in an unencrypted container. Anything good
enough to prevent Windows from executing the code, accidentally or
deliberately, will do: say, a tar file with a custom extension.
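Here is a minimal sketch of the container idea using the standard tarfile module. The received bytes become the only member of an uncompressed tar archive, so the file on disk starts with a tar header rather than with whatever the sender chose. The names wrap, unwrap and payload.bin are hypothetical. As Jon Ribbens points out later in the thread, a virus scanner may still look inside the archive.

```python
import io
import tarfile


def wrap(data, archive_path, member_name="payload.bin"):
    """Store raw bytes as the only member of an uncompressed tar archive.

    archive_path can use any extension (e.g. ".blob"); tarfile ignores it.
    """
    info = tarfile.TarInfo(name=member_name)
    info.size = len(data)
    with tarfile.open(archive_path, "w") as tar:
        tar.addfile(info, io.BytesIO(data))


def unwrap(archive_path, member_name="payload.bin"):
    """Return the bytes previously stored by wrap()."""
    with tarfile.open(archive_path, "r") as tar:
        return tar.extractfile(member_name).read()
```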

But encrypting the file is also a good solution, and it prevents the storage 
machine spying on the file contents too. Provided the encryption is strong.

>> Honestly, the *only* real defence against the spying issue is to encrypt
>> the files. Not obfuscate them with a lousy random substitution cipher.
>> The storage machine can keep the files as long as they like, just by
>> making a copy, and spend hours bruteforcing them. They *will* crack the
>> substitution cipher. In pure Python, that may take a few days or weeks;
>> in C, hours or days. If they have the resources to throw at it, minutes.
>> Substitution ciphers have not been effective encryption since, oh, the
>> 1950s, unless you use a one-time pad. Which you won't be.
> 
> The original post said that the sender will usually send files they
> encrypted, unless they are malicious. So if the sender wants them to
> be encrypted, they already are.

The OP *hopes* that the sender will encrypt the files. I think that's a 
vanishingly faint hope, unless the application itself encrypts the file.

Most people don't have any encryption software beyond password-protecting 
zip files. Zip 2.0 legacy encryption is crap, and there are plenty of tools 
available to break it. Winzip has an extension for 128-bit and 256-bit AES 
encryption, both of which are probably strong enough unless you're targeted 
by the NSA, but the weak link in the chain is the idea that people will 
encrypt the software before sending it. Even if they have the tools, 
laziness being the defining characteristic of most people, they won't use 
them.

> "While the data senders are supposed to encrypt data, that's not
> guaranteed, and I'd like to protect the recipient against exposure to
> nefarious data by mangling or encrypting the data before it is written
> to disk."
> 
> The cipher is just to keep the sender from being able to control what
> is on disk.

The sender has a copy of the application? Then they can see the type of 
obfuscation used. If they know the key, or can guess it, they can take their 
malware, *decrypt* it, and send that, so that *encrypting* that file puts 
the malicious code on the disk.

E.g. suppose I want to send you an insult, but I know your program 
automatically ROT-13s the strings I send you. Then I send you:

'lbhe sngure fzryyf bs ryqreoreevrf'

and your program ROT-13s it to:

'your father smells of elderberries'

I know that the OP doesn't propose using ROT-13, but a classical 
substitution cipher isn't that much stronger.
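The ROT-13 example can be run as-is with the standard codecs module. ROT-13 is its own inverse, so the sender's "decrypt first" step and the receiver's "encrypt" step are the same call.

```python
import codecs

insult = "your father smells of elderberries"

# The sender knows the receiver ROT-13s everything, so they "decrypt" first.
payload = codecs.encode(insult, "rot13")

# The receiver's automatic ROT-13 step then restores the sender's text.
stored = codecs.encode(payload, "rot13")

print(payload)   # lbhe sngure fzryyf bs ryqreoreevrf
print(stored)    # your father smells of elderberries
```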

> I am usually very oppositional when it comes to rolling your own
> crypto, but am I alone here in thinking the OP very clearly laid out
> their case?

I don't think any of us *really* understand his use-case or the potential 
threats, but to my way of thinking, you can never have too strong a cipher 
or underestimate the risk of users taking short-cuts.

-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Devin Jeanpierre

On Thu, Jun 25, 2015 at 2:25 AM, Steven D'Aprano
 wrote:
> On Thursday 25 June 2015 14:27, Devin Jeanpierre wrote:
>> The original post said that the sender will usually send files they
>> encrypted, unless they are malicious. So if the sender wants them to
>> be encrypted, they already are.
>
> The OP *hopes* that the sender will encrypt the files. I think that's a
> vanishingly faint hope, unless the application itself encrypts the file.
>
> Most people don't have any encryption software beyond password-protecting
> zip files. Zip 2.0 legacy encryption is crap, and there are plenty of tools
> available to break it. Winzip has an extension for 128-bit and 256-bit AES
> encryption, both of which are probably strong enough unless you're targeted
> by the NSA, but the weak link in the chain is the idea that people will
> encrypt the software before sending it. Even if they have the tools,
> laziness being the defining characteristic of most people, they won't use
> them.

You're right, I was supposing that since they wrote the server, they
also wrote the client, and were just protecting from the protocol
itself being weak.

> I know that the OP doesn't propose using ROT-13, but a classical
> substitution cipher isn't that much stronger.

Yes, it is. It requires the attacker to be able to see something about
the ciphertext, unlike ROT13. But it is reasonable to suppose that
maybe the attacker can trigger the file getting executed, at which
point maybe you can deduce from the behavior what the starting bytes
are...?

> I don't think any of us *really* understand his use-case or the potential
> threats, but to my way of thinking, you can never have too strong a cipher
> or underestimate the risk of users taking short-cuts.

This is truth. It would be nice if something like keyczar came in the stdlib.

(Otherwise, users of Python take shortcuts and use randomized
substitution ciphers instead of AES.)

-- Devin
-- 
https://mail.python.org/mailman/listinfo/python-list

404 Error when using local CGI Server

2015-06-25 Thread Charles Carr

I am running a local CGI server from Python on a Windows 7 computer.
Whenever I try to serve the output of a CGI file by entering the following
into my browser: http://localhost:8080/filename.py , I get a 404 error
message saying the file was not found. I'm positive that the files I am
trying to serve are in the same directory as the server script that is
running. Are there any tips on where I should save my files to
avoid this error?
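One likely cause, assuming the server is Python's own CGIHTTPRequestHandler: it looks files up relative to the current working directory, not the directory the server script lives in. It also only runs scripts found under its cgi_directories, which default to /cgi-bin and /htbin. So a script saved as cgi-bin/filename.py next to the server would be requested as http://localhost:8080/cgi-bin/filename.py. A minimal Python 3 sketch follows. The function name build_cgi_server is hypothetical, and CGIHTTPRequestHandler is deprecated as of Python 3.13, so treat this as a debugging aid rather than a deployment recipe.

```python
import os
from http.server import HTTPServer, CGIHTTPRequestHandler


def build_cgi_server(root, port=8080):
    """Hypothetical helper: serve `root`, running CGI scripts from root/cgi-bin.

    The handler resolves URLs against the current working directory, so
    change into `root` first instead of relying on where Python was started.
    """
    os.chdir(root)
    return HTTPServer(("localhost", port), CGIHTTPRequestHandler)

# Usage (blocks until interrupted):
#   server = build_cgi_server(os.path.dirname(os.path.abspath(__file__)))
#   server.serve_forever()
```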
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Chris Angelico

On Thu, Jun 25, 2015 at 7:41 PM, Devin Jeanpierre
 wrote:
>> I know that the OP doesn't propose using ROT-13, but a classical
>> substitution cipher isn't that much stronger.
>
> Yes, it is. It requires the attacker being able to see something about
> the ciphertext, unlike ROT13. But it is reasonable to suppose that
> maybe the attacker can trigger the file getting executed, at which
> point maybe you can deduce from the behavior what the starting bytes
> are...?
>

If a symmetric cipher is being used and the key is known, anyone can
simply perform a decryption operation on the desired bytes, get back a
pile of meaningless encrypted junk, and submit that. When it's
encrypted with the same key, voila! The cleartext will reappear.

Asymmetric ciphers are a bit different, though. AIUI you can't perform
a decryption without the private key, whereas you can encrypt with
only the public key. So you ought to be safe on that one; the only way
someone could deliberately craft input that, when encrypted with your
public key, produces a specific set of bytes, would be to brute-force
it. (But I might be wrong on that. I'm no crypto expert.)

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

[no subject]

2015-06-25 Thread Knss Teja via Python-list

I want to install version 4.3, but the MSI file is giving a DLL error. What should I do? :/
Please use Reply All so that I get the mail in my Gmail inbox.
-- 
https://mail.python.org/mailman/listinfo/python-list

Python-related question

2015-06-25 Thread 문건희

Hello,

I'm a Korean software developer.

I have a question.

Two Raspberry Pi 2 Model B boards are connected over a socket to communicate with each other.

I want to send a video over this connection.

I know how to transmit a text file, but the video file does not get sent.

So, please tell me how to transfer a video file.
I'm sorry that my English is not very good.

I look forward to your answers.

Thanks.





문 건 희 (Moon, Geon-Hee), Researcher
Daekyung Electri & Communication CO. Ltd
Lot-number address: Daekyung E&C Co., Ltd., 68 Wolpyeong-dong, Seo-gu, Daejeon
Road-name address: Daekyung E&C Co., Ltd., 147-42 Worldcup-daero 484beon-gil, Seo-gu, Daejeon
Tel: 042-525-3568  FAX: 042-525-3569  H.P: 010-3347-0742
E-mail: skyjjog...@hanmail.net mgh1...@gmail.com
Website: http://www.dkenc.com
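The question does not include code, so the cause can only be guessed. A video is binary data, so it has to be read with open(path, "rb") and sent as bytes. The receiver also needs to know where the file ends, because recv() can return any amount of data up to the requested size. Below is a minimal Python 3 sketch using only the standard library. The function names and the 8-byte length header are my own choices, not a standard protocol. The same module would run on both Raspberry Pis, with one calling send_file on a connected socket and the other calling recv_file.

```python
import os
import socket
import struct

CHUNK = 64 * 1024
HEADER = struct.Struct("!Q")  # 8-byte big-endian file size


def send_file(sock, path):
    """Send the file size, then the file's raw bytes."""
    sock.sendall(HEADER.pack(os.path.getsize(path)))
    with open(path, "rb") as f:  # binary mode: video data must not be decoded
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            sock.sendall(chunk)


def recv_exact(sock, n):
    """recv() may return fewer bytes than asked for, so loop until we have n."""
    buf = bytearray()
    while len(buf) < n:
        part = sock.recv(min(CHUNK, n - len(buf)))
        if not part:
            raise ConnectionError("connection closed mid-transfer")
        buf += part
    return bytes(buf)


def recv_file(sock, path):
    """Receive a size-prefixed file, write it in binary mode, return its size."""
    (size,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    remaining = size
    with open(path, "wb") as f:
        while remaining:
            chunk = recv_exact(sock, min(CHUNK, remaining))
            f.write(chunk)
            remaining -= len(chunk)
    return size
```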


 

-- 
https://mail.python.org/mailman/listinfo/python-list

ANN: eGenix mxODBC 3.3.3 - Python ODBC Database Interface

2015-06-25 Thread eGenix Team: M.-A. Lemburg



ANNOUNCING

 eGenix.com mxODBC

   Python ODBC Database Interface

   Version 3.3.3


mxODBC is our commercially supported Python extension providing
 ODBC database connectivity to Python applications
on Windows, Mac OS X, Unix and BSD platforms
   with many advanced Python DB-API extensions and
 full support of stored procedures


This announcement is also available on our web-site for online reading:
http://www.egenix.com/company/news/eGenix-mxODBC-3.3.3-GA.html



INTRODUCTION

mxODBC provides an easy-to-use, high-performance, reliable and robust
Python interface to ODBC compatible databases such as MS SQL Server,
Oracle Database, IBM DB2, Informix and Netezza, SAP Sybase ASE and
Sybase Anywhere, Teradata, MySQL, MariaDB, PostgreSQL, SAP MaxDB and
many more:

http://www.egenix.com/products/python/mxODBC/

The "eGenix mxODBC - Python ODBC Database Interface" product is a
commercial extension to our open-source eGenix mx Base Distribution:

http://www.egenix.com/products/python/mxBase/



NEWS

The 3.3.3 release of our mxODBC is a patch level release of our
popular Python ODBC Interface for Windows, Linux, Mac OS X and
FreeBSD. It includes these enhancements and fixes:

Driver Compatibility


MS SQL Server

 MS SQL Server Native Client

 * Added a fix for the MS SQL Server Native Client error
   "[Microsoft][ODBC Driver 11 for SQL Server][SQL Server]The data
   types varchar and text are incompatible in the equal to operator."
   when trying to bind a string of more than 256 bytes to a *VARCHAR*
   column while using cursor.executedirect(). cursor.execute() was
   unaffected by this. Thanks to Paul Perez for reporting this.

 * Added a note to *avoid using "execute "* when calling stored
   procedures with MS SQL Server. This can result in '[Microsoft][SQL
   Native Client]Invalid Descriptor Index' errors. Simply dropping the
   "execute " will have the error go away.

 FreeTDS ODBC Driver

 * Added a work-around to address the FreeTDS driver error
   '[FreeTDS][SQL Server]The data types varbinary and image are
   incompatible in the equal to operator.' when trying to bind binary
   strings longer than 256 bytes to a *VARBINARY* column. This problem
   does not occur with the MS SQL Server Native Client.

 * Re-enabled returning *cursor.rowcount* for FreeTDS >= 0.91. In
   previous versions, FreeTDS could return wrong data for .rowcount
   when using SELECTs. This should make *SQLAlchemy* users happy again.

 * Added a work-around to have the FreeTDS ODBC driver accept *binary data* in
   strings as input for VARBINARY columns. A side effect of this is
   that FreeTDS will now also accept binary data in VARCHAR columns.

SAP Sybase ASE

 * Added work-arounds and improvements for Sybase ASE ODBC drivers to
   enable working with *BINARY* and *VARBINARY* columns.

 * Added a work-around for a *cursor.rowcount* problem with Sybase ASE's
   ODBC driver on 64-bit platforms. It sometimes returns 4294967295
   instead of -1.

 * Added note about random segfault problems with the
   *Sybase ASE 15.7 ODBC driver* on Windows. Unfortunately, there's
   nothing much we can do about this, other than recommend using the
   Sybase ASE 15.5 ODBC driver version which does not have these
   stability problems.

Misc:

 * Added improved documentation on the *direct execution model*
   available in mxODBC. This can help in more complex parameter
   binding situations and also provides performance boosts for a few
   databases, including e.g. MS SQL Server.

 * Improved tests and added more data binding tests, esp. for SELECT
   queries with bound parameters.

 * Fixed some minor issues with the *package web installer* related to
   Linux2 vs. Linux3, FreeBSD installations and an intermittent error
   related to hash seeds, which sometimes caused prebuilt archives to
   not install correctly.

For the full set of changes please check the mxODBC change log:

http://www.egenix.com/products/python/mxODBC/changelog.html



FEATURES

mxODBC 3.3 was released on 2014-04-08. Please see the full
announcement for highlights of the 3.3 release:

http://www.egenix.com/company/news/eGenix-mxODBC-3.3.0-GA.html

For the full set of features mxODBC has to offer, please see:

http://www.egenix.com/products/python/mxODBC/#Features



EDITIONS

mxODBC is available in these two editions:

 * The Professional Edition, which gives full access to all mxODBC features.

 * The Product Development Edition, which allows including mxODBC in
   applications yo

Re: 404 Error when using local CGI Server

2015-06-25 Thread Chris Angelico

On Thu, Jun 25, 2015 at 3:48 AM, Charles Carr  wrote:
> I am running a local cgi server from python on a windows 7 computer.
> Whenever I try to serve the output of a cgi file by entering the following
> into my browser: http://localhost:8080/filename.py , I get a 404 error
> message that the file was not found. I'm positive that the files I am trying
> to serve are in the same directory as the server script that is running. Are
> there any tips as to where I should save my files in order to avoid this
> error?

It depends entirely on how your server is set up. What I would
recommend is completely ignoring the file system, and designing a web
site using one of the frameworks that are available for Python, such
as Flask or Django. Your URLs are defined in your code; you can have a
'static' directory from which simple files (images, CSS, etc) get
served, but the file system does not define executable entry points.
This avoids the *massive* problems of PHP, where an attacker can
upgrade a file delivery exploit into remote code execution; the worst
that can happen with Python+Flask+static is that the file gets
uploaded into static/ and is then available as-is for download (it
won't be run on the server).

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Jon Ribbens

On 2015-06-25, Steven D'Aprano  wrote:
> On Thursday 25 June 2015 14:27, Devin Jeanpierre wrote:
>> If it's encrypted malware, and you can't decrypt it, there's no threat.
>
> If the *only* threat is that the sender will send malware, you can mitigate 
> around that by dropping the file in an unencrypted container. Anything good 
> enough to prevent Windows from executing the code, accidentally or 
> deliberately, say, a tar file with a custom extension.

That won't stop virus scanners etc. potentially making their own minds
up about the file.

> But encrypting the file is also a good solution, and it prevents the storage 
> machine spying on the file contents too. Provided the encryption is strong.

How would the receiver encrypting the file after receiving it prevent
the receiver from seeing what's in the file?

>> The original post said that the sender will usually send files they
>> encrypted, unless they are malicious. So if the sender wants them to
>> be encrypted, they already are.
>
> The OP *hopes* that the sender will encrypt the files. I think that's a 
> vanishingly faint hope, unless the application itself encrypts the file.

Yes, the application itself encrypts the file. Haven't you been
reading what he's saying?

> The sender has a copy of the application? Then they can see the type of 
> obfuscation used. If they know the key, or can guess it, they can take their 
> malware, *decrypt* it, and send that, so that *encrypting* that file puts 
> the malicious code on the disk.

Not if they don't know the key they can't.

> E.g. suppose I want to send you an insult, but I know your program 
> automatically ROT-13s the strings I send you. Then I send you:
>
> 'lbhe sngure fzryyf bs ryqreoreevrf'
>
> and your program ROT-13s it to:
>
> 'your father smells of elderberries'
>
> I know that the OP doesn't propose using ROT-13, but a classical 
> substitution cipher isn't that much stronger.

Replace "ROT-13" with "ROT-n" where 'n' is a secret known only to the
receiver, and suddenly it's not such a bad method of obfuscation.
Improve it to the random-translation-map method he's actually using
and you've got really quite a reasonable system.
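A random translation map of the kind Jon describes might look like the sketch below. It is not the OP's code, which was never posted. make_tables, mangle and unmangle are hypothetical names, and bytes.translate does the per-byte substitution. The shuffled table is the whole key.

```python
import random


def make_tables(rng=None):
    """Return (forward, inverse) 256-byte tables for a random byte substitution.

    The shuffled permutation *is* the key: whoever holds it can both mangle
    and unmangle data.
    """
    if rng is None:
        rng = random.SystemRandom()  # OS-backed randomness by default
    perm = list(range(256))
    rng.shuffle(perm)
    forward = bytes(perm)            # forward[b] replaces byte value b
    inverse = bytearray(256)
    for plain, mangled in enumerate(perm):
        inverse[mangled] = plain     # undo the substitution
    return forward, bytes(inverse)


def mangle(data, forward):
    return data.translate(forward)


def unmangle(data, inverse):
    return data.translate(inverse)
```

This also illustrates Chris's point about a known key. A sender who somehow learns the table can send unmangle(payload, inverse), and the receiver's own mangle() step will then write the original payload to disk. So everything rests on the table staying secret, and this is obfuscation rather than encryption in the AES sense.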

>> I am usually very oppositional when it comes to rolling your own
>> crypto, but am I alone here in thinking the OP very clearly laid out
>> their case?
>
> I don't think any of us *really* understand his use-case or the potential 
> threats, but to my way of thinking, you can never have too strong a cipher 
> or underestimate the risk of users taking short-cuts.

The use case is pretty obvious (a peer-to-peer dropbox type thing) but
it does appear to be being misunderstood. This isn't actually a crypto
problem at all and "users taking short-cuts" isn't an issue.
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: windows and file names > 256 bytes

2015-06-25 Thread Tim Golden

On 25/06/2015 10:23, Chris Angelico wrote:
> On Thu, Jun 25, 2015 at 7:16 PM, Steven D'Aprano
>  wrote:
>>> 2. Is this a bug in Python? I would prefer if Python dealt with the gory
>>> details of Windows' silly behavior.
>>
>> I would say that it is a bug that it doesn't work with extended-length paths
>> (those starting with \\?\) but may or may not be a bug with regular paths.
> 
> I'd go further and say that the OP is right in expecting Python to
> deal with the gory details. Would it break anything for Python to
> prepend \\?\ to all file names before giving them to certain APIs?
> Then the current behaviour of stripping off that prefix would be fine.
> 
> Are there any times when you *don't* want Windows to use the
> extended-length path?

Yes: when you're passing a relative filepath. Which could pretty much be
any time. As you might imagine, this has come up before -- there's an
issue on the tracker for it somewhere. I just don't think it's simple
enough for Python to know when and when not to use the extended path
syntax without danger of breaking something.

Bear in mind that the \\?\ prefix doesn't just extend the length: it
also allows otherwise special-cased characters such as "." or "..". It's
a general-purpose mechanism for handing something straight to the file
system without parsing it first.

TJG

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Joonas Liik

Personally, I have had AVG give at least 2 false positives (FYI, one of
them was, like, python2.6).

As long as antivirus software can give so many false positives, I would
think preventing your AV from nuking someone else's data is a
reasonable thing.
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Devin Jeanpierre

On Thu, Jun 25, 2015 at 2:57 AM, Chris Angelico  wrote:
> On Thu, Jun 25, 2015 at 7:41 PM, Devin Jeanpierre
>  wrote:
>>> I know that the OP doesn't propose using ROT-13, but a classical
>>> substitution cipher isn't that much stronger.
>>
>> Yes, it is. It requires the attacker being able to see something about
>> the ciphertext, unlike ROT13. But it is reasonable to suppose that
>> maybe the attacker can trigger the file getting executed, at which
>> point maybe you can deduce from the behavior what the starting bytes
>> are...?
>>
>
> If a symmetric cipher is being used and the key is known, anyone can
> simply perform a decryption operation on the desired bytes, get back a
> pile of meaningless encrypted junk, and submit that. When it's
> encrypted with the same key, voila! The cleartext will reappear.
>
> Asymmetric ciphers are a bit different, though. AIUI you can't perform
> a decryption without the private key, whereas you can encrypt with
> only the public key. So you ought to be safe on that one; the only way
> someone could deliberately craft input that, when encrypted with your
> public key, produces a specific set of bytes, would be to brute-force
> it. (But I might be wrong on that. I'm no crypto expert.)

Yes, so it should be random.

-- Devin
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: windows and file names > 256 bytes

2015-06-25 Thread Chris Angelico

On Thu, Jun 25, 2015 at 8:10 PM, Tim Golden  wrote:
>> Are there any times when you *don't* want Windows to use the
>> extended-length path?
>
> Yes: when you're passing a relative filepath. Which could pretty much be
> any time. As you might imagine, this has come up before -- there's an
> issue on the tracker for it somewhere. I just don't think it's simple
> enough for Python to know when and when not to use the extended path
> syntax without danger of breaking something.

Oh blah. So I suppose that means there's fundamentally no way to use a
long (>256 byte) relative path on Windows?

> Bear in mind that the \\?\ prefix doesn't just extend the length: it
> also allows otherwise special-cased characters such as "." or "..". It's
> a general-purpose mechanism for handing something straight to the file
> system without parsing it first.

Ohh. So... hmm. So what this really means is that a path could get
\\?\ prepended when, and ONLY when, it becomes absolute. Windows can
be a real pest...

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

Windows install error

2015-06-25 Thread Mark Lawrence


On 24/06/2015 16:56, Knss Teja via Python-list wrote:

> I WANT TO install 4.3 version ... but the MSI file is giving a DLL error
> .. what should I do :/
> please use REPLY ALL .. so that I get the mail to my gmail inbox



I'll assume that you mean 3.4.x.  Please give the x, your Windows 
version and the precise error message that you're getting, then we 
should be able to help.


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list

Re: windows and file names > 256 bytes

2015-06-25 Thread Mark Lawrence


On 25/06/2015 09:00, Albert-Jan Roskam wrote:

> Hi,
>
> Consider the following calls, where very_long_path is more than 256 bytes:
> [1] os.mkdir(very_long_path)
> [2] os.path.getsize(very_long_path)
> [3] shutil.rmtree(very_long_path)
>
> I am using Python 2.7 and [1] and [2] fail under Windows XP [3] fails
> under Win7 (not sure about XP). It throws: “WindowsError: [Error 206] The
> filename or extension is too long” This is even when I use the "special"
> notations \\?\c:\dir\file or \\?\UNC\server\share\file, e.g.
> os.path.getsize("\\\\?\\" + "c:\\dir\\file")
> (Oddly, os.path.getsize(os.path.join("\\\\?\\", "c:\\dir\\file")) will
> truncate the prefix)
>
> My questions:
> 1. How can I get the file size of very long paths under XP?


Please see 
https://msdn.microsoft.com/en-gb/library/windows/desktop/aa365247(v=vs.85).aspx#maxpath



> 2. Is this a bug in Python? I would prefer if Python dealt with the gory
> details of Windows' silly behavior.


I don't see why Python should work around any particular limitation of 
any given OS.


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list

Re: windows and file names > 256 bytes

2015-06-25 Thread Joonas Liik

It sounds to me more like it is possible to use long file names on Windows,
but it is a pain, and in Python on Windows it is basically impossible.

So shouldn't it be possible to manipulate these files with extended names?

I mean, even if you had to use some special function to ask for long names,
it would still be better than no support at all.
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: windows and file names > 256 bytes

2015-06-25 Thread Tim Golden

On 25/06/2015 13:04, Joonas Liik wrote:
> It sounds to me more like it is possible to use long file names on windows
> but it is a pain and in python, on windows it is basically impossible.

Certainly not impossible: you could write your own wrapper function:

import os

def extended_path(p):
    return r"\\?\%s" % os.path.abspath(p)

where you knew that there was a possibility of long paths and that an
absolute path would work.
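The wrapper above can be extended a little, along the lines Chris suggests (add the prefix only once the path is absolute). This version also covers the \\?\UNC\server\share form from the original question and leaves already-prefixed paths alone. It is a sketch assuming Python 3. to_extended_path is a hypothetical name, and it uses ntpath directly so the string handling can be tried on any OS; on Windows, os.path is the same module. Because the prefix switches off Windows' own parsing of "." and "..", the path is made absolute and normalised before the prefix is added.

```python
import ntpath  # Windows path rules; importable on any OS for testing

EXTENDED = "\\\\?\\"            # the extended-length prefix, four characters
EXTENDED_UNC = EXTENDED + "UNC\\"


def to_extended_path(path):
    """Hypothetical helper: return the extended-length form of a Windows path.

    The path is made absolute and normalised first (no "." or "..", no
    forward slashes), since Windows will not parse it after the prefix.
    """
    if path.startswith(EXTENDED):
        return path                       # already extended: leave it alone
    full = ntpath.abspath(path)
    if full.startswith("\\\\"):           # UNC path: server, share, rest
        return EXTENDED_UNC + full[2:]
    return EXTENDED + full
```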

TJG
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Windows install error

2015-06-25 Thread Laura Creighton

In a message of Thu, 25 Jun 2015 11:58:09 +0100, Mark Lawrence writes:
>On 24/06/2015 16:56, Knss Teja via Python-list wrote:
>> I WANT TO install 4.3 version ... but the MSI file is giving a DLL error
>> .. what should I do :/
>> please use REPLY ALL .. so that I get the mail to my gmail inbox
>>
>
>I'll assume that you mean 3.4.x.  Please give the x, your Windows 
>version and the precise error message that you're getting, then we 
>should be able to help.
>
>-- 
>My fellow Pythonistas, ask not what our language can do for you, ask
>what you can do for our language.
>
>Mark Lawrence

Note that some people I know of, via webmaster, found that they
could install 3.4.x with the ActiveState installer but not the python.org
one.  Apparently ActiveState bundles up some DLLs in its installer
'in case the user doesn't have them' whereas we don't -- and it turns out
some users don't have them.  I have never been able to get enough out
of a user to find out exactly what magic sauce they have that we do
not -- it would be really nice to find out so that we can add it too.

No guarantees that this particular user has this particular problem,
of course.

Laura

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: windows and file names > 256 bytes

2015-06-25 Thread Chris Angelico

On Thu, Jun 25, 2015 at 9:06 PM, Mark Lawrence  wrote:
>> 2. Is this a bug in Python? I would prefer if Python dealt with the gory
>> details of Windows' silly behavior.
>
>
> I don't see why Python should work around any particular limitation of any
> given OS.

Check out the multiprocessing module, and then tell me whether it's
better that Python paper over the OS differences or if you'd rather do
all that yourself. The biggest difference left between Windows and
POSIX is that on Windows, your main module has to be importable (which
doesn't hurt on POSIX). Python deals with all the mess of "can we
fork, or do we have to do it differently?".

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

Re:

2015-06-25 Thread Michael Torrie

On 06/24/2015 09:56 AM, Knss Teja via Python-list wrote:
> I WANT TO install 4.3  version ... but the MSI file is giving a DLL error .. 
> what should I do :/
> please use REPLY ALL .. so that I get the mail to my gmail inbox

No idea what you mean about wanting to get mail to your gmail inbox...
I'd think the mailing list would do just that.

Anyway, what is your DLL error?
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: windows and file names > 256 bytes

2015-06-25 Thread Michael Torrie

On 06/25/2015 06:34 AM, Tim Golden wrote:
> On 25/06/2015 13:04, Joonas Liik wrote:
>> It sounds to me more like it is possible to use long file names on windows
>> but it is a pain and in python, on windows it is basically impossible.
> 
> Certainly not impossible: you could write your own wrapper function:
> 
> def extended_path(p):
>     return r"\\?\%s" % os.path.abspath(p)
> 
> where you knew that there was a possibility of long paths and that an
> absolute path would work.

The OP mentions that even when he manually supplies extended paths,
os.mkdir, os.getsize, and shutil.rmtree return errors for him in Python
2.7.  So there's more to this problem.
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: windows and file names > 256 bytes

2015-06-25 Thread Tim Golden

On 25/06/2015 14:35, Michael Torrie wrote:
> On 06/25/2015 06:34 AM, Tim Golden wrote:
>> On 25/06/2015 13:04, Joonas Liik wrote:
>>> It sounds to me more like it is possible to use long file names on windows
>>> but it is a pain and in python, on windows it is basically impossible.
>>
>> Certainly not impossible: you could write your own wrapper function:
>>
>> def extended_path(p):
>>     return r"\\?\%s" % os.path.abspath(p)
>>
>> where you knew that there was a possibility of long paths and that an
>> absolute path would work.
> 
> The OP mentions that even when he manually supplies extended paths,
> os.mkdir, os.getsize, and shutil.rmtree return errors for him in Python
> 2.7.  So there's more to this problem.
> 

He's probably not passing unicode strings: the extended path only works
for unicode strings. In 3.x, str is unicode, so that's what you pass by default.

TJG
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Steven D'Aprano

On Thu, 25 Jun 2015 08:03 pm, Jon Ribbens wrote:

> On 2015-06-25, Steven D'Aprano 
> wrote:
>> On Thursday 25 June 2015 14:27, Devin Jeanpierre wrote:
>>> If it's encrypted malware, and you can't decrypt it, there's no threat.
>>
>> If the *only* threat is that the sender will send malware, you can
>> mitigate around that by dropping the file in an unencrypted container.
>> Anything good enough to prevent Windows from executing the code,
>> accidentally or deliberately, say, a tar file with a custom extension.
> 
> That won't stop virus scanners etc potentially making their own minds
> up about the file.

*shrug* Sure, but I was specifically referring to the risk of the malware
being executed, not being detected by a virus scanner.

Encrypting the file won't even necessarily stop the virus scanner from
finding false positives. It might even increase the chances. But it will
prevent the virus scanner from finding actual viruses. You may or may not
consider that a problem.

>> But encrypting the file is also a good solution, and it prevents the
>> storage machine spying on the file contents too. Provided the encryption
>> is strong.
> 
> How would the receiver encrypting the file after receiving it prevent
> the receiver from seeing what's in the file?

I didn't say it ought to be encrypted by the receiver. Obviously the
encryption needs to be done in a way that the recipient doesn't get access
to the key. The obvious way to do that is for the application to encrypt
the data before it sends it. Then the receiver just writes the encrypted
bytes directly to a file. That would have the benefit of protecting against
man-in-the-middle attacks as well, since the file is never transmitted in
the clear.

>>> The original post said that the sender will usually send files they
>>> encrypted, unless they are malicious. So if the sender wants them to
>>> be encrypted, they already are.
>>
>> The OP *hopes* that the sender will encrypt the files. I think that's a
>> vanishingly faint hope, unless the application itself encrypts the file.
> 
> Yes, the application itself encrypts the file. Haven't you been
> reading what he's saying?

I have been reading what the OP has been saying. I'm not sure if you have
been. The OP doesn't want to encrypt the file, because he wants the
application to be pure Python and encryption in pure Python is too slow. So
he wants to obfuscate it with some sort of substitution cipher or
equivalent, which may be easily crackable by anyone who really wants to.

I've been arguing that the application *should* encrypt the file, and not
mess about giving the illusion of security.

>> The sender has a copy of the application? Then they can see the type of
>> obfuscation used. If they know the key, or can guess it, they can take
>> their malware, *decrypt* it, and send that, so that *encrypting* that
>> file puts the malicious code on the disk.
> 
> Not if they don't know the key they can't.

"If they know the key, or can guess it, ..."
"Not if they don't know the key they can't."

Really? Glad you're around to point that out to me.

But seriously, they have the application. If the application is using a
symmetric substitution cipher, it needs the key (because there is only
one), so the receiver will have the cipher.

With the sort of substitution cipher the OP is experimenting with, forcing a
particular result is trivially easy. The sender has access to the
application, knows the cipher, knows the key, and can easily generate a
file which will generate whatever content the sender wants after being
obfuscated.

Modern block ciphers like AES are quite resistant to that sort of
attack. There is, so far as I know, no way to generate a file which results
in a specific content after encryption without knowing the key.

>> E.g. suppose I want to send you an insult, but I know your program
>> automatically ROT-13s the strings I send you. Then I send you:
>>
>> 'lbhe sngure fzryyf bs ryqreoreevrf'
>>
>> and your program ROT-13s it to:
>>
>> 'your father smells of elderberries'
>>
>> I know that the OP doesn't propose using ROT-13, but a classical
>> substitution cipher isn't that much stronger.
> 
> Replace "ROT-13" with "ROT-n" where 'n' is a secret known only to the
> receiver, and suddenly it's not such a bad method of obfuscation.

There are only 256 possible values for n, one of which doesn't transform the
data at all (ROT-0). If you're thinking of attacking this by pencil and
paper, 255 transformations sounds like a lot. For a computer, that's barely
harder than a single transformation.
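The size of that key space is easy to check. The sketch below treats ROT-n as a byte-wise shift (every byte moved by n modulo 256, built with bytes.translate) and tries all 256 keys. The function names and the "crib" (a fragment the attacker expects to appear in the plaintext) are assumptions of this illustration, which models an attacker who already holds some ciphertext.

```python
def rot_n(data: bytes, n: int) -> bytes:
    """Shift every byte by n modulo 256 (a byte-wise Caesar cipher)."""
    table = bytes((i + n) % 256 for i in range(256))
    return data.translate(table)

def brute_force(ciphertext: bytes, crib: bytes) -> list:
    """Return every key n whose decryption contains the known fragment."""
    # Decrypting with key n is the same as shifting by -n.
    return [n for n in range(256) if crib in rot_n(ciphertext, -n)]

secret = rot_n(b"your father smells of elderberries", 97)
print(brute_force(secret, b"elderberries"))  # [97]
```

Each candidate key costs a single translate call, so checking all 256 is trivial work for a computer.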

> Improve it to the random-translation-map method he's actually using
> and you've got really quite a reasonable system.

No, truly you haven't. The OP is experimenting with bytearray.translate,
which likely makes it a monoalphabetic substitution cipher, and the
techniques for cracking those go back to the 9th century AD. That's over a
thousand years of experience in cracking these things.

The situation is a bit harder than the sor

Could you explain why this program runs?

2015-06-25 Thread fl

Hi,

I downloaded and installed the PyPDF2 library. Its instructions say the tests can be run with:


python -m unittest Tests.tests


tests.py is in the folder PyPDF2-master\Tests\


The command above does run and prints output, but after reading tests.py I
don't understand why it runs:


///
import os, sys, unittest

# Configure path environment
TESTS_ROOT = os.path.abspath(os.path.dirname(__file__))
PROJECT_ROOT = os.path.dirname(TESTS_ROOT)
RESOURCE_ROOT = os.path.join(PROJECT_ROOT, 'Resources')

sys.path.append(PROJECT_ROOT)

# Test imports
import unittest
from PyPDF2 import PdfFileReader


class PdfReaderTestCases(unittest.TestCase):

    def test_PdfReaderFileLoad(self):
        ''' Test loading and parsing of a file. Extract text of the file and compare to expected
            textual output. Expected outcome: file loads, text matches expected.
        '''
        with open(os.path.join(RESOURCE_ROOT, 'crazyones.pdf'), 'rb') as inputfile:

            # Load PDF file from file
            ipdf = PdfFileReader(inputfile)
            ipdf_p1 = ipdf.getPage(0)

            # Retrieve the text of the PDF
            pdftext_file = open(os.path.join(RESOURCE_ROOT, 'crazyones.txt'), 'r')
            pdftext = pdftext_file.read()
            ipdf_p1_text = ipdf_p1.extractText()

            # Compare the text of the PDF to a known source
            self.assertEqual(ipdf_p1_text.encode('utf-8', errors='ignore'), pdftext,
                msg='PDF extracted text differs from expected value.\n\nExpected:\n\n%r\n\nExtracted:\n\n%r\n\n'
                % (pdftext, ipdf_p1_text.encode('utf-8', errors='ignore')))
//

It only defines the class PdfReaderTestCases; nothing in the file instantiates
it. I have read about how classes are used, but I have not found the answer.
Can you help me understand why the command line can run the test?

Thanks,
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Jon Ribbens

On 2015-06-25, Steven D'Aprano  wrote:
> On Thu, 25 Jun 2015 08:03 pm, Jon Ribbens wrote:
>> That won't stop virus scanners etc potentially making their own minds
>> up about the file.
>
> *shrug* Sure, but I was specifically referring to the risk of the malware
> being executed, not being detected by a virus scanner.
>
> Encrypting the file won't even necessarily stop the virus scanner from
> finding false positives. It might even increase the chances.

That seems spectacularly unlikely.

> But it will prevent the virus scanner from finding actual viruses.
> You may or may not consider that a problem.

The OP would consider it a benefit.

> I didn't say it ought to be encrypted by the receiver. Obviously the
> encryption needs to be done in a way that the recipient doesn't get access
> to the key.

No, you're still misunderstanding. The encryption needs to be done in
a way that the *sender* doesn't get access to the key. The recipient
has access to it by definition because the recipient chooses it.

> The obvious way to do that is for the application to encrypt the
> data before it sends it.

Yes, he already said the application does that. The problem is,
what if the sender is not the genuine application but is instead
a malicious attacker?

> Then the receiver just writes the encrypted bytes directly to a file.

That's precisely what he's trying to avoid.

> That would have the benefit of protecting against man-in-the-middle
> attacks as well, since the file is never transmitted in the clear.

With what he's talking about, the file after encryption is never
transmitted *at all*.

> I've been arguing that the application *should* encrypt the file, and not
> mess about giving the illusion of security.

You haven't understood the threat model.

> But seriously, they have the application. If the application is using a
> symmetric substitution cipher, it needs the key (because there is only
> one), so the receiver will have the cipher.

There is not only one key. The recipient would invent a new key for
each file after the file is received.

> With the sort of substitution cipher the OP is experimenting with, forcing a
> particular result is trivially easy. The sender has access to the
> application, knows the cipher, knows the key, and can easily generate a
> file which will generate whatever content the sender wants after being
> obfuscated.

No, because the sender does not know the key.

>> Replace "ROT-13" with "ROT-n" where 'n' is a secret known only to the
>> receiver, and suddenly it's not such a bad method of obfuscation.
>
> There are only 256 possible values for n, one of which doesn't transform the
> data at all (ROT-0). If you're thinking of attacking this by pencil and
> paper, 255 transformations sounds like a lot. For a computer, that's barely
> harder than a single transformation.

Well, it means you need to send 256 times as much data, which is a
start. If you're instead using a 256-byte translation table then
an attack becomes utterly impractical.

>> Improve it to the random-translation-map method he's actually using
>> and you've got really quite a reasonable system.
>
> No, truly you haven't. The OP is experimenting with bytearray.translate,
> which likely makes it a monoalphabetic substitution cipher, and the
> techniques for cracking those go back to the 9th century AD.

Only if you have the ciphertext, which the attacker in this scenario
does not. The attacker gets to set the plaintext, knows the algorithm,
does not know the key (unless the method of choosing the key has a
flaw), and wants to set the ciphertext to some specific string.
Frequency analysis doesn't even begin to apply to this scenario.

> You're relying on security by obscurity

No, he really isn't.

>> The use case is pretty obvious (a peer-to-peer dropbox type thing) but
>> it does appear to be being misunderstood. This isn't actually a crypto
>> problem at all and "users taking short-cuts" isn't an issue.
>
> Yes it is. If users don't properly pre-encrypt their files before sending it
> out to the cloud, AND THEY WON'T,

Yes they will. He said his application encrypts the files for them,
presumably he is indeed using "proper crypto" for that.

> receivers WILL be able to read those files,

That's a problem for the sender not the receiver.
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Could you explain why this program runs?

2015-06-25 Thread fl

On Thursday, June 25, 2015 at 8:20:52 AM UTC-7, fl wrote:

Thanks for reading. I've worked out that it is a feature of the unittest module.
-- 
https://mail.python.org/mailman/listinfo/python-list

converting boolean filter function to lambda

2015-06-25 Thread Ethan Furman


I have the following function:

def phone_found(p):
  for c in contacts:
    if p in c:
      return True
  return False

with the following test data:

contacts = ['672.891.7280 x999', '291.792.9000 x111']
main = ['291.792.9001', '291.792.9000']

which works:

filter(phone_found, main)
# ['291.792.9000']

My attempt at a lambda function fails:

filter(lambda p: (p in c for c in contacts), main)
# ['291.792.9001', '291.792.9000']

Besides using a lambda ;) , what have I done wrong?

--
~Ethan~
--
https://mail.python.org/mailman/listinfo/python-list

Re: Could you explain why this program runs?

2015-06-25 Thread Mark Lawrence


On 25/06/2015 16:20, fl wrote:

Hi,

I download and install pyPDF2 library online. It says the test can run by:

python -m unittest Tests.tests



The -m flag tells Python to run the unittest module as a script; unittest then loads and runs the tests in Tests.tests.

You can find out what all the flags do by typing this at the command prompt:

python --help
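Roughly what happens behind that command, sketched with a stand-in class (ExampleTestCase is hypothetical; for the real file, unittest would load the module named Tests.tests instead). The loader finds every unittest.TestCase subclass and creates one instance per method whose name starts with "test", which is why tests.py never has to instantiate PdfReaderTestCases itself.

```python
import unittest

class ExampleTestCase(unittest.TestCase):
    def test_addition(self):
        self.assertEqual(1 + 1, 2)

    def helper(self):
        # Not collected: the name does not start with "test".
        pass

# "python -m unittest <name>" hands the name to a TestLoader, which builds
# one ExampleTestCase instance per test_* method and groups them in a suite.
loader = unittest.TestLoader()
suite = loader.loadTestsFromTestCase(ExampleTestCase)
print(suite.countTestCases())  # 1: helper() was skipped

result = unittest.TextTestRunner(verbosity=2).run(suite)
print(result.testsRun, result.wasSuccessful())  # 1 True
```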

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list

Re: converting boolean filter function to lambda

2015-06-25 Thread Ian Kelly

On Thu, Jun 25, 2015 at 9:59 AM, Ethan Furman  wrote:
> I have the following function:
>
> def phone_found(p):
>   for c in contacts:
>     if p in c:
>       return True
>   return False
>
> with the following test data:
>
> contacts = ['672.891.7280 x999', '291.792.9000 x111']
> main = ['291.792.9001', '291.792.9000']
>
> which works:
>
> filter(phone_found, main)
> # ['291.792.9000']
>
> My attempt at a lambda function fails:
>
> filter(lambda p: (p in c for c in contacts), main)
> # ['291.792.9001', '291.792.9000']
>
> Besides using a lambda ;) , what have I done wrong?

The lambda returns a generator, not a boolean. All generators are truthy.

I think you want this instead:

filter(lambda p: any(p in c for c in contacts), main)
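A quick check of that diagnosis, using the same test data as Ethan's post and assuming Python 3 (where filter() returns an iterator, so it is wrapped in list()). The list comprehension at the end is an alternative, not something proposed in the thread.

```python
contacts = ['672.891.7280 x999', '291.792.9000 x111']
main = ['291.792.9001', '291.792.9000']

# A generator object is truthy even if every value it would yield is False.
gen = ('291.792.9001' in c for c in contacts)
print(bool(gen))  # True

# any() consumes the generator and stops at the first True value.
found = list(filter(lambda p: any(p in c for c in contacts), main))
print(found)  # ['291.792.9000']

# The same filter written as a list comprehension, without a lambda.
found_lc = [p for p in main if any(p in c for c in contacts)]
print(found_lc)  # ['291.792.9000']
```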
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: converting boolean filter function to lambda

2015-06-25 Thread Peter Otten

Ethan Furman wrote:

> I have the following function:
> 
> def phone_found(p):
>   for c in contacts:
>     if p in c:
>       return True
>   return False
> 
> with the following test data:
> 
> contacts = ['672.891.7280 x999', '291.792.9000 x111']
> main = ['291.792.9001', '291.792.9000']
> 
> which works:
> 
> filter(phone_found, main)
> # ['291.792.9000']
> 
> My attempt at a lambda function fails:
> 
> filter(lambda p: (p in c for c in contacts), main)
> # ['291.792.9001', '291.792.9000']
> 
> Besides using a lambda ;) , what have I done wrong?

The lambda returns a generator expression and that expression is always true 
in a boolean context:

>>> bool(False for _ in ())
True

you're missing any() ...

>>> contacts = ['672.891.7280 x999', '291.792.9000 x111']
>>> main = ['291.792.9001', '291.792.9000']
>>> filter(lambda p: any(p in c for c in contacts), main)


... and list() if you were using Python 3.

>>> list(_)
['291.792.9000']


-- 
https://mail.python.org/mailman/listinfo/python-list

Re: To write headers once with different values in separate row in CSV

2015-06-25 Thread kbtyo

Okay, so I have gone back to the drawing board and have the following 
predicament (my apologies in advance for the indentation):

Here is my sample:


0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0


0
0
0
0
0
0


0
0
0
0
0
0
0
0


1/1/0001 12:00:00 AM
1/1/0001 12:00:00 AM
1/1/0001 12:00:00 AM
1/1/0001 12:00:00 AM
1/1/0001 12:00:00 AM
1/1/0001 12:00:00 AM
1/1/0001 12:00:00 AM
1/1/0001 12:00:00 AM




False
False
False
False
False
0






Using this:


import xml.etree.ElementTree as ElementTree  # cElementTree was removed in Python 3.9
from xml.etree.ElementTree import XMLParser
import csv

def flatten_list(aList, prefix=''):
    for i, element in enumerate(aList, 1):
        eprefix = "{}{}".format(prefix, i)
        if len(element):  # has child tags (truth-testing an Element is deprecated)
            # treat like dict
            if len(element) == 1 or element[0].tag != element[1].tag:
                yield from flatten_dict(element, eprefix)
            # treat like list
            elif element[0].tag == element[1].tag:
                yield from flatten_list(element, eprefix)
        elif element.text:
            text = element.text.strip()
            if text:
                yield eprefix.rstrip('.'), text

def flatten_dict(parent_element, prefix=''):
    prefix = prefix + parent_element.tag
    if parent_element.items():
        for k, v in parent_element.items():
            yield prefix + k, v
    for element in parent_element:
        eprefix = prefix + element.tag
        if len(element):
            # treat like dict - we assume that if the first two tags
            # in a series are different, then they are all different.
            if len(element) == 1 or element[0].tag != element[1].tag:
                yield from flatten_dict(element, prefix=prefix)
            # treat like list - we assume that if the first two tags
            # in a series are the same, then the rest are the same.
            else:
                # here, we put the list in dictionary; the key is the
                # tag name the list elements all share in common, and
                # the value is the list itself
                yield from flatten_list(element, prefix=eprefix)
            # if the tag has attributes, add those to the dict
            if element.items():
                for k, v in element.items():
                    yield eprefix + k, v
        # this assumes that if you've got an attribute in a tag,
        # you won't be having any text. This may or may not be a
        # good idea -- time will tell. It works for the way we are
        # currently doing XML configuration files...
        elif element.items():
            for k, v in element.items():
                yield eprefix + k, v
        # finally, if there are no child tags and no attributes, extract
        # the text
        else:
            yield eprefix, element.text

def makerows(pairs):
    headers = []
    columns = {}
    for k, v in pairs:
        if k in columns:
            columns[k].extend(

Re: Python realted to question

2015-06-25 Thread Terry Reedy


On 6/25/2015 2:49 AM, 문건희 wrote:


Two Raspberry Pi 2 Model B boards are connected by a socket to communicate
with each other.
I want to send a video over that connection.
I know how to transmit a text file. However, the video file is
not sent.


What are the symptoms of 'not sent'?  What Python code are you using?


So, please tell me how to transfer a video file.


If you are using Python (2 or 3?), you will have to read and send the 
file as a binary file, not a text file.
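A minimal sketch of the binary approach, assuming Python 3 and an already-connected socket on each Pi (connection setup is omitted; the function names and chunk size are illustrative). Opening the file with 'rb' and 'wb' keeps the bytes untouched, and shutting down the sender's write side tells the receiver that the file is complete.

```python
import socket

CHUNK_SIZE = 64 * 1024  # arbitrary; any reasonable size works

def send_file(sock, path):
    """Send a file's raw bytes over a connected socket."""
    with open(path, "rb") as f:  # binary mode: bytes are sent unchanged
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            sock.sendall(chunk)
    # Signal end-of-file so the receiver's recv() eventually returns b"".
    sock.shutdown(socket.SHUT_WR)

def receive_file(sock, path):
    """Write everything received until the peer stops sending."""
    with open(path, "wb") as f:  # binary mode on the receiving side too
        while True:
            chunk = sock.recv(CHUNK_SIZE)
            if not chunk:
                break
            f.write(chunk)
```

On Python 3.5 and later, sock.sendfile(f) can replace the sending loop.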


You might want to send this question to an RPi-specific mailing list.

When posting to a mailing list, send plain text, not html with 'remote 
content' (which good readers will block).  This rule is true for all 
English-language mailing lists unless specified otherwise.


--
Terry Jan Reedy


--
https://mail.python.org/mailman/listinfo/python-list

Re: windows and file names > 256 bytes

2015-06-25 Thread Terry Reedy


On 6/25/2015 5:16 AM, Steven D'Aprano wrote:

On Thursday 25 June 2015 18:00, Albert-Jan Roskam wrote:


Hi,

Consider the following calls, where very_long_path is more than 256 bytes:
[1] os.mkdir(very_long_path)
[2] os.getsize(very_long_path)
[3] shutil.rmtree(very_long_path)

I am using Python 2.7. [1] and [2] fail under Windows XP; [3] fails
under Win7 (not sure about XP). It throws: “WindowsError: [Error 206] The
filename or extension is too long”


I don't think this is a bug. It seems to be a limitation of Windows.

https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247%28v=vs.85%29.aspx#maxpath


This is even when I use the "special"
notations \\?\c:\dir\file or \\?\UNC\server\share\file, e.g.
os.path.getsize("\\\\?\\" + "c:\\dir\\file")


However, that may be a bug.

What happens if you use a Unicode string?

path = u"\\\\?\\c:a\\very\\long\\path"
os.mkdir(path)


Can you open an existing file?

open(u"\\\\?\\c:a\\very\\long\\path\\file.txt")




(Oddly, os.path.getsize(os.path.join("\\\\?", "c:\\dir\\file")) will
truncate the prefix)


That's worth reporting as a bug.


If possible, please try the same operations with Python 3.4 or 3.5 before 
making a report.
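One thing worth checking first: the truncation matches the documented os.path.join rule that an absolute component (on Windows, one with both a drive and a root) discards everything before it, so "c:\\dir\\file" replaces the prefix. The sketch uses ntpath, the Windows implementation behind os.path, so it runs on any platform; building the path by plain concatenation keeps the prefix.

```python
import ntpath  # os.path on Windows; importable on any platform

prefix = "\\\\?\\"        # the characters \\?\
target = "c:\\dir\\file"  # has a drive ("c:") and a root ("\")

# join() sees an absolute component and drops the earlier one.
print(ntpath.join("\\\\?", target))  # c:\dir\file

# String concatenation keeps the extended-length prefix intact.
print(prefix + target)               # \\?\c:\dir\file
```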



--
Terry Jan Reedy


--
https://mail.python.org/mailman/listinfo/python-list

Re: windows and file names > 256 bytes

2015-06-25 Thread random832

On Thu, Jun 25, 2015, at 09:35, Michael Torrie wrote:
> The OP mentions that even when he manually supplies extended paths,
> os.mkdir, os.getsize, and shutil.rmtree return errors for him in Python
> 2.7.  So there's more to this problem.

The byte versions of the underlying OS APIs use a 256-character buffer
to do conversion - he needs to also be passing unicode strings.
-- 
https://mail.python.org/mailman/listinfo/python-list

Could you explain "[1, 2, 3].remove(2)" to me?

2015-06-25 Thread fl

Hi,

I saw a code snippet online:

[1, 2, 3].remove(42)

After modifying it to:

[1, 2, 3].remove(2)

and

aa=[1, 2, 3].remove(2)


I don't know where the result goes. Could you help me with this?

Thanks,
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Could you explain "[1, 2, 3].remove(2)" to me?

2015-06-25 Thread Ian Kelly

On Thu, Jun 25, 2015 at 12:14 PM, fl  wrote:
> Hi,
>
> I see a code snippet online:
>
> [1, 2, 3].remove(42)

I don't know where you pulled this from, but if this is from a
tutorial then it doesn't seem to be a very good one.

This constructs a list containing the elements 1, 2, and 3, and
attempts to remove the non-existent element 42, which will raise a
ValueError.

> after I modify it to:
>
> [1, 2, 3].remove(2)

This at least removes an element that is actually in the list, so it
won't throw an error, but the list is then discarded, so nothing was
actually accomplished by it.

> and
>
> aa=[1, 2, 3].remove(2)

This does the same thing, but it sets the variable aa to the result of
the *remove* operation. The remove operation, as it happens, returns
None, so the list is still discarded, and the only thing
accomplished is that aa is now bound to None.

> I don't know where the result goes. Could you help me on the question?

You need to store the list somewhere before you start calling
operations on it. Try this:

aa = [1, 2, 3]
aa.remove(2)

Now you have the list in the variable aa, and the value 2 has been
removed from it.
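The behaviour described above can be checked in a few lines: remove() mutates the list in place and returns None, and removing a missing value raises ValueError.

```python
aa = [1, 2, 3]
result = aa.remove(2)     # removes the first 2, modifying aa in place
print(result)             # None
print(aa)                 # [1, 3]

raised = False
try:
    [1, 2, 3].remove(42)  # 42 is not in the list
except ValueError:
    raised = True
print(raised)             # True
```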
-- 
https://mail.python.org/mailman/listinfo/python-list

enumerate XML tags (keys that will become headers) along with text (values) and write to CSV in one row (as opposed to "stacked" values with one header)

2015-06-25 Thread kbtyo

My question can be found here:


http://stackoverflow.com/questions/31058100/enumerate-column-headers-in-csv-that-belong-to-the-same-tag-key-in-python


Here is an additional sample of the XML that I am working with: 



0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0


0
0
0
0
0
0


0
0
0
0
0
0
0
0


1/1/0001 12:00:00 AM
1/1/0001 12:00:00 AM
1/1/0001 12:00:00 AM
1/1/0001 12:00:00 AM
1/1/0001 12:00:00 AM
1/1/0001 12:00:00 AM
1/1/0001 12:00:00 AM
1/1/0001 12:00:00 AM




False
False
False
False
False
0
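One way to get the single-row layout, sketched under stated assumptions: the tag names below are hypothetical because the archived sample lost its markup, and numbering repeated tag paths (Count1, Count2, and so on) is only one possible naming scheme. Each leaf becomes a header/value pair, csv.DictWriter writes the header line once, and all the values go into a single row.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: the real tag names did not survive in the post above.
SAMPLE = """<Record>
  <Counts><Count>0</Count><Count>0</Count></Counts>
  <Flags><Flag>False</Flag><Flag>True</Flag></Flags>
</Record>"""

def leaves(elem, path=""):
    """Yield (tag_path, text) for every descendant that has no children."""
    for child in elem:
        child_path = f"{path}_{child.tag}" if path else child.tag
        if len(child):
            yield from leaves(child, child_path)
        else:
            yield child_path, (child.text or "").strip()

def to_row(root):
    """Number repeated tag paths so every value gets its own header."""
    counts = Counter()
    row = {}
    for path, text in leaves(root):
        counts[path] += 1
        row[f"{path}{counts[path]}"] = text
    return row

row = to_row(ET.fromstring(SAMPLE))
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(row))
writer.writeheader()  # headers written once...
writer.writerow(row)  # ...followed by all values in a single row
print(buf.getvalue())
# Counts_Count1,Counts_Count2,Flags_Flag1,Flags_Flag2
# 0,0,False,True
```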




-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Randall Smith

Thanks Jon.  I couldn't have answered those questions better myself, and 
I wrote the software in question.


I didn't intend to describe the entire system, but rather just enough of 
it to present the issue at hand.  You seem to understand it quite well.


I'm now using a randomly generated 256 byte translation table, which 
performs very well on the lowly Raspberry Pi ARM chip.  The Raspberry Pi 
is to be my recommended storage node platform.
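A sketch of how such a table could be generated and undone with bytes.translate. Randall's actual code isn't shown, so the use of random.SystemRandom (OS-provided randomness) and the function names are assumptions; a fresh table per stored file follows Jon's description of the recipient inventing a new key for each file. A random permutation of the 256 byte values allows 256! different tables, but this remains obfuscation of the stored bytes, not a substitute for real encryption.

```python
import random

_rng = random.SystemRandom()  # OS randomness rather than the default PRNG

def make_table() -> bytes:
    """Return a random permutation of the byte values 0..255."""
    values = list(range(256))
    _rng.shuffle(values)
    return bytes(values)

def invert(table: bytes) -> bytes:
    """Build the translation table that undoes `table`."""
    inverse = bytearray(256)
    for plain, mangled in enumerate(table):
        inverse[mangled] = plain
    return bytes(inverse)

table = make_table()  # a fresh table per stored file (assumption)
stored = b"chunk from a peer".translate(table)
restored = stored.translate(invert(table))
print(restored)  # b'chunk from a peer'
```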


For those who care, the storage system is something like Amazon S3, 
except storage is distributed peer to peer.  Clients compress, encrypt, 
and chunk data, then send it to storage nodes. Storage nodes propagate 
the data.  Encryption and authentication are handled through TLS.  Files 
use AES encryption for storage.  Storage nodes are monitored for 
availability, integrity, and performance.  Data transfers are 
coordinated by a centralized service which tracks storage and transfers. 
Redundancy is configurable by chunk. Storage nodes are compensated for 
storage x time.  Uploads and downloads can utilize several storage nodes 
simultaneously to increase throughput.


-Randall

On 06/25/2015 10:26 AM, Jon Ribbens wrote:

On 2015-06-25, Steven D'Aprano  wrote:

On Thu, 25 Jun 2015 08:03 pm, Jon Ribbens wrote:

That won't stop virus scanners etc potentially making their own minds
up about the file.


*shrug* Sure, but I was specifically referring to the risk of the malware
being executed, not being detected by a virus scanner.

Encrypting the file won't even necessarily stop the virus scanner from
finding false positives. It might even increase the chances.


That seems spectacularly unlikely.


But it will prevent the virus scanner from finding actual viruses.
You may or may not consider that a problem.


The OP would consider it a benefit.


I didn't say it ought to be encrypted by the receiver. Obviously the
encryption needs to be done in a way that the recipient doesn't get access
to the key.


No, you're still misunderstanding. The encryption needs to be done in
a way that the *sender* doesn't get access to the key. The recipient
has access to it by definition because the recipient chooses it.


The obvious way to do that is for the application to encrypt the
data before it sends it.


Yes, he already said the application does that. The problem is,
what if the sender is not the genuine application but is instead
a malicious attacker?


Then the receiver just writes the encrypted bytes directly to a file.


That's precisely what he's trying to avoid.


That would have the benefit of protecting against man-in-the-middle
attacks as well, since the file is never transmitted in the clear.


With what he's talking about, the file after encryption is never
transmitted *at all*.


I've been arguing that the application *should* encrypt the file, and not
mess about giving the illusion of security.


You haven't understood the threat model.


But seriously, they have the application. If the application is using a
symmetric substitution cipher, it needs the key (because there is only
one), so the receiver will have the cipher.


There is not only one key. The recipient would invent a new key for
each file after the file is received.


With the sort of substitution cipher the OP is experimenting with, forcing a
particular result is trivially easy. The sender has access to the
application, knows the cipher, knows the key, and can easily generate a
file which will generate whatever content the sender wants after being
obfuscated.


No, because the sender does not know the key.


Replace "ROT-13" with "ROT-n" where 'n' is a secret known only to the
receiver, and suddenly it's not such a bad method of obfuscation.


There are only 256 possible values for n, one of which doesn't transform the
data at all (ROT-0). If you're thinking of attacking this by pencil and
paper, 255 transformations sounds like a lot. For a computer, that's barely
harder than a single transformation.
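To make the point about ROT-n concrete, here is a minimal Python 3 sketch. rot_n and crack_rot_n are illustrative names, not anything from the thread. The attack assumes the attacker holds the ciphertext and knows how the plaintext begins (a file header such as %PDF). Under those assumptions, trying all 256 shifts is a trivial loop.

```python
def rot_n(data, n):
    """Add n (mod 256) to every byte: a byte-level ROT-n."""
    return bytes((b + n) % 256 for b in data)

def crack_rot_n(ciphertext, known_prefix):
    """Try every shift; return the first n whose inverse reveals known_prefix."""
    for n in range(256):
        if rot_n(ciphertext, -n).startswith(known_prefix):
            return n
    return None

# Hypothetical example: a PDF-like plaintext shifted by a "secret" n of 77.
secret = rot_n(b"%PDF-1.4 example", 77)
print(crack_rot_n(secret, b"%PDF"))  # 77
```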


Well, it means you need to send 256 times as much data, which is a
start. If you're instead using a 256-byte translation table then
an attack becomes utterly impractical.


Improve it to the random-translation-map method he's actually using
and you've got really quite a reasonable system.


No, truly you haven't. The OP is experimenting with bytearray.translate,
which likely makes it a monoalphabetic substitution cipher, and the
techniques for cracking those go back to the 9th century AD.
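For readers following along, here is what a random translation map of the kind described above might look like in Python 3. This is a sketch and not the OP's code: new_key and mangle are hypothetical names, and the choice of random.SystemRandom to generate a fresh table per chunk is an assumption. It is exactly a monoalphabetic substitution, because each byte value always maps to the same output byte.

```python
import random

# OS-backed randomness; an assumed choice for generating per-chunk keys.
_rng = random.SystemRandom()

def new_key():
    """Return (forward, inverse) 256-byte tables for a random byte permutation."""
    perm = list(range(256))
    _rng.shuffle(perm)
    forward = bytes(perm)            # forward[b] is what byte b becomes
    inverse = bytearray(256)
    for plain, mangled in enumerate(perm):
        inverse[mangled] = plain     # record how to undo each substitution
    return forward, bytes(inverse)

def mangle(data, table):
    """Substitute every byte of data through a 256-byte table."""
    return data.translate(table)

chunk = b"not encrypted by the sender"
forward, inverse = new_key()
stored = mangle(chunk, forward)
print(mangle(stored, inverse) == chunk)  # True
```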


Only if you have the ciphertext, which the attacker in this scenario
does not. The attacker gets to set the plaintext, knows the algorithm,
does not know the key (unless the method of choosing the key has a
flaw), and wants to set the ciphertext to some specific string.
Frequency analysis doesn't even begin to apply to this scenario.


You're relying on security by obscurity


No, he really isn't.


The use case is pretty obvious (a peer-to-peer dropbox type thing) but
it does appear to be being misunderstood. This i

Re: Could you explain "[1, 2, 3].remove(2)" to me?

2015-06-25 Thread Jussi Piitulainen

fl writes:

> aa=[1, 2, 3].remove(2)
>
> I don't know where the result goes. Could you help me on the question?

That method modifies the list and returns None (or raises an exception).

Get a hold on the list first:

aa=[1, 2, 3]

*Then* call the method. Just call the method, do not try to store the
value (which will be None) anywhere:

aa.remove(2)

*Now* you can see that the list has changed. Try it.

By the way, it's no use to try [1, 2, 3].remove(2). That will only
modify and throw away a different list that just happens to have the
same contents initially. Try these two:

aa=[1, 2, 3]
bb=[1, 2, 3]  # a different list!

aa=[1, 2, 3]
bb=aa # the same list!

In both cases, try removing 2 from aa and then watch what happens to aa
and what happens to bb.
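Jussi's experiment written out as a Python 3 script, so that both cases can be run at once. The function names are only for illustration.

```python
def separate_lists():
    aa = [1, 2, 3]
    bb = [1, 2, 3]   # a different list that happens to be equal
    aa.remove(2)
    return aa, bb

def shared_list():
    aa = [1, 2, 3]
    bb = aa          # a second name for the same list object
    aa.remove(2)
    return aa, bb

print([1, 2, 3].remove(2))  # None: remove() mutates in place
print(separate_lists())     # ([1, 3], [1, 2, 3])
print(shared_list())        # ([1, 3], [1, 3])
```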
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Randall Smith


On 06/24/2015 11:27 PM, Devin Jeanpierre wrote:

On Wed, Jun 24, 2015 at 9:07 PM, Steven D'Aprano  wrote:

But just sticking to the three above, the first one is partially mitigated
by allowing virus scanners to scan the data, but that implies that the
owner of the storage machine can spy on the files. So you have a conflict
here.


If it's encrypted malware, and you can't decrypt it, there's no threat.


Honestly, the *only* real defence against the spying issue is to encrypt the
files. Not obfuscate them with a lousy random substitution cipher. The
storage machine can keep the files as long as they like, just by making a
copy, and spend hours bruteforcing them. They *will* crack the substitution
cipher. In pure Python, that may take a few days or weeks; in C, hours or
days. If they have the resources to throw at it, minutes. Substitution
ciphers have not been effective encryption since, oh, the 1950s, unless you
use a one-time pad. Which you won't be.


The original post said that the sender will usually send files they
encrypted, unless they are malicious. So if the sender wants them to
be encrypted, they already are.

"While the data senders are supposed to encrypt data, that's not
guaranteed, and I'd like to protect the recipient against exposure to
nefarious data by mangling or encrypting the data before it is written
to disk."

The cipher is just to keep the sender from being able to control what
is on disk.

I am usually very oppositional when it comes to rolling your own
crypto, but am I alone here in thinking the OP very clearly laid out
their case?

-- Devin



Thanks Devin.  You understand the issue perfectly despite my limited 
description of the system.  I've fully implemented and performance 
tested your suggested solution and am quite happy with it.


Though the issue is solved, I would be glad to listen to any remaining 
criticisms, suggestions or questions.


--Randall
--
https://mail.python.org/mailman/listinfo/python-list

Reference Counting Irregularity

2015-06-25 Thread Eric Edmond

Hi,

I have been writing a C++ extension for Python recently, and am currently 
fixing the reference counting throughout the extension. As I am very new to 
this topic, my question may have a simple answer, but I was unable to find any 
mention of the behavior online.

When using the PyObject_GetItem(obj, key) function, I noticed inconsistent 
behavior with various types of obj (for reference, key is always a str object). 
When obj is a standard dictionary, the key's reference count remains unchanged, 
as expected. However, when obj is a pandas DataFrame object, the key's 
reference count increases by 2. This could very well be by design of the 
DataFrame object doing some internal caching of the string, but does not appear 
in the documentation, so I thought I would bring up the issue.

Thanks,

Eric Edmond
University of Michigan | Class of 2016

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Randall Smith


On 06/24/2015 08:33 PM, Dennis Lee Bieber wrote:

On Wed, 24 Jun 2015 13:20:07 -0500, Randall Smith 
declaimed the following:


On 06/24/2015 06:36 AM, Steven D'Aprano wrote:

I don't understand how mangling the data is supposed to protect the
recipient. Don't they have the ability to unmangle the data, and thus expose
themselves to whatever nasties are in the files?


They never look at the data and wouldn't care to unmangle it.  The
purpose is primarily to prevent automated software (file indexers, virus
scanners) from doing bad things to the data.



Which leads to the question: what is "doing bad things".


Storage nodes are computers running the software under discussion, which
store chunks of data they are sent (as recipients) and send them back upon
request. Their job (as related to this software) is to accept, store and send
chunks of data upon request. So losing data is a bad thing.


The storage node software is cross-platform and should run on anything 
from a dedicated Raspberry Pi to an old Windows PC.  Data integrity is 
ensured using encryption and hashes generated by the original data 
owners.  Normally, a data chunk would look like random bytes, because it 
is encrypted.  However, the storage node cannot prevent the client 
(uploader) from sending unencrypted data.  The purpose of this 
obfuscation is to protect the storage node, as many potential users have 
expressed hesitation in storing other people's data.


Example: A storage node runs a Desktop OS with an image indexer. It 
receives an unencrypted nasty image or movie. The indexer picks it up 
and shows it in the person's image or movie "Library".


Does that clear things up?


-Randall
--
https://mail.python.org/mailman/listinfo/python-list

Could you give me the detail process of 'make_incrementor(22)(33)'?

2015-06-25 Thread fl

Hi,

I read a tutorial on lambda online. I don't think I understand the last
line of its example code. It gives two arguments (22, 33). Is 22 for n,
and 33 for x? Or does it create two functions first, one of which gets 22
while the other gets 33?


Please help me on this interesting problem. Thanks,






>>> def make_incrementor (n): return lambda x: x + n
>>> 
>>> f = make_incrementor(2)
>>> g = make_incrementor(6)
>>> 
>>> print f(42), g(42)
44 48
>>> 
>>> print make_incrementor(22)(33)
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Reference Counting Irregularity

2015-06-25 Thread Ian Kelly

On Thu, Jun 25, 2015 at 12:49 PM, Eric Edmond  wrote:
> Hi,
>
> I have been writing a C++ extension for Python recently, and am currently
> fixing the reference counting throughout the extension. As I am very new to
> this topic, my question may have a simple answer, but I was unable to find
> any mention of the behavior online.
>
> When using the PyObject_GetItem(obj, key) function, I noticed inconsistent
> behavior with various types of obj (for reference, key is always a str
> object). When obj is a standard dictionary, the key’s reference count
> remains unchanged, as expected. However, when obj is a pandas DataFrame
> object, the key’s reference count increases by 2. This could very well be by
> design of the DataFrame object doing some internal caching of the string,
> but does not appear in the documentation, so I thought I would bring up the
> issue.

What is your question? If you want to know why 2 is added to the
reference count for pandas objects, the first place to check would be
with the pandas developers, as PyObject_GetItem is just going to call
the implementation for that type. You can find the CPython code here:

https://hg.python.org/cpython/file/9aad116baee8/Objects/abstract.c#l136
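Ian's point that PyObject_GetItem just dispatches to the type's own implementation can be illustrated from pure Python. The example is CPython-only because it relies on sys.getrefcount. CachingMapping is a hypothetical class and is not how pandas actually works. It only shows that a __getitem__ which stores its key raises the key's reference count, while a plain dict lookup leaves it unchanged. A plain object is used as the key because some strings are interned or immortal in newer CPython versions, which would hide the effect.

```python
import sys

class Key:
    """A plain hashable object used as a key (identity-based hash)."""

class CachingMapping:
    """Hypothetical container whose __getitem__ keeps every key it sees."""
    def __init__(self, data):
        self._data = dict(data)
        self._seen = []              # extra references to looked-up keys

    def __getitem__(self, key):
        self._seen.append(key)
        return self._data[key]

def refcount_delta(container, key):
    """Change in sys.getrefcount(key) caused by one container[key] lookup."""
    before = sys.getrefcount(key)
    container[key]
    after = sys.getrefcount(key)
    return after - before

k = Key()
print(refcount_delta({k: "value"}, k))                  # 0
print(refcount_delta(CachingMapping({k: "value"}), k))  # 1
```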
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Could you give me the detail process of 'make_incrementor(22)(33)'?

2015-06-25 Thread Ian Kelly

On Thu, Jun 25, 2015 at 2:53 PM, fl  wrote:
> Hi,
>
> I read a tutorial on lambda online. I don't think I understand the last
> line of its example code. It gives two arguments (22, 33). Is 22 for n,
> and 33 for x? Or does it create two functions first, one of which gets 22
> while the other gets 33?
>
>
> >>> def make_incrementor (n): return lambda x: x + n
> >>>
> >>> f = make_incrementor(2)
> >>> g = make_incrementor(6)
> >>>
> >>> print f(42), g(42)
> 44 48
> >>>
> >>> print make_incrementor(22)(33)

make_incrementor is a function that takes an argument n. It returns a
function that takes an argument x. So when you do make_incrementor(22)
that passes 22 to make_incrementor as the value of n, and when you do
make_incrementor(22)(33), that also passes 22 to make_incrementor as
the value of n, and then it passes 33 to the returned function as the
value of x.
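Ian's explanation, traced step by step in Python 3, where print is a function, unlike in the Python 2 transcript in the question. The name add_22 is only for illustration.

```python
def make_incrementor(n):
    # Returns a new function that remembers n (a closure).
    return lambda x: x + n

add_22 = make_incrementor(22)     # step 1: n = 22, a function comes back
print(add_22(33))                 # step 2: x = 33, prints 55
print(make_incrementor(22)(33))   # both steps in one expression, prints 55
```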
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: converting boolean filter function to lambda

2015-06-25 Thread Chris Angelico

On Fri, Jun 26, 2015 at 1:59 AM, Ethan Furman  wrote:
> My attempt at a lambda function fails:
>
> filter(lambda p: (p in c for c in contacts), main)
> # ['291.792.9001', '291.792.9000']
>
> Besides using a lambda ;) , what have I done wrong?

This looks like a job for a list comprehension!

(Cue the swooping-in cape-wearing list comp, coming to save the day)

ChrisA
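The bug in Ethan's lambda is that (p in c for c in contacts) is a generator expression, and a generator object is always true, so filter() keeps every element. Here is a Python 3 sketch with made-up contacts data. It shows the difference once any() consumes the generator, alongside the list comprehension Chris is hinting at. In Python 3, filter() returns an iterator, hence the list() calls.

```python
contacts = ["Alice 291.792.9001", "Bob 555.123.4567"]  # hypothetical data
main = ["291.792.9001", "291.792.9000"]

# Buggy: the lambda returns a generator object, which is always truthy.
buggy = list(filter(lambda p: (p in c for c in contacts), main))

# Fixed: any() consumes the generator and returns True or False.
fixed = list(filter(lambda p: any(p in c for c in contacts), main))

# The list comprehension version.
comp = [p for p in main if any(p in c for c in contacts)]

print(buggy)  # ['291.792.9001', '291.792.9000']
print(fixed)  # ['291.792.9001']
print(comp)   # ['291.792.9001']
```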
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: converting boolean filter function to lambda

2015-06-25 Thread Mark Lawrence


On 26/06/2015 00:59, Chris Angelico wrote:

On Fri, Jun 26, 2015 at 1:59 AM, Ethan Furman  wrote:

My attempt at a lambda function fails:

filter(lambda p: (p in c for c in contacts), main)
# ['291.792.9001', '291.792.9000']

Besides using a lambda ;) , what have I done wrong?


This looks like a job for a list comprehension!

(Cue the swooping-in cape-wearing list comp, coming to save the day)

ChrisA



In exactly the same way that swooping-in cape-wearing Greg Ewing, from 
the home of 10 man rugby, had to point out to Ethan some years back what 
"booted into touch" meant? :)


I've also seen Ethan refer to bowlers today, I just hope he's very 
careful and doesn't get caught on the back foot.  Next question from 
Ethan is?


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list

Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Chris Angelico

On Fri, Jun 26, 2015 at 1:26 AM, Jon Ribbens
 wrote:
>> There are only 256 possible values for n, one of which doesn't transform the
>> data at all (ROT-0). If you're thinking of attacking this by pencil and
>> paper, 255 transformations sounds like a lot. For a computer, that's barely
>> harder than a single transformation.
>
> Well, it means you need to send 256 times as much data, which is a
> start. If you're instead using a 256-byte translation table then
> an attack becomes utterly impractical.
>

Utterly impractical? Maybe, if you attempt a pure brute-force approach
- there are 256! possible translation tables, which is roughly e500
attempts [1], and at roughly four a microsecond [2] that'd still take
a ridiculously long time. But there are two gigantic optimizations you
could do. Firstly, there are frequency-based attacks, and byte value
duplicates will tell you a lot - classic cryptographic work. And
secondly, you can simply take the first few bytes of a file - let's
say 16, although a lot of files can be recognized in less than that.
Even if there are no duplicate bytes, that'd be a maximum of 16!
translation tables that truly matter, or just 2e13. At the same speed,
that makes about a million seconds of computing time required. Divide
that across a bunch of separate computers (the job is embarrassingly
parallel after all), and you could get that result pretty easily. Cut
the prefix to just 8 bytes and you have a mere 40K encryption keys to
try - so quick that you wouldn't even see it happen. Nope, a simple
substitution cipher is still not secure. Even the famous Enigma
machine was a lot more than just letter-for-letter substitution - a
double letter in the cleartext wouldn't be represented by a double
letter in the result - and once the machine's secrets were figured
out, the day's key could be reassembled fairly readily.

ChrisA

[1] It's actually closer to 8.6e506, if you care.
[2] timeit result from my laptop - you could do better, but that's a
reasonable average
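The figures in the post can be checked quickly in Python 3. The rate of four trials per microsecond is the figure quoted above, not a new measurement. By this arithmetic, 16! keys at that rate come to roughly 5.2 million seconds (about two months on one machine) rather than one million, and 8! takes about 10 milliseconds. The point about splitting the search across machines still stands.

```python
import math

TRIALS_PER_SECOND = 4 * 10**6  # "roughly four a microsecond", as stated in the post

def log10_tables(n):
    """log10 of n!, the number of orderings of n distinct byte values."""
    return math.log10(math.factorial(n))  # math.log10 accepts huge ints

def search_seconds(n):
    """Seconds needed to try all n! orderings at TRIALS_PER_SECOND."""
    return math.factorial(n) / TRIALS_PER_SECOND

print(round(log10_tables(256), 1))           # 506.9, i.e. 256! is about 8.6e506
print(math.factorial(16), search_seconds(16))
print(math.factorial(8), search_seconds(8))
```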
-- 
https://mail.python.org/mailman/listinfo/python-list

mutual coaching

2015-06-25 Thread adhamyos

Hello, does anyone want to study Python? We can learn together! PM me; my name is 
adham128 and I am in the #programming room.
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: converting boolean filter function to lambda

2015-06-25 Thread Ethan Furman


On 06/25/2015 05:09 PM, Mark Lawrence wrote:

On 26/06/2015 00:59, Chris Angelico wrote:

On Fri, Jun 26, 2015 at 1:59 AM, Ethan Furman  wrote:

My attempt at a lambda function fails:

filter(lambda p: (p in c for c in contacts), main)
# ['291.792.9001', '291.792.9000']

Besides using a lambda ;) , what have I done wrong?


This looks like a job for a list comprehension!

(Cue the swooping-in cape-wearing list comp, coming to save the day)

ChrisA



In exactly the same way that swooping-in cape-wearing Greg Ewing, from the home of 10 man 
rugby, had to point out to Ethan some years back what "booted into touch" 
meant? :)

I've also seen Ethan refer to bowlers today, I just hope he's very careful and 
doesn't get caught on the back foot.  Next question from Ethan is?


Uh, "caught on the back foot"?  I thought I was okay as long as I didn't slide 
across the line or gutter the ball.  ;)

--
~Ethan~
--
https://mail.python.org/mailman/listinfo/python-list

Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Ian Kelly

On Thu, Jun 25, 2015 at 6:33 PM, Chris Angelico  wrote:
> On Fri, Jun 26, 2015 at 1:26 AM, Jon Ribbens
>  wrote:
>>> There are only 256 possible values for n, one of which doesn't transform the
>>> data at all (ROT-0). If you're thinking of attacking this by pencil and
>>> paper, 255 transformations sounds like a lot. For a computer, that's barely
>>> harder than a single transformation.
>>
>> Well, it means you need to send 256 times as much data, which is a
>> start. If you're instead using a 256-byte translation table then
>> an attack becomes utterly impractical.
>>
>
> Utterly impractical? Maybe, if you attempt a pure brute-force approach
> - there are 256! possible translation tables, which is roughly e500
> attempts [1], and at roughly four a microsecond [2] that'd still take
> a ridiculously long time. But there are two gigantic optimizations you
> could do. Firstly, there are frequency-based attacks, and byte value
> duplicates will tell you a lot - classic cryptographic work. And
> secondly, you can simply take the first few bytes of a file - let's
> say 16, although a lot of files can be recognized in less than that.
> Even if there are no duplicate bytes, that'd be a maximum of 16!
> translation tables that truly matter, or just 2e13. At the same speed,
> that makes about a million seconds of computing time required. Divide
> that across a bunch of separate computers (the job is embarrassingly
> parallel after all), and you could get that result pretty easily. Cut
> the prefix to just 8 bytes and you have a mere 40K encryption keys to
> try - so quick that you wouldn't even see it happen. Nope, a simple
> substitution cipher is still not secure. Even the famous Enigma
> machine was a lot more than just letter-for-letter substitution - a
> double letter in the cleartext wouldn't be represented by a double
> letter in the result - and once the machine's secrets were figured
> out, the day's key could be reassembled fairly readily.

You're making the same mistake that Steven did in misunderstanding the
threat model. The goal isn't to prevent the attacker from working out
the key for a file that has already been obfuscated. Any real data
that might be exposed by a vulnerability in the server is presumed to
have already been strongly encrypted by the user.

The goal is to prevent the attacker from guessing a key that hasn't
even been generated yet, which could be exploited to engineer the
obfuscated content into something malicious. There are no
frequency-based attacks possible here, because you can't do frequency
analysis on the result of a key that hasn't even been generated yet.
Assuming that you have no attack on the key generation itself, the
best you can do is send a file deobfuscated with a random key and hope
that the recipient randomly chooses the same key; the odds of that
happening are 1 in 256!.

That said, I do see a potential weakness here: if the attacker can
create a malicious payload using only a subset of the 256 possible
byte values, then the odds of getting a correct key are increased,
since multiple keys will work. For an extreme example, if the attacker
can manage to craft a malicious payload that uses only the two byte
values 32 and 47, then the probability of getting a key that will
obfuscate to that is increased to 1 in 256! / 254!, or 1 in 65280. If
they distribute 65280 copies of that payload to various recipients,
then they can expect that one recipient on average will get the
payload in its malicious form.
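Ian's two-byte example generalises. Suppose a malicious payload uses k distinct byte values. A fresh random permutation then maps the attacker's chosen k input bytes onto the required k output bytes with probability 1 in 256!/(256-k)!. Here is a small Python 3.8+ sketch of that count; math.perm computes exactly this falling factorial, and the function name is illustrative.

```python
import math

def odds_against(distinct_values):
    """Random keys per expected 'hit' for a payload using this many distinct
    byte values: 256! / (256 - k)! = 256 * 255 * ... * (256 - k + 1)."""
    return math.perm(256, distinct_values)

print(odds_against(1))  # 256
print(odds_against(2))  # 65280, Ian's two-byte example
```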
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: converting boolean filter function to lambda

2015-06-25 Thread Mark Lawrence


On 26/06/2015 01:57, Ethan Furman wrote:

On 06/25/2015 05:09 PM, Mark Lawrence wrote:

On 26/06/2015 00:59, Chris Angelico wrote:

On Fri, Jun 26, 2015 at 1:59 AM, Ethan Furman 
wrote:

My attempt at a lambda function fails:

filter(lambda p: (p in c for c in contacts), main)
# ['291.792.9001', '291.792.9000']

Besides using a lambda ;) , what have I done wrong?


This looks like a job for a list comprehension!

(Cue the swooping-in cape-wearing list comp, coming to save the day)

ChrisA



In exactly the same way that swooping-in cape-wearing Greg Ewing, from
the home of 10 man rugby, had to point out to Ethan some years back
what "booted into touch" meant? :)

I've also seen Ethan refer to bowlers today, I just hope he's very
careful and doesn't get caught on the back foot.  Next question from
Ethan is?


Uh, "caught on the back foot"?  I thought I was okay as long as I didn't
slide across the line or gutter the ball.  ;)

--
~Ethan~


I take it you're referring to the cissies game where everybody has to 
wear a glove just to catch the ball?  And there's no seam to cut your 
hands if you get it wrong, so I guess they don't bother rubbing white 
spirit into the hands to help toughen them up? :)


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list

Can anybody explain the '_' in a 2-D creation code?

2015-06-25 Thread fl

Hi,

I read Ned's tutorial on Python. It is very interesting. On its last
example, I cannot understand the '_' in:



board=[[0]*8 for _ in range(8)]


I know '_' is the previous answer, but it is still unclear what it is
in the above line. Can you explain it to me?


Thanks,
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Mark Lawrence


On 26/06/2015 01:33, Chris Angelico wrote:

On Fri, Jun 26, 2015 at 1:26 AM, Jon Ribbens
 wrote:

There are only 256 possible values for n, one of which doesn't transform the
data at all (ROT-0). If you're thinking of attacking this by pencil and
paper, 255 transformations sounds like a lot. For a computer, that's barely
harder than a single transformation.


Well, it means you need to send 256 times as much data, which is a
start. If you're instead using a 256-byte translation table then
an attack becomes utterly impractical.



Utterly impractical? Maybe, if you attempt a pure brute-force approach
- there are 256! possible translation tables, which is roughly e500
attempts [1], and at roughly four a microsecond [2] that'd still take
a ridiculously long time. But there are two gigantic optimizations you
could do. Firstly, there are frequency-based attacks, and byte value
duplicates will tell you a lot - classic cryptographic work. And
secondly, you can simply take the first few bytes of a file - let's
say 16, although a lot of files can be recognized in less than that.
Even if there are no duplicate bytes, that'd be a maximum of 16!
translation tables that truly matter, or just 2e13. At the same speed,
that makes about a million seconds of computing time required. Divide
that across a bunch of separate computers (the job is embarrassingly
parallel after all), and you could get that result pretty easily. Cut
the prefix to just 8 bytes and you have a mere 40K encryption keys to
try - so quick that you wouldn't even see it happen. Nope, a simple
substitution cipher is still not secure. Even the famous Enigma
machine was a lot more than just letter-for-letter substitution - a
double letter in the cleartext wouldn't be represented by a double
letter in the result - and once the machine's secrets were figured
out, the day's key could be reassembled fairly readily.



The day's key for a given network, with the Luftwaffe easily being the 
worst offenders.  Some networks remained unbroken at the end of WWII.


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list

Re: Can anybody explain the '_' in a 2-D creation code?

2015-06-25 Thread Mark Lawrence


On 26/06/2015 02:07, fl wrote:

Hi,

I read Ned's tutorial on Python. It is very interesting. On its last
example, I cannot understand the '_' in:



board=[[0]*8 for _ in range(8)]


I know '_' is the previous answer, but it is still unclear what it is
in the above line. Can you explain it to me?


Thanks,



Lots of people could carry on explaining things to you, but you don't 
appear to be making any attempt to do some research before posing your 
questions, so how about using a search engine?


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list

Re: Can anybody explain the '_' in a 2-D creation code?

2015-06-25 Thread André Roberge

On Thursday, 25 June 2015 22:07:42 UTC-3, fl  wrote:
> Hi,
> 
> I read Ned's tutorial on Python. It is very interesting. On its last
> example, I cannot understand the '_' in:
> 
> 
> 
> board=[[0]*8 for _ in range(8)]
> 
> 
> I know '_' is the previous answer, but it is still unclear what it is
> in the above line. Can you explain it to me?

'_' is the previous answer ONLY when using the read-eval-print-loop interpreter.

Here, it is the "name" of a variable; since we don't care about the particular 
name (it is used just for looping a fixed number of times), the common practice 
of using '_' has been used.  As you will have noted (since it confused you), 
'_' doesn't seem to designate anything of interest - unlike a variable name 
like 'string_index' or 'character', etc.

Sometimes, people will use the name "dummy" instead of '_', with the same idea 
in mind.
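Here is a short Python 3 illustration of why the tutorial builds the board with a comprehension, where '_' is just a throwaway loop variable. The alternative, [[0]*8]*8, repeats one row object eight times, so changing one row changes them all.

```python
# '_' is an ordinary name; by convention it marks a loop value we never use.
board = [[0] * 8 for _ in range(8)]  # eight separate row lists
board[0][0] = 1
print([row[0] for row in board])     # [1, 0, 0, 0, 0, 0, 0, 0]

shared = [[0] * 8] * 8               # eight references to ONE row list
shared[0][0] = 1
print([row[0] for row in shared])    # [1, 1, 1, 1, 1, 1, 1, 1]
```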
> 
> 
> Thanks,

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Can anybody explain the '_' in a 2-D creation code?

2015-06-25 Thread fl

On Thursday, June 25, 2015 at 6:24:07 PM UTC-7, Mark Lawrence wrote:
> On 26/06/2015 02:07, fl wrote:
> > Hi,
> >
> > I read Ned's tutorial on Python. It is very interesting. On its last
> > example, I cannot understand the '_' in:
> >
> >
> >
> > board=[[0]*8 for _ in range(8)]
> >
> >
> > I know '_' is the previous answer, but it is still unclear what it is
> > in the above line. Can you explain it to me?
> >
> >
> > Thanks,
> >
> 
> Lots of people could carry on explaining things to you, but you don't 
> appear to be making any attempt to do some research before posing your 
> questions, so how about using a search engine?
> 
> -- 
> My fellow Pythonistas, ask not what our language can do for you, ask
> what you can do for our language.
> 
> Mark Lawrence

Excuse me. On the one hand, I am busy cramming this Python material quickly
for a job. On the other hand, searching seems to me to need a little skill to
find what I am looking for. I would really appreciate it if someone could give
an example of what phrase to use in the search. I am not a lazy guy.
Thanks for all the responses.
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Chris Angelico

On Fri, Jun 26, 2015 at 11:17 AM, Mark Lawrence  wrote:
>> Even the famous Enigma
>> machine was a lot more than just letter-for-letter substitution - a
>> double letter in the cleartext wouldn't be represented by a double
>> letter in the result - and once the machine's secrets were figured
>> out, the day's key could be reassembled fairly readily.
>>
>
> The day's key for a given network, with the Luftwaffe easily being the worst
> offenders.  Some networks remained unbroken at the end of WWII.

I was massively oversimplifying, here. But there's a reason that
modern crypto doesn't use str.translate() level ciphers.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Chris Angelico

On Fri, Jun 26, 2015 at 11:01 AM, Ian Kelly  wrote:
> On Thu, Jun 25, 2015 at 6:33 PM, Chris Angelico  wrote:
>> On Fri, Jun 26, 2015 at 1:26 AM, Jon Ribbens
>>  wrote:
>>> Well, it means you need to send 256 times as much data, which is a
>>> start. If you're instead using a 256-byte translation table then
>>> an attack becomes utterly impractical.
>>>
>>
>> Utterly impractical? 
>
> You're making the same mistake that Steven did in misunderstanding the
> threat model.

To be honest, I wasn't actually answering anything about the original
threat model, but only responding to the statement that a 256-byte
"anything-to-anything" cipher is somehow incredibly secure. It isn't,
but that might not be a problem for the original purpose.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Mark Lawrence


On 26/06/2015 03:06, Chris Angelico wrote:

On Fri, Jun 26, 2015 at 11:17 AM, Mark Lawrence  wrote:

Even the famous Enigma
machine was a lot more than just letter-for-letter substitution - a
double letter in the cleartext wouldn't be represented by a double
letter in the result - and once the machine's secrets were figured
out, the day's key could be reassembled fairly readily.



The day's key for a given network, with the Luftwaffe easily being the worst
offenders.  Some networks remained unbroken at the end of WWII.


I was massively oversimplifying, here. But there's a reason that
modern crypto doesn't use str.translate() level ciphers.

ChrisA



I should know.  Ever heard of DISCON?  Like to hazard a guess as to who 
worked on it all those years ago?


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list

Re: Pure Python Data Mangling or Encrypting

2015-06-25 Thread Chris Angelico

On Fri, Jun 26, 2015 at 12:24 PM, Mark Lawrence  wrote:
> On 26/06/2015 03:06, Chris Angelico wrote:
>>
>> On Fri, Jun 26, 2015 at 11:17 AM, Mark Lawrence 
>> wrote:

>>>> Even the famous Enigma
>>>> machine was a lot more than just letter-for-letter substitution - a
>>>> double letter in the cleartext wouldn't be represented by a double
>>>> letter in the result - and once the machine's secrets were figured
>>>> out, the day's key could be reassembled fairly readily.

>>>
>>> The day's key for a given network, with the Luftwaffe easily being the
>>> worst
>>> offenders.  Some networks remained unbroken at the end of WWII.
>>
>>
>> I was massively oversimplifying, here. But there's a reason that
>> modern crypto doesn't use str.translate() level ciphers.
>>
>> ChrisA
>>
>
> I should know.  Ever heard of DISCON?  Like to hazard a guess as to who
> worked on it all those years ago?

No, not familiar with it. But I'm guessing you have the crypto
background to know all this stuff, which means you aren't the sort of
person I need to explain things to. Great! :)

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Can anybody explain the '_' in a 2-D creation code?

2015-06-25 Thread Mark Lawrence


On 26/06/2015 02:40, fl wrote:

On Thursday, June 25, 2015 at 6:24:07 PM UTC-7, Mark Lawrence wrote:

On 26/06/2015 02:07, fl wrote:

Hi,

I read Ned's tutorial on Python. It is very interesting. On its last
example, I cannot understand the '_' in:



board=[[0]*8 for _ in range(8)]


I know '_' is the previous answer, but it is still unclear what it is
in the above line. Can you explain it to me?


Thanks,



Lots of people could carry on explaining things to you, but you don't
appear to be making any attempt to do some research before posing your
questions, so how about using a search engine?

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence


Excuse me. On the one hand, I am busy cramming this Python material quickly
for a job. On the other hand, searching seems to me to need a little skill to
find what I am looking for. I would really appreciate it if someone could give
an example of what phrase to use in the search. I am not a lazy guy.
Thanks for all the responses.



http://stackoverflow.com/questions/5893163/what-is-the-purpose-of-the-single-underscore-variable-in-python

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list

[ANN] PySWITCH 0.2

2015-06-25 Thread Godson Gera

============
pyswitch 0.2
============

PySWITCH 0.2 is released

Please note that PySWITCH 0.2 is not available on PyPI because of a name
conflict.

Major changes in 0.2

* Many new FreeSWITCH API and dialplan commands added

Full list of changes can be found at
http://pyswitch.sourceforge.net/pages/changelog.html

About PySWITCH
==============

PySWITCH is a Python library to communicate with the
FreeSWITCH server via EventSocketLayer (ESL). It's
based on the Twisted library. Unlike the default Python ESL library that comes
with FreeSWITCH, this library is designed to handle a high volume of
concurrent calls with an easy-to-use API. PySWITCH currently supports an
extensive set of FreeSWITCH API and dialplan commands. You need to be
familiar with the Twisted library in order to use it.

Project website: http://pyswitch.sf.net
Download Page: http://pyswitch.sourceforge.net/pages/download.html

-- 
Godson Gera
Python Consultant 
-- 
https://mail.python.org/mailman/listinfo/python-list

Turning string into object (name)

2015-06-25 Thread liam . oshea

Hi all,
I have something like this:

def dbconn():
    #Establishes connection to local db
    try:
        conn = client()
        db = conn.db_solar #dbname
        collection = db.main # dbcollection / Table
        print "Connected to database successfully!!!"
        return(db,collection)
    except errors.ConnectionFailure, e:
        print "Could not connect to MongoDB: %s" % e
        sys.exit()

Now I want to remove the hardcoded 'db_solar' and 'main' (from db.main) and 
load these values from a config file.

Once I have loaded in a string 'db_solar' from a config file how do I use it 
such that something like db=conn.db_solar will be constructed and run as 
expected.
Thanks
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Turning string into object (name)

2015-06-25 Thread Chris Angelico

On Fri, Jun 26, 2015 at 12:51 PM,   wrote:
> def dbconn():
>     #Establishes connection to local db
>     try:
>         conn = client()
>         db = conn.db_solar #dbname
>         collection = db.main # dbcollection / Table
>         print "Connected to database successfully!!!"
>         return(db,collection)
>     except errors.ConnectionFailure, e:
>         print "Could not connect to MongoDB: %s" % e
>         sys.exit()
>
> Now I want to remove the hardcoded 'db_solar' and 'main' (from db.main) and 
> load these values from a config file.
>
> Once I have loaded in a string 'db_solar' from a config file how do I use it 
> such that something like db=conn.db_solar will be constructed and run as 
> expected.
> Thanks

You can retrieve attributes using strings like this:

# Same effect:
db = conn.db_solar
db = getattr(conn, "db_solar")

So a variable attribute name can be handled the same way:

db = getattr(conn, dbname)
collection = getattr(db, dbcollection)
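
Putting that together with a config file, here is a minimal sketch for
Python 3 (on Python 2 the module is named ConfigParser). The section and
option names, and the FakeClient/FakeDatabase classes standing in for the
real MongoDB client, are hypothetical so the example runs without a server:

```python
import configparser

# Hypothetical config contents; with a real file you would call
# config.read("settings.ini") instead of read_string().
CONFIG_TEXT = """
[mongo]
database = db_solar
collection = main
"""

class FakeDatabase:
    """Hypothetical stand-in for a database exposing collections as attributes."""
    main = "the 'main' collection"

class FakeClient:
    """Hypothetical stand-in for a client exposing databases as attributes."""
    db_solar = FakeDatabase()

def dbconn(conn, config):
    # The names come from the config file as plain strings...
    dbname = config["mongo"]["database"]
    dbcollection = config["mongo"]["collection"]
    # ...and getattr turns each string into an attribute lookup.
    db = getattr(conn, dbname)
    collection = getattr(db, dbcollection)
    return db, collection

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)
db, collection = dbconn(FakeClient(), config)
```

With pymongo specifically, item access such as conn[dbname] should also
work, but getattr is the general-purpose tool for any object.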

Incidentally, I would suggest not having the try/except at all, since
all it does is print an error and terminate (which is the same result
you'd get if that error bubbled all the way to top level). But if you
are going to use it, then I strongly recommend using the newer syntax:

except errors.ConnectionFailure as e:

unless you have some reason for supporting ancient versions of Python.
For most modern code, you can usually rely on at least 2.7, so the new
syntax works; it's unambiguous in the face of multiple-exception
handlers, and it works identically on Python 3 (the "except type,
name:" syntax isn't supported on Py3). Since it's a trivial syntactic
change, there's generally no reason to use the old form, unless you
actually need your code to run on Python 2.5.

But for what you're doing, chances are you can just let the exception
bubble up. That way, a calling function gets the choice of handling it
some other way, which currently isn't an option.
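
A sketch of that design, with a hypothetical ConnectionFailure class standing
in for the driver's exception: the connecting function does no handling of
its own, and the caller decides what to do, using the "except ... as e" form:

```python
class ConnectionFailure(Exception):
    """Hypothetical stand-in for the database driver's connection error."""

def connect():
    # No try/except here: the exception propagates to whoever called us.
    raise ConnectionFailure("server not reachable")

def main():
    try:
        connect()
    except ConnectionFailure as e:  # works on Python 2.6+ and Python 3
        # The caller chooses the policy: report, retry, fall back, or exit.
        return "Could not connect: %s" % e
    return "connected"

message = main()
```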

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

Questions on Pandas

2015-06-25 Thread Tommy C

Hi there, I have a number of questions related to the Pandas exercises found
in the book Python for Data Analysis by Wes McKinney. Specifically, these
exercises are from Chapter 6 of the book. It'd be much appreciated if you could
answer the following questions!

1.
[code]
Input: pd.read_csv('ch06/ex2.csv', header=None)
Output:
   X.1  X.2  X.3  X.4    X.5
0    1    2    3    4  hello
1    5    6    7    8  world
2    9   10   11   12    foo

[/code]

Does the header appear as "X.#" by default when header is set to None?
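
For what it's worth, the X.1, X.2, ... labels come from the older pandas the
book was written against; recent pandas labels the columns 0, 1, 2, ... when
header=None, and names= lets you choose your own labels. A sketch with an
inline string standing in for ch06/ex2.csv (rows copied from the output
above):

```python
import io
import pandas as pd

# Stand-in for ch06/ex2.csv, using the rows shown in the book's output.
CSV_TEXT = "1,2,3,4,hello\n5,6,7,8,world\n9,10,11,12,foo\n"

# header=None: no row is used as the header; columns are numbered.
numbered = pd.read_csv(io.StringIO(CSV_TEXT), header=None)

# names=...: supply explicit column labels (this also implies no header row).
named = pd.read_csv(io.StringIO(CSV_TEXT),
                    names=['a', 'b', 'c', 'd', 'message'])
```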

2.
[code]
Input: chunker = pd.read_csv('ch06/ex6.csv', chunksize=1000)
Input: chunker
Output: 

[/code]

Please explain the idea of chunksize and what the output means.


3.
[code]
The TextParser object returned by read_csv allows you to iterate over the parts
of the file according to the chunksize. For example, we can iterate over
ex6.csv, aggregating the value counts in the 'key' column like so:
chunker = pd.read_csv('ch06/ex6.csv', chunksize=1000)
tot = Series([])
for piece in chunker:
    tot = tot.add(piece['key'].value_counts(), fill_value=0)
tot = tot.order(ascending=False)
We have then:
In [877]: tot[:10]
Out[877]:
E 368
X 364
L 346
O 343
Q 340
M 338
J 337
F 335
K 334
H 330

[/code]

I couldn't run the Series function successfully... is there something missing 
in this code?
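
Two things in the excerpt probably trip up a current setup: it assumes Series
was imported earlier in the chapter (likely via "from pandas import Series"),
so Series([]) alone raises NameError; and it uses Series.order, which later
pandas releases replaced with sort_values. Recent pandas also reports the
chunked reader as a TextFileReader rather than a TextParser; either way it is
an iterator yielding DataFrames of at most chunksize rows. A sketch with a
small made-up CSV standing in for ch06/ex6.csv, and chunksize=2 so the loop
runs over several chunks:

```python
import io
import pandas as pd

# Made-up stand-in for ch06/ex6.csv; only the 'key' column matters here.
CSV_TEXT = "key,value\nE,1\nX,2\nE,3\nL,4\nE,5\nX,6\n"

# With chunksize, read_csv returns an iterator of DataFrames, each with at
# most `chunksize` rows, instead of reading the whole file at once.
chunker = pd.read_csv(io.StringIO(CSV_TEXT), chunksize=2)

tot = pd.Series(dtype='float64')  # empty running total
for piece in chunker:
    # Align on key labels; a key missing from one side counts as 0.
    tot = tot.add(piece['key'].value_counts(), fill_value=0)

# sort_values replaces the old Series.order method.
tot = tot.sort_values(ascending=False)
print(tot.head(10))
```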

4.
[code]
Data can also be exported to delimited format. Let's consider one of the CSV
files read above:
In [878]: data = pd.read_csv('ch06/ex5.csv')
In [879]: data
Out[879]:
  something  a   b   c   d message
0       one  1   2   3   4     NaN
1       two  5   6 NaN   8   world
2     three  9  10  11  12     foo

Missing values appear as empty strings in the output. You might want to denote
them by some other sentinel value:
In [883]: data.to_csv(sys.stdout, na_rep='NULL')
,something,a,b,c,d,message
0,one,1,2,3.0,4,NULL
1,two,5,6,NULL,8,world
2,three,9,10,11.0,12,foo


[/code]

An error occurred when I tried to run this code with sys.stdout.
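
Without the traceback this is only a guess, but a common cause of an error
on exactly that line is that sys was never imported, which gives
"NameError: name 'sys' is not defined". A sketch with an inline string
standing in for ch06/ex5.csv (rows taken from the output above; "NA" and
the empty field are read as missing values):

```python
import io
import sys
import pandas as pd

# Stand-in for ch06/ex5.csv, matching the rows shown in the excerpt.
CSV_TEXT = ("something,a,b,c,d,message\n"
            "one,1,2,3,4,NA\n"
            "two,5,6,,8,world\n"
            "three,9,10,11,12,foo\n")
data = pd.read_csv(io.StringIO(CSV_TEXT))

# sys must be imported before sys.stdout can be used as the output target.
data.to_csv(sys.stdout, na_rep='NULL')

# With no path argument, to_csv returns the CSV text as a string instead.
out = data.to_csv(na_rep='NULL')
```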


5.
[code]
subclass of csv.Dialect:
class my_dialect(csv.Dialect):
    lineterminator = '\n'
    delimiter = ';'
    quotechar = '"'
reader = csv.reader(f, dialect=my_dialect)

[/code]

An error occurred when I tried to run this code: "quotechar must be an
1-character integer". Please explain.


6.
[code]
with open('mydata.csv', 'w') as f:
    writer = csv.writer(f, dialect=my_dialect)
    writer.writerow(('one', 'two', 'three'))
    writer.writerow(('1', '2', '3'))
    writer.writerow(('4', '5', '6'))
    writer.writerow(('7', '8', '9'))
[/code]

An error occurred when I ran this code. Please explain the cause of the error.
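
Questions 5 and 6 probably share one cause, though without the tracebacks
this is a guess: csv.Dialect's own attributes default to None, and the
excerpt's subclass never sets quoting, which the csv module requires to be
one of the csv.QUOTE_* integers. The dialect is then rejected wherever it is
used, including by the writer in question 6. A sketch under that assumption,
using io.StringIO in place of real files so it runs anywhere:

```python
import csv
import io

class my_dialect(csv.Dialect):
    lineterminator = '\n'
    delimiter = ';'
    quotechar = '"'
    # csv.Dialect leaves these as None; the csv module needs real values,
    # and 'quoting' in particular must be one of the csv.QUOTE_* integers.
    quoting = csv.QUOTE_MINIMAL
    doublequote = True
    skipinitialspace = False

# io.StringIO stands in for open('mydata.csv', 'w', newline='').
buf = io.StringIO()
writer = csv.writer(buf, dialect=my_dialect)
writer.writerow(('one', 'two', 'three'))
writer.writerow(('1', '2', '3'))
writer.writerow(('4', '5', '6'))
writer.writerow(('7', '8', '9'))
text = buf.getvalue()

# Reading the text back with the same dialect recovers the rows.
rows = list(csv.reader(io.StringIO(text), dialect=my_dialect))
```

When writing to a real file on Python 3, the csv module documentation
recommends opening it with newline='', e.g. open('mydata.csv', 'w',
newline='').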


7.
[code]
But these are objects representing HTML elements; to get the URL and link text
you have to use each element's get method (for the URL) and text_content method
(for the display text):
In [908]: lnk = links[28]
In [909]: lnk
Out[909]: 
In [910]: lnk.get('href')
Out[910]: 'http://biz.yahoo.com/special.html'
In [911]: lnk.text_content()
Out[911]: 'Special Editions'
Thus, getting a list of all URLs in the document is a matter of writing this
list comprehension:
In [912]: urls = [lnk.get('href') for lnk in doc.findall('.//a')]
In [913]: urls[-10:]
Out[913]:
['http://info.yahoo.com/privacy/us/yahoo/finance/details.html',
 'http://info.yahoo.com/relevantads/',
 'http://docs.yahoo.com/info/terms/',
 'http://docs.yahoo.com/info/copyright/copyright.html',
 'http://help.yahoo.com/l/us/yahoo/finance/forms_index.html',
 'http://help.yahoo.com/l/us/yahoo/finance/quotes/fitadelay.html',
 'http://help.yahoo.com/l/us/yahoo/finance/quotes/fitadelay.html',

[/code]

An error related to line 912 occurred as I tried to run the code. Please 
explain.
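
Line 912 depends on doc, which the book builds earlier from a live web page
(that step is not shown in the excerpt). If that step was skipped, doc is
undefined; if the page has changed since the book was written, the URLs will
differ. A self-contained sketch that parses a hypothetical inline HTML string
with lxml.html instead of downloading a page:

```python
import lxml.html

# Hypothetical page content standing in for the downloaded Yahoo! page.
HTML = ('<html><body>'
        '<a href="http://example.com/one">One</a>'
        '<a name="no-href">Two</a>'
        '</body></html>')

# doc must exist before doc.findall(...) is called.
doc = lxml.html.fromstring(HTML)
links = doc.findall('.//a')

# get() returns None for a link without an href attribute.
urls = [lnk.get('href') for lnk in links]
texts = [lnk.text_content() for lnk in links]
```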

8.
[code]
Using lxml.objectify, we parse the file and get a reference to the root node of
the XML file with getroot:
from lxml import objectify
path = 'Performance_MNR.xml'
parsed = objectify.parse(open(path))
root = parsed.getroot()

[/code]

An error occurred when I tried to run the code to access the XML file. Please
explain.
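
Again the traceback is missing, so this is a guess: a frequent cause is that
Performance_MNR.xml is not in the current working directory, since a relative
path is resolved against os.getcwd(). A sketch that writes a small
hypothetical XML file to a temporary directory, checks the path, and passes
the filename straight to objectify.parse (which also avoids leaving the
open() file handle unclosed):

```python
import os
import tempfile
from lxml import objectify

# Hypothetical XML content standing in for Performance_MNR.xml.
XML = b"<ROOT><ITEM><NAME>example</NAME></ITEM></ROOT>"

# Write it somewhere known so the example does not depend on the cwd.
path = os.path.join(tempfile.mkdtemp(), "Performance_MNR.xml")
with open(path, "wb") as f:
    f.write(XML)

# Checking first gives a clearer message than a parser error.
if not os.path.exists(path):
    raise FileNotFoundError(path)

parsed = objectify.parse(path)  # lxml accepts a filename directly
root = parsed.getroot()
name = root.ITEM.NAME.text      # objectify exposes children as attributes
```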


9.
[code]
XML data can get much more complicated than this example. Each tag can have
metadata, too. Consider an HTML link tag which is also valid XML:
from StringIO import StringIO
tag = '<a href="http://www.google.com">Google</a>'
root = objectify.parse(StringIO(tag)).getroot()
You can now access any of the fields (like href) in the tag or the link text:
In [930]: root
Out[930]: 
In [931]: root.get('href')
Out[931]: 'http://www.google.com'
In [932]: root.text
Out[932]: 'Google'
[/code]

The outputs for line 930 and 931 are the same as line 932 (i.e., Google).
Please explain.
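
A sketch of what In [931] and In [932] should return, assuming the tag is
the link <a href="http://www.google.com">Google</a> (which is what those two
outputs imply). If root.get('href') does not return the URL, check that the
tag string really contains href="...". Note that "from StringIO import
StringIO" is Python 2 only; objectify.fromstring sidesteps it. Also, lxml's
objectify may display a leaf element by its text value rather than as
<Element a ...>, which could explain why In [930] looked like 'Google':

```python
from lxml import objectify

tag = '<a href="http://www.google.com">Google</a>'

# fromstring parses the string directly; no StringIO wrapper needed.
root = objectify.fromstring(tag)

href = root.get('href')  # value of the href attribute
text = root.text         # the element's text content
```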


10.

[code]
One of the easiest ways to store data efficiently in binary format is using
Python's builtin pickle serialization. Conveniently, pandas objects all have a
save method which writes the data to disk as a pickle:
In [933]: frame = pd.read_csv('ch06/ex1.csv')
In [934]: frame
Out[934]:
   a   b   c   d message
0  1   2   3   4   hello
1  5   6   7   8   world
2  9  10  11  12     foo
In [935]: frame.save('ch06/frame_pickle')
You read the data back into Python with pandas.load, another pickle convenience
function:
In [936]: pd.load('ch06/frame_pickle')
Out[936]:
 a b c

Re: mutual coaching

2015-06-25 Thread Kev Dwyer

adham...@gmail.com wrote:

> Hello, does anyone want to study Python? We can learn together! PM me; my
> name is adham128 and I am in the #programming room.

Welcome to Python!

To improve your chances of finding someone who wants to learn with you, you 
might try posting your message in the Python Tutor list too: 
https://mail.python.org/mailman/listinfo/tutor

Good luck!

Kev

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Questions on Pandas

2015-06-25 Thread Steven D'Aprano

On Fri, 26 Jun 2015 02:34 pm, Tommy C wrote:

> Hi there, I have a number of questions related to the Pandas exercises
> found from the book, Python for Data Analysis by Wes McKinney.
> Particularly, these exercises are from Chapter 6 of the book. It'd be much
> appreciated if you could answer the following questions!

Too many questions for one post!

> Does the header appear as "X.#" by default when it is set to be None?

Why don't you try setting header to something else and see what happens?

> 2.
> [code]
> Input: chunker = pd.read_csv('ch06/ex6.csv', chunksize=1000)
> Input: chunker
> Output: 
> 
> [/code]
> 
> Please explain the idea of chunksize and the output meaning.

The output shows that chunker is a TextParser object.

The chunksize lets you set how much data is read at once. Just like it says
in your next question:

> 3.
> [code]
> The TextParser object returned by read_csv allows you to iterate over the
> parts of the file according to the chunksize. 

Do you have access to the csv file? How many rows does it take to get the
results you see below?

> For example, we can iterate 
> over ex6.csv, aggregating the value counts in the 'key' column like so:
> chunker = pd.read_csv('ch06/ex6.csv', chunksize=1000)
> tot = Series([])
> for piece in chunker:
>  tot = tot.add(piece['key'].value_counts(), fill_value=0)
> tot = tot.order(ascending=False)
> We have then:
> In [877]: tot[:10]
> Out[877]:
> E 368
> X 364
> L 346
> O 343
> Q 340
> M 338
> J 337
> F 335
> K 334
> H 330
> 
> [/code]
> 
> I couldn't run the Series function successfully... is there something
> missing in this code?

I don't know. What did you do, and what error did you get? "I couldn't run
this successfully..." could mean anything:

- is the keyboard plugged in?
- did you import Pandas?
- did you make a typo?

and about a million other possible things could have gone wrong. Unless you
tell us *what you did* and *what happened*, how can we possibly guess why
you can't run the code?

> Error occured as I tried to run this code with sys.stdout.

Again, are we supposed to guess what the error was?

And now I'm bored and stopped reading. I suggest you think a bit more
carefully about the questions you ask, and the way you ask them. Imagine
that we're not watching over your shoulder to see what you did wrong when
you get an error. Don't assume that an error means the tutorial is wrong.
Please give more detail: what you did, and COPY AND PASTE the result you
got, don't summarise it, or re-type it from memory.

Ideally, you should have one question per post, or at least no more than a
few *related* questions. No, "all part of the same tutorial" doesn't make
them related. Try to give each set of questions a descriptive subject line,
so people can keep track of what you are asking and which questions they
care about and which ones they don't.

-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list
