On Sat, Jun 18, 2011 at 9:34 PM, mzagu...@gmail.com <mzagu...@gmail.com> wrote:
> I am wondering what your strategies are for ensuring that data
> transmitted to a website via a python program is indeed from that
> program, and not from someone submitting POST data using some other
> means. I find it likely that there is no solution, in which case what
> is the best solution for sending data to a remote server from a python
> program and ensuring that it is from that program?
You're correct there: there is no solution. Everything on the other side of your network cable should be treated as hostile and spoofed. But the real question is, how much effort are people likely to put in to avoid using your program? SSL client certificates are good, but they can be stolen (very easily if the client is open source, since the certificate has to ship with it). Any algorithmic check built into the client suffers from the same issue.

In the example you gave, there's no solution. Someone could easily spoof it and stuff the ballot. But if you make that more difficult than the survey is worth, then you can largely trust your data.

The other common reason for wanting to be sure that the far end really is your script is when you're trusting the client to do data validation. There's a solution to that one: repeat the validation on the server, and then it doesn't matter whether they use your program or not. (And before you cry "Isn't that obvious?", a lot of people have completely missed that point.)

In neither case can you prove what program was on the far end. You're working with network packets, so anything can be spoofed. You could go a long way toward it, though, by using something ridiculously complex, such as:

* Client connects via SSL to host, using a known certificate.
* Server verifies certificate, and sends client some Python code to execute.
* Client verifies the server's certificate (vital!).
* Client executes the code it's given, and based on the result, plus some other data, sends the server a hash value.
* Server executes the same code it gave the client, knows the data it was working with, and calculates the equivalent hash.
* If the two hashes match, the client is deemed to be valid.

This is a variant of the usual nonce-based hashing systems, where the nonce in question is actually executable code. By randomizing the code, you can make it difficult for any non-Python program to duplicate the hash algorithm. But it still won't provide certainty, by any means.
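For illustration, here is a minimal sketch of the hash-matching part of that scheme. It assumes the SSL connection and both certificate checks have already happened (they are omitted entirely), and that the server remembers each challenge it issues. The function names, the arithmetic templates, and the use of SHA-256 are hypothetical choices made for this sketch, not anything specified above. The server generates a random snippet, random input data, and a nonce; the client runs the snippet and hashes the result together with the nonce; the server recomputes the same hash and compares.

```python
import hashlib
import hmac
import random
import secrets

# Hypothetical templates for randomized arithmetic steps; each keeps `acc`
# within 32 bits. The choice of operations is purely illustrative.
_OPS = [
    "acc = (acc * {k} + b) % 2**32",
    "acc = (acc ^ (b << {s})) % 2**32",
    "acc = (acc + b * {k} + seed) % 2**32",
]


def make_challenge(rng: random.Random) -> str:
    """Server side: generate a small random Python snippet (the 'code nonce')."""
    lines = ["acc = seed", "for b in data:"]
    for _ in range(rng.randint(2, 4)):
        step = rng.choice(_OPS).format(k=rng.randint(3, 997), s=rng.randint(1, 7))
        lines.append("    " + step)
    lines.append("result = acc")
    return "\n".join(lines)


def run_challenge(code: str, data: bytes, seed: int) -> int:
    """Execute the snippet and return the integer it leaves in `result`.

    An empty __builtins__ is NOT a sandbox; a real client should only run
    received code after verifying the server's certificate.
    """
    namespace = {"__builtins__": {}, "data": data, "seed": seed}
    exec(code, namespace)
    return namespace["result"]


def response_hash(result: int, nonce: bytes) -> str:
    """Hash the snippet's result together with extra data (a server nonce)."""
    return hashlib.sha256(nonce + result.to_bytes(8, "big")).hexdigest()


def issue_challenge(rng: random.Random):
    """Server side: everything sent to the client, which the server also keeps."""
    code = make_challenge(rng)
    data = secrets.token_bytes(16)
    seed = secrets.randbelow(2**32)
    nonce = secrets.token_bytes(16)
    return code, data, seed, nonce


def verify(code: str, data: bytes, seed: int, nonce: bytes, client_digest: str) -> bool:
    """Server side: recompute the expected hash and compare in constant time."""
    expected = response_hash(run_challenge(code, data, seed), nonce)
    return hmac.compare_digest(expected, client_digest)


if __name__ == "__main__":
    rng = random.SystemRandom()
    code, data, seed, nonce = issue_challenge(rng)
    # Client side: run the received code and send back the digest.
    digest = response_hash(run_challenge(code, data, seed), nonce)
    print(verify(code, data, seed, nonce, digest))  # True
```

As the post says, this provides no certainty: anyone who captures the snippet can run it in their own Python interpreter and reply correctly. It only raises the bar for casual spoofing with a non-Python tool.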
I've spent quite a bit of time this past fortnight explaining some of these concepts to my boss and one of my coworkers; they were building a rather elaborate system but didn't realise that, apart from requiring about three times as much data from /dev/random, it wasn't materially different from a simple SSL cert check...

Chris Angelico
--
http://mail.python.org/mailman/listinfo/python-list