Re: how to optimize the below code with a helper function

2016-04-04 Thread Martin A. Brown
optype="set")
return addLogFilename(d, LOG_DIR)


def run_tool(logfile, **kw):
    logger.info('%s would execute with %r', logfile, kw)


def addLogFilename(d, logdir):
    '''put the logfile name into the test case data dictionary'''
    for casename, args in d.items():
        args['logfile'] = os.path.join(logdir, casename + '.log')
    return d


def main():
    testcases = createTestCases(LOG_DIR)
    get_baddr = dict()
    for casename, kw in testcases.items():
        # -- yank the logfile name out of the dictionary, before calling func
        logfile = kw.pop('logfile')
        get_baddr[casename] = run_tool(logfile, **kw)


if __name__ == '__main__':
    main()

# -- end of file


-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: how to optimize the below code with a helper function

2016-04-04 Thread Martin A. Brown

Hello again,

>(1)  Any tips how I can optimize this i.e test case, should have a 
>helper function that all test cases call.
>
>(2) Also note that failure.run_tool function can have a variable 
>number of arguments; how to handle this in the helper function?

Here's a little example of how you could coerce your problem into a 
ConfigParser-style configuration file.

With this example, I'd think you could also see how to create a 
config section called [lin_02] that contains the parameters you want 
for creating that object.  Then, it's a new problem to figure out 
how to refer to that object for one of your tests.

Anyway, this is just another way of answering the question of "how 
do I simplify this repetitive code".

Good luck and enjoy,

-Martin

#! /usr/bin/python

from __future__ import absolute_import, division, print_function

import os
import sys
import collections
from ConfigParser import SafeConfigParser as ConfigParser  # Python 2 module name; 'configparser' in Python 3
import logging


logging.basicConfig(stream=sys.stderr, level=logging.INFO)
logger = logging.getLogger(__name__)

LOG_DIR = '/var/log/frobnitz'

def readCfgTestCases(cfgfile):
    data = collections.defaultdict(dict)
    parser = ConfigParser()
    parser.read(cfgfile)
    for section in parser.sections():
        for name, value in parser.items(section):
            data[section][name] = value
    return data


def main(cfgfile):
    testdata = readCfgTestCases(cfgfile)
    for k, v in testdata.items():
        print(k, v)


if __name__ == '__main__':
    main(sys.argv[1])

# -- end of file


# -- config file

[test01]
offset = 18
size = 4
object = inode
optype = set

[test02]
# -- no way to capture lin=lin_02; must reproduce contents of lin_02
object = lin
offset = 100
size = 5
optype = set


[test100]
# -- no way to capture baddr=lin_02; must reproduce contents of lin_02
object = baddr
offset = 100
size = 5
optype = set


-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Set type for datetime intervals

2016-04-04 Thread Martin A. Brown

Greetings László,

>I need to compare sets of datetime intervals, and make set 
>operations on them: intersect, union, difference etc. One element 
>of a set would be an interval like this:
>
>element ::= (start_point_in_time, end_point_in_time)
>intervalset ::= { element1, element2,  }
>
>Operations on elements:
>
>element1.intersect(element2)
>element1.union(element2)
>element1.minus(element2)
>
>Operations on sets:
>
>intervalset1.intersect(intervalset2)
>intervalset1.union(intervalset2)
>intervalset1.minus(intervalset2)
>
>Does anyone know a library that already implements these functions?

Sorry to be late to the party--I applaud that you have already 
crafted something to attack your problem.  When you first posted, 
there was a library that was tickling my memory, but I could not 
remember its (simple) name.  It occurred to me this morning, after 
you posted your new library:

  https://pypi.python.org/pypi/intervaltree

This handles overlapping ranges nicely and provides some tools for 
managing them.  Before posting this, I checked that it works with 
datetime types, and, unsurprisingly, it does.
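
A minimal sketch of the sort of thing I mean (assuming only that 
the intervaltree package is installed; the interval bounds here are 
datetime objects):

  from datetime import datetime
  from intervaltree import Interval, IntervalTree

  tree = IntervalTree()
  tree.add(Interval(datetime(2016, 4, 1), datetime(2016, 4, 3)))
  tree.add(Interval(datetime(2016, 4, 2), datetime(2016, 4, 5)))

  # -- all stored intervals overlapping a single point in time
  print(sorted(tree[datetime(2016, 4, 2, 12)]))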

Happy trails!

-Martin

-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Set type for datetime intervals

2016-04-06 Thread Martin A. Brown

>> Sorry to be late to the party--I applaud that you have already 
>> crafted something to attack your problem.  When you first posted, 
>> there was a library that was tickling my memory, but I could not 
>> remember its (simple) name.  It occurred to me this morning, after 
>> you posted your new library:
>>
>>   https://pypi.python.org/pypi/intervaltree
>>
>> This handles overlapping ranges nicely and provides some tools for 
>> managing them.  Before posting this, I checked that it works with 
>> datetime types, and, unsurprisingly, it does.
>
>Thank you! It is so much better than the one I have created. 
>Possibly I'll delete my own module from pypi. :-)

I'm glad to have been able to help, László.

And, even if you don't delete your new module, you have certainly 
stimulated quite a discussion on the mailing list.

Best regards and have a good day!

-Martin

-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: one-element tuples

2016-04-10 Thread Martin A. Brown

Hello Fillmore,

> Here you go:
>
> >>> a = '"string1"'
> >>> b = '"string1","string2"'
> >>> c = '"string1","string2","string3"'
> >>> ea = eval(a)
> >>> eb = eval(b)
> >>> ec = eval(c)
> >>> type(ea)
> <class 'str'>  <--- HERE
> >>> type(eb)
> <class 'tuple'>
> >>> type(ec)
> <class 'tuple'>
>
> I can tell you that it exists because it bit me in the butt today...
>
> and mind you, I am not saying that this is wrong. I'm just saying 
> that it surprised me.

Recently in one of these two threads on your question, people have 
identified why the behaviour is as it is.

Below, I will add one question (about eval) and one suggestion about 
how to circumvent the behaviour you perceive as a language 
discontinuity.

#1: I would not choose eval() except when there is no other 
solution.  If you don't strictly need eval(), finding an 
alternate way may save you some headache in the future.
So, can we help you choose something other than eval()?
What are you trying to do with that usage?  (See also the 
ast.literal_eval sketch after the examples below.)

#2: Yes, but, you can outsmart Python here!  Simply include a 
terminal comma in each case, right?  In short, you can force
the consuming language (Python, because you are calling eval())
to understand the string as a tuple of strings, rather than 
merely one string.

>>> a = '"string1",'
>>> ea = eval(a)
>>> len(ea), type(ea)
(1, <class 'tuple'>)

>>> b = '"string1","string2",'
>>> eb = eval(b)
>>> len(eb), type(eb)
(2, <class 'tuple'>)

>>> c = '"string1","string2","string3",'
>>> ec = eval(c)
>>> len(ec), type(ec)
(3, <class 'tuple'>)
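
And, returning to point #1: here is a minimal sketch of one safer 
alternative to eval() for input like this.  ast.literal_eval() 
accepts only Python literals (strings, numbers, tuples, lists, 
dicts), never arbitrary expressions:

>>> import ast
>>> ea = ast.literal_eval('"string1",')
>>> len(ea), type(ea)
(1, <class 'tuple'>)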

Good luck in your continuing Python explorations,

-Martin

P.S. Where do your double-quoted strings come from, anyway?

-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: sys.exit(1) vs raise SystemExit vs raise

2016-04-12 Thread Martin A. Brown

Hello all,

Apologies for this post which is fundamentally, a 'me too' post, but 
I couldn't help but chime in here.

>This is good practice, putting the mainline code into a ‘main’ 
>function, and keeping the ‘if __name__ == '__main__'’ block small 
>and obvious.
>
>What I prefer to do is to make the ‘main’ function accept the
>command-line arguments, and return the exit status for the program::
>
>def main(argv):
>exit_status = EXIT_STATUS_SUCCESS
>try:
>parse_command_line(argv)
>setup_program()
>run_program()
>except SystemExit as exc:
>exit_status = exc.code
>except Exception as exc:
>logging.exception(exc)
>exit_status = EXIT_STATUS_ERROR
>
>return exit_status
>
>if __name__ == '__main__':
>exit_status = main(sys.argv)
>sys.exit(exit_status)
>
>That way, the ‘main’ function is testable like any other function: 
>specify the command line arguments, and receive the exit status. 
>But the rest of the code doesn't need to know that's happening.

This is only a riff on (a variant of) what Ben has written.  Here's what I
like to write:

  def run(argv):
      if program_runs_smoothly:
          return os.EX_OK
      else:
          # -- call logging, report to STDERR, or just raise an Exception
          return SOMETHING_ELSE

  def main():
      sys.exit(run(sys.argv[1:]))

  if __name__ == '__main__':
      main()

Why do I do this?

  * the Python program runs from CLI because [if __name__ == '__main__']
  * I can use main() as an entry point with setuptools
  * my unit testing code can pass any argv it wants to the function run()
  * the run() function never calls sys.exit(), so my tests can see what WOULD
have been the process exit code

The only change from what Ben suggests is that, once I found os.EX_OK, I just
kept on using it, instead of defining my own EXIT_SUCCESS in every program.

Clearly, the contents of the run() function in my example above look contrived.
A real run() usually contains a wider variety of logic.

Anyway, best of luck!

-Martin

-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Looking for feedback on weighted voting algorithm

2016-04-14 Thread Martin A. Brown

Greetings Justin,

>score = sum_of_votes/num_of_votes

>votes = [(72, 4), (96, 3), (48, 2), (53, 1), (26, 4), (31, 3), (68, 2), (91, 
>1)]

>Specifically, I'm wondering if this is a good algorithm for 
>weighted voting. Essentially a vote is weighted by the number of 
>votes it counts as. I realize that this is an extremely simple 
>algorithm, but I was wondering if anyone had suggestions on how to 
>improve it.

I snipped most of your code.  I don't see anything wrong with your 
overall approach.  I will make one suggestion: watch out for 
DivisionByZero.

try:
    score = sum_of_votes / num_of_votes
except ZeroDivisionError:
    score = float('nan')

In your example data, all of the weights were integers, which means 
that a simple mean function would work, as well, if you expanded the 
votes to an alternate representation:

  votes = [72, 72, 72, 72, 96, 96, 96, 48, 48, 53, 26, 26, 26, 26,
   31, 31, 31, 68, 68, 91]

But, don't bother!

Your function can handle votes that have a float weight:

  >>> weight([(4, 1.3), (1, 1),])
  2.695652173913044

Have fun!

-Martin

-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to track files processed

2016-04-18 Thread Martin A. Brown

Greetings,

>If you are parsing files in a directory what is the best way to 
>record which files were actioned?
>
>So that if i re-parse the directory i only parse the new files in 
>the directory?

How will you know that the files are new?

If a file has exactly the same content as another file, but a 
different name, is it new?

Often this depends on the characteristics of the system in which 
your (planned) software is operating.

Peter Otten has also asked for some more context, which would help 
us give you some tips that are more targetted to the problem you are 
trying to solve.

But, I'll just forge ahead and make some assumptions:

  * You are watching a directory for new/changed files.
  * New files are appearing regularly.
  * Contents of old files get updated and you want to know.

Have you ever seen an MD5SUMS file?  Do you know what a content hash 
is?  You could find a place to store the content hash (a.k.a. 
digest) of each file that you process.

Below is a program that should work in Python2 and Python3.  You 
could use this sort of approach as part of your solution.  In order 
to make sure you have handled a file before, you should store and 
compare two things.

  1.  The filename.
  2.  The content hash.

Note:  If you are sure the content is not going to change, then just 
use the filename to track whether you have handled something or not.

How would you use this tracking info?

  * Create a dictionary (or a set), e.g.:

handled = dict()
handled[('410c35da37b9a25d9b5d701753b011e5','setup.py')] = time.time()

Lasts only as long as the program runs.  But, you will know
that you have handled any file by the tuple of its content hash 
and filename.

  * Store the filename (and/or digest) in a database.  So many 
options: sqlite, pickle, anydbm, text file of your own 
crafting, SQLAlchemy ...

  * Create a file, hardlink or symlink in the filesystem (in the 
same directory or another directory), e.g.:

    trackingfile = os.path.join('another-directory', 'setup.py')
    with open(trackingfile, 'w') as f:
        f.write('410c35da37b9a25d9b5d701753b011e5')
 
  OR

os.symlink('setup.py', '410c35da37b9a25d9b5d701753b011e5-setup.py')

Now, you can also examine your little cache of handled files to 
compare for when the content hash changes.  If the system is an 
automated system, then this can be perfectly fine.  If humans 
create the files, I would suggest not doing this.  Humans tend 
to be easily confused by such things (and then want to delete 
the files or just be intimidated by them; scary hashes!).

There are lots of options, but without some more context, we can 
only make generic suggestions.  So, I'll stop with my generic 
suggestions now.

Have fun and good luck!

-Martin


#! /usr/bin/python

from __future__ import print_function

import os
import sys
import logging
import hashlib

logformat = '%(levelname)-9s %(name)s %(filename)s#%(lineno)s ' \
 + '%(funcName)s %(message)s'
logging.basicConfig(stream=sys.stderr, format=logformat, level=logging.ERROR)
logger = logging.getLogger(__name__)

def hashthatfile(fname):
    contenthash = hashlib.md5()
    try:
        with open(fname, 'rb') as f:
            contenthash.update(f.read())
        return contenthash.hexdigest()
    except IOError as e:
        logger.warning("See exception below; skipping file %s", fname)
        logger.exception(e)
        return None


def main(dirname):
    for fname in os.listdir(dirname):
        if not os.path.isfile(fname):
            logger.debug("Skipping non-file %s", fname)
            continue
        logger.info("Found file %s", fname)
        digest = hashthatfile(fname)
        logger.info("Computed MD5 hash digest %s", digest)
        print('%s %s' % (digest, fname,))
    return os.EX_OK


if __name__ == '__main__':
    if len(sys.argv) == 1:
        sys.exit(main(os.getcwd()))
    else:
        sys.exit(main(sys.argv[1]))

# -- end of file


-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


manpage writing [rst, asciidoc, pod] was [Re: What should Python apps do when asked to show help?]

2016-04-29 Thread Martin A. Brown

Hello,

>What is a good place where I can find out more about writing 
>manpage?

I don't know of a single place where manpage authorship is 
particularly documented.  This seems to be one of the common target 
links.  In addition to introducing the breakdown of manpages by 
type (section) and providing some suggestions for content, it 
introduces the *roff markup:

  http://www.schweikhardt.net/man_page_howto.html

It's been many years since I have written that stuff directly.  I 
prefer one of the lightweight, general documentation or markup 
languages.  So, below, I'll mention and give examples for creating 
manpages from reStructuredtext, AsciiDoc and Plain Old 
Documentation.

With the reStructuredText format [0] [1], you can convert an .rst 
file to a manpage using two different document processors; you can 
use sphinx-build from the sphinx-project [2] or rst2man from the 
docutils project.  The outputs are largely the same (as far as I 
can tell).

There's also the AsciiDoc [3] format, which is close to plain text 
and reads like text, but has a clear structure.  With the tooling 
(written in Python), you can produce docbook, latex, html and a 
bunch of other output formats.  Oh, and manpages [4], too.  There is 
a tool called 'asciidoc' which processes AsciiDoc formats into a 
variety of backend formats.  The 'a2x' tool converts AsciiDoc 
sources into some other (x) desired output.

If you don't like .rst or AsciiDoc, there's also the Plain Old 
Documentation (POD) format.  This is the oldest such tool of which 
I'm aware, other than the direct *roff processing tools.  POD is 
another dead simple documentation language; you run the supporting 
'pod2man' [5] tool (written in Perl) on your .pod file.  For more 
on the format, read also 'man 1 perlpod'.

sphinx-build: the sphinx documentation system is geared for 
handling project-scoped documentation and provides many additional 
features to reStructuredText.  It can produce all kinds of output 
formats, HTML single-page, help, multipage, texinfo, latex, text, 
epub and oh, yeah, manpages.  It's a rich set of tools.

If you wish to use sphinx, I can give you an example .rst file [6] 
which I recently wrote, along with the following instructions for 
how to process it with sphinx.  When processing docs with sphinx, a 
'conf.py' file is required; it can be generated with an ancillary 
tool from the sphinx suite, sphinx-quickstart.

I know that I always find an example helpful.  So, here are some 
examples to help you launch.

  mkdir sampledir && cd sampledir
  sphinx-quickstart   # -- and answer a bunch of questions
  # -- examine conf.py and adjust to your heart's content
  #confirm that master_doc is your single document for a manpage
  #confirm that there's an entry for your document in man_pages
  sphinx-build -b man -d _build/doctrees . _build/man

  # -- or grab the files from my recent project [6] and try yourself

rst2man: even more simply, if you don't need the kitchen sink...

  wget https://gitlab.com/pdftools/pdfposter/raw/develop/pdfposter.rst
  rst2man < pdfposter.rst  > pdfposter.1
  # -- will complain about this, but still produces a manpage
  # :10: (ERROR/3) Undefined substitution referenced: "VERSION".
  man ./pdfposter.1

asciidoc (a randomly selected example asciidoc file [7]):

  wget https://raw.githubusercontent.com/DavidGamba/grepp/master/grepp.adoc
  a2x -f manpage grepp.adoc  
  man ./grepp.1

perlpod:

  wget https://api.metacpan.org/source/RJBS/perl-5.18.1/pod/perlrun.pod
  pod2man --section 1 < perlrun.pod > perlrun.1
  man ./perlrun.1

I know there are other tools for generating manpages; the 
original *roff tools, visual manpage editors, DocBook, 
help2man, manpage generators from argparse.ArgumentParser 
instances, 

And, of course, make sure to use version control for your 
documentation.  These git manpages may be helpful for the 
uninitiated (joke, joke):

  https://git-man-page-generator.lokaltog.net/  # -- humour!

Good luck,

-Martin

 [0] http://docutils.sourceforge.net/docs/user/rst/quickref.html
 [1] http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html
 [2] http://www.sphinx-doc.org/en/stable/rest.html
 [3] http://www.methods.co.nz/asciidoc/
 [4] http://www.methods.co.nz/asciidoc/chunked/ch24.html
 [5] http://perldoc.perl.org/pod2man.html
 [6] 
https://raw.githubusercontent.com/tLDP/python-tldp/master/docs/ldptool-man.rst
 [7] https://raw.githubusercontent.com/DavidGamba/grepp/master/grepp.adoc

-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: redirecting stdout and stderr to /dev/null

2016-05-07 Thread Martin A. Brown

Hello there,

>I'm new to python but well versed on other languages such as C and 
>Perl
>
>I'm have problems redirecting stdout and stderr to /dev/null in a 
>program that does a fork and exec. T found this method googling 
>around and it is quite elegant compared to to the Perl version.
>
>So to isolate things I made a much shorter test program and it 
>still is not redirecting. What am I doing wrong?
>
>test program test.py
>- cut here ---
>import sys
>import os
>
>f = open(os.devnull, 'w')
>sys.stdout = f
>sys.stderr = f
>os.execl("/bin/ping", "",  "-w", "20",  "192.168.1.1");
>-- cut here ---

Think about the file descriptors.

Unix doesn't care what the name is, rather that the process inherits 
the FDs from the parent.  So, your solution might need to be a bit 
more complicated to achieve what you desire.  Run the following to 
see what I mean.

  realstdout = sys.stdout
  realstderr = sys.stderr
  f = open(os.devnull, 'w')
  sys.stdout = f
  sys.stderr = f
  
  print("realstdout FD: %d" % (realstdout.fileno(),), file=realstdout)
  print("realstderr FD: %d" % (realstderr.fileno(),), file=realstdout)
  print("sys.stdout FD: %d" % (sys.stdout.fileno(),), file=realstdout)
  print("sys.stderr FD: %d" % (sys.stderr.fileno(),), file=realstdout)

That should produce output that looks like this:

  realstdout FD: 1
  realstderr FD: 2
  sys.stdout FD: 3
  sys.stderr FD: 3

I hope that's a good hint...

I like the idea of simply calling the next program using one of the 
exec() variants, but you'll have to adjust the file descriptors, 
rather than just the names used by Python.
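
A minimal sketch of what I mean, using os.dup2() to point FD 1 and 
FD 2 at /dev/null before the exec:

  import os

  devnull = os.open(os.devnull, os.O_WRONLY)
  os.dup2(devnull, 1)   # -- replace FD 1 (stdout)
  os.dup2(devnull, 2)   # -- replace FD 2 (stderr)
  os.close(devnull)
  os.execl("/bin/ping", "ping", "-w", "20", "192.168.1.1")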

If you don't need to exec(), but just run a child, then here's the 
next hint (this is for Python 3.5):

  import subprocess
  cmd = ["ping", "-w", "20",  "192.168.1.1"]
  devnull = subprocess.DEVNULL
  proc = subprocess.run(cmd, stdout=devnull, stderr=devnull)
  proc.check_returncode()

(By the way, your "ping" command looked like it had an empty token 
in the second arg position.  Looked weird to me, so I removed it 
in my examples.)

For subprocess.run, see:

  https://docs.python.org/3/library/subprocess.html#subprocess.run

For earlier Python versions without run(), you can use Popen():

  import subprocess
  cmd = ["/bin/ping", "-w", "20",  "192.168.1.1"]
  devnull = subprocess.DEVNULL
  proc = subprocess.Popen(cmd, stdout=devnull, stderr=devnull)
  retcode = proc.wait()
  if retcode != 0:
      raise FlamingHorribleDeath

You will have to define FlamingHorribleDeath or figure out what you 
want to do in the event of the various different types of 
failure.  If you don't, then you'll just see this:

  NameError: name 'FlamingHorribleDeath' is not defined

Good luck,

-Martin

-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Average calculation Program *need help*

2016-05-12 Thread Martin A. Brown

Greetings kobsx4,

>Hello all, I have been struggling with this code for 3 hours now 
>and I'm still stumped. My problem is that when I run the following 
>code:

>--
>#this function will get the total scores
>def getScores(totalScores, number):
>    for counter in range(0, number):
>        score = input('Enter their score: ')
>        totalScores = totalScores + score
>
>        while not (score >= 0 and score <= 100):
>            print "Your score must be between 0 and 100."
>            score = input('Enter their score: ')
>
>    return totalScores
>--

>the program is supposed to find the average of two test scores and 
>if one of the scores is out of the score range (0-100), an error 
>message is displayed. The main problem with this is that when 
>someone types in a number outside of the range, it'll ask them to 
>enter two scores again, but ends up adding all of the scores 
>together (including the invalid ones) and dividing by how many 
>there are. Please help.

Suggestion #1: 
--
When you are stuck on a small piece of code, set it aside (stop 
looking at it) and start over again; sometimes rewriting with 
different variable names and a clean slate helps to highlight the 
problem.

Professional programmers will tell you that they are quite 
accustomed to 'throwing away' code.  Don't be afraid to do it.

(While you are still learning, you might want to keep the old chunk 
of code around to examine so that you can maybe figure out what you 
did wrong.)
  

Suggestion #2:
--
Put a print statement or two in the loop, so that you see how your 
variables are changing.  For example, just before your 'while' line, 
maybe something like:

  print "score=%d totalScores=%d" % (score, totalScores,)


Suggestion #3:
--
Break the problem down even smaller (Rustom Mody appears to have 
beat me to the punch on that suggestion, so I'll just point to his 
email.)


Hint #1:
----
What is the value of your variable totalScores each time through the 
loop?  Does it ever get reset?

Good luck with your debugging!

-Martin

-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: raise None

2015-12-31 Thread Martin A. Brown

Hi there,

>>> At worst it actively misleads the user into thinking that there 
>>> is a bug in _validate.

Is this "user" a software user or another programmer?

If a software user, then some hint about why the _validate found 
unacceptable data might benefit the user's ability to adjust inputs 
to the program.

If another programmer, then that person should be able to figure it 
out with the full trace.  Probably it's not a bug in _validate, but 
it could be.  So, it could be a disservice to the diagnostician 
to exempt the _validate function from suspicion.  Thus, I'd want to 
see _validate in the stack trace.

>Maybe. As I have suggested a number of times now, I'm aware that 
>this is just a marginal issue.
>
>But I think it is a real issue. I believe in beautiful tracebacks 
>that give you just the right amount of information, neither too 
>little nor too much. Debugging is hard enough without being given more 
>information than you need and having to decide what bits to ignore 
>and which are important.

I agree about tracebacks that provide the right amount of 
information.  If I were a programmer working with the code you are 
describing, I would like to know in any traceback that the failed 
comparisons (which implement some sort of business logic or sanity 
checking) occurred in the _validate function.

In any software system beyond the simplest, code/data tracing would 
be required to figure out where the bad data originated.

Since Python allows us to provide ancillary text to any exception, 
you could always provide a fuller explanation of the validation 
failure.  And, while you are at it, you could add the calling 
function name to the text to point the programmer faster toward the 
probable issue.

Adding one optional parameter to _validate (defaulting to the 
caller's function name) would allow you to point the way to a 
diagnostician.  Here's a _validate function I made up with two silly 
comparison tests--where a must be smaller than b, and neither a nor 
b may be convertible to an integer.

  def _validate(a, b, func=None):
      if not func:
          func = sys._getframe(1).f_code.co_name
      if a >= b:
          raise ValueError("a cannot be larger than b in " + func)
      if a == int(a) or b == int(b):
          raise TypeError("a, b must not be convertible to int in " + func)

My main point is less about identifying the calling function or its 
calling function, but rather to observe that arbitrary text can be 
used.  This should help the poor sap (who is, invariably, diagnosing 
the problem at 03:00) realize that the function _validate is not the 
problem.

>The principle is that errors should be raised as close to their 
>cause as possible. If I call spam(a, b) and provide bad arguments, 
>the earliest I can possibly detect that is in spam. (Only spam 
>knows what it accepts as arguments.) Any additional levels beyond 
>spam (like _validate) is moving further away:
>
>  File "spam", line 19, in this
>  File "spam", line 29, in that  <--- where the error really lies
>  File "spam", line 39, in other
>  File "spam", line 89, in spam  <--- the first place we could detect it
>  File "spam", line 5, in _validate  <--- where we actually detect it

Yes, indeed!  Our stock in trade.  I never liked function 'that'.  I 
much prefer function 'this'.

-Martin

  Q:  Who is Snow White's brother?
  A:  Egg white.  Get the yolk?

-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Help on return type(?)

2016-01-09 Thread Martin A. Brown

Hello there,

>> def make_cov(cov_type, n_comp, n_fea):
>> mincv = 0.1
>> rand = np.random.random
>> return {
>> 'spherical': (mincv + mincv * np.dot(rand((n_components, 1)),
>>  np.ones((1, n_features)))) ** 2,
>> 'tied': (make_spd_matrix(n_features)
>>  + mincv * np.eye(n_features)),
>> 'diag': (mincv + mincv * rand((n_components, n_features))) ** 2,
>> 'full': np.array([(make_spd_matrix(n_features)
>>+ mincv * np.eye(n_features))
>>   for x in range(n_components)])
>> }[cov_type]
>> 
>> Specifically, could you explain the meaning of
>> 
>> {
>> ...}[cov_type]
>> 
>> to me?
>
>It is a dictionary lookup. { ... } sets up a dictionary with keys
>
>'spherical'
>'tied'
>'diag'
>'full'
>
>then { ... }[cov_type] extracts one of the values depending on 
>whether cov_type is 'spherical', 'tied', 'diag', or 'full'.

You will see that Steven has answered your question.  I will add to 
his answer.

Your original function could be improved many ways, but especially 
in terms of readability.  Here's how I might go at improving the 
readability, without understanding anything about the actual 
computation.
  
  def make_cov_spherical(mincv, n_components, n_features):
      return (mincv + mincv * np.dot(np.random.random((n_components, 1)),
                                     np.ones((1, n_features)))) ** 2

  def make_cov_diag(mincv, n_components, n_features):
      return (mincv + mincv * np.random.random((n_components, n_features))) ** 2

  def make_cov_tied(mincv, n_components, n_features):
      return make_spd_matrix(n_features) + mincv * np.eye(n_features)

  def make_cov_full(mincv, n_components, n_features):
      return np.array([(make_spd_matrix(n_features) + mincv *
                        np.eye(n_features)) for x in range(n_components)])

  def make_cov(cov_type, n_comp, n_fea):
      mincv = 0.1
      dispatch_table = {
          'spherical': make_cov_spherical,
          'tied': make_cov_tied,
          'diag': make_cov_diag,
          'full': make_cov_full,
      }
      func = dispatch_table[cov_type]
      return func(mincv, n_comp, n_fea)

Some thoughts (and reaction to the prior code):

  * Your originally posted code referred to n_comp and n_fea in the 
signature, but then used n_components and n_features in the 
processing lines.  Did this function ever work?

  * Individual functions are easier to read and understand.  I would
find it easier to write testing code (and docstrings) for these 
functions, also.

  * The assignment of a name (rand = np.random.random) can make 
sense, but I think it simply shows that the original function 
was trying to do too many things and was hoping to save space
with this shorter name for the np.random.random.  Not bad, but I 
dropped it anyway for the sake of clarity.

  * Each of the above functions (which I copied nearly verbatim) 
could probably now be broken into one or two lines.  That would 
make the computation even clearer.

  * There may be a better way to make a function dispatch table than 
the one I demonstrate above, but I think it makes the point 
nicely.

  * If you break the individual computations into functions, then 
you only run the specific computation when it's needed.  In the 
    original example, all of the computations were run AND then, one 
of the results was selected.  It may not matter, since computers 
are so fast, but, practicing basic parsimony can avoid little 
obvious performance hazards like this.

  * In short, longer, but much much clearer.

Good luck,

-Martin

-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: ignoring or replacing white lines in a diff

2016-01-14 Thread Martin A. Brown

Hello Adriaan,

>Maybe someone here has a clue what is going wrong here? Any help is 
>appreciated.

Have you tried out this tool, which does precisely what you are 
trying to do yourself?

  https://pypi.python.org/pypi/xmldiff

I can't vouch for it in detail--I am simply a user--but I know that 
I have used it happily in the past.  (Other CLI tools exist, 
including non-Python tools such as xmllint, which can produce a 
predictable, reproducible XML formatting, too.)
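
If you'd rather stay in pure Python than shell out to diff, here's 
a minimal sketch that strips whitespace-only text nodes with 
ElementTree and compares the serialized trees:

  import xml.etree.ElementTree as ET

  def normalized(path):
      root = ET.parse(path).getroot()
      for elem in root.iter():
          if elem.text and not elem.text.strip():
              elem.text = None
          if elem.tail and not elem.tail.strip():
              elem.tail = None
      return ET.tostring(root)

  # -- hypothetical filenames, for illustration only
  assert normalized('expected.xml') == normalized('test.xml')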

>I'm writing a regression test for a module that generates XML.

Very good.  Good == Testing.

>I'm using diff to compare the results with a pregenerated one from an
>earlier version.

[
Interesting.  I can only speculate randomly about the whitespace 
issue.  Have you examined (with the CLI tools hexdump, od or your 
favorite byte dumper) the two different XML outputs?
]

Back to the lands of Python

>  cmd   = ["diff", "-w", "-I '^[[:space:]]*$'", "./xml/%s.xml" % name, 
> "test.xml"]

It looks like a quoting issue.  I think you are passing the 
following tokens to your OS.  You should be able to run your Python 
program under a system call tracer to see what is actually getting 
exec()d.

I'm accustomed to using strace, but it seems that Macintosh uses 
dtruss.  Anyway, I think your cmd is turning into this (as for as 
your kernel is concerned):

   token 1: diff
   token 2: -w
   token 3: -I '^[[:space:]]*$'
   token 4: ./xml/name.xml
   token 5: test.xml

Try this (untested):

>  cmd = ["diff", "-w", "-I", "^[[:space:]]*$", "./xml/%s.xml" % name, 
> "test.xml"]

But, perhaps the xmldiff module will be what you want.

-Martin

-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: psss...I want to move from Perl to Python

2016-02-02 Thread Martin A. Brown

Hello,

>http://www.barnesandnoble.com/w/perl-to-python-migration-martin-c-brown/1004847881?ean=9780201734881
>
>Given that this was published in 2001, surely it is time for a 
>second edition.

How many times do you think somebody migrates from Perl to Python?!

  ;)

-Martin

P.S.  I was amused when I first discovered (about 15 years ago)
   Martin C. Brown, an author of Perl books.  I am also amused to 
   discover that he has written one on Python.  Too many of us
   chaps named 'Martin Brown'.

 https://en.wikipedia.org/wiki/Radio_Active_(radio_series)
 the incompetent hospital-radio trained Martin Brown (Stevens)

P.P.S.  In case it is not utterly clear, I am not the above author.

-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Exception handling for socket.error in Python 3.5/RStudio

2016-02-05 Thread Martin A. Brown

>except socket.error as e

>line 53 except socket.error as e ^ SyntaxError: invalid syntax
>
>I tried changing socket.error to ConnectionRefusedError. and still 
>got the same error.

>Please tell me if the problem is with Rstudio, Python version or 
>the syntax.

Syntax.

Your code has, unfortunately, suffered a colonectomy.

When you transplant a colon, it is more likely to function properly 
again.  For example:

   except socket.error as e:

Good luck,

-Martin

-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Exception handling for socket.error in Python 3.5/RStudio

2016-02-05 Thread Martin A. Brown

Hi there Shaunak,

I saw your few replies to my (and Nathan's) quick identification of 
syntax error.  More comments follow, here.

>I am running this python script on R-studio. I have Python 3.5 installed on my 
>system.
>
>count = 10
>while (count > 0):
>    try :
>        # read line from file:
>        print(file.readline())
>        # parse
>        parse_json(file.readline())
>        count = count - 1
>    except socket.error as e
>        print('Connection fail', e)
>        print(traceback.format_exc())
>
># wait for user input to end
># input("\n Press Enter to exit...");
># close the SSLSocket, will also close the underlying socket
>ssl_sock.close()
>
>The error I am getting is here:
>
>line 53 except socket.error as e ^ SyntaxError: invalid syntax
>
>I tried changing socket.error to ConnectionRefusedError. and still got the 
>same error.

We were assuming that line 53 in your file is the part you pasted 
above.  That clearly shows a syntax error (the missing colon).

If, after fixing that error, you are still seeing errors, then the 
probable explanations are:

  * you are not executing the same file you are editing

  * there is a separate syntax error elsewhere in the file (you sent
us only a fragment)

Additional points:

  * While the word 'file' is not reserved in Python 3.x, it is in 
Python 2.x, so, just be careful when working with older Python 
versions.  You could always change your variable name, but you 
do not need to.

  * When you catch the error in the above, you print the traceback 
information, but your loop will continue.  Is that what you 
desired?

I might suggest saving your work carefully and make sure that you 
are running the same code that you are working on.  Then, if you 
are still experiencing syntax errors, study the lines that the 
interpreter is complaining about.  And, of course, send the list an 
email.

Best of luck,

-Martin

-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Exception handling for socket.error in Python 3.5/RStudio

2016-02-05 Thread Martin A. Brown

Hi there,

>Thanks for the detailed reply. I edited, saved and opened the file 
>again. Still I am getting exactly the same error.
>
>Putting bigger chunk of code and the error again:

[snipped; thanks for the larger chunk]

>Error:
>except socket.error as e:
> ^
>SyntaxError: invalid syntax

I ran your code.  I see this:

  $ python3 shaunak.bangale.py 
  Connecting...
  Connection succeeded
  Traceback (most recent call last):
File "shaunak.bangale.py", line 23, in 
  ssl_sock.write(bytes(initiation_command, 'UTF-8'))
  NameError: name 'initiation_command' is not defined

Strictly speaking, I don't think you are having a Python problem.

  * Are you absolutely certain you are (or your IDE is) executing 
the same code you are writing?

  * How would you be able to tell?  Close your IDE.  Run the code on 
the command-line.

  * How much time have you taken to work out what the interpreter is 
telling you?

Good luck,

-Martin

-- 
Martin A. Brown
http://linux-ip.net/

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Suggested datatype for getting latest information from log files

2016-02-11 Thread Martin A. Brown
;: ")
pprint.pprint(marblehistory)

if __name__ == '__main__':
    import sys
    if len(sys.argv) > 1:
        count = int(sys.argv[1])
    else:
        count = 30
    marblegame(count)

# -- end of file

-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Make a unique filesystem path, without creating the file

2016-02-14 Thread Martin A. Brown

Good evening/morning Ben,

>> > I am unconcerned with whether there is a real filesystem entry of
>> > that name; the goal entails having no filesystem activity for this.
>> > I want a valid unique filesystem path, without touching the
>> > filesystem.
>>
>> Your phrasing is ambiguous.
>
>The existing behaviour of ‘tempfile.mktemp’ – actually of its 
>internal class ‘tempfile._RandomNameSequence’ – is to generate 
>unpredictable, unique, valid filesystem paths that are different 
>each time.
>
>That's the behaviour I want, in a public API that exposes what 
>‘tempfile’ already has implemented, documented in a way that 
>doesn't create a scare about security.

If your code is not actually touching the filesystem, then it will 
not be affected by the race condition identified in the 
tempfile.mktemp() warning anyway.  So, I'm unsure of your worry.

>> But if you explain in more detail why you want this filename, perhaps
>> we can come up with some ideas that will help.
>
>The behaviour is already implemented in the standard library. What 
>I'm looking for is a way to use it (not re-implement it) that is 
>public API and isn't scolded by the library documentation.

I might also suggest the (bound) method _create_tmp() on class 
mailbox.Maildir, which achieves roughly the same goals, but for a 
permanent file.

Of course, that particular method also touches the filesystem.  The 
Maildir naming approach is based on the assumptions* that time is 
monotonically increasing, that system nodes never share the same 
name and that you don't need more than 1 uniquely named file per 
directory per millisecond.

If so, then you can use the 9 or 10 lines of that method.
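
If it helps, here is a rough sketch of that naming scheme.  It is 
modeled on mailbox.Maildir._create_tmp, not copied from it, so 
treat the details as my own approximation and read the mailbox 
source for the real thing:

  import itertools
  import os
  import socket
  import time

  _count = itertools.count(1)

  def maildir_style_name():
      # -- time, microseconds, PID, per-process counter, sanitized host
      now = time.time()
      host = socket.gethostname().replace('/', r'\057').replace(':', r'\072')
      return '%s.M%sP%sQ%s.%s' % (int(now), int(now % 1 * 1e6),
                                  os.getpid(), next(_count), host)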

Good luck,

-Martin

  * I was tempted to joke about these guarantees, but I think 
that undermines my basic message.  To wit, you can probably rely 
on this naming technique about as much as you can rely on your 
system clock.  I'll assume that you aren't naming all of your 
nodes 'franklin.p.gundersnip'.

-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: asyncio - run coroutine in the background

2016-02-20 Thread Martin A. Brown

Hello there,

I realize that this discussion of supporting asynchronous name 
lookup requests in DNS is merely a detour in this thread on asyncio, 
but I couldn't resist mentioning an existing tool.

>> getaddrinfo is a notorious pain but I think it's just a library 
>> issue; an async version should be possible in principle.  How 
>> does Twisted handle it?  Does it have a version?
>
>In a (non-Python) program of mine, I got annoyed by synchronous 
>name lookups, so I hacked around it: instead of using the regular 
>library functions, I just do a DNS lookup directly (which can then 
>be event-based - send a UDP packet, get notified when a UDP packet 
>arrives). Downside: Ignores /etc/nsswitch.conf and /etc/hosts, and 
>goes straight to the name server. Upside: Is able to do its own 
>caching, since the DNS library gives me the TTLs, but 
>gethostbyname/getaddrinfo won't.

Another (non-Python) DNS name lookup library that does practically 
the same thing (along with the shortcomingsn you mentioned, Chris: 
no NSS nor /etc/hosts) is the adns library.  Well, it is DNS, after 
all.

  http://www.gnu.org/software/adns/
  https://pypi.python.org/pypi/adns-python/1.2.1

And, there are Python bindings.  I have been quite happy using the 
adns tools (and tools built on the Python bindings) for mass lookups 
(millions of DNS names).  It works very nicely.
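
For a taste of the Python bindings, a tiny sketch (assuming the 
adns-python package is installed; see its bundled examples for the 
full API):

  import adns

  c = adns.init()
  # -- synchronous A-record lookup; returns (status, cname, expires, answers)
  print(c.synchronous('www.python.org', adns.rr.A))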

Just sharing knowledge of an existing tool,

-Martin

-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Network Simulator

2016-02-24 Thread Martin A. Brown

>Hi...I need help to design a network simulator consisting for 5 
>routers in python...Any help would be appreciated...

Have you looked at existing network simulators?

On two different ends of the spectrum are:

  Switchyard, a small network simulator intended for pedagogy
  https://github.com/jsommers/switchyard

  NS-3, the researcher's toolkit
  https://www.nsnam.org/
  https://www.nsnam.org/wiki/Python_bindings

Good luck,

-Martin

-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: tcp networking question (CLOSE_WAIT)

2016-02-25 Thread Martin A. Brown

>I'm new to python networking. I am waiting TCP server/client app by 
>using python built-in SocketServer. My problem is if client get 
>killed, then the tcp port will never get released, in CLOSE_WAIT

I did not thoroughly review your code (other than to see that you 
are not using SO_REUSEADDR).  This is the most likely problem.

Suggestion:

  man 7 socket

Look for SO_REUSEADDR.  Then, apply what you have learned to your 
code.
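
A minimal sketch of that socket-level knob; note that the 
SocketServer classes you are using expose the same thing as the 
allow_reuse_address class attribute:

  import socket

  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
  s.bind(('', 8000))   # -- port 8000 is only an example
  s.listen(5)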

-Martin

-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: tcp networking question (CLOSE_WAIT)

2016-02-25 Thread Martin A. Brown

Hello again Ray,

>> >I'm new to python networking. I am waiting TCP server/client app by 
>> >using python built-in SocketServer. My problem is if client get 
>> >killed, then the tcp port will never get released, in CLOSE_WAIT
>> 
>> I did not thoroughly review your code (other than to see that you 
>> are not using SO_REUSEADDR).  This is the most likely problem.
>>
>> Suggestion:
>> 
>>   man 7 socket
>> 
>> Look for SO_REUSEADDR.  Then, apply what you have learned to your 
>> code.
>
>it's not I can't bind the address, my problem is: server is long 
>run. if client die without "disconnect" then server will leak one 
>socket.

Sorry for my trigger-happy, and incorrect reply.

After so many years, I should know better than to reply without 
completely processing questions.  Apologies.

>by using the built-in thread socket server. the extra tcp port are 
>opened by built-in class itself. if the handler() is finish 
>correctly (the line with break) then this socket will get cleaned 
>up. but if client dies, then I am never get out from that True 
>loop. so the socket will keep in close_wait
>
>I found the issue. It's my own stupid issue.
>i did "continue" if no data received.
>just break from it then it will be fine

Well, I'm glad you found the issue.

Best of luck,

-Martin

-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: common mistakes in this simple program

2016-02-29 Thread Martin A. Brown
  >>> altCatchTVError('str')
  42
  >>> altCatchTVError(dict())
  -42


Interlude and recommendation
----------------------------

As you can see, there are many possible Exceptions that can be raised when you
are calling a simple builtin function, int().

Consider now what may happen when you call out to a different program; you
indicated that your run() function calls out to subprocess.Popen().  There are
many more possible errors that can occur, just a few that can come to my mind:

  * locating the program on disk
  * setting up the file descriptors for the child process
  * fork()ing and exec()ing the program
  * memory issues
  * filesystem disappears (network goes away or block device failure)

Each one of these possible errors may translate to a different exception.  You
have been tempted to do:

  try:
      run()
  except:
      pass

This means that, no matter what happens, you are going to try to keep
continuing, even in the face of massive failure.

To (#1) improve the safety of your program and the environments in 
which it operates, to (#2) improve your defensive programming 
posture and to (#3) avoid frustrating your own debugging at some 
point in the future, you would be well-advised to identify which 
specific exceptions you want to ignore.

As you first try to improve the resilience of your program, you may 
not be certain which exceptions you want to catch and which 
represent a roadblock for your progam.  This is something 
that usually comes with experience. 

To get that experience you can define your own exception (it'll 
never get raised unless you raise it, so do not worry).  Then, 
create your try-except block to catch only that one.  As you 
encounter other exception that you are certain you wish to handle, 
you can do something with them:

  class UnknownException(Exception):
      pass


  def prep_host():
      """
      Prepare clustering
      """
      for cmd in ["ls -al",
                  "touch /tmp/file1",
                  "mkdir /tmp/dir1"]:
          try:
              if not run_cmd_and_verify(cmd, timeout=3600):
                  logging.info("Preparing cluster failed ...")
                  return False
          except (UnknownException,):
              pass
      logging.info("Preparing Cluster.Done !!!")
      return True

Now, as you develop your program and encounter new exceptions, you 
can add new except clauses to the above block with appropriate 
handling, or (re-)raising the caught exception.


Comments on shelling out to other programs and using exceptions
---------------------------------------------------------------
Exceptions are great for catching logic errors, type errors, 
filesystem errors and all manner of other errors within Python 
programs and runtime environments.  You introduce a significant 
complexity the moment you fork a child (calling subprocess.Popen).

It is good, though, that you are testing the return code of the 
cmd that you pass to the run() function.


Final advice:
-------------
Do not use a bare try-except.

You will frustrate your own debugging, and your software may end up 
executing code paths (or external programs, as you are doing right 
now) that your sanity checks were meant to guard against.

-Martin

-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Caching function results

2016-03-03 Thread Martin A. Brown

Greetings Pavel,

> Suppose, I have some resource-intensive tasks implemented as 
> functions in Python. Those are called repeatedly in my program. 
> It's guaranteed that a call with the same arguments always produces 
> the same return value. I want to cache the arguments and return 
> values and in case of a repetitive call immediately return the 
> result without doing expensive calculations.

Great problem description.  Thank you for being so clear.

[I snipped sample code...]

This is generically called memoization.

> Do you like this design or maybe there's a better way with 
> Python's included batteries?

In Python, there's an implementation available for you in the 
functools module.  It's called lru_cache.  LRU means 'Least Recently 
Used'.

> I'd also like to limit the size of the cache (in MB) and get rid 
> of old cached data. Don't know how yet.

You can also limit the size of the lru_cache provided by the 
functools module.  For this function, the size is calculated by 
number of entries--so you will need to figure out memory size to 
cache entry count.

Maybe others who have used functools.lru_cache can help you with how 
they solved the problem of mapping entry count to memory usage.
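
Here's a minimal sketch of the decorator in use; remember that 
maxsize counts cache entries, not bytes:

  import functools

  @functools.lru_cache(maxsize=1024)
  def expensive(x, y):
      return x ** y   # -- stand-in for the costly computation

  expensive(2, 1000)             # computed
  expensive(2, 1000)             # served from the cache
  print(expensive.cache_info())  # hits=1 misses=1 ...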

Good luck,

-Martin

 [0] https://docs.python.org/3/library/functools.html#functools.lru_cache

-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Simple exercise

2016-03-10 Thread Martin A. Brown

>>> for i in range(len(names)):
>>> print (names[i],totals[i])
>>
>> Always a code smell when range() and len() are combined.
>
> Any other way of traversing two lists in parallel?

Yes.  Builtin function called 'zip'.

  https://docs.python.org/3/library/functions.html#zip

Toy example:

  import string
  alpha = string.ascii_lowercase
  nums = range(len(alpha))
  for N, A in zip(nums, alpha):
      print(N, A)

Good luck,

-Martin

-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: issue with csv module (subject module name spelling correction, too)

2016-03-11 Thread Martin A. Brown

Good afternoon Fillmore,

>>>> import csv
>>>> s = '"Please preserve my doublequotes"\ttext1\ttext2'
>>>> reader = csv.reader([s], delimiter='\t')

> How do I instruct the reader to preserve my doublequotes?

Change the quoting used by the dialect on the csv reader instance:

  reader = csv.reader([s], delimiter='\t', quoting=csv.QUOTE_NONE)

You can use the same technique for the writer.

If you cannot create your particular (required) variant of csv by 
tuning the available parameters in the csv module's dialect control, 
I'd be a touch surprised, but it is possible that your other csv
readers and writers are more finicky.

Did you see the parameters that are available to you for tuning how 
the csv module turns your csv data into records?

  https://docs.python.org/3/library/csv.html#dialects-and-formatting-parameters

Judging from your example, you definitely want to use 
quoting=csv.QUOTE_NONE, because you don't want the module to do much 
more than split('\t').
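
A quick check of the effect:

  >>> import csv
  >>> s = '"Please preserve my doublequotes"\ttext1\ttext2'
  >>> next(csv.reader([s], delimiter='\t', quoting=csv.QUOTE_NONE))
  ['"Please preserve my doublequotes"', 'text1', 'text2']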

Good luck,

-Martin

-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Perl to Python again

2016-03-11 Thread Martin A. Brown

Good afternoon Fillmore,

> So, now I need to split a string in a way that the first element 
> goes into a string and the others in a list:
>
> while($line = <STDIN>) {
>
>my ($s,@values)  = split /\t/,$line;
>
> I am trying with:
>
> for line in sys.stdin:
>s,values = line.strip().split("\t")
>print(s)
>
> but no luck:
>
> ValueError: too many values to unpack (expected 2)

That means that the number of items on the right hand side of the 
assignment (returned from the split() call) did not match the number 
of variables on the left hand side.

> What's the elegant python way to achieve this?

Are you using Python 3?

  s = 'a,b,c,d,e'
  p, *remainder = s.split(',')
  assert isinstance(remainder, list)

Are you using Python 2?

  s = 'a,b,c,d,e'
  remainder = s.split(',')
  assert isinstance(remainder, list)
  p = remainder.pop(0)

Aside from your csv question today, many of your questions could be 
answered by reading through the manual documenting the standard 
datatypes (note, I am assuming you are using Python 3).

  https://docs.python.org/3/library/stdtypes.html

It also sounds as though you are applying your learning right away.  
If that's the case, you might also benefit from reading through all 
of the services that are provided in the standard library with 
Python:

  https://docs.python.org/3/library/

In terms of thinking Pythonically, you may benefit from:

  The Python Cookbook (O'Reilly)
  http://shop.oreilly.com/product/0636920027072.do

  Python Module of the Week
  https://pymotw.com/3/

I'm making those recommendations because I know and have used these 
and also because of your Perl background.

Good luck,

-Martin

-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: retrieve key of only element in a dictionary (Python 3)

2016-03-18 Thread Martin A. Brown

>> But, I still don't understand why this works and can't puzzle it
>> out.  I see a sequence on the left of the assignment operator and a
>> dictionary (mapping) on the right.
>
>When you iterate over a dictionary, you get its keys:
>
>scores = {"Fred": 10, "Joe": 5, "Sam": 8}
>for person in scores:
>print(person)
>
>So unpacking will give you those keys - in an arbitrary order. Of 
>course, you don't care about the order when there's only one.

Oh, right!  Clearly, it was nonintuitive (to me), even though I've 
written 'for k in d:' many times.

A sequence on the left-hand side of an assignment tells Python to 
iterate over the right-hand side.

This also explains something I never quite bothered to understand 
completely, because it was so obviously wrong:

  >>> a, b = 72
  TypeError: 'int' object is not iterable

The sequence on the left hand side signals that it expects the 
result of iter(right hand side).  But, iter(72) makes no sense, so 
Python says TypeError.  I'd imagine my Python interpreter is 
thinking "Dude, why are you telling me to iterate over something 
that is so utterly not iterable.  Why do I put up with these 
humans?"

I love being able to iterate like this:

  for k in d:
  do_something_with(k)

But, somehow, this surprised me:

  [k] = d

Now that I get it, I would probably use something like the below.  
I find the addition of a few characters makes this assignment much 
clearer to me.

  # -- if len(d) > 1, ValueError will be raised
  #
  (key,) = d.keys()  

And thank you for the reply Chris,

-Martin

-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: retrieve key of only element in a dictionary (Python 3)

2016-03-19 Thread Martin A. Brown

OK, so ... I'll bite!

>>> d = {"squib": "007"}
>>> key, = d

Why exactly does this work?

I understand why the following three are similar and why they all 
work alike in this situation:

   key, = d
   (key,) = d
   [key] = d

I also, intuitively understand that, if the dictionary d contains 
more than 1 key, that the above assignments would cause:

  ValueError: too many values to unpack

But, I still don't understand why this works and can't puzzle it 
out.  I see a sequence on the left of the assignment operator and a 
dictionary (mapping) on the right.

I looked through the dunder methods [0], but none of them explained 
this, apparently, left-hand-side context-sensitive, behaviour to me.

Could somebody explain?

-Martin

  [0] for dict(), I found:  __cmp__, __contains__, __delitem__, 
  __eq__, __ge__, __getattribute__, __getitem__, __gt__, 
  __init__, __iter__, __le__, __len__, __lt__, __ne__, 
      __repr__, __setitem__ and __sizeof__
-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Beginner Python Help

2016-03-19 Thread Martin A. Brown

Greetings Alan and welcome to Python,

>I just started out python and I was doing a activity where im 
>trying to find the max and min of a list of numbers i inputted.
>
>This is my code..
>
>num=input("Enter list of numbers")
>list1=(num.split())
>
>maxim= (max(list1))
>minim= (min(list1))
>
>print(minim, maxim)
>
>So the problem is that when I enter numbers with an uneven amount 
>of digits (e.g. I enter 400 20 36 85 100) I do not get 400 as the 
>maximum nor 20 as the minimum. What have I done wrong in the code?

I will make a few points, as will probably a few others who read 
your posting.

  * [to answer your question] the builtin function called input [0]
returns a string, but you are trying to get the min() and max() 
of numbers; therefore you must convert your strings to numbers

You can determine if Python thinks the variable is a string or 
a number in two ways (the interactive prompt is a good place to
toy with these things).  Let's look at a string:

  >>> s = '200 elephants'
  >>> type(s) # what type is s?
  <class 'str'>   # oh! it's a string
  >>> s   # what's in s?
  '200 elephants' # value in quotation marks!

   The quotation marks are your clue that this is a string, not a 
   number; in addition to seeing the type.  OK, so what about a 
   number, then?  (Of course, there are different kinds of numbers, 
   complex, real, float...but I'll stick with an integer here.)

  >>> n = 42
  >>> type(n) # what type is n?
  <class 'int'>   # ah, it's an int (integer)
  >>> n   # what's in n?
  42  # the value

  * Now, perhaps clearer?  max(['400', '20', '36', '85', '100'])
    is comparing your list of strings lexicographically instead of
    numerically (as numbers); in the same way that the string
    'rabbit' sorts later than 'elephant', so too does '85' sort
    later than '400' (see the quick demonstration after this list)

  * it is not illegal syntax to use parentheses as you have, but you
are using too many in your assignment lines; I'd recommend 
dropping that habit before you start; learn when parentheses are 
useful (creating tuples, calling functions, clarifying 
precedence); do not use them here:

   list1 = (num.split())  # -- extraneous and possibly confusing
   list1 = num.split()# -- just right

  * also, there is the Tutor mailing list [1] devoted to helping
    with Python language acquisition (discussions on this main list
    can sometimes be more involved than many beginners wish to read)
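
Here is that quick demonstration at the interactive prompt:

  >>> max(['400', '20', '36', '85', '100'])   # strings: lexicographic
  '85'
  >>> min(['400', '20', '36', '85', '100'])
  '100'
  >>> max([400, 20, 36, 85, 100])             # ints: numeric
  400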

I notice that you received several answers already, but I'll finish 
this reply and put your sample program back together for you:

  num = input("Enter list of numbers: ")
  list1 = list(map(int, num.split()))
  print(list1)
  maxim = max(list1)
  minim = min(list1)
  print(minim, maxim)

You may notice that map [2] function in there.  If you don't 
understand it, after reading the function description, I'd give you 
this example for loop that produces the same outcome.

  list1 = list()
  for n in num.split():
  list1.append(int(n))

The map function is quite useful, so it's a good one to learn early.
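
(A list comprehension spells the same conversion a third way:

  list1 = [int(n) for n in num.split()]

All three produce the same list; pick whichever reads best to you.)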

Good luck,

-Martin

 [0] https://docs.python.org/3/library/functions.html#input
 [1] https://mail.python.org/mailman/listinfo/tutor/
 [2] https://docs.python.org/3/library/functions.html#map

-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Path when reading an external file

2016-03-28 Thread Martin A. Brown

Greetings,

> In a program "code.py" I read an external file "foo.txt" supposed 
> to be located in the same directory that "code.py"
>
> python/src/code.py
> python/src/foo.txt
>
> In "code.py": f = open('foo.txt', 'r')
>
> But if I run "python code.py" in an other dir than src/ say in 
> python/, it will not work because file "foo.txt" will be searched 
> in dir python/ and not in dir python/src/
>
> I think it is possible to build an absolute path for "foo.txt" 
> using __file__ so that the program works wherever you launch 
> "python code.py"
>
> Is it the correct way to handle this problem ?

Ayup, I would say so.  My suggested technique:

  import os

  here = os.path.dirname(os.path.abspath(__file__))
  foo = os.path.join(here, 'foo.txt')
  with open(foo, 'r') as f:
      pass  # -- read from f here
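
If you're on Python 3.4 or newer, the pathlib module does the same 
job; a minimal equivalent sketch:

  from pathlib import Path

  here = Path(__file__).resolve().parent
  foo = here / 'foo.txt'
  with foo.open('r') as f:
      pass  # -- read from f here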

Good luck,

-Martin

-- 
Martin A. Brown
http://linux-ip.net/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Most space-efficient way to store log entries

2015-10-28 Thread Martin A. Brown

Hello Marc,

I think you have gotten quite a few answers already, but I'll add my 
voice.

> I'm writting an application that saves historical state in a log 
> file.

If I were in your shoes, I'd probably use the logging module rather 
than saving state in my own log file.  That allows the application 
to send all historical state to the system log.  Then, it could be 
captured, recorded, analyzed and purged (or neglected) along with 
all of the other logging.
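
If it helps, here's a minimal sketch of what I mean, assuming a 
typical Linux box where syslog listens on the /dev/log socket 
('myapp' is just a placeholder name):

  import logging
  import logging.handlers

  logger = logging.getLogger('myapp')
  logger.setLevel(logging.INFO)
  # -- hand log records to the local syslog daemon
  logger.addHandler(
      logging.handlers.SysLogHandler(address='/dev/log'))
  logger.info('historical state: step=%d status=%s', 7, 'ok')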

But, this may not be appropriate for your setup.  See also my final 
two questions at the bottom.

> I want to be really efficient in terms of used bytes.

It is good to want to be efficient.  Don't cost your (future) self 
or some other poor schlub extra work or computational overhead down 
the road, though!

Somebody may one day want to extract utility out of the 
application's log data.  So, don't make that data too hard to read.

> What I'm doing now is:
> 
> 1) First use zlib.compress

... assuming you are going to write your own files, then, certainly.

If you also want better compression (quantified in a table below) at 
a higher CPU cost, try bz2 or lzma (Python3).  Note that there is 
not a symmetric CPU cost for compression and decompression.  
Usually, decompression is much cheaper.

  # compress = bz2.compress
  # compress = lzma.compress
  compress = zlib.compress

To read the logging data, then the programmer, application analyst 
or sysadmin will need to spend CPU to uncompress.  If it's rare, 
that's probably a good tradeoff.

Here's my small comparison matrix of the time it takes to transform 
a sample log file that was roughly 33MB (in memory, no I/O costs 
included in timing data).  The chart also shows the size of the 
compressed data, in bytes and percentage (to demonstrate compression 
efficiency).

   format             bytes   pct   walltime
   raw             34311602  100%       0.0s
   base64-encode   46350762  135%   0.43066s
   zlib-compress    3585508   10%   0.54773s
   bz2-compress     2704835    8%   4.15996s
   lzma-compress    2243172    7%  15.89323s
   base64-decode   34311602  100%   0.18933s
   bz2-decompress  34311602  100%   0.62733s
   lzma-decompress 34311602  100%   0.22761s
   zlib-decompress 34311602  100%   0.07396s

The point of a sample matrix like this is to examine the tradeoff 
between time (for compression and decompression) and to think about 
how often you, your application or your users will decompress the 
historical data.  Also consider exactly how sensitive you are to 
bytes on disk.  (N.B. Data from a single run of the code.)
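
If you want to produce a matrix like this for your own data, here's 
a rough Python 3 sketch of the measurement loop (not the exact 
script behind the numbers above; 'sample.log' is a stand-in for 
whatever logfile you feed it):

  import bz2
  import lzma
  import time
  import zlib

  def measure(name, func, payload):
      '''time one compression call; report size, ratio, walltime'''
      t0 = time.perf_counter()
      out = func(payload)
      elapsed = time.perf_counter() - t0
      print('%-16s %9d %5.0f%% %10.5fs'
            % (name, len(out), 100.0 * len(out) / len(payload),
               elapsed))

  with open('sample.log', 'rb') as f:
      raw = f.read()

  for name, func in (('zlib-compress', zlib.compress),
                     ('bz2-compress', bz2.compress),
                     ('lzma-compress', lzma.compress)):
      measure(name, func, raw)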

Finally, simply make a choice for one of the compression algorithms.

> 2) And then remove all new lines using binascii.b2a_base64, so I 
> have a log entry per line.

I'd also suggest that you resist the base64 temptation.  As others 
have pointed out, there's a benefit to keeping the logs compressed 
using one of the standard compression tools (zgrep, zcat, bzgrep, 
lzmagrep, xzgrep, etc.)

Also, see the statistics above for proof--base64 encoding is not 
compression.  Rather, it usually expands input data to the tune of 
one third (see above, the base64 encoded string is 135% of the raw 
input).
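
The four-thirds expansion is easy to verify at the prompt, since 
base64 emits four output bytes for every three input bytes:

  >>> import base64
  >>> len(base64.b64encode(b'x' * 300))
  400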

That's not compression.  So, don't do it.  In this case, it's 
expansion and obfuscation.  If you don't need it, don't choose it.

In short, base64 is actively preventing you from shrinking your 
storage requirement.

> but b2a_base64 is far from ideal: adds lots of bytes to the 
> compressed log entry. So, I wonder if perhaps there is a better 
> way to remove new lines from the zlib output? or maybe a different 
> approach?

Suggestion:  Don't worry about the single-byte newline terminator.  
Look at a whole logfile and choose your best option.

Lastly, I have one other pair of questions for you to consider.

Question one:  Will your application later read or use the logging 
data?  If no, and it is intended only as a record for posterity, 
then, I'd suggest sending that data to the system logs (see the 
'logging' module and talk to your operational people).

If yes, then question two is:  What about resilience?  Suppose your 
application crashes in the middle of writing a (compressed) logfile.  
What does it do?  Does it open the same file?  (My personal answer 
is always 'no.')  Does it open a new file?  When reading the older 
logfiles, how does it know where to resume?  Perhaps you can see my 
line of thinking.

Anyway, best of luck,

-Martin

P.S. The exact compression ratio is dependent on the input.  I have 
  rarely seen zlib at 10% or bz2 at 8%.  I conclude that my sample 
  log data must have been more homogeneous than the data on which I 
  derived my mental bookmarks for textual compression efficiencies 
  of around 15% for zlib and 12% for bz2.