Re: Looking for Coders or Testers for an Open Source File Organizer
On 13/06/2011 11:55 PM, zainul franciscus wrote:
> I know you guys must be thinking "Hmm, Miranda, isn't that an IM
> application?" Yep, I hear you. I'll change the name once I get a good
> name. I am open to any suggestions.

Actually I was thinking "isn't that a functional programming language?"

My suggestion: Cruftbuster

--
http://mail.python.org/mailman/listinfo/python-list
Re: Finding keywords
On 08/03/2011 8:58 AM, Cross wrote:
> I know meta tags contain keywords but they are not always reliable. I
> can parse xhtml to obtain keywords from meta tags; but how do I verify
> them? To obtain reliable keywords, I have to parse the plain text
> obtained from the URL.

I think what the OP is asking about is extracting key words from a
text, i.e. a short list of words that characterize the text. This is an
information retrieval problem, not really a Python problem.

One simple way to do this is to calculate word frequency histograms for
each document in your corpus, and then for a given document, select
words that are frequent in that document but infrequent in the corpus
as a whole. Whoosh does this. There are different ways of calculating
the importance of words, and stemming and conflating synonyms can give
you better results as well.

A more sophisticated method uses "part of speech" tagging. See the
Python Natural Language Toolkit (NLTK) and topia.termextract for more
information.

http://pypi.python.org/pypi/topia.termextract/

Yahoo has a web service for key word extraction:

http://developer.yahoo.com/search/content/V1/termExtraction.html

You might want to investigate these resources and try google searches
for e.g. "extracting key terms from documents" and then come back if
you have a question about the Python implementation.

Cheers,

Matt

--
http://mail.python.org/mailman/listinfo/python-list
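For anyone who wants to experiment, here's a minimal sketch of the
frequency-histogram idea described above, using only the standard
library. The toy corpus and the scoring formula are my own
illustration, not Whoosh's actual algorithm:

    # Score words that are frequent in one document but rare in the
    # corpus as a whole (a crude TF-IDF-style heuristic).
    import re
    from collections import Counter

    def tokenize(text):
        return re.findall(r"[a-z']+", text.lower())

    def key_terms(doc, corpus, count=5):
        doc_freq = Counter(tokenize(doc))
        corpus_freq = Counter()
        for other in corpus:
            corpus_freq.update(tokenize(other))
        # The more often a word appears elsewhere in the corpus, the
        # lower its score for this document.
        scores = dict((word, freq / float(corpus_freq[word]))
                      for word, freq in doc_freq.items())
        return sorted(scores, key=scores.get, reverse=True)[:count]

    corpus = ["the cat sat on the mat",
              "the dog chased the cat",
              "pythons are large snakes"]
    print(key_terms("the dog chased the cat", corpus, count=2))
    # Picks out words frequent here but rare elsewhere,
    # e.g. ['dog', 'chased']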
Re: Reading/Writing files
On 18/03/2011 5:33 PM, Jon Herman wrote:
> I am pretty new to Python and am trying to write data to a file.
> However, I seem to be misunderstanding how to do so. For starters, I'm
> not even sure where Python is looking for these files or storing them.
> The directories I have added to my PYTHONPATH variable (where I import
> modules from successfully) do not appear to be it. So my question is:
> How do I tell Python where to look for opening files, and where to
> store new files?

This is how you write to a file in Python:

    myfile = open("path/to/the/file", "wb")
    myfile.write("Hello world!\n")
    myfile.close()

Beyond that, your message is too vague to offer any real help, but it
sounds like you're way off track. If the above code doesn't help,
please tell us exactly what you're trying to do, but you might want to
read a Python book such as Dive Into Python first.

Cheers,

Matt

--
http://mail.python.org/mailman/listinfo/python-list
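One detail worth spelling out, since it answers the "where is Python
looking" part of the question: PYTHONPATH only affects imports. open()
resolves a relative path against the process's current working
directory, so printing that directory, or building absolute paths,
usually clears up the confusion. A small sketch (the file name is just
an example):

    import os

    # Relative paths are resolved against the current working
    # directory, not PYTHONPATH.
    print(os.getcwd())

    # Building an absolute path removes the ambiguity entirely.
    path = os.path.join(os.getcwd(), "output.txt")
    f = open(path, "w")
    f.write("Hello world!\n")
    f.close()
    print(os.path.abspath(path))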
Re: Tips on Speeding up Python Execution
On 08/04/2011 11:31 AM, Chris Angelico wrote:
> On Sat, Apr 9, 2011 at 12:41 AM, MRAB wrote:
>> On 08/04/2011 08:25, Chris Angelico wrote:
>>> [snip]
>>> I don't know what's the most Pythonesque option, but if you already
>>> have specific Python code for each of your functions, it's probably
>>> going to be easiest to spawn threads for them all.
>>
>> "Pythonesque" refers to "Monty Python's Flying Circus". The word you
>> want is "Pythonic".

And the word for referring to the actual snake is "Pythonical" :P

--
http://mail.python.org/mailman/listinfo/python-list
Snowball to Python compiler
On the slim chance that (a) somebody worked on something like this but
never uploaded it to PyPI, and (b) the person who did (a) or heard
about it is reading this list ;) -- I'm looking for some code that will
take a Snowball program and compile it into a Python script. Or, less
ideally, a Snowball interpreter written in Python.

(http://snowball.tartarus.org/)

Anyone heard of such a thing? Thanks!

Matt

--
http://mail.python.org/mailman/listinfo/python-list
Re: Snowball to Python compiler
> A third (more-than-) possible solution: google("python snowball"); the
> first page of results has at least 3 hits referring to Python wrappers
> for Snowball.

There are quite a few wrappers for the C-compiled snowball stemmers,
but I'm looking for a pure-Python solution. It doesn't seem like there
is such a thing, but I figured I'd take a shot here before I think
about doing it myself :/

Matt

--
http://mail.python.org/mailman/listinfo/python-list
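For anyone finding this in the archives, here's a minimal sketch of
what the C-backed wrappers mentioned above look like, assuming the
PyStemmer package (this is exactly the kind of wrapper I mean, not the
pure-Python solution I'm after; outputs shown are what I'd expect from
the English Porter2 stemmer):

    # PyStemmer wraps the C-compiled Snowball stemmers.
    import Stemmer

    stemmer = Stemmer.Stemmer("english")
    print(stemmer.stemWord("stemming"))            # e.g. "stem"
    print(stemmer.stemWords(["cycling", "dogs"]))  # e.g. ["cycl", "dog"]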
Spamming PyPI with stupid packages
Someone seems to be spamming PyPI by uploading multiple stupid
packages. Not sure if it's some form of advertising spam or just
idiocy. Don't know if we should care though... maybe policing uploads
is worse than cluttering PyPI's disk space and RSS feed with dumb 1 KB
packages.

> girlfriend 1.0.1   10  A really simple module that allow everyone to
>                        do "import girlfriend"
> girlfriends 1.0     4  Girl Friends
> car 1.0             2  Car, a depended simple module that allow
>                        everyone to do "import girlfriend"
> house 1.0           2  House, a depended simple module that allow
>                        everyone to do "import girlfriend"
> money 1.0           2  Money, a depended simple module that allow
>                        everyone to do "import girlfriend"
> workhard 1.0        2  Keep working hard, a depended simple module
>                        that allow everyone to do "import girlfriend"

Matt

--
http://mail.python.org/mailman/listinfo/python-list
Multiprocessing problem
Hi,

I'm having a problem with the multiprocessing package. I'm trying to
use a simple pattern where a supervisor object starts a bunch of worker
processes, instantiating them with two queues (a job queue for tasks to
complete and a results queue for the results). The supervisor puts all
the jobs in the "job" queue, then join()s the workers, and then pulls
all the completed results off the "results" queue. (I don't think I can
just use something like Pool.imap_unordered for this because the
workers need to be objects with state.)

Here's a simplified example:

http://pastie.org/850512

The problem is that, seemingly randomly but almost always, the worker
processes will deadlock at some point and stop working before they
complete. This leaves the whole program stalled forever. It seems more
likely the more work each worker does (to the point where adding the
time.sleep(0.01) seen in the example code above guarantees it). The
problem occurs on both Windows and Mac OS X.

I've tried many random variations of the code (e.g. using
JoinableQueue, calling cancel_join_thread() on one or both queues even
though I have no idea what it does, etc.) but I keep having the
problem. Am I just using multiprocessing wrong? Is this a bug? Any
advice?

Thanks,

Matt

--
http://mail.python.org/mailman/listinfo/python-list
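Since the pastie link may not survive, here's a minimal reconstruction
of the pattern described above. The names and structure are
illustrative, not the original pastie code, and the drain-before-join
ordering near the bottom is the one the multiprocessing docs recommend:

    # Supervisor/worker pattern with a job queue and a results queue.
    import multiprocessing

    class Worker(multiprocessing.Process):
        def __init__(self, jobs, results):
            multiprocessing.Process.__init__(self)
            self.jobs = jobs
            self.results = results

        def run(self):
            while True:
                job = self.jobs.get()
                if job is None:  # sentinel: no more work
                    break
                self.results.put(job * job)  # stand-in for real work

    if __name__ == "__main__":
        jobs = multiprocessing.Queue()
        results = multiprocessing.Queue()
        workers = [Worker(jobs, results) for _ in range(4)]
        for w in workers:
            w.start()
        njobs = 100
        for i in range(njobs):
            jobs.put(i)
        for _ in workers:
            jobs.put(None)
        # Drain results BEFORE joining: a child that still has items
        # buffered for the queue won't terminate until they're flushed,
        # so join()ing first can deadlock.
        output = [results.get() for _ in range(njobs)]
        for w in workers:
            w.join()
        print(len(output))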
Re: Multiprocessing problem
On 3/2/2010 3:59 PM, Matt Chaput wrote:
> I'm trying to use a simple pattern where a supervisor object starts a
> bunch of worker processes, instantiating them with two queues (a job
> queue for tasks to complete and a results queue for the results). The
> supervisor puts all the jobs in the "job" queue, then join()s the
> workers, and then pulls all the completed results off the "results"
> queue.
>
> Here's a simplified example:
>
> http://pastie.org/850512

I should mention that if I change my code so the workers just pull
things off the job queue but don't put any results on the result queue
until after they see the None sentinel in the job queue and break out
of the loop, I don't get the deadlock. So it's something about getting
from one queue and putting to another queue in close proximity.

Hopefully I'm making a simple mistake with how I'm using the library
and it'll be easy to fix...

Thanks,

Matt

--
http://mail.python.org/mailman/listinfo/python-list
Re: Multiprocessing problem
> If the main process doesn't get the results from the queue until the
> worker processes terminate, and the worker processes don't terminate
> until they've put their results in the queue, and the pipe
> consequently fills up, then deadlock can result.

The queue never fills up... on platforms with qsize() I can see this. I
remove items from the results queue as I add to the job queue, and if I
add timeouts everywhere the workers never raise Empty and the
supervisor never raises Full. They just deadlock.

I've rewritten the code so the worker processes don't push information
back while they run; they just write to a temporary file which the
supervisor can read, which avoids the issue. But if anyone can tell me
what I was doing wrong, for future reference, I'd greatly appreciate
it.

Thanks,

Matt

--
http://mail.python.org/mailman/listinfo/python-list
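For the record, the likely answer is in the multiprocessing programming
guidelines under "Joining processes that use queues": a child process
that has put items on a queue keeps a feeder thread running until those
items are flushed into the underlying pipe, so join()ing the child
before draining the queue can deadlock even though the queue itself
never reports Full. A close paraphrase of the docs' minimal
demonstration:

    from multiprocessing import Process, Queue

    def f(q):
        # Large enough that it won't all fit in the pipe buffer, so the
        # child's feeder thread blocks until the parent reads.
        q.put("X" * 1000000)

    if __name__ == "__main__":
        queue = Queue()
        p = Process(target=f, args=(queue,))
        p.start()
        p.join()           # DEADLOCK: swap this line with the next
        obj = queue.get()  # (get before join) to fix it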
Editor/IDE with Python coverage support?
Are there any editors/IDEs with good support for line-coloring from
Python test coverage results? (I normally use Eclipse + PyDev, but
PyDev's current coverage support isn't much better than nothing.)

Thanks,

Matt

--
http://mail.python.org/mailman/listinfo/python-list
Unit testing multiprocessing code on Windows
Does anyone know the "right" way to write a unit test for code that uses multiprocessing on Windows? The problem is that with both "python setup.py tests" and "nosetests", when they get to testing any code that starts Processes they spawn multiple copies of the testing suite (i.e. the new processes start running tests as if they were started with "python setup.py tests"/"nosetests"). The test runner in PyDev works properly. Maybe multiprocessing is starting new Windows processes by copying the command line of the current process? But if the command line is "nosetests", it's a one way ticket to an infinite explosion of processes. Any thoughts? Thanks, Matt -- http://mail.python.org/mailman/listinfo/python-list
Re: Unit testing multiprocessing code on Windows
On 18/02/2011 2:54 AM, Terry Reedy wrote:
> On 2/17/2011 6:31 PM, Matt Chaput wrote:
>> Does anyone know the "right" way to write a unit test for code that
>> uses multiprocessing on Windows?
>
> I would start with Lib/test/test_multiprocessing.

Good idea, but on the one hand it doesn't seem to be doing anything
special, and on the other hand it seems to do its own things, like not
having its test cases inherit from unittest.TestCase. I also don't know
if the Python devs start it with distutils or nosetests, which are the
ones I'm having a problem with. For example, starting my test suite
inside PyDev doesn't show the bug.

My test code isn't doing anything unusual... this is pretty much all I
do to trigger the bug. (None of the imported code has anything to do
with processes.)

from __future__ import with_statement
import unittest
import random

from whoosh import fields, query
from whoosh.support.testing import TempIndex

try:
    import multiprocessing
except ImportError:
    multiprocessing = None

if multiprocessing:
    class MPFCTask(multiprocessing.Process):
        def __init__(self, storage, indexname):
            multiprocessing.Process.__init__(self)
            self.storage = storage
            self.indexname = indexname

        def run(self):
            ix = self.storage.open_index(self.indexname)
            with ix.searcher() as s:
                r = s.search(query.Every(), sortedby="key", limit=None)
                result = "".join([h["key"] for h in r])
                assert result == "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz", result

class TestSorting(unittest.TestCase):
    def test_mp_fieldcache(self):
        if not multiprocessing:
            return

        schema = fields.Schema(key=fields.KEYWORD(stored=True))
        with TempIndex(schema, "mpfieldcache") as ix:
            domain = list(u"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
            random.shuffle(domain)

            w = ix.writer()
            for char in domain:
                w.add_document(key=char)
            w.commit()

            tasks = [MPFCTask(ix.storage, ix.indexname) for _ in xrange(4)]
            for task in tasks:
                task.start()
            for task in tasks:
                task.join()

if __name__ == '__main__':
    unittest.main()

--
http://mail.python.org/mailman/listinfo/python-list
Re: Unit testing multiprocessing code on Windows
On 17/02/2011 8:22 PM, phi...@semanchuk.com wrote:
> Hi Matt,
> I assume you're aware of this documentation, especially the item
> entitled "Safe importing of main module"?
> http://docs.python.org/release/2.6.6/library/multiprocessing.html#windows

Yes, but the thing is my code isn't __main__; my unittest classes are
being loaded by "setup.py test" or nosetests. And while I'm assured
multiprocessing doesn't duplicate the original command line, what I get
sure looks like it, because if I use "python setup.py test" that
command seems to be re-run for every Process that starts, but if I use
"nosetests" then *that* seems to be re-run for every Process.

Matt

--
http://mail.python.org/mailman/listinfo/python-list
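A hypothetical diagnostic (not something from the thread) that shows
what a spawned child actually executes: print sys.argv and the child's
idea of __main__ from inside a Process. On Windows, multiprocessing
starts a fresh interpreter that re-imports the parent's main module,
which is presumably why the test runner's entry point appears to be
re-run:

    # Hypothetical diagnostic: what does the child think it is running?
    import sys
    import multiprocessing

    def report():
        import __main__
        print("child argv: %r" % (sys.argv,))
        print("child __main__: %r" % (getattr(__main__, "__file__", None),))

    if __name__ == "__main__":
        print("parent argv: %r" % (sys.argv,))
        p = multiprocessing.Process(target=report)
        p.start()
        p.join()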