ElementTree: can't figure out a mismatched-tag error
Hi all,

I haven't been able to get up to speed with XML. I do examples from the tutorials and experiment with variations. Time and time again I fail with error messages I can't make sense of. Here's the latest one. The url is "http://finance.yahoo.com/q?s=XIDEQ&ql=0".

Ubuntu 12.04 LTS, Python 2.7.3 (default, Aug 1 2012, 05:16:07) [GCC 4.6.3]

>>> import xml.etree.ElementTree as ET
>>> tree = ET.parse('q?s=XIDEQ')  # output of wget http://finance.yahoo.com/q?s=XIDEQ&ql=0
Traceback (most recent call last):
  File "", line 1, in
    tree = ET.parse('q?s=XIDEQ')
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1183, in parse
    tree.parse(source, parser)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 656, in parse
    parser.feed(data)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1643, in feed
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1507, in _raiseerror
    raise err
ParseError: mismatched tag: line 9, column 2

Below are the first nine lines. The line numbers and the following space are hand-edited in. Three dots stand for sections cut out to fit long lines. Line 6 is a bunch of "meta" statements, all of which I show on a separate line each in order to preserve the angled brackets. On all lines the angled brackets have been preserved. The mismatched character is the slash of the closing tag . What could be wrong with it? And if it is, what about fault tolerance?

1
2
3
4 XIDEQ: Summary for EXIDE TECH NEW- Yahoo! Finance
5 content="http://l.yimg.com/a/p/fi/31/09/00.jpg">
  http://finance.yahoo.com/q?s=XIDEQ">
  href="http://finance.yahoo.com/q?s=XIDEQ">
8 http://l.yimg.com/zz/ . . . type="text/css">
9
  ^ Mismatch!

Thanks for suggestions

Frederic

-- http://mail.python.org/mailman/listinfo/python-list
Re: ElementTree: can't figure out a mismatched-tag error
On 07/11/2013 10:59 AM, F.R. wrote:
> Hi all, I haven't been able to get up to speed with XML. [...]

Thank you all! I was a little apprehensive it could be a silly mistake. And so it was. I have BeautifulSoup somewhere. Having had no urgent need for it I remember shirking the learning curve.

lxml seems to be a package with these components (from help (lxml)):

PACKAGE CONTENTS
    ElementInclude
    _elementpath
    builder
    cssselect
    doctestcompare
    etree
    html (package)
    isoschematron (package)
    objectify
    pyclasslookup
    sax
    usedoctest

I would start with "from lxml import html" and see what comes out. Break time now. Thanks again!

Frederic
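
For readers following the thread: the error arises because the downloaded page is HTML rather than well-formed XML. HTML allows elements such as "meta" to stand without a closing tag, and a strict XML parser still counts them as open when it reaches the closing tag of the enclosing element. The sketch below, written in Python 3 syntax, contrasts the strict ElementTree parser with the lenient `html.parser.HTMLParser` from the standard library. The sample markup is made up for the demonstration (it is not the Yahoo page), and `TitleGrabber` is a hypothetical helper. lxml.html and BeautifulSoup, both mentioned above, are third-party parsers that are lenient in a similar way.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Made-up sample: acceptable HTML, but <meta> is never closed, so it is not XML.
SAMPLE = """<html>
<head>
<title>XIDEQ: Summary</title>
<meta name="description" content="demo">
</head>
<body><p>Quote data</p></body>
</html>"""


def strict_parse_error(text):
    """Return the ParseError message from ElementTree, or None if it parses."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None


class TitleGrabber(HTMLParser):
    """Lenient parser that collects the text inside <title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


error = strict_parse_error(SAMPLE)   # the strict parser gives up at </head>
grabber = TitleGrabber()
grabber.feed(SAMPLE)                 # the lenient parser reads the whole page
print(error)
print(grabber.title)
```

The strict parser reports the mismatch at the closing tag of the enclosing element, not at the unclosed tag itself, which is why the reported position can look innocent.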
Re: attaching names to subexpressions
On 10/28/2012 06:57 AM, Devin Jeanpierre wrote:
> line = function(x, y, z)
> while line:
>     do something with(line)
>     line = function(x, y, z)

How about:

    line = True
    while line:
        line = function(x, y, z)
        do something with(line)

?

Frederic
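
One difference between the two loops is worth spelling out: with the dummy initial value, the loop body also runs once for the final false value, which the original loop never passes to the body. A small sketch under assumptions: `make_reader` is a hypothetical stand-in for function(x, y, z) that returns the given items and then an empty string, and each loop collects what its body would have processed.

```python
def make_reader(items):
    """Hypothetical stand-in for function(x, y, z): returns items, then ''."""
    iterator = iter(items)
    return lambda: next(iterator, "")


def original_loop(read):
    """Call once before the loop and again at the end of each pass."""
    seen = []
    line = read()
    while line:
        seen.append(line)
        line = read()
    return seen


def flag_loop(read):
    """Start from a dummy true value; the body also sees the final ''."""
    seen = []
    line = True
    while line:
        line = read()
        seen.append(line)
    return seen


def break_loop(read):
    """Single call site, with the test between the call and the body."""
    seen = []
    while True:
        line = read()
        if not line:
            break
        seen.append(line)
    return seen


def sentinel_loop(read):
    """iter(callable, sentinel) stops when the call returns the sentinel."""
    return list(iter(read, ""))


print(original_loop(make_reader(["a", "b"])))  # ['a', 'b']
print(flag_loop(make_reader(["a", "b"])))      # ['a', 'b', '']
print(break_loop(make_reader(["a", "b"])))     # ['a', 'b']
print(sentinel_loop(make_reader(["a", "b"])))  # ['a', 'b']
```

The two-argument form of `iter` compares against one exact sentinel value, so it fits only when the end marker is a single known value rather than anything false.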
Re: [Gimp-user] export to non xcf
On 10/28/2012 09:09 PM, Michael Schumacher wrote:
> From: Donald Miller
>> Can't directly save to jpg, so exported. Export to jpg made png. Same for psd. Shouldn't name track chosen format, so no manual override needed?
> Maybe you had set the file-type chooser to this format? The default "By extension" should do what you want, any other value is for special cases like you've discovered, like ambiguous file name extensions.
>
> Regards, Michael

I am contending with a similar malfunction: I can export by extension, but when done Gimp closes. The exported file exists, but I need to restart Gimp after every export. No loss of capability, but annoying. Judging by the flood of Gimp-related posts of late and the great variety of the issues raised, there seem to be major stability or environmental compatibility problems. I am running Gimp 2.6.12 on Ubuntu 12.04. I found a Gimp user list (https://mail.gnome.org/mailman/listinfo/gimp-user-list) and intend to check it out the moment I get around to it.

Frederic
Strange object identity problem
Hi all,

Once in a while I write simple routine stuff and spend the next few hours trying to understand why it doesn't behave as I expect. Here is an example holding me up: I have a module "st" with a class "runs". In a loop I repeatedly create an object "ba" and call the method "ba.run ()" which processes the constructor's arguments. Next I store the object in a dictionary "bas". It then turns out that all keys hold the same object, namely the one created last in the loop.

Verifying the identity of each object when it is being assigned to the dictionary reveals different identities. Repeating the verification after the loop is done shows the same object in all key positions:

>>> bas = {}
>>> for year in range (2010, 2013):
        ba = st.runs ('BA', '%d-01-01' % year, '%d-12-31' % year)
        ba.run ()
        print year, id (ba)
        bas [year] = ba

2010 150289932
2011 150835852
2012 149727788
>>> for y in sorted (bas.keys ()):
        b = bas [year]
        print y, id (b)

2010 149727788
2011 149727788
2012 149727788

The class "runs" has a bunch of attributes, among which an object "parameters" for tweaking processing runs and an object "quotes" containing a list of data base records. Both objects are created by "runs.__init__ (...)". Trying something similar with a simpler class works as expected:

>>> class C:
        def __init__ (self, i):
            self.i = i
        def run (self):
            self.ii = self.i * self.i

>>> cees = {}
>>> for year in range (2010, 2013):
        c = C (year)
        c.run ()
        print year, id (c)
        cees [year] = c

2010 150837804
2011 148275756
2012 146131212
>>> for year in sorted (cees.keys ()):
        print year, id (cees [year]), cees [year].ii

2010 150837804 4040100
2011 148275756 4044121
2012 146131212 4048144

I have checked for name clashes and found none, wondering what to check next for. Desperate for suggestions.

Frederic

(Python 2.7 on Ubuntu 12.04)
Re: Strange object identity problem
On 11/12/2012 02:27 PM, Robert Franke wrote:
> Hi Frederic,
> [...]
>> bas = {}
>> for year in range (2010, 2013):
>>     ba = st.runs ('BA', '%d-01-01' % year, '%d-12-31' % year)
>>     ba.run ()
>>     print year, id (ba)
>>     bas [year] = ba
>> 2010 150289932
>> 2011 150835852
>> 2012 149727788
>> for y in sorted (bas.keys ()):
>>     b = bas [year]
> Shouldn't that be b = bas[y]?

Yes, it should, indeed! What's more, I should have closed and restarted IDLE. There must have been a name clash somewhere in the name space. The problem no longer exists. Sorry about that. And thanks to all who paused to reflect on this non-problem.

- Frederic.

>>     print y, id (b)
>> 2010 149727788
>> 2011 149727788
>> 2012 149727788
> [...]
> Cheers, Robert
Re: Strange object identity problem
On 11/12/2012 06:02 PM, duncan smith wrote:
> On 12/11/12 13:40, F.R. wrote:
>> On 11/12/2012 02:27 PM, Robert Franke wrote:
>>> [...]
>>> for y in sorted (bas.keys ()):
>>>     b = bas [year]
>>> Shouldn't that be b = bas[y]?
>> Yes, it should, indeed! What's more, I should have closed and restarted IDLE. There must have been a name clash somewhere in the name space. The problem no longer exists. [...]
> The problem was that year was still bound to the integer 2012, its last value from the first loop. When you subsequently looped over the keys you printed each key followed by id(bas[2012]). Restarting IDLE only helped because you presumably didn't repeat the error.
>
> Duncan

That's it! Isn't it strange how on occasion one doesn't see the most obvious and simple mistake, focusing beyond the realm of foolishness. Thanks all . . .

Frederic
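
The effect is easy to reproduce without the "st" module. In Python a for-loop variable outlives its loop and keeps the last value it was given, so after looping over range (2010, 2013) the name "year" stays bound to 2012. A minimal sketch in Python 3 syntax, where `Run` is a hypothetical stand-in for st.runs that only remembers its year:

```python
class Run:
    """Hypothetical stand-in for st.runs; it only remembers its year."""

    def __init__(self, year):
        self.year = year


bas = {}
for year in range(2010, 2013):
    bas[year] = Run(year)

# The loop variable outlives the loop and keeps its last value.
leftover = year

# Buggy lookup: indexes with the stale name `year` instead of `y`.
buggy = [(y, bas[year].year) for y in sorted(bas)]

# Corrected lookup: indexes with the variable of the current loop.
fixed = [(y, bas[y].year) for y in sorted(bas)]

print(leftover)  # 2012
print(buggy)     # [(2010, 2012), (2011, 2012), (2012, 2012)]
print(fixed)     # [(2010, 2010), (2011, 2011), (2012, 2012)]
```

The dictionary held three distinct objects all along; only the lookup kept asking for the same key.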
MySQL - "create table" creates malfunctioning tables
The other day, for unfathomable reasons, I lost control over tables which I create. There was no concurrent change of anything on the machine, such as an update. So I have no suspect. Does the following action log suggest any recommendation to experienced SQL programmers?

1. A table:

mysql> select * from expenses;
+----+------------+-------+----------+--------+-------------+-------+
| id | date       | place | stuff    | amount | category    | flags |
+----+------------+-------+----------+--------+-------------+-------+
| 38 | 2013-01-15 | ATT   | Sim card |  25.00 | Visa credit |       |
+----+------------+-------+----------+--------+-------------+-------+
1 row in set (0.00 sec)

2. I want to delete everything:

mysql> delete from expenses;

3. Nothing happens for about one minute. Then this:

ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
mysql>

4. I want to delete the table:

mysql> drop table expenses;

5. Nothing happens indefinitely. I have to hit Ctrl-C

^CCtrl-C -- sending "KILL QUERY 167" to server ...
Ctrl-C -- query aborted.
ERROR 1317 (70100): Query execution was interrupted
mysql>

6. I expect to find "expenses.frm", "expenses.MYD" and "expenses.MYI". I find only the format file:

root@hatchbox-one:/home/fr# find / -name 'expenses.*' -ls
1055950 12 -rw-rw---- 1 mysql mysql 8783 Jan 25 01:51 /var/lib/mysql/fr/expenses.frm
root@hatchbox-one:/home/fr#

7. All the tables I regularly work with have a "frm", a "MYD" and a "MYI" file in the directory "/var/lib/mysql/fr/".

8. I'd like to drop the table "expenses", plus a number of other tables I created, hoping to hit it right by chance. I didn't and now I have a bunch of unusable tables which I can't drop and I don't know where their files are hiding. If I did, I could remove them.

Grateful for any hint, comment, suggestion

Frederic

Dell E6500
Ubuntu 10.04 LTS
Python 2.7.3 (default, Aug 1 2012, 05:16:07) [GCC 4.6.3]
MySQL (Can't find a version number)
Server version: 5.5.28-0ubuntu0.12.04.3 (Ubuntu)
Puzzling PDF
Hi all, Struggling to parse bank statements unavailable in sensible data-transfer formats, I use pdftotext, which solves part of the problem. The other day I encountered a strange thing, when one single figure out of many erroneously converted into letters. Adobe Reader displays the figure 50'000 correctly, but pdftotext makes it into "SO'OOO" (The letters "S" as in Susan and "O" as in Otto). One would expect such a mistake from an OCR. However, the statement is not a scan, but is made up of text. Because malfunctions like this put a damper on the hope to ever have a reliable reader that doesn't require time-consuming manual verification, I played around a bit and ended up even more confused: When I lift the figure off the Adobe display (mark, copy) and paste it into a Python IDLE window, it is again letters (ascii 83 and 79), when on the Adobe display it shows correctly as digits. How can that be? Frederic -- https://mail.python.org/mailman/listinfo/python-list
Re: Puzzling PDF
On 02/16/2014 05:29 PM, Emile van Sebille wrote:
> On 2/16/2014 6:00 AM, F.R. wrote:
>> Hi all, Struggling to parse bank statements unavailable in sensible data-transfer formats, I use pdftotext, which solves part of the problem. [...]
> I've also gotten inconsistent results using various pdf to text converters[1], but getting an explanation for pdftotext's failings here isn't likely to happen. I'd first try google doc's on-line conversion tool to see if you get better results. If you're lucky it'll do the job and you'll have confirmation that better tools exist. Otherwise, I'd look for an alternate way of getting the bank info than working from the pdf statement. At one site I've scripted firefox to access the bank's web based inquiry to retrieve the new activity overnight and use that to complete a daily bank reconciliation.
>
> HTH, Emile
>
> [1] I wrote my own once to get data out of a particularly gnarly EDI specification pdf.

Emile, thanks for your response. Thanks to Roy Smith and Alister, too.

pdftotext has been working just fine. So much so that this freak incident is all the more puzzling. It smacks of an OCR error, but where does OCR come in, I wonder.

I certainly suspected that the font I was looking at had fives and zeroes identical to esses and ohs, respectively, but the suspicion didn't hold up to scrutiny. I attach a little screen shot: At the top, the way it looks on the statement. Next, two words marked with the mouse. (One single marking, doesn't color the space.) Ctl-c puts both words to the clip board. Ctl-v drops them into the python IDLE window between the quotation marks. Lo and behold: they're clearly different! A little bit of code around displays the ascii numbers. Isn't that interesting?

Frederic

No matter. You're both right. There are alternatives. The best would be to get the data in a CSV format. Alas, I am so lightweight a client that banks don't even bother to find out what I am talking about. I know how to access web pages programmatically, but haven't gotten around to dealing with password-protected log-ins and to sending such data as one writes into templates interactively.

Frederic
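
The "little bit of code" that displays the ascii numbers is not shown in the post; a guess at what such a check might look like follows, in Python 3 syntax. `code_points` and `looks_numeric` are hypothetical helper names. The check does not explain why the file hands out letters for those glyphs; it only makes the discrepancy visible and gives a way to flag suspicious amounts automatically instead of verifying every figure by eye.

```python
def code_points(text):
    """Return (character, code point) pairs for every character in text."""
    return [(ch, ord(ch)) for ch in text]


def looks_numeric(text, separators="'"):
    """True if text holds only ASCII digits and the given grouping separators."""
    stripped = "".join(ch for ch in text if ch not in separators)
    return stripped != "" and all(ch in "0123456789" for ch in stripped)


displayed = "50'000"   # what the statement shows on screen
extracted = "SO'OOO"   # what copy-and-paste or pdftotext delivered

print(code_points(displayed))
print(code_points(extracted))
print(looks_numeric(displayed), looks_numeric(extracted))  # True False
```

Running `looks_numeric` over every extracted amount would at least confine the manual verification to the figures that fail the check.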
Re: Storing the state of script between steps
On 02/21/2014 09:59 PM, Denis Usanov wrote: Good evening. First of all I would like to apologize for the name of topic. I really didn't know how to name it more correctly. I mostly develop on Python some automation scripts such as deployment (it's not about fabric and may be not ssh at all), testing something, etc. In this terms I have such abstraction as "step". Some code: class IStep(object): def run(): raise NotImplementedError() And the certain steps: class DeployStep: ... class ValidateUSBFlash: ... class SwitchVersionS: ... Where I implement run method. Then I use some "builder" class which can add steps to internal list and has a method "start" running all step one by one. And I like this. It's loosely coupled system. It works fine in simple cases. But sometimes some steps have to use the results from previous steps. And now I have problems. Before now I had internal dict in "builder" and named it as "world" and passed it to each run() methods of steps. It worked but I disliked this. How would you solve this problem and how would you do it? I understant that it's more architecture specific question, not a python one. I bet I wouldn't have asked this if I had worked with some of functional programming languages. A few months ago I posted a summary of a data transformation framework inviting commentary. (https://mail.python.org/pipermail/python-list/2013-August/654226.html). It didn't meet with much interest and I forgot about it. Now that someone is looking for something along the line as I understand his post, there might be some interest after all. My module is called TX. A base class "Transformer" handles the flow of data. A custom Transformer defines a method "T.transform (self)" which transforms input to output. Transformers are callable, taking input as an argument and returning the output: transformed_input = T (some_input) A Transformer object retains both input and output after a run. 
If it is called a second time without input, it simply returns its output, without needlessly repeating its job:

    same_transformed_input = T ()

Because of this IO design, Transformers nest:

    csv_text = CSV_Maker (Data_Line_Picker (Line_Splitter (File_Reader ('1st-quarter-2013.statement'))))

A better alternative to nesting is to build a Chain:

    Statement_To_CSV = TX.Chain (File_Reader, Line_Splitter, Data_Line_Picker, CSV_Maker)

A Chain is functionally equivalent to a Transformer:

    csv_text = Statement_To_CSV ('1st-quarter-2013.statement')

Since Transformers retain their data, developing or debugging a Chain is a relatively simple affair. If a Chain fails, the method "show ()" displays the innards of its elements one by one. The failing element is the first one that has no output. It also displays such messages as the method "transform (self)" would have logged (self.log (message)). While fixing the failing element, the element preceding keeps providing the original input for testing, until the repair is done.
Since a Chain is functionally equivalent to a Transformer, a Chain can be placed into a containing Chain alongside Transformers:

    Table_Maker = TX.Chain (TX.File_Reader (), TX.Line_Splitter (), TX.Table_Maker ())
    Table_Writer = TX.Chain (Table_Maker, Table_Formatter, TX.File_Writer (file_name = '/home/xy/office/addresses-4214'))
    DB_Writer = TX.Chain (Table_Maker, DB_Formatter, TX.DB_Writer (table_name = 'contacts'))

Better:

    Splitter = TX.Splitter (TX.Table_Writer (), TX.DB_Writer ())
    Table_Handler = TX.Chain (Table_Maker, Splitter)
    Table_Handler ('home/xy/Downloads/report-4214')  # Writes to both file and to DB

If a structure builds up too complex to remember, the method "show_tree ()" would display something like this:

    Chain
        Chain[0] - Chain
            Chain[0][0] - Quotes
            Chain[0][1] - Adjust Splits
        Chain[1] - Splitter
            Chain[1][0] - Chain
                Chain[1][0][0] - High_Low_Range
                Chain[1][0][1] - Splitter
                    Chain[1][0][1][0] - Trailing_High_Low_Ratio
                    Chain[1][0][1][1] - Standard Deviations
            Chain[1][1] - Chain
                Chain[1][1][0] - Trailing Trend
                Chain[1][1][1] - Pegs

Following a run, all intermediary formats are accessible:

    standard_deviations = C[1][0][1][1]()
    TM = TX.Table_Maker ()
    TM (standard_deviations).write ()

         0 |      1 |     2 |
    116.49 | 132.93 | 11.53 |
    115.15 | 128.70 | 11.34 |
      1.01 |   0.00 |  0.01 |

A Transformer takes parameters, either at construction time or by means of the method "T.set (key = parameter)". Whereas a File Reader doesn't get payload passed and may take a file name as input argument, as a convenient alternative, a File Writer does take payload and the file name must be set by keyword:

    File_Writer = TX.File_Writer (file_name = '/tmp/memos-with-dates-1')
    File_Writer (input)  # Writes file
    File_Writer.set (file_name = '/tmp/memos-with-dates-2')
    File_Writer ()
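
For readers who want to try the idea before seeing TX itself, here is a minimal sketch of the pattern described above. It is not the TX module: every class and method body below is an assumption written for illustration, in Python 3 syntax, and it simplifies in places (for instance, "transform" receives the data as an argument, and "show ()" returns lines instead of printing them).

```python
class Transformer:
    """Callable wrapper that keeps its last input and output."""

    def __init__(self, name=None):
        self.name = name or type(self).__name__
        self.input = None
        self.output = None

    def transform(self, data):
        """Override in subclasses; the default passes data through."""
        return data

    def __call__(self, data=None):
        if data is None:
            # Called without input: reuse the stored output if there is one,
            # otherwise retry with the stored input.
            if self.output is not None:
                return self.output
            data = self.input
        self.input = data
        self.output = None          # stays None if transform raises
        self.output = self.transform(data)
        return self.output


class Chain(Transformer):
    """Runs its links in order, feeding each output to the next link."""

    def __init__(self, *links, name=None):
        super().__init__(name)
        self.links = list(links)

    def transform(self, data):
        for link in self.links:
            data = link(data)
        return data

    def show(self):
        """One line per link; the first link without output is the failing one."""
        return ["%s: %r" % (link.name, link.output) for link in self.links]


class LineSplitter(Transformer):
    def transform(self, data):
        return data.splitlines()


class DataLinePicker(Transformer):
    """Keeps lines that start with a digit (an arbitrary rule for the demo)."""

    def transform(self, data):
        return [line for line in data if line[:1].isdigit()]


class CsvMaker(Transformer):
    def transform(self, data):
        return "\n".join(",".join(line.split()) for line in data)


statement_to_csv = Chain(LineSplitter(), DataLinePicker(), CsvMaker())
csv_text = statement_to_csv("Date Amount\n2013-01-15 25.00\n2013-02-01 12.50")
print(csv_text)
print(statement_to_csv() == csv_text)   # second call reuses the stored output
```

Because a Chain is itself a Transformer, it can be passed as a link to another Chain without any extra code, which is the nesting property the description relies on.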
A data transformation framework. A presentation inviting commentary.
Hi all, In an effort to do some serious cleaning up of a hopelessly cluttered working environment, I developed a modular data transformation system that pretty much stands. I am very pleased with it. I expect huge time savings. I would share it, if had a sense that there is an interest out there and would appreciate comments. Here's a description. I named the module TX: The nucleus of the TX system is a Transformer class, a wrapper for any kind of transformation functionality. The Transformer takes input as calling argument and returns it transformed. This design allows the assembly of transformation chains, either nesting calls or better, using the class Chain, derived from 'Transformer' and 'list'. A Chain consists of a sequence of Transformers and is functionally equivalent to an individual Transformer. A high degree of modularity results: Chains nest. Another consequence is that many transformation tasks can be handled with a relatively modest library of a few basic prefabricated Transformers from which many different Chains can be assembled on the fly. A custom Transformer to bridge an eventual gap is quickly written and tested, because the task likely is trivial. A good analogy of the TX methodology is a road map with towns scattered all over it and highways connecting them. To get from any town to any other one is a simple matter of hopping the towns in between. The TX equivalent of the towns are data formats, the equivalent of the highways are TX Transformers. They are not so much thought of in terms of what they do than in terms of the formats they take and give. Designing a library of Transformers is essentially a matter of establishing a collection of standard data formats. First the towns, then the highways. A feature of the TX Transformer is that it retains both its input and output. 
This makes a Chain a breeze to build progressively, link by link, and also makes debugging easy: If a Chain doesn't work, Chain.show () reveals the failing link as the first one that has no output. It can be replaced with a corrected instance, as one would replace a blown fuse. Running the Chain again without input makes it have another try. Parameter passing runs on a track that is completely separate from the payload track. Parameters can be set in order to configure a Chain prior to running it, or can be sent at runtime by individual Transformers to its siblings and their progeny. Parameters are keyed and get picked up by those Chain links whose set of pre-defined keys includes the parameter's key. Unintended pick-ups with coincidentally shared keys for unrelated parameters can be prevented by addressing parameters to individual Translators. Below an application example. Five custom classes at the end exemplify the pattern. I join the post also as attachment, in case some auto-line-wrap messes up this text. Commentary welcome Frederic An example of use: Download historic stock quotes from Yahoo Finance for a given range of dates and a list of symbols, delete a column and add three, insert the data in a MySQL table. Also write them to temporary files in tabular form for verification. "make_quotes_writer ()" returns a custom transformation tree. "run_quotes ()" makes such a tree, sets it on a given time range and runs it on a list of symbols. (Since Yahoo publishes the data for downloading, I presume it's okay to do it this way. This is a demo of TX, however, and should not be misconstrued as an encouragement to violate any publisher's terms of service.) 
import TX, yahoo_historic_quotes as yhq

def make_quotes_writer ():
    Visualizer = TX.Chain (
        yhq.percent (),
        TX.Table_Maker (has_header = True),
        TX.Table_Writer (),
        name = 'Visualizer'
    )
    To_DB = TX.Chain (yhq.header_stripper(), TX.DB_Writer(table_name = 'quotes'), name = 'To DB')
    To_File = TX.Chain (Visualizer, TX.File_Writer (), name = 'To File')
    Splitter = TX.Splitter (To_DB, To_File, name = 'Splitter')
    Quotes = TX.Chain (
        yhq.yahoo_quotes (),
        TX.CSV_To_List (delimiter = ','),
        TX.Numerizer (),
        yhq.wiggle_and_trend (),
        yhq.symbol (),
        Splitter,
        name = 'Quotes'
    )
    return Quotes

>>> Quotes = make_quotes_writer ()
>>> Quotes.show_tree()
Quotes
    Quotes[0] - Yahoo Quotes
    Quotes[1] - CSV To List
    Quotes[2] - Numerizer
    Quotes[3] - Wiggle and Trend
    Quotes[4] - Symbol
    Quotes[5] - Splitter
        Quotes[5][0] - To DB
            Quotes[5][0][0] - Header Stripper
            Quotes[5][0][1] - DB Writer
        Quotes[5][1] - To File
            Quotes[5][1][0] - Visualizer
                Quotes[5][1][0][0] - Percent
                Quotes[5][1][0][1] - Table Maker
                Quotes[5][1][0][2] - Table Writer
            Quotes[5][1][1] - File Writer

def run_quotes (symbols, from_date = '1970-01-01', to_date = '2099-12-31'):
    '''Downloads
Re: A data transformation framework. A presentation inviting commentary.
On 08/21/2013 06:29 PM, F.R. wrote: Hi all, In an effort to do some serious cleaning up of a hopelessly cluttered working environment, I developed a modular data transformation system that pretty much stands. I am very . . . etc Chris, Terry, Dieter, thanks for your suggestions. Chris: If my Transformer looks like a function, that's because it is (__call__). My idea was to have something like an erector set of elementary transformation machines that can be assembled into chains. There may be some processing overhead in managing the data flow, but I'm not even sure of that, because the flow needs to be managed somehow and throwing one's stones into someone else's garden doesn't get rid of the stones. My idea was to simplify, generalize and automate in order to deal with the kind of overhead that matters most to me: my own mental overhead. Terry: I am aware of the memory-load aspect. It is no constraint for the things I do. If it became one, I'd develop a translation assembly using a small data sample and when it reaches the stage of reliability, I'd add a line to have each Translator delete its input the moment it is done. I shall certainly look at itertools. Thanks for your suggestions and explanations. Dieter: I wish I could respond to the points you raise. I am unfamiliar with the details and they don't seem like they can be looked up in five minutes. I do make a note of your thoughts. Frederic -- http://mail.python.org/mailman/listinfo/python-list
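
On the memory-load aspect and itertools: one general Python idiom, sketched here in Python 3 syntax, is to write each stage as a generator so that items flow through the whole pipeline one at a time and nothing is retained between stages. This is an illustration of the idiom only. It is not TX, it is not necessarily what was proposed in the thread, and the stage names `split_fields`, `numerize` and `pipeline` are made up. The price is the one noted above: intermediate results are no longer kept around for inspection.

```python
import itertools


def split_fields(lines):
    """Lazily split each comma-separated line into a list of fields."""
    for line in lines:
        yield line.split(",")


def numerize(rows):
    """Lazily convert the fields that look like numbers to float."""
    for row in rows:
        converted = []
        for field in row:
            try:
                converted.append(float(field))
            except ValueError:
                converted.append(field)
        yield converted


def pipeline(source, *stages):
    """Compose generator stages; nothing runs until the result is iterated."""
    for stage in stages:
        source = stage(source)
    return source


lines = iter(["BA,72.5,74.0", "BA,73.1,75.2", "BA,70.0,71.9"])
rows = pipeline(lines, split_fields, numerize)

# Only the first two lines are pulled through the stages; the third is untouched.
first_two = list(itertools.islice(rows, 2))
print(first_two)  # [['BA', 72.5, 74.0], ['BA', 73.1, 75.2]]
```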
Where does MySQLdb put inserted data?
Hi, As of late clipboard pasting into a terminal sometimes fails (a known bug, apparently), I use MySQLdb to access MySQL tables. In general this works just fine. But now I fail filling a new table. The table exists. "mysql>EXPLAIN new_table;" explains and "root@blackbox-one:/# sudo/find / -name 'new_table*'" finds "/var/lib/mysql/fr/new_table.frm". So I do "cursor.executemany ('insert into new_table values (%s)' % format, data)". No error occurs and "cursor.execute ('select * from new_table;')" returns the number of records read, and "cursor.fetchall ()" returns all new records. All looks fine, but "mysql>SELECT * FROM new_table;" produces an "Empty set" and "sudo find / -name 'new_table*" still finds only the format file, same as before. Could it have to do with COMMIT. I believe I am using ISAM tables (default?) and those don't recognize undo commands, right?. Anyway, an experimental "cursor.execute ('COMMIT')" didn't make a difference. It looks like MySQLdb puts the data into a cache and that cache should be saved either by the OS or by me. Strange thing is that this is one freak incident in an almost daily routine going back years and involving thousands of access operations in and out acting instantaneously. I seem to remember a similar case some time ago and it also involved a new empty table. Thanks for hints Frederic mysql> select version() -> ; +-+ | version() | +-+ | 5.5.31-0ubuntu0.12.04.1 | +-+ 1 row in set (0.00 sec) -- https://mail.python.org/mailman/listinfo/python-list
Re: Where does MySQLdb put inserted data?
On 10/04/2013 09:38 AM, F.R. wrote: Hi, As of late clipboard pasting into a terminal sometimes fails (a known bug, apparently), I use MySQLdb to access MySQL tables. In general this works just fine. But now I fail filling a new table. The table exists. "mysql>EXPLAIN new_table;" explains and "root@blackbox-one:/# sudo/find / -name 'new_table*'" finds "/var/lib/mysql/fr/new_table.frm". So I do "cursor.executemany ('insert into new_table values (%s)' % format, data)". No error occurs and "cursor.execute ('select * from new_table;')" returns the number of records read, and "cursor.fetchall ()" returns all new records. All looks fine, but "mysql>SELECT * FROM new_table;" produces an "Empty set" and "sudo find / -name 'new_table*" still finds only the format file, same as before. Could it have to do with COMMIT. I believe I am using ISAM tables (default?) and those don't recognize undo commands, right?. Anyway, an experimental "cursor.execute ('COMMIT')" didn't make a difference. It looks like MySQLdb puts the data into a cache and that cache should be saved either by the OS or by me. Strange thing is that this is one freak incident in an almost daily routine going back years and involving thousands of access operations in and out acting instantaneously. I seem to remember a similar case some time ago and it also involved a new empty table. Thanks for hints Frederic mysql> select version() -> ; +-+ | version() | +-+ | 5.5.31-0ubuntu0.12.04.1 | +-+ 1 row in set (0.00 sec) Thank you Chris, thank you Steven, The suggestion to switch to PostgreSQL isn't lost on me. I have it installed, but have been putting off the change, apprehensive of getting slowed down by many annoying side effects for some time to come. This may be the moment . . . Off list? MySQL is. MySQLdb is not. Before I know which of the two is the culprit, I don't know whether I'm off list or not and take the risk, prepared to beg pardon if I am. Frederic -- https://mail.python.org/mailman/listinfo/python-list
Re: Where does MySQLdb put inserted data?
On 10/04/2013 12:11 PM, Chris Angelico wrote: On Fri, Oct 4, 2013 at 8:05 PM, F.R. wrote: Off list? MySQL is. MySQLdb is not. Before I know which of the two is the culprit, I don't know whether I'm off list or not and take the risk, prepared to beg pardon if I am. Just to clarify: Off-topic means discussing stuff that isn't about Python; off-list means sending private emails, not to python-list@python.org / comp.lang.python. You're uncertain as to whether you're off-topic or not, but you're definitely on-list; and my previous mail to you was off-list, so people here are going to be a little confused, as they lack context. (I merely suggested that switching to PostgreSQL would quite probably be a worthwhile time investment.) ChrisA I shall switch and give you credit for the impulse in addition to the terminological clarification on being off something or other . . . Frederic -- https://mail.python.org/mailman/listinfo/python-list
Re: Where does MySQLdb put inserted data?
On 10/05/2013 12:55 AM, Dennis Lee Bieber wrote:
> On Fri, 04 Oct 2013 09:38:41 +0200, "F.R." declaimed the following:
>
> MySQLdb, as with all DB-API compliant adapters, does NOT do "auto-commit" -- you MUST execute a con.commit() after any query (sequence) that modifies data. Without it, closing the connection will invoke a ROLLBACK operation, removing any attempted changes.

That's it! It works! Thank you sooo much. A miracle how I could go without commits for years and never have missing data. Anyway, another lesson learned . . .

Thanks

Frederic
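
The symptom from the original post (rows visible through the inserting cursor, "Empty set" everywhere else) can be reproduced with any DB-API module that supports transactions. The sketch below uses sqlite3 as a stand-in, only because it ships with Python and needs no server; it is written in Python 3 syntax, the table and file names are made up, and details such as locking differ from MySQL. With MySQLdb the corresponding call is commit() on the connection object, as the reply above says.

```python
import os
import sqlite3
import tempfile


def count_rows(conn):
    """Return the number of rows in the demo table as seen by conn."""
    return conn.execute("SELECT COUNT(*) FROM expenses").fetchall()[0][0]


def commit_demo():
    """Record when inserted rows become visible to a second connection."""
    seen = {}
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "demo.db")

        writer = sqlite3.connect(path)
        writer.execute("CREATE TABLE expenses (place TEXT, amount REAL)")
        writer.commit()
        reader = sqlite3.connect(path)

        # Insert without commit: the writer sees its own row, the reader does not.
        writer.execute("INSERT INTO expenses VALUES (?, ?)", ("ATT", 25.0))
        seen["writer_before_commit"] = count_rows(writer)
        seen["reader_before_commit"] = count_rows(reader)

        # Closing without commit discards the pending insert.
        writer.close()
        seen["reader_after_close"] = count_rows(reader)

        # The same insert again, this time followed by commit().
        writer = sqlite3.connect(path)
        writer.execute("INSERT INTO expenses VALUES (?, ?)", ("ATT", 25.0))
        writer.commit()
        seen["reader_after_commit"] = count_rows(reader)

        writer.close()
        reader.close()
    return seen


result = commit_demo()
print(result)
```

The inserting connection reports one row before the commit while the second connection reports none, which matches the observation that "cursor.fetchall ()" returned the new records although the mysql client showed an empty table.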
Re: fixing an horrific formatted csv file.
On 07/01/2014 04:04 PM, flebber wrote:

What I am trying to do is to reformat a csv file into something more usable. Currently the file has no headers, and multiple lines with varying columns that are not related.

This is a sample:

Meeting,05/07/14,RHIL,Rosehill Gardens,Weights,TAB,+3m Entire Circuit, ,
Race,1,CIVIC STAKES,CIVIC,CIVIC,1350,~ ,3U,~ ,QLT ,54,0,0,5/07/2014,, , , , ,No class restriction, Quality, For Three-Years-Old and Upwards, No sex restriction, (Listed),Of $10. First $6, second $2, third $1, fourth $5000, fifth $2000, sixth $1000, seventh $1000, eighth $1000
Horse,1,Bennetta,0,"Grahame Begg",Randwick,,0,0,16-3-1-3 $390450.00,,0,0,0,,98.00,M,
Horse,2,Breakfast in Bed,0,"David Vandyke",Warwick Farm,,0,0,20-6-1-5 $201250.00,,0,0,0,,81.00,M,
Horse,3,Capital Commander,0,"Gerald Ryan",Rosehill,,0,0,43-9-9-3 $438625.00,,0,0,0,,85.00,M,
Horse,4,Coup Ay Tee (NZ),0,"Chris Waller",Rosehill,,0,0,35-9-6-5 $519811.00,,0,0,0,,101.00,G,
Horse,5,Generalife,0,"John O'Shea",Warwick Farm,,0,0,19-6-1-3 $235045.00,,0,0,0,,87.00,G,
Horse,6,He's Your Man (FR),0,"Chris Waller",Rosehill,,0,0,13-2-3-1 $108110.00,,0,0,0,,93.00,G,
Horse,7,Hidden Kisses,0,"Chris Waller",Rosehill,,0,0,40-8-8-5 $565750.00,,0,0,0,,96.00,M,
Horse,8,Oakfield Commands,0,"Gerald Ryan",Rosehill,,0,0,22-7-4-6 $269530.00,,0,0,0,,94.00,G,
Horse,9,Taxmeifyoucan,0,"Gregory Hickman",Warwick Farm,,0,0,18-2-4-4 $539730.00,,0,0,0,,91.00,G,
Horse,10,The Peak,0,"Bart & James Cummings",Randwick,,0,0,15-6-1-0 $426732.00,,0,0,0,,95.00,G,
Horse,11,Tougher Than Ever (NZ),0,"Chris Waller",Rosehill,,0,0,17-3-2-3 $321613.00,,0,0,0,,97.00,H,
Horse,12,TROMSO,0,"Chris Waller",Rosehill,,0,0,47-8-11-2 $622300.00,,0,0,0,,103.00,G,
Race,2,FLYING WELTER - BENCHMARK 95 HCP,BM95,BM95,1100,BM95 ,3U,~ ,HCP ,54,0,0,5/07/2014,, , , , ,BenchMark 95, Handicap, For Three-Years-Old and Upwards, No sex restriction,Of $85000. First $48750, second $16750, third $8350, fourth $4150, fifth $2000, sixth $1000, seventh $1000, eighth $1000, ninth $1000, tenth $1000
Horse,1,Big Bonanza,0,"Don Robb",Wyong,,0,57.5,31-9-4-3 $366860.00,,0,0,0,,92.00,G,
Horse,2,Casual Choice,0,"Joseph Pride",Warwick Farm,,0,54,8-2-3-0 $105930.00,,0,0,0,

So what I am trying to do is end up with an output like this.

Meeting, Date, Race, Number, Name, Trainer, Location
Rosehill, 05/07/14, 1, 1,Bennetta,"Grahame Begg",Randwick,
Rosehill, 05/07/14, 1, 2,Breakfast in Bed,"David Vandyke",Warwick Farm,

So as a start I thought I would try inserting the Meeting and Race number, however I am just not getting it right.

    import csv
    outfile = open("/home/sayth/Scripts/cleancsv.csv", "w")
    with open('/home/sayth/Scripts/test.csv') as f:
        f_csv = csv.reader(f)
        headers = next(f_csv)
        for row in f_csv:
            meeting = row[3] in row[0] == 'Meeting'
            new = row.insert(0, meeting)
            while row[1] in row[0] == 'Race' < 9:
                # pref less than next found row[0]
                # grab row[1] as id number
                id = row[1]
                # from row[0] and insert it in first position
                new_lines = new.insert(1, id)
                outfile.write(new_lines)
    outfile.close()

How should I go about this?

Thanks

Sayth

Reformatting is what I do most and over time I have acquired some practice. Complete solutions are not often proposed, possibly sneered at for their officiousness. In that case I apologize. I couldn't resist. It is such a nice example. Having solved it, I figure why not share it . . .
Frederic

    def race_table (csv_text):
        input_table = [[item.strip(' "') for item in record.split (',')] for record in csv_text.splitlines ()]
        # At this point look at input_table to find the record indices
        output_table = []
        for record in input_table:
            if record [0] == 'Meeting':
                meeting = record [3]
            elif record [0] == 'Race':
                date = record [13]
                race = record [1]
            elif record [0] == 'Horse':
                number = record [1]
                name = record [2]
                trainer = record [4]
                location = record [5]
                output_table.append ((meeting, date, race, number, name, trainer, location))
        return output_table

    >>> for record in race_table (your_csv_text): print record
    ('Rosehill Gardens', '5/07/2014', '1', '1', 'Bennetta', 'Grahame Begg', 'Randwick')
    ('Rosehill Gardens', '5/07/2014', '1', '2', 'Breakfast in Bed', 'David Vandyke', 'Warwick Farm')
    ('Rosehill Gardens', '5/07/2014', '1', '3', 'Capital Commander', 'Gerald Ryan', 'Rosehill')
    ('Rosehill Gardens', '5/07/2014', '1', '4', 'Coup Ay Tee (NZ)', 'Chris Waller', 'Rosehill')
    ('Rosehill Gardens', '5/07/2014', '1', '5', 'Generalife', "John O'Shea", 'Warwick Far
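The tuples returned by race_table can be written back out as CSV with the header row the original poster asked for. This is a minimal Python 3 sketch using the standard csv module; the helper name `table_to_csv` is hypothetical, and the two sample rows are built by hand from the sample data rather than produced by the function above.

```python
import csv
import io

HEADINGS = ("Meeting", "Date", "Race", "Number", "Name", "Trainer", "Location")


def table_to_csv(rows, headings=HEADINGS):
    """Return CSV text: one header line followed by one line per row."""
    buffer = io.StringIO()
    # lineterminator="\n" keeps the output free of "\r\n" pairs
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(headings)
    writer.writerows(rows)
    return buffer.getvalue()


# Hand-built rows in the shape race_table returns.
sample_rows = [
    ("Rosehill Gardens", "5/07/2014", "1", "1", "Bennetta", "Grahame Begg", "Randwick"),
    ("Rosehill Gardens", "5/07/2014", "1", "10", "The Peak", "Bart & James Cummings", "Randwick"),
]

csv_text = table_to_csv(sample_rows)
print(csv_text)
```

Letting csv.writer do the joining means a field that happens to contain a comma or a quote gets quoted automatically, which plain ",".join would not do.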
Re: fixing an horrific formatted csv file.
On 07/02/2014 11:13 AM, flebber wrote:

    TM = TX.Table_Maker (headings = ('Meeting','Date','Race','Number','Name','Trainer','Location'))
    TM (race_table (your_csv_text)).write ()

Where do I find TX? Found this mention in the list, was it available in pip by any name?
https://mail.python.org/pipermail/python-list/2014-February/667464.html

Sayth

I'd have to make it available. I proposed it some time ago and received a couple of suggestions in return. It is a modular transformation framework written entirely in python (2.7). It consists essentially of a base class "Transformer" that handles input and output in such a way that Transformer objects can be chained.

It saved me from drowning in a horrible and growing tangle of hacks. Finding something usable I had previously done took time. Understanding how it worked took more time and adapting it took still more time, so that writing yet another hack from scratch was faster. A number of hacks I could quickly wrap into a Transformer object and so could start building a library of standard Transformers. The Table_Maker is one of them. The table making code is quite bad. It suffers from feature overload. I would clean it up for distribution.

I'd be happy to distribute the base class and a few standard Transformers, such as I use every day. (File Reader, File Writer, DB Run Command, DB Write, Table Maker, PDF To Text, Text To Lines, Lines To Text, Sort, Sort And Unique, etc.) Writing one's own Transformers is a breeze. Testing too, because a Transformer keeps its input and output and, in line with the system's design philosophy, does only its own single thing.

A Chain is a list of Transformers that run in sequence. It is itself derived from Transformer and is a functional equivalent. So Chains nest. Fixing a Chain that nothing comes out of is a straightforward matter too. It will still have run up to the failing element. Chain.show () reveals the culprit as the first one to have no output.

I am not up to date on distributing and would depend on qualified help on that.

Frederic

A brief overview

The TX solution to your race table would be (TX is the name of the module):

    class Race_Table (TX.Transformer):
        '''
        In: CSV text
        Out: Tabular data (2-dimensional list)
        '''
        name = 'Race_Table'
        @TX.setup # Checks timestamps to prevent needless reruns in the absence of new input
        def transform (self):
            for line in self.Input.data:
                # See my post
            self.Output.take (output_table)

Example file to file:

    >>> Race_Schedule_F2F = TX.Chain (TX.File_Reader (), Race_Table (), TX.List_To_CSV (delimiter = ';'), TX.File_Writer (terminal = out_file_name))
    >>> Race_Schedule_F2F (input_file_name) # Does it all!

Example web to database:

    >>> Race_Schedule_WWW2DB = TX.Chain (TX.WWW_Reader (), Race_Schedule_HTML_Reader (), Race_Table (), TX.DB_Writer (table_name = 'horses'))
    >>> Race_Schedule_WWW2DB (url) # Does it all!

You'd have to write the Race_Schedule_HTML_Reader.

Verify your table:

    >>> Table_Viewer = TX.Chain (TX.Table_Maker (), TX.Table_Writer ())
    >>> Race_Schedule_WWW2DB.show_tree () # See which one should display
    Chain
        Chain[0] - WWW Reader
        Chain[1] - Race_Schedule_HTML_Reader
        Chain[2] - Race_Table
        Chain[3] - DB Writer
    >>> print Table_Viewer (Race_Schedule_WWW2DB[2]()) # All Transformers keep their data
    (Display of table)

Verify database:

    >>> print Table_Viewer (TX.DB_Reader (table_name = 'horses')())
    (Display of database table)
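The chaining idea described above can be sketched in a few lines of Python 3. This is not the TX module, which is not published; every class, method, and attribute name here (`Transformer`, `Chain`, `first_without_output`, the three small steps, and the deliberately broken `Nothing` step) is a hypothetical stand-in, written only to illustrate steps that keep their own input and output, chains that nest, and fault finding by looking for the first step with no output.

```python
class Transformer:
    """Base class: keeps its input and output so each step can be inspected."""

    name = "Transformer"

    def __init__(self):
        self.input = None
        self.output = None

    def transform(self, data):
        """Override in subclasses; the default passes data through."""
        return data

    def __call__(self, data=None):
        # Called without data, a Transformer returns what it last produced.
        if data is not None:
            self.input = data
            self.output = self.transform(data)
        return self.output


class Chain(Transformer):
    """Transformers run in sequence; a Chain is itself a Transformer, so Chains nest."""

    name = "Chain"

    def __init__(self, *steps):
        super().__init__()
        self.steps = list(steps)

    def transform(self, data):
        for step in self.steps:
            step.input = step.output = None  # forget results of earlier runs
        for step in self.steps:
            data = step(data)
            if data is None:  # this step produced nothing: stop here
                break
        return data

    def __getitem__(self, index):
        return self.steps[index]

    def first_without_output(self):
        """Return the name of the first step with no output, or None."""
        for step in self.steps:
            if step.output is None:
                return step.name
        return None


class TextToLines(Transformer):
    name = "Text To Lines"

    def transform(self, data):
        return data.splitlines()


class Sort(Transformer):
    name = "Sort"

    def transform(self, data):
        return sorted(data)


class LinesToText(Transformer):
    name = "Lines To Text"

    def transform(self, data):
        return "\n".join(data)


class Nothing(Transformer):
    """Deliberately broken step, used to show fault finding."""

    name = "Nothing"

    def transform(self, data):
        return None


sort_text = Chain(TextToLines(), Sort(), LinesToText())
result = sort_text("pear\napple\nfig")
sorted_lines = sort_text[1]()  # the Sort step still holds its own output

broken = Chain(TextToLines(), Nothing(), Sort())
broken("b\na")
culprit = broken.first_without_output()
print(result, sorted_lines, culprit, sep="\n")
```

Because each step stores its own result, a chain that yields nothing can be diagnosed after the fact without rerunning anything, which is the property the post attributes to Chain.show ().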
Re: fixing an horrific formatted csv file.
On 07/04/2014 12:28 PM, flebber wrote:
On Friday, 4 July 2014 14:12:15 UTC+10, flebber wrote:

I have taken the code and gone a little further, but I need to be able to protect myself against commas and single quotes in names. What is the best way to do this?

So in my file I had on line 44 this trainer name.

    "Michael, Wayne & John Hawkes"

and in line 95 this horse name.

    Inz'n'out

This throws off my capturing of the correct item 9. How do I protect against this?

Here is current code.

    import re
    from sys import argv

    SCRIPT, FILENAME = argv

    def out_file_name(file_name):
        """take an input file and keep the name with appended _clean"""
        file_parts = file_name.split(".",)
        output_file = file_parts[0] + '_clean.' + file_parts[1]
        return output_file

    def race_table(text_file):
        """utility to reorganise poorly made csv entry"""
        input_table = [[item.strip(' "') for item in record.split(',')] for record in text_file.splitlines()]
        # At this point look at input_table to find the record indices
        output_table = []
        for record in input_table:
            if record[0] == 'Meeting':
                meeting = record[3]
            elif record[0] == 'Race':
                date = record[13]
                race = record[1]
            elif record[0] == 'Horse':
                number = record[1]
                name = record[2]
                results = record[9]
                res_split = re.split('[- ]', results)
                starts = res_split[0]
                wins = res_split[1]
                seconds = res_split[2]
                thirds = res_split[3]
                prizemoney = res_split[4]
                trainer = record[4]
                location = record[5]
                print(name, wins, seconds)
                output_table.append((meeting, date, race, number, name, starts, wins, seconds, thirds, prizemoney, trainer, location))
        return output_table

    MY_FILE = out_file_name(FILENAME)

    # with open(FILENAME, 'r') as f_in, open(MY_FILE, 'w') as f_out:
    #     for line in race_table(f_in.readline()):
    #         new_row = line

    with open(FILENAME, 'r') as f_in, open(MY_FILE, 'w') as f_out:
        CONTENT = f_in.read()
        # print(content)
        FILE_CONTENTS = race_table(CONTENT)
        # print new_name
        f_out.write(str(FILE_CONTENTS))

    if __name__ == '__main__':
        pass

So I found this on stack overflow

    In [2]: import string
    In [3]: identity = string.maketrans("", "")
    In [4]: x = ['+5556', '-1539', '-99', '+1500']
    In [5]: x = [s.translate(identity, "+-") for s in x]
    In [6]: x
    Out[6]: ['5556', '1539', '99', '1500']

but it fails in my file because, I believe, mine is a list of lists. Is there an easy way to iterate the sublists without flattening?

Current code.

    input_table = [[item.strip(' "') for item in record.split(',')] for record in text_file.splitlines()]
    # At this point look at input_table to find the record indices
    identity = string.maketrans("", "")
    print(input_table)
    input_table = [s.translate(identity, ",'") for s in input_table]

Sayth

Take Gregory's advice and use the csv module. Don't reinvent a csv parser. My "csv" splitter was the simplest approach possible, which I tend to use with undocumented formats, tweaking for unexpected features as they come along.

Frederic
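A minimal Python 3 sketch of the csv-module advice, assuming the same record layout as the sample posted earlier in the thread. The Meeting and first Horse rows are taken from that sample and the Race row is shortened from it; the second Horse row is invented for illustration and only combines the trainer and horse names mentioned in the question. csv.reader handles the quoted comma, and an apostrophe needs no special treatment because it is not the quote character.

```python
import csv
import io


def race_table(csv_text):
    """Build (meeting, date, race, number, name, trainer, location) rows."""
    output_table = []
    meeting = date = race = None
    # skipinitialspace lets a quoted field follow ", " as well as ","
    for record in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        kind = record[0]
        if kind == "Meeting":
            meeting = record[3]
        elif kind == "Race":
            race, date = record[1], record[13]
        elif kind == "Horse":
            number, name, trainer, location = (record[i] for i in (1, 2, 4, 5))
            output_table.append((meeting, date, race, number, name, trainer, location))
    return output_table


# Shortened sample; the last Horse row is made up for this example.
sample = """Meeting,05/07/14,RHIL,Rosehill Gardens,Weights,TAB,+3m Entire Circuit, ,
Race,1,CIVIC STAKES,CIVIC,CIVIC,1350,~ ,3U,~ ,QLT ,54,0,0,5/07/2014,,
Horse,1,Bennetta,0,"Grahame Begg",Randwick,,0,0,16-3-1-3 $390450.00,,0,0,0,,98.00,M,
Horse,2,Inz'n'out,0,"Michael, Wayne & John Hawkes",Rosehill,,0,0,5-1-0-1 $1000.00,,0,0,0,,80.00,G,
"""

rows = race_table(sample)
for row in rows:
    print(row)
```

With the fields split correctly, item 9 of a Horse record stays in place, so the existing re.split('[- ]', ...) step on the results field should work unchanged.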
Re: draw a line if the color of points of beginning and end are different from white
On 03/06/2013 06:46 PM, olsr.ka...@gmail.com wrote:

How can I draw a line if the points at the beginning and the end are different from white? In other words, how can I get the color of the two points at the beginning and the end? Please help me.

This should get you going. If it doesn't work it will still direct you to the relevant chapters in the tutorial.

Frederic

    from PIL import ImageDraw

    def draw_line (image): # image is a PIL Image
        # Define your colors
        WHITE = 255 # White for mode 'L'; use (255, 255, 255) for mode 'RGB'
        LINE_COLOR = 0 # define
        # Find end points
        points = []
        pixels = image.load () # Fast pixel access
        for y in range (image.size [1]):
            for x in range (image.size [0]):
                if pixels [x, y] != WHITE:
                    points.append ((x, y))
        # Join end points
        draw = ImageDraw.Draw (image)
        draw.line (points, fill = LINE_COLOR)
Re: Regex help needed!
On 21.12.2009 12:38, Oltmans wrote:

Hello, everyone. I've a string that looks something like

    lksjdfls kdjff lsdfs sdjflssdfsdwelcome

From the above string I need the digits within the ID attribute. For example, required output from above string is

- 35343433
- 345343
- 8898

I've written this regex that's kind of working

    re.findall("\w+\s*\W+amazon_(\d+)",str)

but I was just wondering that there might be a better RegEx to do that same thing. Can you kindly suggest a better/improved Regex. Thank you in advance.

If you filter in two or even more sequential steps the problem becomes a lot simpler, not least because you can test each step separately:

    >>> r1 = re.compile (']*') # Add ignore case and variable white space
    >>> r2 = re.compile ('\d+')
    >>> [r2.search (item).group () for item in r1.findall (s) if item] # s is your sample
    ['345343', '35343433', '8898'] # Supposing all ids have digits

Frederic
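A runnable sketch of the two-step filtering idea. The sample string and the first pattern are assumptions: the markup of the original sample is not shown above, so the `<div id="amazon_...">` tags, the variable names `tag_re` and `digits_re`, and the exact shape of the first regex are hypothetical.

```python
import re

# Hypothetical sample: these <div id="amazon_..."> tags are an assumption.
s = ('lksjdfls <div id="amazon_345343"> kdjff lsdfs </div> sdjfls '
     '<div id = "amazon_35343433">sdfsd</div>'
     '<div id="amazon_8898">welcome</div>')

# Step 1: isolate the opening tags whose id starts with "amazon_",
# ignoring case and allowing variable white space around "=".
tag_re = re.compile(r'<div\s+id\s*=\s*"amazon_[^>]*', re.IGNORECASE)

# Step 2: pull the first run of digits out of each isolated tag.
digits_re = re.compile(r'\d+')

tags = tag_re.findall(s)
ids = [m.group() for m in map(digits_re.search, tags) if m]
print(ids)
```

Each step can be checked on its own: print `tags` first to confirm the right fragments are selected, then look at `ids`. Filtering on the match object (`if m`) also covers a tag that carries no digits at all.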