Re: Handle foreign character web input

2019-06-29 Thread Alan Meyer via Python-list

On 6/28/19 4:25 PM, Tobiah wrote:

A guy comes in and enters his last name as RÖnngren.

So what did the browser really give me; is it encoded
in some way, like latin-1?  Does it depend on whether
the name was cut and pasted from a Word doc. etc?
Should I handle these internally as unicode?  Right
now my database tables are latin-1 and things seem
to usually work, but not always.

Also, what do people do when searching for a record?
Is there some way to get 'Ronngren' to match the other
possible foreign spellings?


The first thing I'd want to do is to produce a front-end that discovers the 
character set (latin-1, whatever) and converts the input to a standard 
encoding, UTF-8.  e.g.:


   data.decode('latin1').encode('utf8')  # data is a byte string known to be latin-1

That gets rid of character set variations in the data, simplifying 
things before any of the hard work has to be done.
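
As a rough illustration of that front-end step in Python 3, here is a minimal sketch. It assumes the input arrives as raw bytes and that the only candidate encodings are UTF-8 and Latin-1; the function name to_text is made up for the example, and real input may need a charset taken from the HTTP request instead of a guess.

```python
def to_text(raw: bytes) -> str:
    """Decode submitted bytes to str: try UTF-8 first, fall back to Latin-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this never raises,
        # but it can be the wrong guess for other legacy encodings.
        return raw.decode("latin-1")


# The same name submitted in two different encodings ends up identical.
name_from_latin1 = to_text("Rönngren".encode("latin-1"))
name_from_utf8 = to_text("Rönngren".encode("utf-8"))

# Encode once, at the boundary, when bytes are needed for storage.
stored_bytes = name_from_latin1.encode("utf-8")
```

Trying UTF-8 first is the safer order, because Latin-1 text containing accented letters is rarely valid UTF-8, while every byte sequence is valid Latin-1.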


Then you have a choice - store and index everything as utf-8, or 
transliterate some or all strings to 7 bit US ASCII.  You may have to 
perform the same processing on input search strings.


I have not used it myself but there is a Python port of a Perl module 
by Sean M. Burke called Unidecode.  It will transliterate non-US ASCII 
strings into ASCII using reasonable substitutions of non-ASCII 
sequences.  I believe that there are other packages that can also do this.
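
To give an idea of what such a transliteration does, here is a sketch that uses only the standard library's unicodedata module. It is a stand-in, not Unidecode itself: it only strips accents from letters that decompose into a base letter plus a combining mark, and the helper name ascii_fold is invented for the example.

```python
import unicodedata


def ascii_fold(text: str) -> str:
    """Approximate a string in ASCII by stripping combining accents."""
    # NFKD splits 'ö' into 'o' followed by a combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Characters with no decomposition (for example 'ø' or 'ß') are
    # silently dropped here; a real transliterator maps them instead.
    return without_marks.encode("ascii", "ignore").decode("ascii")


folded = ascii_fold("Rönngren")
```

A dedicated package is the better choice when the data contains letters that this simple approach would drop.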


The easy way to use packages like this is to transliterate entire 
records before putting them into your database, but then you may perplex 
or even offend some users who will look at a record and say "What's 
this?  That's not French!"  You'll also have to transliterate all input 
search strings.


A more sophisticated way is to leave the records in Unicode, but add 
transliterated index strings for those index strings that wind up 
containing non-ASCII characters.


There are various ways to do this that trade off time, space, and 
programming effort.  You can store two versions of each record, search 
one and display the other.  You can just process index strings and add 
the transliterations to the record.  What to choose depends on your 
needs and resources.
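
One of those options, searching a transliterated key while displaying the original, might look like the following sketch. The class NameIndex and the function search_key are hypothetical names, the index is a plain in-memory dict where a database column would be used in practice, and the key is built with the standard library instead of a transliteration package.

```python
import unicodedata


def search_key(text: str) -> str:
    """Build a case-insensitive, accent-insensitive key for lookups."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.casefold()


class NameIndex:
    """Keeps names as entered and indexes them by their search key."""

    def __init__(self):
        self._by_key = {}

    def add(self, name: str) -> None:
        self._by_key.setdefault(search_key(name), []).append(name)

    def find(self, query: str) -> list:
        # The query goes through the same processing as the stored names.
        return list(self._by_key.get(search_key(query), []))


index = NameIndex()
index.add("Rönngren")
index.add("Ronngren")
matches = index.find("RONNGREN")
```

The stored names stay exactly as the user typed them, so nothing displayed is altered; only the lookup key is transliterated.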


And of course all bets are off if some of your data is Chinese, 
Japanese, Hebrew, or maybe even Russian or Greek.


Sometimes I think, Why don't we all just learn Esperanto?  But we all 
know that that isn't going to happen.


Alan
--
https://mail.python.org/mailman/listinfo/python-list


Re: Do I need a parser?

2019-06-29 Thread Alan Meyer via Python-list

On 6/29/19 8:39 AM, josé mariano wrote:

Dear all,

I'm sure that this subject has been addressed many times before on this forum, 
but my poor knowledge of English and of computer jargon and concepts results in 
my not being able to find the answer I'm looking for when I search the forum.

So here is my problem: I have this open source project for the scientific 
community where I want to duplicate an old MS-DOS application written in 
Fortran. I don't have the source code. The idea is to rewrite the software in 
Python. Originally, the old application would need two input files: one 
config file, written in a specific format (see below), and a second one, the 
so-called script file, that defines the sequence of operations to be performed 
by the main software, also written in a specific format.

To make the transition to the new application as painless as possible to the 
users, because most of them have their collection of scripts (and settings) 
developed over the years and are not willing to learn a new script language, I 
would like to make the new app 100% compatible with the old input files.

The operation of the new software would be like this: From the shell, run 
"my_new_software old_script_file.***". The new software would load the 
old_script, parse it (?), set the internal variables, load the script and run it.

So, to get to my questions:

- To load and read the config file I need a parser, right? Is there a parser 
library where we can define the syntax of the language to use? Are there better 
(meaning easier) ways to accomplish the same result?

- For the interpretation of the script file, I don't have any clue how to do 
this... One important thing: the script language admits some simple control 
flow statements like do-while, again written using a specific syntax.

Thanks a lot for the help and sorry for the long post.

Mariano

   


Example of a config (settings) file

.
CONDAD -11
BURAD2 4 SALT1 1.0 KNO3
ELEC5  -2.0 mV 400 58 0. 0
.


Example of a script
===
!Conductivity titration
cmnd bur1 f
set vinit 100
set endpt 2000
set mvinc 20
set drftim 1
set rdcrit cond 0.5 per_min
set dosinc bur1 0.02 1000
set titdir up
titratc cond bur1


I'll just add a general comment here.

Yes, you do need a parser and that parser should be a separate module or 
separate class from the rest of your program.  As Thomas Jollans wrote, 
str.split() might be enough to do all of the string twiddling for you.


If you have a separate class (or group of classes) that produces a 
configuration object and a script object, then when you discover examples 
of configuration or script files that you weren't aware of when you 
wrote the code, you may only need to modify your parser code and 
may not have to modify your script execution logic.
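
Here is a minimal sketch of that separation, using str.split() for the script file. The names Command and parse_script are invented for the example, and two things are guesses based only on the sample script in the question: that a line starting with '!' is a comment, and that each remaining line is a command name followed by whitespace-separated arguments. Control flow statements such as do-while are not handled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Command:
    """One parsed script line, ready to hand to the execution code."""
    name: str
    args: List[str]
    line_no: int  # kept so that error messages can point at the source line


def parse_script(text: str) -> List[Command]:
    """Turn script text into Command objects; knows nothing about running them."""
    commands = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Assumption: blank lines and lines starting with '!' carry no command.
        if not stripped or stripped.startswith("!"):
            continue
        name, *args = stripped.split()
        commands.append(Command(name.lower(), args, line_no))
    return commands


SAMPLE = """\
!Conductivity titration
cmnd bur1 f
set vinit 100
titratc cond bur1
"""

script = parse_script(SAMPLE)
```

The execution logic would then loop over the Command objects and never look at the file format, so a newly discovered syntax variant only touches parse_script.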


Finally, I want to say that I wish everyone in the U.S. had as much 
command of English as you do.  Si pudiera hablar español tan bien como 
usted habla inglés, estaría muy feliz.  (You should have seen what that 
looked like before I applied Google Translate :)


   Alan
