from:"sonald"

Unicode support in python

2006-10-20 Thread sonald

Hi,
I am using Python 2.4.1.

I need to pass Russian text into Python and validate it.
Can you please guide me on how to make my existing code support
Russian text?

Is there any module that can be used for Unicode support in Python?

In the case of decimal numbers, how do I handle a comma used as the
decimal point within a number?

Currently the existing code is working fine for English text.
Please help.

Thanks in advance.

regards
sonal
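
On the decimal-comma question, here is a minimal sketch, assuming the amounts arrive as strings such as "10,5" with no thousands separators. The helper name parse_decimal_comma is hypothetical and not part of the existing module. Note that a bare decimal comma collides with a comma field separator, so such a field would have to be quoted in the CSV file.

```python
from decimal import Decimal, InvalidOperation


def parse_decimal_comma(text):
    """Parse a number that uses a comma as its decimal point.

    Hypothetical helper: assumes no thousands separators are present.
    Returns a Decimal, or None if the text is not a valid number.
    """
    # Swap the decimal comma for the dot that Decimal expects.
    cleaned = text.strip().replace(",", ".")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        # Covers empty strings, letters, and inputs with several commas.
        return None
```

Returning None instead of raising keeps the helper easy to call from a field check that only needs a pass/fail answer.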

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Unicode support in python

2006-10-20 Thread sonald

Fredrik Lundh wrote:
> >http://www.google.com/search?q=python+unicode
>
> (and before anyone starts screaming about how they hate RTFM replies, look
> at the search result)
>
> 
Thanks!! But I have already tried this...
Let me tell you what I am trying now.

I have added the following line to the script:

# -*- coding: utf-8 -*-

I have also modified site.py in ./Python24/Lib as follows:
def setencoding():
    """Set the string encoding used by the Unicode implementation.  The
    default is 'ascii', but if you're willing to experiment, you can
    change this."""
    encoding = "utf-8" # Default value set by _PyUnicode_Init()
    if 0:
        # Enable to support locale aware default string encodings.
        import locale
        loc = locale.getdefaultlocale()
        if loc[1]:
            encoding = loc[1]
    if 0:
        # Enable to switch off string to Unicode coercion and implicit
        # Unicode to string conversion.
        encoding = "undefined"
    if encoding != "ascii":
        # On Non-Unicode builds this will raise an AttributeError...
        sys.setdefaultencoding(encoding) # Needs Python Unicode build !

Now when I try to validate the data in a text file, say abc.txt (saved
with UTF-8 encoding), containing either English or Russian text, a junk
character (box-like) is added as the first character.
What could be the reason for this, and how do I handle it?
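
A likely cause of the box character is the UTF-8 byte order mark (BOM, the bytes EF BB BF) that some Windows editors write at the start of a file saved as UTF-8. Below is a minimal sketch, assuming that diagnosis, of dropping the mark while decoding. The function name decode_utf8_line is hypothetical. The sketch is written for current Python; from Python 2.5 onward the utf-8-sig codec does the same job.

```python
import codecs


def decode_utf8_line(raw):
    """Decode a UTF-8 byte string, dropping a leading BOM if present.

    Hypothetical helper; `raw` is expected to be bytes read from the file.
    """
    if raw.startswith(codecs.BOM_UTF8):
        # Skip the three BOM bytes so they do not show up as a junk character.
        raw = raw[len(codecs.BOM_UTF8):]
    return raw.decode("utf-8")
```

Only the first line of a file can carry the mark, so calling this on every line is harmless but only the first call does any stripping.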


Re: Unicode support in python

2006-10-25 Thread sonald

Fredrik Lundh wrote:
>
> what does the word "validate" mean here?
>
Let me explain our module.
We receive text files (with comma-separated values, in a predefined
format) from a third party.
For example, an account file comes as "abc.acc" (.acc is the extension
for account files in our code).
It must contain account_code, account_description and account_balance,
in that order.

So a text file ("abc.acc") that we receive with two or more records
will look like:
A001, test account1, 10
A002, test account2, 50

We may have multiple .acc files.

Our job is to validate the incoming data on the basis of its datatype,
field number, etc., and copy all the error-free records to acc.txt.

For this, we use a schema as follows:
--
if account_flg == 1:
    start = time()

    # the input fields
    acct_schema = {
        0: Text('AccountCode', 50),
        1: Text('AccountDescription', 100),
        2: Text('AccountBalance', 50)
    }

    validate( schema       = acct_schema,
              primary_keys = [acct_pk],
              infile       = '../data/ACC/*.acc',
              outfile      = '../data/acc.txt',
              update_freq  = 1)
--
In core.py, we have defined a function validate, which checks the
datatypes and performs other validations.
All the erroneous records are copied to an error log file, and the
correct records are copied to a clean acc.txt file.

The validate function is given below...
---
def validate(infile, outfile, schema, primary_keys=[], foreign_keys=[],
             record_checks=[], buffer_size=0, update_freq=0):

    show("intitalizing ... ")

    # find matching input files
    all_files  = glob.glob(infile)
    if not all_files:
        raise ValueError('No input files were found.')

    # initialize data structures
    freq   = update_freq or DEFAULT_UPDATE
    input  = fileinput.FileInput(all_files,
                                 bufsize = buffer_size or DEFAULT_BUFFER)
    output = open(outfile, 'wb+')
    logs   = {}
    for name in all_files:
        logs[name]  = open(name + DEFAULT_SUFFIX, 'wb+')
        #logs[name]  = open(name + DEFAULT_SUFFIX, 'a+')

    errors = []
    num_fields = len(schema)
    pk_length  = range(len(primary_keys))
    fk_length  = range(len(foreign_keys))
    rc_length  = range(len(record_checks))

    # initialize the PKs and FKs with the given schema
    for idx in primary_keys:
        idx.setup(schema)
    for idx in foreign_keys:
        idx.setup(schema)

    # start processing: collect all lines which have errors
    for line in input:
        rec_num = input.lineno()
        if rec_num % freq == 0:
            show("processed %d records ... " % (rec_num))
            for idx in primary_keys:
                idx.flush()
            for idx in foreign_keys:
                idx.flush()

        if BLANK_LINE.match(line):
            continue

        try:
            data = csv.parse(line)

            # check number of fields
            if len(data) != num_fields:
                errors.append( (rec_num, LINE_ERROR,
                                'incorrect number of fields') )
                continue

            # check for well-formed fields
            fields_ok = True
            for i in range(num_fields):
                if not schema[i].validate(data[i]):
                    errors.append( (rec_num, FIELD_ERROR, i) )
                    fields_ok = False
                    break

            # check the PKs
            for i in pk_length:
                if fields_ok and not primary_keys[i].valid(rec_num, data):
                    errors.append( (rec_num, PK_ERROR, i) )
                    break

            # check the FKs
            for i in fk_length:
                if fields_ok and not foreign_keys[i].valid(rec_num, data):
                    #print 'here ---> %s, rec_num : %d'%(data,rec_num)
                    errors.append( (rec_num, FK_ERROR, i) )
                    break

            # perform record-level checks
            for i in rc_length:
                if fields_ok and not record_checks[i](schema, data):
                    errors.append( (rec_num, REC_ERROR, i) )
                    break

        except fastcsv.Error, err:
            errors.append( (rec_num, LINE_ERROR, err.__str__()) )

    # finalize the indexes to check for any more errors
    for i in pk_length:
        error_list = primary_keys[i].finalize()
        primary_keys[i].save()
        if error_list:
            errors.extend( [ (rec_num, PK_ERROR, i)
                             for rec_num in error_list ] )

    for i in fk_length:
        error_list = foreign_keys[i].finalize()
        if error_list:
            errors.extend( [ (rec_num,

Re: Unicode support in python

2006-10-25 Thread sonald

Hi,
Can you please tell me if there is any package or class that I can
import for internationalization or Unicode support?

This module is just a small part of our application, and we are not
really supposed to alter the code.
We do not have anybody here to help us with Python, and we are
supposed to just try and understand the program. Today I am in a
position where I can fix the bugs arising from the code, but cannot
really try something like internationalization on my own. Can you help?
Do you want me to post the complete code for your reference?
Please let me know as soon as possible.


John Roth wrote:
> sonald wrote:
> > Hi,
> > I am using python2.4.1
> >
> > I need to pass russian text into python and validate the same.
> > Can u plz guide me on how to make my existing code support the
> > russian  text.
> >
> > Is there any module that can be used for unicode support in python?
> >
> > Incase of decimal numbers, how to handle "comma as a decimal point"
> > within a number
> >
> > Currently the existing code is woking fine for English text
> > Please help.
> >
> > Thanks in advance.
> >
> > regards
> > sonal
>
> As both of the other responders have said, the
> coding comment at the front only affects source
> text; it has absolutely no effect at run time. In
> particular, it's not even necessary to use it to
> handle non-English languages as long as you
> don't want to write literals in those languages.
>
> What seems to be missing is the notion that
> external files are _always_ byte files, and have to
> be _explicitly_ decoded into unicode strings,
> and then encoded back to whatever the external
> encoding needs to be, each and every time you
> read or write a file, or copy string data from
> byte strings to unicode strings and back.
> There is no good way of handling this implicitly:
> you can't simply say "utf-8" or "iso-8859-whatever"
> in one place and expect it to work.
>
> You've got to specify the encoding on each and
> every open, or else use the encode and decode
> string methods. This is a great motivation for
> eliminating duplication and centralizing your
> code!
>
> For your other question: the general words
> are localization and locale. Look up locale in
> the index. It's a strange subject which I don't
> know much about, but that should get you 
> started.
> 
> John Roth
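
The advice above, decoding on every read and encoding on every write, can be sketched as follows. This is only an illustration under assumptions: the helper name copy_text and its parameters are hypothetical, and it uses io.open from current Python. On Python 2.4, codecs.open(filename, mode, encoding) plays the same role.

```python
import io


def copy_text(src_path, dst_path, src_encoding="utf-8", dst_encoding="utf-8"):
    """Read a file as text in one encoding and write it out in another.

    Hypothetical helper: the encoding is named explicitly on every open,
    so nothing depends on the interpreter's default encoding.
    """
    # Bytes on disk are decoded into text as they are read.
    with io.open(src_path, "r", encoding=src_encoding) as src:
        text = src.read()
    # Text is encoded back into bytes as it is written.
    with io.open(dst_path, "w", encoding=dst_encoding) as dst:
        dst.write(text)
    return text
```

Keeping the open calls in one helper like this is one way to get the centralization the reply recommends, since the encoding names then live in a single place.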


how can i change the text delimiter

2006-08-30 Thread sonald

Hi,
Can anybody tell me how to change the text delimiter in the FastCSV
parser?
By default the text delimiter is the double quote (").
I want to change it to something else, say a pipe (|).
Can anyone please tell me how to go about it?


Re: how can i change the text delimiter

2006-08-30 Thread sonald

Hi Amit,
Thanks for the quick response...
E.g. the record is: "askin"em"

This entire text is extracted as one string, but since the qualifier is
the double quote ("), the fastcsv parser is unable to parse it.

If we can change the text qualifier to a pipe (|), then the string will
look like this:
|askin"em|

But for this, the default text qualifier in the fastcsv parser needs to
be changed to a pipe (|). How can I do this?

Also please note that the string cannot be modified at all. Thanks.

Amit Khemka wrote:
> sonald <[EMAIL PROTECTED]> wrote:
> > Hi,
> > Can anybody tell me how to change the text delimiter in FastCSV Parser
> > ?
> > By default the text delimiter is double quotes(")
> > I want to change it to anything else... say a pipe (|)..
> > can anyone please tell me how do i go about it?
>
> You can use the parser constructor to specify the field seperator:
> Python >>>  parser(ms_double_quote = 1, field_sep = ',', auto_clear = 1)
>
> cheers,
> amit.
>
> --
> 
> Amit Khemka -- onyomo.com
> Home Page: www.cse.iitd.ernet.in/~csd00377
> Endless the world's turn, endless the sun's Spinning, Endless the quest;
> I turn again, back to my own beginning, And here, find rest.


Re: how can i change the text delimiter

2006-08-30 Thread sonald

Hi,
thanks for the reply...

fast csv is the csv module for Python...
and actually the string cannot be modified because
it is received from a third party and we are not supposed to modify the
data in any way.


For details on the fast CSV module please visit

www.object-craft.com.au/projects/csv/ or

import fastcsv
csv = fastcsv.parser(strict = 1,field_sep = ',')  # part of configuration

and somewhere in the code we are using

data = csv.parse(line)

All I mean to say is that csv.reader is nowhere in the code,
and somehow we have to modify the existing code.

Looking forward to your kind reply...




Fredrik Lundh wrote:
> "sonald" wrote:
>
> > Thanks for a quick response...
> > E.g record is: "askin"em"
>
> that's usually stored as "askin""em" in a CSV file, and the csv module
> has no problem handling that:
>
> >>> import csv, StringIO
> >>> source = StringIO.StringIO('"askin""em"\n')
> >>> list(csv.reader(source))
> [['askin"em']]
>
> to use another quote character, use the quotechar option to the reader
> function:
>
> >>> source = StringIO.StringIO('|askin"em|\n')
> >>> list(csv.reader(source, quotechar='|'))
> [['askin"em']]
>
> > Also please note that the string cannot be modified at all.
>
> not even by the Python program that reads the data?  sounds scary.
>
> what's fastcsv, btw?  the only thing google finds with that name is a
> Ruby library...
> 
> 


Re: how can i change the text delimiter

2006-08-31 Thread sonald

Hi,
I am using Python version 2.4.1, and along with it there are other
installables:
1. fastcsv-1.0.1.win32-py2.4.exe
2. psyco-1.4.win32-py2.4.exe
3. scite-1.63-setup.exe

We are freshers here, newly joined, and are now handling this
module, which validates the data files provided by the third party in a
predefined format.
The data files are provided in comma-separated format.

The fastcsv package is imported in the code...
 import fastcsv
and
 csv = fastcsv.parser(strict = 1,field_sep = ',')

Can you please tell me where to find the definition of the parser
function (used above), so that if possible I can provide a parameter
for the text qualifier (or text separator, or text delimiter),
just like {field_sep = ','} above?

I want to handle strings containing double quotes ("),
but the problem is that the default text qualifier is the double quote.

Now if I can change the default text qualifier to, say, a pipe (|),
the double quote inside the string may be ignored...
Please refer to the example given in my previous query.

Thanks.

Fredrik Lundh wrote:
> "sonald" wrote:
>
> > fast csv is the the csv module for Python...
>
> no, it's not.  the csv module for Python is called "csv".
>
> > and actually the string cannot be modified because
> > it is received from a third party and we are not supposed to modify the
> > data in any way..
>
> that doesn't prevent you from using Python to modify it before you pass it to
> the csv parser, though.
>
> > for details on the fast CSV module please visit
> >
> > www.object-craft.com.au/projects/csv/ or
>
> that module is called "csv", not "fastcsv".  and as it says on that page, a much
> improved version of that module was added to Python in version 2.3.
> 
> what Python version are you using?
> 
> 


Re: how can i change the text delimiter

2006-09-04 Thread sonald

Hi,
Thanks a lot for the snippets you have included in your post...
those were quite helpful.

About the third-party data:
we receive the data in CSV format, but we are not supposed to modify
the files provided by the user directly.

Instead we make another file with the same name and a different
extension, and use the new files created by Python for further
processing.

> quote_char
> Defines the character used to quote fields that
> contain the field separator or newlines.  If set to None
> special characters will be escaped using the escape_char.
> # That's what you are looking for #

Yes you got me right
I was indeed looking for the quote_char...
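
For comparison, here is a minimal sketch of the same idea using the standard library csv module rather than fastcsv, since that is the one that can be run anywhere. The function name parse_pipe_quoted is hypothetical. Going by the help text quoted in this thread, the corresponding change in the existing code would presumably be to add quote_char = '|' to the fastcsv.parser(strict = 1, field_sep = ',') call, but that is untested here.

```python
import csv
import io


def parse_pipe_quoted(line):
    """Parse one CSV line whose text qualifier is a pipe, not a double quote.

    Hypothetical helper built on the standard library csv module.
    """
    # quotechar='|' makes the pipe the qualifier, so a double quote inside
    # a field is treated as an ordinary character.
    reader = csv.reader(io.StringIO(line), delimiter=",", quotechar="|")
    return next(reader)
```

With the pipe as qualifier, a record such as A001,|askin"em|,10 comes back as three fields and the embedded double quote survives unchanged.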

> Aha!! Looks like some misguided person has got a copy of the
> object-craft code, renamed it fastcsv, and compiled it to run with
> Python 2.4 ... so you want some docs. The simplest thing to do is to
> ask it, e.g. like this, but with Python 2.4 (not 2.2) and call it
> fastcsv (not csv):
>

I guess... that's true... ;)

Thank you very much.




Thanks a lot for the response.
John Machin wrote:

> sonald wrote:
> > Hi,
> > I am using
> > Python version python-2.4.1 and along with this there are other
> > installables
> > like:
> > 1. fastcsv-1.0.1.win32-py2.4.exe
>
> Well, you certainly didn't get that from the object-craft website --
> just go and look at their download page
> http://www.object-craft.com.au/projects/csv/download.html -- stops dead
> in 2002 and the latest windows kit is a .pyd for Python 2.2. As you
> have already been told and as the object-craft csv home-page says,
> their csv was the precursor of the Python csv module.
>
>
> > 2. psyco-1.4.win32-py2.4.exe
> > 3. scite-1.63-setup.exe
> >
> > We are freshers here, joined new... and are now into handling this
> > module which validates the data files, which are provided in some
> > predefined format from the third party.
> > The data files are provided in the comma separated format.
> >
> > The fastcsv package is imported in the code...
> >  import fastcsv
> > and
> >  csv = fastcsv.parser(strict = 1,field_sep = ',')
>
> Aha!! Looks like some misguided person has got a copy of the
> object-craft code, renamed it fastcsv, and compiled it to run with
> Python 2.4 ... so you want some docs. The simplest thing to do is to
> ask it, e.g. like this, but with Python 2.4 (not 2.2) and call it
> fastcsv (not csv):
>
> ... command-prompt...>\python22\python
> Python 2.2.3 (#42, May 30 2003, 18:12:08) [MSC 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import csv
> >>> help(csv.parser)
> Help on built-in function parser:
>
> parser(...)
>     parser(ms_double_quote = 1, field_sep = ',',
>            auto_clear = 1, strict = 0,
>            quote_char = '"', escape_char = None) -> Parser
>
>     Constructs a CSV parser object.
>
>     ms_double_quote
>         When True, quotes in a fields must be doubled up.
>
>     field_sep
>         Defines the character that will be used to separate
>         fields in the CSV record.
>
>     auto_clear
>         When True, calling parse() will automatically call
>         the clear() method if the previous call to parse() raised
>         an exception during parsing.
>
>     strict
>         When True, the parser will raise an exception on
>         malformed fields rather than attempting to guess the right
>         behavior.
>
>     quote_char
>         Defines the character used to quote fields that
>         contain the field separator or newlines.  If set to None
>         special characters will be escaped using the escape_char.
>         # That's what you are looking for #
>     escape_char
>         Defines the character used to escape special
>         characters.  Only used if quote_char is None.
>
> >>> help(csv)
> Help on module csv:
>
> NAME
>     csv - This module provides class for performing CSV parsing and writing.
>
> FILE
>     SOMEWHERE\csv.pyd
>
> DESCRIPTION
>     The CSV parser object (returned by the parser() function) supports the
>     following methods:
>         clear()
>             Discards all fields parsed so far.  If auto_clear is set to
>             zero. You should call this after a parser exception.
>
>         parse(string) -> list of strings
>             Extracts fields from the (partial) CSV re

How to allow special character's like ï, ù, acute e etc...

2006-09-05 Thread sonald

Dear All,
I am working on a module that validates CSV data provided in a text
file, which must be in a predefined format.
We check for:

1. The number of fields provided in the text file.

2. Text checks for the maximum length of the field and whether the
field is mandatory or optional.
Example:
Text('Description', 100, optional=True)
Parameters: "Name of the field" => 'Description'
            "Max length"        => 100
            "Optional"          => True (the field is not mandatory)

3. Valid-text expressions.
Example:
ValidText('Minor', '[yYnN]')

Parameters:
name  => field name
regex => the regular expression: y/Y for Yes and n/N for No
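
To make the checks above concrete, here is a rough sketch of how two such field classes could behave. These are hypothetical stand-ins written for illustration only; the module's real Text and ValidText classes are not shown in this thread, and their actual rules may differ.

```python
import re


class Text(object):
    """Hypothetical stand-in for the module's Text field check."""

    def __init__(self, name, max_length, optional=False):
        self.name = name
        self.max_length = max_length
        self.optional = optional

    def validate(self, value):
        # An empty value passes only when the field is optional.
        if value == "":
            return self.optional
        # Length is counted in characters, so the value should already be
        # decoded text rather than raw bytes.
        return len(value) <= self.max_length


class ValidText(object):
    """Hypothetical stand-in for the module's ValidText field check."""

    def __init__(self, name, regex):
        self.name = name
        self.pattern = re.compile(regex)

    def validate(self, value):
        # The whole value must match the expression, not just a prefix.
        return self.pattern.fullmatch(value) is not None
```

A length check written this way does not care which alphabet the characters come from, which matters for the non-English names discussed next.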

Recently we have been getting data where the name contains non-English
characters, like ' ATHUMANIù ', ' LUCIANA S. SENGïONGO ', etc.

Using the Text function, these names fail validation because they
contain special or non-English characters (ï, ù). But the data is
correct.
Is there any function that allows such special characters but not
numbers?

Secondly, if I were to get the data in Russian, are there any
(language) packages available so that I can use the same module for
validation, such that I just have to import the package and the module
can be used for validating Russian or Japanese text?
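
One possible approach to "letters from any language, but no numbers" is a Unicode-aware regular expression. The sketch below assumes the name has already been decoded to text, and the names NAME_PATTERN and is_valid_name are hypothetical. It is written for current Python; on Python 2.4 the same pattern would need unicode strings and the re.UNICODE flag.

```python
import re

# [^\W\d_] means "a word character that is not a digit or an underscore",
# which is roughly "a letter in any script". Spaces, dots, apostrophes and
# hyphens are also allowed so that names like "LUCIANA S." pass.
NAME_PATTERN = re.compile(r"(?:[^\W\d_]|[ .'-])+")


def is_valid_name(text):
    """Return True if text holds only letters and basic name punctuation."""
    return NAME_PATTERN.fullmatch(text.strip()) is not None
```

No separate language package is needed for this: the same pattern accepts accented Latin letters and Cyrillic alike, while a value containing digits is rejected.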

Regards,
Sonal.

