Joseph L. Casale wrote:
>> I'm not sure what exactly you're asking for.
>> Especially "is not being interpreted as a string requiring base64 encoding"
>> is written without giving the right context.
>>
>> So I'm just guessing that this might be the usual misunderstandings with use
>> of base64 in LDIF. Read more about when LDIF requires base64-encoding here:
>>
>> http://tools.ietf.org/html/rfc2849
>>
>> To me everything looks right:
>>
>> Python 2.7.3 (default, Apr 14 2012, 08:58:41) [GCC] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> 'ZGV0XDMzMTB3YmJccGc='.decode('base64').decode('utf-8')
>> u'det\\3310wbb\\pg'
>> >>>
>>
>> What do you think is a problem?
>
> Thanks for the reply. The issues I am sure are in my code. I read the ldif
> source file and end up with values such as 'det\3310wbb\pg' after the base64
> encoded entries are decoded.
>
> The problem I am having is when I add this to an add/mod entry list and write
> it back out. As it does not get re-encoded to base64, the ldif file ends up
> seeing a text entry with a ^] character, which if I re-read it with the parser
> causes the handle method to break midway through the entry dict, and so the
> last half re-appears disjoint without a dn.
>
> Like I said, I am pretty sure it's my poor misunderstanding of decoding and
> encoding. I am using the build from http://www.lfd.uci.edu/~gohlke/pythonlibs/
> on a Windows 2008 R2 server.
>
> I have re-implemented handle to create a cidict holding all the dn/entries
> that are parsed, as I then perform some processing such as manipulating
> attribute values in the entry dict. I am pretty sure I am breaking things
> here. The data I am reading is coming from utf-16-le encoded files and has
> Unicode characters, as the source directory is globally available, being
> written to in just about every country.
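
To make the base64 rule from RFC 2849 a bit more concrete, here is a rough sketch of how I read its grammar. It is written in Python 3 syntax (unlike the 2.7 session quoted above), it is not the code of the ldif module, the names needs_base64 and ldif_line are made up for this mail, and line folding is ignored entirely.

```python
import base64

# Bytes that may not appear unencoded anywhere in a value: NUL, LF, CR.
_UNSAFE_ANYWHERE = {0x00, 0x0A, 0x0D}
# Bytes that additionally may not start a value: space, colon, less-than.
_UNSAFE_FIRST = {0x20, 0x3A, 0x3C}


def needs_base64(value: bytes) -> bool:
    """Return True if the raw attribute value should be written base64-encoded."""
    if not value:
        return False
    if value[0] in _UNSAFE_FIRST:
        return True
    # The RFC says values ending in a space SHOULD be base64-encoded.
    if value[-1] == 0x20:
        return True
    # Any non-ASCII byte, or NUL/LF/CR, forces base64.
    return any(b > 0x7F or b in _UNSAFE_ANYWHERE for b in value)


def ldif_line(attr: str, value: bytes) -> str:
    """Format one attribute line: 'attr:: <base64>' or 'attr: <plain>'."""
    if needs_base64(value):
        return "{}:: {}".format(attr, base64.b64encode(value).decode("ascii"))
    # Safe here: needs_base64() returned False, so every byte is ASCII.
    return "{}: {}".format(attr, value.decode("ascii"))
```

With this reading, ldif_line("cn", "é".encode("utf-8")) yields a "cn:: ..." line, while a pure ASCII value is written as plain text. Note that an ASCII control byte other than NUL, LF and CR does not by itself require base64 under this grammar.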

Processing LDIF is one thing, doing LDAP operations another.

LDIF itself is meant to be ASCII-clean. But each attribute value can carry any
byte sequence (e.g. attribute 'jpegPhoto'). There's no further processing by
module LDIF: it simply returns byte sequences.

The access protocol LDAPv3 mandates UTF-8 encoding for Unicode strings on the
wire if the attribute syntax is DirectoryString, IA5String (mainly ASCII) or
similar.

So if your LDIF input returns UTF-16 encoded attribute values for e.g.
attribute 'cn' or 'o', or another attribute not being of OctetString or Binary
syntax, something's wrong with the producer of the LDIF data.

> Is there a process for manipulating/adding data to the entry dict before I
> write it out that I should adhere to? For example, if I am adding a new
> attribute to be composed of part of another parsed attr for use in a modlist:
>
> {'customAttr': ['foo.{}.bar'.format(entry['uid'])]}
>
> By looking at the value from above, 'det\3310wbb\pg', I gather the entry dict
> was parsed into byte strings. I should have decoded this, whereas some of the
> data is Unicode and as such I should have encoded it?

I wonder what the string really is. At least the base64-encoding you provided
before decodes as UTF-8, but I'm not sure whether it's the right sequence of
Unicode code points you're expecting.

>>> 'ZGV0XDMzMTB3YmJccGc='.decode('base64').decode('utf-8')
u'det\\3310wbb\\pg'

I still can't figure out what you're really doing though. I'd recommend
stripping down your operations to a very simple test code snippet illustrating
the issue and posting that here.

Ciao, Michael.
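
P.S. The encoding point above could look roughly like this in code. Again a minimal Python 3 sketch under assumptions: the helpers to_utf8 and build_custom_attr are hypothetical names, the entry dict is assumed to map attribute names to lists of UTF-8 byte strings, and 'utf-16-le' as source encoding is only taken from your description of the input files.

```python
def to_utf8(value, source_encoding="utf-16-le"):
    """Return value as UTF-8 bytes.

    Text (str) is encoded directly; bytes are assumed to be in
    source_encoding and are transcoded.
    """
    if isinstance(value, bytes):
        value = value.decode(source_encoding)
    return value.encode("utf-8")


def build_custom_attr(entry):
    """Derive a new attribute from the first 'uid' value of a parsed entry."""
    # Attribute values are lists of byte strings: take one value and
    # decode it to text before doing any string formatting.
    uid = entry["uid"][0].decode("utf-8")
    # Encode the result back to UTF-8 bytes before it goes into the modlist.
    return {"customAttr": ["foo.{}.bar".format(uid).encode("utf-8")]}
```

The idea is simply to decode to text at the point where you manipulate a value and to encode to UTF-8 bytes again before the value is written out or sent to the server.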