Please find attached the file I'm trying to parse. It was extracted from an OAI data provider in oai_dc format. The challenge is to preserve the Thai characters, which are encoded in UTF-8.
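To make the goal concrete, here is a minimal sketch in Python (not Perl, and not DublinCore::Record; used only as an illustration) of pulling the dc:* elements out of an oai_dc record while keeping the Thai text intact. The `extract_dc` helper and the sample record are hypothetical, and the Thai string is a stand-in value rather than text from the attachment. The key assumption is that the parser is handed raw bytes, so that it decodes them according to the XML encoding declaration.

```python
import xml.etree.ElementTree as ET

# Namespace of the simple Dublin Core elements, as declared in the attached record.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical stand-in record; the Thai text is a sample value only.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<dc:title>ภาษาไทย = Thai language</dc:title>'
    '<dc:language>en</dc:language>'
    '</oai_dc:dc>'
).encode("utf-8")


def extract_dc(xml_bytes):
    """Return a dict mapping each DC element name to a list of its text values."""
    # Passing bytes lets the parser honor the encoding declaration itself.
    root = ET.fromstring(xml_bytes)
    prefix = "{" + DC_NS + "}"
    fields = {}
    for elem in root.iter():
        if elem.tag.startswith(prefix):
            name = elem.tag[len(prefix):]
            fields.setdefault(name, []).append((elem.text or "").strip())
    return fields


record = extract_dc(SAMPLE)
print(record["title"][0])
```

The same function should work on a full OAI-PMH GetRecord response, since it walks the whole tree and only keeps elements in the DC namespace.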
I also have the same data in MARC21 format, encoded in UTF-8. But when I extract the required fields into a delimited file, the Thai characters turn into junk. I was using version 1.38 of MARC::Record. Does the latest version of MARC::Batch handle UTF-8 text correctly?
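One common way for this kind of junk to appear (an assumption here, not a diagnosis of MARC::Record) is that UTF-8 bytes get read or written as if they were Latin-1. The Python sketch below, again only an illustration, reproduces that effect on a sample Thai string and then writes a delimited line with the output encoding stated explicitly. The `write_delimited` helper, the `|` separator, and the sample values are all hypothetical.

```python
import io

thai = "ภาษาไทย"  # sample text, not taken from the catalogue data

# Reproduce the junk: UTF-8 bytes decoded as if they were Latin-1.
utf8_bytes = thai.encode("utf-8")
junk = utf8_bytes.decode("latin-1")

# As long as no bytes were dropped, the mistake can be reversed.
repaired = junk.encode("latin-1").decode("utf-8")


def write_delimited(rows, stream, sep="|"):
    """Write rows of strings to a binary stream as UTF-8 delimited lines."""
    # Declaring the encoding explicitly avoids relying on platform defaults.
    wrapper = io.TextIOWrapper(stream, encoding="utf-8", newline="\n")
    for row in rows:
        wrapper.write(sep.join(row) + "\n")
    wrapper.flush()
    wrapper.detach()  # leave the underlying stream open for the caller


buf = io.BytesIO()
write_delimited([["4989", thai]], buf)
```

In practice `buf` would be a file opened in binary mode; the in-memory buffer just keeps the sketch self-contained.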
Thanks in advance for any help.
Regards,
Saiful
On 11/12/05, Edward Summers <[EMAIL PROTECTED]> wrote:
Saiful wrote:
>I looked at DublinCore::Record and couldn't find any method for opening
a file for reading.
That's because there are so many ways DC data can be encoded. If you
are interested in extracting DC from HTML you can take a look at
HTML::DublinCore [1]. What kind of file are you trying to read your
DublinCore from?
//Ed
[1] http://search.cpan.org/dist/HTML-DublinCore/
<?xml version="1.0" encoding="UTF-8"?> <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2005-11-10T19:02:52Z</responseDate> <request verb="GetRecord" identifier="oai:tcdc.or.th:4989" metadataPrefix="oai_dc">http://61.90.146.67:8000/cgi-bin/oai2/VTLS/Vortex.pl</request> <GetRecord> <record> <header> <identifier>oai:tcdc.or.th4989</identifier> <datestamp>2005-09-15</datestamp> </header> <metadata> <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"> <dc:type>text</dc:type> <dc:source>http://203.154.235.51:8000/cgi-bin/gw/chameleon?host=localhost+1111+DEFAULT&sessionid=VTLS&function=CARDSCR&search=KEYWORD&pos=1&u1=12101&t1=4989</dc:source> <dc:language>en</dc:language> <dc:identifier>9742722749 :</dc:identifier> <dc:creator>สุà¹à¸à¸à¸£ à¸à¸¸à¸à¸´à¸à¸à¸£à¸²à¸à¸à¸à¹.</dc:creator> <dc:title>à¹à¸à¸£à¸·à¹à¸à¸à¸à¸à¸à¸à¸£à¸¸à¸à¸¨à¸£à¸µà¸à¸¢à¸¸à¸à¸¢à¸² : à¸à¸¡à¸à¸°à¸¨à¸´à¸¥à¸à¹à¹à¸à¹à¸à¸à¸´à¸à¸ªà¸¢à¸²à¸¡ = The immortal art of Ayutthaya gold / </dc:title> <dc:publisher>à¸à¸£à¸¸à¸à¹à¸à¸à¸¯ : à¹à¸à¸¥à¸ à¹à¸¡à¸à¸´à¸,</dc:publisher> <dc:date>2546.</dc:date> <dc:contributor>à¹à¸¢à¸²à¸§à¸à¸¸à¸ à¹à¸§à¸¨à¸£à¹à¸ าà¸à¸². </dc:contributor> <dc:contributor>ภูวà¸à¸¥ สุวรรà¸à¸à¸µ. </dc:contributor> </oai_dc:dc> </metadata> </record> </GetRecord> </OAI-PMH>