Hi Ed,

Please find attached the file I'm trying to parse. It was extracted from an OAI data provider in oai_dc format. The challenge is to preserve the Thai characters, which are encoded in UTF-8.

I also have the same data in MARC21 format (UTF-8 encoding). But when I extract the required fields into a delimited file, the Thai characters become junk. I was using version 1.38 of MARC::Record. Does the latest version of MARC::Batch handle UTF-8 text correctly?
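A minimal sketch of the symptom, written in Python rather than Perl and assuming the junk comes from a decoding mismatch somewhere in the extraction pipeline (this has not been checked against MARC::Record 1.38). Each Thai code point takes three bytes in UTF-8, so reading those bytes as Latin-1 turns it into three unrelated characters; decoding as UTF-8 and encoding the delimited output explicitly as UTF-8 keeps the text intact. The helper `to_delimited_bytes` is hypothetical.

```python
def to_delimited_bytes(fields, delimiter="\t"):
    """Join text fields into one delimited line and encode it explicitly as UTF-8."""
    return (delimiter.join(fields) + "\n").encode("utf-8")


title = "เครื่องทองกรุงศรีอยุธยา"  # Thai text from the attached dc:title
raw = title.encode("utf-8")         # the bytes as stored in a UTF-8 file

# Wrong: reading the UTF-8 bytes as Latin-1. Every Thai code point is three
# bytes in UTF-8, so it comes out as three unrelated Latin-1 characters ("junk").
garbled = raw.decode("latin-1")

# Right: decode as UTF-8, then encode the delimited output as UTF-8 as well.
line = to_delimited_bytes(["9742722749", raw.decode("utf-8")])
```

In Perl, the analogous step would be writing decoded character strings to a handle opened with an explicit layer, e.g. `open(my $out, '>:encoding(UTF-8)', $path)`.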

Thanks for any help in advance.

Regards,
Saiful

On 11/12/05, Edward Summers <[EMAIL PROTECTED]> wrote:
Saiful wrote:
> I looked at DublinCore::Record and couldn't find any method for opening
> a file for reading.

That's because there are so many ways DC data can be encoded. If you
are interested in extracting DC from HTML you can take a look at
HTML::DublinCore [1]. What kind of file are you trying to read your
DublinCore from?

//Ed

[1] http://search.cpan.org/dist/HTML-DublinCore/
<?xml version="1.0" encoding="UTF-8"?>

<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">

<responseDate>2005-11-10T19:02:52Z</responseDate>
<request verb="GetRecord" identifier="oai:tcdc.or.th:4989" metadataPrefix="oai_dc">http://61.90.146.67:8000/cgi-bin/oai2/VTLS/Vortex.pl</request>

<GetRecord>
<record>
<header>
<identifier>oai:tcdc.or.th4989</identifier>
<datestamp>2005-09-15</datestamp>
</header>
<metadata>
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
   <dc:type>text</dc:type>
   <dc:source>http://203.154.235.51:8000/cgi-bin/gw/chameleon?host=localhost+1111+DEFAULT&amp;sessionid=VTLS&amp;function=CARDSCR&amp;search=KEYWORD&amp;pos=1&amp;u1=12101&amp;t1=4989</dc:source>
   <dc:language>en</dc:language>
   <dc:identifier>9742722749 :</dc:identifier>
   <dc:creator>สุเนตร ชุตินธรานนท์.</dc:creator>
   <dc:title>เครื่องทองกรุงศรีอยุธยา : อมตะศิลป์แผ่นดินสยาม = The immortal art  of Ayutthaya gold / </dc:title>
   <dc:publisher>กรุงเทพฯ : แปลน โมทิฟ,</dc:publisher>
   <dc:date>2546.</dc:date>
   <dc:contributor>เยาวนุช เวศร์ภาดา. </dc:contributor>
   <dc:contributor>ภูวดล สุวรรณดี. </dc:contributor>
</oai_dc:dc>
</metadata>
</record>
</GetRecord>

</OAI-PMH>
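A minimal sketch of reading a response like the one above with Python's standard `xml.etree.ElementTree`, assuming it arrives as well-formed XML bytes. Passing bytes lets the parser honour the `encoding="UTF-8"` declaration, so the Thai text comes back as proper strings; it then only stays readable if the delimited file is also written as UTF-8. The function names `dc_fields` and `to_tsv_line` and the choice of columns are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared in the OAI-PMH / oai_dc response.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def dc_fields(xml_bytes):
    """Return, for each oai_dc record, its DC elements as (name, text) pairs."""
    root = ET.fromstring(xml_bytes)  # bytes in: the XML declaration sets the encoding
    records = []
    for dc in root.iterfind(".//oai_dc:dc", NS):
        pairs = []
        for child in dc:
            # child.tag looks like "{http://purl.org/dc/elements/1.1/}title"
            name = child.tag.rsplit("}", 1)[-1]
            pairs.append((name, (child.text or "").strip()))
        records.append(pairs)
    return records


def to_tsv_line(pairs, wanted=("identifier", "title", "creator", "contributor")):
    """One tab-delimited line; repeated elements (e.g. contributor) joined with '; '."""
    columns = []
    for field in wanted:
        values = [text for name, text in pairs if name == field]
        columns.append("; ".join(values))
    return "\t".join(columns)
```

For example, read the attachment with `open(path, "rb").read()`, pass it to `dc_fields`, and write each `to_tsv_line(...)` result to a file opened with `open(out_path, "w", encoding="utf-8")`; `path` and `out_path` are placeholders.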