I found it. I figured that if any website could handle Unicode it would be erlang.org, so I went and grabbed some of its header text:

<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Transitional//EN'
  'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'>
<html xmlns='http://www.w3.org/1999/xhtml'>
<head>
<title>test</title>
<meta http-equiv='Content-Type' content='text/html;charset=utf-8'/>
</head>

and it works correctly now.

thanks

On Fri, Apr 6, 2012 at 3:18 PM, Kresten Krab Thorup <k...@trifork.com> wrote:
> It looks like you may have missed specifying the charset when importing your
> data; could that be the case?
>
> You need to specify the charset when importing 8-bit text. It looks like
> your xml is utf-8 encoded, so it should be imported using something like this:
>
> curl -H 'Content-Type: text/html;charset=UTF-8' -X PUT --data-binary @datafile.xml
> http://host:port/riak/bucket/key
>
> The various language clients have different ways of specifying the charset
> for a value; so if you imported the xml using some other method you need to
> find out where to specify it.
>
> Perhaps to verify, you can check the result of a curl -v (verbose, print the
> headers) for one of your values. If it does not come back with a charset=XXX
> in the Content-Type header, then this is your problem.
>
> Kresten
>
>
>
> On Apr 6, 2012, at 4:44 PM, Wes James wrote:
>
> I imported many records, one of which looks like this:
>
> <add>
> <doc>
> <field name='id'>0</field>
> <field name='title'>Ekologie lučních porostů (A)</field>
> <field name='author_editor'>Rychnovská, Milena, Emilie Balátová-Tuláčková,
> Blanka Úlehlová, Jaroslav Pelikán</field>
> <field name='date_of_publication'>1985</field>
> <field name='publisher'>Academia</field>
> <field name='keywords'>-</field>
> <field name='notes'>amazon 5/22/09 Category: Ecology (Y)</field>
> <field name='valuation'>8.00</field>
> <field name='purchase_price'>10.00</field>
> </doc>
> </add>
>
> with
>
> bin/search-cmd solr books books.xml
>
> Notice the characters above.
> In the riak -> cowboy -> webpage it looks like:
>
> Id: 0
> Title: title: Ekologie luÄ nÃch porostů (A)
> Auther Editor: author_editor: Rychnovská, Milena, Emilie Balátová-TulÃ¡Ä
> ková, Blanka Úlehlová, Jaroslav Pelikán
> Date of Publication: date_of_publication: 1985
> Notes: publisher: Academia
> Notes: notes: amazon 5/22/09 Category: Ecology (Y)
> Purchase Price: purchase_price: 10.00
> Valuation: valuation: 8.00
>
> Is there a way I can fix this?
>
> Doing an io:format it looks like:
>
> Rychnovská, Milena, Emilie Balátová-TulÃ¡Ä ková, Blanka Úlehlová,
> Jaroslav Pelikán
>
> Thanks,
>
> Wes
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
>
> Mobile: + 45 2343 4626 | Skype: krestenkrabthorup | Twitter: @drkrab
> Trifork A/S | Margrethepladsen 4 | DK- 8000 Aarhus C | Phone : +45 8732
> 8787 | www.trifork.com
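
For anyone hitting the same symptom: the garbled output quoted above is consistent with UTF-8 bytes being decoded under a single-byte charset somewhere between storage and the browser. The snippet below is only an illustrative sketch in Python, not code from this thread; it assumes Latin-1 as the wrong charset (the actual fallback in a given browser or client may differ) and reuses the title field from the sample record.

```python
# Illustrative sketch (assumption: the wrong charset is Latin-1).
# Shows how UTF-8 text turns into the kind of garbling seen in the thread
# when a consumer is not told the charset, and how declaring it fixes it.

title = "Ekologie lučních porostů (A)"  # title field from the sample record

# The bytes as stored/served: the XML declares encoding='utf-8'.
utf8_bytes = title.encode("utf-8")

# Wrong: no charset given, so the consumer guesses a single-byte encoding.
# Each multi-byte UTF-8 character becomes two unrelated characters.
garbled = utf8_bytes.decode("latin-1")

# Right: the charset is declared (Content-Type header or <meta> tag),
# so the bytes are decoded as what they actually are.
restored = utf8_bytes.decode("utf-8")

# As long as no bytes were dropped, the garbling is reversible.
repaired = garbled.encode("latin-1").decode("utf-8")

print(repr(garbled))
print(restored)
print(repaired == title)
```

The stored bytes are identical in both cases; only the declared charset changes how they are interpreted, which is why adding charset=utf-8 to the page header was enough.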