Sunday, July 16, 2006

Investigations on Perl, MySQL & UTF-8

http://lists.gnu.org/archive/html/koha-devel/2006-03/msg00027.html

Because the story of Perl, MySQL, UTF-8 and Koha is becoming more and
more complicated, I've decided to start my tests outside of Koha or any
web server. I wanted to check that Perl and MySQL could communicate
with UTF-8 data.

What I did:

1. copy some UTF-8 strings from
http://www.columbia.edu/kermit/utf8-t1.html and paste them into a UTF-8
text file, utf8.txt (open/paste in a UTF-8 console, with Vim using
:set encoding=utf-8)

2. create a UTF-8 database with a simple table having a TEXT field

$ mysql --user=root --password=xxx
mysql> CREATE DATABASE `utf8_test` CHARACTER SET utf8;
mysql> connect utf8_test
mysql> create table strings (id int, value text);
mysql> quit

(there is no need to set the connection character set to utf8 in this
case; the default latin1 is fine)

Note: my MySQL server's default character set is latin1:

$ mysql --user=root --password=xxx utf8_test
mysql> status
Server characterset: latin1
Db characterset: utf8
Client characterset: latin1
Conn. characterset: latin1
mysql> set names 'UTF8';
mysql> status
Server characterset: latin1
Db characterset: utf8
Client characterset: utf8
Conn. characterset: utf8

3. write and execute a Perl script which reads the UTF-8 text file,
inserts the UTF-8 strings into the database, retrieves them from the
database, and prints them to STDOUT. See details in the attached file
readfile_insertdb.pl. Important note: "set names 'UTF8';" is mandatory.
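
The attached readfile_insertdb.pl is not reproduced here. What follows
is only a minimal sketch of what such a round-trip script might look
like, not the original attachment. It assumes the utf8_test database and
strings table from step 2 and the root/xxx credentials shown above; the
sub names read_raw_lines and roundtrip are made up for illustration.
Strings are handled as raw bytes on purpose, so Perl never decodes or
re-encodes them. Note that the sketch empties the test table before
inserting.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a file as raw bytes, one string per line. No decoding layer is
# applied, so UTF-8 byte sequences pass through untouched.
sub read_raw_lines {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return grep { length } @lines;    # skip empty lines
}

# Insert every string, read everything back, return the values in id order.
sub roundtrip {
    my ($dbh, @strings) = @_;
    $dbh->do("SET NAMES 'utf8'");     # the mandatory step from the text
    $dbh->do('DELETE FROM strings');  # assumption: the table is test-only
    my $ins = $dbh->prepare('INSERT INTO strings (id, value) VALUES (?, ?)');
    my $id  = 0;
    $ins->execute(++$id, $_) for @strings;
    my $rows = $dbh->selectall_arrayref(
        'SELECT id, value FROM strings ORDER BY id');
    return map { $_->[1] } @$rows;
}

# Runs only when a file name is given on the command line.
if (@ARGV) {
    require DBI;                      # needs DBI and DBD::mysql installed
    my $dbh = DBI->connect('DBI:mysql:database=utf8_test', 'root', 'xxx',
                           { RaiseError => 1 });
    my @out = roundtrip($dbh, read_raw_lines($ARGV[0]));
    binmode STDOUT, ':raw';           # print the bytes exactly as retrieved
    print "$_\n" for @out;
    $dbh->disconnect;
}
```

Saved under a hypothetical name such as roundtrip_sketch.pl, it would be
run as "perl roundtrip_sketch.pl utf8.txt > out.txt"; if the round trip
is clean, out.txt should hold the same non-empty lines as utf8.txt.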

Everything is *working fine*. My output is in UTF-8; I'm 100% sure of
it.

DBD::mysql : 2.9007
Perl : 5.8.7
MySQL : 4.1.12-Debian_1ubuntu3.1-log
DBI : 1.48

(find your local versions with the attached script versions.pl)
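
The attached versions.pl is not included in this copy either. A rough
stand-in, assuming nothing beyond core Perl, could look like the sketch
below; the helper name module_version is hypothetical, and a module
that is missing (or has no version) is reported as "not available"
instead of aborting the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the version of a module, or 'not available' if it cannot be
# loaded or does not declare a version.
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # DBD::mysql -> DBD/mysql.pm
    my $version = eval { require $file; $module->VERSION };
    return defined $version ? $version : 'not available';
}

printf "Perl : %vd\n", $^V;
printf "%s : %s\n", $_, module_version($_) for qw(DBI DBD::mysql);
```

The MySQL server version is not covered by this sketch; the "status"
command in the mysql client shows it, as does "SELECT VERSION();".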

I suspect that Paul's data stored in MySQL is not truly UTF-8. Maybe
I'm missing the point, but it seems Perl, MySQL and UTF-8 are not
working so badly together.
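
One way to test a suspicion like this is to check whether the bytes
coming back from the database decode as UTF-8 at all. The sketch below
uses the core Encode module; the sub name is_valid_utf8 is made up for
illustration. A positive result is only a necessary condition: plain
ASCII and double-encoded text also pass.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# True if the byte string is well-formed UTF-8, false otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;   # decode() with a CHECK value may modify its argument
    return eval { decode('UTF-8', $copy, FB_CROAK); 1 } ? 1 : 0;
}

# "café" as UTF-8 bytes versus the same word as latin1 bytes
print is_valid_utf8("caf\xC3\xA9") ? "valid\n" : "invalid\n";
print is_valid_utf8("caf\xE9")     ? "valid\n" : "invalid\n";
```

On the MySQL side, comparing the output of "SELECT HEX(value)" with the
expected byte sequence is a complementary check that can also expose
double encoding.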
