UTF-8 support for CodeIgniter

Posted: 2009-08-17
Category: CodeIgniter

Writing an application is easy. Writing an application that supports all characters from multiple languages? Not so easy.

The main problem comes from way back, when the main language in computing was English. ASCII assigns the numbers 0 to 127 to a-z, A-Z, 0-9, punctuation and a handful of control codes. That is fine for English, but pretty much every other language out there has characters that don't fit in there. To address this we have UTF-8, which stores the extra characters as multi-byte sequences and is backwards compatible with ASCII.
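
To see what "multi-byte" means in practice, here is a minimal sketch in plain PHP. The three sample characters are my own picks for illustration, and they are written as byte escapes so the result does not depend on how the file itself is saved.

```php
<?php
// Sample characters chosen for illustration only.
$plain  = 'a';            // ASCII letter: one byte, identical in ASCII and UTF-8
$umlaut = "\xC3\xA4";     // "ä" (U+00E4) encoded as UTF-8: two bytes
$euro   = "\xE2\x82\xAC"; // "€" (U+20AC) encoded as UTF-8: three bytes

foreach (array($plain, $umlaut, $euro) as $char) {
    // strlen() counts bytes, not characters.
    echo bin2hex($char), ' = ', strlen($char), " byte(s)\n";
}
```

The ASCII letter keeps its single byte, which is why plain English text looks the same in both encodings.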

To make your CodeIgniter application play nicely with UTF-8 you have a few things to think about.

Set the HTTP header in index.php

All requests to CodeIgniter are made through the index.php file, which by default sits outside the system/ folder. For me, that makes it a perfect place to send a PHP header.

header('Content-Type: text/html; charset=utf-8');

Tell CodeIgniter what's going on

CodeIgniter by default is set to use UTF-8 for much of its internal functionality, so just make sure the charset is set to UTF-8 in your application/config/config.php file.

$config['charset'] = "UTF-8";

Configure database connection

These settings live in your application/config/database.php file.

$db['default']['char_set'] = "utf8";
$db['default']['dbcollat'] = "utf8_unicode_ci";

The default here is normally utf8_general_ci, which is a pretty weak form of Unicode collation. This quote from Stack Overflow explains things pretty well, I think.

"utf8_unicode_ci is generally more accurate for all scripts. For example, on Cyrillic block: utf8_unicode_ci is fine for all these languages: Russian, Bulgarian, Belarusian, Macedonian, Serbian, and Ukrainian. While utf8_general_ci is fine only for Russian and Bulgarian subset of Cyrillic. Extra letters used in Belarusian, Macedonian, Serbian, and Ukrainian are sorted not well."

Set up or convert your MySQL

To get this working you need to use MySQL 4.1 (or higher) with utf8 support enabled, but this is pretty standard for most web-hosts.

CREATE DATABASE example
    DEFAULT CHARACTER SET utf8
    DEFAULT COLLATE utf8_unicode_ci;

If you already have the database set up and running, you can use the following code to convert the database to use UTF-8.

ALTER DATABASE example
    DEFAULT CHARACTER SET utf8
    DEFAULT COLLATE utf8_unicode_ci;

Now that the database is ready, you need to add some tables.

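CREATE TABLE blog (
    id INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
    title VARCHAR(100) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
    -- TEXT columns cannot take a DEFAULT value on older MySQL versions
    body TEXT COLLATE utf8_unicode_ci NOT NULL,
    -- an AUTO_INCREMENT column has to be defined as a key
    PRIMARY KEY (id)
) DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;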

As with the database, if your tables already exist then you can use the following code to convert them. You might get some fairly strange results if a table is full of data in another non-English character set, but on the whole this has never been an issue for me.

ALTER TABLE blog
    CONVERT TO CHARACTER SET utf8
    COLLATE utf8_unicode_ci;
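
If you have a lot of tables, typing a statement for each one gets old quickly. MySQL's ALTER TABLE ... CONVERT TO CHARACTER SET form converts a table and its text columns in one go, so the statements can be generated. This is only a rough sketch: build_convert_statements() is a made-up helper (not part of CodeIgniter) and the table names are placeholders. It just builds the SQL strings; running them, ideally after a backup, is up to you.

```php
<?php
// Hypothetical helper: returns one ALTER TABLE ... CONVERT TO statement per table.
function build_convert_statements(array $tables, $charset = 'utf8', $collation = 'utf8_unicode_ci')
{
    $statements = array();
    foreach ($tables as $table) {
        // Double any backticks so the quoted identifier stays valid.
        $name = '`' . str_replace('`', '``', $table) . '`';
        $statements[] = "ALTER TABLE {$name} CONVERT TO CHARACTER SET {$charset} COLLATE {$collation};";
    }
    return $statements;
}

// Placeholder table names, for illustration only.
foreach (build_convert_statements(array('blog', 'comments')) as $sql) {
    echo $sql, "\n";
}
```

Review the generated statements before running them, especially on tables that already hold non-English data.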

And finally, set up your Meta Data.

The last place to set for UTF-8 support is in your HTML, in the <head> tag. You can use CodeIgniter's meta() function from the HTML helper, or plain HTML.

<html>
  <head>
    <?php echo meta('Content-type', 'text/html; charset='.config_item('charset'), 'equiv');?>
    <!-- or -->
    <meta http-equiv="content-type" content="text/html; charset=<?php echo config_item('charset');?>" />
  </head>

I have used the config item instead of just putting in UTF-8 as it makes more sense from a programming point of view. If for any reason your charset is changed in the future, that is one less place to change it.

Now that your CodeIgniter application is ready, you need to make sure your database GUI is too. I regularly use Navicat and phpMyAdmin, and in both you can set the "MySQL connection collation", so make sure that is set to "utf8_unicode_ci" too, or it could display characters incorrectly and will most likely corrupt your data as you work on it.

For more help developing with UTF-8, take a look at "Handling UTF-8 with PHP" which will explain some of the problems of using the normal string functions on UTF-8 strings, then take a look at PHP's "Multibyte String Functions" manual pages to learn how to handle your new happily stored UTF-8 data.
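
To give a flavour of the problem, here is a small sketch comparing the normal string functions with their mb_ counterparts. It assumes the mbstring extension is loaded, and the sample word is my own choice, written as byte escapes so the file encoding doesn't matter.

```php
<?php
// Assumes the mbstring extension is available.
$word = "Gr\xC3\xBC\xC3\x9Fe"; // "Grüße" as UTF-8 bytes

echo strlen($word), "\n";             // counts bytes
echo mb_strlen($word, 'UTF-8'), "\n"; // counts characters

$cut_bytes = substr($word, 0, 3);             // stops halfway through "ü"
$cut_chars = mb_substr($word, 0, 3, 'UTF-8'); // keeps "Grü" intact

// A string cut in the middle of a character is no longer valid UTF-8.
var_dump(mb_check_encoding($cut_bytes, 'UTF-8'));
var_dump(mb_check_encoding($cut_chars, 'UTF-8'));
```

The byte-based substr() leaves a broken sequence behind, which is the sort of thing that shows up later as a stray question mark or garbage character in your output.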

Comments

Jeffrey

2009-08-17

Hacking index.php is a bad idea because your changes will be overwritten the next time codeigniter is updated.

2009-08-17

You do have a point but I feel index.php is a file that should be kept under control of the developer. The index.php file has never changed substantially in the 2 years I have been using CodeIgniter and I don't believe it ever really will.

Looking at it from another angle, the CodeIgniter User Guide actively encourages you to modify the index.php file to point it to the location of your system/ and application/ folders.

Worst comes to the worst, you download a new version of index.php and put this same line back in.

Mei

2009-08-18

I set the header in MY_Controller with

$this->output->set_header('Content-Type: text/html; charset=utf-8');

2009-08-19

@Mei: MY_Controller is the perfect place for this code, thanks for mentioning it.

I felt it might be best to leave it out of the instructions as I would then have to explain how to set up MY_Controller and it is not worth doing that for this one line.

If you have MY_Controller, jam it in there. :-)

Brian Gottier

2009-10-09

You could also use a hook to set the header pre_controller. I don't think it matters where you set the header, as long as you remember where you did it. For this reason I use hooks, but it's probably just personal preference.

Also, Phil, in the example to convert a database to UTF-8, is the less than sign at the end a typo?

Oliver

2009-10-19

One additional thing: Make sure the editor you use will save the files UTF-8 encoded, too. I once had trouble with broken characters until I found out that the editor (Eclipse in my case) was set to iso-latin. Changing that fixed the problem.

Oliver

2009-10-19

@Brian: Oops, removed! Thanks for the spot.

@Oliver: Good call. I completely forgot about setting your text editor to UTF-8. This has got me several times in the past.

Coolgeek

2009-12-23

Hey, Phil, thanks for this little tutorial.

In the process of refactoring an app that I'm readying for production, I've noticed that the CI code itself does not seem to use the mb_ string functions. For example, substr() is used 112 times, but mb_substr() is not used at all.

I'm new to i18n/l10n, so perhaps I am confused by your pointers to mb_ references. Were they just for background, or do I in fact need to use the mb_ functions? If the latter, why isn't the CI code using them?

Also, what are your thoughts on this:

http://hash-bang.net/2009/02/utf8-with-codeigniter/

Is this unnecessary with CI and MySQL configured as you have laid out above?

Sam It

2010-03-02

@Jeffrey - your changes wouldn't be overwritten the next time Codeigniter is updated.

CI doesn't update itself. When you update, you can simply use a DIFF tool with a GUI (depending on your OS) to compare the changes.

I have a highly modified (unfortunately) version of CI. That's what happens when you need a different bootstrap and like the idea of hooks better than MVC (which, of course, could be combined).

I've updated CI twice, last time it took me approx 15 minutes. Both index.php and Codeigniter.php files are modified with many changes, so, from my experience, it wasn't as bad as people make it sound (though, I agree that replacing a directory is way quicker/easier).

Joost

2010-04-21

Thanks Phil, very nice work!

Pedro Luz

2010-11-19

Thanks for the article, really helpful.

Ran

2010-11-24

very helpful article ! thanks ! i was stuck with German umlauts ä ü ö ß !

Nedim

2011-02-26

Thank you ! Helped a lot.

Miciah Gris T. Amberong

2011-03-26

Thank you.

Jesus Trujillo

2011-04-14

Hello, and thanks for this nice article, it helped me. I was having problems importing a CSV file into my database table, because the file had accented characters like "á" or "ó". I fixed the problem in PHP by running this statement before the SQL insert queries:

mysql_query ("SET NAMES 'utf8'");

Thanks,
Jesus Trujillo

Vishal

2011-05-26

UTF-8 not working. when i post "!" it converts !. Please suggest where is the problem.

Mashrura Tasnim

2012-06-13

THANK YOU SO MUCH !!!!!

Zack

2012-06-18

Old article but still very helpful. Thanks a lot :)

Elvis

2013-01-02

Hello Phil, I need to get $_POST data from a form ($_POST['string']) and convert it to UTF-8, because I receive it as ASCII instead of UTF-8. I also need to use rijndael256 to encrypt it and later compare it with the same string encoded as UTF-8 and encrypted with rijndael256.

At the moment the form is sending data as ASCII, and when I apply the encryption the result does not match the UTF-8 encrypted string. I've tried iconv() to convert the $_POST value from ASCII to UTF-8, but it is not working at all.
