Not everyone uses the Latin characters that most of the Western world uses. Many languages use other scripts, such as Russian, which is written in Cyrillic, and there are less extreme examples too, such as French, which has characters with accents and cedillas. To make sure characters are displayed correctly you should use UTF-8 instead of ISO-8859-1.
The first thing you need to do is to specify the content type in your metadata. This should be done before the title tag in your head section, as your title may contain UTF-8 characters too. The result will look something like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr">
<head>
<meta http-equiv="Content-Type" content="application/xhtml+xml;charset=utf-8" />
<title>My title</title>
... rest of head goes here ...
</head>
<body>
... body goes here ....
</body>
</html>
Most people leave it there and assume that will make the page work with UTF-8 content, but I'm guessing that if you're reading this then there's a good chance you've found otherwise. Don't worry - most people make this same assumption. If you're using Firefox then you can use an extension called Live HTTP Headers to see what sort of content is being sent. If you open up the extension and refresh the page you're working on, it will likely still say ISO-8859-1, and that is because the server isn't set up to serve pages as UTF-8 by default. The solution is to send the character set as part of the Content-Type header, which can be done in PHP using a call like:
header('Content-Type: application/xhtml+xml;charset=utf-8');
Remember, though, that header() only works if it is called before any output has been sent to the browser. If output has already started, PHP will raise a "headers already sent" warning, and any page redirect later in the script will fail for the same reason. Also, note that I have used application/xhtml+xml as the content type - this is not mandatory and text/html should suffice depending on your Doctype. The alternative to setting this in every PHP script, or in a script included by all pages, is to change the configuration of Apache.
AddDefaultCharset UTF-8
This changes Apache's default character set to UTF-8 and is generally the best way to achieve the desired outcome.
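If you can't change the Apache configuration, the header() call shown earlier could be wrapped in a small helper along these lines. This is only a sketch: the function names content_type_header and send_content_type are my own inventions rather than anything built into PHP, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helpers for illustration; these names are not part of PHP.

// Build the header line so the MIME type and charset live in one place.
function content_type_header(string $mime = 'text/html', string $charset = 'utf-8'): string
{
    return sprintf('Content-Type: %s; charset=%s', $mime, $charset);
}

// Send the header only if PHP has not already started sending output.
// Returns true if the header was set, false if it was too late.
function send_content_type(string $mime = 'text/html', string $charset = 'utf-8'): bool
{
    if (headers_sent()) {
        return false;
    }
    header(content_type_header($mime, $charset));
    return true;
}

// Call this at the very top of the script, before any echo or HTML.
send_content_type('application/xhtml+xml');
```

Checking headers_sent() first means a late call fails quietly instead of raising a warning, and the return value tells you whether the header was actually set.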
From PHP's point of view, this should now be everything you need to get multi-byte characters to display correctly, though you may still run into problems if you use any of PHP's string functions (we'll come back to this in a minute). If you're loading the data from a database system such as MySQL then you will also need to ensure the table's character set and collation are set to UTF-8 so that the data is stored correctly. An alternative is to change the configuration of MySQL, as suggested with Apache. To do this you will need to find your MySQL config file, whose location will differ depending on your setup. If you're running Windows then it will be named my.ini and will be either in your MySQL folder or in your Windows folder. If you're using a UNIX-based system then the file to edit is normally /etc/my.cnf. If it doesn't exist then it should be okay to create the file. Find the section labelled mysqld and change it to the following, or add it if it doesn't exist.
[mysqld]
character-set-server=utf8
This tells the MySQL server to default to UTF-8.
Looking back at PHP's string functions, the problem is two-fold. Firstly, functions such as htmlentities default to ISO-8859-1 (in PHP versions before 5.4) and so will cause characters to be displayed incorrectly after they have been passed through them. This can be remedied by specifying the character set in the function call. The next problem is not so easily solved. In a string that contains multi-byte characters, each "character" may take up one or more bytes, and every one of those bytes is counted by functions such as strlen. In these cases there are sometimes equivalent functions, such as mb_strlen, that allow a character set to be specified so that the result is more accurate.
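To make both problems concrete, here is a small sketch. It assumes the mbstring extension is installed (mb_strlen comes from it), and it writes the accented letter as explicit bytes so the example doesn't depend on how the file itself is saved.

```php
<?php
// "café" written with explicit UTF-8 bytes (é is 0xC3 0xA9).
$word = "caf\xC3\xA9";

// strlen() counts bytes, so the accented letter counts twice.
$bytes = strlen($word);                                 // 5

// mb_strlen() counts characters once it is told the string is UTF-8.
$chars = mb_strlen($word, 'UTF-8');                     // 4

// With the charset given, the two bytes become a single entity.
$good = htmlentities($word, ENT_QUOTES, 'UTF-8');       // caf&eacute;

// Treated as ISO-8859-1, each byte is converted on its own.
$bad = htmlentities($word, ENT_QUOTES, 'ISO-8859-1');   // caf&Atilde;&copy;

echo $bytes, ' ', $chars, ' ', $good, ' ', $bad, "\n";
```

The last result is the classic garbled output you see in the browser when a UTF-8 string has been run through a function that assumed ISO-8859-1.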
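Another tip worth mentioning is that when using regular expressions in PHP it is best to refer to multi-byte characters by their Unicode code point, such as \x{20AC} to represent the Euro currency symbol, together with the u pattern modifier (PHP's regular expressions do not support the \u notation). You can't use the ord() function to find the code point, because it only looks at the first byte of the string, but if you look up the ord() function on php.net you will find some useful examples of how to achieve the same thing.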
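As a rough illustration of the regular expression tip, the sketch below matches a Euro sign by its code point. It assumes PHP 7.2 or later with the mbstring extension, since that is where mb_ord() comes from; the variable names are just for the example.

```php
<?php
// The Euro sign (U+20AC) written as its three UTF-8 bytes.
$euro = "\xE2\x82\xAC";
$text = "Total: " . $euro . " 42";

// ord() only looks at the first byte of the string.
$firstByte = ord($euro);                // 0xE2

// mb_ord() returns the full code point.
$codePoint = mb_ord($euro, 'UTF-8');    // 0x20AC

// The u modifier makes PCRE treat pattern and subject as UTF-8,
// and \x{20AC} then matches the character by its code point.
$matched = preg_match('/\x{20AC}\s*(\d+)/u', $text, $m);

echo $matched ? $m[1] : 'no match', "\n";
```

Note the single quotes around the pattern: they leave \x{20AC} untouched so that it reaches the regular expression engine rather than being interpreted by PHP first.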













