Selecting a font for server-side images based on available characters

There are many scripts available for rendering image headers or buttons on the server to ensure the correct font is displayed to the user. Typically these take at least two parameters: the text to be rendered and the font to render it in.

Problems can occur, especially on multilingual sites, when the text to be rendered contains characters that are not available in the first-choice font (Japanese or Russian text, for example). Here is one solution:

For each font we want to use, we need to extract the CMAP (the font's character-to-glyph mapping table). This defines which character codes can be rendered in the font. The following Perl script (courtesy of David Chan) takes a TrueType file and prints all the characters available in it, ready to be redirected to a text file:

#!/usr/bin/perl
 
use strict;
#use warnings; # Font::TTF::Font spews warnings, so we can't enable this
use Font::TTF::Font;
 
die "Usage: $0 file.ttf\n" unless 1 == @ARGV;
my $ttfFile = $ARGV[0];
my $f = Font::TTF::Font->open($ttfFile) or die "Cannot open $ttfFile: $!";
$f->tables_do(sub { $_[0]->read });
my @tables = @{$f->{cmap}{'Tables'}};
my %codepoints;
for my $table (@tables) {
    for my $codepoint (keys %{$table->{val}}) {
        my $glyphNo = $table->{val}{$codepoint};
        next if $glyphNo == 0;
        # 0 = unknown glyph. XXX should U+FFFD map to this? Don't care anyway
        $codepoints{$codepoint}++;
    }
}
 
binmode STDOUT, ":utf8";
printf "%s", chr($_) for sort {$a <=> $b} keys %codepoints;

This script is used like this:

 perl cmap.pl font.ttf > font.cmap.txt
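
Before relying on a generated file, it may be worth confirming that it is readable, is valid UTF-8 and actually lists some characters. The following is a minimal sketch and not part of the original scripts; the function name cmap_character_count is hypothetical.

```php
<?php
/* Hypothetical helper: summarises a generated cmap text file.
   Returns null if the file is unreadable or is not valid UTF-8,
   otherwise the number of characters it lists. */
function cmap_character_count($cmap_file) {
	if(!is_readable($cmap_file))
		return null;
	$contents = file_get_contents($cmap_file);
	if(!mb_check_encoding($contents, 'UTF-8'))
		return null;
	/* pass the encoding explicitly so the count does not depend on ini settings */
	return mb_strlen($contents, 'UTF-8');
}
```

A result of null or 0 would suggest the extraction step failed for that font and should be rerun before the file is used.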

Once you have generated CMAP text files for all the fonts you want to use, you can modify the font selection part of your image generation script, for example:

<?php
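/* the mb_* functions below must treat strings as UTF-8 */
mb_internal_encoding('UTF-8');

/* returns true if all the characters in $text are available in the $cmap */
function in_cmap($cmap, $text) {
	/* a missing or unreadable cmap file means the font cannot be verified */
	if(!is_readable($cmap))
		return false;
	$cmap = file_get_contents($cmap);
	/* measure the text once rather than on every loop iteration */
	$length = mb_strlen($text);
	for($i = 0; $i < $length; $i++) {
		$char = mb_substr($text, $i, 1);
		if(mb_strpos($cmap, $char) === false)
			return false;
	}
	return true;
}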
 
/* returns the first fully compatible font from the array $font_list that can render $text */
function get_compatible_font($font_list, $text, $path = 'fonts') {
	foreach($font_list as $font)
		if(in_cmap("{$path}/{$font}.cmap.txt", $text))
			return "{$path}/{$font}.ttf";
	return "{$path}/arialuni.ttf";
}

This script assumes there is a folder containing ttf/cmap pairs named font.ttf and font.cmap.txt. It also uses arialuni (Arial Unicode) as the ultimate fall-back font; other Unicode fonts could be used instead.
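
For long strings or many candidate fonts, searching the whole CMAP file once per character can become slow. One possible variation, sketched here under the same folder layout, loads each CMAP file once into a set of characters and also reports which characters a font lacks. The names load_cmap_set, missing_characters and pick_font are hypothetical and not part of the original script, and $text is assumed to be valid UTF-8.

```php
<?php
/* Hypothetical helper: loads a cmap text file into a set keyed by character,
   so each membership test is a single isset(). Results are cached per file. */
function load_cmap_set($cmap_file) {
	static $cache = array();
	if(!isset($cache[$cmap_file])) {
		$contents = is_readable($cmap_file) ? file_get_contents($cmap_file) : '';
		/* the /u modifier makes preg_split break on UTF-8 characters, not bytes */
		$chars = preg_split('//u', $contents, -1, PREG_SPLIT_NO_EMPTY);
		/* preg_split returns false on invalid UTF-8; treat that as an empty set */
		$cache[$cmap_file] = array_fill_keys($chars === false ? array() : $chars, true);
	}
	return $cache[$cmap_file];
}

/* Hypothetical helper: returns the distinct characters of $text
   that are not listed in the cmap file (assumes $text is valid UTF-8). */
function missing_characters($cmap_file, $text) {
	$set = load_cmap_set($cmap_file);
	$missing = array();
	foreach(preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) as $char)
		if(!isset($set[$char]))
			$missing[$char] = true;
	/* PHP turns numeric string keys into integers, so cast them back */
	return array_map('strval', array_keys($missing));
}

/* Same contract as get_compatible_font(): the first font that covers
   all of $text, otherwise the Unicode fall-back font. */
function pick_font($font_list, $text, $path = 'fonts') {
	foreach($font_list as $font)
		if(!missing_characters("{$path}/{$font}.cmap.txt", $text))
			return "{$path}/{$font}.ttf";
	return "{$path}/arialuni.ttf";
}
```

As with get_compatible_font, the returned path can be passed as the font file argument to GD's imagettftext(). A font whose CMAP file is missing is treated as covering no characters, so it is skipped for any non-empty text.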

All scripts are Public Domain.