Font Handling

Context

Fonts come up in docx4all (and docx4j):

- rendering the docx on the screen

- the choice of fonts offered in the drop-down list

- rendering as HTML or PDF

Philosophy

Today, most docx documents are created using Microsoft applications: Word 2007, Word 2003 (with the compatibility pack) or Word 2008 (on the Mac).

Those applications default to fonts from Microsoft's ClearType collection (Calibri and Cambria). These fonts are not readily available unless you have one of those programs (or another part of the same version of Office) or Windows Vista.

We don't want to encourage the use of those fonts unless/until Microsoft makes them freely available. See this well-written post on the issues.

At the same time, docx4all is likely to be used to work on documents that people using Word 2007 will also work on.

So we don't particularly want to encourage people to use fonts available on their Mac or Linux box, which won't be familiar to Word 2007 users.

Approach

Our approach is to give people a drop down list of common Word 2007 fonts.

The intent is that this list will not include the ClearType collection fonts unless they are already in use in the document, although there will be an option you can explicitly turn on to make them available. (As at 2008 03 21, the list might still include them.)

If you create a new document from scratch in docx4all, the default font will be Times New Roman (unless "Enable Microsoft closed fonts" is checked - TODO). That was the default font in earlier versions of Word, and it can be installed on non-Windows systems.

If the actual font is available on the computer, we will use it.

If the actual font is embedded in the document, we're also able to use it. Interestingly, the license for the ClearType collection fonts is "editable embedding allowed", which allows them to be "installed temporarily on the remote system". So docx4all will install them temporarily (think IE/Windows' Temporary Internet Files).

If the actual font is not available, we use the Panose system to find corresponding fonts on the local system. (This works for PDF output and for the Swing/AWT GUI; for HTML output, a TODO is to specify a fallback in the font-family cascade.)
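The lookup order described above (use the named font if it is installed, otherwise pick the closest Panose match) can be sketched roughly as follows. This is only an illustration, not docx4all's or docx4j's actual code: the class and method names are hypothetical, the embedded-font step is left out, and the distance used here is a plain sum of per-digit differences, whereas a real Panose matcher would normally weight the digits.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: resolve a document font to a locally installed family. */
final class FontResolverSketch {

    /** Unweighted distance: sum of absolute differences over the 10 Panose digits. */
    static int panoseDistance(int[] a, int[] b) {
        if (a.length != 10 || b.length != 10) {
            throw new IllegalArgumentException("Panose arrays must have 10 digits");
        }
        int distance = 0;
        for (int i = 0; i < 10; i++) {
            distance += Math.abs(a[i] - b[i]);
        }
        return distance;
    }

    /**
     * Returns the wanted family if it is installed (case-insensitive match);
     * otherwise the installed family with the smallest Panose distance.
     * The map of installed families to Panose arrays is an assumption:
     * it would have to be built by reading each font's OS/2 table.
     */
    static Optional<String> resolve(String wanted, int[] wantedPanose,
                                    Map<String, int[]> installed) {
        for (String family : installed.keySet()) {
            if (family.equalsIgnoreCase(wanted)) {
                return Optional.of(family);
            }
        }
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (Map.Entry<String, int[]> entry : installed.entrySet()) {
            int d = panoseDistance(wantedPanose, entry.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = entry.getKey();
            }
        }
        return Optional.ofNullable(best);
    }
}
```

The names of installed families can be obtained in Java from GraphicsEnvironment.getLocalGraphicsEnvironment().getAvailableFontFamilyNames(); AWT does not expose the Panose digits, so those would need to come from the font files themselves.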

At the moment, we're focusing on Western fonts (though contributions which extend our font handling to Asian, Hebrew, etc. are welcome).

Bold and Italic Handling

If we know that a font has bold, italic and bold italic TTF files, we get those and can use them for PDF rendering. The same approach should work even when we aren't sure that a bold version exists; if it doesn't, some other font would get substituted.
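As a rough illustration of the idea, the sketch below maps a bold/italic combination to one of four variant files and reports when no file is known, so that the caller can leave the choice to font substitution. The class, enum and method names are hypothetical and are not taken from docx4all or docx4j.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: pick the TTF file for a bold/italic combination. */
final class FontVariantSketch {

    enum Variant { REGULAR, BOLD, ITALIC, BOLD_ITALIC }

    /** Maps the two style flags to one of the four variants. */
    static Variant variantFor(boolean bold, boolean italic) {
        if (bold && italic) {
            return Variant.BOLD_ITALIC;
        }
        if (bold) {
            return Variant.BOLD;
        }
        if (italic) {
            return Variant.ITALIC;
        }
        return Variant.REGULAR;
    }

    /**
     * Looks up the file for the requested style. An empty result means no
     * file is known for that variant, so substitution would have to take over.
     */
    static Optional<String> fileFor(Map<Variant, String> knownFiles,
                                    boolean bold, boolean italic) {
        return Optional.ofNullable(knownFiles.get(variantFor(bold, italic)));
    }
}
```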

It is not yet clear what happens to bold for a font for which we haven't provided an embedding.

docx4all currently uses a different mechanism to display bold etc.

Status

As at 21 March 2008, all fonts in the docx4all-fonts-bolditalic.docx document are displayed correctly on Vista in both the editing panel and PDF output, with the following exceptions:

- Arial Black (displayed in editor, but not in PDF - strange, since the substituter does find it via Panose)

- Impact - Illegal Panose Array: Invalid value 9 > 8 in position 5 of [ 2 11 8 6 3 9 2 5 2 4 ] -- should ask the Microsoft font people what the story is with this (though they got it from Monotype Corp)

- Wingdings, Marlett, Symbol - look ok in editor, but symbols not displayed in PDF - fop.fonts.autodetect.FontInfoFinder can't load them either, because the Unicode cmap table is not present.

- Palatino Linotype - bold, italic, bolditalic matched the LucidaBright font family, which we can't fix without specifically handling 2 special cases.
PalatinoLinotype-Bold .. [ 2 4 7 2 6 3 5 10 2 4 ] (I think the 10 here in bold is an error in the Panose info contained in the font)
PalatinoLinotype-Italic .. [ 2 4 5 2 5 3 5 10 3 4 ] (problem here is 5 _3_ 5 -- the normal form has 5 5 5)
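
When chasing Panose problems like the ones listed above, two small checks can help: listing the positions at which two Panose arrays differ, and testing one digit against a limit. The sketch below is a hypothetical diagnostic helper, not part of docx4all, docx4j or FOP. Positions are counted from zero, which matches the Impact message, where the value 9 sits at position 5. The limit of 8 used in the example is taken from that message only, not from a table of Panose ranges.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: simple checks for inspecting Panose arrays. */
final class PanoseDiagnosticsSketch {

    /** Returns the zero-based positions at which the two arrays differ. */
    static List<Integer> differingPositions(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Panose arrays must have equal length");
        }
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                positions.add(i);
            }
        }
        return positions;
    }

    /** True if the digit at the given zero-based position is above the limit. */
    static boolean exceedsLimit(int[] panose, int position, int limit) {
        return panose[position] > limit;
    }
}
```

For example, exceedsLimit applied to the Impact array with position 5 and limit 8 reports the same violation as the error message, and differingPositions applied to the bold and italic Palatino Linotype arrays shows which digits the two variants disagree on.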