Wednesday, January 19, 2011

PL5: Expanding your universe

This next test in the series may not cover the whole universe, but it will get you further toward whole-Earth applications.

The pseudo-localization applied to your text resources should grow the length of the strings. Depending on your application, I recommend increasing the length by about 50%, at least for short strings. The additional text should also include Asian characters, that is, characters from the Han range of Unicode (the CJK Unified Ideographs). Let me explain why.
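Here's a minimal sketch of what I mean, in Python. The Han characters in the pool and the 50% growth target are my own illustrative choices, not a standard; pick characters you'll recognize on sight, and bracket the result so truncation is easy to spot.

    # Pool of Han characters for padding - an arbitrary, recognizable set.
    HAN_POOL = "中文漢字測試國際化本地化"

    def pseudo_expand(text, growth=0.5):
        """Grow text by roughly `growth` (50% by default) using Han padding,
        bracketed so truncation is easy to spot in the UI."""
        pad_len = max(1, round(len(text) * growth))
        padding = "".join(HAN_POOL[i % len(HAN_POOL)] for i in range(pad_len))
        return "[" + text + padding + "]"

    print(pseudo_expand("Save changes"))  # [Save changes中文漢字測試]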

One aspect of testing is edge cases - probing limits and restrictions to see whether they work, and how. That includes minimum, below-minimum, maximum, and above-maximum lengths. Translation usually grows strings, both in byte count and in display width. Fields in the user interface (UI) are often limited in both, due to storage and display constraints. Expanded pseudo-localized text will show you how the UI holds up at those limits - and more.
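If the distinction between byte count and display length seems academic, a quick sanity check shows how fast the two diverge once you leave ASCII - which pseudo-localized text forces you to do:

    s = "Save中文"
    print(len(s))                  # 6 characters
    print(len(s.encode("utf-8")))  # 10 bytes - each Han character is 3 bytes in UTF-8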

What are you looking for? For starters, you're looking at the layout and aesthetics of the UI with the expanded text. Is text wrapping awkwardly? Is it overlapping other text or screen elements? Is it pushing other objects out of position, skewing the overall layout? Do items line up where they're supposed to? Is the text truncated? Obviously limitations are necessary due to screen size, especially for mobile applications. But translators aren't mind readers - they're probably only seeing text in a resource file or translation tool. The only indication they'll have of a length restriction is the comment that you write next to the string to be translated. Which of course you have done.
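What might such a comment look like? Here's a hypothetical entry in Java .properties syntax - the key, the limit, and the comment convention are all invented for illustration; use whatever your localization tool actually surfaces to translators:

    # TRANSLATOR: Button label. Maximum 12 characters; the button does not resize.
    button.save.label=Save changes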

Remember that I recommended using Asian text in the pseudo-l10n? It's useful for checking that these characters are legible in the space provided; by legible I mean that the stroke lines are distinct and the tops and bottoms of the characters are not cut off. Han characters are often intricate, requiring a larger font size and additional vertical space to render legibly. On the other hand, Asian translations frequently shrink in width, since a single character can replace a whole English word. A standard pseudo-l10n won't reveal that, because it adds to the existing string length. But you could run a variant that replaces each string with a shorter Asian-character string, and see what that does to the UI (see the sketch below).
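A sketch of that shrinking variant, along the same lines as the expansion function above; the 50% ratio and the character pool are again arbitrary choices:

    HAN_POOL = "界面測試字符寬度檢查"

    def pseudo_shrink(text, ratio=0.5):
        """Replace text with a Han string about `ratio` times its length."""
        new_len = max(1, round(len(text) * ratio))
        return "[" + "".join(HAN_POOL[i % len(HAN_POOL)] for i in range(new_len)) + "]"

    print(pseudo_shrink("Save changes"))  # [界面測試字符]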

We all need space for understanding, don't we?

Tuesday, January 11, 2011

PL4: For display only

This next pseudo-localization test in the series is very straightforward. Maybe.

You can verify whether your software correctly handles a particular character encoding by pseudo-localizing the resource files into that encoding. Use a broad spectrum of characters from the encoding, if not all of them, to check that the entire set is handled properly. Simply pseudo-localize the resources, bring up the pseudo-localized user interface, and view the text. Of course, you have to know what you're looking at, but if you've familiarized yourself with the set of characters used in the pseudo-localization, that shouldn't be a problem.
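A minimal round-trip sketch in Python: write the pseudo-localized resources in the target encoding, read them back, and confirm nothing was mangled. The file name, the key=value layout, and the choice of Shift_JIS are all illustrative. (In practice you'd point your own software's resource loader at the file rather than Python's, but the shape of the check is the same.)

    pseudo_strings = {
        "app.title": "中文漢字テスト",
        "app.quit": "終了確認",
    }
    encoding = "shift_jis"  # or whatever encoding you need to verify

    # Characters the encoding can't represent will raise UnicodeEncodeError
    # here - which is itself useful information.
    with open("resources_pseudo.txt", "w", encoding=encoding) as f:
        for key, value in pseudo_strings.items():
            f.write(key + "=" + value + "\n")

    with open("resources_pseudo.txt", encoding=encoding) as f:
        for line in f:
            key, value = line.rstrip("\n").split("=", 1)
            assert value == pseudo_strings[key], "mangled: " + key
    print("round trip OK")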

But that's not all! Since all of you are using a Unicode encoding for your user interface (a-hem), there's something else you can test. Thanks to Han unification, some characters should be displayed with a different glyph depending on the language they represent. So, for example, if characters from the Han section of Unicode are used in your pseudo-localization, you can set the locale to a Chinese region and verify the characters are drawn with a Chinese font. Then change the locale to a Japanese region and see whether a Japanese font is used to display them. Again, you have to know what you're looking at, but if your software is responsible for display, this is an important test. (Take a look at these charts, particularly the second and third sections.) If necessary, select a few key Han characters that differ in Chinese and Japanese, make up an image chart, and familiarize yourself with them. As they're used over and over in the pseudo-localization, you'll get to know them well.
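If you want a starting point for that image chart, here are a few code points often cited as looking noticeably different in Chinese and Japanese fonts; treat the list as a suggestion to verify against a CJK chart, not gospel:

    # Candidate Han characters for a Chinese-vs-Japanese glyph chart.
    for ch in ("\u76F4", "\u9AA8", "\u6D77"):  # 直, 骨, 海
        print(f"U+{ord(ch):04X}  {ch}")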

Your customers will be happy you did.

PL3: May we dance with your dates?

This is the third entry in my series on the 9 coding areas pseudo-localization can test.

Amongst ourselves in internationalization, we refer to certain types of data as locale-sensitive. This is data that changes in some way from region to region (locale to locale), typically in format. The classic example is a date. And the classic example of this example is the short format in the US, month/day/year, which becomes day/month/year in most European countries; today is January 11, 2011, written 1/11/11 in the US but 11/1/11 in the UK. (Often this is accompanied by some US-baiting about how illogical that format is, but we won't go there and mention that, in human terms, the day loses importance after a week or so, the month comes into prominence and lasts an entire year, and the year, needed last of all, is stuck on the end. And that it's written month day, year in longer formats, as above. But I won't mention any of that here.)
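You can see the difference without leaving your desk. Here's the example above rendered with the Babel library for Python, which draws its patterns from the CLDR (more on that below); exact output depends on your CLDR version:

    from datetime import date
    from babel.dates import format_date

    d = date(2011, 1, 11)
    print(format_date(d, format="short", locale="en_US"))  # 1/11/11
    print(format_date(d, format="short", locale="en_GB"))  # 11/01/2011 - day first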

There are many more locale-sensitive pieces of data: numbers, time, prices, measurements, weights, telephone numbers, addresses, sizes, etc. Some of these data formats have been standardized, so that programs can select them using locale identifiers and apply them to the data. The Common Locale Data Repository (CLDR) is a public database with locale formats for many locales throughout the world (though admittedly there are no sizes, nor telephone numbers, nor postal addresses). Internationalization library functions and methods use it for formatting and parsing locale-sensitive data.
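Numbers behave the same way. A quick illustration, again with Babel pulling its patterns from the CLDR (outputs shown are from one CLDR version and may vary slightly):

    from babel.numbers import format_decimal

    n = 1234567.89
    print(format_decimal(n, locale="en_US"))  # 1,234,567.89
    print(format_decimal(n, locale="de_DE"))  # 1.234.567,89
    print(format_decimal(n, locale="fr_FR"))  # 1 234 567,89 - non-breaking spaces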

As these formats can be accessed programmatically for most of the locales of the world, they should be. That is, rather than externalizing a date format for a localizer to alter to suit another locale, dates should be programmatically formatted. Why? The long answer is another blog entry; the short answer is that there are many more formats than there are localizations. For example, English usually gets a single localization, but formats for English-speaking locales vary (see the short date format example above). The same is true for French, Spanish, Traditional Chinese, and so on.
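Here's the difference in miniature: a pattern frozen into a resource file versus one looked up from the locale at run time. The hard-coded pattern is hypothetical; the locale-aware calls are Babel again:

    from datetime import date
    from babel.dates import format_date

    d = date(2011, 1, 11)

    # Externalized: one English localization, one frozen pattern -
    # wrong for en_GB, en_AU, en_IN, ...
    print(d.strftime("%m/%d/%y"))  # 01/11/11, everywhere

    # Programmatic: the same code serves every English-speaking locale.
    for loc in ("en_US", "en_GB", "en_AU"):
        print(loc, format_date(d, format="short", locale=loc))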

But what does all this have to do with pseudo-localization testing?

You may remember my post about using pseudo-l10n to test whether localized files are picked up when they should be. Or perhaps you don't, in which case you might want to review it. Anyway, once you've set up your system so that the pseudo-localized files mimic an actual locale, you can exploit that setup to check whether locale-sensitive formats are programmatically determined.

Run the system with the locale set to the pseudo-localized files' locale, and pay special attention to the formats of dates and numbers. Now change the locale setting to one that has different formats and run again. Check the dates and numbers - have they changed format? You might want to verify the formats against those listed in the CLDR; if you're looking online, check the summary charts. Start with the language base link, then drill down to get to the specific locale you're looking for.
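If your stack exposes locale-aware formatters, the whole check fits in a few lines. Here's the idea sketched with Babel, with arbitrary locales standing in for your pseudo-locale and its alternates:

    from datetime import date
    from babel.dates import format_date
    from babel.numbers import format_decimal

    d, n = date(2011, 1, 11), 1234.56
    for loc in ("en_US", "en_GB", "de_DE", "ja_JP"):
        print(loc, format_date(d, format="short", locale=loc),
              format_decimal(n, locale=loc))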

Change the locale to a third value and check again. The formats should change appropriately each time. If not, to your customers, today might just look like November 1, 2011.