Multilingual Web Sites
Publishing a community Web site in multiple languages
September 21, 2004
Multikulti started in 1999 after a discussion between Dan McQuillan, a senior IT consultant at the London Advice Services Alliance, and a Citizens Advice Bureau (CAB) manager. The manager's CAB found it almost impossible to provide up-to-date information on benefit law in community languages. Leaflets quickly became out-of-date and were expensive to reprint and distribute. Dan thought the Web would be an ideal tool for updating multilingual information quickly and cheaply.
However, a quick examination of the Web showed just how few UK sites provided good independent information in community languages. The Web may have transformed access to information for millions of people worldwide, but many still remain excluded because they don't speak the right language. A group of us got ambitious and decided to make an attempt at closing that gap. The idea was to create one site that provided good, regularly updated advice on matters important to the main non-English speaking communities in the UK -- bread-and-butter matters like debt, employment, housing, health, immigration issues, and welfare benefits.
The pilot project tested out the idea amongst community groups in Haringey, London. They were frustrated with the quality of translated material available. Often it was inaccurate, out of date, or written in a way that clients couldn't understand. For us, the difficulties of dealing with so many languages and so many language communities quickly became apparent. We started with 14 languages and were translating about six information leaflets in each language (the leaflets covered topics like eligibility for welfare benefits, how to register with a general practitioner, and asylum status). These translations went through various drafts, each of which had to be proofread, corrected, and then sent back to the translator or on to our editorial board. Keeping track of this circulation of documents was a major job in itself.
However, our problems weren't limited to workflow issues. The Internet has gone through several phases, but all the way through it's been an English-based medium. Ideally, we wanted to make the technology work with people's cultures rather than make the cultures fit the technology. This commitment forced us to confront some challenging technical issues, particularly around multilingual scripts: the languages we'd chosen included Bengali, Gujerati, Arabic, Farsi, and Chinese, all of them written in non-Roman scripts.
The Internet remains a mainly European-language technology. Every letter displayed on a Web site is sent to a computer as a number. The highest number allowed in HTML was originally 255, which means there aren't enough spare numbers for the different characters that make up non-European languages. One way around this is for each language to have its own "character set." However, we found that character sets weren't available for many of the languages we wanted, and they can only ever display European letters along with one other script -- they aren't multilingual character sets.
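The 255 limit and the one-extra-script restriction can be seen in a few lines of Python. This is only a sketch: the sample words are illustrative and not taken from the Multikulti leaflets, the helper names are hypothetical, and ISO 8859-6 (Latin letters plus Arabic) is assumed here as one example of a single-script character set.

```python
def fits_in_one_byte(text: str) -> bool:
    """Return True if every character's number is 255 or lower."""
    return all(ord(ch) <= 255 for ch in text)


def can_encode(text: str, charset: str) -> bool:
    """Return True if the named character set has a number for every character."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


# Illustrative sample words (assumptions, not Multikulti content).
samples = {
    "French": "sant\u00e9",
    "Arabic": "\u0635\u062d\u0629",
    "Bengali": "\u098b\u09a3",
}

for language, word in samples.items():
    highest = max(ord(ch) for ch in word)
    print(language, "highest character number:", highest,
          "| fits in 0-255:", fits_in_one_byte(word),
          "| fits in ISO 8859-6:", can_encode(word, "iso8859_6"))
```

The Arabic word fits the Arabic character set and the French word fits within 0-255, but no single one of these legacy sets holds all three words at once.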
We tried another approach, Unicode. This is a different way of linking numbers to characters so that more than 255 numbers are available. But, disappointingly, Unicode wasn't a satisfactory technological solution at the start of the project, partly because the standard was incomplete and partly because browsers implemented it poorly. Since Multikulti was dealing with many languages, we needed a simple solution that could also produce high-quality printouts. We decided on Adobe PDF (Portable Document Format) as an alternative. The multilingual content -- our translated information leaflets -- was converted into PDF files and made accessible through the language categories on the Web site.
This worked reasonably well, but PDFs have limitations. They take multilingual content and wrap it in a kind of software cellophane. Furthermore, you need an extra piece of software (Acrobat Reader) to read it, which for site users complicates the process of accessing documents. For us, the Multikulti site in its original form wasn't ideal. It remained a basically English site with multilingual content hanging off it.
Unicode technology, however, moved on. We wanted the information on the site to be available as real text in all the languages, instead of being zipped inside a PDF package. So we began the development of a new site using Unicode (with funds provided by the New Opportunities Fund). A Unicode site would enable Multikulti to comply not only with our own commitment to a culturally appropriate technology but also with accessibility standards like the Web Accessibility Initiative from the World Wide Web Consortium (W3C). The Unicode content should be searchable, making Multikulti's resources easier to find via Internet search engines -- see the Dublin Core Metadata standards. Also, as the text is directly editable within Multikulti's Content Management System, it should make the job of keeping the content up to date easier.
To comply with these standards and to make the site truly multilingual, we opted for UTF-8, an encoding of Unicode, the character set that HTML 4 had already adopted. Theoretically, all user agents and browsers should now be able to understand Unicode UTF-8 (though we've been waiting some time for browsers and operating systems to catch up).
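As a rough illustration of what choosing UTF-8 means in practice, the sketch below encodes a few words and builds a minimal page that declares its encoding in the HTML 4 style. The function names, page layout, and sample words are assumptions made for this example; they do not describe the actual Multikulti templates.

```python
import html


def utf8_size(text: str) -> tuple:
    """Return (number of characters, number of UTF-8 bytes) for the text."""
    return len(text), len(text.encode("utf-8"))


def build_page(lang_code: str, body_text: str) -> bytes:
    """Build a tiny HTML page that declares UTF-8 and return it as bytes."""
    page = (
        '<html lang="{lang}"><head>'
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        "<title>Sample leaflet</title></head>"
        "<body><p>{body}</p></body></html>"
    ).format(lang=html.escape(lang_code, quote=True),
             body=html.escape(body_text))
    # The bytes sent to the browser must match the declared charset.
    return page.encode("utf-8")


# Illustrative words only: Latin letters take one byte each in UTF-8,
# accented letters and Arabic take two, Chinese and Bengali take three.
for word in ["debt", "sant\u00e9", "\u0635\u062d\u0629", "\u5065\u5eb7", "\u098b\u09a3"]:
    print(word, utf8_size(word))

page_bytes = build_page("bn", "\u098b\u09a3")
print(len(page_bytes), "bytes in the sample page")
```

Because one encoding covers every script, a single page (or a single database column) can mix languages without switching character sets.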
We divided the conversion to Unicode into three distinct technical areas:
- Generating text
- Storing and delivering text
- Rendering and displaying text
First of all, we formed a technical partnership with a team of developers who addressed the second area. They had already tackled the challenge of creating a database and content management system built on UTF-8. We shouldered the burden of the first area -- generating Unicode content in all languages -- ourselves. For some languages this was relatively easy, as Windows Operating System binaries have been based on Unicode since Windows 98/Office 97. But successfully generating complex scripts, such as those used for Indic languages, is difficult. Put another way, it's at the cutting edge of technological development in this field: software for Bengali and Gujerati is, even now, being rewritten on a weekly basis by the developers.
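The "storing and delivering" area can be sketched with a small in-memory database. The article does not say which database or content management system the partner team built, so SQLite, the table layout, and the function names below are all stand-in assumptions chosen only because they run anywhere Python does.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    """Create an in-memory store with one leaflet per (language, topic)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE leaflet ("
        "lang TEXT, topic TEXT, body TEXT, PRIMARY KEY (lang, topic))"
    )
    return conn


def save_leaflet(conn, lang: str, topic: str, body: str) -> None:
    """Insert or update the text of one leaflet."""
    conn.execute(
        "INSERT OR REPLACE INTO leaflet (lang, topic, body) VALUES (?, ?, ?)",
        (lang, topic, body),
    )


def deliver_leaflet(conn, lang: str, topic: str) -> bytes:
    """Fetch a leaflet and return it as UTF-8 bytes, ready to send."""
    row = conn.execute(
        "SELECT body FROM leaflet WHERE lang = ? AND topic = ?",
        (lang, topic),
    ).fetchone()
    if row is None:
        raise KeyError((lang, topic))
    return row[0].encode("utf-8")


store = make_store()
# Illustrative text only, not real leaflet content.
save_leaflet(store, "ar", "health", "\u0635\u062d\u0629")
save_leaflet(store, "zh", "health", "\u5065\u5eb7")
print(deliver_leaflet(store, "ar", "health"))
```

Keeping the text as Unicode strings inside the application and converting to UTF-8 bytes only at the edges is one common way to avoid mixing encodings between storage and delivery.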
Our task was to ensure that people could actually read the text. Initially we regarded this as a font issue, but soon discovered it was about much more than fonts. Some scripts require complex character re-ordering and combining. (See Introduction to Writing Systems (PDF) for details.) What you see on screen is very different from what streams in as a coded Web page. Browsers can only build the right glyphs with the help of the operating system (the engineering required to create the correct script comes from the underlying operating system, working at the level of raw character streams). Fortunately, font standards have advanced to accommodate this. The TrueType font standard has now blossomed into the OpenType font standard (PDF), which includes all the rules required to make letters. However, an engine inside the operating system is still required to do the work. Our attempted solution was two-fold:
- Identify and make accessible the free and compliant font resources across all site languages.
- Determine by experiment the baseline levels of operating system and browser software needed to view the different languages.
Our experiments revealed that to view certain scripts (especially Bengali and Gujerati), users need to be running Windows XP Pro on a powerful PC. However, experience working with local community groups has shown us that you can't just say, "Get a new PC with Windows XP Pro," and leave it at that. Time, money, and training resources are usually stretched, and there are always other more pressing priorities. As a safety net, we decided to continue to make PDFs available for computers that cannot render the correct script.
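To see why complex scripts such as Bengali need more than a font, it helps to look at how a syllable is stored. The sketch below uses Python's standard `unicodedata` module; the two sample sequences are illustrative choices, and the helper names are hypothetical.

```python
import unicodedata


def describe(text: str) -> list:
    """List (code point, Unicode name, general category) for each character."""
    return [
        ("U+%04X" % ord(ch), unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]


def count_marks(text: str) -> int:
    """Count combining marks (general categories starting with 'M')."""
    return sum(1 for ch in text if unicodedata.category(ch).startswith("M"))


# Stored order: consonant KA, then VOWEL SIGN I. On screen the vowel sign
# is drawn to the LEFT of the consonant, so the shaping engine must reorder.
reordered = "\u0995\u09bf"

# Stored order: KA, VIRAMA, SSA. A shaping engine with a suitable font
# combines these three stored characters into a single conjunct shape.
conjunct = "\u0995\u09cd\u09b7"

for sample in (reordered, conjunct):
    for entry in describe(sample):
        print(entry)
    print("characters stored:", len(sample), "| combining marks:", count_marks(sample))
```

The stored sequence is the same on every machine; whether it appears correctly depends on the shaping engine and font available on the reader's computer, which is why the baseline had to be found by experiment.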
So we still have some problems with a fully Unicode-enabled site. We need to make suitable Unicode fonts available to users for all languages, and only recent browser versions will properly display right-to-left and complex scripts. Most people are using old operating systems that have never heard of OpenType fonts. Also, the current compliant fonts for these languages are proprietary and cannot be given away. Our hope for an open and accessible font solution lies in projects like "Freebanglafont". But we're making some progress. We are extending our system to include remote workflow for translators and proofreaders, and we are developing a glossary system for the translation of key jargon terms (like "Home Office" or "maternity leave"). The full Unicode version of the site should be with us this year.
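For the right-to-left scripts mentioned above (Arabic and Farsi among the site languages), a page can help the browser by stating the text direction explicitly. The following is a minimal sketch, assuming a simple "first strong character" rule and hypothetical function names; it is not a full implementation of the Unicode bidirectional algorithm.

```python
import html
import unicodedata


def base_direction(text: str) -> str:
    """Guess 'rtl' or 'ltr' from the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):   # Hebrew-type or Arabic-type letters
            return "rtl"
        if bidi_class == "L":           # Latin, Bengali, Chinese, and so on
            return "ltr"
    return "ltr"  # no strong character found; fall back to left-to-right


def paragraph(text: str) -> str:
    """Wrap text in a paragraph tag carrying an explicit dir attribute."""
    return '<p dir="%s">%s</p>' % (base_direction(text), html.escape(text))


# Illustrative strings only.
print(paragraph("\u0635\u062d\u0629"))        # Arabic-script word
print(paragraph("maternity leave"))           # English phrase
print(paragraph("2004 \u0633\u0644\u0627\u0645\u062a"))  # digits, then Farsi-script word
```

Digits and spaces are skipped because they have no strong direction of their own, so a line that starts with a number is still classified by its first letter.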
Currently the Multikulti site contains translations of leaflets in ten community languages, covering six key areas:
- Immigration and asylum
- Welfare benefits
- Employment
- Housing
- Health
- Debt
The languages we chose were Albanian, Arabic, Bengali, Chinese, Farsi, French, Gujerati, Somali, Spanish, and Turkish. We selected languages according to whether they represented a newly settled community in the UK, or if the longer-established communities had low levels of spoken English and few translated materials available. Our funding was provided by the New Opportunities Fund.
Article in collaboration with London Advice Services Alliance (Lasa) Knowledgebase, your free online guide to IT for the not-for-profit sector.
Copyright ©2000-2004, Lasa. All Rights Reserved.