Slide 1: Kent State University
Gregory M. Shreve
Software Localization and Internationalization: How and Why
1
Shreve
11/7/2004
Slide 2: Kent State University
Internet, E-Commerce & Foreign Markets
Internet World Stats estimates the current number of WWW users at 785 million. Of these, 29% reside in North America, 27.7% reside in Europe, and 31% reside in Asia with penetration rates of 69.8%, 29.9% and 6.7% respectively. With 58.7% of current users residing in regions with an average penetration rate of only 18.3%, it is clear that these foreign markets offer substantial rewards for those prepared to enter them.
The growth of the Internet and e-commerce over the next decade will be driven by the expansion of foreign markets.
Slide 3: Kent State University
Consumer as Foreigner
In 2003 e-commerce sales to foreign customers exceeded domestic sales. This year the European Internet economy is expected to break the 4 trillion dollar mark, growing at a compound annual rate of 87%. Western Europe is expected to lead all regions with 692 billion dollars in global online exports in 2004. North America will move 23% of its exports online, with the U.S. pumping 210 billion dollars into cross border e-commerce. The Asia-Pacific region will reach 219 billion dollars in 2004, sparked by 57 billion dollars in Japanese online exports.
Slide 4: Kent State University
Global, Globalize, Globalization
Companies that intend to sell online will have to globalize their web presence and their products to reach the majority of the online marketplace. They will have to make their web sites, software interfaces, and product documentation available in the languages and cultural styles of an increasingly diverse and international market by applying a process called localization: the translation of content and the adaptation of interface and form to reflect the expectations of one or more locales. For global-strategy American companies, over 40% of total revenue comes from international sales. These companies market high-technology products such as software, medical instrumentation, CAD/CAM devices, and so on.
Slide 5: Kent State University
Global, Globalize, Globalization
Most of these products have a high document overhead, with instructions on the assembly, use, maintenance, and repair of the products delivered via off- and on-line electronic documentation. Most are marketed and supported online. Further, many products may have embedded software components and user interfaces that use online databases. These products and documents must be delivered to locales: target markets with different cultural and linguistic contexts.
• Support: customer, technical, web
• UI: user interfaces
• CBT: computer-based training
• Marketing: packages, web
• Documentation: manuals, help files
Slide 6: Kent State University
Language Industry
While global marketing existed before the 1990s, the translation / software localization industry (or “language industry” for short) has evolved primarily as a result of the rapid global expansion of the computer software market and the increasing use of the Internet as a global marketing and customer service tool, all part of globalization. The corporate problem is, of course, that many companies do not understand HOW to prepare their many products, documents, web pages and database interfaces for distribution in other linguistic and cultural locales; hence the need for the services of the language industry.
Slide 7: Kent State University
New Media, New Markets
Experts estimate the current worth of the U.S. language industry at just under $2 billion annually, with the global market worth approximately $6 billion. Indications are that growth will continue to be strong into the next decade because of new electronic media and markets. Consider the case of massively multiplayer online games (MMOGs): the language industry enables the publishers of these games to leverage their initial development investment by translating and adapting the games for international locales. Industry projections are that MMOGs will post a 52% cumulative annual growth rate between 2002 and 2006.
Slide 8: Kent State University
Initial Definitions
This presentation examines the issues and processes involved in software internationalization and localization. There are three related major processes to consider; the first, globalization, has already been discussed.
• globalization (G11N), a strategic decision to reach an international audience or to include different linguistic and cultural materials in a product, software application, web site or digital collection;
• internationalization (I18N), a design process intended to enable efficient and cost-effective subsequent linguistic and cultural adaptation;
• localization (L10N), the preparation of locale-specific versions of an application’s interface and content.
Slide 9: Kent State University
Internationalization & Localization
Localization is the preparation of locale-specific versions of a software application, electronic document, internet resource, or digital collection. It consists of the translation of textual material into the language and textual conventions of the target locale and the adaptation of non-textual materials and delivery / display mechanisms to take into account the cultural requirements of that locale.
Internationalization is an “upstream” engineering process that should precede localization. Its aim is to make subsequent localization/translation easier, more efficient, and less costly.
[Diagram labels: globalization, internationalization, localization, translation.]
Slide 10: Kent State University
Scope of Processes
Each of these processes has a different scope and occurs at a different point in the business and document cycles of an organization.
[Diagram: processes ordered from earlier to later: globalization, internationalization, localization, translation. Scope labels: organizational policies & strategies; business, IT, & document processes; documents, interfaces, tools.]
Slide 11: Kent State University
Evolution of Software Localization
Software localization developed as part of the globalization of the personal computer software market. Software applications and supporting electronic documents were the first “localized” products. The growth of the Internet and the World Wide Web created a demand for localized web pages and sites. Digital multimedia and digital repositories (including digital libraries) are emerging foci of localization.
[Timeline, 1980 to 2005: PC software, WWW, multimedia, repositories.]
Slide 12: Kent State University
Document: Display and Content
Localization focuses on both display (appearance, presentation) and content. Thus, localization includes a cultural adaptation as well as a linguistic translation component.
[Diagram: localizable elements of documents, arranged along two axes, display to content and non-linguistic to linguistic:]
• color, graphics, icons, symbols, display organization
• date, time, calendar, currency, number, address
• interface: menus, dialogs, messages, prompts, alerts
• document organization, writing system
• metadata, vocabularies
• content: help files, auxiliary documents, HTML / XML document content
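The date conventions named on this slide can be made concrete with a small sketch. It assumes a hypothetical table of `strftime` patterns keyed by locale tag; the table, the three locales chosen, and the function name `format_date_for_locale` are illustrative and do not come from the original slides.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical locale-to-pattern table (an assumption for illustration). */
struct date_pattern {
    const char *locale;
    const char *pattern;   /* strftime() conversion pattern */
};

static const struct date_pattern DATE_PATTERNS[] = {
    { "en-US", "%m/%d/%Y" },   /* month/day/year */
    { "fr-FR", "%d/%m/%Y" },   /* day/month/year */
    { "de-DE", "%d.%m.%Y" },   /* day.month.year */
};

/* Formats a calendar date using the pattern registered for `locale`.
 * Returns the number of characters written, or 0 if the locale is
 * unknown or the buffer is too small. */
size_t format_date_for_locale(const char *locale, int year, int month, int day,
                              char *buf, size_t buflen)
{
    struct tm date;
    size_t i;

    memset(&date, 0, sizeof date);
    date.tm_year = year - 1900;   /* struct tm counts years from 1900 */
    date.tm_mon  = month - 1;     /* and months from 0 */
    date.tm_mday = day;

    for (i = 0; i < sizeof DATE_PATTERNS / sizeof DATE_PATTERNS[0]; i++) {
        if (strcmp(DATE_PATTERNS[i].locale, locale) == 0)
            return strftime(buf, buflen, DATE_PATTERNS[i].pattern, &date);
    }
    return 0;
}
```

Called with the date in this deck's footer (7 November 2004), the sketch writes 11/07/2004 for en-US and 07/11/2004 for fr-FR: the same digits, read differently in each locale, which is why dates count as localizable material.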
Slide 13: Kent State University
Localizing Software Applications
Software applications were the first localized “electronic documents.” Early localization included finding all “strings” embedded in code:

source.c (strings are directly in code):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        int n;
        char y[5];

        /* Every user-visible string below is hard-coded in the source. */
        printf("This program converts decimal numbers to hexadecimal\n\n");
        while (1) {
            printf("\nEnter decimal number: ");
            scanf("%d", &n);
            printf("\nNumber entered is <%d> decimal and <%x> hexa", n, (unsigned)n);
            printf("\nDo you want to continue? ");
            scanf("%4s", y);            /* width limit keeps input inside y[5] */
            if (strcmp(y, "yes")) {     /* any answer other than "yes" exits */
                printf("\n exiting ..\n");
                exit(0);
            }
        }
        return 0;
    }
Slide 14: Kent State University
Extract Localizable Resources
RESOURCES

    PortfolioMenu MENU
    BEGIN
        POPUP "&File"
        BEGIN
            MENUITEM "&Add Student", 1
            MENUITEM SEPARATOR
            MENUITEM "&Delete Student", 2
            MENUITEM SEPARATOR
            MENUITEM "&Update Student", 3
            MENUITEM "E&xit", 4
        END
        POPUP "&Tools"
        BEGIN
            MENUITEM "Add &Portrait", 5
        END
        POPUP "&Help"
        BEGIN
            MENUITEM "About Portfolio", 6
            MENUITEM SEPARATOR
            MENUITEM "Contents", 7
        END
    END

Strings are not the only localizable material:
• dialog boxes
• controls
• labels
• menus
• icons
• graphics
• tooltips
Slide 15: Kent State University
Localizing Web Pages
Web sites are also now being localized. The link below points to a commented HTML file that gives a simple introduction to localizing an HTML web page. At the localizer’s level some of the issues (not an exhaustive list) are:
• character sets
• localizing tag content
• recognizing which tags have localizable content
• not breaking tags
• looking for text generated by attributes (title, alt)
• looking for text generated by scripts (server-side, client-side)
• evaluating CSS and stylesheet changes
• making changes to graphics
• dealing with graphics with integral text
Localization of HTML
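One of the issues listed for this slide, “not breaking tags,” lends itself to an automated check. The following is a minimal sketch, assuming a simple rule that a translated fragment must contain exactly the same tags, in the same order, as its source; the function names `next_tag` and `tags_preserved` are hypothetical and not taken from any localization tool.

```c
#include <stddef.h>
#include <string.h>

/* Finds the next "<...>" tag at or after `s`. On success stores the tag's
 * start and length and returns 1; returns 0 when no complete tag remains. */
static int next_tag(const char *s, const char **start, size_t *len)
{
    const char *open = strchr(s, '<');
    const char *close;

    if (open == NULL)
        return 0;
    close = strchr(open, '>');
    if (close == NULL)
        return 0;
    *start = open;
    *len = (size_t)(close - open) + 1;
    return 1;
}

/* Returns 1 if `translated` contains exactly the same tags, in the same
 * order, as `source`; returns 0 otherwise. Text between tags is ignored. */
int tags_preserved(const char *source, const char *translated)
{
    const char *s_tag, *t_tag;
    size_t s_len, t_len;
    int s_found, t_found;

    for (;;) {
        s_found = next_tag(source, &s_tag, &s_len);
        t_found = next_tag(translated, &t_tag, &t_len);
        if (!s_found || !t_found)
            return s_found == t_found;   /* both must run out together */
        if (s_len != t_len || strncmp(s_tag, t_tag, s_len) != 0)
            return 0;
        source = s_tag + s_len;
        translated = t_tag + t_len;
    }
}
```

Because the comparison is exact, a tag whose `title` or `alt` attribute has been legitimately translated would be reported as a mismatch; a fuller check would compare tag names only and treat attribute text as localizable content.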
Slide 16: Kent State University
A Solution: Re-Engineer the Software
As one might imagine, localizing directly in code led to problems. First, translators / localizers were quite capable of “breaking code.” There were also problems associated with the need for multiple “re-builds” of the basic software for each language version. Language expansion (differences in textual volume between languages) created sizing problems in dialogs and controls. Localization was labor-intensive, difficult, and expensive. A solution was to re-engineer the software with the intent of separating language resources from the underlying delivery mechanism.
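Language expansion can be measured before a translated string is placed into a dialog or control. The sketch below is illustrative only: it assumes UTF-8 text and measures length in characters, and the names `utf8_length`, `expansion_percent` and `fits_control` are hypothetical.

```c
#include <stddef.h>

/* Counts characters in a UTF-8 string by skipping continuation bytes
 * (those of the form 10xxxxxx), so an accented letter counts once. */
size_t utf8_length(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/* Percentage by which the translation is longer than the source text,
 * measured in characters. Negative when the translation is shorter. */
double expansion_percent(const char *source, const char *translated)
{
    size_t src = utf8_length(source);
    size_t dst = utf8_length(translated);

    if (src == 0)
        return 0.0;
    return 100.0 * ((double)dst - (double)src) / (double)src;
}

/* Returns 1 if the translated label fits a control that can display at
 * most `max_chars` characters, 0 otherwise. */
int fits_control(const char *translated, size_t max_chars)
{
    return utf8_length(translated) <= max_chars;
}
```

As an assumed example, a four-character menu label “Exit” rendered as “Quitter” grows by 75% and no longer fits a control sized for four characters. Real sizing depends on fonts and pixel widths, so a character count is only a first approximation.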
Slide 17: Kent State University
Internationalization: Separate Resources
Internationalization is a re-engineering and re-design process intended to make localization and translation easier, faster and more cost-effective. A first step in the internationalization of software applications is the separation or extraction of linguistic and cultural resources from the application, leaving a “neutral” software kernel. Extraction requires specialized localization tools.
[Diagram: application software separated into resources and kernel.]
Slide 18: Kent State University
Extract Localizable Materials
source.c:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Message-lookup functions, declared as on the original slide. */
    extern unsigned char *intl_m_msg(), *intl_f_msg();

    int main(void)
    {
        int n;
        char y[5];

        /* Each string is now fetched by message number from catalog "mypg". */
        printf(intl_m_msg("", "mypg", 1));
        while (1) {
            printf(intl_m_msg("", "mypg", 2));
            scanf("%d", &n);
            printf(intl_m_msg("", "mypg", 3), n, n);
            printf(intl_m_msg("", "mypg", 4));
            scanf("%4s", y);            /* width limit keeps input inside y[5] */
            if (strcmp(y, intl_m_msg("", "mypg", 6))) {
                printf(intl_m_msg("", "mypg", 5));
                exit(0);
            }
        }
        return 0;
    }

EXTRACT

mypg.en:

    1  This program converts decimal numbers to hexadecimal\n\n
    2  \nEnter decimal number:
    3  \nNumber entered is <%d> decimal and <%x> hexa
    4  \nDo you want to continue?
    5  \n exiting ..\n
    6  yes
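Once strings live in a catalog such as mypg.en, a translator can still “break code” by altering the `printf` conversions (`%d`, `%x`) inside a message. The following sketch shows one way a catalog entry might be validated against its source; the function names are hypothetical, and positional arguments such as `%1$d` are deliberately not handled.

```c
#include <stddef.h>
#include <string.h>

/* Copies the conversion characters of a printf-style format into `out`
 * (for example "dx" for "<%d> and <%x>"). "%%" is skipped because it
 * consumes no argument. Flags, widths and length modifiers are ignored. */
static void conversion_signature(const char *fmt, char *out, size_t outlen)
{
    size_t n = 0;

    if (outlen == 0)
        return;
    while (*fmt != '\0' && n + 1 < outlen) {
        if (*fmt++ != '%')
            continue;
        if (*fmt == '%') {          /* literal percent sign */
            fmt++;
            continue;
        }
        /* skip flags, width, precision and length modifiers */
        while (*fmt != '\0' && strchr("-+ #0123456789.hlLjzt", *fmt) != NULL)
            fmt++;
        if (*fmt != '\0')
            out[n++] = *fmt++;
    }
    out[n] = '\0';
}

/* Returns 1 if both messages use the same conversions in the same order,
 * so the translated message is safe to pass to the same printf call. */
int same_conversions(const char *source, const char *translated)
{
    char a[32], b[32];

    conversion_signature(source, a, sizeof a);
    conversion_signature(translated, b, sizeof b);
    return strcmp(a, b) == 0;
}
```

A check of this kind could run whenever a translated catalog is delivered, before the software is rebuilt or tested.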
Slide 19: Kent State University
Extract Localizable Materials
source.c:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Message-lookup functions, declared as on the original slide. */
    extern unsigned char *intl_m_msg(), *intl_f_msg();

    int main(void)
    {
        int n;
        char y[5];

        /* Each string is now fetched by message number from catalog "mypg". */
        printf(intl_m_msg("", "mypg", 1));
        while (1) {
            printf(intl_m_msg("", "mypg", 2));
            scanf("%d", &n);
            printf(intl_m_msg("", "mypg", 3), n, n);
            printf(intl_m_msg("", "mypg", 4));
            scanf("%4s", y);            /* width limit keeps input inside y[5] */
            if (strcmp(y, intl_m_msg("", "mypg", 6))) {
                printf(intl_m_msg("", "mypg", 5));
                exit(0);
            }
        }
        return 0;
    }

TRANSLATE

mypg.fr:

    1  Ce programme convertit les nombres décimaux en hexadécimal\n\n
    2  \nEntrer le nombre décimal:
    3  \nLe nombre entré est <%d> décimal et <%x> hexadécimal
    4  \nVoulez-vous continuer?
    5  \nSortie ..\n
    6  oui
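The lookup step behind the two catalogs can be sketched as follows. This is not the implementation of `intl_m_msg`, whose internals the slides do not show; `lookup_message`, the `struct catalog` layout and the English fallback are assumptions made for illustration, and the catalogs are compiled in rather than loaded from mypg.en and mypg.fr to keep the example short.

```c
#include <stddef.h>
#include <string.h>

/* One message catalog per locale, indexed by message number (1-based),
 * mirroring the contents of mypg.en and mypg.fr. */
struct catalog {
    const char *locale;
    const char *messages[6];
};

static const struct catalog CATALOGS[] = {
    { "en", {
        "This program converts decimal numbers to hexadecimal\n\n",
        "\nEnter decimal number: ",
        "\nNumber entered is <%d> decimal and <%x> hexa",
        "\nDo you want to continue? ",
        "\n exiting ..\n",
        "yes" } },
    { "fr", {
        "Ce programme convertit les nombres décimaux en hexadécimal\n\n",
        "\nEntrer le nombre décimal: ",
        "\nLe nombre entré est <%d> décimal et <%x> hexadécimal",
        "\nVoulez-vous continuer? ",
        "\nSortie ..\n",
        "oui" } },
};

/* Hypothetical stand-in for a catalog call: returns message `id` for
 * `locale`, falls back to English for an unknown locale, and returns
 * NULL if the message number is out of range. */
const char *lookup_message(const char *locale, int id)
{
    size_t i;

    if (id < 1 || id > 6)
        return NULL;
    for (i = 0; i < sizeof CATALOGS / sizeof CATALOGS[0]; i++) {
        if (strcmp(CATALOGS[i].locale, locale) == 0)
            return CATALOGS[i].messages[id - 1];
    }
    return CATALOGS[0].messages[id - 1];   /* fallback: English */
}
```

With this arrangement the program logic is identical for every language version; only the locale argument changes, for example `printf(lookup_message("fr", 3), n, (unsigned)n)`.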
Slide 20: Kent State University
Content and Display in Web Pages
Web pages share the problem of “separation of content and coding” with application software. You can see from our web page example how true this is. Internationalization solutions in web pages also involve the “extraction” of linguistic and cultural material from the software vehicle. Cutting edge solutions create dynamic HTML from XML-based language content.
XML:

    <gradinquiry>
      <name>
        <firstname>Joan</firstname>
        <lastname>Smith</lastname>
      </name>
      <address>
        <addressline1>266 South Prospect Street</addressline1>
        <addressline2/>
        <city>Kent</city>
        <state>Ohio</state>
        <zip>44240</zip>
      </address>
      <country>USA</country>
      <phone>330-673-9999</phone>
      <fax>330-672-4017</fax>
      <email>gshreve@neo.rr.com</email>
    </gradinquiry>

HTML:

    <BODY>
    <TABLE>
    <TR><TD>Joan</TD><TD>Smith</TD></TR>
    <TR><TD>266 South Prospect Street</TD></TR>
    <TR><TD>Kent</TD></TR>
    <TR><TD>Ohio</TD></TR>
    <TR><TD>44240</TD></TR>
    ...
    </TABLE>
    </BODY>
Slide 21: Kent State University
Two Multilingual Web Architectures
OLD: Multiple static versions of pages are stored in a folder hierarchy by language and navigated by a selection mechanism. After language selection, a static web page is selected and displayed.
NEW: The principle of separating linguistic from software elements, as used in software localization, is applied. Multilingual XML content is “dynamically” inserted in generated local page templates by XSL transforms.
Slide 22: Kent State University
I18N Content Management
This system assumes an internationalized dynamic web page architecture.
[Diagram. Components: Content Repository (archive, database); XML Representation (content only, strip format); Style Sheet Repository; Dynamic Pages; Display Medium. Process labels: acquire information; organize, classify; localization; translation; format; deploy.]
Slide 23: Kent State University
Internationalization: Control
Truly effective internationalization also involves early intervention in and re-design of “upstream” business and document processes like authoring to exert greater control and to reduce variability.
[Diagram: the document cycle. Stage labels: acquisition; creation: authoring; storage; retrieval; rendering; distribution.]
Slide 24: Kent State University
Internationalization & Authoring
For instance, intervention in and re-design of document creation processes (authoring) can yield significant “downstream” benefits for localization. Controlled language and terminology control are two strategies.
[Diagram labels: technical writers; dependency; I18N; controlled languages; terminology control; help text; software; documents; machine translation; L10N; localization vendor.]
Slide 25: Kent State University
Internationalization & Localization
Internationalization engineers work with or for clients to create internationalized products.
[Diagram labels: technical writers; controlled languages; terminology control; help text; software; documents; I18N; resources; internationalization engineers; software internationalization tools; localizable software distribution; L10N; localization vendor.]
Slide 26: Kent State University
Localization Management & Tools
A localization project requires its own processes and tools.
[Diagram labels: L10N; localizable software distribution; localization project; project management tools; workflow management; document / version control; localization tools; QA / testing / validation tools; translators / localizers.]
Slide 27: Kent State University
Localization Management & Tools
Translation memories and terminology managers are important tools for maintaining standardized translations and glossaries. TMs provide the focus of QA, ensure replicability / repeatability, and allow re-use of linguistic and cultural materials.
[Diagram labels: project manager; localization engineer; localizable software distribution; localization project; localization tool (enterprise); localization tool (translator); translation memory; terminology manager; localization toolkit (distribution); translators / localizers.]
Slide 28: Kent State University
Localization Management & Tools
Specialized localization tools for alignment and term extraction are used to automate the construction of TMs.
[Diagram labels: text alignment tool; term extraction tool; translation memory; terminology manager; localization toolkit (distribution); localization tool (translator); translators / localizers.]
Slide 29: Kent State University
Reusability
Reusability is an especially important objective of internationalization and reduces the cost of localization.
[Diagram: a translation memory reused across successive versions:]
• Version 1: initial translation with TM tool
• Version 2: new version uses 70% same text (30% change)
• Version 3: latest version uses 80% same text as previous (20% change)
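The reuse percentages quoted for successive versions can be estimated by checking each segment of a new version against the translation memory. The sketch below assumes exact-match lookup only (real TM tools typically also offer fuzzy matching); the structure and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* One translation-memory entry: a source segment and its stored translation. */
struct tm_entry {
    const char *source;
    const char *target;
};

/* Exact-match lookup: returns the stored translation of `segment`,
 * or NULL if the segment has not been translated before. */
const char *tm_lookup(const struct tm_entry *memory, size_t memory_size,
                      const char *segment)
{
    size_t i;

    for (i = 0; i < memory_size; i++) {
        if (strcmp(memory[i].source, segment) == 0)
            return memory[i].target;
    }
    return NULL;
}

/* Percentage of segments in a new version that can be reused from the TM. */
double reuse_percent(const struct tm_entry *memory, size_t memory_size,
                     const char *const *segments, size_t n_segments)
{
    size_t i, reused = 0;

    if (n_segments == 0)
        return 0.0;
    for (i = 0; i < n_segments; i++) {
        if (tm_lookup(memory, memory_size, segments[i]) != NULL)
            reused++;
    }
    return 100.0 * (double)reused / (double)n_segments;
}
```

If a memory holds four translated menu labels and a new version contains those four plus one new label, `reuse_percent` reports 80%, so only the remaining 20% needs fresh translation.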
Slide 30: Kent State University
Goals of Internationalization
The goals of internationalization are:
• reusability
• scalability
• authority / quality
• equivalence
• accessibility
• accuracy / acceptability
• control
[Other diagram labels: I18N solution; translations; cross-language; target culture(s); target document.]
These goals are met by separating content from display, defining and extracting culturally variable material from fixed or neutral material, intervening in the document cycle to exert control over document processes, and using translation memories and terminology management to ensure critical characteristics such as authority and reusability.
Slide 31: Kent State University
Enhanced Corpora
Future directions in internationalization will involve exploiting document corpora more effectively and extracting useful linguistic and textual objects for control and re-use. Control of the document cycle begins with understanding the documents we already “own” and enhancing them.
Slide 32: Kent State University
New Localization Objects
Many linguistic objects useful in computer-assisted authoring and translation, web page localization, machine translation and cross-language information retrieval (including browsing) can be extracted from a well-understood and deliberately structured document corpus.
[Diagram label: Corpus.]
Slide 33: Kent State University
Corpus Replication
Using statistical techniques it is possible to replicate the contents of a monolingual corpus and add multilingual equivalents for terms, phrases, document segments and other objects to it.
Slide 34: Kent State University
What The Industry is Doing Now
The language industry currently relies on using translation memories and terminology managers. There are significant drawbacks to this method that prevent new gains in cost reduction and profitability – the goal of internationalization.
Slide 35: Kent State University
A New Model
New approaches to internationalization and automatic localization leverage the linguistic value of existing corpora and allow the creation of “enhanced” corpora whose contents are understood and controlled. Statistical corpus linguistics and XML combine to allow the next step in localization technology.
Slide 36: Kent State University
Peer-to-Peer Localization Resources
A peer-to-peer networking platform with a security and digital rights management layer can be used to link clients in an XML resource network. A vendor can assess per-transaction charges for access to corpus object stores.
Slide 37: Kent State University
Socio-Cultural Style Sheets
The peer-to-peer networking platform can also be used to provide new capabilities for next generation localization. Client-Side Socio-Cultural Stylesheets (CSSCS) can provide for automated solutions to on-the-fly provision of web content in the languages and formats desired by and expected by web users all over the world.