TAUS Data Association

Friday
Jul 30th

Management Questions

Why should we join TDA?

Companies join TDA to increase translation efficiency and improve translation quality. The relatively small cost of contribution fees buys members a share in the capital that the industry at large has invested in translation automation and business innovation.

 

Will we lose our IP rights to the TMs we share?

Every member agrees to the TDA Data Provider & Pooling Conditions. This is a contract that protects the IP rights of the data owner. The data owner does not give up its IP rights; it only agrees to share the translated data to support the development of new services and technologies.

 

Is membership in TDA limited to 'clients' or the owners of the data?

No, TDA is open to all stakeholders in the global translation industry: buyers of translation as well as providers of services and technologies, translators, consultants and universities. There are different membership levels. See Membership Program.

 

Where are the data physically stored?

TDA works with one of the largest independent data hosting companies in the world, called Rackspace. The TDA servers are located at the Rackspace Dallas (Texas) facilities.

 

How is the security of data guaranteed?

Only members of TDA can share (upload and download) TMX files on the TDA platform, and the system is password protected. Members can use an HTTPS (Secure Sockets Layer, SSL) connection during upload and download to encrypt the data. Once uploaded to the TDA platform, the TMX files are stored behind a firewall managed by a reputable hosting company. Our Language Search Engine is available to the public for searching bilingual terms and phrases. However, users cannot download data via the Language Search Engine.

 

Can I restrict access to my data on the TDA platform?

Access to the data on the TDA platform is restricted to TDA members. All members have access to all data on the TDA platform. The Language Search is available to the general public at no cost, but the Language Search only allows look-up of terms and phrases.

 

What is the membership duration?

Upon joining TDA a member signs up for one year. Membership starts upon receipt of payment. A reminder for renewal will be issued by TDA in the 12th month.

 

What is the difference between TAUS and TAUS Data Association (TDA)?

TAUS is a 'sister' organization that conducts research, publishes reports and organizes industry meetings and forums for its members. The idea for the TAUS Data Association (TDA) originated at a TAUS Executive Forum. TDA aims exclusively at providing an industry platform for storing and sharing language data and at developing member services. Many TDA members also subscribe to TAUS.

 

Should a service or technology provider also submit data?

TDA is built on the principle of reciprocity: you can take data, but you also have to submit data. Service and technology providers can ask their clients for approval to submit data on their behalf on the TDA platform.

 

How are decisions about TDA taken?

TDA is a not-for-profit Association established in the Netherlands. The Statutes define a two-layer management structure consisting of a Supervisory Board and an Executive Director. The Supervisory Board is elected by the General Meeting. The General Meeting of all members is held at least once a year, normally in October. The General Meeting approves the financial report, and discusses and amends the Member Regulations and the technology roadmap of the association.

 

How does the Language Search Engine work?

The Language Search Engine (LSE) is a free service open to the general public. It allows translators, validators, support professionals, developers and customers (basically anyone) to look up terms and phrases and their translations. The LSE is a sophisticated linguistic search that lets users see part of speech, all inflections of terms, industry category, context and 'calculated' translations. Future releases will also provide attributes such as frequency, data owner, quality and synonyms. The LSE helps to improve the quality of translation and to resolve bottlenecks in the review and validation of translation work.

 

Can I ask my service provider to submit my TMs for me?

Yes, data owners do not have to become members of TDA. They can ask their language service provider (LSP) to submit their TMs for them. The TDA system will issue an email to the data owner every time data have been submitted on their behalf.

 

How do I decide whether I should opt for a Special, Corporate or Regular Membership level?

Special Members get 10 votes in the General Meeting and may nominate candidates for the Supervisory Board. Corporate Members have unlimited data pooling rights, while Regular Members may download a maximum of 5 million words per year.

 

User Questions

Does TDA apply quality assurance on the data?

Quality Assurance on the data shared in TDA is focused on the most general and common quality criteria, such as:

1.       TMX compliance (filtering tool-specific tags)

2.       Missing translations (empty segments)

3.       Filtering duplicate records

4.       Filtering invalid XML tags

Furthermore, TDA has implemented a peer review QA tool that allows users to share quality feedback.
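
To give a feel for criteria 2 and 3 above, here is a minimal Python sketch that drops translation units with empty segments and exact duplicates from a TMX document. It is an illustration only, not TDA's actual QA implementation: the function names `segment_text` and `filter_units` and the sample data are hypothetical, and it assumes a well-formed TMX 1.4-style file that uses the `xml:lang` attribute on each `<tuv>`.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def segment_text(tuv):
    """Return the stripped text of a <tuv>'s <seg>, or "" if it is missing."""
    seg = tuv.find("seg")
    return "" if seg is None else "".join(seg.itertext()).strip()


def filter_units(tmx_text):
    """Remove <tu> elements that have an empty segment or repeat an earlier unit.

    Returns the cleaned TMX as a string plus a small dict of counters.
    """
    root = ET.fromstring(tmx_text)
    body = root.find("body")
    seen = set()
    stats = {"kept": 0, "empty": 0, "duplicate": 0}
    for tu in list(body.findall("tu")):
        # A unit is identified by its (locale, text) pairs, in document order.
        key = tuple((tuv.get(XML_LANG), segment_text(tuv)) for tuv in tu.findall("tuv"))
        if not key or any(text == "" for _, text in key):
            stats["empty"] += 1
            body.remove(tu)
        elif key in seen:
            stats["duplicate"] += 1
            body.remove(tu)
        else:
            seen.add(key)
            stats["kept"] += 1
    return ET.tostring(root, encoding="unicode"), stats


# Hypothetical sample: one good unit, one exact duplicate, one missing translation.
SAMPLE = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en-US"><seg>Hello</seg></tuv><tuv xml:lang="fr-FR"><seg>Bonjour</seg></tuv></tu>
<tu><tuv xml:lang="en-US"><seg>Hello</seg></tuv><tuv xml:lang="fr-FR"><seg>Bonjour</seg></tuv></tu>
<tu><tuv xml:lang="en-US"><seg>Goodbye</seg></tuv><tuv xml:lang="fr-FR"><seg></seg></tuv></tu>
</body></tmx>"""

cleaned, stats = filter_units(SAMPLE)
print(stats)
```

Checking your own files this way before submission is optional. The sketch only catches exact duplicates and completely empty segments.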

 

Which locale codes does the TDA platform support?

Name : Code

Arabic : ar-AR                    
Arabic (Egypt) : ar-EG            
Arabic (Saudi Arabia) : ar-SA     
Bulgarian : bg-BG                 
Chinese (Hong Kong) : zh-HK       
Chinese (PRC) : zh-CN             
Chinese (Taiwan) : zh-TW          
Czech : cs-CZ                     
Danish : da-DK                    
Dutch (Belgium) : nl-BE           
Dutch (Netherlands) : nl-NL       
English (Australia) : en-AU       
English (Canada) : en-CA          
English (United Kingdom) : en-GB  
English (United States) : en-US   
Estonian : et-EE                  
Finnish : fi-FI                   
French (Canada) : fr-CA           
French (France) : fr-FR           
German (Germany) : de-DE          
Greek : el-GR                     
Hungarian : hu-HU                 
Italian (Italy) : it-IT           
Japanese : ja-JP                  
Korean : ko-KR                    
Latvian : lv-LV                   
Lithuanian : lt-LT                
Maltese : mt-MT                   
Polish : pl-PL                    
Portuguese (Brazil) : pt-BR       
Portuguese (Portugal) : pt-PT     
Romanian : ro-RO                  
Russian : ru-RU                   
Slovak : sk-SK                    
Slovene : sl-SI                   
Spanish (Castilian) : es-ES       
Spanish (Mexico) : es-MX          
Swedish : sv-SE                   
Turkish : tr-TR                   
Ukrainian : uk-UA
Welsh : cy-GB

Make sure that the locale codes in your TMX files match the locale codes above. Otherwise, the data provision job will fail. Please email CustomerService@tausdata.org if you would like to add another locale code.
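
One way to check a file before uploading is to list every locale code it uses and compare it with the table above. The Python sketch below does that for a well-formed TMX file. The names `SUPPORTED` and `find_unsupported_locales` are hypothetical, and the sketch assumes that codes are written in the `xml:lang` attribute of each `<tuv>` and that the match must be exact (including case). Neither assumption is confirmed by the platform documentation.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Locale codes copied from the table above.
SUPPORTED = set("""
ar-AR ar-EG ar-SA bg-BG zh-HK zh-CN zh-TW cs-CZ da-DK nl-BE nl-NL
en-AU en-CA en-GB en-US et-EE fi-FI fr-CA fr-FR de-DE el-GR hu-HU
it-IT ja-JP ko-KR lv-LV lt-LT mt-MT pl-PL pt-BR pt-PT ro-RO ru-RU
sk-SK sl-SI es-ES es-MX sv-SE tr-TR uk-UA cy-GB
""".split())


def find_unsupported_locales(tmx_text):
    """Return a sorted list of xml:lang values that are not in SUPPORTED."""
    root = ET.fromstring(tmx_text)
    # A <tuv> without xml:lang is reported as the placeholder "(missing)".
    found = {tuv.get(XML_LANG, "(missing)") for tuv in root.iter("tuv")}
    return sorted(found - SUPPORTED)


# Hypothetical sample: "en" has no region part, so it does not match the table.
SAMPLE = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en-US"><seg>Hello</seg></tuv><tuv xml:lang="fr-FR"><seg>Bonjour</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Hello</seg></tuv><tuv xml:lang="de-DE"><seg>Hallo</seg></tuv></tu>
</body></tmx>"""

print(find_unsupported_locales(SAMPLE))
```

An empty list means every code in the file appears in the table. Anything else should be corrected in the TMX file or requested from Customer Service.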

 

How do I remove invalid XML characters (mostly in Trados TMX files)?

1.  Many of the invalid XML characters are between the <prop type="RTFFontTable"> and </prop> tags, which sit inside the <header></header> element. These appear to be font properties for MS Word, and the data are not needed. You can delete everything from <prop type="RTFFontTable"> to </prop> (including the prop tags). That removes most of the invalid XML characters, so you end up with something like this:

<header
creationtool .....
.........
>
</header>

2.  Then, run the file through LISA's TMXCheck tool.  Sometimes you will still see some invalid XML characters in the body.  You need to delete them.

3.  Do NOT worry about the other validation errors such as "The element has been deprecated ..."  You only need to take out the invalid XML characters.
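
For large files, steps 1 and 2 can be approximated with a short script instead of manual editing. The Python sketch below works on the raw text, because a file with invalid characters cannot be loaded by an XML parser. The names `FONT_TABLE`, `INVALID_XML` and `clean_tmx_text` are hypothetical, the set of allowed characters follows the XML 1.0 character range, and the sketch does not handle numeric character references such as `&#x1;`. Treat it as a starting point and still run TMXCheck afterwards.

```python
import re

# Everything from <prop type="RTFFontTable"> through the next </prop>,
# plus any whitespace that follows it.
FONT_TABLE = re.compile(r'<prop\s+type="RTFFontTable"\s*>.*?</prop>\s*', re.DOTALL)

# Any character outside the ranges that XML 1.0 allows in a document.
INVALID_XML = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_tmx_text(text):
    """Drop the RTFFontTable prop, then strip remaining invalid XML characters."""
    text = FONT_TABLE.sub("", text)
    return INVALID_XML.sub("", text)


# Hypothetical sample with control characters in the header and in a segment.
SAMPLE = (
    '<header creationtool="x">\n'
    '<prop type="RTFFontTable">junk\x01\x02</prop>\n'
    "</header><body><seg>Hel\x0blo</seg></body>"
)

print(clean_tmx_text(SAMPLE))
```

When applying this to a real file, read and write it with the encoding declared in the file (exported TMX files are often UTF-16 or UTF-8), and keep a backup of the original.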