Latest Entries

Member use cases

We just published a few case studies on tausdata.org for some of the more advanced implementations. A few are in summary form, one provides an extensive debrief of a Moses implementation that used TDA data, and another is closer to a testimonial. The links below will take you to the individual cases.

Cisco
Logrus
Microsoft
Pangeanic
PROMT
Tilde

In terms of what to expect by way of future reports:

Another 20 or so members’ MT use cases are in the making. There are of course also the API implementations and integration initiatives. So there is lots of collective knowledge to impart and more translation innovation to come. We are also uncovering a number of general interoperability issues that the industry needs to deal with. These will be picked up under TAUS’ ongoing advocacy and lobbying work on open translation platforms.

And if you want to get the lowdown in person on the TDA roadmap, the integration possibilities and projects, and member use cases, join us in Portland (OR), 3-6 October, at the TAUS User Conference and Pre-conference sessions. It’s the best opportunity this year to get a real feel for the TAUS community and our approach to providing shared resources and services for the benefit of everyone.

Everyone’s sharing translations…

What if everyone starts sharing their translations?
Worried? Excited?

Guess what. It’s already happening… whether you agree or not. Use Engkoo or Linguee to search translations of terms and phrases and you’ll be navigating the translations of… well, basically everyone who has published on the internet in certain languages. After the launch of TAUS Search, and in its mould, it seems a new generation of terminology tools has been born… Great!

It’s of course no surprise in this age of unlimited connectivity and cloud computing that translation memories find their way into the cloud. This brings tremendous benefits for leveraging and translation automation. But at the same time, it raises lots of questions, especially about ownership and legality. Mining the web, aligning translations, and sharing translation memories are quickly becoming common practice. We all need to be really practical and responsible about this new reality.

TAUS Data Association has launched a TM sharing platform with a contract-based legal framework for sharing high-quality language data. Over 70 organizations globally have now taken the lead, been practical, taken ownership and joined. Six joined in the last month alone.

As ever, we’d like to help guide the industry towards practical solutions. And so the link below takes you to TDA’s TM sharing guidelines. Please take a look and share your feedback. I am sure they can be improved over time. And if you have the courage, ask your clients to take a look and include the TM sharing clause in service agreements. Perhaps this sounds like crazy advice. But then how else do you benefit from shared resources and keep ownership?

Download TDA TM sharing guidelines

One is ten

Translation Matching by TAUS Data Association

We are on track for TAUS Data Association (TDA) to deliver Translation Matching as a new service in October. Everyone – translators, agencies, buyers – will be able to retrieve and reuse translations directly from the industry supercloud of translation memories, so the same sentence never has to be translated again, no matter who translated it before. This is synergy in optima forma, a boost in productivity for the translation industry. All we ask is that users follow the reciprocity principle: you can take if you also give. The formula at TAUS Data is simple and straightforward: you can download ten times as many translations as you upload. One is ten!
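
To make the arithmetic concrete, here is a minimal sketch of the reciprocity rule. The function name and the counting unit are assumptions for illustration only; the post does not say whether the ratio is counted in words, segments or files.

```python
DOWNLOAD_RATIO = 10  # "One is ten": ten units downloadable per unit uploaded


def download_allowance(uploaded: int, downloaded: int = 0) -> int:
    """Return how many more units a member may still download.

    The unit (words, segments, ...) is an assumption; the post does not say.
    """
    if uploaded < 0 or downloaded < 0:
        raise ValueError("counts must be non-negative")
    # Never report a negative allowance once the quota is used up.
    return max(0, uploaded * DOWNLOAD_RATIO - downloaded)
```

Under these assumptions, a member who has uploaded 1,000 units and downloaded 2,500 could still download 7,500.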

Benefits

Pilot projects undertaken by TAUS Data members show that the additional leveraging gained from large shared industry TMs ranges from 5% to 50%. This is a very wide range, of course. The factors at play are the volume and the domain proximity of the TMs, but also the power of the translation leveraging technology. We are at the beginning of a new era. Cloud-based TM is the new trend. The more TMs are shared, the higher the profits.

Technology – open source

TAUS Data’s Translation Matching is powered by a new version of the TM used in the open source GlobalSight translation management system. This new version has greatly improved scalability, allowing users to leverage translation jobs against TMs containing hundreds of millions of words per language pair. TAUS Data will use this technology to return exact and fuzzy matches from an industry cloud-TM that users would otherwise never have access to. The TM improvements developed for TAUS Data will be available as part of future GlobalSight releases, and as the open source platform continues to mature, both TAUS Data and GlobalSight will be enriched with more powerful features.
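
For readers new to the idea, the difference between exact and fuzzy matches can be sketched in a few lines. This is an illustration only: it uses the similarity ratio from Python's standard difflib module, not the GlobalSight matching algorithm, and the function name, threshold and sample segments are made up for the example.

```python
from difflib import SequenceMatcher


def best_matches(segment, memory, threshold=0.75):
    """Return (score, source, target) tuples at or above threshold, best first.

    `memory` is a list of (source, target) pairs. A score of 1.0 is an exact
    match (ignoring case); anything lower is a fuzzy match.
    """
    scored = []
    for source, target in memory:
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold:
            scored.append((score, source, target))
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Hypothetical translation memory for the example.
memory = [
    ("Click the Save button.", "Cliquez sur le bouton Enregistrer."),
    ("Click the Cancel button.", "Cliquez sur le bouton Annuler."),
    ("Restart your computer.", "Redémarrez votre ordinateur."),
]
matches = best_matches("Click the Save button.", memory)
```

A production TM works on indexed data at a far larger scale, but the principle of ranking stored segments by similarity to the new one is the same.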

Executing on standards

Ah, standards… the headache of so many in the translation industry. Is it not just a matter of execution? TAUS Data is one hundred percent compliant with TMX and XLIFF. All translation memories are stored in TMX format in the TAUS Data repository. And for Translation Matching we follow XLIFF. This means that every translation memory tool that is XLIFF-compliant can work directly with the TAUS Data supercloud of TMs. An XLIFF file is exported from the local TM tool of choice and uploaded to TAUS Data for leveraging, and TAUS Data returns an XLIFF file with all the translation matches, which can then be used directly in the user’s own TM tool. No headaches, seamless integration and a big win in matches.
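
The round trip described above can be pictured with a small sketch. It assumes an XLIFF 1.2 file and a plain dictionary standing in for the matching service; how TAUS Data actually represents matches in the returned file is not covered in this post, so the use of a simple target element here is an assumption, as are the function name and the sample content.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # keep output free of ns0: prefixes

# Hypothetical file as exported from a local TM tool.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="demo.txt" source-language="en-US" target-language="fr-FR" datatype="plaintext">
    <body>
      <trans-unit id="1"><source>Save</source></trans-unit>
      <trans-unit id="2"><source>Unknown text</source></trans-unit>
    </body>
  </file>
</xliff>"""


def fill_targets(xliff_text, lookup):
    """Add a <target> to each trans-unit whose source text is in `lookup`."""
    root = ET.fromstring(xliff_text)
    for unit in root.iter(f"{{{XLIFF_NS}}}trans-unit"):
        source = unit.find(f"{{{XLIFF_NS}}}source")
        # Skip units without a source or with an existing translation.
        if source is None or unit.find(f"{{{XLIFF_NS}}}target") is not None:
            continue
        translation = lookup.get(source.text or "")
        if translation is not None:
            target = ET.SubElement(unit, f"{{{XLIFF_NS}}}target")
            target.text = translation
    return ET.tostring(root, encoding="unicode")


result = fill_targets(SAMPLE, {"Save": "Enregistrer"})
```

Units with no match are left untouched, so the returned file can be opened in the same tool that exported it.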

Search

TAUS Search – already used by thousands of translators every day – will also be upgraded with the power of the new Translation Matching service. Users will be able to paste complete segments into the Search box and retrieve full and fuzzy matches. This enriched search capability will be publicly available, just like the current term and phrase look-up.

TAUS Search

Try TAUS Search now.

What to share and what not to share

You may wonder: why not share everything you have and profit to the maximum? TAUS Data has implemented a clear and transparent legal framework for sharing translation memories, agreed upon by all members. IP rights and data ownership are recognized. Shared data may be used for translation production and for the development of derivative work. Before you share translation memories, please review the Data Sharing and Pooling Conditions and consult with the data owner.

Join

TAUS Data is the only industry-sanctioned organization for sharing of translation memories. If you are interested in TM sharing and optimizing translation efficiency, please consider joining TAUS Data. TAUS Data is not a commercial private organization. TAUS Data is a non-profit organization with no other objective than hosting and sharing the industry’s translation memories and providing intelligent access to high quality data. By joining as a member you support the future of your business. Translators join for € 50 per year. For organizations membership fees start at € 250 per year.

6 ways TDA differs from other sharing initiatives

We are increasingly being asked how TAUS Data differs from other sharing initiatives in the industry. The difference is actually very clear:

  • Neutral and independent: TAUS Data is a non-profit member organization. No risk of being bought. No profit motive.
  • Reliable sources – quality data: The source and identity of translation data are known by users.
  • Industry taxonomy: All translation data are stored with details of industry/domain.
  • Legal framework: TAUS Data has established a uniform set of conditions for sharing. IP rights and data ownership are recognized and secured through the use of standard contracts.
  • Focus: TAUS Data is focused on hosting and sharing the industry’s translation memories and providing intelligent access to high quality data.
  • Support from a range of industry leaders: TAUS Data was founded and is supported by many of the leading IT companies. It is easier to count major IT companies that aren’t members, and ask yourself why not.

Free APIs for TAUS Data technology

If you haven’t heard, many of the features on www.tausdata.org are now available for third-party developers with the release of the TAUS Data API.

Anyone can start developing now by downloading the API documentation and using the simple REST web service. The documentation is full of examples and walk-throughs to get you up and running.

The API includes TM Sharing and Data Pooling–and we’ll have more to say about those later–but the most accessible feature is TAUS Search. Now, it’s easy to add TAUS Search to web sites and desktop applications, whether as a simple search box, or an automatic, context-sensitive operation in a terminology-aware tool. The results are returned as data (not HTML), so applications can format or use them in any way they like. Imagine hovering over a term and seeing example usage and translations from different organizations in your industry. We look forward to working with developers to find creative uses for TAUS Search.

But not everything in TAUS Search is offered through the API. That’s because TAUS Search is sophisticated and ever-changing, so we only put in the API the parts that we can support into the future. For example, have you noticed the “Search all word forms” option you get if you click “more options” in TAUS Search? Try selecting that option and searching for “moved” from English (United States) to French (France). Notice that the results include several verb forms, like “move”, “moves”, “moved” and “moving”, but no uses of “move” as a noun. That’s because “moved” is only a verb form. For the same reason, computed translations are only shown for move as a verb.
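
The “moved” example can be illustrated with a toy lexicon. This is only a sketch of the general idea of searching by lemma and part of speech; the lexicon, the function names and the analysis labels are invented for the example and say nothing about the linguistic tools TAUS Search actually uses.

```python
# Toy lexicon: surface form -> set of (lemma, part of speech) analyses.
LEXICON = {
    "move": {("move", "verb"), ("move", "noun")},
    "moves": {("move", "verb"), ("move", "noun")},
    "moved": {("move", "verb")},
    "moving": {("move", "verb")},
}


def expand_word_forms(query):
    """Return every surface form sharing a (lemma, POS) analysis with `query`."""
    analyses = LEXICON.get(query, set())
    return sorted(form for form, tags in LEXICON.items() if tags & analyses)


def allowed_readings(query):
    """Return the analyses a hit may have: only those the query itself has."""
    return LEXICON.get(query, set())
```

In this toy setup, searching for “moved” expands to all four forms, but only the verb reading is allowed, which is why uses of “move” as a noun would be left out.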

We love putting these advanced features into TAUS Search, but they are closely tied to the linguistic tools we use to analyze our corpus, and we may decide to change them if we find better ones. If we gave developers access to all these bells and whistles, they might be left high and dry.

That said, we encourage users and developers to tell us what features they want to see in third-party connectors, and we’ll try to find a way to put them in the API. This is just version one.

The API documentation is here - http://www.tausdata.org/apidoc

How people use TAUS Data today

This post summarizes how people use TAUS Data today.

Terminology benefits from TAUS Search are obtained easily, quickly and free of charge. Benefits from translation memory leveraging may require investment, depending on the tool(s) currently used. Benefits from machine translation often require planning and an investment of time and resources.


Now translated in seventeen languages

Thank you to everyone who has taken the time to translate the TAUS animation. It is now available in Armenian, Basque, Chinese (Traditional), Dutch, English (Original), French (France), Greek, Hebrew (new), Italian, Japanese (new), Korean, Portuguese (Brazil), Portuguese (Portugal), Romanian, Russian, Slovak, Spanish and Vietnamese (new).


 
Continue reading…

Translation Matching

Translation Matching is a feature to enable more granular data selection for those seeking to reuse/leverage data in the repository to increase productivity. This will allow users to find full and fuzzy matches and download only the data that is needed for specific work.

Currently, users cannot do translation leveraging on the TAUS Data platform. Today, users can select industry, domain and data owner attributes to find best possible matching TMX files and load them in their own translation memory editor for leveraging of segments and phrases. In practice this means that users may have to download relatively large volume files in order to find sufficient matches for the new translations to be produced. Continue reading…

Matching Scores

Matching Scores is a feature to enable more granular data selection for leveraging and for training customized machine translation engines. This will help users download only the specific data they need.

TAUS Data stores all translation data with attributes for industry and – if available – domain. These attributes are essential for one of the main purposes of the TAUS Data platform: increasing translation efficiency and accuracy. Users need to be able to retrieve data by industry and domain. However, for data providers it is often difficult or even impossible to select an industry label from the pre-configured list of 17 categories, because their file is diverse or does not unambiguously match with one specific industry. Continue reading…

TM Cleaning

“Some dirt are universal, others aren’t” Yan Yu, TAUS Data Association.

TAUS Data currently filters tags, removes corrupt characters, removes corrupt XML, rejects duplicates, flags missing translations, and allows users to report errors via TAUS Search. This largely deals with the universal dirt. The new TM Cleaning feature will be a statistical tool to filter out suspicious translation units so that users can decide what is dirty data for their needs. Continue reading…
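
As a rough illustration of the kind of checks listed above, here is a minimal sketch. It is not the TAUS Data pipeline: the tag pattern, the function name and, in particular, the length-ratio test used as a stand-in for the planned statistical filter are all assumptions made for the example.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")  # naive inline-tag pattern, for illustration


def clean_units(units, max_length_ratio=3.0):
    """Split (source, target) pairs into (kept, flagged) lists.

    Tags are stripped, exact duplicates are dropped, and units with missing
    text or a suspicious source/target length ratio are flagged, not deleted,
    so the user decides what counts as dirty data.
    """
    kept, flagged, seen = [], [], set()
    for source, target in units:
        source = TAG_RE.sub("", source).strip()
        target = TAG_RE.sub("", target).strip()
        if (source, target) in seen:
            continue  # reject duplicates
        seen.add((source, target))
        if not source or not target:
            flagged.append((source, target, "missing text"))
            continue
        ratio = max(len(source), len(target)) / min(len(source), len(target))
        if ratio > max_length_ratio:
            flagged.append((source, target, "suspicious length ratio"))
            continue
        kept.append((source, target))
    return kept, flagged
```

The threshold is a parameter on purpose: what is suspicious for one language pair or domain may be perfectly normal for another.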



Copyright © 2004–2009. All rights reserved.
