Introduction

The TAUS Data Association (TDA) Web Service allows programs to use many features of TDA, including:

The Service has a "Representational State Transfer", or REST, interface. Parameters are passed as in HTML forms. Responses are returned as either JSON or XML documents, at your choice.

Getting Started

It is assumed that the reader is familiar with web programming. Experience with REST services is helpful but not required. We have tried to keep the Service simple so that you can get up and running quickly, using your platform and tools of choice. No programming language bindings are provided at this time.

You can begin developing with the TDA Web Service using just this document. However, we encourage you to contact TDA and tell us about your application, so that we can provide you with:

An Example Call

In a REST web service, requests are made over HTTP (or HTTPS), and the resource to access is given in the URL. In our example, the resource we wish to access is a listing of locales (termed "langs" in the Service) in TDA. This resource is named /lang. To simply retrieve information about the resource, we use the GET HTTP method. To ask for the response in JSON format, we can add the .json extension to the resource. No additional parameters are required, but to limit the listing to langs for a particular language, we can add the language parameter in the query string. The full request is:

GET .../lang.json?language=en

Click on the link to try it! Enter your TDA username and password when prompted. (If you don't have a TDA account, go get one, it's free!)

The response looks like:

HTTP/1.1 200 success
Content-type: application/json; encoding=UTF-8

{ "status": 200,
  "reason": "success",
  "lang": [
    { "id": "en-AU", "name": "English (Australia)" },
    { "id": "en-CA", "name": "English (Canada)" },
    { "id": "en-GB", "name": "English (United Kingdom)" },
    { "id": "en-US", "name": "English (United States)" }
  ]
}

(Response documents are reformatted for readability.)
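As a minimal sketch, the request and JSON response above could be handled in Python roughly as follows. The base URL is the one given under Interface Details; the function names are illustrative and not part of the Service. No network call is made: the sample body is copied from the response shown above.

```python
import json
from urllib.parse import urlencode

# Base URL as given under "Interface Details"; a sandbox would use another one.
BASE_URL = "https://www.tausdata.org/api"


def lang_listing_url(language):
    """Build the URL for GET /lang.json limited to one language."""
    return BASE_URL + "/lang.json?" + urlencode({"language": language})


def lang_ids(body):
    """Extract the lang ids from the text of a /lang.json response body."""
    result = json.loads(body)
    if not 200 <= result["status"] <= 299:
        raise RuntimeError(result["reason"])
    return [lang["id"] for lang in result["lang"]]


# Sample body copied from the response shown above.
SAMPLE_BODY = """
{ "status": 200,
  "reason": "success",
  "lang": [
    { "id": "en-AU", "name": "English (Australia)" },
    { "id": "en-CA", "name": "English (Canada)" },
    { "id": "en-GB", "name": "English (United Kingdom)" },
    { "id": "en-US", "name": "English (United States)" }
  ]
}
"""

print(lang_listing_url("en"))
print(lang_ids(SAMPLE_BODY))
```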

Or, for the response in XML, the request is:

GET .../lang.xml?language=en

The response looks like:

HTTP/1.1 200 success
Content-Type: text/xml; charset=UTF-8

<result>
  <status>200</status>
  <reason>success</reason>
  <lang>
    <name>English (Australia)</name>
    <id>en-AU</id>
  </lang>
  <lang>
    <name>English (Canada)</name>
    <id>en-CA</id>
  </lang>
  <lang>
    <name>English (United Kingdom)</name>
    <id>en-GB</id>
  </lang>
  <lang>
    <name>English (United States)</name>
    <id>en-US</id>
  </lang>
</result>
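The XML form can be read with any XML parser. The following Python sketch uses the standard library and an abbreviated copy of the response above (two langs instead of four); the function name is illustrative only.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the XML response shown above.
SAMPLE_XML = """<result>
  <status>200</status>
  <reason>success</reason>
  <lang>
    <name>English (Australia)</name>
    <id>en-AU</id>
  </lang>
  <lang>
    <name>English (Canada)</name>
    <id>en-CA</id>
  </lang>
</result>"""


def parse_lang_listing(xml_text):
    """Return (status, reason, langs) from a /lang.xml response body."""
    root = ET.fromstring(xml_text)
    status = int(root.findtext("status"))
    reason = root.findtext("reason")
    # Each repeated <lang> element is one element of the "lang" list.
    langs = [
        {"id": el.findtext("id"), "name": el.findtext("name")}
        for el in root.findall("lang")
    ]
    return status, reason, langs


status, reason, langs = parse_lang_listing(SAMPLE_XML)
print(status, reason, [lang["id"] for lang in langs])
```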

Or, for the response in human-friendly HTML, the request is:

GET .../lang.html?language=en

Next Steps

If you want to understand the Service from the ground up, read this document in order. If you want to get a high-level view of what you'll need to do, start with the use cases and refer to other sections as needed. Everyone will at some point want to read about access control, app keys, and the terms of use.

The remaining sections are:

Interface Details
The technical nuts and bolts.
Access Control
That necessary annoyance.
App Keys
How to identify your application to TDA (and why you would want to do this).
Concepts
Discussion of various TDA entities and how they are used in the Service.
Use Cases
Task-oriented walk-throughs.
Developer Guidance
Possibly helpful suggestions on writing your application.
Terms of Use
A few requirements.
Compatibility
Our promise, within reason, not to break your application.
Call Listing
Specification for every call in the Service.

Interface Details

All requests are made over HTTP (or HTTPS) and follow common REST conventions.

Requests

URL

The base URL for the Service is http://www.tausdata.org/api or https://www.tausdata.org/api. The secure version is recommended, as passwords and other private data may be exchanged. If you request access to a development sandbox, you will get a different URL for your sandbox. Following the base URL is the resource name, as given in the call documentation. For most calls, you must add an extension giving the requested response format: .json, .xml, or .html. Resource names and extensions are case-sensitive.
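The URL rules above can be sketched as a small Python helper. This is an illustration, not an official binding: the function name and its arguments are assumptions, and calls with "unusual" results (which carry their own extension) are not handled.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.tausdata.org/api"  # the recommended secure base URL
FORMATS = ("json", "xml", "html")          # extensions are case-sensitive


def call_url(resource, fmt="json", params=None, base=BASE_URL):
    """Join base URL, resource name, format extension and query string."""
    if fmt not in FORMATS:
        raise ValueError("unknown response format: %r" % fmt)
    if not resource.startswith("/"):
        raise ValueError("resource names start with '/'")
    url = base + resource + "." + fmt
    if params:
        url += "?" + urlencode(params)
    return url


print(call_url("/lang", "json", {"language": "en"}))
```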

Method

The Service uses two HTTP methods, GET and POST. GET is used to access resources without modifying them. POST is used to create new resources and modify existing ones. The documented method must be used for every call.

Parameters

Calls take named parameters, where keys and values are Unicode strings. For GET requests, the parameters must be query-string encoded in the URL. Since there is no means to specify the character encoding of the query string, it is always taken to be in UTF-8. For POST requests, the parameters may either be query-string encoded in the URL, or sent in the body with content type application/x-www-form-urlencoded or multipart/form-data. You may specify the character encoding for these requests; however, UTF-8 is recommended. You may put some parameters in the query string and others in the body (though the same parameter should not appear in both places). A parameter may be given multiple times only where specifically documented. Parameter names are case-sensitive.
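A minimal Python sketch of the two encodings follows. The helper names are illustrative, and declaring the charset on the Content-Type header is an assumption about how the character encoding would be specified. The _note parameter is a made-up name used only to show non-ASCII encoding.

```python
from urllib.parse import urlencode


def encode_params(params):
    """Percent-encode parameters as UTF-8, for a query string or a form body."""
    return urlencode(params, encoding="utf-8")


def post_body(params):
    """Return (headers, body) for sending parameters in a POST body."""
    body = encode_params(params).encode("ascii")
    headers = {
        # Assumption: the charset is declared on the content type.
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    }
    return headers, body


print(encode_params({"_note": "café"}))
print(post_body({"language": "en"}))
```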

Some special parameters are used for access control.

Unknown parameters in the request are ignored by default. However, there will be a message about them in the X-TDA-Warning header. You may request that they be considered errors by setting the X-TDA-Strict-Parameters request header to true.

If you cannot avoid passing extra parameters, you can avoid the risk that they will have meaning in a future version of the Service by giving them names beginning with an underscore ("_"), which will never be used by the Service.

Responses

Status

The HTTP status line contains a numeric status code and a string reason code. You may use the reason to discriminate the status beyond the code. For convenience, the status code and reason are both duplicated in the response body.

Status codes are classed by their numeric range. Codes 200-299 reflect success. For other codes, an explanatory message (in English) is given in the body. Codes 400-499 reflect invalid requests. The message may help you find and fix the error. Codes 500-599 reflect problems that are not your fault. Codes in other ranges are not used by the Service.

A common set of codes and reasons is given here. Others are documented for specific calls.

200 success
The response body contains the successful result of the call.
201 created
One or more resources were created. The Location header points to the primary created resource. The response body contains the successful result of the call, and should describe all created resources.
400 invalid_params
Parameters were missing, of the wrong types, or otherwise incorrect.
400 no_app_key
An app key was not found where required.
400 bad_app_key
The app key given is not valid.
401 no_credentials
The request requires authentication but no credentials were supplied.
401 bad_username_password
The username and password given are not a valid TDA login.
401 bad_auth_key
The auth key given is not valid.
403 permission_denied
This user is not allowed to make this request.
404 no_such_resource
The resource named in the URL does not exist or is invisible to this user.
405 method_not_allowed
This method is not allowed for this resource.
409 state_error
The resource is in the wrong state for this request.
500 system_error
An unexpected error happened in the system. TDA will investigate the problem.
503 down_for_maintenance
The Service is down for maintenance and will be available again shortly.
503 not_ready
The resource you requested is not currently ready, but will be later.
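One way an application might act on these codes is sketched below in Python. The action names returned are arbitrary labels chosen for this illustration; only the codes and reasons come from the list above.

```python
def classify(status, reason):
    """Map a status code and reason to a coarse action for the caller."""
    if 200 <= status <= 299:
        return "ok"
    if status == 503 and reason in ("down_for_maintenance", "not_ready"):
        return "retry_later"
    if status == 401:
        return "reauthenticate"
    if 400 <= status <= 499:
        return "fix_request"
    if 500 <= status <= 599:
        return "service_problem"
    raise ValueError("status outside the ranges used by the Service: %d" % status)


print(classify(503, "not_ready"))
```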

Headers

Information is not generally returned in HTTP headers. The Content-type header is always set, and you may check to see that the body has the expected content type. (In extreme error cases, the Service may not be able to return a body of the requested type.) The Location header is sent along with a status 201 response, pointing to a created resource; however more complete information is available in the result.

Body

A few calls return special document types, eg. zipped TMX files. Those calls are termed "unusual", and their response formats are specified in the call documentation.

Other, "usual," results are structured as values built from lists, maps (aka. dictionaries, objects, associative arrays) from keys (aka. properties, fields) to values, and the basic types. All lists contain elements of only one type. Note there are no "null" or undefined values.

At the top level, every usual result is a map with at least two keys, status (type natural) and reason (type string), whose values are the HTTP status code and reason string. Most calls add additional keys. Error responses add a message (type string) key explaining what went wrong.
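Because status, reason and (for errors) message are always present at the top level, a JSON client can check every usual result in one place. The following Python sketch shows one possible shape; the exception class and function name are illustrative, and the error text in the usage line is invented.

```python
import json


class ServiceError(Exception):
    """Raised for any usual result whose status is outside 200-299."""

    def __init__(self, status, reason, message):
        super().__init__("%d %s: %s" % (status, reason, message))
        self.status = status
        self.reason = reason
        self.message = message


def check_result(body):
    """Decode a JSON result; return it on success, raise ServiceError otherwise."""
    result = json.loads(body)
    status = result["status"]
    if 200 <= status <= 299:
        return result
    raise ServiceError(status, result["reason"], result.get("message", ""))


print(check_result('{"status": 200, "reason": "success"}'))
```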

Response formats

Results come in one of three formats: "JavaScript Object Notation" (JSON), XML, or HTML. For all formats, the character encoding is UTF-8. The HTML format is a human-readable rendering for testing purposes, and is not further described.

The JSON format follows the result structure directly, with maps represented by JSON objects, lists by JSON arrays, and basic values by JSON strings, numbers, and the boolean literals true and false.

In the XML format, a map is represented by a series of XML elements whose names are the keys of the map. If the value for a key is a basic value or another map, there is a single XML element for the key, containing the representation of the value; if the value for a key is a list, there is an XML element for every element of the list, containing the representation of the list element. Basic values are represented as character data. The whole thing is wrapped in a top-level <result> element. Note that not all possible results described above can be represented in XML this way, eg. a list within a list. However, only results that can be represented in the XML format (as well-formed XML) will be produced.

The exact representation of basic types in either format is given below.

Basic Types

There are only a few basic types. Each has a string representation, used for parameters and XML results, and a JSON representation.

string
An arbitrary Unicode string. String representation is itself; JSON representation is a JSON string.
natural
A positive integer (1, 2, ...) less than 2^63. String representation is the decimal string; JSON representation is a JSON number.
boolean
True or false. String representation is one of the strings "true" or "false". JSON representation is one of the JSON literals true or false.
lang
A language-region pair. See below.
enum
One from a given list of strings.
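The string representations above can be produced from native values with a small converter. This Python sketch is illustrative only; lang and enum values are assumed to already be strings and are passed through unchanged.

```python
def to_param(value):
    """Return the string representation of a basic value for use as a parameter."""
    # bool is tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        if not 1 <= value < 2 ** 63:
            raise ValueError("naturals are positive integers below 2^63")
        return str(value)
    if isinstance(value, str):
        return value  # string, lang and enum values are sent as-is
    raise TypeError("not a basic type: %r" % type(value).__name__)


print(to_param(True), to_param(3), to_param("en-US"))
```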

The file Type

There is another type, file, that is only used for parameters. The value for this parameter is expected to be the contents of a (possibly large) file. file parameters are only used with POST requests, and when one is part of a request, the body must use the multipart/form-data content type, and the part containing the file must have a filename attribute in the Content-Disposition header.
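For platforms without a ready-made multipart encoder, the body can be assembled by hand. The Python sketch below shows the general shape under simplifying assumptions: field names and the filename contain no quotes or line breaks, and the file parameter name ("file" in the usage line) is hypothetical, since actual parameter names are given in the call documentation.

```python
import uuid


def multipart_body(fields, file_field, filename, file_bytes):
    """Return (content_type, body) for a multipart/form-data POST."""
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields.items():
        lines.append(delimiter)
        lines.append(('Content-Disposition: form-data; name="%s"' % name).encode("utf-8"))
        lines.append(b"")
        lines.append(value.encode("utf-8"))
    # The part carrying the file must have a filename attribute.
    lines.append(delimiter)
    lines.append(
        ('Content-Disposition: form-data; name="%s"; filename="%s"'
         % (file_field, filename)).encode("utf-8")
    )
    lines.append(b"Content-Type: application/octet-stream")
    lines.append(b"")
    lines.append(file_bytes)
    lines.append(delimiter + b"--")
    lines.append(b"")
    return "multipart/form-data; boundary=" + boundary, b"\r\n".join(lines)


content_type, body = multipart_body({"source_lang": "en-US"}, "file", "tm.zip", b"PK\x03\x04")
print(content_type)
```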

Access Control

Access control is based upon TDA user accounts. The Service does not provide a way to register new users, but you may direct users to register themselves.

Authentication

Most calls to the Service must contain valid authentication credentials. For convenience, flexibility, and security, the Service provides several methods of authentication. All are intended to be used with HTTPS to keep credentials private.

Username and Password

In this method, the username and password of a registered user are passed with each request. The recommended form is HTTP basic authentication. (Other HTTP authentication methods, such as digest, are not supported.) In support of HTTP basic, the Service may return a WWW-Authenticate header with value Basic realm="TAUS Data Association Web Service" in response to an unauthenticated request. However, since some platforms (in particular, web browsers) will pop up an unwanted password entry form when receiving this header, it is returned only when HTTP basic authentication was tried, or no authentication method was tried, in the request.

HTTP basic authentication is also handy for testing the Service directly in a web browser, as you may have found if you clicked the links in the introductory example.

If HTTP basic is not feasible, you may pass the username and password as auth_username and auth_password parameters.
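Most HTTP libraries build the basic authentication header for you. Where one does not, it can be constructed as in this Python sketch (the function name is illustrative). The header carries the credentials base64-encoded, not encrypted, which is why HTTPS is recommended.

```python
import base64


def basic_auth_header(username, password):
    """Return the Authorization header for HTTP basic authentication."""
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": "Basic " + token}


print(basic_auth_header("user", "pass"))
```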

Login

In this method, you first make a POST /auth_key?action=login request that is authenticated by username and password. An "auth key" is returned that can be used (without the username and password) to authenticate future requests. To use the auth key, pass it as the X-TDA-Auth-Key request header or, if that is not feasible, the auth_auth_key parameter.

Currently, this is only marginally more secure than username and password authentication. The application must possess the username and password and send it with the initial POST /auth_key; and the auth key never expires. However, its use is consistent with the connect method.

Connect

The connect method enables an application to act on behalf of a user, without the user disclosing their username and password to the application. You first make a POST /auth_key?action=connect request. This request requires no authentication, but it must contain an app key. An "auth key" is returned that can be used to authenticate future requests as above, but first it must be activated. The application should send the user to the manage_url field of the result, where they can activate the auth key. After the user activates the auth key, they will be sent to the redirect_url given in the request.
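The request side of the connect flow might be sketched in Python as follows. Several details are assumptions: redirect_url is taken to be an ordinary request parameter of that name, the .json extension is chosen for the response, and the helper names are invented. The name of the result field carrying the auth key is not given here, so the sketch does not read it.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.tausdata.org/api"


def connect_request(app_key, redirect_url):
    """Return (method, url, headers, body) for POST /auth_key?action=connect."""
    url = BASE_URL + "/auth_key.json?action=connect"
    headers = {
        "X-TDA-App-Key": app_key,  # connect requires an app key
        "Content-Type": "application/x-www-form-urlencoded",
    }
    # Assumption: redirect_url is passed as a parameter of the same name.
    body = urlencode({"redirect_url": redirect_url}).encode("ascii")
    return "POST", url, headers, body


def authenticated_headers(auth_key, app_key=None):
    """Headers for later requests, once the user has activated the auth key."""
    headers = {"X-TDA-Auth-Key": auth_key}
    if app_key is not None:
        headers["X-TDA-App-Key"] = app_key
    return headers


print(connect_request("my-app-key", "https://example.com/done"))
```

After sending the request, the application would direct the user to the manage_url field of the result and wait for them to return via the redirect URL before using the auth key.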

Permissions

Many calls may be performed by any authenticated user; some, however, are restricted. To determine the user's permissions, call GET /user with the self parameter set.

App Keys

There is no registration requirement to use the Service, beyond a user account at www.tausdata.org. However, you are encouraged to register your application and get an app key. (For now, please contact TDA for an app key.) An app key identifies your application to TDA, and helps us diagnose any problems you may have. An app key also enables you to use the connect authentication method. There may be additional benefits in the future, such as logs and statistics of the calls made by your app.

To use your app key, pass it as the X-TDA-App-Key request header or, if that is not feasible, the auth_app_key parameter.

Concepts

Organizations

An organization is the most important unit of identity in the Service. An organization may or may not be a TDA member, and may really represent a single individual. Every user account belongs to an organization.

Langs

What we refer to as a "lang" and use throughout the Service is actually a language-region pair, often called a locale. For example, fr-CA denotes Canadian French. When referring to the language only (eg. French), we always call it a "language" to distinguish it from a "lang".

Languages are represented as ISO 639-1 two letter language codes. Regions are represented by ISO 3166-1 alpha-2 two letter country codes (or in a few cases, by non-standard codes such as XL for Latin America). Langs are represented as a language, followed by a dash ("-"), followed by a region. Langs, languages, and regions are case-insensitive.
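Since langs are case-insensitive, an application may want to normalize them before comparing or storing them. The Python sketch below checks only the shape of a code (two letters, dash, two letters), not whether TDA supports it; the lower-case language and upper-case region convention is a choice made for this example.

```python
import re

_LANG_RE = re.compile(r"([A-Za-z]{2})-([A-Za-z]{2})")


def normalize_lang(code):
    """Return a lang in language-REGION form, or raise ValueError."""
    match = _LANG_RE.fullmatch(code)
    if match is None:
        raise ValueError("not a lang code: %r" % code)
    language, region = match.groups()
    return language.lower() + "-" + region.upper()


print(normalize_lang("FR-ca"))
```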

Not all possible langs are supported by TDA. The current list is:
TDA Langs
ar-ae  Arabic (U.A.E.)
ar-ar  Arabic
ar-eg  Arabic (Egypt)
ar-sa  Arabic (Saudi Arabia)
be-by  Belarusian
bg-bg  Bulgarian
cs-cz  Czech
cy-gb  Welsh
da-dk  Danish
de-de  German (Germany)
el-gr  Greek
en-au  English (Australia)
en-ca  English (Canada)
en-gb  English (United Kingdom)
en-us  English (United States)
es-em  Spanish (International)
es-es  Spanish (Spain)
es-mx  Spanish (Mexico)
es-xl  Spanish (Latin America)
et-ee  Estonian
eu-es  Basque
fa-ir  Farsi
fi-fi  Finnish
fr-be  French (Belgium)
fr-ca  French (Canada)
fr-fr  French (France)
he-il  Hebrew (Israel)
hr-hr  Croatian
ht-ht  Haitian
hu-hu  Hungarian
id-id  Indonesian
is-is  Icelandic
it-it  Italian (Italy)
ja-jp  Japanese
ko-kr  Korean
lt-lt  Lithuanian
lv-lv  Latvian
mt-mt  Maltese
nb-no  Norwegian (Bokmal)
nl-be  Dutch (Belgium)
nl-nl  Dutch (Netherlands)
nn-no  Norwegian (Nynorsk)
no-no  Norwegian
pl-pl  Polish
pt-br  Portuguese (Brazil)
pt-pt  Portuguese (Portugal)
ro-ro  Romanian
ru-ru  Russian
sk-sk  Slovak
sl-si  Slovene
sv-se  Swedish
th-th  Thai
tr-tr  Turkish
uk-ua  Ukrainian
vi-vn  Vietnamese
zh-cn  Chinese (PRC)
zh-hk  Chinese (Hong Kong)
zh-tw  Chinese (Taiwan)
You may need to map your application's language codes to TDA's. New langs are considered for addition on a case-by-case basis.

Attrs

Every TM in TDA has several attributes, or "attrs". They are: industry, content_type, owner, product, provider. This list is not expected to change. Attrs have numerical values, and each value is associated with a name for display.

The values for provider and owner are organizations. All of the attrs for a TM are chosen by the uploader, except for provider which is the organization of the uploader. The uploader can specify as the owner one of the organizations on whose behalf they are permitted to upload. The uploader can specify as the product one of the products belonging to the owner.

Attr values will never be reused, ie, a given attr value will always refer to the same entity. The list of values for the industry and content_type attrs will change rarely. The current lists are:
TDA Industries
1   Automotive Manufacturing
2   Consumer Electronics
3   Computer Software
4   Computer Hardware
5   Industrial Manufacturing
6   Telecommunications
7   Professional and Business Services
8   Stores and Retail Distribution
9   Industrial Electronics
10  Legal Services
11  Energy, Water and Utilities
12  Financials
13  Medical Equipment and Supplies
14  Healthcare
15  Pharmaceuticals and Biotechnology
16  Chemicals
17  Undefined Sector
18  Leisure, Tourism, and Arts
TDA Content Types
1   Instructions for Use
2   Sales and Marketing Material
4   Policies, Process and Procedures
5   Software Strings and Documentation
6   Undefined Content Type
7   News Announcements, Reports and Research
8   Patents
9   Standards, Statutes and Regulations
10  Financial Documentation
12  Support Content
You may wish to map these to categories in your application.

Listing Langs and Attrs

The Service provides calls for listing langs and attr values. The GET /lang call lists langs, and the GET /attr/<attr> family of calls lists attrs, for example GET /attr/industry lists industries. All of these calls are similar: they take as parameters other langs and attrs that may limit the results, and return a list of objects of the type requested. Depending on the purpose for which you are listing them, you may want different listings. For example, if you want to download data, you probably only want to see combinations of langs and attrs for which data is available. The different ways you can list langs and attrs, and what you get back, are described here.

The "purpose" of the listing is given by the for parameter. When for is not given, the default is to list every value in the system. For example, GET .../attr/industry.json lists all industries. However, for privacy reasons, you may not list all owners, products, or providers.

When the for parameter is upload, all combinations of attrs that could be applied to an upload by this member are listed:

When the for parameter is download, only langs and attrs for which there is data to download are listed. When listing langs for download, you must set the side parameter to either source to list langs for which there is data with that source lang, or target to list langs for which there is data with that target lang. When listing langs, you may also set the source_lang or target_lang parameter (whichever is the opposite of the side parameter) to list only langs for which there is data with that source or target lang. For example, to list all langs for which there is data with that lang as the target lang and en-US as the source lang, GET .../lang.json?for=download&side=target&source_lang=en-US. When listing attrs for download, the lang and other attr parameters limit the listing to values for which there is data with all of those langs and attrs. For example, to list all owners for which there is data with that owner, en-US as the source lang, fr-FR as the target lang, and 3 as the industry, GET .../attr/owner.json?for=download&source_lang=en-US&target_lang=fr-FR&industry=3.
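Listing URLs such as the ones above can be generated from the listing purpose and the limiting criteria. This Python sketch is illustrative; the function name and its argument names are assumptions, while the for, side, source_lang, target_lang and attr parameters are the ones described in this section.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.tausdata.org/api"


def listing_url(kind, purpose=None, fmt="json", **criteria):
    """Build a listing URL; kind is "lang" or an attr name such as "owner"."""
    resource = "/lang" if kind == "lang" else "/attr/" + kind
    params = {}
    if purpose is not None:
        params["for"] = purpose  # "for" is a Python keyword, hence the dict
    params.update(criteria)
    url = BASE_URL + resource + "." + fmt
    if params:
        url += "?" + urlencode(params)
    return url


print(listing_url("owner", "download",
                  source_lang="en-US", target_lang="fr-FR", industry=3))
```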

When the for parameter is segment, only langs and attrs for which there may be search results are listed. This works exactly like listing for download, but the results exclude combinations that have not been indexed for search or are not supported for search.

Miscellany

Some calls return English language messages and names. Messages are usually diagnostics, such as the contents of the message key that comes with error responses. Names are the human-readable designations for langs, attrs, and other objects. There is no way to request messages and names in other languages or to translate them. Pedantically, messages do not begin with a capital or end with punctuation.

All keywords are singular, even when they would be more natural as plural, in order to avoid dealing with grammatical irregularities.

Use Cases

The following sections walk through typical use cases.

Search

In this use case, you wish to search for segments containing a given term. Please note the Service provides only a simplified version of TAUS Search. Word attributes (lemma and part of speech) are not available, nor are computed translations.

The first step is to identify the langs and attributes of the data you wish to search. Source and target langs are required, and all other attrs are optional. If your application already knows the langs and attrs of the data to search, it can use them. Otherwise, you should request listings of the langs and attrs with the for parameter set to segment, to get langs and attrs for which data is searchable.

To search, call GET /segment with the term you wish to find, the langs, and the attr criteria. The result is a list of segments, with their attrs. The source and target text are available as plain text strings. To report problems with segments, see POST /segment/<id>?action=report_problem.

Download

In this use case, you wish to download TM data. The first step is to identify the langs and attributes of the data you wish to download. Source and target langs are required, and all other attrs are optional. If your application already knows the langs and attrs of the data to download, it can use them. Otherwise, you should request listings of the langs and attrs with the for parameter set to download, to get langs and attrs for which data is available.

There are several things you need to know about downloads. First, every download includes only data that has not already been downloaded by the same member. In order to get data that was already downloaded, you must re-request the old download. Second, some members have download limits and may be charged for excess downloads. If a download would exceed a hard limit, it is disallowed, but there is no way to tell if a member is over a soft limit and will be charged for a download. Download limits may vary depending on the source and target langs of the download. Third, there is no way to request only some of the words meeting the given criteria. To limit the number of words, you must refine the criteria. Fourth, even data that was provided by the member creating the download counts towards their download limits if it is included in the download. There is an option to exclude data provided by the same member requesting the download.

You may wish to check how much data is available, and whether it is within the member's limit, before creating a download. For this, call GET /counts with the chosen langs and attrs. For example, to get counts for data with en-US as the source lang, fr-FR as the target lang, and 3 as the industry, GET .../counts.json?for=download&source_lang=en-US&target_lang=fr-FR&industry=3. The result has several useful fields. word_count is the total number of words meeting the given criteria. new_word_count and new_segment_count are the number of words and segments that have not been downloaded by the member (and would be included in a new download); old_segment_count is the number of segments that have already been downloaded by the member. If the member has a download limit for this source and target lang, it is given by limit. Finally, if the download would be allowed according to the member's limits, within_limit is true; otherwise it is false.
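An application might turn a decoded /counts result into a simple decision before creating a download, as in this Python sketch. The field names are those described above; the returned labels and the sample numbers are invented, and a missing or zero new_word_count is assumed to mean there is nothing new to download.

```python
def download_advice(counts):
    """Suggest what to do, given a decoded GET /counts result."""
    if not counts["within_limit"]:
        return "over_limit"
    # Assumption: the key may be absent or zero when nothing new is available.
    if counts.get("new_word_count", 0) == 0:
        return "nothing_new"
    return "ok"


sample = {"status": 200, "reason": "success",
          "word_count": 5000, "new_word_count": 1200,
          "new_segment_count": 90, "within_limit": True}
print(download_advice(sample))
```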

When you are ready to download, make a POST /download?action=create request with the desired criteria. This fixes the exact set of data to be downloaded. You may examine the result to make sure the download has the amount of data expected (in case it has changed since your last call to /counts), or perhaps confirm it with the user. Then you must make a POST /download/<id>?action=confirm request. You may combine the two steps by setting the confirm parameter when creating the download. If a download is never confirmed, it does not count towards the member's limit.

Once the download is confirmed, you should make polling GET /download/<id> requests until the download is in the ready state. At that point, you may retrieve the download as a zipped TMX file by calling GET /download/<id>.zip. Alternately, you may simply make GET /download/<id>.zip requests, and if the file is not yet ready, the response will have a 503 not_ready status code. Please use a poll interval of at least one minute.
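The polling loop can be sketched in Python as below. The fetch argument stands for whatever function your application uses to perform GET /download/<id> and decode the result; it is injected so the loop can be exercised without network access. The state key and the cap on the number of polls are assumptions made for this example.

```python
import time

MIN_POLL_SECONDS = 60  # the Service asks for at least one minute between polls


def wait_until_ready(fetch, interval=MIN_POLL_SECONDS, max_polls=120,
                     sleep=time.sleep):
    """Call fetch() until the result reports the ready state, then return it."""
    if interval < MIN_POLL_SECONDS:
        raise ValueError("poll interval must be at least one minute")
    for attempt in range(max_polls):
        result = fetch()
        if result.get("state") == "ready":  # assumption: key is named "state"
            return result
        if attempt < max_polls - 1:
            sleep(interval)
    raise TimeoutError("download was not ready after %d polls" % max_polls)
```

Once the function returns, the zipped TMX file can be retrieved with GET /download/<id>.zip.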

Upload

In this use case, you wish to upload a TM to TDA. The first step is to choose the langs and attributes of the TM. Both source and target langs and all attributes except provider and owner are required. provider will be the organization performing the upload, and cannot be set. owner may be set to any of the organizations on whose behalf the provider is permitted to upload; if not given, it defaults to the provider. If your application already knows the langs and attrs of the TM, it can use them. Otherwise, you should request listings of the langs and attrs with the for parameter set to upload.

(There is no way to add new owners and products using the Service. There is also no way to grant permission to another organization to upload on your behalf. Please contact TDA for help with this.)

There are several things you need to know about uploads. First, every upload must be a zip archive containing a single TMX file. The TMX file must be valid TMX, and should contain data for the source and target langs only. Second, TDA filters duplicate data, so if you upload a TM and then re-upload it after adding new translations, only the new translations will be saved. On the other hand, it will waste network bandwidth and TDA resources, so please make a reasonable effort to upload only new data. Third, uploaded TMs are normally available for TDA members to download and for the public to search; however, there is an option to make a TM available only for search.

When you are ready to upload, make a POST /upload?action=create request with the zipped TMX file, along with the langs and attrs. A successful response indicates that the file has passed the first round of checks: the TMX file was extracted, the beginning of the file was parsed, and the lang codes in the file match the source_lang and target_lang parameters. Then you must make a POST /upload/<id>?action=confirm request. You may combine the two steps by setting the confirm parameter when creating the upload.

TDA will begin processing the file. To monitor progress, you should make polling GET /upload/<id> requests until the upload leaves the processing state. Please use a poll interval of at least one minute. If there is a problem, the state will be error, and the reason and message fields will contain information about the problem. Otherwise, the state will be ready, and the user who created the upload will get an email notification with more details. Finally, you must make a POST /upload/<id>?action=approve request, perhaps after receiving positive approval from the user. You can skip the approval step by setting the approve parameter when creating the upload. At this point, the upload will be credited to the member's account and the data will be available for other members to download (unless the upload was for search only). There may be a delay before the data is available for search.
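The decision taken after each poll of GET /upload/<id> can be written as a small function. In this Python sketch the state key is assumed to be named state, the action labels are invented, and any state other than processing, error and ready (for example incomplete) is treated as something the application cannot handle.

```python
def next_upload_step(result):
    """Return (action, detail) for one decoded GET /upload/<id> result."""
    state = result.get("state")
    if state == "processing":
        return ("wait", "poll again in at least one minute")
    if state == "error":
        detail = "%s: %s" % (result.get("reason"), result.get("message"))
        return ("stop", detail)
    if state == "ready":
        return ("approve", "POST /upload/<id>?action=approve")
    return ("stop", "unhandled state: %r" % state)


print(next_upload_step({"state": "ready"}))
```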

Although it will probably not be an issue, you may wish to know how TDA interprets lang codes within the TMX file. We look for lang codes that are "compatible" with the source and target langs given in POST /upload?action=create. The current definition of compatible is that the language prefixes are the same (case-insensitive). For example, if you upload a TM giving a target_lang of es-XL, the TMX file may use the code es-XL, es-AR or just es. However, the code must be formatted either as a two-letter language code, or as a two-letter language code followed by a dash ("-") and a two-letter region code. So Spanish would not be recognized and the TMX file would not be accepted. Also, the same lang code must be used throughout the TMX file; it may not start with es-XL and switch to es. TDA will relax these restrictions to accommodate real-world needs. We will try never to reject lang codes that were previously accepted.
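To check a TMX file before uploading, an application could apply the same rule locally. This Python sketch approximates the compatibility test described above; it is not TDA's actual implementation, and it does not check that one code is used consistently throughout the file.

```python
import re

# A two-letter language code, optionally followed by "-" and a two-letter region.
_CODE_RE = re.compile(r"[A-Za-z]{2}(-[A-Za-z]{2})?")


def is_compatible(tmx_code, declared_lang):
    """True if a lang code found in a TMX file matches the declared lang."""
    if _CODE_RE.fullmatch(tmx_code) is None:
        return False
    return tmx_code[:2].lower() == declared_lang[:2].lower()


for code in ("es-XL", "es-AR", "es", "Spanish"):
    print(code, is_compatible(code, "es-XL"))
```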

Note that while TDA's Data Pooling web interface supports automatic lang detection, this function is not available in the Service. We expect that applications using the upload API already know the langs of the TMs they are uploading. If this is not the case for your application and you would benefit from lang detection, contact TDA. Also, note that there is an upload state incomplete to accommodate cases where the langs can't be detected; the source_lang and target_lang fields are optional in upload results for the same reason. You may run across this case now, with uploads created using the Data Pooling web interface; however there is nothing you can do with them using the Service.

Developer Guidance

Choosing a Response Format

We recommend the JSON format, as it closely corresponds to the structure of the result and will require less decoding on your end. But JSON and XML are equally supported.

Choosing an Authentication Method

We recommend using HTTP basic authentication for desktop apps, and connect authentication for web apps (including AJAX apps).

TDA Registration and Membership

Your users must have TDA user accounts in order for your application to authenticate with the Service. TDA user accounts are free to the public. You may direct users to http://www.tausdata.org/index.php/component/user/register to register for an account. To download data, users must have a TDA membership (or belong to a member organization). You may direct users to https://www.tausdata.org/index.php/members/join-tda for information on joining TDA.

Lang and Attr Selection Forms

When writing a user interface for selecting langs and attrs, we recommend modeling them on the UIs used at tausdata.org. This will provide users with a consistent experience. We recommend that every lang or attr constrain the listings following it on the screen (and only these).

Cross-site Scripting

If you are developing an AJAX application, cross-site scripting restrictions in the browser will likely prevent you from calling the Service directly. In this case, the best solution is to construct a simple proxy that accepts calls on your web server, and forwards them to www.tausdata.org/api.

AJAX File Upload

It is not possible to use the main engine of AJAX applications, XMLHttpRequest, to perform file uploads, due to browser security restrictions. (There is no way to send the file contents.) It is necessary instead to arrange for the browser to submit your request as a form. To prevent the result from appearing to the user, the form submission is usually targeted to an iframe that is not displayed. Accessing the result of the submission is tricky, and many complicated schemes are employed by web developers. However, we have found that a call to the Service can be loaded right into an iframe and processed reliably. We describe the method here. (Please contact TDA if you would like these secrets revealed!)

Terms of Use

Users Must Use Their Own TDA Accounts

Every call to the Service must be authenticated with the account of the user who caused the request to be made. Do not use your own TDA account to authenticate calls made for other users. Contact TDA if you need an exception.

TM Sharing Conditions

Users who do not belong to a TDA member must read and agree to the TM Sharing Conditions before they may upload TMs. If your application allows users to upload TMs, it must enforce this requirement as well. Specifically, before submitting an upload, you must perform these steps:

  1. Check whether the current user needs to agree to the TM Sharing Conditions. Call GET /user with the self parameter set. The must_agree_to_upload_terms field of the result tells you whether the user needs to agree to the TM Sharing Conditions.
  2. Display the TM Sharing Conditions so that the user can easily read them. The TM Sharing Conditions are available by calling GET /upload_terms.txt. You should request the TM Sharing Conditions every time you need to display them, so that any changes will be reflected in your application.
  3. Give the user a way to positively and unambiguously agree to the terms. For example, you may display a check-box next to the words, "Check to indicate that you have read and agree to the TM Sharing Conditions.", and only make the upload if the check-box is checked.
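
The decision logic behind these steps might be sketched as follows. The functions operate on an already parsed GET /user result; the function names and the `terms_shown` and `box_checked` flags are hypothetical and stand in for whatever your user interface tracks.

```python
def must_agree_to_terms(user_result: dict) -> bool:
    """Step 1: read must_agree_to_upload_terms from a parsed GET /user result."""
    users = user_result.get("user", [])
    if not users:
        raise ValueError("GET /user result contains no user entry")
    return bool(users[0]["must_agree_to_upload_terms"])


def may_submit_upload(user_result: dict, terms_shown: bool, box_checked: bool) -> bool:
    """Steps 2 and 3: allow the upload only after display and explicit agreement."""
    if not must_agree_to_terms(user_result):
        return True
    return terms_shown and box_checked
```

The text of the conditions itself should still be fetched from GET /upload_terms.txt each time it is displayed, as described in step 2.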

Compatibility

We intend to support the Service as specified here indefinitely, with reasonable allowances for growth, including: Calls may be added. Parameters may be added to calls. In results, keys may be added to maps and values may be added to enums. Errors may be added. Various limits may be imposed for performance reasons.

The search algorithm in GET /segment may be changed and the query language may be extended.

Call Listing

The heading of every call contains the method, the resource name, and for POST calls the action parameter. The method listed must be used; you cannot use GET for a call documented to use POST. If the resource has an extension, the call has an unusual result. Otherwise, you must choose the result format by adding a .json, .xml, or .html extension. If the resource contains <id>, an entity id is part of the resource name; for example, GET .../lang/en-US.json for GET /lang/<id>. The following information is then listed, as applicable:
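
A minimal sketch of these naming rules, assuming a hypothetical helper `resource_path` that turns a heading such as /lang/<id> into the path to request. Percent-encoding the id is an assumption of this example, not something the Service is known to require.

```python
import posixpath
from urllib.parse import quote

RESULT_FORMATS = ("json", "xml", "html")


def resource_path(template: str, fmt: str = "json", entity_id=None) -> str:
    """Build a request path from a call heading such as "/lang/<id>"."""
    # Detect a fixed extension on the heading itself, before the id is filled in.
    _, fixed_ext = posixpath.splitext(template)
    path = template
    if "<id>" in template:
        if entity_id is None:
            raise ValueError("this resource needs an entity id")
        path = template.replace("<id>", quote(str(entity_id), safe=""))
    if fixed_ext:
        # Resources like /upload_terms.txt already carry their extension.
        return path
    if fmt not in RESULT_FORMATS:
        raise ValueError("format must be one of json, xml, html")
    return f"{path}.{fmt}"
```

With this helper, `resource_path("/lang/<id>", entity_id="en-US")` yields `/lang/en-US.json`, matching the example above.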

Id Type:
The type of the <id> in the resource name.
Parameters:
For each named parameter accepted by the call, its name; type (one of the basic types); its default value (if optional); and whether it can be repeated multiple times. Optional parameters are in italics. For optional parameters, if you give the empty string as the parameter value, it will be treated the same as if you gave no value at all, and the default (if any) will be used.
Result:
The schema of the result, in a JSON-like notation. Keys that may not be present in every result are in italics. The following represents a map, with one key lang, whose value is a list of maps, each of which has key id with value of type lang and another key name with value of type string.
{ lang: [{ id: lang,
           name: string }] }
The status and reason keys are implicit.
Errors:
Possible errors that are specific to the call. As with common errors, they are returned in the HTTP status line and the status and reason keys of the result, and an explanatory message is returned in the message key of the result.
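
One way a client might surface these errors is sketched below for JSON results. The `ServiceError` class and `parse_result` function are hypothetical names, and treating any status outside the 2xx range as an error is an assumption of this example.

```python
import json


class ServiceError(Exception):
    """Carries the status, reason, and message keys of an error result."""

    def __init__(self, status, reason, message):
        super().__init__(f"{status} {reason}: {message}")
        self.status = status
        self.reason = reason
        self.message = message


def parse_result(body: str) -> dict:
    """Parse a JSON result document, raising ServiceError on an error status."""
    result = json.loads(body)
    status = result.get("status")
    # Assumption: only 2xx status values indicate success.
    if not isinstance(status, int) or not 200 <= status < 300:
        raise ServiceError(status, result.get("reason"), result.get("message"))
    return result
```

Callers can then branch on `ServiceError.reason` (for example `no_data` or `over_limit`) rather than parsing the HTTP status line themselves.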

GET /status

Get the status of the Service. Useful for checking that the system is alive and you can talk to it. Example: GET .../status.json.

GET /user

Parameters:
Result:
{ user: [{ id: natural,
           organization: { id: natural,
                           name: string },
           can_search: boolean,
           can_download: boolean,
           can_upload: boolean,
           must_agree_to_upload_terms: boolean }] }

Get a listing of users. Currently, only the authorized user of the request is returned. The self parameter must be set to true to make this explicit (setting it to false is currently an error).

The can_* fields indicate whether the user is allowed to perform those functions. Currently, can_search is always true; can_download is true for TDA member users only, except for those who have been explicitly denied download permission; and can_upload is always true, except for TDA member users who have been explicitly denied upload permission.

See the terms of use for the must_agree_to_upload_terms field.

POST /auth_key?action=connect

Parameters:
Result:
{ auth_key: { id: string,
              manage_url: string } }
Create an auth key by the connect method. This call does not require authentication, but you must supply an app key. The created auth key will be inactive; direct the user to the manage_url to activate it. After activation, the user will be redirected to the redirect_url.

POST /auth_key?action=login

Result:
{ auth_key: { id: string,
              manage_url: string } }
Create an auth key by the login method. The created auth key will be active.

GET /lang

Parameters:
Result:
{ lang: [{ id: lang,
           name: string }] }
Get a listing of langs, sorted by name. See "Listing Langs and Attrs". Example: GET .../lang.json.

GET /lang/<id>

Id Type:
string
Result:
{ lang: { id: lang,
          name: string } }
Get a lang. Example: GET .../lang/en-US.json.

GET /attr

Result:
{ attr: [string] }
Get a listing of attrs. Currently, the attrs are: industry, content_type, owner, product, provider, and this list is not expected to change. Example: GET .../attr.json.

GET /attr/industry

Parameters:
Result:
{ industry: [{ id: natural,
               name: string }] }
Get a listing of industry, sorted by name. See "Listing Langs and Attrs".

GET /attr/content_type

Parameters:
Result:
{ content_type: [{ id: natural,
                   name: string }] }
Get a listing of content_type, sorted by name. See "Listing Langs and Attrs".

GET /attr/owner

Parameters:
Result:
{ owner: [{ id: natural,
            name: string }] }
Get a listing of owner, sorted by name. See "Listing Langs and Attrs".

GET /attr/product

Parameters:
Result:
{ product: [{ id: natural,
              name: string }] }
Get a listing of product, sorted by name. See "Listing Langs and Attrs".

GET /attr/provider

Parameters:
Result:
{ provider: [{ id: natural,
               name: string }] }
Get a listing of provider, sorted by name. See "Listing Langs and Attrs".

GET /counts

Parameters:
Result:
{ word_count: natural,
  new_word_count: natural,
  new_segment_count: natural,
  old_word_count: natural,
  limit: natural,
  within_limit: boolean }
Get counts on the data meeting the given criteria. Example: GET .../counts.json?source_lang=en-US&target_lang=fr-FR. The result fields are described in the download use case.

GET /segment

Parameters:
Result:
{ segment: [{ id: string,
              source_lang: { id: lang,
                             name: string },
              target_lang: { id: lang,
                             name: string },
              source: string,
              target: string,
              industry: { id: natural,
                          name: string },
              content_type: { id: natural,
                              name: string },
              owner: { id: natural,
                       name: string },
              product: { id: natural,
                         name: string },
              provider: { id: natural,
                          name: string } }] }

Search for segments matching the given query, meeting the given criteria. The query q is a space-separated list of words (even for languages that do not normally separate words by space, such as zh-CN). Example: GET .../segment.json?source_lang=en-US&target_lang=fr-FR&q=data+center.

Only segments containing the exact sequence of words in the query (case-insensitive), with no intervening punctuation, will be returned. So a search for "web service" will find segments with the word "web" followed immediately by the word "service". Punctuation in the query is not well-supported. If you wish to match punctuation, separate the punctuation from words with space. For example, to find "hello, world!", your query should be "hello , world !". Only segments with exactly this punctuation will be returned.

Queries with more than a few words may take a long time, and are not likely to return any results because only exact matching is supported. There is a limit of 10 words in a query.
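
The query rules above could be applied on the client side roughly as follows. This is a sketch under several assumptions: the tokenizer is a simple regular expression chosen for this example, it counts punctuation tokens toward the 10-word limit, and it does not segment languages such as zh-CN that do not separate words by space.

```python
import re

MAX_QUERY_WORDS = 10  # limit stated in the text above


def make_query(text: str) -> str:
    """Build a q value: words and punctuation marks separated by single spaces."""
    # A run of word characters is one token; each punctuation mark is its own token.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    if not tokens:
        raise ValueError("query is empty")
    if len(tokens) > MAX_QUERY_WORDS:
        raise ValueError(f"query has more than {MAX_QUERY_WORDS} words")
    return " ".join(tokens)
```

For instance, `make_query("hello, world!")` produces `hello , world !`, the form recommended above for matching punctuation.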

The results will typically contain segments from a variety of data owners, industries, etc. Other than that, the segments returned are effectively random, and their order is not significant. However, the results for the same query will usually remain similar over time.

In the future, the search algorithm may be enhanced. To continue searching for an exact sequence of words, surround them with double-quotes. Example: GET .../segment.json?source_lang=en-US&target_lang=fr-FR&q="data+center". Also, some characters that don't normally appear in words (e.g. ":") may have special meaning in the future.
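
A small sketch of building such a quoted query string with the Python standard library. The function name is hypothetical; note that `urlencode` percent-encodes the double-quotes as `%22`, which is the standard form-encoded representation of the same value.

```python
from urllib.parse import urlencode


def exact_phrase_query(source_lang: str, target_lang: str, words: list) -> str:
    """Return a query string whose q value is the words wrapped in double-quotes."""
    phrase = '"' + " ".join(words) + '"'
    return urlencode(
        {"source_lang": source_lang, "target_lang": target_lang, "q": phrase}
    )
```

Calling `exact_phrase_query("en-US", "fr-FR", ["data", "center"])` gives `source_lang=en-US&target_lang=fr-FR&q=%22data+center%22`.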

The limit parameter is a hint for how many segments you want. The result may in fact contain more or fewer.

GET /segment/<id>

Id Type:
string
Result:
{ segment: { id: string,
             source_lang: { id: lang,
                            name: string },
             target_lang: { id: lang,
                            name: string },
             source: string,
             target: string,
             industry: { id: natural,
                         name: string },
             content_type: { id: natural,
                             name: string },
             owner: { id: natural,
                      name: string },
             product: { id: natural,
                        name: string },
             provider: { id: natural,
                         name: string } } }
Get the segment with this id.

POST /segment/<id>?action=report_problem

Id Type:
string
Parameters:
Submit a problem report for a segment. The email address on file for the user will be used if one is not given.

GET /download

Result:
{ download: [{ id: natural,
               industry: { id: natural,
                           name: string },
               content_type: { id: natural,
                               name: string },
               owner: { id: natural,
                        name: string },
               product: { id: natural,
                          name: string },
               provider: { id: natural,
                           name: string },
               state: enum (unconfirmed, not_ready, ready),
               word_count: natural,
               segment_count: natural }] }
Get a listing of (confirmed) downloads created by this member.

POST /download?action=create

Parameters:
Result:
{ word_count: natural,
  segment_count: natural,
  download: [{ id: natural,
               industry: { id: natural,
                           name: string },
               content_type: { id: natural,
                               name: string },
               owner: { id: natural,
                        name: string },
               product: { id: natural,
                          name: string },
               provider: { id: natural,
                           name: string },
               state: enum (unconfirmed, not_ready, ready),
               word_count: natural,
               segment_count: natural }] }
Errors:
400 no_data
There is no data meeting the given criteria.
400 over_limit
Creating the download would put the member over its download limit.

Create a download of data meeting the given criteria. The exclude_own parameter excludes data uploaded by this member from the download. By default, the download will be in the unconfirmed state and not yet charged to the member. However, setting the confirm parameter has the same effect as immediately calling POST /download/<id>?action=confirm.

Only TDA members may download, and some users may not have permission to download. See GET /user.

Note that the result schema allows multiple downloads to be created. Currently, you will never get more than one, but this may change in the future.
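
Because the result may hold more than one download in the future, client code is probably safest iterating over the list rather than reading only its first element. A minimal sketch, with a hypothetical function name, that collects the ids still awaiting confirmation from a parsed result:

```python
def unconfirmed_download_ids(create_result: dict) -> list:
    """Return the ids of all downloads in the result that still need confirming."""
    return [
        download["id"]
        for download in create_result.get("download", [])
        if download.get("state") == "unconfirmed"
    ]
```

Each returned id could then be passed to POST /download/<id>?action=confirm once the user accepts the charge.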

GET /download/<id>

Id Type:
natural
Result:
{ download: { id: natural,
              industry: { id: natural,
                          name: string },
              content_type: { id: natural,
                              name: string },
              owner: { id: natural,
                       name: string },
              product: { id: natural,
                         name: string },
              provider: { id: natural,
                          name: string },
              state: enum (unconfirmed, not_ready, ready),
              word_count: natural,
              segment_count: natural } }
Get information about a download.

GET /download/<id>.zip

Id Type:
natural
Download the data as a zipped TMX file. The download must be in the ready state. If it is in the not_ready state, you will get a 503 not_ready status, and you should try again in 1 minute or longer.
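
The retry behavior might look like the sketch below. The `fetch` argument is a hypothetical callable, supplied by your application, that performs the GET request and returns the HTTP status code together with the response body; the attempt limit is an assumption of this example.

```python
import time


def wait_for_zip(fetch, delay_seconds=60, max_attempts=10, sleep=time.sleep):
    """Call fetch() until the download is ready, waiting between 503 results.

    fetch must return a (status_code, body_bytes) pair.
    """
    for attempt in range(max_attempts):
        status, body = fetch()
        if status == 200:
            return body
        if status != 503:
            raise RuntimeError(f"unexpected status {status}")
        # 503 not_ready: wait at least a minute before trying again.
        if attempt + 1 < max_attempts:
            sleep(delay_seconds)
    raise TimeoutError("download was not ready after the allowed attempts")
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.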

POST /download/<id>?action=confirm

Id Type:
natural
Result:
{ download: { id: natural,
              industry: { id: natural,
                          name: string },
              content_type: { id: natural,
                              name: string },
              owner: { id: natural,
                       name: string },
              product: { id: natural,
                         name: string },
              provider: { id: natural,
                          name: string },
              state: enum (unconfirmed, not_ready, ready),
              word_count: natural,
              segment_count: natural } }
Errors:
400 over_limit
Confirming the download would put the member over its download limit. This can happen if the member has created and confirmed other downloads since this download was created.
Confirm a download. The download must be in the unconfirmed state. After confirmation, the download is charged to the member. The download may enter either the not_ready or ready state.

GET /upload

Result:
{ upload: [{ id: natural,
             source_lang: { id: lang,
                            name: string },
             target_lang: { id: lang,
                            name: string },
             industry: { id: natural,
                         name: string },
             content_type: { id: natural,
                             name: string },
             owner: { id: natural,
                      name: string },
             product: { id: natural,
                        name: string },
             provider: { id: natural,
                         name: string },
             state: enum (incomplete, unconfirmed, processing, ready, cancelled, success, error),
             word_count: natural,
             segment_count: natural,
             reason: enum (system_error, bad_tmx, unsupported_tmx, bad_langs),
             message: string }] }
Get a listing of uploads made by this member.

POST /upload?action=create

Parameters:
Result:
{ upload: [{ id: natural,
             source_lang: { id: lang,
                            name: string },
             target_lang: { id: lang,
                            name: string },
             industry: { id: natural,
                         name: string },
             content_type: { id: natural,
                             name: string },
             owner: { id: natural,
                      name: string },
             product: { id: natural,
                        name: string },
             provider: { id: natural,
                         name: string },
             state: enum (incomplete, unconfirmed, processing, ready, cancelled, success, error),
             word_count: natural,
             segment_count: natural,
             reason: enum (system_error, bad_tmx, unsupported_tmx, bad_langs),
             message: string }] }
Errors:
400 bad_zip
The zip file could not be unpacked to produce a TMX file.
400 bad_tmx
The TMX file is invalid.
400 unsupported_tmx
Some features of the TMX file are unsupported by the Service; for example, the TMX file has more than two langs.
400 bad_langs
The lang codes in the TMX file do not match the source_lang and target_lang parameters.
400 duplicate
The TMX file has already been uploaded.

Create a new upload from the given file, having the given attrs. By default, if there is not a problem, the upload will be in the unconfirmed state. However, setting the confirm parameter has the same effect as immediately calling POST /upload/<id>?action=confirm. Setting the approve parameter has the same effect as calling POST /upload/<id>?action=approve as soon as the upload reaches the ready state.

Some users may not have permission to upload. See GET /user.

There are many things that can go wrong with an upload, as reflected by the possible error results. Note that this call checks only the beginning of the file, so even if it is successful, there could be problems later on. Currently, an upload created by the Service will never go into the incomplete state.
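
Since problems can surface after creation, a client may want to re-check the upload later with GET /upload/<id>. The sketch below inspects one parsed upload map; the function name is hypothetical, and it assumes that the reason and message keys accompany the error state.

```python
def upload_problem(upload: dict):
    """Return a short description if the upload is in the error state, else None."""
    if upload.get("state") != "error":
        return None
    reason = upload.get("reason", "unknown")
    message = upload.get("message", "")
    return f"{reason}: {message}" if message else reason
```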

Note that the result schema allows multiple uploads to be created. Currently, you will never get more than one, but this may change in the future.

GET /upload/<id>

Id Type:
natural
Result:
{ upload: { id: natural,
            source_lang: { id: lang,
                           name: string },
            target_lang: { id: lang,
                           name: string },
            industry: { id: natural,
                        name: string },
            content_type: { id: natural,
                            name: string },
            owner: { id: natural,
                     name: string },
            product: { id: natural,
                       name: string },
            provider: { id: natural,
                        name: string },
            state: enum (incomplete, unconfirmed, processing, ready, cancelled, success, error),
            word_count: natural,
            segment_count: natural,
            reason: enum (system_error, bad_tmx, unsupported_tmx, bad_langs),
            message: string } }
Get information about an upload.

POST /upload/<id>?action=approve

Id Type:
natural
Result:
{ upload: { id: natural,
            source_lang: { id: lang,
                           name: string },
            target_lang: { id: lang,
                           name: string },
            industry: { id: natural,
                        name: string },
            content_type: { id: natural,
                            name: string },
            owner: { id: natural,
                     name: string },
            product: { id: natural,
                       name: string },
            provider: { id: natural,
                        name: string },
            state: enum (incomplete, unconfirmed, processing, ready, cancelled, success, error),
            word_count: natural,
            segment_count: natural,
            reason: enum (system_error, bad_tmx, unsupported_tmx, bad_langs),
            message: string } }
Approve an upload. The upload must be in the ready state.

POST /upload/<id>?action=confirm

Id Type:
natural
Result:
{ upload: { id: natural,
            source_lang: { id: lang,
                           name: string },
            target_lang: { id: lang,
                           name: string },
            industry: { id: natural,
                        name: string },
            content_type: { id: natural,
                            name: string },
            owner: { id: natural,
                     name: string },
            product: { id: natural,
                       name: string },
            provider: { id: natural,
                        name: string },
            state: enum (incomplete, unconfirmed, processing, ready, cancelled, success, error),
            word_count: natural,
            segment_count: natural,
            reason: enum (system_error, bad_tmx, unsupported_tmx, bad_langs),
            message: string } }
Confirm an upload. The upload must be in the unconfirmed state.

POST /upload/<id>?action=cancel

Id Type:
natural
Result:
{ upload: { id: natural,
            source_lang: { id: lang,
                           name: string },
            target_lang: { id: lang,
                           name: string },
            industry: { id: natural,
                        name: string },
            content_type: { id: natural,
                            name: string },
            owner: { id: natural,
                     name: string },
            product: { id: natural,
                       name: string },
            provider: { id: natural,
                        name: string },
            state: enum (incomplete, unconfirmed, processing, ready, cancelled, success, error),
            word_count: natural,
            segment_count: natural,
            reason: enum (system_error, bad_tmx, unsupported_tmx, bad_langs),
            message: string } }
Cancel an upload. The upload must be in the unconfirmed state or the ready state.
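
The state requirements of the approve, confirm, and cancel calls can be collected into a small lookup table, which a client could use to enable or disable buttons. The names below are hypothetical; the table restates only the requirements given for those three calls.

```python
# Which actions each upload state permits, per the call descriptions.
ALLOWED_UPLOAD_ACTIONS = {
    "unconfirmed": frozenset({"confirm", "cancel"}),
    "ready": frozenset({"approve", "cancel"}),
}


def allowed_upload_actions(state: str) -> frozenset:
    """Return the actions that may be posted for an upload in this state."""
    return ALLOWED_UPLOAD_ACTIONS.get(state, frozenset())
```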

GET /upload_terms.txt

Get the TM Sharing Conditions as UTF-8 text. Example: GET .../upload_terms.txt.