Introduction

The TAUS Data Web Service allows programs to use many features of TAUS Data, including:

The Service has a "Representational State Transfer", or REST, interface. Parameters are passed as in HTML forms. Responses are returned as either JSON or XML documents, as you choose.

Getting Started

It is assumed that the reader is familiar with web programming. Experience with REST services is helpful but not required. We have tried to keep the Service simple so that you can get up and running quickly, using your platform and tools of choice. No programming language bindings are provided at this time.

You can begin developing with the TAUS Data Web Service using just this document. However, we encourage you to contact TAUS Data and tell us about your application, so that we can provide you with:

An Example Call

In a REST web service, requests are made over HTTP (or HTTPS), and the resource to access is given in the URL. In our example, the resource we wish to access is a listing of locales (termed "langs" in the Service) in TAUS Data. This resource is named /lang. To simply retrieve information about the resource, we use the GET HTTP method. To ask for the response in JSON format, we can add the .json extension to the resource. No additional parameters are required, but to limit the listing to langs for a particular language, we can add the language parameter in the query string. The full request is:

GET .../lang.json?language=en

To try it, open the full request URL in a web browser and enter your TAUS Data username and password when prompted. (If you don't have a TAUS Data account, register for one at www.tausdata.org; it's free!)

The response looks like:

HTTP/1.1 200 success
Content-type: application/json; encoding=UTF-8

{ "status": 200,
  "reason": "success",
  "lang": [
    { "id": "en-AU", "name": "English (Australia)" },
    { "id": "en-CA", "name": "English (Canada)" },
    { "id": "en-GB", "name": "English (United Kingdom)" },
    { "id": "en-US", "name": "English (United States)" }
  ]
}

(Response documents are reformatted for readability.)

Or, for the response in XML, the request is:

GET .../lang.xml?language=en

The response looks like:

HTTP/1.1 200 success
Content-Type: text/xml; charset=UTF-8

<result>
  <status>200</status>
  <reason>success</reason>
  <lang>
    <name>English (Australia)</name>
    <id>en-AU</id>
  </lang>
  <lang>
    <name>English (Canada)</name>
    <id>en-CA</id>
  </lang>
  <lang>
    <name>English (United Kingdom)</name>
    <id>en-GB</id>
  </lang>
  <lang>
    <name>English (United States)</name>
    <id>en-US</id>
  </lang>
</result>

Or, for the response in human-friendly HTML, the request is:

GET .../lang.html?language=en
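The JSON form of this call can also be made from a program. Below is a minimal Python sketch using only the standard library. It assumes the secure base URL https://www.tausdata.org/api described under Interface Details, and sends the username and password with HTTP basic authentication. The helper names lang_url, get_json, and lang_ids are illustrative, not part of the Service.

```python
import base64
import json
import urllib.parse
import urllib.request

# Assumed: the secure production base URL (a sandbox would use a different one).
BASE_URL = "https://www.tausdata.org/api"


def lang_url(language: str) -> str:
    """Build the URL for GET /lang.json?language=<language>."""
    query = urllib.parse.urlencode({"language": language})
    return f"{BASE_URL}/lang.json?{query}"


def get_json(url: str, username: str, password: str) -> dict:
    """GET a URL with HTTP basic authentication and decode the JSON body."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def lang_ids(result: dict) -> list:
    """Return the id of every lang in a /lang result."""
    return [lang["id"] for lang in result["lang"]]
```

For example, lang_ids(get_json(lang_url("en"), "your-username", "your-password")) returns the id values from the lang list, such as en-AU and en-US in the sample response.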

Next Steps

If you want to understand the Service from the ground up, read this document in order. If you want to get a high-level view of what you'll need to do, start with the use cases and refer to other sections as needed. Everyone will at some point want to read about access control, app keys, and the terms of use.

The remaining sections are:

Interface Details
The technical nuts and bolts.
Access Control
That necessary annoyance.
App Keys
How to identify your application to TAUS Data (and why you would want to do this).
Concepts
Discussion of various TAUS Data entities and how they are used in the Service.
Use Cases
Task-oriented walk-throughs.
Developer Guidance
Possibly helpful suggestions on writing your application.
Terms of Use
A few requirements.
Compatibility
Our promise, within reason, not to break your application.
Call Listing
Specification for every call in the Service.

Interface Details

All requests are made over HTTP (or HTTPS) and follow common REST conventions.

Requests

URL

The base URL for the Service is http://www.tausdata.org/api or https://www.tausdata.org/api. The secure version is recommended, as passwords and other private data may be exchanged. If you request access to a development sandbox, you will get a different URL for your sandbox. Following the base URL is the resource name, as given in the call documentation. For most calls, you must add an extension giving the requested response format: .json, .xml, or .html. Resource names and extensions are case-sensitive.

Method

The Service uses two HTTP methods, GET and POST. GET is used to access resources without modifying them. POST is used to create new resources and modify existing ones. The documented method must be used for every call.

Parameters

Calls take named parameters, where keys and values are Unicode strings. For GET requests, the parameters must be query-string encoded in the URL. Since there is no means to specify the character encoding of the query string, it is always taken to be UTF-8. For POST requests, the parameters may either be query-string encoded in the URL, or sent in the body with content type application/x-www-form-urlencoded or multipart/form-data. You may specify the character encoding for these requests; however, UTF-8 is recommended. Unicode characters U+10000 and above are not allowed. You may put some parameters in the query string and others in the body (though the same parameter should not appear in both places). A parameter may be given multiple times only where specifically documented. Parameter names are case-sensitive.

Some special parameters are used for access control.

Unknown parameters in the request are ignored by default. However, there will be a message about them in the X-TDA-Warning header. You may request that they be considered errors by setting the X-TDA-Strict-Parameters request header to true.

If you cannot avoid passing extra parameters, you can avoid the risk that they will have meaning in a future version of the Service by giving them names beginning with an underscore ("_"), which will never be used by the Service.
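The parameter rules above can be enforced on the client side. A minimal Python sketch, assuming parameters are held in a dict whose values are strings or lists of strings; encode_params and strict_headers are illustrative helper names, not part of the Service.

```python
import urllib.parse


def encode_params(params: dict) -> str:
    """Query-string encode parameters as UTF-8 for a Service request.

    A list value produces a repeated key, which the Service accepts only
    where a call documents it. Application-private parameters should use
    names beginning with "_" so they never clash with future parameters.
    """
    for key, value in params.items():
        values = value if isinstance(value, list) else [value]
        for text in [key, *values]:
            # The Service rejects characters at U+10000 and above.
            if any(ord(ch) >= 0x10000 for ch in text):
                raise ValueError(f"character above U+FFFF in parameter {key!r}")
    return urllib.parse.urlencode(params, doseq=True, encoding="utf-8")


def strict_headers() -> dict:
    """Request header asking the Service to treat unknown parameters as errors."""
    return {"X-TDA-Strict-Parameters": "true"}
```

For example, encode_params({"language": "en", "_trace": "42"}) yields language=en&_trace=42, and the result can be appended to a URL after "?" or sent as an application/x-www-form-urlencoded body.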

Responses

Status

The HTTP status line contains a numeric status code and a string reason code. You may use the reason to discriminate the status beyond the code. For convenience, the status code and reason are both duplicated in the response body.

Status codes are classed by their numeric range. Codes 200-299 reflect success. For other codes, an explanatory message (in English) is given in the X-TDA-Message header and in the body. Codes 400-499 reflect invalid requests. The message may help you find and fix the error. Codes 500-599 reflect problems that are not your fault. Codes in other ranges are not used by the Service.

A common set of codes and reasons is given here. Others are documented for specific calls.

200 success
The response body contains the successful result of the call.
201 created
One or more resources were created. The Location header points to the primary created resource. The response body contains the successful result of the call, and should describe all created resources.
400 bad_request
Some part of the request was malformed and could not be understood.
400 invalid_params
Parameters were missing, of the wrong types, or otherwise incorrect.
400 no_app_key
An app key was not found where required.
400 bad_app_key
The app key given is not valid.
401 no_credentials
The request requires authentication but no credentials were supplied.
401 bad_username_password
The username and password given are not a valid TAUS Data login.
401 bad_auth_key
The auth key given is not valid.
403 permission_denied
This user is not allowed to make this request.
403 must_accept_terms
This user must accept the current terms of use for the function.
404 no_such_resource
The resource named in the URL does not exist or is invisible to this user.
405 method_not_allowed
This method is not allowed for this resource.
409 state_error
The resource is in the wrong state for this request.
500 system_error
An unexpected error happened in the system. TAUS Data will investigate the problem.
503 down_for_maintenance
The Service is down for maintenance and will be available again shortly.
503 not_ready
The resource you requested is not currently ready, but will be later.

Headers

Information is not generally returned in HTTP headers. The Content-type header is always set, and you may check to see that the body has the expected content type. (In extreme error cases, the Service may not return a body of the requested type.) The Location header is sent along with a status 201 response, pointing to a created resource (as required by HTTP); however, more complete information is available in the result.

Body

A few calls return special document types, eg. zipped TMX files. Those calls are termed "unusual", and their response formats are specified in the call documentation.

Other, "usual," results are structured as values built from lists, maps (aka. dictionaries, objects, associative arrays) from keys (aka. properties, fields) to values, and the basic types. All lists contain elements of only one type. Note there are no "null" or undefined values.

At the top level, every usual result is a map with at least two keys, status (type natural) and reason (type string), whose values are the HTTP status code and reason string. Most calls add additional keys. Error responses add a message (type string) key explaining what went wrong.
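Because every usual result carries status, reason, and (on error) message, a client can check results uniformly. A minimal Python sketch for JSON results; ServiceError and check_result are illustrative names, and treating 503 as "try again later" is a suggestion based on the down_for_maintenance and not_ready reasons listed above.

```python
import json


class ServiceError(Exception):
    """An error result from the Service (status outside 200-299)."""

    def __init__(self, status: int, reason: str, message: str):
        super().__init__(f"{status} {reason}: {message}")
        self.status = status
        self.reason = reason
        self.message = message

    @property
    def retry_later(self) -> bool:
        # 503 (down_for_maintenance, not_ready) suggests trying again later.
        return self.status == 503


def check_result(body: str) -> dict:
    """Decode a usual JSON result, raising ServiceError for non-2xx statuses."""
    result = json.loads(body)
    status = result["status"]
    if not 200 <= status <= 299:
        raise ServiceError(status, result["reason"], result.get("message", ""))
    return result
```

With urllib, a non-2xx response raises urllib.error.HTTPError; calling its read() method returns the body, which can then be decoded and passed to check_result.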

Response formats

Results come in one of three formats: "JavaScript Object Notation" (JSON), XML, or HTML. For all formats, the character encoding is UTF-8. The HTML format is a human-readable rendering for testing purposes, and is not further described.

The JSON format follows the result structure directly, with maps represented by JSON objects, lists by JSON arrays, and basic values by JSON strings, numbers, and the boolean literals true and false. All JSON results from the Service are also valid JavaScript expressions (even though this is not true of JSON in general).

In the XML format, a map is represented by a series of XML elements whose names are the keys of the map. If the value for a key is a basic value or another map, there is a single XML element for the key, containing the representation of the value; if the value for a key is a list, there is an XML element for every element of the list, containing the representation of the list element. Basic values are represented as character data. The whole thing is wrapped in a top-level <result> element. Note that not all possible results described above can be represented in XML this way, eg. a list within a list. However, only results that can be represented in the XML format (as well-formed XML) will be produced. The exact representation of basic types in either format is given below.
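Because a one-element list and a single value look the same in XML, a generic reader must be told which keys hold lists. A minimal Python sketch of that mapping, using the standard xml.etree.ElementTree module; parse_xml_result and its list_keys argument are illustrative, not part of the Service. Basic values come back as strings, and a list with no elements produces no key at all.

```python
import xml.etree.ElementTree as ET


def element_to_value(element: ET.Element, list_keys: frozenset):
    """Convert one element of an XML result into a Python value.

    Elements without child elements hold basic values (returned as strings);
    other elements hold maps. Keys in list_keys are always collected into
    lists, since repeated elements are how XML represents a list.
    """
    children = list(element)
    if not children:
        return element.text or ""
    result = {}
    for child in children:
        value = element_to_value(child, list_keys)
        if child.tag in list_keys:
            result.setdefault(child.tag, []).append(value)
        else:
            result[child.tag] = value
    return result


def parse_xml_result(text, list_keys=frozenset()) -> dict:
    """Parse a whole XML result (str or bytes) wrapped in <result>."""
    root = ET.fromstring(text)
    return element_to_value(root, frozenset(list_keys))
```

For the /lang.xml example above, parse_xml_result(body, {"lang"}) gives a map whose lang key holds a list of maps with name and id; use result.get("lang", []) to allow for an empty list.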

Basic Types

There are only a few basic types. Each has a string representation, used for parameters and XML results, and a JSON representation.

string
An arbitrary Unicode string. String representation is itself; JSON representation is a JSON string.
natural
A positive integer (1, 2, ...) less than 2^63. String representation is the decimal string; JSON representation is a JSON number.
boolean
True or false. String representation is one of the strings "true" or "false". JSON representation is one of the JSON literals true or false.
lang
A language-region pair. See below.
enum
One from a given list of strings.
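When reading XML results, every basic value arrives in its string representation. A minimal Python sketch of converting natural, boolean, and enum values back, following the definitions above; the function names are illustrative. Lang values have their own rules, described under Langs.

```python
import re


def parse_natural(text: str) -> int:
    """Parse a natural: the decimal string of an integer from 1 to 2**63 - 1."""
    if re.fullmatch(r"[0-9]+", text) is None:
        raise ValueError(f"not a natural: {text!r}")
    value = int(text)
    if not 1 <= value < 2**63:
        raise ValueError(f"natural out of range: {text!r}")
    return value


def parse_boolean(text: str) -> bool:
    """Parse a boolean, whose string representation is exactly "true" or "false"."""
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"not a boolean: {text!r}")


def parse_enum(text: str, allowed) -> str:
    """Check that text is one of the strings allowed for an enum."""
    if text not in allowed:
        raise ValueError(f"{text!r} is not one of {sorted(allowed)}")
    return text
```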

The file Type

There is another type, file, that is only used for parameters. The value for this parameter is expected to be the contents of a (possibly large) file. file parameters are only used with POST requests, and when one is part of a request, the body must use the multipart/form-data content type, and the part containing the file must have a filename attribute in the Content-Disposition header.
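Python's standard library has no ready-made multipart/form-data encoder for client requests, so the sketch below builds the body by hand, putting the filename attribute on the file part as required. The helper name is illustrative, and the file parameter name "file" in the usage line is hypothetical; the call documentation gives the real parameter names.

```python
import uuid


def multipart_body(fields: dict, file_field: str, filename: str, content: bytes):
    """Build a multipart/form-data body with string fields and one file part.

    Returns (body, content_type). The file part's Content-Disposition carries
    a filename attribute. filename is assumed to contain no quotes or newlines.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        header = f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode("utf-8") + value.encode("utf-8"))
    file_header = (
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + content)
    delimiter = f"--{boundary}\r\n".encode("ascii")
    body = b"".join(delimiter + part + b"\r\n" for part in parts)
    body += f"--{boundary}--\r\n".encode("ascii")
    return body, f"multipart/form-data; boundary={boundary}"
```

For example, body, content_type = multipart_body({"source_lang": "en-US"}, "file", "job.xlf.zip", data), after which the body is sent in a POST request with its Content-Type header set to content_type.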

Access Control

Access control is based upon TAUS Data user accounts. See the terms of use for how accounts may be used. The Service does not provide a way to register new users, but you may direct users to register themselves.

Authentication

Most calls to the Service must contain valid authentication credentials. For convenience, flexibility, and security, the Service provides several methods of authentication. All are intended to be used with HTTPS to keep credentials private.

Username and Password

In this method, the username and password of a registered user are passed with each request. The recommended form is HTTP basic authentication. (Other HTTP authentication methods, such as digest, are not supported.) In support of HTTP basic, the Service may return a WWW-Authenticate header with value Basic realm="TAUS Data Web Service" in response to an unauthenticated request. However, since some platforms (in particular, web browsers) will pop up an unwanted password entry form when receiving this header, it is returned only when HTTP basic authentication was tried, or no authentication method was tried, in the request.

HTTP basic authentication is also handy for testing the Service directly in a web browser, as you may have found if you tried the introductory example.

If HTTP basic is not feasible, you may pass the username and password as auth_username and auth_password parameters.

Login

In this method, you first make a POST /auth_key?action=login request that is authenticated by username and password. An "auth key" is returned that can be used (without the username and password) to authenticate future requests. To use the auth key, pass it as the X-TDA-Auth-Key request header or, if that is not feasible, the auth_auth_key parameter. (Note the confusing name. This is consistent with auth_username and auth_password, that is, auth_ followed by the name of the thing. The intent is to avoid conflicts with the name as a normal parameter.)
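A minimal Python sketch of the two request shapes in the login method. It assumes, as for most calls, that /auth_key takes the .json extension, and that the secure base URL is used. The name of the result field holding the auth key is not given here, so with_auth_key simply takes the key as a string; both helper names are illustrative.

```python
import base64
import urllib.request

# Assumed: the secure production base URL.
BASE_URL = "https://www.tausdata.org/api"


def login_request(username: str, password: str) -> urllib.request.Request:
    """Build POST /auth_key?action=login, authenticated with HTTP basic."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        f"{BASE_URL}/auth_key.json?action=login",
        data=b"",  # empty body; the action parameter is in the query string
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )


def with_auth_key(url: str, auth_key: str) -> urllib.request.Request:
    """Build a GET request authenticated by an auth key instead of a password."""
    return urllib.request.Request(url, headers={"X-TDA-Auth-Key": auth_key})
```

Either request can be sent with urllib.request.urlopen. Where setting the header is not feasible, the auth key can instead travel as the auth_auth_key parameter.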

Currently, this is only marginally more secure than username and password authentication. The application must possess the username and password and send them with the initial POST /auth_key, and the auth key never expires. However, its use is consistent with the connect method.

Connect

The connect method enables an application to act on behalf of a user, without the user disclosing their username and password to the application. You first make a POST /auth_key?action=connect request. This request requires no authentication, but it must contain an app key. An "auth key" is returned that can be used to authenticate future requests as above, but first it must be activated. The application should send the user to the manage_url field of the result, where they can activate the auth key. After the user activates the auth key, they will be sent to the redirect_url given in the request.
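A minimal Python sketch of the first connect step: the request carries no user credentials, only the app key (in the X-TDA-App-Key header described under App Keys) and the redirect_url parameter. It assumes the .json extension and the secure base URL; the helper names are illustrative.

```python
import urllib.parse
import urllib.request

# Assumed: the secure production base URL.
BASE_URL = "https://www.tausdata.org/api"


def connect_request(app_key: str, redirect_url: str) -> urllib.request.Request:
    """Build POST /auth_key?action=connect, identified only by the app key."""
    body = urllib.parse.urlencode({"redirect_url": redirect_url}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/auth_key.json?action=connect",
        data=body,
        headers={
            "X-TDA-App-Key": app_key,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def activation_url(result: dict) -> str:
    """Return the page where the user activates the new auth key."""
    return result["manage_url"]
```

A server-based application would send the user's browser to activation_url(result), then handle the user's return at redirect_url before using the auth key.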

Security note: If you use connect authentication, you must secure your app key from disclosure. Typically, this means that it should be stored (even if encrypted or obfuscated) only on well-protected systems and used in server-based applications. If an attacker obtains your app key, he may be able to trick users into granting him access to their TAUS Data accounts by impersonating your application. In such an event, TAUS Data will revoke your app key and issue a new one. For these reasons, you must specifically request connect access before it will be enabled for your app key.

Permissions

Many calls may be performed by any authenticated user; some, however, are restricted. To determine the user's permissions, call GET /user with the self parameter set.

App Keys

There is no registration requirement to use the Service, beyond a user account at www.tausdata.org. However, you are encouraged to register your application and get an app key. (For now, please contact TAUS Data for an app key.) An app key identifies your application to TAUS Data, and helps us diagnose any problems you may have. An app key also enables you to use the connect authentication method. There may be additional benefits in the future, such as logs and statistics of the calls made by your app. Please note that an app key does not authorize calls to the Service. Users still need TAUS Data accounts as described in access control.

To use your app key, pass it as the X-TDA-App-Key request header or, if that is not feasible, the auth_app_key parameter.

Concepts

Organizations

An organization is the most important unit of identity in the Service. An organization may or may not be a TAUS Data member, and may really represent a single individual. Every user account belongs to an organization.

Langs

What we refer to as a "lang" and use throughout the Service is actually a language-region pair, often called a locale; for example, fr-CA for Canadian French. When referring to the language only (eg. French), we always call it a "language" to distinguish it from a "lang".

Languages are represented as ISO 639-1 two letter language codes. Regions are represented by ISO 3166-1 alpha-2 two letter country codes (or in a few cases, by non-standard codes such as XL for Latin America). Langs are represented as a language, followed by a dash ("-"), followed by a region. Langs, languages, and regions are case-insensitive.
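Since langs are case-insensitive, an application can normalize codes before comparing or sending them. A minimal Python sketch; normalize_lang is an illustrative helper, and accepting an underscore separator (as in POSIX locale names) is an added convenience, not a Service rule. It checks only the form of the code, not whether TAUS Data supports the lang.

```python
import re

# Two-letter language, dash, two-letter region (eg. fr-CA, es-XL).
_LANG = re.compile(r"([A-Za-z]{2})-([A-Za-z]{2})")


def normalize_lang(code: str) -> str:
    """Normalize a lang such as "FR-ca" or "fr_CA" to the form "fr-CA"."""
    match = _LANG.fullmatch(code.strip().replace("_", "-"))
    if match is None:
        raise ValueError(f"not a language-region pair: {code!r}")
    language, region = match.groups()
    return f"{language.lower()}-{region.upper()}"
```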

Not all possible langs are supported by TAUS Data. The current list is:
TAUS Data Langs
Lang    Name
af-ZA   Afrikaans
ar-AE   Arabic (U.A.E.)
ar-AR   Arabic
ar-EG   Arabic (Egypt)
ar-SA   Arabic (Saudi Arabia)
be-BY   Belarusian
bg-BG   Bulgarian
cs-CZ   Czech
cy-GB   Welsh
da-DK   Danish
de-DE   German (Germany)
el-GR   Greek
en-AU   English (Australia)
en-CA   English (Canada)
en-GB   English (United Kingdom)
en-US   English (United States)
en-ZA   English (South Africa)
es-EM   Spanish (International)
es-ES   Spanish (Spain)
es-MX   Spanish (Mexico)
es-XL   Spanish (Latin America)
et-EE   Estonian
eu-ES   Basque
fa-IR   Farsi
fi-FI   Finnish
fr-BE   French (Belgium)
fr-CA   French (Canada)
fr-FR   French (France)
he-IL   Hebrew (Israel)
hr-HR   Croatian
ht-HT   Haitian
hu-HU   Hungarian
id-ID   Indonesian
is-IS   Icelandic
it-IT   Italian (Italy)
ja-JP   Japanese
ko-KR   Korean
lt-LT   Lithuanian
lv-LV   Latvian
mk-MK   Macedonian
ms-MY   Malay
mt-MT   Maltese
nb-NO   Norwegian (Bokmal)
nl-BE   Dutch (Belgium)
nl-NL   Dutch (Netherlands)
nn-NO   Norwegian (Nynorsk)
no-NO   Norwegian
pl-PL   Polish
pt-BR   Portuguese (Brazil)
pt-PT   Portuguese (Portugal)
ro-RO   Romanian
ru-RU   Russian
sk-SK   Slovak
sl-SI   Slovene
sv-SE   Swedish
th-TH   Thai
tr-TR   Turkish
uk-UA   Ukranian
vi-VN   Vietnamese
zh-CN   Chinese (PRC)
zh-HK   Chinese (Hong Kong)
zh-TW   Chinese (Taiwan)

You may need to map your application's language codes to TAUS Data's. New langs are considered for addition on a case-by-case basis.

Attrs

Every TM in TAUS Data has several attributes, or "attrs". They are: industry, content_type, owner, product, provider. This list is not expected to change. Attrs have numerical values, and each value is associated with a name for display.

The values for provider and owner are organizations. All of the attrs for a TM are chosen by the uploader, except for provider which is the organization of the uploader. The uploader can specify as the owner one of the organizations on whose behalf they are permitted to upload. The uploader can specify as the product one of the products belonging to the owner.

Attr values will never be reused, ie, a given attr value will always refer to the same entity. The list of values for the industry and content_type attrs will change rarely. The current lists are:
TAUS Data Industries
Value  Name
1      Automotive Manufacturing
2      Consumer Electronics
3      Computer Software
4      Computer Hardware
5      Industrial Manufacturing
6      Telecommunications
7      Professional and Business Services
8      Stores and Retail Distribution
9      Industrial Electronics
10     Legal Services
11     Energy, Water and Utilities
12     Financials
13     Medical Equipment and Supplies
14     Healthcare
15     Pharmaceuticals and Biotechnology
16     Chemicals
17     Undefined Sector
18     Leisure, Tourism, and Arts

TAUS Data Content Types
Value  Name
1      Instructions for Use
2      Sales and Marketing Material
4      Policies, Process and Procedures
5      Software Strings and Documentation
6      Undefined Content Type
7      News Announcements, Reports and Research
8      Patents
9      Standards, Statutes and Regulations
10     Financial Documentation
12     Support Content

You may wish to map these to categories in your application.

Listing Langs and Attrs

The Service provides calls for listing langs and attr values. The GET /lang call lists langs, and the GET /attr/<attr> family of calls lists attrs, for example GET /attr/industry lists industries. All of these calls are similar: they take as parameters other langs and attrs that may limit the results, and return a list of objects of the type requested. Depending on the purpose for which you are listing them, you may want different listings. For example, if you want to download data, you probably only want to see combinations of langs and attrs for which data is available. The different ways you can list langs and attrs, and what you get back, are described here.

The "purpose" of the listing is given by the for parameter. When for is not given, the default is to list every value in the system. Other parameters are ignored. For example, GET .../attr/industry.json lists all industries. However, for privacy reasons, you may not list all owners, products, or providers.

When the for parameter is upload, all combinations of attrs that could be applied to an upload by this organization are listed:

When the for parameter is download, only langs and attrs for which there is data to download are listed. When listing langs for download, you must set the side parameter to either source, to list langs for which there is data with that source lang, or target, to list langs for which there is data with that target lang. When listing langs, you may also set the source_lang or target_lang parameter (whichever is the opposite of the side parameter) to list only langs for which there is data with that source or target lang. For example, to list all langs for which there is data with that lang as the target lang and en-US as the source lang, GET .../lang.json?for=download&side=target&source_lang=en-US. When listing attrs for download, the lang and other attr parameters limit the listing to values for which there is data with all of those langs and attrs. For example, to list all owners for which there is data with that owner, en-US as the source lang, fr-FR as the target lang, and 3 as the industry, GET .../attr/owner.json?for=download&source_lang=en-US&target_lang=fr-FR&industry=3.
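The listing rules can be captured in small URL builders. A minimal Python sketch; the helper names are illustrative, BASE_URL assumes the secure production URL, and purpose may equally be segment or leverage, which work like download.

```python
import urllib.parse

# Assumed: the secure production base URL.
BASE_URL = "https://www.tausdata.org/api"


def lang_listing_url(side: str, other_lang: str = "", purpose: str = "download") -> str:
    """URL listing langs on one side, optionally paired with other_lang.

    side is "source" or "target"; other_lang, if given, is sent as the
    opposite side's lang parameter.
    """
    if side not in ("source", "target"):
        raise ValueError("side must be 'source' or 'target'")
    params = {"for": purpose, "side": side}
    if other_lang:
        opposite = "target_lang" if side == "source" else "source_lang"
        params[opposite] = other_lang
    return f"{BASE_URL}/lang.json?{urllib.parse.urlencode(params)}"


def attr_listing_url(attr: str, purpose: str = "download", **criteria) -> str:
    """URL listing values of one attr, limited by langs and other attrs."""
    params = {"for": purpose, **criteria}
    return f"{BASE_URL}/attr/{attr}.json?{urllib.parse.urlencode(params)}"
```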

When the for parameter is segment, only langs and attrs for which there may be search results are listed. This works exactly like listing for download, but the results exclude combinations that have not been indexed for search or are not supported for search.

When the for parameter is leverage, only langs and attrs for which there may be leverage results are listed. This works exactly like listing for download, but the results exclude combinations that have not been indexed for leverage or are not supported for leverage.

Miscellany

Some calls return English language messages and names. Messages are usually diagnostics, such as the contents of the message key that comes with error responses. Names are the human-readable designations for langs, attrs, and other objects. There is no way to request messages and names in other languages or to translate them. Pedantically, messages do not begin with a capital or end with punctuation.

All keywords are singular, even when they would be more natural as plural, in order to avoid dealing with grammatical irregularities.

Use Cases

The following sections walk through typical use cases.

Search

In this use case, you wish to search for segments containing or similar to a given query. Please note the Service provides only a simplified version of TAUS Search. Word attributes (lemma and part of speech) are not available, nor are computed translations.

The first step is to identify the langs and attributes of the data you wish to search. Source and target langs are required, and all other attrs are optional. If your application already knows the langs and attrs of the data to search, it can use them. Otherwise, you should request listings of the langs and attrs with the for parameter set to segment, to get langs and attrs for which data is searchable.

To search, call GET /segment with the query, the langs, and the attr criteria. The result is a list of segments, with their attrs. The source and target text are available as plain text strings. To report problems with segments, see POST /segment/<id>?action=report_problem.

Leverage

In this use case, you wish to leverage TAUS Data data for a translation job in XLIFF format. The Service will produce an updated XLIFF file containing matches (both exact and fuzzy) from TAUS Data.

The first step is to identify the langs and attributes of the data you wish to leverage. Source and target langs are required, and all other attrs are optional. If your application already knows the langs and attrs of the data to leverage, it can use them. Otherwise, you should request listings of the langs and attrs with the for parameter set to leverage, to get langs and attrs for which data can be leveraged.

There are several things you need to know about leverage requests. First, the file uploaded with every leverage request must be a zip archive containing a single XLIFF file. The XLIFF file must be valid XLIFF, and must be bilingual in the source and target langs. Second, we only search for matches for data that needs translation, so you may upload the full XLIFF for your translation job, and we will ignore everything that is not translatable or already translated. On the other hand, a full file takes longer to upload and process, so you may wish to prepare a smaller XLIFF document containing only the data for which you want leverage. Third, the result XLIFF will contain matches in <alt-trans> elements, with the origin attribute set to TAUS Data and the match-quality attribute set to a number from 0 to 99 (never 100), followed by a percent sign (%). The <target> element in the <alt-trans> will have the state-qualifier attribute set to leveraged-repository.
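Before uploading, an application can package the XLIFF file and check its langs locally. A minimal Python sketch, assuming XLIFF 1.2 (whose namespace is urn:oasis:names:tc:xliff:document:1.2, matching the <alt-trans> and <phase-group> elements used here); zip_xliff, xliff_lang_pairs, and the archive member name job.xlf are illustrative.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Assumed: XLIFF 1.2 namespace.
XLIFF_NS = "{urn:oasis:names:tc:xliff:document:1.2}"


def zip_xliff(xliff_bytes: bytes, name: str = "job.xlf") -> bytes:
    """Wrap a single XLIFF document in an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, xliff_bytes)
    return buffer.getvalue()


def xliff_lang_pairs(xliff_bytes: bytes) -> set:
    """Return the (source-language, target-language) pairs of all <file> elements."""
    root = ET.fromstring(xliff_bytes)
    return {
        (f.get("source-language"), f.get("target-language"))
        for f in root.iter(f"{XLIFF_NS}file")
    }
```

A bilingual file should yield exactly one pair. This exact, case-sensitive comparison is stricter than the Service, which interprets lang codes in XLIFF by the same rules as for TMX, so normalize codes before comparing them with source_lang and target_lang.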

Here is an example XLIFF file that you may use as a starting point. It has a mix of data requiring translation and data not requiring translation. If you submit it for leverage, you will get matches for the <trans-unit>s marked with leverage comments, but none for the <trans-unit>s marked with no leverage comments.

To leverage, make a POST /leverage?action=create request with the zipped XLIFF file, along with the langs and attr criteria. A successful response indicates that the file has passed the first round of checks: the XLIFF file was extracted, the beginning of the file was parsed successfully, and the lang codes in the file match the source_lang and target_lang parameters. Then you must make a POST /leverage/<id>?action=confirm request. You may combine the two steps by setting the confirm parameter when creating the leverage request.

TAUS Data will begin processing the leverage request. To monitor progress, you should make polling GET /leverage/<id> requests until the leverage request passes through the processing and not_ready states. Please use a poll interval of at least one minute. If a problem is found with the XLIFF file, the state will go to error, and the reason and message fields will contain information about the problem. Otherwise, the state will go to ready, and the user who created the leverage request will get an email notification with more details. Finally, you must make a POST /leverage/<id>?action=approve request, sending the leverage request to the success state, perhaps after receiving positive approval from the user. You can skip the approval step by setting the approve parameter when creating the leverage request. On approval, the leverage is charged to the organization's account, and you may retrieve the results as a zipped XLIFF file by calling GET /leverage/<id>/result.xlf.zip.
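The polling step can be sketched as a loop that stops at ready, error, or success and never polls more often than once a minute. A minimal Python sketch; it assumes the GET /leverage/<id> result carries the state in a field named state, and takes the fetching and sleeping functions as arguments so the loop can be exercised without the network. wait_for_leverage is an illustrative name.

```python
import time

MIN_POLL_INTERVAL = 60  # seconds; the Service asks for at least one minute
# States in which polling can stop, per the text above.
FINAL_STATES = ("ready", "error", "success")


def wait_for_leverage(fetch, sleep=time.sleep, interval=MIN_POLL_INTERVAL) -> dict:
    """Poll until a leverage request reaches ready, error, or success.

    fetch is any callable returning the decoded GET /leverage/<id> result;
    the state is assumed to be in a field named "state".
    """
    if interval < MIN_POLL_INTERVAL:
        raise ValueError("poll interval must be at least one minute")
    while True:
        result = fetch()
        if result["state"] in FINAL_STATES:
            return result
        sleep(interval)
```

On an error result, the reason and message fields explain the problem; on ready, the application can show the summary counts before approving.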

You can get a summary of the leverage results when the request is in the ready or success state. You may wish to display this to the user before approving the leverage request. First, GET /leverage/<id> will have the segment_count and word_count fields set to the number of segments and words in the original XLIFF file that need translation, and the leverage_segment_count and leverage_word_count fields set to the total number of segments and words found as matches, that will be in the result XLIFF (this is the number of words that will be charged to the organization's account). Second, you can call GET /leverage/<id>/match_counts to get a detailed break-down of matches by score range: how many segments and words were matched in the 95-99 range, in the 85-94 range, etc.

We aim both to follow the XLIFF specification and to accommodate common practice, so that we interoperate with many other tools out of the box. However, to ensure compatibility, you may wish to know exactly how we process and produce XLIFF.

The rules for interpreting lang codes in XLIFF are the same as for TMX.

Here is how we determine to which <trans-unit>s to apply leverage:

  1. We skip <trans-unit>s with the translate attribute set to no, either directly or inherited from an enclosing <group>.
  2. We skip <trans-unit>s with a <context> with its match-mandatory attribute set to yes, either directly or inherited from an enclosing <group>.
  3. For the remaining <trans-unit>s, we apply leverage in these cases:
    1. there is no <target>, or
    2. the <target> has its state attribute set to new or needs-translation, or
    3. the <target> has no state attribute, and the text content of the <target> is empty or equal to the text content of the <source>.
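The selection rules above can be checked locally before uploading, for example to estimate how much of a file will be leveraged. A minimal Python sketch assuming XLIFF 1.2, where <context> elements sit inside <context-group>. It treats translate as inherited from the nearest enclosing element that sets it (an interpretation of "inherited"), and a mandatory context on any enclosing <group> as applying to all units inside it. units_to_leverage is an illustrative name, and <seg-source> handling is not covered.

```python
import xml.etree.ElementTree as ET

# Assumed: XLIFF 1.2 namespace.
NS = "{urn:oasis:names:tc:xliff:document:1.2}"


def _text(element) -> str:
    """Text content of an element, or "" if the element is missing."""
    return "".join(element.itertext()) if element is not None else ""


def _has_mandatory_context(element) -> bool:
    """True if the element's own <context-group>s hold a match-mandatory context."""
    return any(
        ctx.get("match-mandatory") == "yes"
        for group in element.findall(f"{NS}context-group")
        for ctx in group.findall(f"{NS}context")
    )


def _needs_leverage(unit) -> bool:
    """Rule 3: no target, a new/needs-translation state, or an untranslated target."""
    target = unit.find(f"{NS}target")
    if target is None:
        return True
    state = target.get("state")
    if state is not None:
        return state in ("new", "needs-translation")
    text = _text(target)
    return text == "" or text == _text(unit.find(f"{NS}source"))


def units_to_leverage(element, translate="yes", mandatory=False):
    """Yield ids of <trans-unit>s under element that would receive leverage."""
    for child in element:
        if child.tag in (f"{NS}group", f"{NS}trans-unit"):
            child_translate = child.get("translate", translate)
            child_mandatory = mandatory or _has_mandatory_context(child)
            if child.tag == f"{NS}group":
                yield from units_to_leverage(child, child_translate, child_mandatory)
            elif child_translate != "no" and not child_mandatory and _needs_leverage(child):
                yield child.get("id")
        else:
            # <file>, <body>, and other containers pass inherited settings down.
            yield from units_to_leverage(child, translate, mandatory)
```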

For <trans-unit>s with a <seg-source> we apply leverage to each <mrk> element of the <seg-source>. Each <mrk> element must have its mtype attribute set to seg. For <trans-unit>s without a <seg-source> we apply leverage to the <source> element.

The result XLIFF document is UTF-8 encoded, and the XML declaration is changed to reflect this. Otherwise, the result preserves the original XLIFF and only makes additions. The additions are:

  1. We add XML namespace declarations for the TAUS Data namespace used below.

  2. We add a <tool> element to each <header>, with the tool-company attribute set to TAUS Data, the tool-name attribute set to Translation Matching, and the tool-version attribute set to the version of the leverage service, currently .99.

  3. We add a <phase-group> element to each <header>, containing a single <phase> element with the tool-id attribute referring to the <tool> element added to the <header>, the process-name attribute set to repository leverage, the job-id attribute set to the leverage request resource name, eg. leverage/17, the company-name, contact-name, contact-email, and contact-phone attributes filled in (as available) with information about the user and organization that created the leverage request, and the date attribute set to the time the XLIFF file was created.

  4. In every <trans-unit> to which we apply leverage, we add zero or more <alt-trans> elements. They are ordered by the corresponding <mrk> element of the <seg-source>, if there is one, then from best match to worst, and placed before any existing <alt-trans> elements. Each contains a <source> and <target>, both containing only text. The tool-id and phase-name attributes refer to the <tool> and <phase> elements added to the <header>, the origin attribute is set to TAUS Data followed by the data owner in parentheses, the alttranstype attribute is set to proposal, the datatype attribute is set to plaintext, and the match-quality attribute is set to a number from 0 to 99 followed by a percent sign (%). (We never set match-quality to 100%.) If the <alt-trans> is for a <mrk> element of a <seg-source>, any mid attribute from the <mrk> is copied to the <alt-trans>. The <alt-trans> element also has several TAUS Data specific attributes in the http://www.tausdata.org/xml/common namespace: segment is the TAUS Data segment ID, and industry, content_type, owner, product, provider are the display names of the segment attr values. The <target> of the <alt-trans> has its state-qualifier attribute set to leveraged-repository.

    Here is an example <alt-trans> generated by leverage:

    <alt-trans 
        tool-id="org.tausdata.leverage"
        phase-name="org.tausdata.leverage"
        origin="TAUS Data"
        alttranstype="proposal"
        datatype="plaintext"
        match-quality="88%"
        tda:segment="en-us_fr-fr_5451541"
        tda:industry="Computer Software"
        tda:content_type="Instructions for Use"
        tda:owner="TDA"
        tda:product="Default"
        tda:provider="TDA">
      <source>source text</source>
      <target state-qualifier="leveraged-repository">target text</target>
    </alt-trans>
    

Note that we never modify the <target> of a <trans-unit>, or its state attribute.
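Clients that post-process the returned file usually want the proposals in a flat form. The following is a minimal sketch using only the Python standard library. It assumes the result uses the XLIFF 1.2 namespace urn:oasis:names:tc:xliff:document:1.2 and declares a prefix for http://www.tausdata.org/xml/common somewhere in the file; the function name leveraged_matches is hypothetical. It selects <alt-trans> elements by the leveraged-repository state-qualifier described above, so proposals from other tools are skipped.

```python
import xml.etree.ElementTree as ET

# Assumption: the result file is XLIFF 1.2. The tda namespace URI is the one
# named in this document.
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
TDA_NS = "http://www.tausdata.org/xml/common"


def leveraged_matches(xliff_text):
    """Yield (trans-unit id, match percent, source, target, TAUS segment id)
    for every <alt-trans> added by repository leverage (hypothetical helper)."""
    root = ET.fromstring(xliff_text)
    for unit in root.iter(f"{{{XLIFF_NS}}}trans-unit"):
        for alt in unit.findall(f"{{{XLIFF_NS}}}alt-trans"):
            target = alt.find(f"{{{XLIFF_NS}}}target")
            # Keep only proposals whose target is marked as leveraged-repository.
            if target is None or target.get("state-qualifier") != "leveraged-repository":
                continue
            source = alt.find(f"{{{XLIFF_NS}}}source")
            source_text = source.text if source is not None and source.text else ""
            # match-quality looks like "88%"; strip the percent sign.
            quality = int(alt.get("match-quality", "0%").rstrip("%"))
            yield (
                unit.get("id"),
                quality,
                source_text,
                target.text or "",
                alt.get(f"{{{TDA_NS}}}segment"),
            )
```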

Download

In this use case, you wish to download TM data. The first step is to identify the langs and attributes of the data you wish to download. Source and target langs are required, and all other attrs are optional. If your application already knows the langs and attrs of the data to download, it can use them. Otherwise, you should request listings of the langs and attrs with the for parameter set to download, to get langs and attrs for which data is available.

There are several things you need to know about downloads. First, every download includes only data that has not already been downloaded by the same organization. To get data that was already downloaded, you must re-request the old download. Second, some organizations have download limits and may be charged for excess downloads. If a download would exceed a hard limit, it is disallowed, but there is no way to tell whether an organization is over a soft limit and will be charged for a download. Download limits may vary depending on the source and target langs of the download. Third, there is no way to request only some of the words meeting the given criteria; to limit the number of words, you must refine the criteria. Fourth, data provided by the organization creating the download still counts towards its download limit if it is included in the download. There is an option to exclude data provided by the organization requesting the download.

You may wish to check how much data is available, and whether it is within the organization's limit, before creating a download. For this, call GET /counts with the chosen langs and attrs. For example, to get counts for data with en-US as the source lang, fr-FR as the target lang, and 3 as the industry, GET .../counts.json?for=download&source_lang=en-US&target_lang=fr-FR&industry=3. The result has several useful fields. word_count is the total number of words meeting the given criteria. new_word_count and new_segment_count are the numbers of words and segments that have not been downloaded by the organization (and would be included in a new download); old_word_count is the number of words that have already been downloaded by the organization. If the organization has a download limit for this source and target lang, it is given by limit. Finally, within_limit is true if the download would be allowed according to the organization's limits, and false otherwise.
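A minimal sketch of the counts check, assuming the parsed JSON result is available as a dict. The base_url argument is a hypothetical stand-in for the "..." prefix used in this document's examples, and both function names are illustrative only.

```python
from urllib.parse import urlencode


def counts_url(base_url, source_lang, target_lang, **attrs):
    """Build a GET /counts URL for the download use case.

    base_url is a hypothetical placeholder for the elided "..." prefix;
    attrs holds optional attr criteria such as industry=3."""
    params = {"for": "download", "source_lang": source_lang, "target_lang": target_lang}
    params.update(attrs)
    return f"{base_url}/counts.json?{urlencode(params)}"


def download_check(counts):
    """Return (allowed, words_in_new_download) from a parsed /counts result.

    allowed mirrors within_limit; nothing in the result says whether a
    soft limit would cause the organization to be charged."""
    return bool(counts["within_limit"]), counts["new_word_count"]
```

Comparing new_word_count with what the user expects before creating the download avoids surprises, since only data not yet downloaded by the organization will be included.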

When you are ready to download, make a POST /download?action=create request with the langs and attr criteria. This fixes the exact set of data to be downloaded. You may examine the result to make sure the download has the amount of data expected (in case it has changed since your last call to /counts), or perhaps confirm it with the user. Then you must make a POST /download/<id>?action=confirm request. You may combine the two steps by setting the confirm parameter when creating the download. If a download is never confirmed, it does not count towards the organization's limit.
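The create and confirm calls can be prepared with urllib.request.Request. This is a sketch under stated assumptions: the action parameter stays in the query string as written in the call headings, the other parameters are sent as an HTML-form body, and "true" is assumed to be an acceptable value for the confirm flag. The helper names are hypothetical; sending the requests (with authentication) is left to the caller.

```python
from urllib.parse import urlencode
from urllib.request import Request


def create_download_request(base_url, criteria, confirm=False):
    """Prepare POST /download?action=create with the langs and attrs as a form body.

    Assumption: confirm=True sends confirm=true, combining create and confirm."""
    fields = dict(criteria)  # copy so the caller's dict is not modified
    if confirm:
        fields["confirm"] = "true"
    return Request(
        f"{base_url}/download.json?action=create",
        data=urlencode(fields).encode("ascii"),
        method="POST",
    )


def confirm_download_request(base_url, download_id):
    """Prepare POST /download/<id>?action=confirm for an unconfirmed download."""
    return Request(
        f"{base_url}/download/{download_id}.json?action=confirm",
        data=b"",
        method="POST",
    )
```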

Once the download is confirmed, poll with GET /download/<id> requests until the download is in the ready state. At that point, you may retrieve the download as a zipped TMX file by calling GET /download/<id>.zip. Alternatively, you may simply make GET /download/<id>.zip requests; if the file is not yet ready, the response will have a 503 not_ready status code. Please use a poll interval of at least one minute.
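The second polling approach (retrying the .zip request on 503) can be sketched as follows. The fetch argument is a hypothetical caller-supplied function that performs GET /download/<id>.zip and returns (http_status, body_bytes); with urllib, a 503 arrives as an HTTPError whose code attribute is 503, so such a wrapper would catch it and return the status. The sleep function is injectable so the loop can be tested without waiting.

```python
import time

MIN_POLL_SECONDS = 60  # the Service asks for a poll interval of at least one minute


def poll_until_ready(fetch, max_attempts=120, sleep=time.sleep):
    """Call fetch() until the Service stops answering 503 not_ready.

    fetch is hypothetical: it performs GET /download/<id>.zip and returns
    (http_status, body_bytes)."""
    for attempt in range(max_attempts):
        status, body = fetch()
        if status == 200:
            return body  # the zipped TMX file
        if status != 503:
            raise RuntimeError(f"unexpected HTTP status {status}")
        if attempt + 1 < max_attempts:
            sleep(MIN_POLL_SECONDS)
    raise TimeoutError("download still not ready after polling")
```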

TMX files downloaded from TAUS Data follow the "Level 1" implementation level defined by TMX 1.4b. All segments are plain text with no inline codes.

Upload

In this use case, you wish to upload a TM to TAUS Data. The first step is to choose the langs and attributes of the TM. Both source and target langs and all attributes except provider and owner are required. provider will be the organization performing the upload, and cannot be set. owner may be set to any of the organizations on whose behalf the provider is permitted to upload; if not given, it defaults to the provider. If your application already knows the langs and attrs of the TM, it can use them. Otherwise, you should request listings of the langs and attrs with the for parameter set to upload.

(There is no way to add new owners and products using the Service. There is also no way to grant permission to another organization to upload on your behalf. Please contact TAUS Data for help with this.)

There are several things you need to know about uploads. First, every upload must be a zip archive containing a single TMX file. The TMX file must be valid TMX, and should contain data for the source and target langs only. Second, TAUS Data filters duplicate data, so if you upload a TM and then re-upload it after adding new translations, only the new translations will be saved. On the other hand, this will take longer to upload and process, so make a reasonable effort to upload only new data. Third, uploaded TMs are available for download, leverage, and search.
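Since every upload must be a zip archive containing a single TMX file, a client can build the archive in memory. This is a minimal sketch with the standard zipfile module; the member name memory.tmx is an arbitrary, hypothetical default, and the function does not validate the TMX itself.

```python
import io
import zipfile


def zip_single_tmx(tmx_bytes, name="memory.tmx"):
    """Pack one TMX document into an in-memory zip archive for upload.

    The member name is an illustrative default, not a Service requirement."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, tmx_bytes)  # exactly one file in the archive
    return buf.getvalue()
```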

When you are ready to upload, make a POST /upload?action=create request with the zipped TMX file, along with the langs and attrs. A successful response indicates that the file has passed the first round of checks: the TMX file was extracted, the beginning of the file was parsed successfully, and the lang codes in the file match the source_lang and target_lang parameters. Then you must make a POST /upload/<id>?action=confirm request. You may combine the two steps by setting the confirm parameter when creating the upload.

TAUS Data will begin processing the file. To monitor progress, you should make polling GET /upload/<id> requests until the upload leaves the processing state. Please use a poll interval of at least one minute. If a problem is found with the TMX file, the state will go to error, and the reason and message fields will contain information about the problem. Otherwise, the state will go to ready, and the user who created the upload will get an email notification with more details. Finally, you must make a POST /upload/<id>?action=approve request, sending the upload to the success state, perhaps after receiving positive approval from the user. You can skip the approval step by setting the approve parameter when creating the upload. On approval, the upload is credited to the organization's account and the data is available for download. There may be a delay before the data is available for leverage and search.
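The states described above map naturally onto a small decision function. This sketch covers only the upload states named in this paragraph (processing, error, ready, success); it assumes the caller passes the inner upload map from a parsed GET /upload/<id> result, and the function name and return strings are hypothetical.

```python
def next_upload_step(upload):
    """Decide the next client action for a parsed upload map.

    Returns "wait", "approve", or "done"; raises on error or unknown states."""
    state = upload["state"]
    if state == "processing":
        return "wait"  # poll GET /upload/<id> again after at least one minute
    if state == "ready":
        return "approve"  # POST /upload/<id>?action=approve, perhaps after asking the user
    if state == "success":
        return "done"
    if state == "error":
        # reason and message describe the problem found in the TMX file
        raise RuntimeError(f"{upload.get('reason')}: {upload.get('message')}")
    raise ValueError(f"unhandled upload state {state!r}")
```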

Although it will probably not be an issue, you may wish to know how TAUS Data interprets lang codes within the TMX file. We look for lang codes that are "compatible" with the source and target langs given in POST /upload?action=create. The current definition of compatible is that the language prefixes are the same (case-insensitive). For example, if you upload a TM giving a target_lang of es-XL, the TMX file may use the code es-XL, es-AR, or just es. However, the code must be formatted either as a two-letter language code, or as a two-letter language code followed by a dash ("-") and a two-letter region code. So a code such as Spanish would not be recognized, and the TMX file would not be accepted. Also, the same lang code must be used throughout the TMX file; it may not start with es-XL and switch to es. TAUS Data will relax these restrictions to accommodate real-world needs. We will try never to reject lang codes that were previously accepted.
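A client that wants to pre-check a TMX file before uploading could mirror the current rule as follows. This is a sketch of the rule as stated above, not the Service's own code, and it will be stricter than the Service if the restrictions are relaxed; the function names are hypothetical.

```python
import re

# Current rule: "xx" or "xx-YY" (two letters, optional two-letter region),
# compared case-insensitively on the language prefix.
_LANG_CODE = re.compile(r"([A-Za-z]{2})(?:-([A-Za-z]{2}))?")


def is_compatible(tmx_code, declared_lang):
    """True if a lang code from the TMX file is compatible with the
    source_lang or target_lang given when creating the upload."""
    m = _LANG_CODE.fullmatch(tmx_code)
    if m is None:
        return False  # e.g. "Spanish" is not a recognized code shape
    return m.group(1).lower() == declared_lang.split("-")[0].lower()


def check_tmx_langs(codes, declared_lang):
    """codes: every lang code used for one side of the TMX file.
    The same code must be used throughout, and it must be compatible."""
    distinct = set(codes)
    return len(distinct) == 1 and is_compatible(distinct.pop(), declared_lang)
```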

Note that while the TAUS Data's Data Pooling web interface supports automatic lang detection, this function is not available in the Service. We expect that applications using the upload API already know the langs of the TMs they are uploading. If this is not the case for your application and you would benefit from lang detection, contact TAUS Data. Also, note that there is an upload state incomplete to accommodate cases where the langs can't be detected; the source_lang and target_lang fields are optional in upload results for the same reason. You may run across this case now, with uploads created using the Data Pooling web interface; however there is nothing you can do with them using the Service.

Developer Guidance

Choosing a Response Format

We recommend the JSON format, as it closely corresponds to the structure of the result and will require less decoding on your end. But JSON and XML are equally supported.

Choosing an Authentication Method

We recommend using HTTP basic authentication for desktop apps, and connect authentication for web apps (including AJAX apps).
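For a desktop app using basic authentication, the Authorization header value is the standard encoding from RFC 7617: "Basic " followed by the base64 of username:password. A minimal sketch (the helper name is hypothetical); since the credentials are only encoded, not encrypted, it should be sent over HTTPS.

```python
import base64


def basic_auth_header(username, password):
    """Build an HTTP basic authentication header value (RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"
```

A client would then attach it to each request, for example with Request.add_header("Authorization", basic_auth_header(user, password)).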

TAUS Data Registration and Membership

Your users must have TAUS Data user accounts in order for your application to authenticate with the Service. TAUS Data user accounts are free to the public. You may direct users to http://www.tausdata.org/index.php/component/users/?view=registration to register for an account. To download or leverage data, users must have a TAUS Data membership (or belong to a member organization). You may direct users to https://www.tausdata.org/index.php/members/join-tda for information on joining TAUS Data.

Lang and Attr Selection Forms

When writing a user interface for selecting langs and attrs, we recommend modeling them on the UIs used at tausdata.org. This will provide users with a consistent experience. We recommend that every lang or attr constrain the listings following it on the screen (and only these).
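The "each field constrains only the listings after it" recommendation can be sketched as a URL builder. Several assumptions are labeled here: the field order is an illustrative choice, the listing calls are assumed to accept the same parameter names as GET /counts (source_lang, target_lang, and the attr names), and base_url stands in for the elided "..." prefix. Check the "Listing Langs and Attrs" section for the parameters each listing actually accepts.

```python
from urllib.parse import urlencode

# Illustrative on-screen order of the selection form (an assumption).
FIELD_ORDER = ["source_lang", "target_lang", "industry", "content_type",
               "owner", "product", "provider"]


def listing_url(base_url, field, selections, purpose="download"):
    """URL for the listing that feeds `field`, constrained by the selections
    made in fields shown before it on screen (and only those).

    Assumption: listings accept the same parameter names as GET /counts."""
    earlier = FIELD_ORDER[: FIELD_ORDER.index(field)]
    params = {"for": purpose}
    params.update({f: selections[f] for f in earlier
                   if selections.get(f) not in (None, "")})
    path = "lang" if field.endswith("_lang") else f"attr/{field}"
    return f"{base_url}/{path}.json?{urlencode(params)}"
```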

Cross-site Scripting

If you are developing an AJAX application, cross-site scripting restrictions in the browser will likely prevent you from calling the Service directly. In this case, the best solution is to construct a simple proxy that accepts calls on your web server and forwards them to www.tausdata.org/api.

AJAX File Upload

It is not possible to use XMLHttpRequest, the main engine of AJAX applications, to perform file uploads, due to browser security restrictions. (There is no way to send the file contents.) Instead, you must arrange for the browser to submit your request as a form. To prevent the result from appearing to the user, the form submission is usually targeted to an iframe that is not displayed. Accessing the result of the submission is tricky, and web developers employ many complicated schemes. However, we have found that a call to the Service can be loaded right into an iframe and processed reliably. We describe the method here. (Please contact TAUS Data if you would like these secrets revealed!)

Terms of Use

Users Must Use Their Own TAUS Data Accounts

Every call to the Service must be authenticated with the account of the user who caused the request to be made, or of a user within the same TAUS Data member. Do not use your own TAUS Data account to authenticate calls made for arbitrary users.

TM Sharing Conditions

Users who do not belong to a TAUS Data member must read and agree to the TM Sharing Conditions before they may upload TMs. If your application allows users to upload TMs, it must enforce this requirement as well. Specifically, before submitting an upload, you must perform these steps:

  1. Check whether the current user needs to agree to the TM Sharing Conditions. Call GET /user with the self parameter set. The must_agree_to_upload_terms field of the result tells you whether the user needs to agree to the TM Sharing Conditions.
  2. Display the TM Sharing Conditions so that the user can easily read them. The TM Sharing Conditions are available by calling GET /upload_terms.txt. You should request the TM Sharing Conditions every time you need to display them, so that any changes will be reflected in your application.
  3. Give the user a way to positively and unambiguously agree to the terms. For example, you may display a check-box next to the words, "Check to indicate that you have read and agree to the TM Sharing Conditions.", and only make the upload if the check-box is checked.
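The gate described in these steps can be expressed as a small check run before submitting an upload. This sketch assumes user_result is the parsed result of GET /user with self set, whose user key is a list holding the authenticated user; agreed_checkbox is a hypothetical flag from your UI. Missing fields are treated conservatively. Fetching and displaying GET /upload_terms.txt each time is still up to the UI.

```python
def upload_allowed(user_result, agreed_checkbox):
    """Apply the TM Sharing Conditions rule before an upload.

    user_result: parsed GET /user result (self=true); agreed_checkbox:
    hypothetical UI flag set only when the user positively agreed."""
    me = user_result["user"][0]
    if not me.get("can_upload", False):
        return False  # the user is not permitted to upload at all
    if me.get("must_agree_to_upload_terms", True):
        return bool(agreed_checkbox)  # upload only after explicit agreement
    return True
```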

Compatibility

We intend to support the Service as specified here indefinitely, with reasonable allowances for growth, including: Calls may be added. Parameters may be added to calls. In results, keys may be added to maps and values may be added to enums. Errors may be added. Various limits may be imposed for performance reasons.

The search algorithm in GET /segment may be changed and the query language may be extended.

Call Listing

The heading of every call contains the method, the resource name, and for POST calls the action parameter. The method listed must be used; you cannot use GET for a call documented to use POST. If the resource has an extension, the call has an unusual result. Otherwise, you must choose the result format by adding a .json, .xml, or .html extension. If the resource name contains <id>, an entity id is part of the resource name, for example GET .../lang/en-US.json for GET /lang/<id>. The following information is then listed, as applicable:

Id Type:
The type of the <id> in the resource name.
Parameters:
For each named parameter accepted by the call, its name; type (one of the basic types); its default value (if optional); and whether it can be repeated multiple times. Optional parameters are in italics. For optional parameters, if you give the empty string as the parameter value, it will be treated the same as if you gave no value at all, and the default (if any) will be used.
Result:
The schema of the result, in a JSON-like notation. Keys that may not be present in every result are in italics. The following represents a map, with one key lang, whose value is a list of maps, each of which has key id with value of type lang and another key name with value of type string.
{ lang: [{ id: lang,
           name: string }] }
The status and reason keys are implicit.
Errors:
Possible errors that are specific to the call. As with common errors, they are returned in the HTTP status line and the status and reason keys of the result, and an explanatory message is returned in the X-TDA-Message header and the message key of the result.
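Because errors appear both in the HTTP status line and in the status, reason, and message keys, a JSON client can centralize the check. A minimal sketch, assuming the body has already been read as text and that any 2xx status means success; ServiceError and check_result are hypothetical names. The X-TDA-Message header value can be passed in as a fallback for the message.

```python
import json


class ServiceError(Exception):
    """A common or call-specific error reported by the Service."""

    def __init__(self, status, reason, message):
        super().__init__(f"{status} {reason}: {message}")
        self.status = status
        self.reason = reason
        self.message = message


def check_result(body_text, header_message=None):
    """Parse a JSON result and raise ServiceError unless status is 2xx.

    header_message: X-TDA-Message header value, used if the body has no message."""
    result = json.loads(body_text)
    status = result.get("status")
    if not isinstance(status, int) or not 200 <= status < 300:
        raise ServiceError(status, result.get("reason"),
                           result.get("message", header_message))
    return result
```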

GET /status

Get the status of the Service. Useful for checking that the system is alive and you can talk to it. Example: GET .../status.json.

GET /user

Parameters:
Result:
{ user: [{ id: natural,
           organization: { id: natural,
                           name: string },
           can_search: boolean,
           can_download: boolean,
           can_leverage: boolean,
           can_upload: boolean,
           must_agree_to_upload_terms: boolean }] }

Get a listing of users. Currently, only the authorized user of the request is returned. The self parameter must be set to true to make this explicit (setting it to false is currently an error). Some result fields are optional for future extension, but will always be set when self is true.

The can_* fields indicate whether the user is allowed to perform those functions. Currently, can_search is always true; can_download and can_leverage are true for TAUS Data member users only, except for those who have been explicitly denied download permission; and can_upload is always true, except for TAUS Data member users who have been explicitly denied upload permission.

See the terms of use for the must_agree_to_upload_terms field.

POST /auth_key?action=connect

Parameters:
Result:
{ auth_key: { id: string,
              manage_url: string } }
Create an auth key by the connect method. This call does not require authentication, but you must supply an app key. The created auth key will be inactive; direct the user to the manage_url to activate it. After activation, the user will be redirected to the redirect_url.

POST /auth_key?action=login

Result:
{ auth_key: { id: string,
              manage_url: string } }
Create an auth key by the login method. This call must be authenticated, presumably by username and password. The created auth key will be active.

GET /lang

Parameters:
Result:
{ lang: [{ id: lang,
           name: string }] }
Get a listing of langs, sorted by name. See "Listing Langs and Attrs". Example: GET .../lang.json.

GET /lang/<id>

Id Type:
string
Result:
{ lang: { id: lang,
          name: string } }
Get a lang. Example: GET .../lang/en-US.json.

GET /attr

Result:
{ attr: [string] }
Get a listing of attrs. Currently, the attrs are: industry, content_type, owner, product, provider, and this list is not expected to change. Example: GET .../attr.json.

GET /attr/provider

Parameters:
Result:
{ provider: [{ id: natural,
               name: string }] }
Get a listing of providers, sorted by name. See "Listing Langs and Attrs".

GET /attr/owner

Parameters:
Result:
{ owner: [{ id: natural,
            name: string }] }
Get a listing of owners, sorted by name. See "Listing Langs and Attrs".

GET /attr/content_type

Parameters:
Result:
{ content_type: [{ id: natural,
                   name: string }] }
Get a listing of content types, sorted by name. See "Listing Langs and Attrs".

GET /attr/product

Parameters:
Result:
{ product: [{ id: natural,
              name: string }] }
Get a listing of products, sorted by name. See "Listing Langs and Attrs".

GET /attr/industry

Parameters:
Result:
{ industry: [{ id: natural,
               name: string }] }
Get a listing of industries, sorted by name. See "Listing Langs and Attrs".

GET /counts

Parameters:
Result:
{ word_count: natural,
  new_word_count: natural,
  new_segment_count: natural,
  old_word_count: natural,
  limit: natural,
  within_limit: boolean,
  direct_count: natural,
  direct_old_count: natural,
  matrix_count: natural,
  matrix_old_count: natural }
Get counts on the data meeting the given criteria. Example: GET .../counts.json?source_lang=en-US&target_lang=fr-FR. The result fields are described in the download use case.

GET /segment

Parameters:
Result:
{ segment: [{ id: string,
              source_lang: { id: lang,
                             name: string },
              target_lang: { id: lang,
                             name: string },
              source: string,
              target: string,
              industry: { id: natural,
                          name: string },
              content_type: { id: natural,
                              name: string },
              owner: { id: natural,
                       name: string },
              product: { id: natural,
                         name: string },
              provider: { id: natural,
                          name: string } }] }

Search for segments matching the given query, meeting the given criteria. The Service uses two different search methods. If the query is short (four words or fewer), it performs an exact term search, returning segments containing the query words in order, with no intervening words or punctuation. For this search to work, the query words must be space separated (even for languages that do not normally separate words by spaces), without punctuation. (If you need to match punctuation, you can separate it from the words with spaces, in which case the search finds exactly this punctuation.) The results are unordered (but stable over time) and will typically contain segments from a variety of data owners, industries, etc. For example, to find uses of the term "data center": GET .../segment.json?source_lang=en-US&target_lang=fr-FR&q=data+center.

If the query is long or exact term search fails, the Service performs fuzzy whole-segment search, returning segments ordered by their similarity to the query unless the fuzzy optional parameter is set to false. The query does not need to be specially formatted for this search method. For example, to find segments similar to "How do I reset my password?": GET .../segment.json?source_lang=en-US&target_lang=fr-FR&q=how+do+I+reset+my+password%3F.

The direction parameter determines the translation direction of the segments the search will inspect. For example, when searching for English text with English as the source and French as the target, the forward direction searches segments that are translations from English to French. If reverse is specified, the search looks in segments translated from French to English, matching against their English target text. both is an alias for specifying both the forward and reverse directions, and is the default if direction is omitted. The matrix direction always includes the forward direction by default before attempting to find a matrix match.

The exact search algorithm is subject to change, so consider this a statement of intent rather than a guarantee.

The limit parameter is a hint for how many segments you want. The result may in fact contain more or fewer.
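A search URL can be built with urlencode, which produces the same encoding as the examples above (spaces as +, ? as %3F). In this sketch, base_url is a hypothetical stand-in for the "..." prefix. The word-count heuristic only restates the four-word threshold described above; the Service may still fall back to fuzzy search, and the algorithm may change.

```python
from urllib.parse import urlencode

DIRECTIONS = {"forward", "reverse", "both", "matrix"}


def segment_search_url(base_url, source_lang, target_lang, query,
                       direction=None, limit=None):
    """Build a GET /segment search URL (base_url is a placeholder)."""
    params = {"source_lang": source_lang, "target_lang": target_lang, "q": query}
    if direction is not None:
        if direction not in DIRECTIONS:
            raise ValueError(f"unknown direction {direction!r}")
        params["direction"] = direction
    if limit is not None:
        params["limit"] = limit  # only a hint: more or fewer may be returned
    return f"{base_url}/segment.json?{urlencode(params)}"


def likely_exact_term_search(query):
    """Rough guess at the first method tried: exact term search for
    four space-separated words or fewer."""
    return len(query.split()) <= 4
```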

GET /segment/<id>

Id Type:
string
Result:
{ segment: { id: string,
             source_lang: { id: lang,
                            name: string },
             target_lang: { id: lang,
                            name: string },
             source: string,
             target: string,
             industry: { id: natural,
                         name: string },
             content_type: { id: natural,
                             name: string },
             owner: { id: natural,
                      name: string },
             product: { id: natural,
                        name: string },
             provider: { id: natural,
                         name: string } } }
Get the segment with this id.

POST /segment/<id>?action=report_problem

Id Type:
string
Parameters:
Submit a problem report for a segment. The email address on file for the user will be used if one is not given.

GET /leverage

Result:
{ leverage: [{ id: natural,
               user: { id: natural,
                       name: string },
               creation_time: time,
               source_lang: [{ id: lang,
                               name: string }],
               target_lang: [{ id: lang,
                               name: string }],
               industry: { id: natural,
                           name: string },
               content_type: { id: natural,
                               name: string },
               owner: { id: natural,
                        name: string },
               product: { id: natural,
                          name: string },
               provider: { id: natural,
                           name: string },
               state: enum (incomplete, unconfirmed, processing, not_ready, ready, cancelled, success, expired, error),
               word_count: natural,
               segment_count: natural,
               leverage_word_count: natural,
               leverage_segment_count: natural,
               reason: enum (bad_xliff, unsupported_xliff, bad_langs),
               message: string }] }
Get a listing of leverage requests made by this organization, ordered from newest to oldest.

POST /leverage?action=create

Parameters:
Result:
{ leverage: [{ id: natural,
               user: { id: natural,
                       name: string },
               creation_time: time,
               source_lang: [{ id: lang,
                               name: string }],
               target_lang: [{ id: lang,
                               name: string }],
               industry: { id: natural,
                           name: string },
               content_type: { id: natural,
                               name: string },
               owner: { id: natural,
                        name: string },
               product: { id: natural,
                          name: string },
               provider: { id: natural,
                           name: string },
               state: enum (incomplete, unconfirmed, processing, not_ready, ready, cancelled, success, expired, error),
               word_count: natural,
               segment_count: natural,
               leverage_word_count: natural,
               leverage_segment_count: natural,
               reason: enum (bad_xliff, unsupported_xliff, bad_langs),
               message: string }] }
Errors:
400 bad_zip
The zip file could not be unpacked to produce an XLIFF file.
400 bad_xliff
The XLIFF file is invalid.
400 unsupported_xliff
Some features of the XLIFF file are unsupported by the Service; for example, the XLIFF file has more than two langs.
400 bad_langs
The lang codes in the XLIFF file do not match the source_lang and target_lang parameters.
400 over_limit
The organization is over its download limit.

Issue a leverage request for the given zipped XLIFF file. (The XLIFF file must have a .xliff or .xlf extension within the zip file.) Only segments with the given attrs will be used for leverage. By default, if there is not a problem, the leverage request will be in the unconfirmed state. However, setting the confirm parameter has the same effect as immediately calling POST /leverage/<id>?action=confirm. Setting the approve parameter has the same effect as calling POST /leverage/<id>?action=approve as soon as the leverage request reaches the ready state.

Normally, TAUS Data will notify the user who initiated the leverage request when it is ready, or if there is any error. Set the notify parameter to disable this behavior.

There are many things that can go wrong with a leverage request, as reflected by the possible error results. Note that this call checks only the beginning of the file, so even if it is successful, there could be problems later on. Currently, a leverage request created with the Service will never go into the incomplete state.

Only TAUS Data members may leverage, and some member users may not have permission to leverage. See GET /user.

Leverage requests remain active for at least 30 days from their time of creation. After this, they may go into the expired state.

Note that the result schema allows multiple leverage requests to be created. Currently, you will never get more than one, but this may change in the future.
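The leverage request life cycle can be summarized as a mapping from state to the next call. This sketch follows the calls documented in this section (confirm from unconfirmed, approve from ready, download the result in success); treating processing and not_ready as "wait and poll again" is an assumption based on the polling guidance, and the function name is hypothetical.

```python
def next_leverage_action(leverage):
    """Map a leverage request (parsed GET /leverage/<id> map) to the call a
    client would typically make next, or None if nothing more can be done."""
    state = leverage["state"]
    actions = {
        "unconfirmed": "POST /leverage/{id}?action=confirm",
        "processing": "wait",  # assumption: poll again after at least one minute
        "not_ready": "wait",
        "ready": "POST /leverage/{id}?action=approve",  # or cancel
        "success": "GET /leverage/{id}/result.xlf.zip",
    }
    if state in actions:
        return actions[state].format(id=leverage["id"])
    if state == "error":
        raise RuntimeError(f"{leverage.get('reason')}: {leverage.get('message')}")
    return None  # cancelled, expired, incomplete
```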

GET /leverage/<id>

Id Type:
natural
Result:
{ leverage: { id: natural,
              user: { id: natural,
                      name: string },
              creation_time: time,
              source_lang: [{ id: lang,
                              name: string }],
              target_lang: [{ id: lang,
                              name: string }],
              industry: { id: natural,
                          name: string },
              content_type: { id: natural,
                              name: string },
              owner: { id: natural,
                       name: string },
              product: { id: natural,
                         name: string },
              provider: { id: natural,
                          name: string },
              state: enum (incomplete, unconfirmed, processing, not_ready, ready, cancelled, success, expired, error),
              word_count: natural,
              segment_count: natural,
              leverage_word_count: natural,
              leverage_segment_count: natural,
              reason: enum (bad_xliff, unsupported_xliff, bad_langs),
              message: string } }
Get information about a leverage request.

POST /leverage/<id>?action=approve

Id Type:
natural
Result:
{ leverage: { id: natural,
              user: { id: natural,
                      name: string },
              creation_time: time,
              source_lang: [{ id: lang,
                              name: string }],
              target_lang: [{ id: lang,
                              name: string }],
              industry: { id: natural,
                          name: string },
              content_type: { id: natural,
                              name: string },
              owner: { id: natural,
                       name: string },
              product: { id: natural,
                         name: string },
              provider: { id: natural,
                          name: string },
              state: enum (incomplete, unconfirmed, processing, not_ready, ready, cancelled, success, expired, error),
              word_count: natural,
              segment_count: natural,
              leverage_word_count: natural,
              leverage_segment_count: natural,
              reason: enum (bad_xliff, unsupported_xliff, bad_langs),
              message: string } }
Errors:
400 over_limit
Approving the leverage request would put the organization over its download limit.
Approve a leverage request. The request must be in the ready state.

POST /leverage/<id>?action=confirm

Id Type:
natural
Result:
{ leverage: { id: natural,
              user: { id: natural,
                      name: string },
              creation_time: time,
              source_lang: [{ id: lang,
                              name: string }],
              target_lang: [{ id: lang,
                              name: string }],
              industry: { id: natural,
                          name: string },
              content_type: { id: natural,
                              name: string },
              owner: { id: natural,
                       name: string },
              product: { id: natural,
                         name: string },
              provider: { id: natural,
                          name: string },
              state: enum (incomplete, unconfirmed, processing, not_ready, ready, cancelled, success, expired, error),
              word_count: natural,
              segment_count: natural,
              leverage_word_count: natural,
              leverage_segment_count: natural,
              reason: enum (bad_xliff, unsupported_xliff, bad_langs),
              message: string } }
Confirm a leverage request. The request must be in the unconfirmed state.

POST /leverage/<id>?action=cancel

Id Type:
natural
Result:
{ leverage: { id: natural,
              user: { id: natural,
                      name: string },
              creation_time: time,
              source_lang: [{ id: lang,
                              name: string }],
              target_lang: [{ id: lang,
                              name: string }],
              industry: { id: natural,
                          name: string },
              content_type: { id: natural,
                              name: string },
              owner: { id: natural,
                       name: string },
              product: { id: natural,
                         name: string },
              provider: { id: natural,
                          name: string },
              state: enum (incomplete, unconfirmed, processing, not_ready, ready, cancelled, success, expired, error),
              word_count: natural,
              segment_count: natural,
              leverage_word_count: natural,
              leverage_segment_count: natural,
              reason: enum (bad_xliff, unsupported_xliff, bad_langs),
              message: string } }
Cancel a leverage request. The request must be in the unconfirmed state or the ready state.

GET /leverage/<id>/match_counts

Id Type:
natural
Result:
{ match_counts: [{ min_score: natural,
                   max_score: natural,
                   word_count: natural,
                   segment_count: natural }] }
Get a break-down on the leverage results by fuzzy match score range. The leverage request must be in the ready state or the success state. Results are ordered from highest match score range to lowest. The current match score ranges are: 50 to 74, 75 to 84, 85 to 94, and 95 to 99. (This may change in the future.) Within each range, the segment_count and word_count fields are the number of segments and words in the original XLIFF file for which the best match found was in that range.
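Because each segment is counted only in the range of its best match, the ranges can be summed without double counting. A minimal sketch that totals the words whose best match is at or above a chosen threshold; the function name is hypothetical, and the sample numbers in the usage below are made up for illustration.

```python
def words_at_or_above(match_counts_result, threshold):
    """Sum word_count over score ranges whose min_score >= threshold.

    Each segment appears in exactly one range (its best match), so the
    sum counts every word at most once."""
    return sum(r["word_count"]
               for r in match_counts_result["match_counts"]
               if r["min_score"] >= threshold)
```

For example, with the current ranges, words_at_or_above(result, 85) adds the 85 to 94 and 95 to 99 ranges.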

GET /leverage/<id>/result.xlf.zip

Id Type:
natural

Download the leverage results as a zipped XLIFF file. The leverage request must be in the success state. If it is in the not_ready state and the approve parameter was set when the leverage request was created, you will get a 503 not_ready status, and you should try again in 1 minute or longer.
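Both this call and GET /download/<id>.zip can answer 503 not_ready while data is being prepared. A minimal Python sketch of a retry loop that waits at least a minute between attempts follows. fetch_when_ready is a hypothetical name, and the caller is assumed to supply a fetch function that performs the authenticated GET and returns the HTTP status code and response body.

```python
import time
from typing import Callable, Tuple


def fetch_when_ready(
    fetch: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 10,
    wait_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call `fetch` until it returns 200, retrying only on 503 not_ready.

    `fetch` is a caller-supplied (hypothetical) function returning
    (HTTP status, body). Any status other than 200 or 503 is treated as fatal.
    """
    for attempt in range(max_attempts):
        status, body = fetch()
        if status == 200:
            return body
        if status != 503:
            raise RuntimeError(f"unexpected HTTP status {status}")
        if attempt + 1 < max_attempts:
            # The documentation advises waiting 1 minute or longer.
            sleep(wait_seconds)
    raise TimeoutError(f"still not ready after {max_attempts} attempts")
```

The sleep function is injectable so the loop can be exercised without real delays; in production the default time.sleep is used.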

GET /download

Result:
{ download: [{ id: natural,
               user: { id: natural,
                       name: string },
               creation_time: time,
               source_lang: { id: lang,
                              name: string },
               target_lang: { id: lang,
                              name: string },
               industry: { id: natural,
                           name: string },
               content_type: { id: natural,
                               name: string },
               owner: { id: natural,
                        name: string },
               product: { id: natural,
                          name: string },
               provider: { id: natural,
                           name: string },
               state: enum (unconfirmed, not_ready, ready, archived, cancelled),
               word_count: natural,
               segment_count: natural }] }
Get a listing of (confirmed) downloads created by this organization, ordered from newest to oldest.

POST /download?action=create

Parameters:
Result:
{ word_count: natural,
  segment_count: natural,
  download: [{ id: natural,
               user: { id: natural,
                       name: string },
               creation_time: time,
               source_lang: { id: lang,
                              name: string },
               target_lang: { id: lang,
                              name: string },
               industry: { id: natural,
                           name: string },
               content_type: { id: natural,
                               name: string },
               owner: { id: natural,
                        name: string },
               product: { id: natural,
                          name: string },
               provider: { id: natural,
                           name: string },
               state: enum (unconfirmed, not_ready, ready, archived, cancelled),
               word_count: natural,
               segment_count: natural }] }
Errors:
400 no_data
There is no data meeting the given criteria.
400 over_limit
Creating the download would put the organization over its download limit.

Create a download of data meeting the given criteria. The exclude_own parameter excludes data uploaded by this organization from the download. By default, the download will be in the unconfirmed state and not yet charged to the organization. However, setting the confirm parameter has the same effect as immediately calling POST /download/<id>?action=confirm.

After being confirmed, the download will go into the not_ready state and then automatically into the ready state, at which point the TMX file may be downloaded. At some point, it may go into the archived state. To put an archived download back into the not_ready state, you must call GET /download/<id>.zip.
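One reading of the download lifecycle described in this section can be written down as a transition table, which a client might use to sanity-check the states it observes when polling GET /download/<id>. This is a sketch of that interpretation, not an official state machine; the table and function names are hypothetical, and the exact point at which archiving can happen is not specified beyond "at some point".

```python
# Download state transitions as described in this section (an interpretation).
DOWNLOAD_TRANSITIONS = {
    "unconfirmed": {"not_ready", "ready"},  # via POST /download/<id>?action=confirm
    "not_ready": {"ready"},                 # automatic
    "ready": {"archived"},                  # "at some point"
    "archived": {"not_ready"},              # by calling GET /download/<id>.zip
    "cancelled": set(),                     # reserved for future extension
}


def is_valid_transition(old: str, new: str) -> bool:
    """Return True if moving from `old` to `new` matches the described lifecycle."""
    return new in DOWNLOAD_TRANSITIONS.get(old, set())
```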

Normally, TAUS Data will notify the user who initiated the download when it is ready. Set the notify parameter to disable this behavior.

Only TAUS Data members may download, and some member users may not have permission to download. See GET /user.

The cancelled state is for future extension.

Note that the result schema allows multiple downloads to be created. Currently, you will never get more than one, but this may change in the future.
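Since the download field of the result is a list, client code can iterate over it rather than assume a single element, so it keeps working if several downloads are ever returned. A minimal Python sketch follows; created_download_ids is a hypothetical helper, and the input is assumed to be the JSON response of POST /download?action=create.

```python
import json


def created_download_ids(body: str) -> list[int]:
    """Return the ids of every download in a create response.

    Iterates over the whole "download" list instead of taking element 0,
    because the schema allows more than one download per call.
    """
    doc = json.loads(body)
    return [d["id"] for d in doc.get("download", [])]
```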

GET /download/<id>

Id Type:
natural
Result:
{ download: { id: natural,
              user: { id: natural,
                      name: string },
              creation_time: time,
              source_lang: { id: lang,
                             name: string },
              target_lang: { id: lang,
                             name: string },
              industry: { id: natural,
                          name: string },
              content_type: { id: natural,
                              name: string },
              owner: { id: natural,
                       name: string },
              product: { id: natural,
                         name: string },
              provider: { id: natural,
                          name: string },
              state: enum (unconfirmed, not_ready, ready, archived, cancelled),
              word_count: natural,
              segment_count: natural } }
Get information about a download.

GET /download/<id>.zip

Id Type:
natural
Download the data as a zipped TMX file. The download must be in the ready state. If it is in the not_ready or archived state, you will get a 503 not_ready status, and you should try again in 1 minute or longer.
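The name of the TMX file inside the returned zip is not specified here, so a client can select members by extension. A minimal Python sketch using the standard zipfile module; extract_tmx is a hypothetical helper name.

```python
import io
import zipfile


def extract_tmx(zip_bytes: bytes) -> dict[str, bytes]:
    """Return {member name: contents} for every .tmx member of a zip archive.

    `zip_bytes` is the body returned by GET /download/<id>.zip.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {
            name: zf.read(name)
            for name in zf.namelist()
            if name.lower().endswith(".tmx")
        }
```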

POST /download/<id>?action=confirm

Id Type:
natural
Result:
{ download: { id: natural,
              user: { id: natural,
                      name: string },
              creation_time: time,
              source_lang: { id: lang,
                             name: string },
              target_lang: { id: lang,
                             name: string },
              industry: { id: natural,
                          name: string },
              content_type: { id: natural,
                              name: string },
              owner: { id: natural,
                       name: string },
              product: { id: natural,
                         name: string },
              provider: { id: natural,
                          name: string },
              state: enum (unconfirmed, not_ready, ready, archived, cancelled),
              word_count: natural,
              segment_count: natural } }
Errors:
400 over_limit
Confirming the download would put the organization over its download limit. This can happen if the organization has confirmed other downloads since this download was created.
400 already_downloaded
Some of the data in this download has already been downloaded. This can happen if the organization has confirmed other downloads since this download was created.
Confirm a download. The download must be in the unconfirmed state. After confirmation, the download is charged to the organization. The download may enter either the not_ready or the ready state.
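Errors such as 400 over_limit and 400 already_downloaded can be surfaced to the caller by checking the response document before using it. The sketch below assumes that error responses carry the same status and reason fields shown in the introductory example response; TausDataError and check_response are hypothetical names.

```python
import json


class TausDataError(Exception):
    """Raised when a response document reports a non-success status."""

    def __init__(self, status: int, reason: str):
        super().__init__(f"{status} {reason}")
        self.status = status
        self.reason = reason


def check_response(body: str) -> dict:
    """Parse a JSON response and raise TausDataError unless status is 200.

    Assumes error documents use the same "status"/"reason" fields as the
    success example (e.g. {"status": 400, "reason": "over_limit"}).
    """
    doc = json.loads(body)
    status = doc.get("status")
    if status != 200:
        raise TausDataError(status, doc.get("reason", ""))
    return doc
```

Both over_limit and already_downloaded can arise because other downloads were confirmed after this one was created, so retrying the same confirm call unchanged is unlikely to help.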

GET /upload

Result:
{ upload: [{ id: natural,
             user: { id: natural,
                     name: string },
             creation_time: time,
             source_lang: { id: lang,
                            name: string },
             target_lang: { id: lang,
                            name: string },
             industry: { id: natural,
                         name: string },
             content_type: { id: natural,
                             name: string },
             owner: { id: natural,
                      name: string },
             product: { id: natural,
                        name: string },
             provider: { id: natural,
                         name: string },
             state: enum (incomplete, unconfirmed, processing, ready, cancelled, success, error),
             word_count: natural,
             segment_count: natural,
             reason: enum (bad_tmx, unsupported_tmx, bad_langs),
             message: string }] }
Get a listing of uploads made by this organization, ordered from newest to oldest.

POST /upload?action=create

Parameters:
Result:
{ upload: [{ id: natural,
             user: { id: natural,
                     name: string },
             creation_time: time,
             source_lang: { id: lang,
                            name: string },
             target_lang: { id: lang,
                            name: string },
             industry: { id: natural,
                         name: string },
             content_type: { id: natural,
                             name: string },
             owner: { id: natural,
                      name: string },
             product: { id: natural,
                        name: string },
             provider: { id: natural,
                         name: string },
             state: enum (incomplete, unconfirmed, processing, ready, cancelled, success, error),
             word_count: natural,
             segment_count: natural,
             reason: enum (bad_tmx, unsupported_tmx, bad_langs),
             message: string }] }
Errors:
400 bad_zip
The zip file could not be unpacked to produce a TMX file.
400 bad_tmx
The TMX file is invalid.
400 unsupported_tmx
Some features of the TMX file are unsupported by the Service; for example, the TMX file has more than two langs.
400 bad_langs
The lang codes in the TMX file do not match the source_lang and target_lang parameters.
400 duplicate
The TMX file has already been uploaded.

Create a new upload from the given zipped TMX file, having the given attrs. (The TMX file must have a .tmx extension within the zip file.) By default, if there is not a problem, the upload will be in the unconfirmed state. However, setting the confirm parameter has the same effect as immediately calling POST /upload/<id>?action=confirm. Setting the approve parameter has the same effect as calling POST /upload/<id>?action=approve as soon as the upload reaches the ready state.
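Preparing the upload body means wrapping the TMX file in a zip whose member name ends in .tmx. A minimal Python sketch with the standard zipfile module follows; zip_tmx and the default member name upload.tmx are hypothetical, UTF-8 encoding of the TMX text is an assumption, and the form field that carries the zip is not named in this section, so the sketch stops at producing the bytes.

```python
import io
import zipfile


def zip_tmx(tmx_text: str, member_name: str = "upload.tmx") -> bytes:
    """Return a zip archive (as bytes) containing a single TMX file.

    The Service requires the TMX file inside the zip to have a .tmx
    extension. The text is encoded as UTF-8 (an assumption).
    """
    if not member_name.endswith(".tmx"):
        raise ValueError("member name must end with .tmx")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(member_name, tmx_text.encode("utf-8"))
    return buf.getvalue()
```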

Normally, TAUS Data will notify the user who initiated the upload when it is ready, or if there is any error. Set the notify parameter to disable this behavior.

Some users may not have permission to upload. See GET /user.

There are many things that can go wrong with an upload, as reflected by the possible error results. Note that this call checks only the beginning of the file, so even if it is successful, there could be problems later on. Currently, an upload created with the Service will never go into the incomplete state.
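Because problems can surface after the create call succeeds, a client polling GET /upload/<id> may want to report the reason and message fields once an upload reaches the error state. The sketch below assumes those fields are populated when state is error, which the schema suggests but this section does not state; upload_problem is a hypothetical helper operating on the parsed upload object.

```python
from typing import Optional


def upload_problem(upload: dict) -> Optional[str]:
    """Return "reason: message" for an upload in the error state, else None.

    Assumes reason/message are filled in when state == "error".
    """
    if upload.get("state") != "error":
        return None
    reason = upload.get("reason") or "unknown"
    message = upload.get("message") or ""
    return f"{reason}: {message}" if message else reason
```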

Note that the result schema allows multiple uploads to be created. Currently, you will never get more than one, but this may change in the future.

GET /upload/<id>

Id Type:
natural
Result:
{ upload: { id: natural,
            user: { id: natural,
                    name: string },
            creation_time: time,
            source_lang: { id: lang,
                           name: string },
            target_lang: { id: lang,
                           name: string },
            industry: { id: natural,
                        name: string },
            content_type: { id: natural,
                            name: string },
            owner: { id: natural,
                     name: string },
            product: { id: natural,
                       name: string },
            provider: { id: natural,
                        name: string },
            state: enum (incomplete, unconfirmed, processing, ready, cancelled, success, error),
            word_count: natural,
            segment_count: natural,
            reason: enum (bad_tmx, unsupported_tmx, bad_langs),
            message: string } }
Get information about an upload.

POST /upload/<id>?action=approve

Id Type:
natural
Result:
{ upload: { id: natural,
            user: { id: natural,
                    name: string },
            creation_time: time,
            source_lang: { id: lang,
                           name: string },
            target_lang: { id: lang,
                           name: string },
            industry: { id: natural,
                        name: string },
            content_type: { id: natural,
                            name: string },
            owner: { id: natural,
                     name: string },
            product: { id: natural,
                       name: string },
            provider: { id: natural,
                        name: string },
            state: enum (incomplete, unconfirmed, processing, ready, cancelled, success, error),
            word_count: natural,
            segment_count: natural,
            reason: enum (bad_tmx, unsupported_tmx, bad_langs),
            message: string } }
Approve an upload. The upload must be in the ready state.

POST /upload/<id>?action=confirm

Id Type:
natural
Result:
{ upload: { id: natural,
            user: { id: natural,
                    name: string },
            creation_time: time,
            source_lang: { id: lang,
                           name: string },
            target_lang: { id: lang,
                           name: string },
            industry: { id: natural,
                        name: string },
            content_type: { id: natural,
                            name: string },
            owner: { id: natural,
                     name: string },
            product: { id: natural,
                       name: string },
            provider: { id: natural,
                        name: string },
            state: enum (incomplete, unconfirmed, processing, ready, cancelled, success, error),
            word_count: natural,
            segment_count: natural,
            reason: enum (bad_tmx, unsupported_tmx, bad_langs),
            message: string } }
Confirm an upload. The upload must be in the unconfirmed state.

POST /upload/<id>?action=cancel

Id Type:
natural
Result:
{ upload: { id: natural,
            user: { id: natural,
                    name: string },
            creation_time: time,
            source_lang: { id: lang,
                           name: string },
            target_lang: { id: lang,
                           name: string },
            industry: { id: natural,
                        name: string },
            content_type: { id: natural,
                            name: string },
            owner: { id: natural,
                     name: string },
            product: { id: natural,
                       name: string },
            provider: { id: natural,
                        name: string },
            state: enum (incomplete, unconfirmed, processing, ready, cancelled, success, error),
            word_count: natural,
            segment_count: natural,
            reason: enum (bad_tmx, unsupported_tmx, bad_langs),
            message: string } }
Cancel an upload. The upload must be in the unconfirmed state or the ready state.
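The three upload actions each require particular states: approve needs ready, confirm needs unconfirmed, and cancel needs unconfirmed or ready. A minimal Python sketch that lists which actions a client could attempt for a given state; the table and function name are hypothetical, and the Service remains the authority on whether a call succeeds.

```python
# Required states for each upload action, as documented above.
UPLOAD_ACTION_STATES = {
    "approve": {"ready"},
    "confirm": {"unconfirmed"},
    "cancel": {"unconfirmed", "ready"},
}


def allowed_upload_actions(state: str) -> list[str]:
    """Return the actions whose documented precondition holds in `state`, sorted."""
    return sorted(
        action for action, states in UPLOAD_ACTION_STATES.items() if state in states
    )
```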

GET /upload_terms.txt

Get the Data Upload and Download Conditions as UTF-8 text. Example: GET .../upload_terms.txt
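A request for this resource can be built with Python's standard urllib. The sketch assumes HTTP Basic authentication (inferred from the browser username/password prompt mentioned in the introduction); the Service root is elided in this document, so base_url is left to the caller, and upload_terms_request is a hypothetical name.

```python
import base64
import urllib.request


def upload_terms_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET for /upload_terms.txt.

    HTTP Basic authentication is an assumption; `base_url` is the Service
    root, which this document does not spell out.
    """
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(base_url.rstrip("/") + "/upload_terms.txt", method="GET")
    req.add_header("Authorization", f"Basic {token}")
    return req
```

To fetch the text: with urllib.request.urlopen(req) as resp: terms = resp.read().decode("utf-8").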