The TAUS Data Association (TDA) Web Service allows programs to use many features of TDA, including:
The Service has a "Representational State Transfer", or REST, interface. Parameters are passed as in HTML forms. Responses are returned as either JSON or XML documents, at your choice.
It is assumed that the reader is familiar with web programming. Experience with REST services is helpful but not required. We have tried to keep the Service simple so that you can get up and running quickly, using your platform and tools of choice. No programming language bindings are provided at this time.
You can begin developing with the TDA Web Service using just this document. However, we encourage you to contact TDA and tell us about your application, so that we can provide you with:
In a REST web service, requests are made over HTTP (or HTTPS), and the
resource to access is given in the URL. In our example, the resource we
wish to access is a listing of locales (termed "langs" in the Service) in TDA. This
resource is named /lang. To simply retrieve information
about the resource, we use the GET HTTP method. To ask
for the response in JSON format, we can add the .json
extension to the resource. No additional
parameters are required, but to limit the listing to
langs for a particular language, we can add the language parameter in the query string.
The full request is:
GET .../lang.json?language=en
To try it, open the request in a web browser, with the Service's base URL in place of "...", and enter your TDA username and password when
prompted. (If you don't have a TDA account, go get one; it's free!)
The response looks like:
HTTP/1.1 200 success
Content-type: application/json; encoding=UTF-8
{ "status": 200,
"reason": "success",
"lang": [
{ "id": "en-AU", "name": "English (Australia)" },
{ "id": "en-CA", "name": "English (Canada)" },
{ "id": "en-GB", "name": "English (United Kingdom)" },
{ "id": "en-US", "name": "English (United States)" }
]
}
(Response documents are reformatted for readability.)
Or, for the response in XML, the request is:
GET .../lang.xml?language=en
The response looks like:
HTTP/1.1 200 success
Content-Type: text/xml; charset=UTF-8
<result>
<status>200</status>
<reason>success</reason>
<lang>
<name>English (Australia)</name>
<id>en-AU</id>
</lang>
<lang>
<name>English (Canada)</name>
<id>en-CA</id>
</lang>
<lang>
<name>English (United Kingdom)</name>
<id>en-GB</id>
</lang>
<lang>
<name>English (United States)</name>
<id>en-US</id>
</lang>
</result>
Or, for the response in human-friendly HTML, the request is:
GET .../lang.html?language=en
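A client normally checks status before reading the other keys of a result. The sketch below is a minimal illustration, assuming Python's standard json module; the helper name parse_lang_listing is hypothetical. It turns the JSON response from the example into a mapping from lang id to display name, and treats any status outside 200-299 as an error, reporting the message key that error responses carry.

```python
import json


def parse_lang_listing(body: str) -> dict:
    """Map lang id -> display name from the body of a GET .../lang.json response."""
    result = json.loads(body)
    # status and reason duplicate the HTTP status line in every usual result
    if not 200 <= result["status"] <= 299:
        raise RuntimeError(
            f"{result['status']} {result['reason']}: {result.get('message', '')}"
        )
    return {lang["id"]: lang["name"] for lang in result["lang"]}


# Abbreviated copy of the example response above
body = """{ "status": 200, "reason": "success",
  "lang": [ { "id": "en-AU", "name": "English (Australia)" },
            { "id": "en-US", "name": "English (United States)" } ] }"""
print(parse_lang_listing(body)["en-US"])  # English (United States)
```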
If you want to understand the Service from the ground up, read this document in order. If you want to get a high-level overview of what you'll need to do, start with the use cases and refer to other sections as needed. Everyone will at some point want to read about access control, app keys, and the terms of use.
The remaining sections are:
All requests are made over HTTP (or HTTPS) and follow common REST conventions.
The base URL for the Service is http://www.tausdata.org/api
or https://www.tausdata.org/api. The secure version is
recommended, as passwords and other private data may
be exchanged. If you request access to a development sandbox, you will get a different URL for your
sandbox. Following the base URL is the resource name, as given in
the call documentation. For most calls, you must add an extension
giving the requested response format: .json, .xml, or .html. Resource names and extensions
are case-sensitive.
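The URL rules above (base URL, resource name, format extension, query string) can be captured in one small function. This is a sketch, not a provided binding: build_url, BASE_URL and FORMATS are hypothetical names, and a sandbox would pass its own base_url. Because resource names and extensions are case-sensitive, the sketch rejects an unexpected extension rather than lower-casing it.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.tausdata.org/api"  # secure base URL; a sandbox has its own URL
FORMATS = ("json", "xml", "html")


def build_url(resource, fmt, params=None, base_url=BASE_URL):
    """Base URL + resource name + format extension + optional query string."""
    # Extensions are case-sensitive, so reject anything else instead of normalizing it.
    if fmt not in FORMATS:
        raise ValueError(f"unsupported response format: {fmt!r}")
    url = f"{base_url}{resource}.{fmt}"
    if params:
        # urlencode percent-encodes values as UTF-8, which is how the Service reads them
        url += "?" + urlencode(params, doseq=True)
    return url


print(build_url("/lang", "json", {"language": "en"}))
# https://www.tausdata.org/api/lang.json?language=en
```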
The Service uses two HTTP methods, GET and POST. GET
is used to access resources without modifying them. POST is used to
create new resources and modify existing ones. The
documented method must be used for every call.
Calls take named parameters, where keys
and values are Unicode strings. For GET requests, the parameters must
be query-string encoded in the URL. Since there is no means to specify
the character encoding of the query string, it is always taken to be in
UTF-8. For POST requests, the parameters may either be
query-string encoded in the URL, or sent in the body with content type application/x-www-form-urlencoded or multipart/form-data. You
may specify the character encoding for
these requests; however, UTF-8 is recommended. You may put some
parameters in the query string and others in the body (though the same
parameter should not appear in both places). A parameter
may be given multiple times only where
specifically documented. Parameter names are case-sensitive.
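For POST calls, parameters can be split between the query string and an application/x-www-form-urlencoded body. The sketch below builds, but does not send, such a request with Python's urllib; form_post is a hypothetical helper, and the /download call with action=create is taken from the download use case. It refuses a parameter that appears in both places and relies on urlencode producing UTF-8 percent-encoding.

```python
from urllib.parse import urlencode
from urllib.request import Request


def form_post(url: str, query: dict, body: dict) -> Request:
    """Build (without sending) a POST whose parameters are split between query and body."""
    overlap = set(query) & set(body)
    if overlap:  # the same parameter should not appear in both places
        raise ValueError(f"parameters in both query and body: {sorted(overlap)}")
    # urlencode emits percent-encoded UTF-8, so the encoded body is plain ASCII
    data = urlencode(body).encode("ascii")
    return Request(
        f"{url}?{urlencode(query)}",
        data=data,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


req = form_post("https://www.tausdata.org/api/download.json",
                {"action": "create"},
                {"source_lang": "en-US", "target_lang": "fr-FR"})
print(req.full_url)  # https://www.tausdata.org/api/download.json?action=create
print(req.data)      # b'source_lang=en-US&target_lang=fr-FR'
```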
Some special parameters are used for access control.
Unknown parameters in the request are ignored by default.
However, there will be a message about them in the X-TDA-Warning header. You may request that they be
considered errors by setting the X-TDA-Strict-Parameters request header to true.
If you cannot avoid passing extra parameters, you can avoid the risk that they will have meaning in a future version of the Service by giving them names beginning with an underscore ("_"), which will never be used by the Service.
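During development it can help to turn unknown parameters into errors and to keep any private parameters under the reserved underscore prefix. A minimal sketch, assuming urllib: strict_request sends the X-TDA-Strict-Parameters header, while check_extra_params is a hypothetical client-side guard, not part of the Service. On a response, the warning can be read with resp.headers.get("X-TDA-Warning").

```python
from urllib.request import Request


def strict_request(url: str) -> Request:
    """GET request asking the Service to reject unknown parameters instead of ignoring them."""
    return Request(url, headers={"X-TDA-Strict-Parameters": "true"})


def check_extra_params(params: dict, documented: set) -> None:
    """Hypothetical guard: undocumented parameters must start with "_",
    a prefix the Service will never use."""
    bad = [name for name in params if name not in documented and not name.startswith("_")]
    if bad:
        raise ValueError(f"undocumented parameters without '_' prefix: {bad}")


req = strict_request("https://www.tausdata.org/api/lang.json?language=en")
# urllib stores header names capitalized, e.g. "X-tda-strict-parameters"
print(req.get_header("X-tda-strict-parameters"))  # true
check_extra_params({"language": "en", "_trace_id": "42"}, documented={"language"})
```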
The HTTP status line contains a numeric status code and a string reason code. You may use the reason to discriminate the status beyond the code. For convenience, the status code and reason are both duplicated in the response body.
Status codes are classed by their numeric range. Codes 200-299 reflect success. For other codes, an explanatory message (in English) is given in the body. Codes 400-499 reflect invalid requests. The message may help you find and fix the error. Codes 500-599 reflect problems that are not your fault. Codes in other ranges are not used by the Service.
A common set of codes and reasons is given here. Others are documented for specific calls.
| Status | Reason | Notes |
|---|---|---|
| 200 | success | |
| 201 | created | The Location header points to the primary created resource. The response body contains the successful result of the call, and should describe all created resources. |
| 400 | invalid_params | |
| 400 | no_app_key | |
| 400 | bad_app_key | |
| 401 | no_credentials | |
| 401 | bad_username_password | |
| 401 | bad_auth_key | |
| 403 | permission_denied | |
| 404 | no_such_resource | |
| 405 | method_not_allowed | |
| 409 | state_error | |
| 500 | system_error | |
| 503 | down_for_maintenance | |
| 503 | not_ready | |

Information is not generally returned in HTTP headers. The Content-type header is always set, and you may check to see
that the body has the expected content type. (In extreme error cases, the
Service may not be able to return a body of the requested type.) The Location header is sent along with a status 201 response,
pointing to a created resource; however, more complete information is
available in the result.
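Since several reasons share one status code (for example 503 down_for_maintenance and 503 not_ready), a client usually classifies by range and then inspects the reason. A sketch under those rules; TdaError, classify_status and check_result are hypothetical names, and the input is an already-decoded usual result.

```python
class TdaError(Exception):
    """Raised for a usual result whose status is not in the 200-299 range."""

    def __init__(self, result: dict):
        self.status = result["status"]
        self.reason = result["reason"]
        self.message = result.get("message", "")  # English explanation on errors
        super().__init__(f"{self.status} {self.reason}: {self.message}")


def classify_status(code: int) -> str:
    """Classify a status code by the numeric ranges the Service uses."""
    if 200 <= code <= 299:
        return "success"
    if 400 <= code <= 499:
        return "invalid_request"  # the message may help you fix the request
    if 500 <= code <= 599:
        return "service_problem"  # not your fault, e.g. 503 down_for_maintenance
    raise ValueError(f"status {code} is outside the ranges the Service uses")


def check_result(result: dict) -> dict:
    """Return a decoded usual result unchanged, or raise TdaError."""
    if classify_status(result["status"]) != "success":
        raise TdaError(result)
    return result
```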
A few calls return special document types, eg. zipped TMX files. Those calls are termed "unusual", and their response formats are specified in the call documentation.
Other, "usual," results are structured as values built from lists, maps (aka. dictionaries, objects, associative arrays) from keys (aka. properties, fields) to values, and the basic types. All lists contain elements of only one type. Note there are no "null" or undefined values.
At the top level, every usual result is a map with at least two keys, status (type natural) and reason (type string), whose values are the HTTP status code and reason string. Most calls add additional keys. Error responses add a message (type string) key explaining what went wrong.
Results come in one of three formats: "JavaScript Object
Notation" (JSON), XML, or HTML. For all formats, the character encoding
is UTF-8. The HTML format is a human-readable
rendering for testing purposes, and is not further described. The JSON
format follows the result structure directly, with maps represented by
JSON objects, lists by JSON arrays, and basic values by JSON strings,
numbers, and the boolean literals true and false. In the XML
format, a map is represented by a series of XML elements whose names are
the keys of the map. If the value for a key is a basic value or another
map, there
is a single XML element for the key, containing the representation of
the value; if the value for a key is a list, there is an XML element for
every element of the list, containing the representation of the list
element. Basic values are represented as character data. The whole
thing is wrapped in a top-level <result> element.
Note that not all possible
results described above can be represented in XML this way, eg. a list
within a list. However, only results that can be represented in the XML
format (as well-formed XML) will be produced. The exact representation
of basic types in either format is given below.
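Decoding the XML format back into maps and lists takes one piece of outside knowledge: which keys hold lists, since a one-element list looks exactly like a single value. The sketch below, using Python's xml.etree.ElementTree, takes those keys as an argument; xml_to_value is a hypothetical helper. Basic values come back as strings, and a key whose list is empty produces no element at all, so read such keys with .get(key, []).

```python
import xml.etree.ElementTree as ET


def xml_to_value(elem, list_keys=frozenset()):
    """Rebuild maps, lists and basic values from an element of a usual XML result.

    Basic values are returned as strings (character data); list-valued keys must be
    named in list_keys because XML cannot distinguish a one-element list.
    """
    children = list(elem)
    if not children:
        return elem.text or ""
    result = {}
    for child in children:
        value = xml_to_value(child, list_keys)
        if child.tag in list_keys:
            result.setdefault(child.tag, []).append(value)
        elif child.tag in result:
            raise ValueError(f"<{child.tag}> repeated but not declared as a list")
        else:
            result[child.tag] = value
    return result


doc = ET.fromstring(
    "<result><status>200</status><reason>success</reason>"
    "<lang><name>English (Canada)</name><id>en-CA</id></lang></result>"
)
value = xml_to_value(doc, list_keys={"lang"})
print(int(value["status"]), value["lang"])
# 200 [{'name': 'English (Canada)', 'id': 'en-CA'}]
```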
There are only a few basic types. Each has a string representation, used for parameters and XML results, and a JSON representation.
boolean: string representation is "true" or "false". JSON representation is
one of the JSON literals true or false.

There is another type, file, that is only used for
parameters. The value for this parameter is expected to be the contents
of a (possibly large) file. file parameters are only used
with POST requests, and when one is part of a request, the body must
use the multipart/form-data content type, and the part
containing the file must have a filename attribute in the Content-Disposition
header.
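Python's standard library has no one-call multipart encoder, so the sketch below builds a multipart/form-data body by hand, giving the file part the filename attribute the Service requires. multipart_form is a hypothetical helper, and the file parameter name "file" used in the check is an assumption; the actual name is given in each call's documentation. Field names and filenames are not escaped here, so keep them free of quotes.

```python
import uuid


def multipart_form(fields: dict, file_param: str, filename: str, content: bytes,
                   file_type: str = "application/octet-stream"):
    """Encode ordinary fields plus one file parameter as multipart/form-data.

    Returns (body, content_type_header). The file part's Content-Disposition
    carries a filename attribute, which the Service requires for file parameters.
    """
    boundary = uuid.uuid4().hex  # random, so it will not plausibly occur in the data
    parts = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    file_header = (f'--{boundary}\r\nContent-Disposition: form-data; name="{file_param}"; '
                   f'filename="{filename}"\r\nContent-Type: {file_type}\r\n\r\n')
    parts.append(file_header.encode("utf-8") + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("ascii"))  # closing delimiter
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned pair can be passed to urllib.request.Request as data=body with a Content-Type header of content_type, using method="POST".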
Access control is based upon TDA user accounts. The Service does not provide a way to register new users, but you may direct users to register themselves.
Most calls to the Service must contain valid authentication credentials. For convenience, flexibility, and security, the Service provides several methods of authentication. All are intended to be used with HTTPS to keep credentials private.
In this method, the username and password of a registered user are
passed with each request. The recommended form is HTTP basic
authentication. (Other HTTP authentication methods, such as digest,
are not supported.) In support of HTTP basic, the Service may return a
WWW-Authenticate header with value Basic
realm="TAUS Data Association Web Service" in response to an
unauthenticated request. However, since some platforms (in
particular, web browsers) will pop up an unwanted password entry form when
receiving this header, it is returned only when HTTP basic
authentication was tried, or no authentication method was tried, in the
request.
HTTP basic authentication is also handy for testing the Service directly in a web browser, as you may have found if you tried the introductory example in a browser.
If HTTP basic is not feasible, you may pass the username and password as auth_username and auth_password parameters.
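Both forms of username and password authentication are easy to produce. The sketch below (helper names hypothetical, credentials illustrative) builds the HTTP basic Authorization value and sends it preemptively, so the client does not depend on receiving a WWW-Authenticate challenge; param_auth shows the auth_username / auth_password fallback. Either way, use the HTTPS base URL.

```python
import base64
from urllib.request import Request


def basic_auth_value(username: str, password: str) -> str:
    """Authorization header value for HTTP basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def basic_auth_request(url: str, username: str, password: str) -> Request:
    # Sent up front rather than after a 401 challenge
    return Request(url, headers={"Authorization": basic_auth_value(username, password)})


def param_auth(params: dict, username: str, password: str) -> dict:
    """Fallback when HTTP basic is not feasible: credentials as ordinary parameters."""
    return {**params, "auth_username": username, "auth_password": password}
```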
In this method, you first make a POST /auth_key?action=login request that is authenticated by
username and password. An "auth key" is returned that can be
used (without the username and password) to authenticate
future requests. To use the auth key, pass it as the X-TDA-Auth-Key request header or, if that is not feasible, the auth_auth_key parameter.
Currently, this is only marginally more secure than username and
password authentication. The application must possess the
username and password and send them with the initial POST /auth_key, and the auth key never expires.
However, its use is consistent with the connect method.
The connect method enables an application to act on behalf of a user,
without the user disclosing their username and password to the
application. You first make a POST /auth_key?action=connect request. This request requires no authentication, but it must
contain an app key. An "auth key" is returned that can be
used to authenticate future requests as above, but first it must be
activated. The application should send the user to the URL given in the manage_url field of the result, where they can activate
the auth key. After the user activates the auth key, they will be sent
to the redirect_url given in the request.
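The connect flow starts with an unauthenticated request that carries only the app key. In the sketch below, connect_request and pending_auth_key are hypothetical helpers and the redirect URL is illustrative; redirect_url is sent in a form body, which the parameter rules permit for POST calls. The application keeps the returned key, sends the user to manage_url, and can use the key once the user has activated it and been sent back to redirect_url.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def connect_request(base_url: str, app_key: str, redirect_url: str) -> Request:
    """POST /auth_key?action=connect: no user credentials, but the app key is required."""
    body = urlencode({"redirect_url": redirect_url}).encode("ascii")
    return Request(f"{base_url}/auth_key.json?action=connect", data=body, method="POST",
                   headers={"X-TDA-App-Key": app_key,
                            "Content-Type": "application/x-www-form-urlencoded"})


def pending_auth_key(body: str) -> tuple:
    """Return (auth key, manage_url); the key works only after the user activates it."""
    auth_key = json.loads(body)["auth_key"]
    return auth_key["id"], auth_key["manage_url"]
```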
Many calls may be performed by any authenticated user; some, however,
are restricted. To determine the user's permissions, call GET /user with the self parameter set.
There is no registration requirement to use the Service, beyond a user
account at www.tausdata.org. However, you
are encouraged to register your application and get an app key.
(For now, please contact TDA for an app key.) An app key identifies your
application to TDA, and helps us diagnose any problems you may have.
An app key also enables you to use the connect authentication method.
There may be additional benefits in the future,
such as logs and statistics of the calls made by your app.
To use your app key, pass it as the X-TDA-App-Key request header or, if that is not feasible, the auth_app_key parameter.
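App keys and auth keys follow the same pattern: an X-TDA-* request header, with an auth_* parameter as the fallback. A small sketch of a hypothetical attach_keys helper that applies either form and returns new header and parameter maps without modifying its inputs.

```python
def attach_keys(headers: dict, params: dict, app_key=None, auth_key=None, use_headers=True):
    """Return new (headers, params) carrying the app key and/or auth key.

    The X-TDA-* headers are preferred; the auth_* parameters are the documented
    fallback when custom headers cannot be sent (for example, from a plain HTML form).
    """
    headers, params = dict(headers), dict(params)
    for value, header, param in ((app_key, "X-TDA-App-Key", "auth_app_key"),
                                 (auth_key, "X-TDA-Auth-Key", "auth_auth_key")):
        if value is None:
            continue
        if use_headers:
            headers[header] = value
        else:
            params[param] = value
    return headers, params
```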
An organization is the most important unit of identity in the Service. An organization may or may not be a TDA member, and may really represent a single individual. Every user account belongs to an organization.
What we refer to as a "lang" and use throughout the Service is actually a
language-region pair, often called a locale.
For example, fr-CA for Canadian
French. When referring to the language only (eg. French), we always
call it a "language" to distinguish it from a "lang".
Languages are represented as ISO 639-1 two letter language codes.
Regions are represented by ISO 3166-1 alpha-2 two letter country
codes (or in a few cases, by non-standard codes such
as XL for Latin America).
Langs are represented as a language, followed by a dash ("-"),
followed by a region. Langs, languages, and regions are
case-insensitive.
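Because langs are case-insensitive, compare them in a normalized form rather than as raw strings. A minimal sketch with hypothetical helper names; it checks only the language-dash-region shape (so non-standard regions such as XL pass), and whether a lang is actually supported must still be checked against GET /lang.

```python
import re

# Two-letter language, dash, two-letter region; ASCII letters only, any case.
LANG_RE = re.compile(r"([a-z]{2})-([a-z]{2})", re.IGNORECASE | re.ASCII)


def parse_lang(code: str) -> tuple:
    """Split a lang such as 'fr-CA' into (language, region), lower-cased."""
    match = LANG_RE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a lang: {code!r}")
    return match.group(1).lower(), match.group(2).lower()


def same_lang(a: str, b: str) -> bool:
    """Langs are case-insensitive, so compare normalized forms."""
    return parse_lang(a) == parse_lang(b)


print(parse_lang("es-XL"))          # ('es', 'xl')
print(same_lang("en-US", "en-us"))  # True
```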
Not all possible langs are supported by TDA. The current list is:
| TDA Langs | |
|---|---|
| ar-ae | Arabic (U.A.E.) |
| ar-ar | Arabic |
| ar-eg | Arabic (Egypt) |
| ar-sa | Arabic (Saudi Arabia) |
| be-by | Belarusian |
| bg-bg | Bulgarian |
| cs-cz | Czech |
| cy-gb | Welsh |
| da-dk | Danish |
| de-de | German (Germany) |
| el-gr | Greek |
| en-au | English (Australia) |
| en-ca | English (Canada) |
| en-gb | English (United Kingdom) |
| en-us | English (United States) |
| es-em | Spanish (International) |
| es-es | Spanish (Spain) |
| es-mx | Spanish (Mexico) |
| es-xl | Spanish (Latin America) |
| et-ee | Estonian |
| eu-es | Basque |
| fa-ir | Farsi |
| fi-fi | Finnish |
| fr-be | French (Belgium) |
| fr-ca | French (Canada) |
| fr-fr | French (France) |
| he-il | Hebrew (Israel) |
| hr-hr | Croatian |
| ht-ht | Haitian |
| hu-hu | Hungarian |
| id-id | Indonesian |
| is-is | Icelandic |
| it-it | Italian (Italy) |
| ja-jp | Japanese |
| ko-kr | Korean |
| lt-lt | Lithuanian |
| lv-lv | Latvian |
| mt-mt | Maltese |
| nb-no | Norwegian (Bokmal) |
| nl-be | Dutch (Belgium) |
| nl-nl | Dutch (Netherlands) |
| nn-no | Norwegian (Nynorsk) |
| no-no | Norwegian |
| pl-pl | Polish |
| pt-br | Portuguese (Brazil) |
| pt-pt | Portuguese (Portugal) |
| ro-ro | Romanian |
| ru-ru | Russian |
| sk-sk | Slovak |
| sl-si | Slovene |
| sv-se | Swedish |
| th-th | Thai |
| tr-tr | Turkish |
| uk-ua | Ukrainian |
| vi-vn | Vietnamese |
| zh-cn | Chinese (PRC) |
| zh-hk | Chinese (Hong Kong) |
| zh-tw | Chinese (Taiwan) |
Every TM in TDA has several attributes, or "attrs". They are: industry, content_type, owner, product, provider. This list is not expected to change. Attrs have numerical values,
and each value is associated with a name for display.
The values for provider and owner are organizations. All of the attrs for a TM are chosen by the uploader, except for provider which is the organization of the uploader. The uploader can specify as the owner one of the organizations on whose behalf they are permitted to upload. The uploader can specify as the product one of the products belonging to the owner.
Attr values will never be reused, ie, a given attr value will always refer to the same entity. The list of values for the industry and content_type attrs will change rarely. The current lists are:
| TDA Industries | |
|---|---|
| 1 | Automotive Manufacturing |
| 2 | Consumer Electronics |
| 3 | Computer Software |
| 4 | Computer Hardware |
| 5 | Industrial Manufacturing |
| 6 | Telecommunications |
| 7 | Professional and Business Services |
| 8 | Stores and Retail Distribution |
| 9 | Industrial Electronics |
| 10 | Legal Services |
| 11 | Energy, Water and Utilities |
| 12 | Financials |
| 13 | Medical Equipment and Supplies |
| 14 | Healthcare |
| 15 | Pharmaceuticals and Biotechnology |
| 16 | Chemicals |
| 17 | Undefined Sector |
| 18 | Leisure, Tourism, and Arts |
| TDA Content Types | |
|---|---|
| 1 | Instructions for Use |
| 2 | Sales and Marketing Material |
| 4 | Policies, Process and Procedures |
| 5 | Software Strings and Documentation |
| 6 | Undefined Content Type |
| 7 | News Announcements, Reports and Research |
| 8 | Patents |
| 9 | Standards, Statutes and Regulations |
| 10 | Financial Documentation |
| 12 | Support Content |
The Service provides calls for listing langs and attr values. The GET /lang call lists langs, and the GET /attr/<attr> family of calls lists attrs, for example GET /attr/industry lists industries. All of these calls are similar: they take as
parameters other langs and attrs that may limit the results, and return
a list of objects of the type requested. Depending on the purpose for
which you are listing them, you may want different listings. For
example, if you want to download data, you probably only want to see
combinations of langs and attrs for which data is available. The
different ways you can list langs and attrs, and what you get back, are
described here.
The "purpose" of the listing is given by the for
parameter. When for is not given, the default is to list
every value in the system. For
example, GET .../attr/industry.json lists all
industries. However, for privacy reasons, you may not list all owners,
products, or providers.
When the for parameter is upload, all
combinations of attrs that could be applied to an upload by this member
are listed:
When the for parameter is download, only
langs and attrs for which there is data to download are listed. When
listing langs for download, you must set the side
parameter to either source to list langs for which there
is data with that source lang, or target to list langs for
which there is data with that target lang. When listing langs, you may
also set the source_lang or target_lang
parameter (whichever is the opposite of the side
parameter) to list only langs for which there is data with that
source or target lang. For example, to list all langs for
which there is data with that lang as the target lang and en-US as the source lang, GET .../lang.json?for=download&side=target&source_lang=en-US. When listing attrs for download, the lang and other attr parameters
limit the listing to values for which there is data with all of those
langs and attrs. For example, to list all owners for which there is
data with that owner, en-US as the source lang, fr-FR as the target lang, and 3 as the
industry, GET .../attr/owner.json?for=download&source_lang=en-US&target_lang=fr-FR&industry=3.
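The lang-listing rules for download and segment listings (side is required; only the opposite lang may narrow the listing) can be enforced before the request is sent. A sketch of a hypothetical lang_listing_query helper; the Service remains the authority, and for other purposes the lang arguments are simply not sent.

```python
from urllib.parse import urlencode


def lang_listing_query(purpose=None, side=None, source_lang=None, target_lang=None) -> str:
    """Query string for GET /lang, following the rules for the for parameter."""
    params = {}
    if purpose is not None:
        params["for"] = purpose
    if purpose in ("download", "segment"):
        if side not in ("source", "target"):
            raise ValueError("side must be 'source' or 'target' when listing for " + purpose)
        params["side"] = side
        # Only the lang opposite to side may narrow the listing.
        if (side == "source" and source_lang) or (side == "target" and target_lang):
            raise ValueError("give only the lang opposite to side")
        if source_lang:
            params["source_lang"] = source_lang
        if target_lang:
            params["target_lang"] = target_lang
    return urlencode(params)


print(lang_listing_query("download", side="target", source_lang="en-US"))
# for=download&side=target&source_lang=en-US
```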
When the for parameter is segment, only
langs and attrs for which there may be search results are listed. This
works exactly like listing for download, but the results exclude
combinations that have not been indexed for search or are not supported
for search.
Some calls return English language messages and names. Messages are usually diagnostics, such as the contents of the message key that comes with error responses. Names are the human-readable designations for langs, attrs, and other objects. There is no way to request messages and names in other languages or to translate them. Pedantically, messages do not begin with a capital or end with punctuation.
All keywords are singular, even when they would be more natural as plural, in order to avoid dealing with grammatical irregularities.
The following sections walk through typical use cases.
In this use case, you wish to search for segments containing a given term. Please note the Service provides only a simplified version of TAUS Search. Word attributes (lemma and part of speech) are not available, nor are computed translations.
The first step is to identify the langs and
attributes of the data you wish to search. Source and target langs
are required, and all other attrs are optional. If your application
already knows the langs and attrs of the data to search, it can use
them. Otherwise, you should request listings of the langs and attrs with the for parameter set to segment, to get langs and attrs for which data is
searchable.
To search, call GET /segment with the term you wish to find, the langs, and the attr criteria. The
result is a list of
segments, with their attrs. The source and target text are available as
plain text strings. To report problems with segments, see POST /segment/<id>?action=report_problem.
In this use case, you wish to download TM data.
The first step is to identify the langs and
attributes of the data you wish to download. Source and target langs
are required, and all other attrs are optional. If your application
already knows the langs and attrs of the data to download, it can use
them. Otherwise, you should request listings of the langs and attrs with the for parameter set to download, to get langs and attrs for which data is
available.
There are several things you need to know about downloads. First, every download includes only data that has not already been downloaded by the same member. In order to get data that was already downloaded, you must re-request the old download. Second, some members have download limits and may be charged for excess downloads. If a download would exceed a hard limit, it is disallowed, but there is no way to tell if a member is over a soft limit and will be charged for a download. Download limits may vary depending on the source and target langs of the download. Third, there is no way to request only some of the words meeting the given criteria. To limit the number of words, you must refine the criteria. Fourth, even data that was provided by the member creating the download counts towards their download limits if it is included in the download. There is an option to exclude data provided by the same member requesting the download.
You may wish to check how much data is available, and whether it is
within the member's limit, before creating a download.
For this, call GET /counts with the
chosen langs and attrs. For example, to get counts for data with en-US as the source lang, fr-FR as the target lang, and 3 as the
industry, GET .../counts.json?for=download&source_lang=en-US&target_lang=fr-FR&industry=3. The result has several useful fields. word_count is
the total number of words meeting the given criteria. new_word_count and new_segment_count are the
number of words and segments that have not been downloaded by the member
(and would be included in a new download); old_word_count is the number of words that have already
been downloaded by the member. If the member has a download limit for
this source and target lang, it
is given by limit. Finally, if the download would be
allowed according to the member's limits, within_limit is
true; otherwise it is false.
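A GET /counts result can be turned into a go/no-go summary before creating a download. This sketch uses the documented field names; download_check is a hypothetical helper, limit is read with .get because it is described as present only when the member has a limit for the lang pair, and, as noted above, a soft-limit charge cannot be detected this way.

```python
def download_check(counts: dict) -> str:
    """Summarize a GET /counts result before creating a download."""
    limit = counts.get("limit")  # only if the member has a limit for this lang pair
    if not counts["within_limit"]:
        return f"over limit: {counts['new_word_count']} new words, limit {limit}"
    if counts["new_word_count"] == 0:
        return "nothing new: all matching data was already downloaded"
    return (f"ok: {counts['new_word_count']} new words in "
            f"{counts['new_segment_count']} segments")
```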
When you are ready to download, make a POST /download?action=create request with the desired criteria. This
fixes the exact set of data to be downloaded. You may examine the
result to make
sure the download has the amount of data expected (in case it has changed
since your last call to /counts), or perhaps confirm it
with the user. Then you must make a POST /download/<id>?action=confirm request. You may combine the two steps by setting the confirm parameter when creating the download.
If a download is never confirmed, it does not count
towards the member's limit.
Once the download is confirmed, you should make polling GET /download/<id> requests until the
download is in the ready state. At that point, you may
retrieve the download as a zipped TMX file by calling GET /download/<id>.zip. Alternatively, you may simply make GET /download/<id>.zip requests, and if the file is not yet ready, the response will have a 503 not_ready status code. Please use a poll interval of at
least one minute.
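The polling step can be written as a loop that enforces the one-minute minimum. In this sketch, wait_until_ready is a hypothetical helper; it assumes the caller supplies fetch_state, a function that makes the GET /download/<id> request and returns the download's state, and sleep is injectable so the loop can be tested without waiting.

```python
import time

MIN_POLL_SECONDS = 60  # the Service asks for a poll interval of at least one minute


def wait_until_ready(fetch_state, sleep=time.sleep, interval=MIN_POLL_SECONDS, max_polls=None):
    """Call fetch_state() until it returns 'ready'; return how many polls were not ready."""
    if interval < MIN_POLL_SECONDS:
        raise ValueError("poll interval must be at least one minute")
    waited = 0
    while fetch_state() != "ready":
        waited += 1
        if max_polls is not None and waited >= max_polls:
            raise TimeoutError(f"not ready after {waited} polls")
        sleep(interval)
    return waited
```

Once it returns, GET /download/<id>.zip can be fetched.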
In this use case, you wish to upload a TM to TDA. The first step is to
choose the langs and attributes of the TM. Both source and target langs
and all attributes except provider and owner are required. provider will
be the organization performing the upload, and cannot be set. owner may be set to any of the organizations on whose
behalf the provider is permitted to upload; if not given, it defaults to
the provider. If your application already knows the langs and attrs of
the TM, it can use them. Otherwise, you should request listings of the langs and attrs with the for parameter set to upload.
(There is no way to add new owners and products using the Service. There is also no way to grant permission to another organization to upload on your behalf. Please contact TDA for help with this.)
There are several things you need to know about uploads. First, every upload must be a zip archive containing a single TMX file. The TMX file must be valid TMX, and should contain data for the source and target langs only. Second, TDA filters duplicate data, so if you upload a TM and then re-upload it after adding new translations, only the new translations will be saved. On the other hand, re-uploading data wastes network bandwidth and TDA resources, so please make a reasonable effort to upload only new data. Third, uploaded TMs are normally available for TDA members to download and for the public to search; however, there is an option to make a TM available only for search.
When you are ready to upload, make a POST /upload?action=create request with the zipped TMX file, along with the langs and attrs. A
successful response indicates that the file has passed the first round
of checks: the TMX file was extracted, the beginning of the file was
parsed, and the lang codes in the file match the source_lang and target_lang parameters.
Then you must make a POST /upload/<id>?action=confirm request. You may combine the two steps by setting the confirm parameter when creating the upload.
TDA will begin processing the file. To monitor progress, you
should make polling GET /upload/<id> requests until the upload leaves the processing state.
Please use a poll interval of at least one minute. If
there is a problem, the state will be error, and the reason and message fields will contain
information about the problem.
Otherwise, the state will be ready, and the user who
created the upload will get an email notification with more
details. Finally, you must make a POST /upload/<id>?action=approve request, perhaps after receiving positive approval from the user. You
can skip the approval step by setting the approve parameter when creating the upload. At this point, the upload will be
credited to the member's account and the data will be available for
other members to download (unless the upload was for search only).
There may be a delay before the data is available for search.
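After each GET /upload/<id> poll, the next step depends on the upload's state. A sketch of a hypothetical next_upload_step helper, assuming it receives the map that holds the upload's state, reason and message fields, and ignoring the shortcut of setting approve at creation. Any other state (such as incomplete, or one added later) is reported rather than treated as an error.

```python
def next_upload_step(upload: dict) -> str:
    """Decide what to do with an upload after polling GET /upload/<id>."""
    state = upload["state"]
    if state == "processing":
        return "poll again in at least one minute"
    if state == "error":
        raise RuntimeError(f"upload rejected: {upload.get('reason')}: {upload.get('message')}")
    if state == "ready":
        return "approve with POST /upload/<id>?action=approve"
    # e.g. 'incomplete' (uploads made through the Data Pooling web interface),
    # or a state added in a later version of the Service
    return f"no action for state {state!r}"
```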
Although it will probably not be an issue, you may wish to know how TDA
interprets lang codes within the TMX file. We look for lang codes that
are "compatible" with the source and target langs given in POST /upload?action=create. The current definition of compatible is that the language prefixes
are the same (case-insensitive).
For example, if you upload a TM giving a target_lang of es-XL the TMX file may use the
code es-XL, es-AR, or just es. However, the code must be formatted either as a two-letter
language code; or a two-letter language code, followed by a dash ("-"),
followed by a two-letter region code. So a code such as Spanish would
not be recognized and the TMX file would not be accepted. Also, the
same lang code must be used throughout the TMX file; it may not start
with es-XL and switch to es.
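An application can pre-check a TMX file's lang codes against the current rules before uploading. This sketch mirrors those rules as stated (same language prefix, case-insensitive; "xx" or "xx-YY" shape; one code throughout); the helper names are hypothetical, and since it is not stated whether case variants of one code count as the same code, the consistency check compares exact strings. TDA may relax these rules, so treat this as a conservative pre-check only.

```python
import re

# A TMX lang code must be "xx" or "xx-YY" (two-letter language, optional two-letter region).
TMX_CODE_RE = re.compile(r"([a-z]{2})(?:-[a-z]{2})?", re.IGNORECASE | re.ASCII)


def compatible(declared_lang: str, tmx_code: str) -> bool:
    """True if tmx_code has the same language prefix as the declared lang, ignoring case."""
    match = TMX_CODE_RE.fullmatch(tmx_code)
    if match is None:
        return False  # e.g. 'Spanish' is not recognized
    return match.group(1).lower() == declared_lang.split("-", 1)[0].lower()


def check_tmx_codes(declared_lang: str, codes_in_file: list) -> None:
    """Apply both current rules: one code used throughout, and that code compatible."""
    distinct = set(codes_in_file)
    if len(distinct) != 1:
        raise ValueError(f"expected one lang code throughout, found {sorted(distinct)}")
    code = distinct.pop()
    if not compatible(declared_lang, code):
        raise ValueError(f"{code!r} is not compatible with {declared_lang!r}")
```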
TDA will relax these restrictions to accommodate real-world needs. We
will try never to reject lang codes that were previously accepted.
Note that while the TDA's Data Pooling web interface supports
automatic lang detection, this function is not available in the Service.
We expect that applications using the upload API already know the langs
of the TMs they are uploading. If this is not the case for your
application and you would benefit from lang detection, contact TDA.
Also, note that there is an upload state incomplete to
accommodate cases where the langs can't be detected; the source_lang and target_lang fields are optional in upload results for the same reason. You may
run across this case now, with uploads created using the Data Pooling
web interface; however there is nothing you can do with them using the
Service.
We recommend the JSON format, as it closely corresponds to the structure of the result and will require less decoding on your end. But JSON and XML are equally supported.
We recommend using HTTP basic authentication for desktop apps, and connect authentication for web apps (including AJAX apps).
Your users must have TDA user accounts in order for your application to authenticate with the Service. TDA user accounts are free to the public. You may direct users to http://www.tausdata.org/index.php/component/user/register to register for an account. To download data, users must have a TDA membership (or belong to a member organization). You may direct users to https://www.tausdata.org/index.php/members/join-tda for information on joining TDA.
When writing a user interface for selecting langs and attrs, we recommend modeling them on the UIs used at tausdata.org. This will provide users with a consistent experience. We recommend that every lang or attr constrain the listings following it on the screen (and only these).
If you are developing an AJAX application, cross-site scripting
restrictions in the browser will likely prevent you from calling the
Service directly. In this case, the best solution is to construct a
simple proxy that accepts calls on your web server, and forwards them to
www.tausdata.org/api.
It is not possible to use XMLHttpRequest, the main engine of AJAX applications, to perform file uploads, due to browser
security restrictions. (There is no way to send the file contents.)
It is necessary instead to arrange for the
browser to submit your request as a form. To prevent the result from
appearing to the user, the form submission is usually targeted to an
iframe that is not displayed. Accessing the result of the submission
is tricky, and many complicated schemes are employed by web developers.
However, we have found that a call to the Service can
be loaded right into an iframe and processed reliably. We describe the
method here. (Please contact TDA if you would like these secrets revealed!)
Every call to the Service must be authenticated with the account of the user who caused the request to be made. Do not use your own TDA account to authenticate calls made for other users. Contact TDA if you need an exception.
Users who do not belong to a TDA member must read and agree to the TM Sharing Conditions before they may upload TMs. If your application allows users to upload TMs, it must enforce this requirement as well. Specifically, before submitting an upload, you must perform these steps:
1. Call GET /user with the self parameter set. The must_agree_to_upload_terms field of the result tells you whether the user needs to agree to the TM Sharing Conditions.
2. Call GET /upload_terms.txt. You should request the TM Sharing Conditions every time you need to display them, so that any changes will be reflected in your application.

We intend to support the Service as specified here indefinitely, with reasonable allowances for growth, including:
- Calls may be added.
- Parameters may be added to calls.
- In results, keys may be added to maps and values may be added to enums.
- Errors may be added.
- Various limits may be imposed for performance reasons.
The search algorithm in GET /segment may be
changed and the query language may be extended.
The heading of every call contains the method, the resource name,
and for POST calls the action parameter. The
method listed must be used; you cannot use GET for a call documented to
use POST. If the
resource has an extension, the call has an unusual result. Otherwise, you must choose the result format by
adding a .json, .xml, or .html extension. If the resource contains <id>, an entity id is part of the resource name, for example GET .../lang/en-US.json for GET /lang/<id>. The
following information is then listed, as applicable:
<id> in the resource name.
{ lang: [{ id: lang,
name: string }] }
GET /status
Example: GET .../status.json.
GET /user
Result:
{ user: [{ id: natural,
organization: { id: natural,
name: string },
can_search: boolean,
can_download: boolean,
can_upload: boolean,
must_agree_to_upload_terms: boolean }] }
Get a listing of users. Currently, only the authorized user of the request is returned. The self parameter must be set to true to make this explicit (setting it to false is currently an error).
The can_* fields indicate whether the user is allowed to
perform those functions. Currently, can_search is always
true; can_download is true for TDA member users only,
except for those who have been explicitly denied download permission; and can_upload is always true, except for TDA member
users who have been explicitly denied upload permission.
See the terms of use for the must_agree_to_upload_terms field.
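A sketch of reading the GET /user?self=true result before offering download or upload features, including the TM Sharing Conditions check from the terms of use. The class and function names are hypothetical, and agreed_to_current_terms stands for whatever record your application keeps of the user having agreed to the current TM Sharing Conditions.

```python
from dataclasses import dataclass


@dataclass
class UserPermissions:
    can_search: bool
    can_download: bool
    can_upload: bool
    must_agree_to_upload_terms: bool


def permissions_from_user_result(result: dict) -> UserPermissions:
    """Read the single user returned by GET /user?self=true (hypothetical helper)."""
    users = result["user"]
    if len(users) != 1:
        # Currently only the authorized user of the request is returned.
        raise ValueError(f"expected exactly one user, got {len(users)}")
    u = users[0]
    return UserPermissions(
        can_search=bool(u["can_search"]),
        can_download=bool(u["can_download"]),
        can_upload=bool(u["can_upload"]),
        must_agree_to_upload_terms=bool(u["must_agree_to_upload_terms"]),
    )


def may_submit_upload(perms: UserPermissions, agreed_to_current_terms: bool) -> bool:
    """Gate an upload on permission and, where required, on agreement."""
    if not perms.can_upload:
        return False
    if perms.must_agree_to_upload_terms and not agreed_to_current_terms:
        # Fetch GET /upload_terms.txt, display it, and ask the user first.
        return False
    return True
```

Checking can_download and can_upload this way lets the application hide features rather than wait for the Service to reject a call.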
POST /auth_key?action=connect
Result:
{ auth_key: { id: string,
manage_url: string } }
POST /auth_key?action=login
Result:
{ auth_key: { id: string,
manage_url: string } }
GET /lang
Result:
{ lang: [{ id: lang,
name: string }] }
Example: GET .../lang.json.
GET /lang/<id>
Result:
{ lang: { id: lang,
name: string } }
Example: GET .../lang/en-US.json.
GET /attr
Result:
{ attr: [string] }
The attrs are industry, content_type, owner, product, and provider, and this list is not expected to change. Example: GET .../attr.json.
GET /attr/industry
Result:
{ industry: [{ id: natural,
name: string }] }
GET /attr/content_type
Result:
{ content_type: [{ id: natural,
name: string }] }
GET /attr/owner
Result:
{ owner: [{ id: natural,
name: string }] }
GET /attr/product
Result:
{ product: [{ id: natural,
name: string }] }
GET /attr/provider
Result:
{ provider: [{ id: natural,
name: string }] }
GET /counts
Parameter defaults: download, false
Result:
{ word_count: natural,
new_word_count: natural,
new_segment_count: natural,
old_word_count: natural,
limit: natural,
within_limit: boolean }
Example: GET .../counts.json?source_lang=en-US&target_lang=fr-FR.
The result fields are described in the download use case.
GET /segment
Parameter defaults: 20
Result:
{ segment: [{ id: string,
source_lang: { id: lang,
name: string },
target_lang: { id: lang,
name: string },
source: string,
target: string,
industry: { id: natural,
name: string },
content_type: { id: natural,
name: string },
owner: { id: natural,
name: string },
product: { id: natural,
name: string },
provider: { id: natural,
name: string } }] }
Search for segments matching the given query and meeting the given
criteria. The query q is a space-separated list
of words (even for languages that do not
normally separate words with spaces, such as zh-CN). Example: GET .../segment.json?source_lang=en-US&target_lang=fr-FR&q=data+center.
Only segments containing the exact sequence of words in the query
(case-insensitive), with no intervening punctuation, will be returned. So
a search for "web service" will find segments with the word "web" followed
immediately by the word "service".
Punctuation in the query is not well supported. If you wish to match
punctuation, separate the punctuation from words with spaces. For example,
to find "hello, world!", your query should be "hello , world !". Only segments with exactly this punctuation will be returned.
Queries with more than a few words may take a long time, and are not likely to return any results because only exact matching is supported. There is a limit of 10 words in a query.
The results will typically contain segments from a variety of data owners, industries, etc. Other than that, the segments returned are effectively random, and their order is not significant. However, the results for the same query will usually remain similar over time.
In the future, the search algorithm may be enhanced. To continue searching
for an exact sequence of words, surround them with double quotes. Example: GET .../segment.json?source_lang=en-US&target_lang=fr-FR&q="data+center". Also, some characters that don't normally appear in words (e.g. ":")
may have special meaning in the future.
The limit parameter is a hint for how many segments you want. The result may in fact contain more or fewer.
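The query rules for GET /segment can be sketched as a small Python helper that splits punctuation off into its own tokens, enforces the 10-word limit, and wraps the query in double quotes so that it keeps its exact-sequence meaning if the search algorithm changes. Assumptions: every character that is neither a word character nor a space is treated as punctuation, each punctuation mark counts toward the 10-word limit (the Service may count differently), and text in languages such as zh-CN must already have its words separated by spaces. The helper names are hypothetical.

```python
import re
import urllib.parse
from typing import Optional

MAX_QUERY_WORDS = 10  # documented limit on the number of words in a query


def segment_query(text: str, exact: bool = True) -> str:
    """Turn free text into a q value for GET /segment (hypothetical helper)."""
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    if not tokens:
        raise ValueError("empty query")
    if len(tokens) > MAX_QUERY_WORDS:
        raise ValueError(f"query has {len(tokens)} words; the limit is {MAX_QUERY_WORDS}")
    q = " ".join(tokens)
    # Double quotes request an exact sequence even if the algorithm is enhanced.
    return f'"{q}"' if exact else q


def segment_search_path(source_lang: str, target_lang: str, text: str,
                        limit: Optional[int] = None, fmt: str = "json") -> str:
    """Build the path and query string for a segment search."""
    params = {"source_lang": source_lang, "target_lang": target_lang,
              "q": segment_query(text)}
    if limit is not None:
        params["limit"] = limit  # only a hint; results may be more or fewer
    return f"/segment.{fmt}?" + urllib.parse.urlencode(params)
```

For "hello, world!", segment_query returns "hello , world !" in double quotes, matching the punctuation rule. urlencode percent-encodes the double quotes as %22, the standard encoding of the q="data+center" example.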
GET /segment/<id>
Result:
{ segment: { id: string,
source_lang: { id: lang,
name: string },
target_lang: { id: lang,
name: string },
source: string,
target: string,
industry: { id: natural,
name: string },
content_type: { id: natural,
name: string },
owner: { id: natural,
name: string },
product: { id: natural,
name: string },
provider: { id: natural,
name: string } } }
POST /segment/<id>?action=report_problem
GET /download
Result:
{ download: [{ id: natural,
industry: { id: natural,
name: string },
content_type: { id: natural,
name: string },
owner: { id: natural,
name: string },
product: { id: natural,
name: string },
provider: { id: natural,
name: string },
state: enum (unconfirmed, not_ready, ready),
word_count: natural,
segment_count: natural }] }
POST /download?action=create
Parameter defaults: false, false
Result:
{ word_count: natural,
segment_count: natural,
download: [{ id: natural,
industry: { id: natural,
name: string },
content_type: { id: natural,
name: string },
owner: { id: natural,
name: string },
product: { id: natural,
name: string },
provider: { id: natural,
name: string },
state: enum (unconfirmed, not_ready, ready),
word_count: natural,
segment_count: natural }] }
Errors: 400 no_data, 400 over_limit
Create a download of data meeting the given criteria. The exclude_own parameter excludes data uploaded by this member
from the download. By default, the download will be in the unconfirmed state and not yet charged to the member. However,
setting the confirm parameter has the same effect as
immediately calling POST /download/<id>?action=confirm.
Only TDA members may download, and some users may not have
permission to download. See GET /user.
Note that the result schema allows multiple downloads to be created. Currently, you will never get more than one, but this may change in the future.
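The download life cycle (create, confirm, then fetch the .zip, retrying on 503 not_ready) can be sketched as follows. To keep the sketch independent of any HTTP library, send is a hypothetical transport function that returns the HTTP status and the decoded result (a dict for .json calls, bytes for the .zip). Confirming only after a caller-supplied accept check is one possible design, chosen so that nothing is charged until the user has seen word_count; all names here are illustrative.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical transport: (method, resource, params) -> (HTTP status, result).
Transport = Callable[[str, str, Dict[str, str]], Tuple[int, Any]]


def fetch_download_zip(send: Transport, criteria: Dict[str, str],
                       accept: Callable[[dict], bool],
                       poll_seconds: float = 60.0, max_attempts: int = 30,
                       sleep: Callable[[float], None] = time.sleep) -> Optional[bytes]:
    # 1. Create an unconfirmed download; nothing is charged yet.
    status, result = send("POST", "download.json", dict(criteria, action="create"))
    if status != 200:
        raise RuntimeError(f"create failed: {status} {result.get('reason')}")
    download_id = result["download"][0]["id"]  # currently never more than one
    if not accept(result):  # e.g. show result["word_count"] to the user first
        return None  # left unconfirmed, so not charged
    # 2. Confirm; from here on the download is charged to the member.
    status, result = send("POST", f"download/{download_id}.json", {"action": "confirm"})
    if status != 200:
        raise RuntimeError(f"confirm failed: {status} {result.get('reason')}")
    # 3. Fetch the data; 503 not_ready means try again in a minute or longer.
    for _ in range(max_attempts):
        status, body = send("GET", f"download/{download_id}.zip", {})
        if status == 200:
            return body
        if status != 503:
            raise RuntimeError(f"zip fetch failed: HTTP {status}")
        sleep(poll_seconds)
    raise TimeoutError(f"download {download_id} not ready after {max_attempts} tries")
```

Passing sleep as a parameter keeps the one-minute wait out of tests; a real client would keep the default time.sleep.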
GET /download/<id>
Result:
{ download: { id: natural,
industry: { id: natural,
name: string },
content_type: { id: natural,
name: string },
owner: { id: natural,
name: string },
product: { id: natural,
name: string },
provider: { id: natural,
name: string },
state: enum (unconfirmed, not_ready, ready),
word_count: natural,
segment_count: natural } }
GET /download/<id>.zip
Get the data of the download, which must be in the ready state. If it is in the not_ready state, you
will get a 503 not_ready status, and you should try again in 1
minute or longer.
POST /download/<id>?action=confirm
Result:
{ download: { id: natural,
industry: { id: natural,
name: string },
content_type: { id: natural,
name: string },
owner: { id: natural,
name: string },
product: { id: natural,
name: string },
provider: { id: natural,
name: string },
state: enum (unconfirmed, not_ready, ready),
word_count: natural,
segment_count: natural } }
Errors: 400 over_limit
Confirm the download, which must be in the unconfirmed
state. After confirmation, the download is charged to the member. The
download may enter either the not_ready or ready
state.
GET /upload
Result:
{ upload: [{ id: natural,
source_lang: { id: lang,
name: string },
target_lang: { id: lang,
name: string },
industry: { id: natural,
name: string },
content_type: { id: natural,
name: string },
owner: { id: natural,
name: string },
product: { id: natural,
name: string },
provider: { id: natural,
name: string },
state: enum (incomplete, unconfirmed, processing, ready, cancelled, success, error),
word_count: natural,
segment_count: natural,
reason: enum (system_error, bad_tmx, unsupported_tmx, bad_langs),
message: string }] }
POST /upload?action=create
Parameter defaults: false, false, false
Result:
{ upload: [{ id: natural,
source_lang: { id: lang,
name: string },
target_lang: { id: lang,
name: string },
industry: { id: natural,
name: string },
content_type: { id: natural,
name: string },
owner: { id: natural,
name: string },
product: { id: natural,
name: string },
provider: { id: natural,
name: string },
state: enum (incomplete, unconfirmed, processing, ready, cancelled, success, error),
word_count: natural,
segment_count: natural,
reason: enum (system_error, bad_tmx, unsupported_tmx, bad_langs),
message: string }] }
Errors: 400 bad_zip, 400 bad_tmx, 400 unsupported_tmx, 400 bad_langs, 400 duplicate
Create a new upload from the given file, having the given attrs. By
default, if there are no problems, the upload will be in the unconfirmed state. However, setting the confirm parameter has the same effect as immediately calling POST /upload/<id>?action=confirm. Setting the approve parameter has the same effect as
calling POST /upload/<id>?action=approve as soon as the upload reaches the ready state.
Some users may not have permission to upload. See GET /user.
There are many things that can go wrong with an upload, as reflected by
the possible error results. Note that this call checks only the beginning of
the file, so even if it succeeds, problems could still occur later on.
Currently, an upload created by the Service will never go into the incomplete state.
Note that the result schema allows multiple uploads to be created. Currently, you will never get more than one, but this may change in the future.
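A sketch of how a client might choose its next step from an upload's state. The mapping is our assumption, derived from the documented preconditions of the approve, confirm, and cancel actions (confirm requires unconfirmed, approve requires ready, cancel requires unconfirmed or ready); treating incomplete and processing as "poll GET /upload/<id> again later" is inferred, not specified. next_upload_action is a hypothetical name.

```python
from typing import Optional

TERMINAL_STATES = {"success", "error", "cancelled"}


def next_upload_action(state: str, approve_when_ready: bool) -> Optional[str]:
    """Suggest the next client step for an upload (hypothetical helper).

    Returns a POST action name ("confirm" or "approve"), "wait" to poll
    GET /upload/<id> again later, or None when the client should stop.
    """
    if state == "unconfirmed":
        return "confirm"  # POST /upload/<id>?action=confirm
    if state == "ready":
        # POST /upload/<id>?action=approve, or leave it for the user to
        # approve or cancel later.
        return "approve" if approve_when_ready else None
    if state in ("incomplete", "processing"):
        return "wait"  # assumption: the Service is still working on it
    if state in TERMINAL_STATES:
        return None  # for "error", show the reason and message fields
    return None  # a state added after this client was written: stop and report it
```

Stopping on an unrecognized state, rather than polling, avoids an endless loop if a new enum value is added.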
GET /upload/<id>
Result:
{ upload: { id: natural,
source_lang: { id: lang,
name: string },
target_lang: { id: lang,
name: string },
industry: { id: natural,
name: string },
content_type: { id: natural,
name: string },
owner: { id: natural,
name: string },
product: { id: natural,
name: string },
provider: { id: natural,
name: string },
state: enum (incomplete, unconfirmed, processing, ready, cancelled, success, error),
word_count: natural,
segment_count: natural,
reason: enum (system_error, bad_tmx, unsupported_tmx, bad_langs),
message: string } }
POST /upload/<id>?action=approve
Result:
{ upload: { id: natural,
source_lang: { id: lang,
name: string },
target_lang: { id: lang,
name: string },
industry: { id: natural,
name: string },
content_type: { id: natural,
name: string },
owner: { id: natural,
name: string },
product: { id: natural,
name: string },
provider: { id: natural,
name: string },
state: enum (incomplete, unconfirmed, processing, ready, cancelled, success, error),
word_count: natural,
segment_count: natural,
reason: enum (system_error, bad_tmx, unsupported_tmx, bad_langs),
message: string } }
Approve the upload, which must be in the ready state.
POST /upload/<id>?action=confirm
Result:
{ upload: { id: natural,
source_lang: { id: lang,
name: string },
target_lang: { id: lang,
name: string },
industry: { id: natural,
name: string },
content_type: { id: natural,
name: string },
owner: { id: natural,
name: string },
product: { id: natural,
name: string },
provider: { id: natural,
name: string },
state: enum (incomplete, unconfirmed, processing, ready, cancelled, success, error),
word_count: natural,
segment_count: natural,
reason: enum (system_error, bad_tmx, unsupported_tmx, bad_langs),
message: string } }
Confirm the upload, which must be in the unconfirmed state.
POST /upload/<id>?action=cancel
Result:
{ upload: { id: natural,
source_lang: { id: lang,
name: string },
target_lang: { id: lang,
name: string },
industry: { id: natural,
name: string },
content_type: { id: natural,
name: string },
owner: { id: natural,
name: string },
product: { id: natural,
name: string },
provider: { id: natural,
name: string },
state: enum (incomplete, unconfirmed, processing, ready, cancelled, success, error),
word_count: natural,
segment_count: natural,
reason: enum (system_error, bad_tmx, unsupported_tmx, bad_langs),
message: string } }
Cancel the upload, which must be in the unconfirmed state or the ready state.
GET /upload_terms.txt
Example: GET .../upload_terms.txt.