Matching Data as a Service
Matching Data is a clustered search technology applied to the TAUS Data Cloud repository and to web-crawled data. Given an example data set, Matching Data returns matches ranked by relevance at the segment level, across files and domains. With this methodology, developers of MT engines can create high-fidelity data sets tuned to their own domains. This new approach is based on DatAptor, a joint research project between the University of Amsterdam, TAUS, Intel and EC DGT.
Here is how it works
Query corpus submission
The user provides a query corpus and a profile of the data they are looking for (domain name, languages, domain description).
Based on the query corpus, the best-matching data in the TAUS Data Cloud is identified on a segment-level basis.
Data selections are created, with different matching rates (Compact, Medium, Large).
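The steps above can be pictured with a small sketch. It is an illustration only, not the actual Matching Data algorithm, which is not described here. It assumes a simple token-overlap score as a stand-in for relevance, and it assumes that Compact is the strictest selection and Large the loosest. The function names and the threshold values are hypothetical.

```python
import re

# Hypothetical cut-offs: the real match rates behind Compact, Medium and
# Large are not published in this description.
THRESHOLDS = {"Compact": 0.8, "Medium": 0.5, "Large": 0.2}


def tokenize(segment):
    """Lowercase a segment and return its set of word tokens."""
    return set(re.findall(r"\w+", segment.lower()))


def relevance(segment, query_vocab):
    """Fraction of the segment's tokens that also occur in the query corpus.

    This token-overlap score is an assumed stand-in for the real relevance
    measure.
    """
    tokens = tokenize(segment)
    if not tokens:
        return 0.0
    return len(tokens & query_vocab) / len(tokens)


def build_selections(query_corpus, repository, thresholds):
    """Score every repository segment and group segments by cut-off."""
    query_vocab = set()
    for seg in query_corpus:
        query_vocab |= tokenize(seg)

    # Rank segments from most to least relevant.
    scored = [(relevance(seg, query_vocab), seg) for seg in repository]
    scored.sort(key=lambda pair: pair[0], reverse=True)

    # A segment belongs to every selection whose cut-off it reaches.
    return {
        name: [seg for score, seg in scored if score >= cutoff]
        for name, cutoff in thresholds.items()
    }


query = ["The patient received a dose of the vaccine."]
repo = [
    "The patient received the vaccine.",
    "A dose of medicine was given to the patient.",
    "The stock market closed higher today.",
]
selections = build_selections(query, repo, THRESHOLDS)
for name, segments in selections.items():
    print(name, len(segments))
```

In this toy run the unrelated finance sentence falls below every cut-off, while the looser selections include the partially matching medical sentence. A looser cut-off yields more data at a lower average match rate, which is the trade-off the user weighs in the next step.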
Selection review and choice
The user chooses the best-fitting match rate(s) and languages.
Payment and download
After payment, the data is ready for download.