Arjen Poutsma

ajwp@xs4all.nl


Publications

Language Identification
Applying Monte Carlo Techniques to Language Identification
A paper presented at CLIN 2001. In this paper, we introduce a new language identification technique based on Monte Carlo sampling. We show that, by determining the language of a sufficiently large number of random features, we can identify the document language as the language that results most often from these features. Whether the number of samples is large enough can be determined by calculating the standard error of the samples. Finally, we discuss some pilot experiments in which we compare this new technique with others.
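A minimal sketch of the sampling idea follows. It is an illustration under stated assumptions, not the implementation from the paper. It assumes character trigrams as the random features, tiny hypothetical trigram profiles (`PROFILES`) in place of trained language models, and a hypothetical stopping rule: keep drawing random trigrams until the standard error of the leading language's share, sqrt(p(1 - p) / n), drops below `max_error`. The names `classify_feature` and `identify` are likewise hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical toy profiles: trigram -> relative frequency per language.
# A real system would estimate these from training corpora.
PROFILES = {
    "en": {"the": 0.05, "he ": 0.04, "and": 0.03, "ing": 0.03, " th": 0.05},
    "nl": {"een": 0.05, "de ": 0.04, "het": 0.03, "en ": 0.04, "van": 0.03},
}


def classify_feature(trigram):
    """Return the language whose profile weights this trigram highest, or None."""
    best_lang, best_score = None, 0.0
    for lang, profile in PROFILES.items():
        score = profile.get(trigram, 0.0)
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang


def identify(text, max_error=0.05, min_samples=30, max_samples=10_000, seed=0):
    """Sample random trigrams, let each vote for a language, and stop once the
    standard error of the leading language's share is small enough (assumed rule)."""
    if len(text) < 3:
        return None, 0.0, 0
    rng = random.Random(seed)
    votes = Counter()
    n = 0  # number of informative samples (features that voted)
    for _ in range(max_samples):
        start = rng.randrange(len(text) - 2)  # random trigram position
        lang = classify_feature(text[start:start + 3])
        if lang is None:
            continue  # trigram unknown to every profile: no vote
        votes[lang] += 1
        n += 1
        if n >= min_samples:
            _, count = votes.most_common(1)[0]
            p = count / n
            stderr = math.sqrt(p * (1 - p) / n)  # standard error of a proportion
            if stderr <= max_error:
                break
    if n == 0:
        return None, 0.0, 0
    leader, count = votes.most_common(1)[0]
    return leader, count / n, n


lang, share, samples = identify("the cat and the dog are sitting in the house and the fish")
print(lang, share, samples)
```

The function returns the winning language, its share of the votes, and the number of informative samples drawn. Sampling a fixed-size random subset keeps the cost independent of document length, while the standard error indicates when additional samples would no longer change the outcome much.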
Data-Oriented Translation
Data-Oriented Translation
A paper presented at COLING 2000.
Data-Oriented Translation: Using the Data-Oriented Parsing framework for Machine Translation
The final version of my master's thesis.
Data-Oriented Translation
An initial paper studying the Data-Oriented Translation method. This paper was the topic of a presentation held at CLIN 1998 in Leuven.
DOT Implementation notes
The implementation written for the paper mentioned above was programmed in C++ and documented with the fine utility doc++.
JavaDOT Implementation notes
Recently, a new implementation was written in Java. Although Java is not quite as fast as C++, it offers numerous other advantages, Javadoc being one of them.
Examining the Cognitive Aspects of Human and Machine Translation
In this paper, we examine human translation: what type of knowledge is required, and what stages of competence in translation can be found. Then, we give a short introduction to MT and examine whether, and how, MT systems correspond to the cognitive aspects of human translation. Finally, we take a specific MT system (the Data-Oriented Translation system) and explain how this system does adhere to some of the cognitive aspects of human translation.