NooJ: A Tool for Machine Translation
A New Machine Translation System: English to Portuguese Using NooJ
  Anabela Barreiro
  Universidade do Porto & Linguateca
  New York University
  NooJ: outil pour la traduction automatique; quelles fonctionnalités développer ? (NooJ: a tool for machine translation; which features should be developed?)
  Paris, September 29th, 2006

Presentation Outline: Structure
  1. Introduction
     - Background
  2. Focus MT: Languages, Resources and Application
     - Functionalities and NooJ Potential
  3. Linguistic Challenges: Source Analysis and Cross-Language Phenomena
  4. NooJ Dictionaries
     - Source and Target Dictionaries
     - Lexical Verbs and Nominalizations + Support Verbs
     - Source and Target Dictionaries: Examples
  5. NooJ Grammars
     - Structure: Metagraphs and Subgraphs
     - Transfer: Examples
  6. NooJ Translation Capabilities
     - Testing 3 Linguistic Issues
  7. Word Order
     - Adjective and Noun in NP
  8. Language Specific Representations
     - Format Differences
     - Dates
  9. Support Verb Constructions
     - Definition
     - Transfer Maintains Support Verb
     - Lexical Verb as Paraphrase
     - Paraphrases: Lexical Verbs: Examples
  10. Conclusions and Prospective
     - NooJ Potential and Future Features
  11. Comments...
     - and Questions?

Introduction: Background
  Relevant background
  - PhD dissertation topic: Bilingual Paraphrases for Machine Translation
  - Previous professional activity: Logos (commercial Machine Translation)
    - development of the English-Portuguese language pair
    - collaboration in the English-Spanish and English-Italian language pairs
Focus MT: Languages, Resources and Application
  Language pairs
  - English-Portuguese (mainly) and Portuguese-English
  - Romance languages
  - possibly others
  Resources
  - Conversion of OpenLogos resources: multilingual dictionary of 100,000+ canonical (i.e., non-inflected) forms; inflectional paradigms; semantic information; conversion of some syntactic-semantic rules
  - NOMLEX dictionary of 5,000+ nominalizations and dictionary of 300+ support verbs (Proteus Project, NYU)
  Application
  - Analysis and formalization in NooJ

Focus MT: Functionalities and NooJ Potential
  New functionalities
  - Few quantitative MT results so far, but the conversion of many resources is in progress
  Interesting linguistic aspects of translation
  - Some good qualitative results and potential for further short-term development

Linguistic Challenges: Source Analysis and Cross-Language Phenomena
  Even simple sentences raise issues such as:
  1. Source analysis
     - PoS ambiguity and homography
  2. Cross-language phenomena (source and target related)
     - analysis and generation
     - transfer
Linguistic Challenges: Cross-Language Phenomena
  + Agreement
  + Tense and aspect
  + Passive and active
      Lunches are paid to all employees > Pagam-se almoços a todos os empregados
      N1 Aux V(Pass) Prep(to) N2 > V(Refl) N1 Prep(a) N2
  + Word order
      This is a red table > Isto é uma mesa vermelha
      V Det Adj N > V Det N Adj
      I gave John a book > Eu dei um livro ao João
      V N1 Det N2 > V Det N2 Prep(a) Det N1
  + Different standards for representation of dates
  + Support verb constructions and paraphrases

NooJ Dictionaries: Source and Target Dictionaries
  Source dictionary
  - Regular NooJ dictionary with a link to the target language (additional fields corresponding to the translation)
  - Lemma, POS, inflectional and derivational information (FLX shows how to inflect; DRV shows how to derive)
  - It can be bilingual or multilingual
  Target dictionary
  - Like any NooJ monolingual dictionary, but easily convertible into a source bilingual dictionary

NooJ Dictionaries: Lexical Verbs and Nominalizations + Support Verbs
  - NOMLEX nominalizations: 5,000+
  - Support verbs: 300+

NooJ Dictionaries: Source and Target Dictionaries: Examples
  Labels on the dictionary examples: New field. With this field added, the dictionary also becomes a source dictionary.
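The interaction between a source dictionary linked to the target language, word order, and agreement can be illustrated outside NooJ. The Python sketch below is not NooJ code and does not reproduce the dictionaries or grammars shown in the slides; the `Entry` class, the `LEXICON` table, and the `translate` function are hypothetical names chosen for illustration. It covers only the slide example "This is a red table > Isto é uma mesa vermelha" (V Det Adj N > V Det N Adj).

```python
from dataclasses import dataclass


@dataclass
class Entry:
    pos: str          # "PRON", "V", "DET", "A" or "N"
    pt: dict          # Portuguese forms keyed by gender "m"/"f", or "-" if invariable
    gender: str = ""  # inherent gender; only filled in for nouns


# Hypothetical toy lexicon: each English entry carries its Portuguese
# translation, mimicking a source dictionary linked to the target language.
LEXICON = {
    "this": Entry("PRON", {"-": "isto"}),
    "is": Entry("V", {"-": "é"}),
    "a": Entry("DET", {"m": "um", "f": "uma"}),
    "red": Entry("A", {"m": "vermelho", "f": "vermelha"}),
    "table": Entry("N", {"-": "mesa"}, gender="f"),
}


def translate(sentence: str) -> str:
    entries = [LEXICON[word] for word in sentence.lower().split()]

    # Transfer step: Adj N > N Adj
    i = 0
    while i < len(entries) - 1:
        if entries[i].pos == "A" and entries[i + 1].pos == "N":
            entries[i], entries[i + 1] = entries[i + 1], entries[i]
            i += 2
        else:
            i += 1

    # Generation step: determiners agree with the next noun,
    # adjectives with the preceding noun (masculine as a fallback).
    output = []
    for k, entry in enumerate(entries):
        gender = "-"
        if entry.pos == "DET":
            gender = next((e.gender for e in entries[k + 1:] if e.pos == "N"), "m")
        elif entry.pos == "A":
            gender = next((e.gender for e in reversed(entries[:k]) if e.pos == "N"), "m")
        output.append(entry.pt.get(gender, entry.pt.get("-")))
    return " ".join(output)


print(translate("This is a red table"))
```

The sketch keeps transfer (reordering) and generation (agreement) as separate steps, which mirrors the distinction the slides draw between the two; a real system would also need disambiguation and a far larger lexicon.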
NooJ Grammars: Structure: Metagraphs and Subgraphs
  Figure labels: Metagraph for Verb Tenses; Metagraph Structure; Subgraph for Translation of Present Perfect

NooJ Grammars: Transfer: Examples
  - EN should make decisions > PT devia decidir
  - PT homem alto > EN tall man

NooJ Translation Capabilities: Testing 3 Linguistic Issues
  NooJ translation capability for linguistic aspects such as:
  - Word order differences, among others
      Ex: homem alto > tall man (N+A > A+N)
  - Different representation of dates
      Ex: Monday, September the 11th > segunda-feira, 11 de Setembro
  - Support verb constructions and paraphrases
      Ex: make a visit = visit > fazer uma visita = visitar

Word Order: Adjective and Noun in NP
  Figure labels: NP Word Order Grammar; PT Source Dictionary; Test Sentences; Concordance; PT-EN Translation

Language Specific Representations: Format Differences
  Languages use different ways of representing:
  - Dates
  - Numerals
  - Addresses
  - Etc.

Language Specific Representations: Dates
  Figure labels: Test Sentences; Concordance; Local Grammar; Source Bilingual Dictionary
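The date example from the slides (Monday, September the 11th > segunda-feira, 11 de Setembro) can be approximated with a small pattern-plus-lookup routine. The Python sketch below is only an illustration of the idea behind a local grammar combined with a bilingual dictionary; it is not the NooJ grammar shown in the presentation, and the names `WEEKDAYS`, `MONTHS`, `DATE_RE`, and `translate_dates` are hypothetical. It assumes English dates of the form "[Weekday,] Month [the] day[st|nd|rd|th]".

```python
import re

# Small bilingual lookup tables (EN > PT), standing in for dictionary entries.
WEEKDAYS = {
    "Monday": "segunda-feira", "Tuesday": "terça-feira",
    "Wednesday": "quarta-feira", "Thursday": "quinta-feira",
    "Friday": "sexta-feira", "Saturday": "sábado", "Sunday": "domingo",
}
MONTHS = {
    "January": "Janeiro", "February": "Fevereiro", "March": "Março",
    "April": "Abril", "May": "Maio", "June": "Junho", "July": "Julho",
    "August": "Agosto", "September": "Setembro", "October": "Outubro",
    "November": "Novembro", "December": "Dezembro",
}

# Pattern playing the role of the local grammar: optional weekday,
# month name, optional "the", day number with optional ordinal suffix.
DATE_RE = re.compile(
    r"\b(?:(%s),\s+)?(%s)\s+(?:the\s+)?(\d{1,2})(?:st|nd|rd|th)?\b"
    % ("|".join(WEEKDAYS), "|".join(MONTHS))
)


def translate_dates(text: str) -> str:
    """Rewrite every recognized English date in the Portuguese day-month order."""
    def to_portuguese(match: re.Match) -> str:
        weekday, month, day = match.groups()
        prefix = WEEKDAYS[weekday] + ", " if weekday else ""
        return f"{prefix}{int(day)} de {MONTHS[month]}"

    return DATE_RE.sub(to_portuguese, text)


print(translate_dates("Monday, September the 11th"))
```

Text that does not match the pattern is left untouched, so the routine can be run over whole sentences; other date formats (numeric dates, years) would need additional patterns.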
Support Verb Constructions: Definition
  Within the scope of my dissertation, a Support Verb Construction is a phrase made up of a Support Verb and a Nominalization, where:
  1. the Support Verb is a verb which takes a nominalization as an argument (subj, obj, etc.) and which meets at least one of the following criteria:
     a) it is semantically empty, contributing little meaning to the sentence other than tense, number and person (ex: make in make an arrangement, and take in take a seat)
     b) it shares one or more arguments with the nominalization (ex: in John pays Mary a visit, John is the subject of both pay and visit, and Mary is the IO of pay and the DO of visit)
  2. the Nominalization carries the meaning of the phrase

Support Verb Constructions: Transfer Maintains Support Verb
  Figure labels: Simple Local Grammar; Corpora Text; Concordance (identification and translation); EN Source Dictionary

Support Verb Constructions: Lexical Verb as Paraphrase
  Figure labels: Corpora Text; Simple Local Grammar; Concordance (identification and translation)

Support Verb Constructions: Paraphrases: Lexical Verbs: Examples
  Figure labels: Simple Local Grammar; EN Source Dictionary; Metagraph
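The paraphrase of a support verb construction by a lexical verb (make a visit = visit > visitar; make decisions > decidir) can be sketched as a table lookup triggered by a pattern. The Python code below is an illustration only, not the NooJ local grammar or dictionary from the slides; `PARAPHRASES`, `SVC_RE`, and `paraphrase` are hypothetical names, the table holds just the two pairs attested in the slides, and only the base form of the support verb is recognized.

```python
import re

# (support verb, nominalization) -> (English lexical verb, Portuguese lexical verb)
# Only the pairs that appear in the slides are listed.
PARAPHRASES = {
    ("make", "visit"): ("visit", "visitar"),
    ("make", "decision"): ("decide", "decidir"),
}

_VERBS = "|".join(sorted({verb for verb, _ in PARAPHRASES}))
_NOUNS = "|".join(sorted({noun for _, noun in PARAPHRASES}))

# Support verb, optional determiner, nominalization (singular or plural).
SVC_RE = re.compile(rf"\b({_VERBS})\s+(?:(?:a|an|the)\s+)?({_NOUNS})(s?)\b")


def paraphrase(text: str, target: str = "pt") -> str:
    """Replace each recognized support verb construction by a lexical verb.

    target="en" yields the English paraphrase, target="pt" the Portuguese verb.
    """
    def to_lexical_verb(match: re.Match) -> str:
        key = (match.group(1), match.group(2))
        if key not in PARAPHRASES:
            return match.group(0)  # leave unknown combinations unchanged
        en_verb, pt_verb = PARAPHRASES[key]
        return pt_verb if target == "pt" else en_verb

    return SVC_RE.sub(to_lexical_verb, text)


print(paraphrase("make a visit", target="en"))
print(paraphrase("make a visit"))
```

Inflected support verbs (made, makes) and the surrounding words, such as the modal in "should make decisions > devia decidir", are outside the scope of this sketch and would require morphological analysis and further transfer rules.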
Support Verb Constructions: Paraphrases: Lexical Verbs: Examples
  Figure labels: Concordance (identification and translation); Corpora for Testing; Local Grammar; EN-PT Bilingual Dictionary

Conclusions and Prospective: NooJ Potential and Future Features
  NooJ has great potential for Machine Translation
  - Easy to use, flexible and versatile
  - Easy to optimize
  Useful future features
  - Further optimization and new functionalities
  - Automatic propagation of features like agreement
  - Ability to create simpler, easier-to-build grammars, for instance by specifying the lemma
  - Development of an MT module for EN-Romance languages and vice versa (starting with bilingual paraphrases)

Feedback... And Questions?