NooJ - a tool for machine translation

A New Machine Translation System
English to Portuguese Using NooJ
Anabela Barreiro
Universidade do Porto & Linguateca
New York University
NooJ: a tool for machine translation; which functionalities should be developed?
Anabela Barreiro
Paris, September 29th, 2006
Presentation Outline
Structure
1. Introduction - Background
2. Focus MT: Languages, Resources and Application - Functionalities and NooJ Potential
3. Linguistic Challenges: Source Analysis and Cross-Language Phenomena
4. NooJ Dictionaries - Source and Target Dictionaries - Lexical Verbs and Nominalizations + Support Verbs - Source and Target Dictionaries: Examples
5. NooJ Grammars - Structure: Metagraphs and Subgraphs - Transfer: Examples
6. NooJ Translation Capabilities - Testing 3 Linguistic Issues
7. Word Order - Adjective and Noun in NP
8. Language Specific Representations - Format Differences - Dates
9. Support Verb Constructions - Definition - Transfer Maintains Support Verb - Lexical Verb as Paraphrase - Paraphrases: Lexical Verbs: Examples
10. Conclusions and Prospective - NooJ Potential and Future Features
11. Comments... - and Questions?
Introduction
Background
Relevant Background
PhD dissertation topic
Bilingual Paraphrases for Machine Translation
Previous professional activity
Logos (commercial Machine Translation)
development of the English-Portuguese language pair
collaboration in the English-Spanish and English-Italian language pairs
Anabela Barreiro - 9th INTEX/NooJ Conference, Belgrade, June 1-3, 2006
Focus MT
Languages, Resources and Application
Language Pairs
English-Portuguese (mainly) and Portuguese-English
Romance languages
possibly other
Resources
Conversion of OpenLogos resources
Multilingual dictionary of +100,000 canonical (i.e., non-inflected) forms;
inflectional paradigms; semantic information; conversion of some
syntactic-semantic rules.
NOMLEX dictionary of +5,000 nominalizations and dictionary of +300
support verbs (Proteus Project, NYU)
Application
Analysis and formalization in NooJ
Focus MT
Functionalities and NooJ Potential
New Functionalities
Not many quantitative results yet in terms of MT, but the conversion of
many resources is in progress
Interesting Linguistic Aspects of Translation
Some good qualitative results and potential for further short-term
development
Linguistic Challenges
Source Analysis and
Cross-Language Phenomena
Even simple sentences raise issues such as:
1- Source analysis
PoS ambiguity and homography
2- Cross-language phenomena (source and target related)
analysis and generation
transfer
Linguistic Challenges
Cross-Language Phenomena
+ Agreement
+ Tense and aspect
+ Passive and active
Lunches are paid to all employees
> Pagam-se almoços a todos os empregados
N1 Aux V(Pass) Prep(to) N2 > V(Refl) N1 Prep(a) N2
+ Word order
This is a red table > Isto é uma mesa vermelha
V Det Adj N > V Det N Adj
I gave John a book > Eu dei um livro ao João
V N1 Det N2 > V Det N2 Prep(a) Det N1
+ Different standards for representation of dates
+ Support verb constructions and paraphrases
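The adjective-noun word order pattern listed above (V Det Adj N > V Det N Adj) can be sketched in Python as a rewrite over part-of-speech-tagged tokens. This is only an illustration under stated assumptions: the tag names, the `reorder_adj_noun` helper and the already-translated word list are hypothetical, lexical transfer and gender agreement are assumed to have been handled beforehand, and none of this is NooJ grammar syntax.

```python
# Illustrative sketch only: the tags and this helper are assumptions,
# not part of NooJ.
def reorder_adj_noun(tagged):
    """Swap every adjacent (Adj, N) pair into (N, Adj) order."""
    out = []
    i = 0
    while i < len(tagged):
        is_adj_noun = (
            i + 1 < len(tagged)
            and tagged[i][1] == "Adj"
            and tagged[i + 1][1] == "N"
        )
        if is_adj_noun:
            # Emit the noun first, then the adjective.
            out.extend([tagged[i + 1], tagged[i]])
            i += 2
        else:
            out.append(tagged[i])
            i += 1
    return out


# "This is a red table" after word-for-word lexical transfer (assumed done).
sentence = [
    ("Isto", "Pro"),
    ("é", "V"),
    ("uma", "Det"),
    ("vermelha", "Adj"),
    ("mesa", "N"),
]
reordered = reorder_adj_noun(sentence)
print(" ".join(word for word, _ in reordered))
```

A real transfer grammar would also need to cover the passive and ditransitive patterns, which reorder whole constituents rather than adjacent words.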
NooJ Dictionaries
Source and Target Dictionaries
Source dictionary:
Regular NooJ dictionary with link to the target language (additional fields
corresponding to the translation)
Lemma, POS, Inflectional and derivational information
(FLX shows how to inflect; DRV shows how to derive)
It can be bilingual or multilingual
Target dictionary:
Like any NooJ monolingual dictionary but easily convertible into a
source bilingual dictionary
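The relation between the two kinds of dictionary can be sketched in Python. The entry strings below are hypothetical examples written in a `lemma,POS+KEY=VALUE` layout, and the paradigm name `MESA`, the field name `EN` and both helper functions are assumptions made for illustration; the actual entries from the talk are not reproduced here.

```python
# Illustrative sketch only: the entry layout and field names are assumptions.
def parse_entry(line):
    """Split 'lemma,POS+KEY=VALUE+FLAG' into a dictionary."""
    lemma, _, rest = line.partition(",")
    pos, *features = rest.split("+")
    entry = {"lemma": lemma, "pos": pos, "features": {}}
    for feature in features:
        key, sep, value = feature.partition("=")
        # Features without '=' are stored as boolean flags.
        entry["features"][key] = value if sep else True
    return entry


def is_source_entry(entry, target_lang):
    """An entry can act as a source entry if it has a translation field."""
    return target_lang in entry["features"]


monolingual = parse_entry("mesa,N+FLX=MESA")
bilingual = parse_entry("mesa,N+FLX=MESA+EN=table")

print(is_source_entry(monolingual, "EN"))
print(is_source_entry(bilingual, "EN"), bilingual["features"]["EN"])
```

Under these assumptions, adding one translation field per target language is all it takes to turn a monolingual entry into a bilingual or multilingual source entry.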
NooJ Dictionaries
Lexical Verbs and Nominalizations +
Support Verbs
NOMLEX
Nominalizations +5,000
Support Verbs +300
NooJ Dictionaries
Source and Target Dictionaries: Examples
New field: with this field added, the target dictionary
also becomes a source dictionary
NooJ Grammars
Structure: Metagraphs and Subgraphs
Metagraph for Verb Tenses
Metagraph Structure
Subgraph for Translation of
Present Perfect
NooJ Grammars
Transfer - Examples
EN should make decisions > PT devia decidir
PT homem alto > EN tall man
NooJ Translation Capabilities
Testing 3 Linguistic Issues
NooJ translation capability for linguistic aspects such as:
Word order differences, among others
Ex: homem alto > tall man (N+A > A+N)
Different representation of dates
Ex: Monday, September the 11th > segunda-feira, 11 de Setembro
Support verb constructions and paraphrases
Ex: make a visit = visit > fazer uma visita = visitar
Word Order
Adjective and Noun in NP
NP Word Order Grammar
PT Source Dictionary
Test Sentences
Concordance
PT-EN Translation
Language Specific Representations
Format Differences
Languages use different ways of representing:
Dates
Numerals
Addresses
Etc.
Language Specific Representations
Dates
Test Sentences
Concordance
Local Grammar
Source Bilingual Dictionary
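The date example from the talk (Monday, September the 11th > segunda-feira, 11 de Setembro) can be approximated with a small Python sketch that combines a pattern (standing in for the local grammar) with two lookup tables (standing in for the bilingual dictionary). The regular expression, the tables and the `translate_date` function are assumptions for illustration and do not reproduce the NooJ grammar shown on the slide.

```python
import re

# Illustrative sketch only: these tables stand in for a bilingual dictionary.
WEEKDAYS = {
    "Monday": "segunda-feira", "Tuesday": "terça-feira",
    "Wednesday": "quarta-feira", "Thursday": "quinta-feira",
    "Friday": "sexta-feira", "Saturday": "sábado", "Sunday": "domingo",
}
MONTHS = {
    "January": "Janeiro", "February": "Fevereiro", "March": "Março",
    "April": "Abril", "May": "Maio", "June": "Junho", "July": "Julho",
    "August": "Agosto", "September": "Setembro", "October": "Outubro",
    "November": "Novembro", "December": "Dezembro",
}

# Matches e.g. "Monday, September the 11th" or "Friday, May 1st".
DATE_RE = re.compile(
    r"(?P<weekday>[A-Z][a-z]+), (?P<month>[A-Z][a-z]+) "
    r"(?:the )?(?P<day>\d{1,2})(?:st|nd|rd|th)?"
)


def translate_date(text):
    """Return the Portuguese date string, or None if the input does not match."""
    match = DATE_RE.fullmatch(text)
    if match is None:
        return None
    weekday = WEEKDAYS.get(match["weekday"])
    month = MONTHS.get(match["month"])
    if weekday is None or month is None:
        return None
    # Portuguese order: weekday, day "de" month, with no ordinal suffix.
    return f"{weekday}, {int(match['day'])} de {month}"


print(translate_date("Monday, September the 11th"))
```

The month is capitalised to follow the example given in the talk.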
Support Verb Constructions
Definition
Within the scope of my dissertation, a Support Verb Construction is
a phrase made up of a Support Verb and a Nominalization, where:
1. the Support Verb is a verb which takes a nominalization as an argument
(subj, obj, etc.) and which meets at least one of the following criteria:
a) it is semantically empty, contributing little meaning to the sentence other
than tense, number and person (ex: make in make an arrangement and take
in take a seat)
b) it shares one or more arguments with the nominalization
(ex: in John pays Mary a visit, John is the subject of both pay and visit, and
Mary is the IO of pay and the DO of visit)
2. the Nominalization carries the meaning of the phrase
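The definition suggests a simple lookup view of paraphrasing: a (support verb, nominalization) pair maps to a lexical verb, which in turn maps to its target-language equivalent. The Python sketch below uses toy tables built from examples in this talk (make a visit = visit > visitar; make decisions > decidir). The table names, the functions and the assumption that the input is already lemmatized are all hypothetical and are not the NooJ dictionaries or grammars themselves.

```python
# Illustrative sketch only: toy tables, not the NOMLEX or NooJ resources.
SVC_TO_VERB = {
    ("make", "visit"): "visit",
    ("make", "decision"): "decide",
}
EN_TO_PT_VERB = {"visit": "visitar", "decide": "decidir"}
DETERMINERS = {"a", "an", "the"}


def paraphrase_svc(lemmas):
    """Map lemmatized 'V (Det) N' to its lexical-verb paraphrase, or None."""
    content = [lemma for lemma in lemmas if lemma not in DETERMINERS]
    if len(content) != 2:
        return None
    support_verb, nominalization = content
    return SVC_TO_VERB.get((support_verb, nominalization))


def translate_svc(lemmas):
    """Translate a support verb construction via its lexical-verb paraphrase."""
    verb = paraphrase_svc(lemmas)
    if verb is None:
        return None
    return EN_TO_PT_VERB.get(verb)


print(paraphrase_svc(["make", "a", "visit"]))
print(translate_svc(["make", "a", "visit"]))
```

A pair missing from the table, such as take + seat here, returns None; a real system would also need to restore the tense, number and person carried by the support verb.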
Support Verb Constructions
Transfer Maintains Support Verb
Simple Local Grammar
Corpora Text
Concordance
identification and translation
EN Source Dictionary
Support Verb Constructions
Corpora Text
Lexical Verb as Paraphrase
Simple Local Grammar
Concordance
identification and translation
Support Verb Constructions
Paraphrases
Lexical Verbs: Examples
Simple Local Grammar
EN Source Dictionary
Metagraph
Support Verb Constructions
Concordance
Paraphrases
Lexical Verbs: Examples
Identification and Translation
Corpora for Testing
Local Grammar
EN-PT Bilingual Dictionary
Conclusions and Prospective
NooJ Potential and Future Features
NooJ has great potential for Machine Translation
Easy to use, flexible and versatile
Easy to optimize
Useful future features
Further optimization and new functionalities
Automatic propagation of features like agreement
Ability to create simpler, easier-to-build grammars, for instance by
specifying the lemma
Development of an MT module for EN-Romance languages
and vice-versa (starting with bilingual paraphrases)
Feedback...
And Questions?