José Souza is a Staff AI Research Scientist at Unbabel, working on improving different aspects of the Unbabel translation pipeline. From 2016 to 2020, José was a research scientist in the CoreAI team at eBay, working on language generation, machine translation and quality estimation of language generation models. Prior to that, José was a PhD candidate at the Fondazione Bruno Kessler (FBK) and the University of Trento, Italy, working on quality estimation of machine translation and automatic speech recognition systems. His PhD thesis received the Best Thesis Award granted by the European Association for Machine Translation in 2017. His research interests include natural language processing, machine learning, machine translation, language generation, and automatic estimation of the quality of NLP systems.
Identification

Personal identification

Full name
José Guilherme Camargo de Souza

Citation names

  • de Souza, José G. C.
  • Souza, José
  • Souza, J. G. C.

Author identifiers

Ciência ID
4B18-24FE-9097
ORCID iD
0000-0001-6344-7633

Email addresses

  • jose.souza@unbabel.com (Professional)

Knowledge fields

  • Exact Sciences - Computer and Information Sciences - Computer Sciences

Languages

  • Portuguese: Mother tongue
  • English: Speaking, Reading, Writing, Listening and Peer-review all Proficiency (C2)
  • Italian: Speaking Advanced (C1); Reading Advanced (C1); Writing Intermediate (B1); Listening Advanced (C1); Peer-review Elementary (A2)
  • French: Speaking Elementary (A2); Reading Intermediate (B1); Writing Elementary (A2); Listening Upper intermediate (B2); Peer-review Elementary (A2)
Education
Degree Classification
2011/10 - 2016/03
Concluded
Doctorate in Information and Communication Technology (Dottorato di Ricerca)
Major in Natural Language Processing, Machine Translation
Università degli Studi di Trento, Italy
2009 - 2011
Concluded
Natural Language Processing and Human Language Technology (Master)
University of Wolverhampton, United Kingdom
19 (on a scale of 20)
2009 - 2010
Concluded
Natural Language Processing and Language Industries (Processamento de Linguagem Natural e Indústrias das Línguas) (Master)
Universidade do Algarve, Portugal
19 (on a scale of 20)
2002 - 2008
Concluded
Computer Science (Ciência da Computação) (Bachelor)
Universidade do Vale do Rio dos Sinos, Brazil
Affiliation

Positions / Appointments

Category
Host institution
Employer
2016/05 - 2020/04 Research Scientist eBay Inc, United States

Others

Category
Host institution
Employer
2020/05 - Current Staff AI Research Scientist Unbabel LDA, Portugal
Projects

Contract

Designation Funders
2023 - Current Center for Responsible AI
Researcher
Unbabel LDA, Portugal
Agência para a Competitividade e Inovação IP
Ongoing
2022 - Current Unified Transcription and Translation for Extended Reality (UTTER)
Principal investigator
Ongoing
2020 - 2024 Multilingual AI Agent Assistants (MAIA)
Researcher
Unbabel LDA, Portugal
Concluded
2022 - 2023 Quality-Aware Machine Translation (QUARTZ)
951847
Principal investigator
Unbabel LDA, Portugal
Concluded
2011 - 2016 Machine Translation Enhanced Computer Assisted Translation (MATECAT)
Fondazione Bruno Kessler, Italy
Concluded
Outputs

Publications

Conference paper
  1. Martins, Pedro Henrique; Alves, João; Vaz, Tânia; Gonçalves, Madalena; Silva, Beatriz; Buchicchio, Marianna; de Souza, José GC; Martins, André FT. "Empirical Assessment of kNN-MT for Real-World Translation Scenarios". Paper presented in Proceedings of the 24th Annual Conference of the European Association for Machine Translation, 2023.
  2. Alves, Duarte; Guerreiro, Nuno; Alves, João; Pombal, José; Rei, Ricardo; de Souza, José; Colombo, Pierre; Martins, Andre. "Steering Large Language Models for Machine Translation with Finetuning and In-Context Learning". Paper presented in Findings of the Association for Computational Linguistics: EMNLP 2023, 2023.
    10.18653/v1/2023.findings-emnlp.744
  3. Rei, Ricardo; Guerreiro, Nuno M.; Pombal, José; van Stigt, Daan; Treviso, Marcos; Coheur, Luisa; C. de Souza, José G.; Martins, André. "Scaling up CometKiwi: Unbabel-IST 2023 Submission for the Quality Estimation Shared Task". Paper presented in Proceedings of the Eighth Conference on Machine Translation, 2023.
    10.18653/v1/2023.wmt-1.73
  4. Alves, João; Martins, Pedro Henrique; De Souza, José GC; Farajian, M Amin; Martins, André FT. "Unbabel-IST at the WMT Chat Translation Shared Task". Paper presented in Proceedings of the Seventh Conference on Machine Translation (WMT), 2022.
  5. Rei, Ricardo; de Souza, José G. C.; Alves, Duarte; Zerva, Chrysoula; Farinha, Cristiana; Glushkova, Taisiya; Lavie, Alon; et al. Corresponding author: Rei, Ricardo. "COMET-22: Unbabel-IST 2022 Submission for the Metrics Shared Task". Paper presented in Seventh Conference on Machine Translation (WMT), 2022.
    Published
  6. Barreiro, Anabela; de Souza, José GC; Gatt, Albert; Bhatt, Mehul; Lloret, Elena; Erdem, Aykut; Gkatzia, Dimitra; et al. "Multi3Generation: Multitask, Multilingual, Multimodal Language Generation". Paper presented in Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, 2022.
  7. Rei, Ricardo; Farinha, Ana C; de Souza, José GC; Ramos, Pedro G; Martins, André FT; Coheur, Luisa; Lavie, Alon; et al. Corresponding author: Rei, Ricardo. "Searching for COMETINHO: The Little Metric That Could". Paper presented in Conference of the European Association for Machine Translation, 2022.
    Published
  8. de Souza, José GC; Rei, Ricardo; Farinha, Ana C; Moniz, Helena; Martins, André FT. "QUARTZ: Quality-Aware Machine Translation". Paper presented in Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, 2022.
  9. Zerva, Chrysoula; Blain, Frédéric; Rei, Ricardo; Lertvittayakumjorn, Piyawat; de Souza, José GC; Eger, Steffen; Kanojia, Diptesh; et al. "Findings of the WMT 2022 Shared Task on Quality Estimation". 2022.
    10.3115/v1/p14-1067
  10. Alves, Duarte M; Rei, Ricardo; Farinha, Ana C; de Souza, José GC; Martins, André FT. "Robust MT evaluation with Sentence-level Multilingual Augmentation". Paper presented in Proceedings of the Seventh Conference on Machine Translation (WMT), 2022.
  11. Zerva, Chrysoula; van Stigt, Daan; Rei, Ricardo; Farinha, Ana Catarina; Ramos, Pedro; de Souza, José G. C.; Glushkova, Taisiya; et al. "IST-Unbabel 2021 Submission for the Quality Estimation Shared Task". Paper presented in Proceedings of the Sixth Conference on Machine Translation, 2021.
  12. Camargo de Souza, José G.; Kozielski, Michael; Mathur, Prashant; Chang, Ernie; Guerini, Marco; Negri, Matteo; Turchi, Marco; Matusov, Evgeny. "Generating E-Commerce Product Titles and Predicting their Quality". Paper presented in Proceedings of the 11th International Conference on Natural Language Generation, 2018.
    10.18653/v1/w18-6530
  13. Ueffing, Nicola; C. de Souza, José G.; Leusch, Gregor. "Quality Estimation for Automatically Generated Titles of eCommerce Browse Pages". Paper presented in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers), 2018.
    10.18653/v1/n18-3007
  14. Chatterjee, Rajen; C. de Souza, José G.; Negri, Matteo; Turchi, Marco. "The FBK Participation in the WMT 2016 Automatic Post-editing Shared Task". Paper presented in Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, 2016.
    10.18653/v1/w16-2377
  15. Jalalvand, Shahab; Negri, Matteo; Turchi, Marco; C. de Souza, José G.; Falavigna, Daniele; Qwaider, Mohammed R. H. "TranscRater: a Tool for Automatic Speech Recognition Quality Estimation". Paper presented in Proceedings of ACL-2016 System Demonstrations, 2016.
    10.18653/v1/p16-4008
  16. Jalili Sabet, Masoud; Negri, Matteo; Turchi, Marco; C. de Souza, José G.; Federico, Marcello. "TMop: a Tool for Unsupervised Translation Memory Cleaning". Paper presented in Proceedings of ACL-2016 System Demonstrations, 2016.
    10.18653/v1/p16-4009
  17. Ataman, Duygu; C. de Souza, José G.; Turchi, Marco; Negri, Matteo. "FBK HLT-MT at SemEval-2016 Task 1: Cross-lingual Semantic Similarity Measurement Using Quality Estimation Features and Compositional Bilingual Word Embeddings". Paper presented in Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), 2016.
    10.18653/v1/s16-1086
  18. C. de Souza, José G.; Zamani, Hamed; Negri, Matteo; Turchi, Marco; Falavigna, Daniele. "Multitask Learning for Adaptive Quality Estimation of Automatically Transcribed Utterances". Paper presented in Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2015.
    10.3115/v1/n15-1073
  19. C. de Souza, José G.; Negri, Matteo; Ricci, Elisa; Turchi, Marco. "Online Multitask Learning for Machine Translation Quality Estimation". Paper presented in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2015.
    10.3115/v1/p15-1022
  20. Zamani, Hamed; de Souza, José GC; Negri, Matteo; Turchi, Marco; Falavigna, Daniele. "Reference-free and confidence-independent binary quality estimation for automatic speech recognition". Paper presented in Proceedings of the second Italian conference on computational linguistics (CLiC-it), Trento, Italy, 2015.
  21. de Souza, José GC; Federico, Marcello; Sawaf, Hassan. "MT Quality Estimation for e-commerce data". Paper presented in Proceedings of Machine Translation Summit XV: User Track, 2015.
  22. Camargo de Souza, José Guilherme; González-Rubio, Jesús; Buck, Christian; Turchi, Marco; Negri, Matteo. "FBK-UPV-UEdin participation in the WMT14 Quality Estimation shared-task". Paper presented in Proceedings of the Ninth Workshop on Statistical Machine Translation, 2014.
    10.3115/v1/w14-3340
  23. de Souza, José G. C.; Turchi, Marco; Negri, Matteo. "Towards a combination of online and multitask learning for MT quality estimation: a preliminary study". Paper presented in AMTA Workshop on interactive and adaptive machine translation, 2014.
  24. de Souza, José G. C.; Turchi, Marco; Negri, Matteo. "Machine Translation Quality Estimation Across Domains". Paper presented in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, 2014.
  25. Negri, Matteo; Turchi, Marco; de Souza, José G. C.; Falavigna, Daniele. "Quality Estimation for Automatic Speech Recognition". Paper presented in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, 2014.
  26. Mathur, Prashant; Cettolo, Mauro; Federico, Marcello; de Souza, José G. C. "Online multi-user adaptive statistical machine translation". 2014.
  27. De Souza, José GC; Turchi, Marco; Anastasopoulos, Antonios; Negri, Matteo. "Online and multitask learning for Machine Translation quality estimation in real-world scenarios". Paper presented in CLiC-it 2014, 2014.
  28. Rubino, Raphael; de Souza, José G. C.; Foster, Jennifer; Specia, Lucia. "Topic Models for Translation Quality Estimation for Gisting Purposes". Paper presented in Proceedings of the XIV Machine Translation Summit, 2013.
  29. Specia, Lucia; Shah, Kashif; de Souza, José G. C.; Cohn, Trevor. "QuEst - A translation quality estimation framework". Paper presented in Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2013.
  30. de Souza, José G. C.; Buck, Christian; Turchi, Marco; Negri, Matteo. "FBK-UEdin Participation to the WMT13 Quality Estimation Shared Task". 2013.
  31. de Souza, José G. C.; Esplà-Gomis, Miquel; Turchi, Marco; Negri, Matteo. "Exploiting Qualitative Information from Automatic Word Alignment for Cross-lingual NLP Tasks". 2013.
  32. de Souza, José G. C.; Negri, Matteo; Mehdad, Yashar. "FBK: Machine Translation Evaluation and Word Similarity metrics for Semantic Textual Similarity". 2012.
  33. "FBK: Cross-Lingual Textual Entailment Without Translation". 2012.
  34. de Souza, José Guilherme Camargo; Negri, Matteo; Mehdad, Yashar. "FBK: Combining Machine Translation Evaluation and Word Similarity metrics for Semantic Textual Similarity". 2012.
  35. Mehdad, Yashar; de Souza, José G. C.; Negri, Matteo; Petrova, Alina. "FBK Participation in the RTE-7 Main Task". 2011.
  36. de Souza, José G. C. ; Orasan, Constantin. "Can Projected Chains in Parallel Corpora Help Coreference Resolution?". 2011.
    10.1007/978-3-642-25917-3_6
  37. Vieira, Renata; Bick, Eckhard; Coelho, Jorge; Muller, Vinicius; Collovini, Sandra; Souza, José; Rino, Lucia. "Semantic tagging for resolution of indirect anaphora". 2009.
  38. de Souza, José G. C.; Gonçalves, Patrícia Nunes; Vieira, Renata. "Learning Coreference Resolution for Portuguese Texts". 2008.
    10.1007/978-3-540-85980-2_16
Journal article
  1. Fernandes, Patrick; Madaan, Aman; Liu, Emmy; Farinhas, António; Martins, Pedro Henrique; Bertsch, Amanda; de Souza, José GC; et al. "Bridging the gap: A survey on integrating (human) feedback for natural language generation". Transactions of the Association for Computational Linguistics, 2023.
    https://doi.org/10.1162/tacl_a_00626
Preprint
  1. Alves, Duarte M; Pombal, José; Guerreiro, Nuno M; Martins, Pedro H; Alves, João; Farajian, Amin; Peters, Ben; et al. "Tower: An Open Multilingual Large Language Model for Translation-Related Tasks". 2024.
  2. Farinhas, António; de Souza, José GC; Martins, André FT. "An Empirical Study of Translation Hypothesis Ensembling with Large Language Models". 2023.
  3. Rei, Ricardo; Treviso, Marcos; Guerreiro, Nuno M; Zerva, Chrysoula; Farinha, Ana C; Maroti, Christine; de Souza, José GC; et al. "CometKiwi: IST-Unbabel 2022 Submission for the Quality Estimation Shared Task". 2022.
Thesis / Dissertation
  1. "Adaptive Quality Estimation for Machine Translation and Automatic Speech Recognition". 2016. http://eprints-phd.biblio.unitn.it/1805/.
Activities

Supervision

Thesis Title
Role
Degree Subject (Type)
Institution / Organization
2022 - 2023 Retrieval-based Adaptation For Machine Translation Applications Using Large Language Models And In-Context Learning
Co-supervisor
Engenharia Eletrotécnica e de Computadores (Master)
Universidade de Lisboa Instituto Superior Técnico, Portugal
2017 - 2017 Robust Language-independent Product Title Generation for E-commerce
Co-supervisor
University of Washington Department of Linguistics, United States

Conference scientific committee

Conference name Conference host
2022 - Current Conference of the European Association for Machine Translation
2016 - Current Conference on Empirical Methods in Natural Language Processing
2015 - Current Annual Conference of the North American Chapter of the Association for Computational Linguistics
2014 - Current Meeting of the Association for Computational Linguistics
Distinctions

Award

2022 Best Paper Award EAMT for "Searching for COMETINHO: The Little Metric That Could"
European Association for Machine Translation, Switzerland
2017 Best Thesis Award
European Association for Machine Translation (EAMT), Belgium