João L. M. Pereira holds a PhD from Instituto Superior Técnico, University of Lisbon (2023), and earned the Basiskwalificatie Onderwijs (BKO) certification (2023) from the University of Amsterdam, which certifies him for professor-level positions in the Netherlands. João is interested in Machine Learning (ML), Natural Language Processing (NLP), and Databases (DB), and has made significant research contributions in information extraction systems, acronym expansion, and user involvement in data cleaning. His noteworthy contributions include work on acronym expansion published in the Proceedings of the VLDB Endowment (SCImago Q1) in 2022 and presented at the VLDB conference. Additional work on acronym disambiguation was presented at a workshop associated with the AAAI Conference on Artificial Intelligence. João's supervision led to the publication of two master's theses in workshops associated with the Conference on Empirical Methods in Natural Language Processing and the International Conference on Intelligent Data Engineering and Automated Learning. João has successfully coordinated four courses and served as a Teaching Assistant for more than 10 different courses. He has supervised projects and master's theses with companies and organizations such as Randstad, the Municipality of Amsterdam, and the fire department. Currently, as a researcher at the VISTA Lab, Algoritmi Center at the University of Évora, João participates in the EU Horizon funded project HarmonicAI and the development consulting project Awaken Moments funded by Empower Startups.
Identification

Personal identification

Full name
João Pedro Lebre Magalhães Pereira

Citation names

  • João L. M. Pereira

Author identifiers

Ciência ID
9813-025F-6310
ORCID iD
0000-0002-3247-5524
Google Scholar ID
n9-ph6AAAAAJ
Researcher Id
IYJ-9715-2023
Scopus Author Id
57198007997

Knowledge fields

  • Engineering and Technology - Electrotechnical Engineering, Electronics and Informatics
  • Exact Sciences - Computer and Information Sciences - Information Science

Languages

Language Speaking Reading Writing Listening Peer-review
Portuguese (Mother tongue)
English Advanced (C1) Advanced (C1) Advanced (C1) Advanced (C1)
Spanish; Castilian Intermediate (B1) Intermediate (B1) Beginner (A1) Intermediate (B1)
Education
Degree Classification
2016 - 2023/02/24
Concluded
Computer Science and Engineering (PhD)
Universidade de Lisboa Instituto Superior Técnico, Portugal
"Towards effective and effortless data cleaning: from automatic approaches to user involvement" (THESIS/DISSERTATION)
Approved with Distinction
2014
Concluded
Master of Science (Technology) in Computer Science and Engineering (Master)
Aalto-yliopisto, Finland
"Supervised Learning for Relationship Extraction From Textual Documents" (THESIS/DISSERTATION)
3 (Finnish scale)
2013/11/11
Concluded
Computer Science and Engineering (Master)
Universidade de Lisboa Instituto Superior Técnico, Portugal
"Supervised Learning for Relationship Extraction From Textual Documents" (THESIS/DISSERTATION)
16
2013 - 2013
Concluded
Scientific Writing and Communication - Short Course for Researchers (Intermediate course)
Universidade de Lisboa Instituto Superior Técnico, Portugal
2011
Concluded
Bologna Bachelor's Degree in Computer Science and Engineering - Alameda (Bachelor)
Universidade de Lisboa Instituto Superior Técnico, Portugal
14
Affiliation

Science

Category
Host institution
Employer
2015/07 - 2015/10 Research Trainee (Research) Webdetails, a Pentaho Company, Portugal

Teaching in Higher Education

Category
Host institution
Employer
2024/09/16 - Current Invited Assistant Professor (University Teacher) Universidade de Évora, Portugal
Universidade de Évora Departamento de Informática, Portugal
2023/09/12 - 2024/06/06 Invited Assistant Professor (University Teacher) Universidade de Évora, Portugal
Universidade de Évora Departamento de Informática, Portugal

Positions / Appointments

Category
Host institution
Employer
2021/08/23 - 2023/09 Teacher 4 (Lecturer) Universiteit van Amsterdam Faculteit der Natuurwetenschappen Wiskunde en Informatica, Netherlands

Others

Category
Host institution
Employer
2015/10 - 2016/04 Software Engineer (Senior Technician) Webdetails, a Pentaho Company, Portugal
2015/01 - 2015/07 Junior Researcher (Scientific Research) Instituto de Engenharia de Sistemas e Computadores, Investigação e Desenvo, Portugal
2014/01 - 2014/12 Junior Researcher (Scientific Research) Instituto de Engenharia de Sistemas e Computadores, Investigação e Desenvo, Portugal
2013/03 - 2013/12 Junior Researcher (Scientific Research) Instituto de Engenharia de Sistemas e Computadores, Investigação e Desenvo, Portugal
Projects

Grant

Designation Funders
2018/09 - 2021/08 FCT PhD Scholarship
SFRH/BD/135719/2018
PhD Student Fellow
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa, Portugal
Fundação para a Ciência e a Tecnologia
Concluded
2016 - 2018 ULisboa PhD Scholarship
BD ULisboa
PhD Student Fellow
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa, Portugal
Concluded

Contract

Designation Funders
2024 - 2028 HarmonicAI: Human-guided collaborative multi-objective design of explainable, fair, and privacy-preserving AI for digital health
101131117
Researcher
Universidade do Minho Centro ALGORITMI, Portugal

Universidade de Évora, Portugal
Horizon Europe Excellent Science
2015/03/01 - 2021/04/01 Instituto de Engenharia de Sistemas e Computadores, Investigação e Desenvolvimento em Lisboa
UID/CEC/50021/2019
SFRH/BPD/110695/2015
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa, Portugal
Fundação para a Ciência e a Tecnologia
Concluded
2015/01/12 - 2015/07/12 Project DataStorm - Large-Scale Data Management in Cloud Environments
EXCL/EEI-ESS/0257/2012
Research Fellow
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa, Portugal
Fundação para a Ciência e a Tecnologia
Concluded
2014/01/01 - 2014/12/31 Project DataStorm - Large-Scale Data Management in Cloud Environments
EXCL/EEI-ESS/0257/2012
Research Fellow
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa, Portugal
Fundação para a Ciência e a Tecnologia
Concluded
2013/03/15 - 2013/12/15 SMARTIES
PTDC/EIA-EIA/115346/2009
Research Fellow
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa, Portugal
Fundação para a Ciência e a Tecnologia
Concluded
Outputs

Publications

Conference paper
  1. Berry Blom; João L. M. Pereira. "Domain Adaptation in Transformer Models: Question Answering of Dutch Government Policies". Paper presented at Intelligent Data Engineering and Automated Learning, Évora, 2023.
    10.1007/978-3-031-48232-8_19
  2. Lorincz, Anna; Graus, David; Lavi, Dor; João Pedro Lebre Magalhães Pereira. "Transfer learning for multilingual vacancy text generation". 2022.
    10.18653/v1/2022.gem-1.18
  3. Pereira, João Pedro Lebre Magalhães; Helena Galhardas; Bruno Martins. "A Benchmark for Relation Extraction Kernels". Paper presented at East-European Conference on Advances in Databases and Information Systems, Poitiers, 2015.
    Published • 10.1007/978-3-319-23135-8_13
  4. Pereira, João Pedro Lebre Magalhães; Gonçalo Simões; Helena Galhardas; Bruno Martins. "Uma Benchmark para Kernels de Extracção de Relações". Paper presented at INFORUM, Porto, 2014.
    Published
Journal article
  1. João L. M. Pereira; Manuel J. Fonseca; Antónia Lopes; Helena Galhardas. "Cleenex: Support for User Involvement During an Iterative Data Cleaning Process". Journal of Data and Information Quality (2024): http://dx.doi.org/10.1145/3648476.
    10.1145/3648476
  2. João Pedro Lebre Magalhães Pereira; João Casanova; Helena Galhardas; Dennis Shasha. "AcX: system, techniques, and experiments for acronym expansion". Proceedings of the VLDB Endowment 15 11 (2022): 2530-2544. http://dx.doi.org/10.14778/3551793.3551812.
    10.14778/3551793.3551812
Report
  1. Pereira, João Pedro Lebre Magalhães; Gonçalo Simões; Helena Galhardas; Bruno Martins. 2015. A Benchmark for Relation Extraction Kernels.
Thesis / Dissertation
  1. "Towards effective and effortless data cleaning: from automatic approaches to user involvement". PhD, 2023. https://scholar.tecnico.ulisboa.pt/records/_9Wa_izOSPFA-ckoVSS5LrcEVDX-bgdZWHYy.
Activities

Supervision

Thesis Title
Role
Degree Subject (Type)
Institution / Organization
2024 - 2024 Automating Character Network Extraction for Portuguese Literature
Co-supervisor
Data-Driven Marketing (Master)
Universidade NOVA de Lisboa NOVA Information Management School, Portugal
2024 - 2024 Network analysis of characters in Portuguese literature
Co-supervisor
Data Science and Advanced Analytics (Master)
Universidade NOVA de Lisboa NOVA Information Management School, Portugal
2023 - 2023 Enhancing Acronym Identification: Introducing a parentheses-free rule-based algorithm - ParenlessAI and a novel Bulgarian Data Set
Supervisor
Information Studies (Master)
Universiteit van Amsterdam, Netherlands
2023 - 2023 Summarization of Webpages to Generate Company Descriptions
Supervisor
Information Studies (Master)
Universiteit van Amsterdam, Netherlands
2023 - 2023 Long-Form Question Answering in the Dutch Municipal Domain
Supervisor
Information Studies (Master)
Universiteit van Amsterdam, Netherlands
2023 - 2023 Enhancing Topic Classification of Dutch Provincial Motions through Transfer Learning: A Comparative Analysis of Machine Learning Models
Co-supervisor
Information Studies (Master)
Universiteit van Amsterdam, Netherlands
2023 - 2023 Using A Web Search Engine for Automatic Acronym Disambiguation
Supervisor
Information Studies (Master)
Universiteit van Amsterdam, Netherlands
2023 - 2023 Evaluating Large Language Models for Author Name Extraction in Noisy and Context-restricted Settings
Co-supervisor
Information Studies (Master)
Universiteit van Amsterdam, Netherlands
2023 - 2023 Extraction of Acronyms and Expansions from Text using Machine Learning Methods
Supervisor
Information Studies (Master)
Universiteit van Amsterdam, Netherlands
2022 - 2022 Acronym identification techniques and experiments for acronym expander systems
Supervisor
Information Studies (Master)
Universiteit van Amsterdam, Netherlands
2022 - 2022 Acronym expansion in Dutch: Improving out-expansion performance with BERT and SBERT
Supervisor
Information Studies (Master)
Universiteit van Amsterdam, Netherlands
2022 - 2022 Acronym expansion for Spanish language
Supervisor
Information Studies (Master)
Universiteit van Amsterdam, Netherlands
2022 - 2022 Domain Adaptation in Transformer models: Question Answering of Dutch Government Policies
Supervisor
Information Studies (Master)
Universiteit van Amsterdam, Netherlands
2022 - 2022 Identifying football talent profiles: a feature reduction and cluster analysis
Supervisor
Information Studies (Master)
Universiteit van Amsterdam, Netherlands
2022 - 2022 Transfer learning for multilingual vacancy text generation
Supervisor
Information Studies (Master)
Universiteit van Amsterdam, Netherlands
2020 - 2021 Acronym and Definition Extraction
Co-supervisor
Computer Science and Engineering (Master)
Universidade de Lisboa Instituto Superior Técnico, Portugal

Event organisation

Event name
Type of event (Role)
Institution / Organization
2014 - 2014 Data Integration in the Life Sciences
Conference (Other)
Fundação para a Ciência e a Tecnologia, Portugal
2014 - 2014 DataStorm Big Data Summer School
Other (Other)
Fundação para a Ciência e a Tecnologia, Portugal

Event participation

Activity description
Type of event
Event name
Institution / Organization
2023/11/22 - Current Session Chair
Conference
Intelligent Data Engineering and Automated Learning
Universidade de Évora, Portugal

Jury of academic degree

Topic
Role
Candidate name (Type of degree)
Institution / Organization
2024/10/29 Chemical Language Modeling. A Comparative Approach to SMILES and SELFIES Tokenization
(Thesis) Main arguer
Miguelangel Augusto Leon Mayuare (Master)
Universidade NOVA de Lisboa NOVA Information Management School, Portugal
2024/10/28 Improving Product Identification in Retail Industry using Computer Vision
(Thesis) Main arguer
Gonçalo Figueiredo Alves Lopes (Master)
Universidade NOVA de Lisboa NOVA Information Management School, Portugal
2024/10/28 Morality in Political Discourse in Portugal
(Thesis) Main arguer
Jaime Olivio Teixeira Duarte (Master)
Universidade NOVA de Lisboa NOVA Information Management School, Portugal
2023 An Experimental Evaluation of Cluster K Estimation Methods on Deep Learned Vector Embeddings for Page Stream Segmentation
(Thesis) Main arguer
Eric Alfaro (Master)
Universiteit van Amsterdam, Netherlands
2023 Testing for the efficacy and applicability of NLP approaches in capturing the dynamics of the British security environment between 2010 and 2021
(Thesis) Main arguer
Saskia Heyster (Master)
Universiteit van Amsterdam, Netherlands
2023 Provenance and Dependency Analysis of Slide Decks
(Thesis) Main arguer
Gargi Nandanpawar (Master)
Universiteit van Amsterdam, Netherlands
2023 Recognizing Complex Named Entities Using GPT-3
(Thesis) Main arguer
Mark Geurts (Master)
Universiteit van Amsterdam, Netherlands
2022 Information Retrieval and Knowledge Extraction on the Dutch Government Information Public Access Act Decision Letters
(Thesis) Main arguer
Julián Venhuizen (Master)
Universiteit van Amsterdam, Netherlands
2022 A comparison between two page-stream segmentation approaches using content evaluated on Dutch governmental data
(Thesis) Main arguer
Stefan Dijkstra (Master)
Universiteit van Amsterdam, Netherlands
2022 Combining Computer Vision and Deep Learning to split concatenated policy documents
(Thesis) Main arguer
Sang Pham Minh (Master)
Universiteit van Amsterdam, Netherlands
2022 Prediction of Post-Operative Esophagectomy Complications through Structured and Unstructured EHR Data
(Thesis) Main arguer
Emily Bakker (Master)
Universiteit van Amsterdam, Netherlands
2022 Improving readability and searchability of documents provided by the Dutch government under the WOB
(Thesis) Main arguer
Justin Bon (Master)
Universiteit van Amsterdam, Netherlands
2022 Detecting Redaction in publicly published government documents
(Thesis) Main arguer
Ammar Alhashmi (Master)
Universiteit van Amsterdam, Netherlands
2022 Generalisibility of deep learning page-stream segmentation methods evaluated on governmental data
(Thesis) Main arguer
Pepijn Groenen (Master)
Universiteit van Amsterdam, Netherlands
2022 Data structuring for climate change research: Carbon net-zero-specific label assignment to scientific documents
(Thesis) Main arguer
Suvendu Pati (Master)
Universiteit van Amsterdam, Netherlands
2022 Applying Dutch pre-trained word embedding models to classify grocery products
(Thesis) Main arguer
Gijs Gubbels (Master)
Universiteit van Amsterdam, Netherlands
2022 Using deep learned vector representations for page stream segmentation by agglomerative clustering
(Thesis) Main arguer
Lukas Busch (Master)
Universiteit van Amsterdam, Netherlands

Committee member

Activity description
Role
Institution / Organization
2024 - Current Member of the Scientific Committee of the Maratona Inter-Universitária de Programação (MIUP). MIUP is an annual programming contest for university students, held as part of the ACM International Collegiate Programming Contest (2024).
Member

Conference scientific committee

Conference name Conference host
2024 - 2024 Main-Track Program Committee member European Conference on Artificial Intelligence (ECAI)
2024 - 2024 Doctoral Consortium Program Committee member European Conference on Artificial Intelligence (ECAI)
2023/06/21 - 2023/06/23 International Conference on Computational Science London, United Kingdom

Course / Discipline taught

Academic session Degree Subject (Type) Institution / Organization
2014 - 2014 Java for Big Data Specialization (Intermediate course) DataStorm Big Data Summer School, Portugal
2014 - 2014 Streaming Data Hands-On Lab Session Specialization (Intermediate course) DataStorm Big Data Summer School, Portugal

Other jury / evaluation

Activity description Institution / Organization
2023 - Current Jury member for the hiring of two lecturers in Information Systems at the University of Amsterdam Universiteit van Amsterdam, Netherlands
Distinctions

Award

2022 IST Excellent Teacher 2020/2021
Universidade de Lisboa Instituto Superior Técnico, Portugal
2012 3rd place in the EBEC Aalto Software Development Competition