Mohamed Khemakhem is the core developer of GROBID-Dictionaries, a machine-learning library for structuring digitised lexical resources. He is currently a research associate at Inria, Centre de Recherche de Paris (ALMAnaCH Lab), and is completing his doctoral studies in Computer Science at Paris Diderot University with a thesis on "Standard-based Lexical Models for Automatically Structured Dictionaries". He is a member of several standardisation committees on language resources (ISO, DIN, AFNOR) and co-project leader of ISO 24613-4 “TEI-Serialization”. His research interests include the digitisation of lexical resources, machine-learning-based information extraction, and digital humanities.
Identification

Personal identification

Full name
Mohamed Khemakhem

Citation names

  • Khemakhem, Mohamed

Author identifiers

Ciência ID
F212-7C38-1418
ORCID iD
0000-0003-3529-2990
Google Scholar ID
7uv0BkAAAAJ&hl

Telephones

Telephone
  • (+33) 0768676304 (Professional)

Knowledge fields

  • Exact Sciences - Computer and Information Sciences - Computer Sciences

Languages

Language Speaking Reading Writing Listening Peer-review
Arabic (Mother tongue)
French (Mother tongue)
English Proficiency (C2) Proficiency (C2) Proficiency (C2) Proficiency (C2) Proficiency (C2)
German Upper intermediate (B2) Upper intermediate (B2) Upper intermediate (B2) Advanced (C1) Intermediate (B1)
Education
Degree Classification
2016/09 - 2020/08
Ongoing
Doctoral Studies in Computer Science (Doctor)
Université Paris Diderot, France
"Standard-based Lexical Models for Automatically Structured Dictionaries" (THESIS/DISSERTATION)
2010/09 - 2012/12
Concluded
Master in Information Systems and New Technologies (Master)
Université de Sfax Faculté des Sciences Economiques et de Gestion de Sfax, Tunisia
"Collaborative Construct and Query System for a Standardized Arabic Dictionary" (THESIS/DISSERTATION)
2007/09 - 2010/06
Concluded
Bachelor in Computer Science Applied to Management (Bachelor (1st cycle))
Université de Sfax Faculté des Sciences Economiques et de Gestion de Sfax, Tunisia

Habib Maazoon High School, Tunisia
"Views Creation for the Interactive Standardized Arabic Dictionary" (THESIS/DISSERTATION)
Affiliation

Science

Category
Host institution
Employer
2016/06 - Current Researcher (Research) Inria Centre de Recherche de Paris, ALMAnaCH Lab, France
Centre Marc Bloch, Germany
2019/09/01 - 2020/08/31 Postdoc (Research) Université Grenoble Alpes, France
2019/09 - 2020/08 Researcher (Research) Inria Centre de Recherche de Paris, ALMAnaCH Lab, France
Université Grenoble Alpes, France
2014/09 - 2016/03 Researcher (Research) Ubiquitous Knowledge Processing (UKP) Lab, Germany
2012/02 - 2012/12 Researcher (Research) MIRACL Lab, Tunisia
Projects

Grant

Designation Funders
2019/09 - 2020/08 BasNum
ANR-18-CE38-0003
Researcher
Université Grenoble Alpes, France

Inria Centre de Recherche de Paris, ALMAnaCH Lab, France

Contract

Designation Funders
2016 - 2018 PARTHENOS
PhD Student Fellow
Inria Centre de Recherche de Paris, ALMAnaCH Lab, France
Concluded

Other

Designation Funders
2018 - Current Paris Time Machine Consortium
Researcher
Inria Centre de Recherche de Paris, ALMAnaCH Lab, France
Ongoing
2018/04 - 2020/11 DISCO
Research Fellow
Inria Centre de Recherche de Paris, ALMAnaCH Lab, France
Ongoing
2017 - 2018 Nénufar
Research Fellow
Inria Centre de Recherche de Paris, ALMAnaCH Lab, France
Concluded
Outputs

Publications

Conference paper
  1. Khemakhem, Mohamed. "Information Extraction Workflow for Digitised Entry-based Documents". 2020.
  2. Khemakhem, Mohamed. "Selling autograph manuscripts in 19th c. Paris: digitising the Revue des Autographes". 2020.
  3. Khemakhem, Mohamed. "Nénufar: Modelling a Diachronic Collection of Dictionary Editions as a Computational Lexical Resource". 2019.
  4. Khemakhem, Mohamed. "TEI Encoding of a Classical Mixtec Dictionary Using GROBID-Dictionaries". 2019.
  5. Khemakhem, Mohamed. "Scaling up Automatic Structuring of Manuscript Sales Catalogues". 2019.
  6. Khemakhem, Mohamed. "How OCR Performance can Impact on the Automatic Extraction of Dictionary Content Structures". 2019.
  7. Khemakhem, Mohamed. "Historical Dictionaries as Digital Editions and Connected Graphs: the Example of Le Petit Larousse Illustré". 2019.
  8. Khemakhem, Mohamed. "LMF Reloaded". 2019.
  9. Khemakhem, Mohamed. "Retro-digitizing and Automatically Structuring a Large Bibliography Collection". 2018.
  10. Khemakhem, Mohamed. "Automatically Encoding Encyclopedic-like Resources in TEI". 2018.
  11. Khemakhem, Mohamed. "Fueling Time Machine: Information Extraction from Retro-Digitised Address Directories". 2018.
  12. Khemakhem, Mohamed. "Presenting the Nénufar Project: a Diachronic Digital Edition of the Petit Larousse Illustré". 2018.
  13. Khemakhem, Mohamed. "Enhancing Usability for Automatically Structuring Digitised Dictionaries". 2018.
  14. Khemakhem, Mohamed. "Automatic Extraction of TEI Structures in Digitized Lexical Resources using Conditional Random Fields". 2017.
  15. Khemakhem, Mohamed. "Sense-annotating a Lexical Substitution Data Set with Ubyline". 2016.

Other

Other output
  1. GROBID and catalogues. 2018. Khemakhem, Mohamed. https://hal.archives-ouvertes.fr/cel-01951107.
  2. A Diachronic Digital Edition of the Petit Larousse illustré. 2018. Khemakhem, Mohamed. https://hal.archives-ouvertes.fr/hal-01873805.
Activities

Event organisation

Event name
Type of event (Role)
Institution / Organization
2018 - 2018 GROBID-Camp Spring 2018 – Inria, Paris
Workshop
2017 - 2017 GROBID-Camp Summer 2017 – ResearchGate, Berlin
Workshop

Association member

Society Organization name Role
2018 - Current Member of PARTHENOS Project (https://www.parthenos-project.eu/) Member

Committee member

Activity description
Role
Institution / Organization
2019 - Current Member of DIN NA 105-00-66 AA committee “Language Resource Management”
Advisor / Consultant
Deutsches Institut für Normung eV, Germany
2019 - Current Member of AFNOR/X03A committee “Terminologie - principes et coordination”
Advisor / Consultant
Association Française de Normalisation, France
2019 - Current Member of the ISO/TC 37/SC4/WG4 “Language resource management”
Advisor / Consultant
International Organization for Standardization, Switzerland
2019 - Current DARIAH BiblioData Working Group (https://www.dariah.eu/activities/working-groups/bibliographical-data-bibliodata/)
Member
2018 - Current Member of DARIAH-ERIC Working Group "Lexical Resources" (https://dariah-eric.github.io/lexicalresources/)
Member
2018 - Current Member of Groupe annuaires et adresses at Paris Time Machine (https://paris-timemachine.huma-num.fr/groupe-adresses-et-annuaires/)
Member

Conference scientific committee

Conference name Conference host
2016 - 2016 10th edition of the Language Resources and Evaluation Conference (LREC), 23-28 May 2016, Portorož (Slovenia)

Consulting

Activity description Institution / Organization
2019 - Current Co-Project Leader of ISO 24613-4 “TEI-Serialization” (https://www.iso.org/standard/75411.html) International Organization for Standardization, Switzerland
2016/06 - Current - Study approaches and techniques for structuring modern and legacy digitised dictionaries; - Design and implement an open-source machine-learning system for parsing and structuring digitised dictionaries; - Design the architecture of GROBID-Dictionaries (https://github.com/MedKhem/grobid-dictionaries) to cover more entry-based documents. Inria Centre de Recherche de Paris, ALMAnaCH Lab, France
2019/09 - 2020/08 Customise GROBID-Dictionaries (https://github.com/MedKhem/grobid-dictionaries) to support the structuring of legacy dictionaries (Basnage de Beauval dictionary). Inria Centre de Recherche de Paris, ALMAnaCH Lab, France
2014/09 - 2016/03 - Study manual and semi-automatic techniques for linking corpora and lexical resources for the purpose of semantic annotation; - Become familiar with DKPro, a repository of NLP tools based on the Apache UIMA framework; - Design and implement UbyLine for extracting and annotating usage examples from corpora to be linked to entries in a lexical resource (UBY). Ubiquitous Knowledge Processing (UKP) Lab, Germany
2012/02 - 2012/12 - Adapt collaborative techniques to lexicographic tasks; - Design and implement a web-based system for interactive querying and collaborative enrichment of an Arabic dictionary, an instantiation of the ISO standard Lexical Markup Framework (LMF). MIRACL Lab, Tunisia

Course / Discipline taught

Academic session Degree Subject (Type) Institution / Organization
2018/12/03 - 2018/12/07 GROBID-Dictionaries workshop at Lexical data Masterclass 2018 – BBAW, Berlin Berlin Brandenburg Academy of Sciences (BBAW), Germany
2018/11/02 - 2018/11/02 GROBID-Dictionaries workshop at Stellenbosch University 2018 – SADiLaR, Stellenbosch Stellenbosch Institute for Advanced Studies (STIAS), South Africa
2018/10/30 - 2018/10/30 GROBID-Dictionaries workshop at North-West University 2018 – SADiLaR, Potchefstroom South African Centre for Digital Language Resources (SADiLaR), South Africa
2018/10/26 - 2018/10/26 GROBID-Dictionaries workshop at University of Pretoria 2018 – SADiLaR, Pretoria University of Pretoria, South Africa
2018/06/26 - 2018/06/29 GROBID-Dictionaries workshop at CAHIER 2018 – Praxiling, Montpellier Université Paul-Valéry Montpellier 3, France
2017/12/04 - 2017/12/08 GROBID-Dictionaries workshop at Lexical Data Masterclass 2017 – BBAW, Berlin Berlin Brandenburg Academy of Sciences (BBAW), Germany