Paris, France

AI Researcher · Data Scientist · Statistician

Channdeth SOK

Recently graduated from ENSAE Paris and Institut Polytechnique de Paris with an MSc in Data Science & AI.

Research interests: Self-supervised Learning · LLM · RAG · ML · DS

News

Recent updates

  • Officially graduated from Institut Polytechnique de Paris with an MSc in Data Science.

  • Officially graduated from ENSAE Paris (IP Paris) with an MSc in Data Science & Statistics.

  • Paper accepted at IAAI-ALA 2025, part of ECAI 2025: MetaRAG: Metamorphic Testing for Hallucination Detection in RAG Systems.

  • Joined Forvia as Data Science & AI Research Intern, working on hallucination detection for LLM and AI agent applications.

Publications

Research highlights

Google Scholar ↗
  1. [1] MetaRAG: Metamorphic Testing for Hallucination Detection in RAG Systems

    IAAI Workshop @ ECAI 2025 · Bologna, Italy (CEUR-WS Vol-4136)

    Proposes MetaRAG, a metamorphic testing framework to systematically detect hallucinations in Retrieval-Augmented Generation systems by evaluating consistency under controlled input transformations.
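The idea of checking consistency under controlled input transformations can be illustrated with a minimal, generic sketch. This is not the MetaRAG algorithm from the paper: `consistency_score`, `toy_rag`, and the two transformations are hypothetical names chosen for illustration, and exact string matching stands in for whatever comparison a real system would use.

```python
from typing import Callable, Iterable


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(text.lower().split())


def consistency_score(
    answer_fn: Callable[[str], str],
    query: str,
    transforms: Iterable[Callable[[str], str]],
) -> float:
    """Fraction of meaning-preserving transformations that leave the answer unchanged."""
    baseline = normalize(answer_fn(query))
    results = [normalize(answer_fn(t(query))) == baseline for t in transforms]
    # With no transformations there is nothing to contradict the baseline.
    return sum(results) / len(results) if results else 1.0


# Hypothetical stand-in for a RAG pipeline: it is only stable on one phrasing.
def toy_rag(query: str) -> str:
    return "Paris" if "capital of France" in query else "Lyon"


# Two transformations that should not change the correct answer (assumed examples).
transforms = [
    lambda q: "Please answer: " + q,
    lambda q: q.replace("capital of France", "French capital"),
]

score = consistency_score(toy_rag, "What is the capital of France?", transforms)
print(score)  # 0.5: the paraphrase flips the answer, a signal worth inspecting
```

A low score does not prove a hallucination; it only flags answers that change when the question's meaning does not.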

Experience

Trajectory

May – Nov 2025

Data Science & AI Intern · Forvia (Paris Tech Center)

Developed rapid prototypes for predictive maintenance and supply-chain decision-making, collaborating with embedded teams to deploy AI models within the digital cockpit stack.

Mar – Jul 2024

AI Engineer Intern · AI-Vidence

Built document intelligence pipelines combining large language models with compliance workflows, improving review throughput for regulated clients.

Jun – Sep 2023

AI Engineer Intern · Paris Partners Softwares

Studied state-of-the-art LLMs and built a GPT-3.5/LLaMA-2 system to evaluate résumés against job descriptions, producing ranked candidate shortlists with explainable qualification decisions.

Education

Academic path

2024 — 2025

Master of Science in Data Science

Institut Polytechnique de Paris | IP Paris

Relevant courses: Deep learning, NLP, Advanced AI, Big Data, Reinforcement Learning.

2022 — 2025

Diplôme d'ingénieur — Data Science, Statistics & ML

ENSAE Paris

Relevant courses: Statistics, Machine Learning, Deep Learning, NLP, Bayesian Statistics, Econometrics, Optimal Transport, Macroeconomics, Microeconomics.

2019 — 2022

Engineering Program

Institut de Technologie du Cambodge | ITC

Two years of intensive courses in Mathematics, Physics and Computer Science.

Projects

Selected work

Research Implementation

Annealed Sinkhorn for Optimal Transport

Reproduced the convergence, regularization path, and debiasing results from Lénaïc Chizat (2024) using OTT-JAX and packaged the workflow in Google Colab for peers.

Delivered annotated notebooks and benchmarks validating annealed Sinkhorn behavior across datasets.

Python · JAX · OTT-JAX · Colab
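For readers unfamiliar with the method, the sketch below shows the general shape of Sinkhorn iterations with a regularization strength that shrinks over time. It is a dependency-free illustration, not the OTT-JAX code from the project notebooks: the function name `annealed_sinkhorn`, the polynomial schedule `eps_t = eps0 / (1 + t)**kappa`, and the default parameters are assumptions made for this example, and the debiasing step studied in the paper is not included.

```python
import math


def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def annealed_sinkhorn(a, b, cost, n_iters=200, eps0=1.0, kappa=0.5):
    """Log-domain Sinkhorn with a decreasing regularization eps_t.

    a, b  : positive marginal weights, each summing to 1
    cost  : cost[i][j] between source point i and target point j
    Returns the transport plan as a list of rows.
    """
    n, m = len(a), len(b)
    log_a = [math.log(x) for x in a]
    log_b = [math.log(x) for x in b]
    f = [0.0] * n  # dual potential on the source side
    g = [0.0] * m  # dual potential on the target side
    eps = eps0
    for t in range(n_iters):
        # Illustrative annealing schedule (an assumption of this sketch).
        eps = eps0 / (1.0 + t) ** kappa
        # Update f so that the row marginals match a for the current g.
        f = [
            -eps * _logsumexp([log_b[j] + (g[j] - cost[i][j]) / eps for j in range(m)])
            for i in range(n)
        ]
        # Update g so that the column marginals match b for the new f.
        g = [
            -eps * _logsumexp([log_a[i] + (f[i] - cost[i][j]) / eps for i in range(n)])
            for j in range(m)
        ]
    # Plan implied by the potentials at the final regularization level.
    return [
        [a[i] * b[j] * math.exp((f[i] + g[j] - cost[i][j]) / eps) for j in range(m)]
        for i in range(n)
    ]


# Tiny example: two points on each side, cheap to match along the diagonal.
plan = annealed_sinkhorn([0.5, 0.5], [0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]])
```

Because the last update is the one on `g`, the column sums of `plan` match `b` up to floating-point error, while the row sums match `a` only approximately.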

VINCI · Jan–Apr 2025

Agnostic LLM Retriever

Developing a core LLM retriever that is robust enough to handle generic use cases efficiently, yet agnostic enough to be easily customized for specific ones. The goal is close to a Python information-retrieval package that any LLM use case can plug into.

Built a flexible retriever module that can be easily integrated into RAG pipelines, improving retrieval relevance and reducing hallucinations across diverse applications.

Python · LangChain · HuggingFace · LLM

Data Science Sprint

Education Investment Ranking

Constructed composite indicators to score countries on education investment attractiveness, blending macro trends with education KPIs.

Produced a ranked list of countries with actionable insights for policymakers and investors, highlighting key drivers of education investment potential.

Python · PCA · EDA · Visualization

Collège de France · 2022–2023

Math Performance Gap Study

Analyzed DEPP Premier Degré panel data to uncover determinants of mathematics performance gaps among French students.

Identified key socioeconomic and pedagogical factors to inform policy recommendations.

R · Regression · EDA · Policy Analysis