Chihsheng Jin

[ʈ͡ʂɯ ʂəŋ] · he/him

Hi! Welcome to my personal website. I'm an independent researcher and developer working on natural language generation.

In 2023, I received my Bachelor's degree in Chinese Language from Fudan University, where I was advised by Prof. Yu-Fu Chien. I also received an Associate degree in French from Shanghai International Studies University in 2022.

In 2025, I received my Master's degree in Computational Linguistics, advised by Prof. Aaron Steven White, while working at FACTS.Lab.

My research interests are language modeling, formal methods, information extraction and psycholinguistics.

On a side note, my preferred name is Jin.

Curriculum Vitae

Email

GitHub

Research

Cross-Document Event-Keyed Summarization

arXiv

William Walden, Pavlo Kuchmiichuk, Alexander Martin, Chihsheng Jin, Angela Cao, Claire Sun, Curisia Allen, Aaron Steven White

We constructed a cross-document summarization dataset on top of the FAMuS dataset. Results from extensive experiments show that cross-document summarization is a non-trivial task for language models, and that smaller models can even outperform larger ones when fine-tuned on the dataset.

Presented at XLLM@ACL2025 and PEER2025 · Slides

Dynamics in the phonological encoding of bilingual speech production

PDF

Senior thesis

I used a character-naming paradigm to test whether the phonological mapping status of cognates in Mandarin and Shanghai Dialect affects the speech production process. Beyond the well-known cognate facilitation effect, there are two novel findings: the complexity of the phonological mapping between languages interferes with speech production, and whether phonemes are the most frequent match for the latent language also affects production.

Note: This experiment was conducted during a COVID lockdown with a limited sample size, so I decided against publishing.

Projects

Hover PDF Reader

GitHub

A minimalist PDF reader extension designed for people who spend way too much time reading academic papers in the browser. It intentionally keeps only the most essential features, but aims to make the actual experience of reading papers smoother, faster, and more immersive.

Retrieval-Augmented Event Extraction

GitHub

I conducted extensive experiments on the ability of GPT-4o and Claude 3.5 Sonnet to extract event roles from the FAMuS dataset using various prompting strategies. Evaluated with CEAF metrics, the results suggest that large language models cannot extract roles accurately, indicating that they lack human-level event understanding. However, with post-processing, the extractions can still be useful for some applications.

Read the paper

Agenda-based Parser for Multiple Context-free Grammars

GitHub

A Python module for parsing multiple context-free grammars (MCFGs). MCFGs are a class of grammars strictly more expressive than context-free grammars, useful for modeling cross-serial dependencies, agreement within relative clauses, and more.
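
To make the expressiveness claim concrete, here is a minimal sketch of a toy MCFG for the copy language {ww : w ∈ {a, b}*}, a classic non-context-free language often used as an abstract model of cross-serial dependencies. It is only an illustration and not the module's API: the function names `derive_pairs`, `copy_language`, and `recognizes` are hypothetical, and the brute-force enumeration stands in for a real agenda-based parser.

```python
# Toy MCFG for the copy language {ww}. The nonterminal A derives a PAIR
# of strings, which is what a context-free nonterminal cannot do:
#   A(ε, ε)
#   A(x·a, y·a) <- A(x, y)
#   A(x·b, y·b) <- A(x, y)
#   S(x·y)      <- A(x, y)
# All names below are hypothetical and chosen for illustration only.


def derive_pairs(depth):
    """Return every (x, y) derivable for A using at most `depth` recursive steps."""
    pairs = {("", "")}       # base rule A(ε, ε)
    frontier = {("", "")}
    for _ in range(depth):
        # Apply both recursive rules: append the same terminal to both components.
        frontier = {(x + t, y + t) for (x, y) in frontier for t in "ab"}
        pairs |= frontier
    return pairs


def copy_language(depth):
    """Apply the start rule S(x·y) <- A(x, y) to every derivable pair."""
    return {x + y for (x, y) in derive_pairs(depth)}


def recognizes(s):
    """Brute-force membership test: enumerate derivations up to len(s) // 2 steps."""
    half, remainder = divmod(len(s), 2)
    return remainder == 0 and s in copy_language(half)
```

Because the two components of `A` grow in lockstep, the i-th symbol of the first half always matches the i-th symbol of the second half, so `recognizes("abab")` holds while `recognizes("abba")` does not. The enumeration is exponential in the string length, which is exactly the cost an agenda-based parser is meant to avoid.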

Experience

FACTS.Lab

Jun 2024 – Present

Working on a series of projects related to structured representations of events, event extraction and event retrieval.

Shanghai AI Lab

Jun 2022 – Oct 2022

Project management intern. Used coding skills to process files and improve workflow efficiency.

Fudan Yayan

Jun 2020 – Sep 2022

Chief editor and designer for a bimonthly in-house newspaper dedicated to news on language regulations and language research at Fudan University.