Chihsheng Jin
[ʈ͡ʂɯ ʂəŋ] · he/him
Hi! Welcome to my personal website. I'm an independent researcher and developer working on natural language generation.
In 2023, I received my Bachelor's degree in Chinese Language from Fudan University, where I was advised by Prof. Yu-Fu Chien. I also received an associate degree in French from Shanghai International Studies University in 2022.
In 2025, I received my Master's degree in Computational Linguistics, advised by Prof. Aaron Steven White, while working at the FACTS.lab.
My research interests are language modeling, formal methods, information extraction and psycholinguistics.
On a side note, my preferred name is Jin.
Research
Cross-Document Event-Keyed Summarization
arXiv
We constructed a cross-document summarization dataset on top of the FAMuS dataset. Results from extensive experiments show that cross-document summarization is a non-trivial task for language models, and that smaller models can even outperform larger ones when fine-tuned on the dataset.
Presented at XLLM@ACL2025 and PEER2025 · Slides
Dynamics in the phonological encoding of bilingual speech production
PDF
I used a character-naming paradigm to test whether the phonological mapping status of cognates in Mandarin and Shanghai dialect affects speech production. Besides the common cognate facilitation effect, there are two novel findings: the complexity of the phonological mapping between languages interferes with speech production, and whether a phoneme is the most frequent match in the latent language also affects production.
Note: This experiment was conducted during a COVID lockdown with a limited sample size, so I decided against publishing.
Projects
Hover PDF Reader
GitHub
A minimalist PDF reader extension designed for people who spend way too much time reading academic papers in the browser. It intentionally keeps only the most essential features, but aims to make the actual experience of reading papers smoother, faster, and more immersive.
Retrieval-Augmented Event Extraction
GitHub
I conducted extensive experiments on the ability of GPT-4o and Claude 3.5 Sonnet to extract event roles from the FAMuS dataset using various prompting strategies. Measured with CEAF metrics, the results suggest that large language models cannot extract roles accurately, indicating that they lack human-level event understanding. However, with post-processing, the extractions can still be useful for some applications.
Agenda-based Parser for Multiple Context-free Grammars
GitHub
A Python module for parsing multiple context-free grammars (MCFGs). MCFGs are a class of grammars strictly more expressive than context-free grammars, useful for modeling cross-serial dependencies, agreement within relative clauses, and more.
Experience
FACTS.Lab
Jun 2024 – Present
Working on a series of projects related to structured representations of events, event extraction, and event retrieval.
Shanghai AI Lab
Jun 2022 – Oct 2022
Project management intern. Used coding skills to process files and improve workflow efficiency.
Fudan Yayan
Jun 2020 – Sep 2022
Chief editor and designer for a bimonthly in-house newspaper dedicated to news on language regulations and language research at Fudan University.