
Hello! I'm Chihsheng Jin ([ʒɯ ʃəŋ]) (he/him/his). In 2023, I received my Bachelor's degree in Chinese Language from Fudan University, where I was advised by Prof. Yufu Chien. I also received an Associate degree in French from Shanghai International Studies University in 2022.
In 2025, I received a Master's degree in Computational Linguistics; I was advised by Prof. Aaron White and worked at FACTS.Lab.
My research interests are language modeling, formal semantics, information extraction and psycholinguistics.
You can check out my CV here.

Experience

Research

Cross-Document Event-Keyed Summarization [arxiv]

William Walden, Pavlo Kuchmiichuk, Alexander Martin, Chihsheng Jin, Angela Cao, Claire Sun, Curisia Allen, Aaron Steven White

In this project, we constructed a cross-document summarization dataset on top of the FAMuS (Frames Across Multiple Sources) dataset. Results from extensive experiments show that cross-document summarization is a non-trivial task for language models, and that smaller models can even outperform larger ones when fine-tuned on the dataset.

Presented at XLLM@ACL2025 and PEER2025 [Slides].

Dynamics in the phonological encoding of bilingual speech production [pdf]

Senior thesis. I used a character-naming paradigm to test whether the phonological mapping status of cognates in Mandarin and Shanghai Dialect affects the speech production process. Besides the well-known cognate facilitation effect, there are two novel findings that, to my knowledge, have not been reported in this field. First, the complexity of the phonological mapping between the two languages interferes with speech production. Second, whether a phoneme is the most frequent match in the latent language also affects speech production. More specifically, if the Mandarin phoneme in the trial is the most frequent match for the Shanghai Dialect phoneme (which is absent from the experiment), the cognate facilitation effect disappears.

Projects

Agenda-based Parser for Multiple Context-free Grammars [Github]

This is an agenda-based parsing Python module for multiple context-free grammars, with a full test suite implemented in pytest. Multiple context-free grammars are a class of grammars that is strictly more expressive than context-free grammars.

Retrieval-Augmented Event Extraction [Github]

I conducted extensive experiments on the ability of GPT-4o and Claude 3.5 Sonnet to extract event roles from the FAMuS dataset using various prompting strategies. Evaluated with the CEAF family of metrics, the results suggest that large language models cannot extract roles accurately and truthfully, indicating that they do not have human-level event understanding. However, after some post-processing, the extractions can still be useful for some applications.

Check out the paper if you're interested in this study and its results.

Skills

Programming Languages

Python (Data Science toolkits),
PyTorch, Lightning, R, Haskell,
LaTeX, HTML, JavaScript

Production Software

Ableton Live (Professional Mixing & Mastering), Photoshop, InDesign, Lightroom, Praat

If you are interested in audio engineering services, check out this page!

Professional Experience

FACTS.Lab

Jun 2024 - Now

At FACTS.Lab, I have worked on a series of projects related to structured representations of events, event extraction, and event retrieval.

Shanghai AI Lab

Jun 2022 - Oct 2022

I worked as a project management intern and used my coding skills to process files, which made our workflow more efficient.

Fudan Yayan

Jun 2020 - Sep 2022

Fudan Yayan is a bimonthly in-house newspaper dedicated to news on language regulations and language research at Fudan University. I worked there as chief editor and designer for two years. You can check out our work here (I'm sorry that the content is in Chinese and that we don't upload the complete issues online).