
Hello! I'm Chihsheng Jin ([ʒɯ ʃəŋ]) (he/him/his). In 2023, I received my Bachelor's degree in Chinese Language from Fudan University, where I was advised by Prof. Yu-Fu Chien. I also received an Associate degree in French from Shanghai International Studies University in 2022.
In 2025, I received my Master's degree in Computational Linguistics, advised by Prof. Aaron Steven White, while working at the FACTS.Lab.
My research interests are language modeling, formal semantics, information extraction and psycholinguistics.
You can check out my CV here.

Experience

Selected Research

Cross-Document Event-Keyed Summarization [arXiv]

William Walden, Pavlo Kuchmiichuk, Alexander Martin, Chihsheng Jin, Angela Cao, Claire Sun, Curisia Allen, Aaron Steven White

In this project, we constructed a cross-document summarization dataset on top of the FAMuS (Frames Across Multiple Sources) dataset. Results from extensive experiments show that cross-document summarization is a non-trivial task for language models, and that smaller models can even outperform larger ones when fine-tuned on the dataset.

Presented at XLLM@ACL2025 and PEER2025 [Slides].

Dynamics in the phonological encoding of bilingual speech production [pdf]

Senior thesis. I used a character-naming paradigm to test whether the phonological mapping status of cognates in Mandarin and Shanghai Dialect affects speech production. Beyond the well-known cognate facilitation effect, the study yielded two novel findings not previously reported in this field. First, the complexity of the phonological mapping between the two languages interferes with speech production. Second, speech production is also affected by whether the phonemes are the most frequent match in the latent language. More specifically, if the Mandarin phoneme in a trial is the most frequent match for the Shanghai Dialect phoneme (absent from the experiment), the cognate facilitation effect disappears. I decided against publishing this manuscript because the experiment was conducted during a COVID lockdown, so the sample size is extremely limited.

Projects

Agenda-based Parser for Multiple Context-free Grammars [GitHub]

This is a Python module for agenda-based parsing of multiple context-free grammars (MCFGs). MCFGs are a class of grammars strictly more expressive than context-free grammars; they can model phenomena such as cross-serial dependencies, agreement within relative clauses, etc.
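As a rough illustration of the idea (a toy sketch, not the module's actual API), here is an agenda-driven recognizer hard-coded for one small MCFG that generates the copy language {ww}, a standard abstraction of cross-serial dependencies that no context-free grammar can generate. The grammar, the item encoding, and the function name `recognize_copy` are all assumptions made for this example: the nonterminal A derives pairs of identical strings via A(c, c) and A(xc, yc) <- A(x, y), and S(xy) <- A(x, y) concatenates the pair.

```python
from collections import deque


def recognize_copy(s: str) -> bool:
    """Return True iff s == w + w for some non-empty w (toy MCFG recognizer)."""
    n = len(s)
    # An item ("A", i, j, k, l) states that A derives the pair (s[i:j], s[k:l]).
    # Axioms come from the rule A(c, c): two equal symbols at positions i < k.
    agenda = deque(
        ("A", i, i + 1, k, k + 1)
        for i in range(n)
        for k in range(i + 1, n)
        if s[i] == s[k]
    )
    chart = set()
    while agenda:
        item = agenda.popleft()
        if item in chart:
            continue  # already processed; skip duplicates
        chart.add(item)
        _, i, j, k, l = item
        # Rule A(xc, yc) <- A(x, y): extend both components by the same symbol,
        # keeping the first component strictly before the second.
        if j < k and l < n and s[j] == s[l]:
            agenda.append(("A", i, j + 1, k, l + 1))
    # Rule S(xy) <- A(x, y): the two components must be adjacent and span the input.
    return any(i == 0 and j == k and l == n for _, i, j, k, l in chart)
```

For instance, `recognize_copy("abab")` succeeds because the item for ("ab", "ab") covers the whole input, while `recognize_copy("abba")` fails because no derived pair is both adjacent and input-spanning. A general parser would read rules from a grammar object instead of hard-coding them, but the agenda/chart loop has the same shape.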

Retrieval-Augmented Event Extraction [GitHub]

I conducted extensive experiments on the ability of GPT-4o and Claude 3.5 Sonnet to extract event roles from the FAMuS dataset using various prompting strategies. Evaluated with the CEAF family of metrics, the results suggest that large language models cannot extract roles accurately and truthfully, indicating that they do not have human-level event understanding. However, after some post-processing, the extractions can still be useful for some applications.

Check out the paper if you're interested in this study and its results.

Skills

Programming Languages

Python (& Data Science toolkits)
PyTorch, Lightning
LaTeX, JavaScript, CSS, HTML

Production Software

Ableton Live (Professional Mixing & Mastering), Photoshop, InDesign, Lightroom, Praat

If you are interested in audio engineering services, check out this page!

Professional Experience

FACTS.Lab

Jun 2024 - Present

At FACTS.Lab, I worked on a series of projects related to structured representations of events, event extraction, and event retrieval.

Shanghai AI Lab

Jun 2022 - Oct 2022

I worked as a project management intern and used my coding skills to process files, which made our workflow more efficient.

Fudan Yayan

Jun 2020 - Sep 2022

Fudan Yayan is a bimonthly in-house newspaper dedicated to news on language regulations and language research at Fudan University. I worked as chief editor and designer there for two years. You can check out our work here (unfortunately, the content is in Chinese, and we don't upload complete issues online).