Research

(Selected Research Projects. Last Update: Oct 18, 2022)

COLING 2022: Clinical Note Summarization - Summarizing Patients’ Problems from Hospital Progress Notes

LREC 2022: Progress Note Understanding - A Suite of clinical NLP tasks for Diagnostic Reasoning

JAMIA: A Scoping Review of Publicly Available cNLP Tasks

ACL 2021: ABCD - Decomposing Complex Sentences into Individual Propositions

CoNLL 2019: PyrEval - Automated Pyramid Summarization Evaluation

AI in Education: Rubric Reliability and Student Writing Evaluation

Clinical Note Summarization: Summarizing Patients’ Problems from Hospital Progress Notes

Paper full text link:

Progress Note Understanding: A Novel Annotation Framework and A New Suite of Clinical NLP Tasks for Diagnostic Reasoning

Paper full text link:

Current States of Clinical Natural Language Processing: A Scoping Review of Publicly Available Clinical NLP Tasks

Applying natural language processing (NLP) methods to electronic health record (EHR) data has attracted growing interest. We wrote a paper examining the current state of clinical NLP through existing public language tasks built on electronic health records. The paper has since been published in the Journal of the American Medical Informatics Association (2022).

Gitlab: Link

Full-text article:

Gao, Yanjun, Dmitriy Dligach, Leslie Christensen, Samuel Tesch, Ryan Laffin, Dongfang Xu, Timothy Miller, Ozlem Uzuner, Matthew M. Churpek, and Majid Afshar. "A scoping review of publicly available language tasks in clinical natural language processing." Journal of the American Medical Informatics Association 29, no. 10 (2022): 1797-1806. Paper

---

### ABCD: A Graph Framework Decomposing Complex Sentences into Simple Sentences

Identifying the propositions (simple sentences) in a complex sentence is an important step in complex sentence understanding and representation, and in downstream applications such as summarization. We propose a sentence graph that represents the syntax (dependency relations) within a complex sentence, together with four graph edit operations that a neural network can learn: Accept, Break, Copy, and Drop. We show that a pipeline combining a neural network with a graph-based algorithm can use these edit operations to decompose a complex sentence into multiple simple sentences while preserving its meaning.
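To make the four edit operations concrete, here is a minimal toy sketch. It is not the ABCD implementation: the function name `decompose`, the operation strings, and the exact semantics given to each operation are simplifying assumptions for illustration, and the edge labels are written by hand rather than predicted by a neural network.

```python
from collections import defaultdict


def decompose(words, labeled_edges):
    """Toy decomposition of a sentence graph (hypothetical helper).

    words: list of tokens.
    labeled_edges: list of (src, tgt, op) tuples over token indices, where op
    is one of "accept", "break", "copy", "drop". Assumed semantics:
      accept -> keep src and tgt in the same simple sentence
      break  -> cut the edge, so src and tgt may end up in different sentences
      copy   -> cut the edge, but duplicate tgt into the sentence containing src
      drop   -> delete the tgt word entirely
    """
    dropped = {tgt for _, tgt, op in labeled_edges if op == "drop"}

    # Union-find over token indices to collect accepted edges into components.
    parent = list(range(len(words)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    copies = []
    for src, tgt, op in labeled_edges:
        if src in dropped or tgt in dropped:
            continue
        if op == "accept":
            parent[find(src)] = find(tgt)
        elif op == "copy":
            copies.append((src, tgt))
        # "break" needs no action: the two words are simply never merged.

    # Each connected component of surviving words becomes one simple sentence.
    groups = defaultdict(set)
    for i in range(len(words)):
        if i not in dropped:
            groups[find(i)].add(i)
    for src, tgt in copies:
        groups[find(src)].add(tgt)

    return [" ".join(words[i] for i in sorted(g)) for g in groups.values()]


words = ["Mary", "sang", "and", "danced"]
edges = [
    (0, 1, "accept"),  # Mary - sang stay together
    (1, 2, "drop"),    # remove the conjunction "and"
    (1, 3, "break"),   # separate the two verbs
    (3, 0, "copy"),    # share the subject "Mary" with "danced"
]
print(decompose(words, edges))
```

In this toy example the shared subject is copied into the second component, so each output sentence keeps its own subject. In the described system, the labels would come from a learned classifier over the sentence graph instead of a hand-written list.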

Publication:

Gao, Yanjun, Ting-Hao Huang, and Rebecca J. Passonneau. "ABCD: A Graph Framework to Convert Complex Sentences to a Covering Set of Simple Sentences." Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.

---

### PyrEval: Automated Pyramid Summarization Evaluation

This work builds on the pyramid method for summarization evaluation. We propose a software package, PyrEval, that provides a pipeline for building pyramids (content models) from reference summaries and using them to evaluate target summaries without human effort. PyrEval uses a sentence decomposition parser to extract clausal units from complex sentences, a matrix factorization method to generate clause embeddings, and a novel set partition algorithm based on the cosine similarity of clause embeddings to find content units and build the pyramid. PyrEval achieves higher correlation than ROUGE on human summaries, and it offers fine-grained feedback analysis for summarizer developers.
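The idea of weighting content units by how many reference summaries express them can be sketched in a few lines. This is a simplified illustration and not PyrEval's code: a greedy grouping step stands in for the set partition algorithm, the clause embeddings are hand-made 2-D toy vectors instead of matrix factorization output, and the names `cosine`, `build_pyramid`, `raw_score`, and the `threshold` parameter are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_pyramid(references, threshold=0.8):
    """Greedily group similar clauses from different references.

    references: list of summaries, each a list of clause vectors.
    Returns content units; a unit's weight is the number of references in it.
    """
    units = []
    for ref_id, clauses in enumerate(references):
        for vec in clauses:
            best, best_sim = None, threshold
            for unit in units:
                if ref_id in unit["refs"]:
                    continue  # a unit takes at most one clause per reference
                sim = min(cosine(vec, m) for m in unit["members"])
                if sim >= best_sim:
                    best, best_sim = unit, sim
            if best is None:
                units.append({"members": [vec], "refs": {ref_id}})
            else:
                best["members"].append(vec)
                best["refs"].add(ref_id)
    return units


def raw_score(target_clauses, units, threshold=0.8):
    """Sum the weights of the content units matched by a target summary."""
    used, total = set(), 0
    for vec in target_clauses:
        best_idx, best_sim = None, threshold
        for idx, unit in enumerate(units):
            if idx in used:
                continue  # each content unit is credited at most once
            sim = max(cosine(vec, m) for m in unit["members"])
            if sim >= best_sim:
                best_idx, best_sim = idx, sim
        if best_idx is not None:
            used.add(best_idx)
            total += len(units[best_idx]["refs"])
    return total


# Three toy reference summaries with hand-made clause vectors.
references = [
    [(1.0, 0.0), (0.0, 1.0)],
    [(0.9, 0.1)],
    [(1.0, 0.1), (0.1, 1.0)],
]
pyramid = build_pyramid(references)
weights = [len(unit["refs"]) for unit in pyramid]
print(weights, raw_score([(1.0, 0.0)], pyramid))
```

A target summary earns more credit for expressing content that many reference summaries share. Pyramid scores are usually also normalized against the best score attainable for a summary of the same size, which this sketch omits.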

Papers:

Gao, Yanjun, Chen Sun, and Rebecca J. Passonneau. "Automated pyramid summarization evaluation." Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL). 2019.

Gao, Yanjun, Andrew Warner, and Rebecca J. Passonneau. "Pyreval: An automated method for summary content analysis." Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). 2018.

---

### Rubric Design and Student Writing Assessments in STEM

We designed rubrics for assessing student writing, particularly for students in STEM majors, and tested their reliability. We conducted content unit annotation on students' summaries and argumentation annotation on students' essays, using an improved version of an existing annotation tool, DUCView, and a new tool, SEAView, which combines summary content annotation with essay annotation. We then ran experiments applying PyrEval to students' summaries to explore the possibilities and challenges of using NLP technologies in real classroom settings. Please see our BEA 2019 and BEA 2018 papers for more details.

---

### Collaborative Information Behaviour

We designed a web-based platform that supports multi-user information searching, sharing, and communication in the context of online retail. We then conducted user studies of the platform and reported the results. Please see our HICSS 2017 and CHI 2016 papers for more details.

Yanjun Gao