
Zihan Wang, Anita Slominska, Rennie Bimman, Elizabeth Di Flumeri, Amanda Mayappo-Neeposh, Conall Francoeur, Tamara Ellen Carver, Xiao-Wen Chang, Doina Precup, Esin Darici Haritaoglu, Ismail Haritaoglu, Akshatha Arodi, Naomi Goloff
Under review at EMNLP 2026
Pediatric serious illness communication (SIC) is critically important, yet scalable communication training for clinicians remains limited. Compared with other dialogue simulation settings, pediatric SIC poses additional challenges, including multi-party interactions, response to parental distress, and strong dependence on feedback dynamics. In collaboration with educators and pediatric clinicians, we introduce the first benchmark suite and simulation framework tailored to pediatric SIC training. Our benchmarks, PitfallBench and DialogueBench, evaluate simulators both at the turn level and across full dialogues. We further propose SIC-Agents, a self-improving framework that generates a clinician-editable skill document to guide simulator behavior. Our experiments show that SIC-Agents outperforms static expert prompting. To support future research, we release our benchmarks and a training interface for parent simulation in pediatric SIC.
Zihan Wang, Anita Slominska, Rennie Bimman, Elizabeth Di Flumeri, Amanda Mayappo-Neeposh, Conall Francoeur, Tamara Ellen Carver, Xiao-Wen Chang, Doina Precup, Esin Darici Haritaoglu, Ismail Haritaoglu, Akshatha Arodi, Naomi Goloff
Under review at EMNLP 2026
Pediatric serious illness communication (SIC) is critically important, yet scalable communication training for clinicians remains limited. Compared with other dialogue simulation settings, pediatric SIC poses additional challenges, including multi-party interactions, response to parental distress, and strong dependence on feedback dynamics. In collaboration with educators and pediatric clinicians, we introduce the first benchmark suite and simulation framework tailored to pediatric SIC training. Our benchmarks, PitfallBench and DialogueBench, evaluate simulators both at the turn level and across full dialogues. We further propose SIC-Agents, a self-improving framework that generates a clinician-editable skill document to guide simulator behavior. Our experiments show that SIC-Agents outperforms static expert prompting. To support future research, we release our benchmarks and a training interface for parent simulation in pediatric SIC.

Shuyuan Zhang, Zihan Wang, Xiao-Wen Chang, Doina Precup
Under review at Transactions on Machine Learning Research (TMLR) 2026
Graph-based Goal-Conditioned Hierarchical Reinforcement Learning (GCHRL) typically uses the graph as a stochastic sampling tool rather than as an environmental model that encodes connectivity and state-accessibility information, which is particularly limiting in quasimetric environments where asymmetric state transitions challenge stable policy learning and path planning. We introduce a state connectivity model that predicts pairwise state-connectivity strength in asymmetric environments and transforms it into scalar auxiliary dense rewards that provide continuous guidance across hierarchical levels. Our framework, Graph-Guided Quasimetric Dense Reward (G2QDR), can be integrated into any existing GCHRL architecture; the connectivity model is implemented as a neural network trained on a directed state graph generated during exploration. Across a wide range of sparse-reward environments, G2QDR significantly enhances baseline GCHRL approaches with minimal computational overhead.
Shuyuan Zhang, Zihan Wang, Xiao-Wen Chang, Doina Precup
Under review at Transactions on Machine Learning Research (TMLR) 2026
Graph-based Goal-Conditioned Hierarchical Reinforcement Learning (GCHRL) typically uses the graph as a stochastic sampling tool rather than as an environmental model that encodes connectivity and state-accessibility information, which is particularly limiting in quasimetric environments where asymmetric state transitions challenge stable policy learning and path planning. We introduce a state connectivity model that predicts pairwise state-connectivity strength in asymmetric environments and transforms it into scalar auxiliary dense rewards that provide continuous guidance across hierarchical levels. Our framework, Graph-Guided Quasimetric Dense Reward (G2QDR), can be integrated into any existing GCHRL architecture; the connectivity model is implemented as a neural network trained on a directed state graph generated during exploration. Across a wide range of sparse-reward environments, G2QDR significantly enhances baseline GCHRL approaches with minimal computational overhead.

Mohammadreza Sadeghi, Sareh Soleimani, Zihan Wang, Narges Armanfard
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD) 2026
We introduce Unsupervised Continual Clustering (UCC) and propose Forward-Backward Knowledge Distillation for Continual Clustering (FBCC) to address catastrophic forgetting in unsupervised continual learning. FBCC leverages a teacher-student distillation framework with both forward and backward knowledge transfer to enhance memory efficiency and clustering performance. Experiments show that FBCC outperforms existing methods on continual clustering tasks, marking a significant advance for unsupervised continual learning.
Mohammadreza Sadeghi, Sareh Soleimani, Zihan Wang, Narges Armanfard
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD) 2026
We introduce Unsupervised Continual Clustering (UCC) and propose Forward-Backward Knowledge Distillation for Continual Clustering (FBCC) to address catastrophic forgetting in unsupervised continual learning. FBCC leverages a teacher-student distillation framework with both forward and backward knowledge transfer to enhance memory efficiency and clustering performance. Experiments show that FBCC outperforms existing methods on continual clustering tasks, marking a significant advance for unsupervised continual learning.

Diganta Misra*, Nizar Islah*, Victor May, Brice Rauby, Zihan Wang, Justine Gehring, Muawiz Sajjad Chaudhary, Antonio Orvieto, Eilif B. Muller, Irina Rish, Samira Ebrahimi Kahou, Massimo Caccia (* equal contribution)
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL) 2026
We present GitChameleon 2.0, a dataset of 328 Python code completion problems conditioned on specific library versions, each paired with executable unit tests for rigorous, execution-based evaluation. Our analysis reveals that state-of-the-art LLMs and code assistants struggle with version-conditioned code generation, achieving only 48–51% success rates. GitChameleon 2.0 provides a challenging benchmark to spur advances in adaptable and robust AI code generation.
Diganta Misra*, Nizar Islah*, Victor May, Brice Rauby, Zihan Wang, Justine Gehring, Muawiz Sajjad Chaudhary, Antonio Orvieto, Eilif B. Muller, Irina Rish, Samira Ebrahimi Kahou, Massimo Caccia (* equal contribution)
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL) 2026
We present GitChameleon 2.0, a dataset of 328 Python code completion problems conditioned on specific library versions, each paired with executable unit tests for rigorous, execution-based evaluation. Our analysis reveals that state-of-the-art LLMs and code assistants struggle with version-conditioned code generation, achieving only 48–51% success rates. GitChameleon 2.0 provides a challenging benchmark to spur advances in adaptable and robust AI code generation.

Wesley Chung*, Zihan Wang*, Xiao-Wen Chang, David Meger, Doina Precup (* equal contribution)
Continual RL Workshop, Reinforcement Learning Conference (RLC) 2026
Continual reinforcement learning considers an agent that receives and learns from a stream of experience, aiming to maximize its accumulated reward. A fundamental problem is to evaluate such an agent using only its stream of experience, without assuming a particular structure of the world. We define an examiner that observes the same stream and, at every timestep, computes a perceived regret representing the agent's suboptimality gap from the examiner's perspective. We demonstrate empirically that perceived regret is a useful performance measure applicable to arbitrary streams of experience, and prove that no universal examiner can accurately assess all agents in any world, though specific modeling biases enable success in associated environments.
Wesley Chung*, Zihan Wang*, Xiao-Wen Chang, David Meger, Doina Precup (* equal contribution)
Continual RL Workshop, Reinforcement Learning Conference (RLC) 2026
Continual reinforcement learning considers an agent that receives and learns from a stream of experience, aiming to maximize its accumulated reward. A fundamental problem is to evaluate such an agent using only its stream of experience, without assuming a particular structure of the world. We define an examiner that observes the same stream and, at every timestep, computes a perceived regret representing the agent's suboptimality gap from the examiner's perspective. We demonstrate empirically that perceived regret is a useful performance measure applicable to arbitrary streams of experience, and prove that no universal examiner can accurately assess all agents in any world, though specific modeling biases enable success in associated environments.

Shuyuan Zhang, Zihan Wang, Xiao-Wen Chang, Doina Precup
Transactions on Machine Learning Research (TMLR) 2025
We propose Graph-Guided sub-Goal representation Generation RL (G4RL), a method that incorporates a graph encoder-decoder into goal-conditioned hierarchical reinforcement learning (GCHRL) to address sample inefficiency and subgoal representation issues. G4RL leverages state graphs built during exploration to provide intrinsic rewards and enable effective evaluation of unseen states, without requiring domain-specific knowledge. Experiments demonstrate that G4RL consistently improves the performance of state-of-the-art GCHRL methods in both dense and sparse reward settings with minimal computational overhead.
Shuyuan Zhang, Zihan Wang, Xiao-Wen Chang, Doina Precup
Transactions on Machine Learning Research (TMLR) 2025
We propose Graph-Guided sub-Goal representation Generation RL (G4RL), a method that incorporates a graph encoder-decoder into goal-conditioned hierarchical reinforcement learning (GCHRL) to address sample inefficiency and subgoal representation issues. G4RL leverages state graphs built during exploration to provide intrinsic rewards and enable effective evaluation of unseen states, without requiring domain-specific knowledge. Experiments demonstrate that G4RL consistently improves the performance of state-of-the-art GCHRL methods in both dense and sparse reward settings with minimal computational overhead.

Zihan Wang, Samira Ebrahimi Kahou, Narges Armanfard
Proceedings of the British Machine Vision Conference (BMVC) 2025 Oral
Zero-shot anomaly detection (ZSAD) aims to identify and localize unseen defects without requiring any labeled anomalies, but existing methods struggle to generalize under domain shifts. We propose PILOT, a framework combining a dual-branch prompt learning mechanism with label-free test-time adaptation, enabling dynamic adaptation to new distributions using only unlabeled data. PILOT achieves state-of-the-art performance on 13 industrial and medical benchmarks for both anomaly detection and localization under domain shift.
Zihan Wang, Samira Ebrahimi Kahou, Narges Armanfard
Proceedings of the British Machine Vision Conference (BMVC) 2025 Oral
Zero-shot anomaly detection (ZSAD) aims to identify and localize unseen defects without requiring any labeled anomalies, but existing methods struggle to generalize under domain shifts. We propose PILOT, a framework combining a dual-branch prompt learning mechanism with label-free test-time adaptation, enabling dynamic adaptation to new distributions using only unlabeled data. PILOT achieves state-of-the-art performance on 13 industrial and medical benchmarks for both anomaly detection and localization under domain shift.

Kunhan Wu*, Zihan Wang*, Jingyi Zhao, Haodong Xu, Tianming Hao, Wenzhi Lin (* equal contribution)
Journal of Physics: Conference Series 2023
We implement DeepIV, a pioneering framework that combines deep learning and instrumental variables for causal inference, to predict the effect of educational background on annual income using real-world datasets. Our results show that DeepIV achieves causal effect predictions comparable to established causal inference models and performs on par with traditional supervised learning methods. This demonstrates DeepIV’s practical reliability for real-world causal inference tasks.
Kunhan Wu*, Zihan Wang*, Jingyi Zhao, Haodong Xu, Tianming Hao, Wenzhi Lin (* equal contribution)
Journal of Physics: Conference Series 2023
We implement DeepIV, a pioneering framework that combines deep learning and instrumental variables for causal inference, to predict the effect of educational background on annual income using real-world datasets. Our results show that DeepIV achieves causal effect predictions comparable to established causal inference models and performs on par with traditional supervised learning methods. This demonstrates DeepIV’s practical reliability for real-world causal inference tasks.

Weijie Sun, Sunil Vasu Kalmady, Nariman Sepehrvand, Luan Manh Chu, Zihan Wang, Amir Salimi, Abram Hindle, Russell Greiner, Padma Kaul
NeurIPS 2022 Workshop on Learning from Time Series for Health 2022
We show that pre-training deep learning models on pre-pandemic health records and fine-tuning them with limited pandemic data can substantially improve ECG-based COVID-19 diagnosis and prognosis. This transfer learning approach demonstrates notable gains across three prediction tasks, highlighting its potential for rapid AI deployment in future outbreaks.
Weijie Sun, Sunil Vasu Kalmady, Nariman Sepehrvand, Luan Manh Chu, Zihan Wang, Amir Salimi, Abram Hindle, Russell Greiner, Padma Kaul
NeurIPS 2022 Workshop on Learning from Time Series for Health 2022
We show that pre-training deep learning models on pre-pandemic health records and fine-tuning them with limited pandemic data can substantially improve ECG-based COVID-19 diagnosis and prognosis. This transfer learning approach demonstrates notable gains across three prediction tasks, highlighting its potential for rapid AI deployment in future outbreaks.

Zihan Wang, Ruimin Chen, Mengxuan Liu, Guanfang Dong, Anup Basu
International Conference on Smart Multimedia 2022
We propose SPGNet, a method for 3D human pose estimation that integrates multi-dimensional re-projection into supervised learning. Our approach enforces kinematic constraints and jointly optimizes both 2D and 3D pose consistency, leading to improved accuracy. Experiments on the Human3.6M dataset show that SPGNet outperforms many state-of-the-art methods.
Zihan Wang, Ruimin Chen, Mengxuan Liu, Guanfang Dong, Anup Basu
International Conference on Smart Multimedia 2022
We propose SPGNet, a method for 3D human pose estimation that integrates multi-dimensional re-projection into supervised learning. Our approach enforces kinematic constraints and jointly optimizes both 2D and 3D pose consistency, leading to improved accuracy. Experiments on the Human3.6M dataset show that SPGNet outperforms many state-of-the-art methods.