Alternative names: Daniel Kao, Xuhong Kao
Google Scholar / GitHub / DBLP / X (Twitter) |
I am a first-year PhD student in Computer Science in the CVML Group at the National University of Singapore (NUS), advised by Prof. Angela Yao. My research interests sit at the intersection of deep learning and computer vision, with a specific focus on video understanding and LLM/VLM reasoning.
Previously, I received an M.Phil. in Computer Science and Engineering from the Hong Kong University of Science and Technology (HKUST) under the supervision of Prof. Chi-Keung Tang, collaborating closely with Prof. Yu-Wing Tai from Dartmouth College. I also did research in 3D reconstruction as an intern in the Media Computing Group at Microsoft Research Asia, working with Dr. Xiao Li and Dr. Jinglu Wang. Prior to graduate school, I earned a Bachelor of Science degree from HKUST, double-majoring in Data Science and Technology and in Computer Science. I completed my undergraduate thesis in Prof. Shueng-Han Gary Chan's lab, in collaboration with Dr. Jierun Chen.
| Mar. 2026 | CoT-Seg is accepted to ICLR 2026 Workshop RSI. |
| Jan. 2026 | CoT-RVS is accepted to ICLR 2026. |
| Jan. 2026 | I joined the PhD program at NUS. |
| Jun. 2025 | StreamGS is accepted to ICCV 2025. |
| Jan. 2025 | I was honored to receive the Stars of Tomorrow Award from Microsoft Research Asia. |
- About my name: I go by the name Daniel in daily life. My Chinese name, Shiu-hong (pronounced similarly to "she hong" with an "o" sound), is typically used only in official documents and my publications. In mainland China, I am sometimes referred to as Xuhong, which is based on a different homophonic transliteration system.
|
Shiu-hong Kao, Yu-Wing Tai, Chi-Keung Tang International Conference on Learning Representations (ICLR), 2026 [arXiv] [Project page] [Code] [Data] We propose CoT-RVS, which extracts temporal-semantic correlations in videos through chain-of-thought reasoning and achieves state-of-the-art performance in reasoning video segmentation. |
|
Shiu-hong Kao*, Chak-Ho Huang*, Huaiqian Liu*, Yu-Wing Tai, Chi-Keung Tang ICLR 2026 Workshop on AI with Recursive Self-Improvement (RSI 2026) [arXiv] [Project page] [Code] [Data] *Equal contribution. We propose CoT-Seg, a modular reasoning segmentation framework that significantly improves the quality of segmentation masks through chain-of-thought reasoning and a self-correction loop. |
|
Shiu-hong Kao, Yu-Wing Tai, Chi-Keung Tang arXiv preprint, 2025 [arXiv] [Project page] We propose ThinkFirst, a novel Chain-of-Thought (CoT) reasoning segmentation framework that generates an accurate object mask given a text prompt. ThinkFirst handles difficult scenarios such as implicit queries, camouflaged objects, and out-of-domain objects, and is easy to control. |
|
Shiu-hong Kao, Xiao Li, Jinglu Wang, Yang Li, Chi-Keung Tang, Yu-Wing Tai, Yan Lu arXiv preprint, 2025 [arXiv] [Demo] We introduce UVRM, a novel 3D reconstruction model that can be trained and evaluated on 360-degree monocular videos without requiring any pose information. |
|
Yang Li, Jinglu Wang, Lei Chu, Xiao Li, Shiu-hong Kao, Yingcong Chen, Yan Lu International Conference on Computer Vision (ICCV), 2025 [arXiv] We propose StreamGS, an online generalizable 3DGS reconstruction method for unposed image streams, which progressively transforms image streams into 3D Gaussian streams by predicting and aggregating per-frame Gaussians. |
|
Chia-Hong Hsu, Shiu-hong Kao, Randall Balestriero arXiv preprint, 2025 [arXiv] [Project page] We propose invertible Guided Consistency Training (iGCT), a data-driven training framework for guided consistency models that enables fast, guided image generation and editing without training or distilling any diffusion models. |
|
Shiu-hong Kao, Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang arXiv preprint, 2024 [arXiv] We propose InceptionHuman, a NeRF-based generative framework incorporating state-of-the-art diffusion models, which accepts prompts of any type and size (e.g., text, pose, style) to generate realistic 3D humans. |
|
Xinhang Liu, Jiaben Chen, Shiu-hong Kao, Yu-Wing Tai, Chi-Keung Tang European Conference on Computer Vision (ECCV), 2024 [arXiv] [Project page] We introduce Deceptive-NeRF/3DGS, a new method for enhancing the quality of reconstructed NeRF/3DGS models using synthetically generated pseudo-observations, capable of handling sparse input and removing floater artifacts. |
|
Shiu-hong Kao*, Jierun Chen*, S.-H. Gary Chan arXiv preprint, 2023 [arXiv] [Code] *Equal contribution. We reveal the issue of Inter-block Optimization Entanglement (IBOE) in end-to-end KD training and propose StableKD to stabilize optimization. Extensive experiments show that StableKD achieves high accuracy, fast convergence, and high data efficiency. |
|
Jierun Chen, Shiu-hong Kao, Hao He, Weipeng Zhuo, Song Wen, Chul-Ho Lee, S.-H. Gary Chan IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023 [Paper] [Code] We propose a simple yet fast and effective partial convolution (PConv), together with FasterNet, a latency-efficient family of network architectures. |
|
Research Intern, Media Computing Group, Microsoft Research Asia (Jun 2024 - Dec 2024). Conducted research in 3D computer vision, with a focus on 3D reconstruction, and contributed to publications in this area. |
|
|
Software Engineer Intern, Hong Kong Applied Science and Technology Research Institute (ASTRI) (Jun 2021 - Aug 2021). Designed and developed algorithms and researched the latest technologies; tasks included image processing, OCR, data parsing, and deep learning. [report] |