Publications

All papers have been subject to peer review unless indicated otherwise. *indicates equal contributions.

Click on a paper to view its short summary.

SFR-RAG: Towards Contextually Faithful LLMs

Technical Report., 2024

SFR-RAG: Towards Contextually Faithful LLMs

Citation: Xuan-Phi Nguyen, Shrey Pandit, Senthil Purushwalkam, Austin Xu, Hailin Chen, Yifei Ming, Zixuan Ke, Silvio Savarese, Caiming Xong, Shafiq Joty (2024). SFR-RAG: Towards Contextually Faithful LLMs. Arxiv Preprint - Technical report.
Paper Link: https://arxiv.org/abs/2409.09916

SeaLLMs - Large Language Models for Southeast Asia

ACL 2024 (DEMO) - Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2024

SeaLLMs - Large Language Models for Southeast Asia

Citation: Xuan-Phi Nguyen, Wenxuan Zhang, Xin Li, Mahani Aljunied, Qingyu Tan, Liying Cheng, Guanzheng Chen, Yue Deng, Sen Yang, Chaoqun Liu, Hang Zhang, Lidong Bing (2024). SeaLLMs - Large Language Models for Southeast Asia. ACL 2024 - Proceedings of the Annual Meeting of the Association for Computational Linguistics.
Paper Link: https://arxiv.org/abs/2312.00738

ParaICL: Towards Robust Parallel In-Context Learning

ACL 2024 (DEMO) - Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2024

ParaICL: Towards Robust Parallel In-Context Learning

Citation: Xingxuan Li, Xuan-Phi Nguyen, Shafiq Joty, Lidong Bing (2024). ParaICL: Towards Robust Parallel In-Context Learning. Arxiv Preprint.
Paper Link: https://arxiv.org/abs/2404.00570

Democratizing LLMs for low-resource languages by leveraging their english dominant abilities with linguistically-diverse prompts

ACL 2024 - Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2023

Democratizing LLMs for low-resource languages by leveraging their english dominant abilities with linguistically-diverse prompts

Citation: Xuan-Phi Nguyen, Sharifah Mahani Aljunied, Shafiq Joty, Lidong Bing (2024). Democratizing LLMs for low-resource languages by leveraging their english dominant abilities with linguistically-diverse prompts. ACL 2024 - Proceedings of the Annual Meeting of the Association for Computational Linguistics.
Paper Link: https://arxiv.org/pdf/2306.11372

Large language models are not yet human-level evaluators for abstractive summarization

EMNLP 2023 - The 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Large language models are not yet human-level evaluators for abstractive summarization

Citation: Shen, Chenhui, Liying Cheng, Xuan-Phi Nguyen, Yang You, and Lidong Bing (2023). Large language models are not yet human-level evaluators for abstractive summarization. EMNLP 2023 - The 2023 Conference on Empirical Methods in Natural Language Processing
Paper Link: https://arxiv.org/pdf/2305.13091

A hierarchical encoding-decoding scheme for abstractive multi-document summarization

EMNLP 2023 - The 2023 Conference on Empirical Methods in Natural Language Processing, 2023

A hierarchical encoding-decoding scheme for abstractive multi-document summarization

Citation: Shen, Chenhui, Liying Cheng, Xuan-Phi Nguyen, Yang You, and Lidong Bing (2023). A hierarchical encoding-decoding scheme for abstractive multi-document summarization. EMNLP 2023 - The 2023 Conference on Empirical Methods in Natural Language Processing
Paper Link: https://arxiv.org/abs/2210.14514

Improving Speech-to-Speech Translation Through Unlabeled Text

2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023), Rhodes Island, Greece, 2023

Improving Speech-to-Speech Translation Through Unlabeled Text

Citation: Xuan-Phi Nguyen, Sravya Popuri, Changhan Wang, Yun Tang, Ilia Kulikov, Hongyu Gong (2023). Improving Speech-to-Speech Translation Through Unlabeled Text. 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023).
Paper Link: https://arxiv.org/abs/2210.14514

Refining Low-Resource Unsupervised Translation by Language Disentanglement of Multilingual Model

36th Conference on Neural Information Processing Systems (NeurIPS 2022), New Orleans, USA, 2022

Refining Low-Resource Unsupervised Translation by Language Disentanglement of Multilingual Model

Citation: Xuan-Phi Nguyen, Shafiq Joty, Wu Kui & Aw Ai Ti (2022). Refining Low-Resource Unsupervised Translation by Language Disentanglement of Multilingual Model. 36th Conference on Neural Information Processing Systems (NeurIPS 2022).
Paper Link: https://arxiv.org/abs/2205.15544

Contrastive Clustering to Mine Pseudo Parallel Data for Unsupervised Translation

International Conference on Learning Representations (ICLR-22) 2022, 2022

Fully unsupervised mining method that can built synthetic parallel data for unsupervised machine translation

Citation: Xuan-Phi Nguyen, Hongyu Gong, Yun Tang, Changhan Wang, Philipp Koehn, and Shafiq Joty (2022). Contrastive Clustering to Mine Pseudo Parallel Data for Unsupervised Translation. In International Conference on Learning Representations (ICLR) 2022.
Paper Link: https://openreview.net/pdf?id=pN1JOdrSY9

Cross-model Back-translated Distillation for Unsupervised Machine Translation

38th International Conference on Machine Learning (ICML), 2021

A novel strategy to improve unsupervised MT by using back-translation with multiple models.

Citation: Xuan-Phi Nguyen, Shafiq Joty, Thanh-Tung Nguyen, Wu Kui, & Ai Ti Aw (2021). Cross-model Back-translated Distillation for Unsupervised Machine Translation. In Proceedings of the 38th International Conference on Machine Learning (ICML 2021).
Paper Link: https://arxiv.org/abs/2006.02163

A Conditional Splitting Framework for Efficient Constituency Parsing

ACL 2021 - The 59th Annual Meeting of the Association for Computational Linguistics, 2021

A Seq2Seq parsing framework that casts constituency parsing problems into a series of conditional splitting decisions.

Citation: Thanh-Tung Nguyen, Xuan-Phi Nguyen, Shafiq Joty & Xiaoli Li (2021). A Conditional Splitting Framework for Efficient Constituency Parsing. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics.
Paper Link: not-ready-yet

RST Parsing from Scratch

Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL), 2021

A novel top-down end-to-end formulation of document level discourse parsing in the Rhetorical Structure Theory (RST) framework.

Citation: Thanh-Tung Nguyen, Xuan-Phi Nguyen, Shafiq Joty & Xiaoli Li (2021). In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL 2021).
Paper Link: https://www.aclweb.org/anthology/2020.acl-main.589/

Data Diversification: An Elegant Strategy For Neural Machine Translation

34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada, 2020

A simple way to boost many NMT tasks by using multiple backward and forward models.

Citation: Xuan-Phi Nguyen, Shafiq Joty, Wu Kui, & Ai Ti Aw (2019). Data Diversification: An Elegant Strategy For Neural Machine Translation. In the 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada, 2020.
Paper Link: https://arxiv.org/abs/1911.01986

Tree-structured Attention with Hierarchical Accumulation

International Conference on Learning Representations (ICLR), 2020

A novel attention mechanism that aggregates hierarchical structures to encode constituency trees for downstream tasks.

Citation: Xuan-Phi Nguyen, Shafiq Joty, Steven Hoi, & Richard Socher (2020). Tree-Structured Attention with Hierarchical Accumulation. In International Conference on Learning Representations.
Paper Link: https://arxiv.org/abs/2002.08046

Efficient Constituency Parsing by Pointing

ACL 2020 - The 58th Annual Meeting of the Association for Computational Linguistics, 2020

A new parsing method that employs pointing mechanism to perform top-down decoding. The method is competitive with the state-of-the-art while being faster.

Citation: Thanh-Tung Nguyen, Xuan-Phi Nguyen, Shafiq Joty & Xiaoli Li (2020). Efficient Constituency Parsing by Pointing. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics.
Paper Link: https://www.aclweb.org/anthology/2020.acl-main.301/

Differentiable Window for Dynamic Local Attention

ACL 2020 - The 58th Annual Meeting of the Association for Computational Linguistics, 2020

Using differentiable windows to perform local attentions greatly improve performance of machine translation and language modeling.

Citation: Xuan-Phi Nguyen*, Thanh-Tung Nguyen*, Shafiq Joty & Xiaoli Li (2020). Differentiable Window for Dynamic Local Attention. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics.
Paper Link: https://www.aclweb.org/anthology/2020.acl-main.589/

Enhancing Attention with Explicit Phrasal Alignments

arXiv preprint, 2019

A new attention mechanism that considers phrases as attention entities instead of individual tokens in MT.

Citation: Xuan-Phi Nguyen, Shafiq Joty, Thank-Tung Nguyen (2019). Enhancing Attention with Explicit Phrasal Alignments.
Paper Link: https://openreview.net/forum?id=BygPq6VFvS

Medical Image Segmentation with Stochastic Aggregated Loss in a Unified U-Net

2019 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI), 2018

Traditional U-Net models suffer from gradient vanishing under certain circumstances, such as detecting the existence of tumors in the brain. We introduce a novel Stochastic Aggregated Loss that improves the gradient flows of U-Net and performance.

Citation: P. X. Nguyen, Z. Lu, W. Huang, S. Huang, A. Katsuki & Z. Lin (2019).Medical image segmentation with stochastic aggregated loss in a unified U-Net. In 2019 IEEE EMBS International Conference on Biomedical Health Informatics (BHI) (IEEE BHI 2019), Chicago, USA.
Paper Link: https://ieeexplore.ieee.org/document/8834667

Xuan-Phi Nguyen (Phi)

Publications