About

Viet Lai is a research scientist at Kensho Technologies, the AI Research Hub of S&P Global. He obtained his Ph.D. in Computer Science from the Department of Computer Science, University of Oregon, where he was advised by Thien Huu Nguyen in the UofO Natural Language Processing group.

He obtained his M.S. degree in Computer Science from the Japan Advanced Institute of Science and Technology (with Minh Le Nguyen) and his bachelor's degree in Computer Science from the Posts and Telecommunications Institute of Science and Technology.

His Ph.D. research focuses on understanding human languages so that computers can automate human-language cognitive tasks, specifically extracting and structuring information from massive written text across domains. He interned at Adobe Research, where he worked with Franck Dernoncourt on projects to enhance language understanding for video applications, such as subtitle segmentation, punctuation restoration, and chitchat detection.

Contact me

Email

vietl at uoregon dot edu

Office

44 Brattle St, Cambridge, MA 02138

Other

Publications

2024

  • CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages

    Thuat Nguyen, Chien Van Nguyen, Viet Dac Lai, Hieu Man, Nghia Trung Ngo, Franck Dernoncourt, Ryan A. Rossi, Thien Huu Nguyen | LREC-COLING | pdf

  • CAMAL: A Novel Dataset for Multi-label Conversational Argument Move Analysis

    Viet Dac Lai, Duy Pham, Jonathan Steinberg, Jamie Mikeska, Thien Huu Nguyen | LREC-COLING | pdf [To appear]

  • DocFinQA: A Long-Context Financial Reasoning Dataset

    Varshini Reddy, Rik Koncel-Kedziorski, Viet Dac Lai, Chris Tanner | Preprint | pdf

  • BizBench: A Quantitative Reasoning Benchmark for Business and Finance

    Rik Koncel-Kedziorski, Michael Krumdick, Viet Dac Lai, Varshini Reddy, Charles Lovering, Chris Tanner | Preprint | pdf

  • Using Machine Learning to Detect Student Learning Levels along a Learning Progression

    Duy Pham, Viet Dac Lai | NCME 2024 | pdf

2023

  • Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback

    Viet Dac Lai, Chien Van Nguyen, Nghia Trung Ngo, Thuat Nguyen, Franck Dernoncourt, Ryan A. Rossi, Thien Huu Nguyen | EMNLP 2023 Demo | pdf

  • Boosting Punctuation Restoration with Data Generation and Reinforcement Learning

    Viet Dac Lai, Abel Salinas, Hao Tan, Trung Bui, Quan Tran, Seunghyun Yoon, Hanieh Deilamsalehy, Franck Dernoncourt, Thien Huu Nguyen | INTERSPEECH 2023 | pdf

  • ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning

    Viet Dac Lai, Nghia Trung Ngo, Amir Pouran Ben Veyseh, Hieu Man, Franck Dernoncourt, Trung Bui, Thien Huu Nguyen | Findings of EMNLP 2023 | pdf

  • Automated scoring of argumentation-focused teaching transcripts: Challenges and added value of human annotations

    Duy Pham, Viet Dac Lai, Jamie Mikeska, Jonathan Steinberg, Heather Howell, Thien Huu Nguyen | NCME 2023 | pdf

2022

  • Few-Shot Cross-Lingual Learning for Event Detection

    Luis Guzman Nateras, Viet Dac Lai, Franck Dernoncourt and Thien Huu Nguyen | MRL@EMNLP 2022 | pdf

  • Multilingual SubEvent Relation Extraction: A Novel Dataset and Structure Induction Method

    Viet Dac Lai, Hieu Man, Linh Ngo, Franck Dernoncourt and Thien Huu Nguyen | Findings of EMNLP 2022 | pdf

  • MECI: A Multilingual Dataset for Event Causality Identification

    Viet Dac Lai, Amir Pouran Ben Veyseh, Minh Van Nguyen, Franck Dernoncourt and Thien Huu Nguyen | COLING 2022 | pdf

  • Event Extraction in Video Transcripts

    Amir Pouran Ben Veyseh, Viet Dac Lai, Franck Dernoncourt and Thien Huu Nguyen | COLING 2022 | pdf

  • SemEval 2022 Task 12: Symlink - Linking Mathematical Symbols to their Descriptions

    Viet Dac Lai, Amir Pouran Ben Veyseh, Franck Dernoncourt, and Thien Huu Nguyen | SemEval 2022 @ NAACL | pdf

  • BehancePR: A Punctuation Restoration Dataset for Livestreaming Video Transcript

    Viet Dac Lai, Amir Pouran Ben Veyseh, Franck Dernoncourt, and Thien Huu Nguyen | Findings of NAACL 2022 | pdf

  • Event Detection for Suicide Understanding

    Luis Guzman-Nateras, Viet Dac Lai, Amir Pouran Ben Veyseh, Franck Dernoncourt, and Thien Huu Nguyen | Findings of NAACL 2022 | pdf

  • BehanceCC: A ChitChat Detection Dataset For Livestreaming Video Transcripts

    Viet Dac Lai, Amir Pouran Ben Veyseh, Franck Dernoncourt, and Thien Huu Nguyen | LREC 2022 | pdf

  • BehanceQA: A New Dataset for Identifying Question-Answer Pairs in Video Transcripts

    Amir Pouran Ben Veyseh, Viet Dac Lai, Franck Dernoncourt, and Thien Huu Nguyen | LREC 2022 | pdf

2021

  • Learning Prototype Representations Across Few-Shot Tasks for Event Detection

    Viet Dac Lai, Franck Dernoncourt, and Thien Huu Nguyen | EMNLP 2021 | pdf

  • Event Extraction from Historical Texts: A New Dataset for Black Rebellions

    Viet Dac Lai, Minh Nguyen, Heidi Kaufman, and Thien Huu Nguyen | ACL-IJCNLP 2021 | pdf

  • Unleash GPT-2 Power for Event Detection

    Amir Pouran Ben Veyseh, Viet Dac Lai, Franck Dernoncourt and Thien Huu Nguyen | ACL-IJCNLP 2021 (Findings) | pdf

  • Graph Learning Regularization and Transfer Learning for Few-Shot Event Detection

    Viet Dac Lai, Minh Nguyen, Thien Huu Nguyen and Franck Dernoncourt | SIGIR 2021 | pdf

  • Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing

    Minh Van Nguyen, Viet Dac Lai, Amir Pouran Ben Veyseh and Thien Huu Nguyen | EACL 2021 | pdf | Demo | Docs

  • Cross-Task Instance Representation Interactions and Label Dependencies for Joint Information Extraction with Graph Convolutional Networks

    Minh Van Nguyen, Viet Dac Lai and Thien Huu Nguyen | NAACL-HLT 2021 | pdf

2020

  • Event Detection: Gate Diversity and Syntactic Importance Scores for Graph Convolution Neural Networks

    Viet Dac Lai, Tuan Ngo Nguyen and Thien Huu Nguyen | EMNLP 2020 | pdf

  • Extensively Matching for Few-shot Learning Event Detection

    Viet Dac Lai, Franck Dernoncourt, Thien Huu Nguyen | NUSE@ACL 2020 | pdf

  • Exploiting the Matching Information in the Support Set for Few Shot Event Classification

    Viet Dac Lai, Franck Dernoncourt, Thien Huu Nguyen | PAKDD 2020 | pdf

2019

  • Extending Event Detection to New Types with Learning from Keywords

    Viet Dac Lai, Thien Huu Nguyen | W-NUT@EMNLP 2019 | pdf

2018 and before

  • TSix: A Human-involved-creation Dataset for Tweet Summarization

    Minh-Tien Nguyen, Viet Dac Lai, Huy-Tien Nguyen and Minh-Le Nguyen | LREC 2018 | pdf

  • Deletion-based sentence compression using Bi-enc-dec LSTM

    Viet Dac Lai, Nguyen Truong Son, Nguyen Le Minh | PACLING 2017 | pdf

  • VSoLSCSum: Building a vietnamese sentence-comment dataset for social context summarization

    Minh-Tien Nguyen, Viet Dac Lai, Phong-Khac Do, Duc-Vu Tran, Minh-Le Nguyen | ALR@COLING 2016 | pdf

Patents

Low Resource Event Detection (US, pending)

Reinforced Learning Approach to Generate Data (US, pending)

SubEvent Relation Extraction (US, pending)

Awards

Best paper Runner-up Award, MRL@EMNLP, 2022

Erwin & Gertrude Juilfs Scholarship, CIS, UOregon, 2022

Adobe Research Fellowship, 2022

Outstanding demo paper award, EACL 2021

Best Graduate Teaching Assistant, CIS, UOregon, 2021

Services

Reviewer:

Organizer:
