3rd Workshop on Advances in Language and Vision Research (ALVR)

In conjunction with ACL 2024
August 15-16, 2024 (Full Day)
Location: Bangkok, Thailand

Language and vision research has attracted great attention from both natural language processing (NLP) and computer vision (CV) researchers. This area is gradually shifting from passive perception, templated language, and synthetic imagery/environments to active perception, natural language, and photo-realistic simulation or real-world deployment. This workshop covers (but is not limited to) the following topics:

  • Self-supervised vision and language pre-training;
  • New tasks and datasets that provide real-world solutions in language and vision;
  • Text-to-image/video generation and text-guided image/video editing;
  • External knowledge integration in visual and language understanding;
  • Visually-grounded natural language understanding and generation;
  • Language-grounded visual recognition and reasoning;
  • Language-grounded embodied agents, e.g., vision-and-language navigation;
  • Visually-grounded multilingual study, e.g., multimodal machine translation;
  • Shortcomings of existing large vision-and-language models on downstream tasks, and solutions;
  • Ethics and bias in large vision-and-language models;
  • Multidisciplinary study that may involve linguistics, cognitive science, robotics, etc.;
  • Explainability and interpretability of large vision-and-language models.

Call for Papers

Long papers may consist of up to 8 pages of content, plus unlimited pages for references and an appendix; final versions of long papers will be given one additional page of content (up to 9 pages) so that reviewers’ comments can be considered.

Short papers may consist of up to 4 pages of content, plus unlimited references and an appendix. Short papers will be given 5 content pages in the proceedings upon acceptance. Authors are encouraged to use this additional page to address reviewers’ comments in their final versions.

We are also including a non-archival track to allow dual submission of work to ALVR 2024 and other conferences/journals. Space permitting, authors of these submissions will still present their work at the workshop, and the papers will be hosted on the workshop website but will not be included in the official proceedings. Please use the ACL format and submit through OpenReview, but indicate at the bottom of the submission form that this is a cross-submission (non-archival).

The submission website is https://openreview.net/group?id=aclweb.org/ACL/2024/Workshop/ALVR.

Schedule

All deadlines are 11:59PM UTC-12:00 (“anywhere on Earth”).

  • Paper submission deadline: May 17 (Friday), 2024
  • Notification of acceptance: June 17 (Monday), 2024
  • Camera-ready paper due: July 1 (Monday), 2024
  • Workshop dates: August 15, 2024

Invited Speakers


Dr. Alane Suhr is an Assistant Professor at UC Berkeley EECS. She received her PhD in Computer Science from Cornell University, where she was based at Cornell Tech in New York, NY, and advised by Yoav Artzi. Afterwards, she spent about a year in Seattle, WA at AI2 as a Young Investigator on the Mosaic team (led by Yejin Choi). Her research spans natural language processing, machine learning, and computer vision. She builds systems that use language to interact with people, e.g., in collaborative interactions (like CerealBar). She also designs models and datasets that address and represent problems in language grounding (e.g., NLVR), and develops learning algorithms for systems that learn language through interaction.

Dr. Angel Chang is an Assistant Professor at Simon Fraser University. Prior to this, she was a visiting research scientist at Facebook AI Research and a research scientist at Eloquent Labs working on dialogue. She received her Ph.D. in Computer Science from Stanford, where she was part of the Natural Language Processing Group and advised by Chris Manning. Her research focuses on connecting language to 3D representations of shapes and scenes and on grounding language for embodied agents in indoor environments. She has worked on methods for synthesizing 3D scenes and shapes from natural language, and on various datasets for 3D scene understanding. In general, she is interested in the semantics of shapes and scenes, the representation and acquisition of common sense knowledge, and reasoning using probabilistic models.

Dr. Daniel Fried is an assistant professor at the Language Technologies Institute in the School of Computer Science at Carnegie Mellon University, working on natural language processing. His work focuses on enabling people to use language to interact with computers to carry out useful tasks in the world. One recurring theme in his work is pragmatics: viewing language as an action that people take in context to affect their communicative partners. He also studies domains where computers can complement human abilities. Recently, he has been focusing on code generation, aiming to make programming more communicative.

Dhruv Batra is an Associate Professor in the School of Interactive Computing at Georgia Tech and a Research Director in the Fundamental AI Research (FAIR) team at Meta. His research interests lie at the intersection of machine learning, computer vision, natural language processing, and AI. He is a recipient of the Presidential Early Career Award for Scientists and Engineers (PECASE) (2019), the Early Career Award for Scientists and Engineers by the US Army (ECASE-Army) (2018), the Office of Naval Research (ONR) Young Investigator Program (YIP) award (2017), the National Science Foundation (NSF) CAREER award (2014), Army Research Office (ARO) Young Investigator Program (YIP) award (2014), Outstanding Junior Faculty awards from Georgia Tech (2018) and Virginia Tech (2015), multiple research awards from industry (Google, Amazon, Facebook), Carnegie Mellon Dean's Fellowship (2007), several best paper awards and nominations (ICLR 2023, CVPR 2022, ICCV 2019, EMNLP 2017) and teaching commendations.

Dr. Heng Ji is a professor in the Computer Science Department, and an affiliated faculty member in the Electrical and Computer Engineering Department and the Coordinated Science Laboratory, at the University of Illinois Urbana-Champaign. She is an Amazon Scholar and the Founding Director of the Amazon-Illinois Center on AI for Interactive Conversational Experiences (AICE). Her research interests focus on Natural Language Processing, especially Multimedia Multilingual Information Extraction, Knowledge-enhanced Large Language Models, Knowledge-driven Generation, and Conversational AI. She was selected as a Young Scientist to attend the 6th World Laureates Association Forum, and selected to participate in DARPA AI Forward in 2023.

Xin (Eric) Wang is an Assistant Professor of Computer Science and Engineering at UC Santa Cruz. His research interests include Natural Language Processing, Computer Vision, and Machine Learning, with an emphasis on Multimodal, Generative, and Embodied AI. He has worked at Google AI, Facebook AI Research, Microsoft Research, and Adobe Research. Xin has served as an Area Chair for conferences such as ACL, NAACL, EMNLP, ICLR, and NeurIPS, as well as a Senior Program Committee member for AAAI and IJCAI. He has organized workshops and tutorials at conferences such as ACL, NAACL, CVPR, and ICCV. He has received several awards and recognitions for his work, including the CVPR Best Student Paper Award, the Google Research Faculty Award, Amazon Alexa Prize Awards, and various gift awards from Adobe, Snap, eBay, etc.

Organizers

Jing Gu

UC Santa Cruz

Tsu-Jui (Ray) Fu

UC Santa Barbara

Drew Hudson

Google DeepMind

Asli Celikyilmaz

Fundamental AI Research (FAIR) @ Meta

William Wang

UC Santa Barbara

Contact

Program Committee

  • Asma Ben Abacha, Microsoft
  • Shubham Agarwal, Mila
  • Arjun Akula, Google
  • Dhivya Chinnappa, Thomson Reuters
  • Simon Dobnik, University of Gothenburg
  • Yue Fan, University of California, Santa Cruz
  • Zhe Gan, Apple AI/ML
  • Cristina Garbacea, University of Michigan
  • Huaizu Jiang, Northeastern University
  • Yujie Lu, University of California, Santa Barbara
  • Loitongbam Sanayai Meetei, National Institute of Technology Silchar, India
  • Yulei Niu, Columbia University
  • Vikas Raunak, Microsoft
  • Arka Sadhu, University of Southern California
  • Thoudam Doren Singh, National Institute of Technology Silchar, India
  • Alok Singh, National Institute of Technology Silchar, India
  • Ece Takmaz, University of Amsterdam
  • Hao Tan, Adobe Research
  • Yiming Xie, Northeastern University
  • Qianqi Yan, University of California, Santa Cruz
  • Kaizhi Zheng, University of California, Santa Cruz
  • Wanrong Zhu, University of California, Santa Barbara