Multimodal Data Fusion with Large Language Models

Organizers

  • Weiwei Jiang, Beijing University of Posts and Telecommunications, China
  • Stefano Cirillo, University of Salerno, Italy
  • Ahmad Taher Azar, Prince Sultan University, Saudi Arabia
  • Muhammet Deveci, University College London, UK

Abstract

In the era of big data, information is increasingly multimodal, encompassing text, images, audio, and more. This special session at ICANN 2026, titled “Multimodal Data Fusion with Large Language Models”, aims to explore cutting-edge advances and open challenges in integrating diverse data types using large language models (LLMs). LLMs have shown remarkable capabilities in natural language processing, but their potential extends beyond text: by incorporating multimodal data, these models can achieve a more comprehensive understanding of complex real-world scenarios. This session will examine multimodal data fusion from methodological, application, and ethical perspectives.

This special session seeks to bring together researchers, practitioners, and industry leaders to share their insights, foster collaboration, and drive the development of multimodal data fusion techniques that leverage the power of large language models. By doing so, we aim to push the boundaries of artificial intelligence and unlock new possibilities for solving complex, real-world problems through a multimodal lens.

List of Topics Covered in the Special Session

  • Techniques for Multimodal Integration: Presenting novel methods for effectively combining different data modalities with LLMs, such as cross-modal attention mechanisms and multimodal pre-training strategies
  • Applications Across Domains: Showcasing practical applications in fields like healthcare (e.g., medical imaging and electronic health records), autonomous driving (e.g., sensor data and textual instructions), and multimedia content creation (e.g., video and text)
  • Challenges and Solutions: Addressing key challenges such as data alignment, modality-specific biases, and computational efficiency, while proposing innovative solutions to enhance the robustness and scalability of multimodal LLMs
  • Ethical and Social Implications: Discussing the ethical considerations of multimodal data fusion, including privacy concerns, fairness, and the potential impact on society
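To make the first topic concrete, the sketch below illustrates the core idea behind cross-modal attention: tokens from one modality (here, text) attend over features of another (here, image patches). All names, shapes, and data are hypothetical; real multimodal LLMs use learned projections and multi-head variants, which this minimal example omits.

```python
import numpy as np

def cross_modal_attention(text_feats, image_feats):
    """Minimal single-head cross-modal attention: text tokens act as
    queries attending over image patches (keys and values).
    Shapes and feature dimensions are illustrative only."""
    d = text_feats.shape[-1]
    # Scaled dot-product scores between each text token and each image patch
    scores = text_feats @ image_feats.T / np.sqrt(d)
    # Softmax over image patches (numerically stabilized)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each text token becomes a weighted mixture of image features
    return weights @ image_feats

# Toy example: 4 text tokens and 9 image patches, each with 8-dim features
rng = np.random.default_rng(0)
text = rng.standard_normal((4, 8))
image = rng.standard_normal((9, 8))
fused = cross_modal_attention(text, image)
print(fused.shape)  # (4, 8): one image-informed vector per text token
```

In practice, such a fused representation is fed back into the language model's layers, which is one common way submissions in this area ground text generation in visual context.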