Guangrun Wang

Associate Professor

Email: wanggrun@gmail.com

Address: Room D302, South Basic Experiment Building, East Campus, Sun Yat-sen University

Homepage: https://wanggrun.github.io/

Faculty Profile

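Guangrun Wang is a PhD advisor and Associate Professor (recruited talent, new series) at Sun Yat-sen University. He holds a national-level "Four Young Talents" designation and is a recipient of the Huawei Strategic Research Institute Talent Funding (university faculty).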

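Before being recruited, he was a Research Fellow at the University of Oxford, working with Professor Philip H.S. Torr, a Fellow of the Royal Society, a Fellow of the Royal Academy of Engineering, and a Fellow of the Alan Turing Institute. While in the UK, he also led projects for the research team at the UK company Aistetic. He collaborates closely with leading overseas technology companies. He earned dual bachelor's degrees in engineering and management from Sun Yat-sen University's former School of Information Science and Technology and its School of Management, and a PhD from Sun Yat-sen University under the supervision of Professor Liang Lin (including a visit to the Department of Information Engineering at The Chinese University of Hong Kong). He has also conducted research at Dark Matter Intelligence Technology Co., Ltd. He has published nearly 40 papers in A-ranked conferences and CAS Zone 1 journals, 11 of them as oral presentations, and has 2 ESI highly cited papers and 11 patents. His honors include selection at the highest level of Huawei's "Genius Youth" program; the Wu Wenjun Artificial Intelligence Outstanding Doctoral Thesis Award (only 9 recipients nationwide that year); the Pattern Recognition Best Paper Award (only 1 paper worldwide that year); the IJCAI "Outstanding Senior Program Committee Member" distinction (only 18 people worldwide that year); a place on the Global AI Chinese New Star list (only 25 people worldwide in machine learning that year); top honors in multiple international competitions; and six recognitions as a top, highlighted, or outstanding reviewer for CCF A-ranked conferences (IJCAI-21, NeurIPS-22, ICLR-22, ICLR-21, ICCV-21, NeurIPS-21). His work has been commented on by Turing Award winner Yann LeCun. He has helped organize workshops at ICML 2024 and CVPR 2023, has co-supervised several students at the University of Oxford and Sun Yat-sen University, and has seen his research deployed at multiple companies.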

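Research Areas

Our main research focus is next-generation AI architectures; see https://thegreatailab.github.io/. We develop AI models that learn representations and develop intelligence, drawing inspiration from the complexity of the real world and applying AI to challenges in real-world domains (that is, World for AI and AI for World). Our long-term goal is to create general-purpose AI agents that operate autonomously across a wide range of tasks. This vision aims to blur the line between human and AI capabilities, so that both AI and humans can develop into similar general intelligent agents (a human is also a general intelligent agent), strengthening the synergy between artificial and human intelligence.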

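1. Demis Hassabis, a 2024 Nobel laureate, said that combining the strengths of Google's AlphaGo and Gemini is meant to create a second Transformer. What next-generation architecture will replace the Transformer?

2. Ilya Sutskever regards GPT as one path toward artificial general intelligence, yet GPT is inconsistent with human cognition, and the GPT architecture is a source of hallucination. What next-generation generative paradigm will replace GPT?

3. Humans will eventually exhaust the text corpora available for training large models. Are large models therefore close to their ceiling? No: other modalities still hold a great deal of data. However, multimodal large models lack a native unified framework. What is the next-generation framework for multimodal generation and understanding?

4. Suppose an agent is placed in a locked room. Can it complete the escape-room task on its own? In the era of artificial general intelligence, generative AI emphasizes general agents capable of multimodal content generation. With large language models such as ChatGPT and OpenAI o1, text generation has advanced markedly, especially in reasoning: these models can answer a wide range of math problems and carry out complex reasoning. As the technology develops, however, we are more interested in agents for everyday-life reasoning and multimodal generation. Such agents must handle not only traditional math problems but also more challenging real-world tasks.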

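Focus: 1. Next-generation AI architectures and generalizable long-thinking AI agents; 2. Multimodal generative AI: 2/3/4D visual modeling, language modeling, sequence modeling; 3. Natural sciences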

---------------------

Guangrun Wang is a PhD advisor and Associate Professor (new talent series) at the School of Computer Science, Sun Yat-sen University. He is a laureate of the National Talent Young Scientist Funding and of the Huawei Strategic Research Institute Talent Funding (University Faculty).

Before being recruited, he was a Research Fellow at the University of Oxford in the UK, working with Professor Philip H. S. Torr, a Fellow of the Royal Society and of the Royal Academy of Engineering. During his time in the UK, he also served as project leader for the research team at Aistetic, a UK-based company. He collaborates closely with leading global tech companies. He holds dual bachelor's degrees in engineering and business from the School of Information Science and Technology and the Sun Yat-sen Business School, and earned a PhD from the School of Computer Science at Sun Yat-sen University under the supervision of Professor Liang Lin, during which he visited the Department of Information Engineering at The Chinese University of Hong Kong. He has also conducted research at Dark Matter Intelligence Technology Co., Ltd. He has published nearly 40 papers in top conferences and journals, including 11 oral presentation papers, and has 2 ESI highly cited papers and 11 patents. His honors include selection for Huawei's "Top Talent" program; the Wu Wenjun Artificial Intelligence Outstanding Doctoral Thesis Award (only 9 recipients in the country that year); the Best Paper Award of Pattern Recognition (only 1 paper globally that year); the IJCAI "Outstanding Senior Program Committee Member" distinction (only 18 people worldwide that year); a place on the Global AI Chinese New Star list (only 25 people in the global machine learning field that year); multiple highest honors in international competitions; and six recognitions as an outstanding reviewer for CCF A-level conferences (IJCAI-21, NeurIPS-22, ICLR-22, ICLR-21, ICCV-21, NeurIPS-21). One of his papers has been used as supporting evidence in a paper by Yann LeCun. He has participated in organizing workshops at ICML 2024 and CVPR 2023, and has co-supervised students at both the University of Oxford and Sun Yat-sen University. His research outcomes have been deployed at multiple companies.

Currently, he focuses on developing AI models that can learn representations and develop intelligence, drawing inspiration from the complexities of the real world and, in turn, applying AI to solve challenges in various real-world domains (i.e., World for AI and AI for World). The long-term goal of his group's work is to create generalizable AI agents that can function autonomously across a wide range of tasks. This vision extends to blurring the line between human and AI capabilities, so that both AI and humans can evolve into similar general intelligence agents (a human is also a general intelligence agent), enhancing the synergy between artificial intelligence and human intelligence.
 

For example, some questions linger in our minds:

1. Demis Hassabis, a 2024 Nobel laureate, said that the powerful combination of Google's AlphaGo and Gemini aims to create a second Transformer. What next-generation architecture will replace the Transformer?

2. Ilya Sutskever believes that GPT is one of the paths to achieving general artificial intelligence, but GPT is inconsistent with human cognition, and the GPT architecture is a source of hallucinations. What is the next-generation generative paradigm that will replace GPT?

3. Humans will eventually exhaust the textual corpora available for training large models. Are large models close to their ceiling? No: there is still a lot of data in other modalities. However, multimodal large models lack a native unified framework. What is the next-generation framework for multimodal generation and understanding?

4. Imagine placing an AI agent in a locked room. Can it complete the escape-room task on its own? In the era of general artificial intelligence, generative AI emphasizes the need for general intelligent agents with multimodal content generation capabilities. With the development of large language models like ChatGPT and OpenAI o1, text generation technology has made significant progress, especially in reasoning, allowing these models to solve various mathematical problems and perform complex reasoning. However, as technology continues to evolve, our focus is shifting toward intelligent agents for daily-life reasoning and multimodal generation. This requires agents not only to handle traditional math problems but also to tackle more challenging real-world tasks.

Our main research areas are:

1. New-generation AI architecture, generalizable long-thinking AI agents.

2. Multimodal generative AI: 2/3/4D visual modeling; language modeling; sequence modeling.

3. Natural sciences.