Large language model-enhanced reinforcement learning for generic bus holding control strategies
Transportation Research Part E: Logistics and Transportation Review (IF 8.3), Pub Date: 2025-05-26, DOI: 10.1016/j.tre.2025.104142
Jiajie Yu, Yuhong Wang, Wei Ma

Bus holding control is a widely adopted strategy for maintaining the stability and improving the operational efficiency of bus systems. Traditional model-based methods often suffer from low accuracy in bus state prediction and passenger demand estimation. In contrast, Reinforcement Learning (RL), as a data-driven approach, has shown great potential for formulating bus holding strategies. RL determines the optimal control strategy by maximizing a cumulative reward that reflects the overall control goals. However, translating the sparse and delayed control goals of real-world tasks into dense, real-time rewards for RL is challenging and normally requires extensive manual trial-and-error. To address this, this study introduces an automatic reward generation paradigm that leverages the in-context learning and reasoning capabilities of Large Language Models (LLMs). The new paradigm, termed LLM-enhanced RL, comprises several LLM-based modules: a reward initializer, a reward modifier, a performance analyzer, and a reward refiner. These modules cooperate to initialize and iteratively improve the reward function according to feedback from training and test results for the specified RL-based task. Ineffective reward functions generated by the LLM are filtered out to ensure that the RL agents’ performance evolves stably over iterations. To evaluate the feasibility of the proposed LLM-enhanced RL paradigm, it is applied to extensive bus holding control scenarios that vary in the number of bus lines, stops, and passenger demand. The results demonstrate the superiority, generalization capability, and robustness of the proposed paradigm compared with vanilla RL strategies, an LLM-based controller, physics-based feedback controllers, and optimization-based controllers. This study sheds light on the great potential of utilizing LLMs in various smart mobility applications.
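For a concrete picture of the reward-generation loop the abstract describes, the sketch below shows, in minimal form, how LLM-based modules could wrap an RL trainer: an initializer drafts a reward function, each candidate is trained and scored, ineffective candidates are filtered out, and an analyzer/refiner step asks the LLM to revise the reward for the next iteration. The module names follow the paper's terminology, but the helpers `llm_query`, `compile_reward`, and `train_and_evaluate`, and all signatures, are illustrative assumptions rather than the authors' implementation.

```python
# Minimal, hypothetical sketch of an LLM-enhanced RL reward-generation loop.
# Stubs (llm_query, train_and_evaluate) are placeholders, not real APIs.
from typing import Callable, Optional


def llm_query(prompt: str) -> str:
    """Placeholder for a call to an LLM; returns generated text (e.g., reward code)."""
    raise NotImplementedError


def compile_reward(code: str) -> Callable:
    """Turn LLM-generated source code into a callable reward function."""
    namespace: dict = {}
    exec(code, namespace)  # assumes the LLM emits a `reward(state, action)` function
    return namespace["reward"]


def train_and_evaluate(reward_fn: Callable) -> float:
    """Train a bus-holding RL agent with the candidate reward and return a test score
    (e.g., negative headway variance). Stubbed here."""
    raise NotImplementedError


def llm_enhanced_rl(task_description: str, n_iterations: int = 5) -> Optional[Callable]:
    # Reward initializer: draft a first dense reward from the task description.
    reward_code = llm_query(f"Write a dense reward function for: {task_description}")
    best_fn, best_score = None, float("-inf")

    for _ in range(n_iterations):
        candidate_fn = compile_reward(reward_code)
        score = train_and_evaluate(candidate_fn)

        # Filter: keep only candidates that improve on the best score so far,
        # so agent performance evolves stably across iterations.
        if score > best_score:
            best_fn, best_score = candidate_fn, score

        # Performance analyzer: feed training/test results back to the LLM.
        analysis = llm_query(
            f"Test score {score:.3f}. Diagnose weaknesses of this reward:\n{reward_code}"
        )
        # Reward modifier / refiner: ask the LLM for a revised reward function.
        reward_code = llm_query(
            f"Revise the reward function given this analysis:\n{analysis}\n{reward_code}"
        )

    return best_fn
```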

Updated: 2025-05-26