Aim: To investigate a model-free, multi-step average-reward reinforcement learning algorithm.
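Aim: To discuss reinforcement learning algorithms for controlled Markov chains under the average-reward criterion. Without prior knowledge of the state transition matrix or the reward function, these algorithms search by trial and error for the optimal control policy that maximizes the long-run expected average reward per stage.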
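As a sketch of what model-free, trial-and-error control under the average-reward criterion can look like, the snippet below runs a differential Q-learning style update. That algorithm is our own choice for illustration, because the abstracts do not name theirs. The 2-state MDP, the step sizes `alpha`/`eta`/`eps`, and the function names are all hypothetical. The learner only sees sampled transitions. `policy_gain` uses the model, but only to check the learned policy afterwards.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustration only, not from the source).
# Action 0 = "stay", action 1 = "switch"; either succeeds with probability 0.9.
P = np.array([[[0.9, 0.1], [0.1, 0.9]],   # P[0, a, s'] for actions 0 and 1
              [[0.1, 0.9], [0.9, 0.1]]])  # P[1, a, s'] for actions 0 and 1
R = np.array([[0.0, 0.0],                 # reward 0 in state 0, any action
              [1.0, 1.0]])                # reward 1 in state 1, any action


def differential_q_learning(P, R, steps=50_000, alpha=0.05, eta=0.2, eps=0.1, seed=0):
    """Model-free average-reward control: the learner only uses sampled transitions."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))   # differential action values
    rbar = 0.0                            # estimate of the optimal average reward
    s = 0
    for _ in range(steps):
        # epsilon-greedy behaviour policy (trial-and-error exploration)
        if rng.random() < eps:
            a = int(rng.integers(n_actions))
        else:
            a = int(np.argmax(Q[s]))
        s_next = int(rng.choice(n_states, p=P[s, a]))  # environment step
        delta = R[s, a] - rbar + Q[s_next].max() - Q[s, a]
        Q[s, a] += alpha * delta
        rbar += eta * alpha * delta
        s = s_next
    return Q, rbar


def policy_gain(P, R, policy):
    """Exact long-run average reward of a deterministic policy (uses the model)."""
    n = len(policy)
    P_pi = np.array([P[s, policy[s]] for s in range(n)])
    r_pi = np.array([R[s, policy[s]] for s in range(n)])
    # stationary distribution d: d P_pi = d and sum(d) = 1
    A = np.vstack([P_pi.T - np.eye(n), np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    d = np.linalg.lstsq(A, b, rcond=None)[0]
    return float(d @ r_pi)


Q, rbar = differential_q_learning(P, R)
greedy = [int(a) for a in Q.argmax(axis=1)]
print("greedy policy:", greedy, "rbar:", round(float(rbar), 3),
      "exact gain:", policy_gain(P, R, greedy))
```

In this toy MDP the optimal policy switches out of state 0 and stays in state 1, which is `[1, 0]`, with an exact gain of 0.9. Because the step sizes are constant, `rbar` only fluctuates around that value rather than settling on it exactly.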
On the basis of an analysis of existing algorithms, and using linear parameter estimation theory, a new class of average-reward multi-step temporal-difference learning algorithms based on linear function approximation and recursive least squares is proposed.
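Below is a minimal sketch of an average-reward RLS-TD(λ) update. It assumes the usual RLS-TD recursion with the discount factor set to 1 and rewards centred by a sample-average estimate ρ. It illustrates the idea under those assumptions and is not the proposed algorithm itself. The chain, the features, λ, the initial scale `p0_scale`, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical 3-state aperiodic, irreducible chain under a fixed policy.
P_chain = np.array([[0.5, 0.5, 0.0],
                    [0.2, 0.3, 0.5],
                    [0.4, 0.0, 0.6]])
r = np.array([1.0, 0.0, 2.0])            # reward collected in each state


def features(s):
    # One-hot code for states 1 and 2; state 0 maps to the zero vector.
    # Keeping the all-ones vector out of the span pins h(0) = 0.
    phi = np.zeros(2)
    if s > 0:
        phi[s - 1] = 1.0
    return phi


def rls_td_average(P, r, steps=40_000, lam=0.5, p0_scale=10.0, seed=1):
    """Average-reward RLS-TD(lambda) sketch: recursive LSTD with gamma = 1."""
    rng = np.random.default_rng(seed)
    k = features(0).size
    theta = np.zeros(k)
    Pinv = p0_scale * np.eye(k)          # running inverse of the LSTD matrix
    z = np.zeros(k)                      # eligibility trace
    rho = 0.0                            # sample-average reward estimate
    s = 0
    for t in range(steps):
        s_next = int(rng.choice(len(r), p=P[s]))
        rho += (r[s] - rho) / (t + 1)
        phi, phi_next = features(s), features(s_next)
        z = lam * z + phi
        v = phi - phi_next               # no discounting under the average criterion
        gain = Pinv @ z / (1.0 + v @ Pinv @ z)
        theta += gain * (r[s] - rho - v @ theta)
        Pinv -= np.outer(gain, v @ Pinv) # Sherman-Morrison rank-one update
        s = s_next
    return theta, rho


def exact_solution(P, r):
    n = len(r)
    A = np.vstack([P.T - np.eye(n), np.ones(n)])        # d P = d, sum(d) = 1
    d = np.linalg.lstsq(A, np.append(np.zeros(n), 1.0), rcond=None)[0]
    rho = float(d @ r)
    M = np.vstack([np.eye(n) - P, np.eye(n)[0]])        # (I - P) h = r - rho, h(0) = 0
    h = np.linalg.lstsq(M, np.append(r - rho, 0.0), rcond=None)[0]
    return rho, h


theta, rho_hat = rls_td_average(P_chain, r)
rho_true, h_true = exact_solution(P_chain, r)
print("rho:", rho_hat, "vs", rho_true)
print("h(1), h(2):", theta, "vs", h_true[1:])
```

These features exclude the constant vector and can represent every differential value function with h(0) = 0, so θ should approach `h_true[1:]`. The Sherman-Morrison step keeps each update at O(k²) for k features instead of re-solving the LSTD system.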
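Temporal-difference learning with linear function approximation for the average-reward criterion on aperiodic, irreducible Markov chains is studied.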
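To make the linear-approximation setting concrete, here is a sketch of average-reward TD(λ) on a small aperiodic, irreducible chain. It assumes a constant step size with averaging of the tail iterates and a sample-mean estimate of the average reward; both simplifications are ours. `td_fixed_point` solves the expected-update equation Φᵀ D (I − λP)⁻¹ [(r − ρ1) − (I − P)Φθ] = 0, near which the iterates should settle. The chain, features, and function names are hypothetical.

```python
import numpy as np

# Hypothetical 4-state lazy ring: stay or advance with prob 0.5 (aperiodic, irreducible).
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 0.5, 0.5],
              [0.5, 0.0, 0.0, 0.5]])
r = np.array([0.0, 1.0, 2.0, 3.0])
# Two overlapping indicator features; the all-ones vector is not in their span.
Phi = np.array([[0.0, 0.0],
                [1.0, 0.0],
                [1.0, 1.0],
                [0.0, 1.0]])


def td_lambda_average(P, r, Phi, lam=0.5, alpha=0.02, steps=100_000, seed=2):
    """Average-reward TD(lambda) with linear features and tail iterate averaging."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(Phi.shape[1])
    theta_avg = np.zeros_like(theta)
    z = np.zeros_like(theta)
    rho = 0.0
    s = 0
    burn_in = steps // 2
    for t in range(steps):
        s_next = int(rng.choice(len(r), p=P[s]))
        rho += (r[s] - rho) / (t + 1)            # sample-average reward
        delta = r[s] - rho + Phi[s_next] @ theta - Phi[s] @ theta
        z = lam * z + Phi[s]                     # undiscounted eligibility trace
        theta = theta + alpha * delta * z
        if t >= burn_in:                         # running mean of the tail iterates
            theta_avg += (theta - theta_avg) / (t - burn_in + 1)
        s = s_next
    return theta_avg, rho


def td_fixed_point(P, r, Phi, lam=0.5):
    """Solve Phi^T D (I - lam P)^{-1} [(r - rho) - (I - P) Phi theta] = 0."""
    n = len(r)
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    d = np.linalg.lstsq(A, np.append(np.zeros(n), 1.0), rcond=None)[0]
    rho = float(d @ r)
    W = Phi.T @ np.diag(d) @ np.linalg.inv(np.eye(n) - lam * P)
    theta = np.linalg.solve(W @ (np.eye(n) - P) @ Phi, W @ (r - rho))
    return theta, rho


theta_hat, rho_hat = td_lambda_average(P, r, Phi)
theta_star, rho_star = td_fixed_point(P, r, Phi)
print("theta:", theta_hat, "fixed point:", theta_star, "rho:", rho_hat, rho_star)
```

The two features cannot represent the true differential values [0, 3, 4, 3], so θ converges to an approximation rather than the exact solution. For λ = 0, for example, the fixed point is θ = [2, 2], which gives the values [0, 2, 4, 2].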
An average-reward reinforcement learning algorithm for controlled Markov chains is presented.
The aim is to find the optimal control policy that maximizes the long-run expected average reward per stage.
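For reference, this objective is usually formalized as shown below. The notation is ours: gain ρ^π, differential value h*, transition kernel P, reward r. The second line assumes a finite unichain MDP under stationary policies, so that the gain does not depend on the initial state.

```latex
\begin{aligned}
\rho^{\pi} &= \lim_{N\to\infty} \frac{1}{N}\,
  \mathbb{E}^{\pi}\Big[\sum_{t=0}^{N-1} r(s_t, a_t)\Big],
  \qquad \pi^{*} \in \arg\max_{\pi} \rho^{\pi}, \\
\rho^{*} + h^{*}(s) &= \max_{a}\Big[\, r(s,a) + \sum_{s'} P(s' \mid s,a)\, h^{*}(s') \Big].
\end{aligned}
```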