Building on the introduction of joint actions into traditional reinforcement learning, this paper presents a new multi-agent reinforcement learning algorithm based on behavior prediction and discusses several feasible methods for predicting other agents' behaviors.
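The idea of learning over joint actions while predicting the other agent can be sketched roughly as follows. This is an illustration only, not the paper's algorithm: the class name `JointActionLearner`, its parameters, and the choice of a simple frequency count as the predictor are all assumptions, since the excerpt does not say which prediction methods are used.

```python
from collections import defaultdict


class JointActionLearner:
    """Q-learning over joint actions (own action a, other agent's action b).

    Hypothetical sketch: the other agent is predicted from observed
    action frequencies, which is an assumption, not the source's method.
    """

    def __init__(self, actions, other_actions, alpha=0.1, gamma=0.9):
        self.actions = list(actions)
        self.other_actions = list(other_actions)
        self.alpha = alpha                # learning rate
        self.gamma = gamma                # discount factor
        self.q = defaultdict(float)       # (state, a, b) -> value
        self.counts = defaultdict(int)    # (state, b) -> times observed

    def predict(self, state):
        """Empirical distribution over the other agent's actions in `state`."""
        total = sum(self.counts[(state, b)] for b in self.other_actions)
        if total == 0:
            # Nothing observed yet: fall back to a uniform prediction.
            n = len(self.other_actions)
            return {b: 1.0 / n for b in self.other_actions}
        return {b: self.counts[(state, b)] / total for b in self.other_actions}

    def expected_value(self, state, a):
        """Value of own action a, averaged over the predicted other action."""
        probs = self.predict(state)
        return sum(p * self.q[(state, a, b)] for b, p in probs.items())

    def best_action(self, state):
        return max(self.actions, key=lambda a: self.expected_value(state, a))

    def update(self, state, a, b, reward, next_state):
        """Record the observed action b, then apply a Q-learning style update."""
        self.counts[(state, b)] += 1
        best_next = max(self.expected_value(next_state, a2) for a2 in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, a, b)] += self.alpha * (target - self.q[(state, a, b)])
```

Here the agent picks the action with the highest expected value under its current prediction; a different predictor could be swapped in by replacing `predict`.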
Because the ant colony algorithm is related to the reinforcement learning algorithms of artificial intelligence, and reinforcement learning has in recent years also been applied to scheduling problems, the main reinforcement learning algorithms are also introduced. 4. Some new concepts and scheduling algorithms for batch chemical processes were proposed in our studies.
Reinforcement learning algorithms that use the cerebellar model articulation controller (CMAC) are studied for estimating the optimal value function of Markov decision processes (MDPs) with continuous states and discrete actions. The way Sarsa learning algorithms based on CMAC networks and direct gradient rules discretize the MDP state space is analyzed. Two new coding methods for CMAC neural networks, non-adjacent overlapping coding and variable-scale coding, are proposed to improve the learning efficiency (convergence speed and generalization performance) of CMAC-based direct gradient learning algorithms.
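For orientation, a minimal sketch of a standard CMAC (tile coding) approximator with a Sarsa update under the direct gradient rule might look like the following. It assumes a one-dimensional state and uniformly shifted tilings; the class name `CMAC` and all parameters are hypothetical, and the two proposed coding methods (non-adjacent overlapping and variable-scale coding) are not implemented here.

```python
import math


class CMAC:
    """1-D tile coding: several overlapping tilings, each shifted by a
    fraction of a tile width. Q(x, a) is the sum of the active weights."""

    def __init__(self, low, high, n_tilings=4, tiles_per_tiling=8, n_actions=2):
        self.low = low
        self.n_tilings = n_tilings
        self.tile_width = (high - low) / tiles_per_tiling
        self.tiles = tiles_per_tiling + 1  # one extra tile covers the shift
        # Weights indexed as w[action][tiling][tile].
        self.w = [[[0.0] * self.tiles for _ in range(n_tilings)]
                  for _ in range(n_actions)]

    def active_tiles(self, x):
        """Index of the single active tile in each tiling for state x."""
        idx = []
        for t in range(self.n_tilings):
            offset = t * self.tile_width / self.n_tilings
            i = int(math.floor((x - self.low + offset) / self.tile_width))
            idx.append(min(max(i, 0), self.tiles - 1))  # clamp to valid range
        return idx

    def q(self, x, a):
        return sum(self.w[a][t][i] for t, i in enumerate(self.active_tiles(x)))

    def sarsa_update(self, x, a, r, x2, a2, alpha=0.1, gamma=0.95, terminal=False):
        """Direct gradient Sarsa step: the target is treated as a constant."""
        target = r if terminal else r + gamma * self.q(x2, a2)
        delta = target - self.q(x, a)
        step = alpha / self.n_tilings  # split the step across active tiles
        for t, i in enumerate(self.active_tiles(x)):
            self.w[a][t][i] += step * delta
        return delta
```

Because nearby states share tiles, an update at one state also changes the estimate at its neighbors, which is the state discretization and generalization behavior that the coding scheme controls.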
The control method requires neither knowledge of the system's dynamics or an exact mathematical model nor the training data that supervised learning demands. Trained by the proposed reinforcement learning algorithm, with its weights optimized by a modified genetic algorithm, a neural network controller generates a time series of small perturbation signals that convert the chaotic oscillations of a chaotic system into the desired regular ones. Computer simulations on the Hénon map and the logistic chaotic system demonstrate the capability of the strategy: it stabilizes low-period orbits such as period-1 and period-2, and, combined with the periodic control methodology, it also directs higher-period orbits such as period-4 to the expected periodic behavior.
Based on the organization rules of Internet data, the distribution laws of hyperlinks, and the naming rules of URLs, an algorithm for rebuilding the tree-vine symbiotic data model (TVM) is established, and the satisfactory experimental results obtained with it confirm the model's validity and soundness. On this basis, the application of the TVM to browsing navigation, web page classification, and reinforcement learning algorithms is explored in a preliminary way.