• More article details
    • Presentation date: 1392/07/24
    • Publication date on TPBin: 1392/07/24
Temporal-difference learning is one of the most successful and broadly applied solutions to the reinforcement learning problem; it has been used to achieve master-level play in chess, checkers and backgammon. The key idea is to update a value function from episodes of real experience, by bootstrapping from future value estimates, and using value function approximation to generalise between related states. Monte-Carlo tree search is a recent algorithm for high-performance search, which has been used to achieve master-level play in Go. The key idea is to use the mean outcome of simulated episodes of experience to evaluate each state in a search tree. We introduce a new approach to high-performance search in Markov decision processes and two-player games. Our method, temporal-difference search, combines temporal-difference learning with simulation-based search. Like Monte-Carlo tree search, the value function is updated from simulated experience; but like temporal-difference learning, it uses value function approximation and bootstrapping to efficiently generalise between related states. We apply temporal-difference search to the game of 9×9 Go, using a million binary features matching simple patterns of stones. Without any explicit search tree, our approach outperformed an unenhanced Monte-Carlo tree search with the same number of simulations. When combined with a simple alpha-beta search, our program also outperformed all traditional (pre-Monte-Carlo) search and machine learning programs on the 9×9 Computer Go Server.
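The core update the abstract describes, bootstrapping a linearly approximated value function from the next state's estimate, can be sketched as follows. This is a minimal illustration on a toy random-walk environment rather than Go; the environment, the one-hot features standing in for the paper's binary pattern features, and all parameter values are illustrative assumptions, not the authors' implementation.

```python
import random

N_STATES = 5          # states 1..5 of a simple random walk
ALPHA = 0.1           # step size (illustrative choice)
GAMMA = 1.0           # undiscounted episodes

def features(state):
    """One-hot binary features; a stand-in for pattern features."""
    phi = [0.0] * N_STATES
    phi[state - 1] = 1.0
    return phi

def value(weights, state):
    """Linear value function approximation: V(s) = w . phi(s)."""
    return sum(w * f for w, f in zip(weights, features(state)))

def td_episode(weights, rng):
    """One episode of TD(0): move V(s) toward r + gamma * V(s')."""
    state = (N_STATES + 1) // 2          # start in the middle
    while True:
        next_state = state + rng.choice((-1, 1))
        if next_state == 0:              # left terminal, reward 0
            reward, next_value = 0.0, 0.0
        elif next_state == N_STATES + 1: # right terminal, reward 1
            reward, next_value = 1.0, 0.0
        else:                            # bootstrap from the estimate
            reward, next_value = 0.0, value(weights, next_state)
        delta = reward + GAMMA * next_value - value(weights, state)
        phi = features(state)
        for i in range(N_STATES):        # gradient step on the weights
            weights[i] += ALPHA * delta * phi[i]
        if next_state in (0, N_STATES + 1):
            return
        state = next_state

rng = random.Random(0)
weights = [0.0] * N_STATES
for _ in range(2000):
    td_episode(weights, rng)
# True values for this walk are 1/6, 2/6, ..., 5/6.
```

Because the same weight vector generalises across all states sharing a feature, an update to one state's estimate immediately improves related states, which is the property temporal-difference search exploits inside simulation.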
