Uso de heurísticas para a aceleração do aprendizado por reforço.

Bianchi, Reinaldo Augusto da Costa

doi:10.11606/T.3.2004.tde-28062005-191041

DOI

https://doi.org/10.11606/T.3.2004.tde-28062005-191041

Document

Doctoral Thesis

Author

Bianchi, Reinaldo Augusto da Costa

Full name

Reinaldo Augusto da Costa Bianchi

Institute/School/College

Escola Politécnica

Knowledge Area

Digital Systems

Date of Defense

2004-04-05

Published

São Paulo, 2004

Supervisor

Reali Costa, Anna Helena

Committee

Sichman, Jaime Simao (President)
Araujo, Aluizio Fausto Ribeiro
Camargo Junior, Joao Batista
Costa, Oswaldo Luiz do Valle
Romero, Roseli Aparecida Francelin

Title in Portuguese

Uso de heurísticas para a aceleração do aprendizado por reforço.

Keywords in Portuguese

aprendizado computacional
aprendizado por reforço
inteligência artificial
robôs
robótica móvel inteligente

Abstract in Portuguese

Este trabalho propõe uma nova classe de algoritmos que permite o uso de heurísticas para aceleração do aprendizado por reforço. Esta classe de algoritmos, denominada "Aprendizado Acelerado por Heurísticas" ("Heuristically Accelerated Learning" - HAL), é formalizada por Processos Markovianos de Decisão, introduzindo uma função heurística H para influenciar o agente na escolha de suas ações, durante o aprendizado. A heurística é usada somente para a escolha da ação a ser tomada, não modificando o funcionamento do algoritmo de aprendizado por reforço e preservando muitas de suas propriedades. As heurísticas utilizadas nos HALs podem ser definidas a partir de conhecimento prévio sobre o domínio ou extraídas, em tempo de execução, de indícios que existem no próprio processo de aprendizagem. No primeiro caso, a heurística é definida a partir de casos previamente aprendidos ou definida ad hoc. No segundo caso são utilizados métodos automáticos de extração da função heurística H chamados "Heurística a partir de X" ("Heuristic from X"). Para validar este trabalho são propostos diversos algoritmos, entre os quais, o "Q-Learning Acelerado por Heurísticas" (Heuristically Accelerated Q-Learning - HAQL), que implementa um HAL estendendo o conhecido algoritmo Q-Learning, e métodos de extração da função heurística que podem ser usados por ele. São apresentados experimentos utilizando os algoritmos acelerados por heurísticas para solucionar problemas em diversos domínios - sendo o mais importante o de navegação robótica - e as heurísticas (pré-definidas ou extraídas) que foram usadas. Os resultados experimentais permitem concluir que mesmo uma heurística muito simples resulta em um aumento significativo do desempenho do algoritmo de aprendizado de reforço utilizado.

Title in English

Heuristically accelerated reinforcement learning.

Keywords in English

artificial intelligence
intelligent mobile robots
machine learning
reinforcement learning
robots

Abstract in English

This work proposes a new class of algorithms that uses heuristics to speed up Reinforcement Learning (RL). This class, called "Heuristically Accelerated Learning" (HAL), is formalized using Markov Decision Processes and introduces a heuristic function H that influences the agent's choice of actions during learning. Because the heuristic is used only when choosing the action to be taken, the operation of the underlying RL algorithm is not modified and many of its properties are preserved. The heuristics used in HALs can be defined from prior knowledge about the domain or extracted, at run time, from clues that exist in the learning process itself. In the first case, the heuristic is defined from previously learned cases or is defined ad hoc. In the second case, automatic methods for extracting the heuristic function H, called "Heuristic from X", are used. To validate this work, several algorithms are proposed, among them "Heuristically Accelerated Q-Learning" (HAQL), which implements a HAL by extending the well-known Q-Learning algorithm, together with methods for extracting the heuristic function that it can use. Experiments are presented in which the heuristically accelerated algorithms solve problems in a number of domains, the most important being robotic navigation, along with the heuristics (predefined or extracted) that were used. The experimental results show that even a very simple heuristic yields a significant performance increase in the reinforcement learning algorithm used.
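
The idea that the heuristic affects only action selection, while the learning update stays untouched, can be illustrated with a minimal Python sketch. It is not taken from the thesis: the corridor environment, the function name `haql_corridor`, the weight `xi`, the ad hoc heuristic that always prefers moving right, and the specific selection rule (greedy choice over Q(s, a) + xi * H(s, a), with epsilon-greedy exploration) are all assumptions made for illustration.

```python
import random


def haql_corridor(n_states=10, episodes=50, alpha=0.5, gamma=0.9,
                  epsilon=0.1, xi=1.0, use_heuristic=True, seed=0):
    """Tabular Q-learning on a 1-D corridor (hypothetical toy domain).

    The heuristic table H biases only the choice of action; the
    Q-learning update itself never reads H.
    """
    rng = random.Random(seed)
    moves = (-1, +1)              # action 0 = left, action 1 = right
    goal = n_states - 1           # rightmost cell is terminal
    Q = [[0.0, 0.0] for _ in range(n_states)]
    # Assumed ad hoc heuristic: a bonus for "right" in every state.
    bonus = 1.0 if use_heuristic else 0.0
    H = [[0.0, bonus] for _ in range(n_states)]

    total_steps = 0
    for _ in range(episodes):
        s = 0
        while s != goal:
            if rng.random() < epsilon:
                a = rng.randrange(2)                  # explore
            else:
                # The heuristic enters here, and only here.
                scores = [Q[s][i] + xi * H[s][i] for i in range(2)]
                best = max(scores)
                a = rng.choice([i for i in range(2) if scores[i] == best])
            s_next = min(max(s + moves[a], 0), goal)  # walls at both ends
            r = 1.0 if s_next == goal else 0.0
            # Standard Q-learning update, unchanged by the heuristic.
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next
            total_steps += 1
    return Q, total_steps
```

Calling `haql_corridor(use_heuristic=True)` and `haql_corridor(use_heuristic=False)` and comparing the returned step counts gives a rough feel for the effect of the bias in this toy setting; it says nothing about the domains or results reported in the thesis.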

WARNING - Viewing this document is conditioned on your acceptance of the following terms of use:
This document is for private use only, in research and teaching activities. Reproduction for commercial use is forbidden. These rights cover all data about this document as well as its contents. Any use or copy of this document, in whole or in part, must include the author's name.

tese-bianchi.pdf (2.08 Mbytes)

Publishing Date

2005-08-05

Derived works

WARNING: The material described below relates to works resulting from this thesis or dissertation. The contents of these works are the author's responsibility.

BIANCHI, REINALDO A. C., et al. Heuristically-Accelerated Multiagent Reinforcement Learning [doi:10.1109/tcyb.2013.2253094]. IEEE Transactions on Cybernetics [online], 2013, vol. PP, p. 1-1.
BIANCHI, Reinaldo Augusto da Costa, RIBEIRO, Carlos Henrique Costa, and COSTA, Anna Helena Reali. Accelerating Autonomous Learning by Using Heuristic Selection of Actions [doi:10.1007/s10732-007-9031-5]. Journal of Heuristics [online], 2008, vol. 14, p. 135-168.
BIANCHI, Reinaldo Augusto da Costa, et al. Heuristically Accelerated Q-Learning: a New Approach to Speed Up Reinforcement Learning [doi:10.1007/b100195]. Lecture Notes in Computer Science [online], 2004, vol. 3171, p. 245-254.
ODAKURA, Valguima Victoria Viana Aguiar, BIANCHI, Reinaldo Augusto da Costa, and COSTA, Anna Helena Reali. General detection model in cooperative multirobot localization [doi:10.1590/S0104-65002009000300004]. Journal of the Brazilian Computer Society [online], 2009, vol. 15, p. 33-46.
BIANCHI, Reinaldo Augusto da Costa, and COSTA, Anna Helena Reali. Comparing distributed reinforcement learning approaches to learn agent coordination. In 8th Ibero-American Conference on AI (IBERAMIA 2002), Seville, Spain, 2002. Advances in Artificial Intelligence - IBERAMIA 2002 8th Ibero-American Conference on AI. Berlin : Springer, 2002.
BIANCHI, Reinaldo Augusto da Costa, COSTA, Anna Helena Reali, and RIBEIRO, Carlos Henrique Costa. Heuristic Selection of Actions in Multiagent Reinforcement Learning. In International Joint Conference on Artificial Intelligence, Hyderabad, India, 2007. International Joint Conferences on Artificial Intelligence. Menlo Park, California : AAAI Press, 2007. Available from: http://www.ijcai.org/papers07/papers/ijcai07-110.pdf.
BIANCHI, Reinaldo Augusto da Costa, and COSTA, Anna Helena Reali. Comparing distributed reinforcement learning approaches to learn agent coordination. In I Workshop do Projeto AACROM, São Paulo, 2002. Anais do I Workshop do Projeto AACROM, 2002.
BIANCHI, Reinaldo Augusto da Costa, and COSTA, Anna Helena Reali. Uso de heurísticas para a aceleração do aprendizado por reforço. In XVIII Concurso de Teses e Dissertações - XXV Congresso da Sociedade Brasileira de Computação, São Leopoldo, 2005. Available from: http://www.unisinos.br/congresso/sbc2005/?sessao=ctd.
BIANCHI, Reinaldo Augusto da Costa, RIBEIRO, Carlos Henrique Costa, and COSTA, Anna Helena Reali. Heuristically Accelerated Reinforcement Learning: Theoretical and Experimental Results [doi:10.3233/978-1-61499-098-7-169]. In European Conference on Artificial Intelligence (ECAI 2012), Montpellier, 2012. Frontiers in Artificial Intelligence and Applications. Amsterdam : IOS Press, 2012.
BIANCHI, Reinaldo Augusto da Costa, RIBEIRO, Carlos Henrique Costa, and COSTA, Anna Helena Reali. Uso de heurísticas baseadas em políticas para aceleração do Aprendizado por Reforço. In II Workshop do Projeto AACROM, São José dos Campos, SP, 2003. Anais do II Workshop do Projeto AACROM, 2003.
ODAKURA, Valguima Victoria Viana Aguiar, et al. The use of Negative Detection in Cooperative Localization in a Team of Four-Legged Robots. In IX SBAI - Simpósio Brasileiro de Automação Inteligente, Brasília, 2009. Anais do SBAI 2009. Sociedade Brasileira de Automática, 2009.
CELIBERTO JUNIOR, L. A., et al. Heuristic Reinforcement Learning applied to RoboCup Simulation Agents. In Gerhard Lakemeyer, et al. (eds.). RoboCup 2006: Robot Soccer World Cup X. Heidelberg : Springer, 2008. chap. 5001, p. 220-227.
COSTA, Anna Helena Reali, WALDMANN, J., and BIANCHI, Reinaldo Augusto da Costa. Visão robótica. In Luis Antonio Aguirre (ed.). Enciclopédia de Automática [online]. São Paulo : Editora Edgard Blücher, 2007. chap. 3, p. 410-427.

http://www.teses.usp.br/teses/disponiveis/3/3141/tde-28062005-191041/