{"id":5068,"date":"2022-03-10T08:33:17","date_gmt":"2022-03-10T08:33:17","guid":{"rendered":"https:\/\/www.dinu.at\/profile\/home\/?p=5068"},"modified":"2022-08-13T10:57:37","modified_gmt":"2022-08-13T10:57:37","slug":"align-rudder-learning-from-few-demonstrations-by-reward-redistribution-2","status":"publish","type":"post","link":"https:\/\/www.dinu.at\/profile\/home\/align-rudder-learning-from-few-demonstrations-by-reward-redistribution-2\/","title":{"rendered":"Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution"},"content":{"rendered":"<div id=\"themify_builder_content-5068\" data-postid=\"5068\" class=\"themify_builder_content themify_builder_content-5068 themify_builder\">\n\n    <\/div>\n\n\n\n<h2>Abstract<\/h2>\n\n\n\n<p>Reinforcement learning algorithms require many samples when solving complex hierarchical tasks with sparse and delayed rewards. For such complex tasks, the recently proposed RUDDER uses reward redistribution to leverage steps in the Q-function that are associated with accomplishing sub-tasks. However, often only few episodes with high rewards are available as demonstrations since current exploration strategies cannot discover them in reasonable time. In this work, we introduce Align-RUDDER, which utilizes a profile model for reward redistribution that is obtained from multiple sequence alignment of demonstrations. Consequently, Align-RUDDER employs reward redistribution effectively and, thereby, drastically improves learning on few demonstrations. Align-RUDDER outperforms competitors on complex artificial tasks with delayed rewards and few demonstrations. On the Minecraft ObtainDiamond task, Align-RUDDER is able to mine a diamond, though not frequently. Code is available at\u00a0<a href=\"https:\/\/github.com\/ml-jku\/align-rudder\">this https URL<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Abstract Reinforcement learning algorithms require many samples when solving complex hierarchical tasks with sparse and delayed rewards. For such complex tasks, the recently proposed RUDDER uses reward redistribution to leverage steps in the Q-function that are associated with accomplishing sub-tasks. However, often only few episodes with high rewards are available as demonstrations since current exploration [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"spay_email":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[1],"tags":[],"jetpack_featured_media_url":"","jetpack_publicize_connections":[],"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p7SrVj-1jK","jetpack-related-posts":[{"id":4697,"url":"https:\/\/www.dinu.at\/profile\/home\/align-rudder-learning-from-few-demonstrations-by-reward-redistribution\/","url_meta":{"origin":5068,"position":0},"title":"Align-RUDDER: Learning From Few Demonstrations by  Reward Redistribution","date":"30. September 2020","format":false,"excerpt":"Reinforcement Learning algorithms require a large number of samples to solve complex tasks with sparse and delayed rewards. Complex tasks can often be hierarchically decomposed into sub-tasks. A step in the Q-function can be associated with solving a sub-task, where the expectation of the return increases. 