{"id":5072,"date":"2022-08-01T17:35:30","date_gmt":"2022-08-01T17:35:30","guid":{"rendered":"https:\/\/www.dinu.at\/profile\/home\/?p=5072"},"modified":"2022-08-13T10:59:32","modified_gmt":"2022-08-13T10:59:32","slug":"reactive-exploration-to-cope-with-non-stationarity-in-lifelong-reinforcement-learning","status":"publish","type":"post","link":"https:\/\/www.dinu.at\/profile\/home\/reactive-exploration-to-cope-with-non-stationarity-in-lifelong-reinforcement-learning\/","title":{"rendered":"Reactive Exploration to Cope with Non-Stationarity in Lifelong Reinforcement Learning"},"content":{"rendered":"<div id=\"themify_builder_content-5072\" data-postid=\"5072\" class=\"themify_builder_content themify_builder_content-5072 themify_builder\">\n\n    <\/div>\n\n\n\n<h2>Abstract<\/h2>\n\n\n\n<p>In lifelong learning, an agent learns throughout its entire life without resets, in a constantly changing environment, as we humans do. Consequently, lifelong learning comes with a plethora of research problems such as continual domain shifts, which result in non-stationary rewards and environment dynamics. These non-stationarities are difficult to detect and cope with due to their continuous nature. Therefore, exploration strategies and learning methods are required that are capable of tracking the steady domain shifts, and adapting to them. We propose Reactive Exploration to track and react to continual domain shifts in lifelong reinforcement learning, and to update the policy correspondingly. To this end, we conduct experiments in order to investigate different exploration strategies. We empirically show that representatives of the policy-gradient family are better suited for lifelong learning, as they adapt more quickly to distribution shifts than Q-learning. Thereby, policy-gradient methods profit the most from Reactive Exploration and show good results in lifelong learning with continual domain shifts. Our code is available at:&nbsp;<a href=\"https:\/\/github.com\/ml-jku\/reactive-exploration\">this https URL<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Abstract In lifelong learning, an agent learns throughout its entire life without resets, in a constantly changing environment, as we humans do. Consequently, lifelong learning comes with a plethora of research problems such as continual domain shifts, which result in non-stationary rewards and environment dynamics. These non-stationarities are difficult to detect and cope with due [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"spay_email":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[1],"tags":[],"jetpack_featured_media_url":"","jetpack_publicize_connections":[],"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p7SrVj-1jO","jetpack-related-posts":[{"id":5076,"url":"https:\/\/www.dinu.at\/profile\/home\/a-dataset-perspective-on-offline-reinforcement-learning\/","url_meta":{"origin":5072,"position":0},"title":"A Dataset Perspective on Offline Reinforcement Learning","date":"1. August 2022","format":false,"excerpt":"Abstract The application of Reinforcement Learning (RL) in real world environments can be expensive or risky due to sub-optimal policies during training. 