Introducing biologically and economically aligned multi-objective multi-agent AI safety benchmarks
Neglected yet fundamental biological and economic principles that can serve both as an alignment end goal and as a strategy towards safer AI.
I will discuss why we should consider fundamental yet neglected principles from biology and economics when thinking about AI alignment, and how these considerations also help with AI safety. Next, I will introduce the multi-objective, multi-agent benchmark environments we have created for measuring the performance of machine learning algorithms and AI agents with respect to their capacity for biological and economic alignment. Finally, I will mention some related themes and dilemmas not yet covered by these benchmarks, and describe new benchmark environments we have planned for future implementation.
Completion date: November 2024
In collaboration with: Aintelope. Presented at the Foresight Institute's Intelligent Cooperation Group.
Demo and feedback session: AI safety benchmarking in multi-objective multi-agent gridworlds
Essential biological and economic principles illustrating themes neglected in current AI safety discussions around reinforcement learning.
Session Description: We will demonstrate environments for the following challenges:
* Homeostasis and bounded objectives - Homeostatic objectives yield a negative score if some measure is either too low or too high. Agents need to understand the dynamics of inverted-U-shaped rewards. Agents should not be greedy.
* Diminishing returns and balancing multiple objectives - The agent should balance the objectives, not just maximize one of them; trivially maximizing one objective at the expense of another is insufficient. For example, eating food does not compensate for a lack of drink, and vice versa. Balancing multiple objectives can be considered a mitigation against Goodhart's law.
* Distinction between safety and performance objectives - In contrast to the bounded safety objectives mentioned above, unbounded performance objectives can in principle reach arbitrarily high positive scores. Yet these potentially unbounded scores should not dominate the safety objectives, nor crowd out the balancing of the other performance objectives.
* Sharing resources - The agents should not be greedy. An agent earns a cooperation score each time it lets another agent access the resources. (A minimal scoring sketch for these principles follows this list.)
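To make these scoring principles concrete, the following minimal Python sketch combines bounded homeostatic objectives, diminishing returns, and an unbounded performance objective. It is an illustrative assumption rather than the benchmarks' actual scoring code; all function names, setpoints, and constants are hypothetical.

```python
import math

def homeostatic_reward(level, setpoint, tolerance):
    """Inverted-U-shaped reward: zero at the setpoint, increasingly
    negative as the measure drifts too low or too high.
    (Hypothetical functional form; the benchmarks may use another.)"""
    return -((level - setpoint) / tolerance) ** 2

def diminishing_returns(score):
    """Concave, sign-preserving transform: each additional unit of an
    unbounded performance objective is worth less than the previous one,
    so a huge score on one objective cannot dominate the aggregate."""
    return math.copysign(math.sqrt(abs(score)), score)

def aggregate(food, drink, gold):
    """Combine two bounded homeostatic safety objectives (food, drink)
    with one unbounded performance objective (gold). Because each
    homeostatic term is penalized separately, eating more food never
    compensates for a lack of drink."""
    safety = (homeostatic_reward(food, setpoint=5.0, tolerance=2.0)
              + homeostatic_reward(drink, setpoint=5.0, tolerance=2.0))
    performance = diminishing_returns(gold)
    return safety + performance

# A greedy agent that stockpiles food while ignoring drink scores worse
# than a balanced one, illustrating the anti-Goodhart property above:
print(aggregate(food=9.0, drink=1.0, gold=4.0))  # imbalanced: -6.0
print(aggregate(food=5.0, drink=5.0, gold=4.0))  # balanced:    2.0
```

The concave transform is only one possible way to keep performance from dominating safety; thresholding or lexicographically ordering safety before performance would be alternative designs.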
Additionally, we will discuss future plans for the following challenges:
* Treacherous turn - A scenario with an imbalanced power dynamic, where the tested agent is monitored for whether it will seize a better reward at the expense of another agent when the opportunity arises.
* Predator prey - A cooperative scenario where agents must defend themselves against a predator.
* Farming - Agents can specialize in either harvesting or preparing resources, which yields a dynamically learnable scenario with varied roles; a form of infinitely repeated prisoner's dilemma. (A toy payoff sketch follows this list.)
* Yielding and interruptibility - Tolerance of changes in the environment caused by other agents.
* Corrigibility - Tolerance of changes to the agent's own objectives made by certain other agents.
* Minimizing side effects - Boundaries and buffer zones between groups of agents with different objectives.
* Population dynamics of multi-objective agents - Diminishing returns in both performance and safety objectives. Turn-taking.
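As one way to see why the planned farming scenario behaves like an infinitely repeated prisoner's dilemma, here is a toy payoff sketch. The action names, payoff numbers, and the tit-for-tat strategy are illustrative assumptions, not part of the planned benchmark.

```python
# Specializing (one agent harvests while the other prepares) plays the
# role of cooperation; hoarding raw resources plays the role of defection.
PAYOFFS = {
    ("specialize", "specialize"): (3, 3),  # efficient division of labour
    ("specialize", "hoard"):      (0, 5),  # exploited cooperator
    ("hoard", "specialize"):      (5, 0),
    ("hoard", "hoard"):           (1, 1),  # both worse off in the long run
}

def tit_for_tat(other_history):
    """Cooperate first, then mirror the partner's previous move; in
    repeated play this sustains mutual specialization."""
    return "specialize" if not other_history else other_history[-1]

history_a, history_b = [], []
score_a = score_b = 0
for _ in range(10):
    a, b = tit_for_tat(history_b), tit_for_tat(history_a)
    payoff_a, payoff_b = PAYOFFS[(a, b)]
    score_a += payoff_a
    score_b += payoff_b
    history_a.append(a)
    history_b.append(b)

print(score_a, score_b)  # 30 30: sustained specialization beats mutual hoarding (10 10)
```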
Completion date: May 2024
In collaboration with: Joel Pyykkö and Aintelope. Presented at the VAISU unconference.
Legal accountability in AI-based robot-agents' user interfaces
Determining exactly why a particular decision was made by a robot-agent is often difficult. However, a whitelisting-based accountability algorithm can help greatly by readily answering the question of who enabled a particular decision in the first place. Whitelisting provides accountability and humanly manageable safety features for robot-agents with learning capability, making them both legally and technically robust and reliable. Using an ML system does not mean that it cannot be constrained by an additional layer of rule-based safety and accountability mechanisms. The behaviour of these constraints can then be explained, resulting in clearer distinctions of accountability between the manufacturers, owners, and operators.
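As an illustration of the idea, here is a minimal sketch of a whitelist-based constraint layer wrapped around a learned policy. The class, method names, and example actions are hypothetical; the point is that every permitted action remains traceable to the accountable party who enabled it.

```python
class WhitelistShield:
    """Rule-based safety layer around a learned policy: only actions
    explicitly whitelisted by an accountable party may be executed,
    and each permission records who granted it."""

    def __init__(self):
        self._permissions = {}  # action -> party that enabled it

    def allow(self, action, granted_by):
        """Whitelist an action, recording the accountable party
        (e.g. manufacturer, owner, or operator)."""
        self._permissions[action] = granted_by

    def filter(self, proposed_actions):
        """Keep only whitelisted actions from the policy's proposals."""
        return [a for a in proposed_actions if a in self._permissions]

    def accountable_party(self, action):
        """Answer the legal question: who enabled this action
        in the first place?"""
        return self._permissions.get(action, "nobody: action was never enabled")


shield = WhitelistShield()
shield.allow("move_forward", granted_by="manufacturer")
shield.allow("open_door", granted_by="owner")

# The learned policy may propose anything; only enabled actions pass through.
proposals = ["move_forward", "open_door", "disable_safety_stop"]
print(shield.filter(proposals))                         # ['move_forward', 'open_door']
print(shield.accountable_party("open_door"))            # owner
print(shield.accountable_party("disable_safety_stop"))  # never enabled
```

Because the constraint layer is deterministic and rule-based, its behaviour, unlike the learned policy's, can be fully audited after the fact.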
Completion date: December 2018
Presented at TalTech and to the AI Expert Group at the Republic of Estonia Government Office.