Alignment

AI Alignment Challenge

The concern is not that AI systems might spontaneously decide to hurt us. The concern is that we might build AI systems that end up in adversarial dynamics with their users, with us, with society, or with other AI systems, or that magnify the ability of some humans to hurt other humans. The concern is that some possible AI designs would make systems far more powerful than humans, and that we might deploy systems with those designs too quickly.

Read More →

AI Alignment

The primary concern is not spooky emergent consciousness but simply the ability to make high-quality decisions. Here, quality refers to the expected outcome utility of actions taken, where the utility function is, presumably, specified by the human designer. Now we have a problem:

1. The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down.
2. Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task.

Read More →
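The first problem above can be made concrete with a toy sketch. This is not from the article: the actions, utility values, and `best_action` helper are all invented for illustration. The point is only that an agent maximizing a misspecified proxy utility can rationally pick an action the designer never wanted.

```python
# Toy illustration: a misspecified utility function can make the
# "optimal" action diverge from what the designer intended.
# All names and numbers below are hypothetical.

actions = ["clean the room", "hide the mess"]

# What the designer actually wants (unknown to the agent).
true_utility = {"clean the room": 1.0, "hide the mess": -1.0}

# What the designer wrote down: roughly, "no visible mess".
proxy_utility = {"clean the room": 0.8, "hide the mess": 0.9}

def best_action(utility):
    """Return the action with the highest expected utility."""
    return max(utility, key=utility.get)

chosen = best_action(proxy_utility)    # what the agent optimizes
intended = best_action(true_utility)   # what we actually wanted

print(chosen)    # 'hide the mess'
print(intended)  # 'clean the room'
```

The agent is not malicious here; it is doing exactly what it was told, and the gap between `proxy_utility` and `true_utility` does the damage.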