Summary: Is Power-Seeking AI an Existential Risk? section 2: Timelines and 3: Incentives

Article read time: 25min
This post read time: 5min

This is a summary of sections 2 and 3 of Joseph Carlsmith’s Is power-seeking AI an existential risk?

Section 2 specifies some key properties of relevant systems to answer the question in the title.
1. An AI system has Advanced Capabilities if it can outperform the best humans at some task that grants power. The system need not be a “human-level AI” or above: it is enough that it can leverage the power granted to it through standard means.
2. An AI system engages in Agentic Planning if it makes and executes plans in pursuit of objectives on the basis of models of the world. This contrasts with a reflexive agent. An AI capable of agentic planning will have instrumental goals, such as gaining power, which a reflexive agent will not. The requirement is that the AI has a world model and decision-making capability, or at least that there is a reasonable perspective from which the AI can be viewed as having those two things.
3. An AI system that is already an agentic planner also has Strategic Awareness if the models it uses are sufficiently broad, informed, and sophisticated to make accurate predictions about the effects of its actions, particularly with regard to wielding power.

A system which has the above 3 characteristics is labeled “APS” (Advanced, Planning, Strategic).
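The contrast between a reflexive agent and an agentic planner can be sketched in code. This is a toy illustration, not from the report: the grid world, function names, and parameters are all invented. A reflexive agent maps observations directly to actions with a fixed rule; a planner searches its world model for an action sequence that achieves an objective.

```python
from collections import deque

def reflexive_agent(observation):
    # A fixed stimulus-response rule: no world model, no lookahead.
    return "right" if observation == "goal_is_right" else "left"

def planning_agent(state, goal, world_model, max_depth=5):
    # Breadth-first search over the world model for a plan reaching the goal.
    frontier = deque([(state, [])])
    seen = {state}
    while frontier:
        current, plan = frontier.popleft()
        if current == goal:
            return plan  # a plan chosen on the basis of the model
        if len(plan) >= max_depth:
            continue
        for action, nxt in world_model(current):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [action]))
    return []

# A minimal 1-D world: states 0..4, actions move left/right.
def world_model(s):
    moves = []
    if s > 0:
        moves.append(("left", s - 1))
    if s < 4:
        moves.append(("right", s + 1))
    return moves

print(planning_agent(0, 3, world_model))  # → ['right', 'right', 'right']
```

The reflexive agent's behavior is fixed by its rule, while the planner can reach any stated objective its model covers — which is why instrumental goals (like acquiring resources or power) arise naturally only for the latter.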

Forecasts of the emergence of something approximating an APS system:
- Cotra: >65% chance of “transformative AI” by 2070
- Metaculus: 54% chance of “human-machine intelligence parity” by 2040
- Expert surveys: >30% chance of “machines better at every task” by 2066
- Carlsmith: 65% chance of APS systems before 2070

Section 3 assumes that an APS system can be developed, and examines the incentives to build one.

It is a given that human actors will be incentivized to create AIs with advanced capabilities (1), but it is not so obvious that those actors will want their AIs to be agentic planners or strategically aware.

For example, many tasks that are extremely helpful to humans require no planning or strategy, such as translating languages or predicting human responses. These and many other reasonable tasks for AIs are so specialized that there may be no need to develop an AI with general planning and strategy for them. Because the way AI is designed in the future will be shaped by economic incentives, developers will bear the cost of imparting planning and strategy in any robust sense only where doing so pays off. Highly specialized non-APS systems may outperform APS systems in most cases, given the additional cost and performance requirements of running an APS system over a non-APS one.

Carlsmith expects that if APS systems develop, they will be developed in situations where non-APS systems are already operating and may be very powerful.

Here are 3 reasons why APS systems may be developed:
1. Agentic planning and strategic awareness are instrumentally useful for most terminal goals. They may not be the most efficient means in every context, but in contexts where planning and strategy are specifically required, they will almost certainly be trained in. An APS system can also adapt to novel situations without trial-and-error retraining, which could be an important requirement for the AI’s developers. An AI without planning and strategy may be very useful, but only narrowly: it is prone to breaking as circumstances change.
2. Giving planning and strategic abilities to an AI may be more efficient than non-APS approaches to fulfilling specialized needs. Instead of creating many specialized AIs to cover the range of required tasks, it may be cheapest (in development cost) to train a single AI with planning and strategic abilities flexible enough to fill all of those individual needs, even if, for any one specific task, a specialized non-APS alternative would be more efficient.
3. Agentic planning and strategic awareness may be emergent properties of sophisticated systems or be costly to explicitly avoid.

Carlsmith thinks the first of these three reasons is the most likely.
