Acting Optimally in Partially Observable Stochastic Domains

July 2013

Anthony R. Cassandra (Brown University, MCC, St. Edwards University, University of Texas, Pronto, LLC)

Leslie Pack Kaelbling (Brown University, Massachusetts Institute of Technology)

Michael L. Littman (Brown University, Duke University, AT&T, Rutgers University)

- The AAAI-94 Paper
- A Retrospective


- Partially Observable Markov Decision Process
- Probabilistic model for sequential decision making.
- Discrete time epochs.
- Discrete set of states.
- Discrete set of actions.
- Discrete set of observations.
- Immediate reward/cost structure.
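The components listed above can be collected into a small container type. A minimal sketch; the field names and the toy model are illustrative, not from the paper:

```python
from dataclasses import dataclass

@dataclass
class POMDP:
    """Discrete POMDP <S, A, O, T, Z, R> (field names are illustrative)."""
    states: list        # discrete set of states S
    actions: list       # discrete set of actions A
    observations: list  # discrete set of observations O
    T: dict             # T[(s, a, s2)] = P(s2 | s, a), per discrete epoch
    Z: dict             # Z[(a, s2, o)] = P(o | s2, a)
    R: dict             # R[(s, a)] = immediate reward/cost

# A made-up two-state instance just to show the shape of the model.
m = POMDP(states=["healthy", "sick"],
          actions=["treat", "wait"],
          observations=["pos", "neg"],
          T={}, Z={}, R={})
print(len(m.states))  # 2
```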

- Important not to assume everyone knows what a POMDP is.

Markov Models: do we have control over the state transitions?

| Are the states completely observable? | NO | YES |
|---|---|---|
| **YES** | Markov Chain | MDP (Markov Decision Process) |
| **NO** | HMM (Hidden Markov Model) | POMDP (Partially Observable Markov Decision Process) |


- Suppose the time scale is such that we need to make a decision every 3 months.

- May change the patient's state, but does not provide information and has a very high cost.

- Does not change the patient's state, but provides information.

- Can both provide information and change the state.
- Provides better information, but the cost is higher.
- Less likely to cure than radiation treatment.

- Doing nothing is always an option, but it might have a high cost if the patient is not healthy.

- A solution that maps states to actions is not possible.
- Mapping observations to actions is also not possible.
- Techniques use a probability distribution over states (a belief state).
- Bayes' rule can be used to update the belief state.
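The Bayes'-rule belief update can be sketched in a few lines. The two-state model below is invented purely for illustration; only the update rule itself, predict with the transition model then reweight by the observation likelihood, is the standard construction:

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes'-rule update of a belief state.

    b: current belief over states, shape (|S|,)
    a: action index
    o: observation index
    T: T[a, s, s2] = P(s2 | s, a)
    O: O[a, s2, o] = P(o | s2, a)
    """
    predicted = b @ T[a]                  # predict: push belief through transitions
    unnormalized = predicted * O[a, :, o] # correct: weight by observation likelihood
    return unnormalized / unnormalized.sum()

# Illustrative two-state, one-action, two-observation model
# (numbers are invented for this sketch, not from the paper).
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])
O = np.array([[[0.8, 0.2],
               [0.3, 0.7]]])

b = np.array([0.5, 0.5])
b1 = belief_update(b, a=0, o=0, T=T, O=O)
print(b1)  # belief shifts toward the state that best explains o=0
```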


- How good is this solution?
- Is there a better solution?
- Need to find the value at all (infinitely many) belief points.


- Most prior work was in the Operations Research community.
- Markov property of belief states (Åström, 1965).
- Optimal solution properties and first algorithm (Sondik, 1971).
- Finite grid approximations (Lovejoy, 1991).
- Applications:
*medical diagnosis, machine maintenance, fisheries, questionnaire design, moving target search, search and rescue, target identification, corporate policy, internal auditing, health-care policy making.*

- Mapping states to actions cannot work, so we need a mapping from belief states to actions.

- The previous exact algorithm identified the optimal solution as dividing the belief space into a finite number of linear regions.

- Algorithms just need a way to enumerate the regions.
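The "finite number of linear regions" structure means the value function is piecewise-linear and convex: it is the upper surface of a finite set of alpha-vectors, V(b) = max over alpha of alpha . b, and each vector's winning region is one linear region. A sketch over a two-state belief space; the three vectors are invented for illustration:

```python
import numpy as np

# Each alpha-vector gives the value, per underlying state, of following
# one conditional plan; the optimal value is their upper surface.
# (These three vectors are made up purely to show the shape.)
alphas = np.array([[1.0, 0.0],    # plan best when we believe state 0
                   [0.0, 1.0],    # plan best when we believe state 1
                   [0.6, 0.6]])   # plan best for middling beliefs

def value(b):
    """Piecewise-linear convex value function: max_alpha alpha . b."""
    return float(max(alphas @ b))

def region(b):
    """Index of the alpha-vector (linear region) containing belief b."""
    return int(np.argmax(alphas @ b))

# Sweep the belief P(state 0) across [0, 1] to see the regions change.
for p in [0.0, 0.5, 1.0]:
    b = np.array([p, 1 - p])
    print(p, value(b), region(b))
```

Enumerating such regions (and the vectors that dominate somewhere) is exactly what the exact algorithms need to do.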


- Defined linear regions for a given point.
- Bounds of regions define points for next regions.
- Regions defined are very conservative.


- Computes regions for each action separately, then combines.
- Defines simpler regions by ignoring exact values at boundaries.
- Looks for points where current value can be improved.

- Exploits region structure better.
- Eliminates redundant work.

A Retrospective

- The model framework is very powerful.
- Exact algorithms are too computationally complex (the problem is PSPACE-complete).

- Point-based (approximate) methods proved more useful.


- Tiger problem.
- Solution structures: policy graphs.
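The tiger problem: a tiger sits behind one of two doors, and the agent can listen (a small cost, reporting the tiger's side correctly with probability 0.85) or open a door. The sketch below, with illustrative function names, tracks the belief that the tiger is on the left as repeated "hear left" observations arrive, which is why listening several times before opening a door pays off:

```python
P_CORRECT = 0.85  # accuracy of the listen action

def update_on_hear_left(b_left):
    """Bayes'-rule update of P(tiger = left) after hearing the tiger on the left."""
    num = P_CORRECT * b_left
    den = P_CORRECT * b_left + (1 - P_CORRECT) * (1 - b_left)
    return num / den

b = 0.5  # uniform prior: no idea which door hides the tiger
for step in range(1, 4):
    b = update_on_hear_left(b)
    print(step, round(b, 3))
# After one listen the belief is 0.85; consistent observations
# push it rapidly toward certainty.
```

A policy graph for this problem compactly encodes rules like "listen until confident, then open the far door."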

- POMDPs becoming "core" in AI: AIMA Text (Russell & Norvig).


- Previously, code was non-existent, hard to find, or not runnable.
- pomdp-solve is still useful for some researchers.

- Better implementations and toolkits are now available.


- Robotics and planning.
- Spoken dialog systems.
- Unmanned aircraft collision avoidance.
- Human decision making and aids (cognitive science).


The End