Acting Optimally in Partially Observable Stochastic Domains

Classic Paper Award, AAAI-94

 

Anthony R. Cassandra
  Brown University → MCC → St. Edwards University → University of Texas → Pronto, LLC

Leslie Pack Kaelbling
  Brown University → Massachusetts Institute of Technology

Michael L. Littman
  Brown University → Duke University → AT&T → Rutgers University → Brown University

Outline

  • The Model: POMDP
  • Related Models
  • POMDP Example: Health Care
  • Belief States
  • Solutions Over Belief Space
  • Previous Work
  • Solution Properties and Techniques
  • Algorithms: Sondik (1971) and Witness
  • A Retrospective

The Model: POMDP

  • Important not to assume that everyone knows what a POMDP is.
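
Since the model may be unfamiliar, here is a minimal sketch of a POMDP as a data structure: the tuple of states, actions, observations, transitions, observation probabilities, and rewards. The class layout and field names below are illustrative choices, not notation from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List

# A POMDP is a tuple <S, A, T, R, Omega, O>:
#   S     -- finite set of hidden states
#   A     -- finite set of actions
#   Omega -- finite set of observations
#   T     -- T[a][s][s'] = P(s' | s, a), state-transition probabilities
#   O     -- O[a][s'][o] = P(o | a, s'), observation probabilities
#   R     -- R[s][a]     = expected immediate reward

@dataclass
class POMDP:
    states: List[str]
    actions: List[str]
    observations: List[str]
    T: Dict[str, Dict[str, Dict[str, float]]]  # T[a][s][s2]
    O: Dict[str, Dict[str, Dict[str, float]]]  # O[a][s2][o]
    R: Dict[str, Dict[str, float]]             # R[s][a]
    discount: float = 0.95
```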

Related Models

Markov models can be organized by two questions: do we have control over the state transitions, and are the states completely observable?

  • No control, completely observable: Markov Chain
  • No control, partially observable: Hidden Markov Model (HMM)
  • Control over transitions, completely observable: Markov Decision Process (MDP)
  • Control over transitions, partially observable: Partially Observable Markov Decision Process (POMDP)

POMDP Example: Health Care

  • Suppose the time scale is such that a treatment decision must be made every 3 months.

POMDP Action: Radiation Treatment

  • May change the patient's state, but provides no information and has a very high cost.

POMDP Action: Non-invasive Tests

  • Does not change the patient's state, but provides information.

POMDP Action: Surgery

  • Can both provide information and change the patient's state.
  • Provides better information than the non-invasive tests, but at a higher cost.
  • Less likely to cure than radiation treatment.

POMDP Action: Do Nothing

  • Doing nothing is always an option, but it may have a high cost if the patient is not healthy. (A hypothetical encoding of the whole example follows below.)
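
To make the example concrete, the sketch below encodes these four actions as a two-state POMDP. Every probability and cost is an invented placeholder, chosen only to match the qualitative claims above: radiation changes state, is uninformative, and is costly; tests inform without changing state; surgery informs and changes state but costs more and cures less often; doing nothing is free unless the patient is sick.

```python
# Hypothetical encoding of the health-care example.
# All numbers are illustrative placeholders, not from the talk.

states       = ["healthy", "sick"]
actions      = ["radiate", "test", "surgery", "nothing"]
observations = ["pos", "neg"]

# T[a][s] -> {s': prob}: radiation and surgery may change the state;
# tests and doing nothing leave it unchanged.
T = {
    "radiate": {"healthy": {"healthy": 1.0, "sick": 0.0},
                "sick":    {"healthy": 0.6, "sick": 0.4}},   # likelier cure
    "surgery": {"healthy": {"healthy": 1.0, "sick": 0.0},
                "sick":    {"healthy": 0.4, "sick": 0.6}},   # less likely cure
    "test":    {s: {s2: float(s == s2) for s2 in states} for s in states},
    "nothing": {s: {s2: float(s == s2) for s2 in states} for s in states},
}

# O[a][s'] -> {obs: prob}: radiation and doing nothing are uninformative;
# tests are informative; surgery is even more informative.
O = {
    "radiate": {s: {"pos": 0.5, "neg": 0.5} for s in states},
    "nothing": {s: {"pos": 0.5, "neg": 0.5} for s in states},
    "test":    {"healthy": {"pos": 0.2, "neg": 0.8},
                "sick":    {"pos": 0.8, "neg": 0.2}},
    "surgery": {"healthy": {"pos": 0.05, "neg": 0.95},
                "sick":    {"pos": 0.95, "neg": 0.05}},
}

# R[s][a]: radiation and surgery are costly; doing nothing is free
# unless the patient is sick.
R = {
    "healthy": {"radiate": -10.0, "test": -1.0, "surgery": -20.0, "nothing": 0.0},
    "sick":    {"radiate": -10.0, "test": -1.0, "surgery": -20.0, "nothing": -5.0},
}
```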

Belief States

  • A belief state is a probability distribution over the underlying states.
  • It summarizes everything the agent knows from its history of actions and observations: a sufficient statistic for acting optimally (see the update sketch below).
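
A minimal sketch of the belief update, Bayes' rule applied after taking action a and seeing observation o. The function and variable names are mine; T and O are indexed as in the sketches above.

```python
def update_belief(b, a, o, states, T, O):
    """Bayes-filter belief update:
    b'(s') ∝ O[a][s'][o] * sum_s T[a][s][s'] * b(s)."""
    new_b = {s2: O[a][s2][o] * sum(T[a][s][s2] * b[s] for s in states)
             for s2 in states}
    norm = sum(new_b.values())  # this is P(o | b, a)
    if norm == 0.0:
        raise ValueError("observation o is impossible under belief b")
    return {s2: p / norm for s2, p in new_b.items()}

# E.g., with the health-care encoding above, a positive test result
# shifts belief toward "sick":
#   b = update_belief({"healthy": 0.5, "sick": 0.5}, "test", "pos", states, T, O)
```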

Solutions Over Belief Space

  • A policy maps belief states to actions; equivalently, the POMDP becomes a completely observable MDP whose (continuous) state space is the space of beliefs. An illustrative policy sketch follows.
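
For illustration only, a hand-built belief-space policy for the health-care example might look like the following; the thresholds are invented, not computed by any algorithm.

```python
def policy(b):
    """Illustrative hand-built policy over belief space for the
    health-care example; thresholds are invented placeholders."""
    p_sick = b["sick"]
    if p_sick < 0.1:
        return "nothing"  # confident the patient is healthy
    if p_sick < 0.6:
        return "test"     # too uncertain to treat: buy information
    return "radiate"      # confident enough that treatment is worth the cost
```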

Previous Work

  • Mapping states to actions cannot work, since the agent never directly observes the state; instead we need policies that map belief states (or histories of actions and observations) to actions.

Solution Properties and Techniques

  • The optimal finite-horizon value function over belief space is piecewise linear and convex.
  • It can therefore be represented by a finite set of vectors: the value of a belief is the maximum of its dot products with the vectors (see the sketch below).
  • Each vector is associated with an action, so the representation yields a policy as well as a value function.
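
A minimal sketch of how such a vector-set representation is evaluated: the value of a belief is the maximum of its dot products with the vectors, and the maximizing vector supplies the action. The names and the (vector, action) pairing are my choices.

```python
import numpy as np

def value_and_action(b, vector_set):
    """Piecewise-linear convex value function: V(b) = max_k alpha_k . b.
    vector_set: list of (alpha, action) pairs, alpha a numpy array;
    b: numpy array over states summing to 1."""
    best_val, best_act = -np.inf, None
    for alpha, action in vector_set:
        v = float(np.dot(alpha, b))
        if v > best_val:
            best_val, best_act = v, action
    return best_val, best_act
```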

Previous Algorithm (Sondik, 1971)

  • Exact value iteration over belief space via the one-pass algorithm, but its handling of belief-space regions leads to much redundant work.

Witness Algorithm (Cassandra, et al., 1994)

  • Exploits the region structure of belief space better.
  • Eliminates redundant work (a sketch of the core linear-program test follows below).
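
At the core of the witness algorithm is a test for whether a candidate vector improves on the current set anywhere in belief space; any belief point where it does is a "witness" that the set is incomplete. Below is a sketch of that test as a linear program using scipy. The function name, tolerance, and edge-case handling are my assumptions, and the rest of the algorithm is omitted.

```python
import numpy as np
from scipy.optimize import linprog

def find_witness(alpha, U):
    """Look for a belief b where candidate vector `alpha` strictly beats
    every vector in U:
        max  delta
        s.t. b . (alpha - u) >= delta  for all u in U
             sum(b) = 1,  b >= 0
    alpha and each u are numpy arrays of length n.
    Returns a witness belief, or None if alpha is dominated everywhere."""
    n = len(alpha)
    if len(U) == 0:                      # empty set: any belief is a witness
        return np.full(n, 1.0 / n)
    # Variables x = (b_1..b_n, delta); linprog minimizes, so negate delta.
    c = np.zeros(n + 1)
    c[-1] = -1.0
    # b.(alpha - u) >= delta  <=>  (u - alpha).b + delta <= 0
    A_ub = np.array([np.append(u - alpha, 1.0) for u in U])
    b_ub = np.zeros(len(U))
    A_eq = np.array([np.append(np.ones(n), 0.0)])  # beliefs sum to 1
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    if res.success and res.x[-1] > 1e-9:
        return res.x[:n]
    return None
```
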
A Retrospective

Alternative Solution Techniques


Model/Problem Exposure


Road to Better Implementations


More Application Areas

The End