Makes decision of the next action based on current and past observations and actions.
Labels actions for observation-only data.