Heuristics are essential in situations where finding an optimal solution is computationally expensive or impractical. Instead of exhaustively searching through all possible solutions, heuristics provide a way to prioritize the search and guide it toward more promising paths. In some cases a good heuristic even makes an algorithm optimal, but this is not guaranteed. Consider the following examples:
From the above examples, it is clear that we need a way to evaluate the "goodness" of a heuristic for a given search problem. Fortunately, there are two mathematical properties of heuristic functions that help us establish this.
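In the standard terminology, these two properties are admissibility and consistency: a heuristic $h$ is admissible if it never overestimates the true cost of reaching the goal, $h(n) \le h^*(n)$ for every node $n$, and it is consistent if it satisfies the triangle inequality $h(n) \le c(n, n') + h(n')$ for every successor $n'$ of $n$, with $h(goal) = 0$. Every consistent heuristic is also admissible.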
function GBFS(G, s, h, goal):
    PQ = empty priority queue      # ordered by heuristic value h(v, goal) only
    PQ.push( h(s, goal), s, [] )
    while PQ is not empty:
        c, v, path = PQ.pop()      # c is the heuristic value of v
        if v not visited:
            mark v as visited
            path = path + [v]      # extend a copy, so queued paths are not shared
            if v == goal:
                return path
            for all neighbors w of v:
                if w not visited:
                    PQ.push( h(w, goal), w, path )
    return failure
Note that this algorithm is susceptible to getting stuck in local optima, since it completely ignores the cost of traversing any edge and relies solely on the heuristic. GBFS, therefore, is neither complete nor optimal.
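For reference, here is a minimal runnable Python sketch of the pseudocode above. The function name `gbfs` and the adjacency-list representation (a dict mapping each node to `(neighbor, edge_cost)` pairs) are illustrative assumptions, not part of the original pseudocode; edge costs are carried along only so that the same graph structure can be reused for the A* sketch below.

```python
import heapq

def gbfs(graph, start, goal, h):
    """Greedy best-first search.

    graph: dict mapping node -> iterable of (neighbor, edge_cost) pairs.
    Edge costs are ignored; only the heuristic h(node, goal) guides the search.
    """
    frontier = [(h(start, goal), start, [])]   # (heuristic value, node, path so far)
    visited = set()
    while frontier:
        _, v, path = heapq.heappop(frontier)   # node with the smallest heuristic value
        if v in visited:
            continue
        visited.add(v)
        path = path + [v]
        if v == goal:
            return path
        for w, _cost in graph[v]:
            if w not in visited:
                heapq.heappush(frontier, (h(w, goal), w, path))
    return None  # goal unreachable
```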
function ASTAR(G, s, goal, h):
    PQ = empty priority queue      # ordered by f(v) = g(v) + h(v, goal)
    PQ.push( h(s, goal), 0, s, [] )
    while PQ is not empty:
        f, g, v, path = PQ.pop()
        if v not visited:
            mark v as visited
            path = path + [v]      # extend a copy, so queued paths are not shared
            if v == goal:
                return path
            for all neighbors w of v:
                g_w = g + WT(v, w)            # path cost so far plus the edge weight
                PQ.push( g_w + h(w, goal), g_w, w, path )
    return failure
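The same translation into Python works for A*. This is a hedged sketch rather than a definitive implementation: it assumes the same adjacency-list representation as the `gbfs` sketch above, and it returns both the total cost and the path.

```python
import heapq

def astar(graph, start, goal, h):
    """A* search over an adjacency-list graph where graph[v] yields
    (neighbor, edge_cost) pairs and h(node, goal) is the heuristic."""
    frontier = [(h(start, goal), 0, start, [])]  # (f, g, node, path so far)
    visited = set()
    while frontier:
        _, g, v, path = heapq.heappop(frontier)  # entry with the smallest f-value
        if v in visited:
            continue
        visited.add(v)
        path = path + [v]
        if v == goal:
            return g, path                        # total cost and the path found
        for w, cost in graph[v]:
            if w not in visited:
                g_w = g + cost
                heapq.heappush(frontier, (g_w + h(w, goal), g_w, w, path))
    return None                                   # goal unreachable
```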
Walkthrough: Consider the graph below, with values along the edges representing the cost of the edge, and the values in parentheses below each node representing the heuristic estimate $h(n)$.
At $S$ we have two possible paths, $S \to A$ or $S \to G$. Let’s evaluate these paths:
$S \to A$: $f(A) = g(A) + h(A) = 1 + 3 = 4$

$S \to G$: $f(G) = g(G) + h(G) = 10 + 0 = 10$
Both $A$ and $G$ are pushed onto the frontier, and since $f(A) < f(G)$, the node $A$ is expanded next. At $A$ we have two paths at our disposal, $S \to A \to B$ or $S \to A \to C$. Let’s evaluate these paths:
$S \to A \to C$: $f(C) = g(C) + h(C) = (1 + 1) + 2 = 4$

$S \to A \to B$: $f(B) = g(B) + h(B) = (1 + 2) + 4 = 7$
Since $f(C) < f(B)$, the node $C$ is expanded next. At $C$ we have two paths at our disposal, $S \to A \to C \to D$ or $S \to A \to C \to G$. Let’s evaluate these paths:
$S \to A \to C \to D$: $f(D) = g(D) + h(D) = (1 + 1 + 4) + 1 = 7$

$S \to A \to C \to G$: $f(G) = g(G) + h(G) = (1 + 1 + 4) + 0 = 6$
Since $f(G) < f(D)$, the entry for $G$ with $f(G) = 6$ is now the cheapest node on the frontier. In the next iteration we pop $G$, it passes the goal test, and the search returns the path $S \to A \to C \to G$ with a total cost of $6$. Note that the heuristic in this example is consistent, and therefore A* search finds the optimal path.
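As a quick check, the walkthrough graph can be encoded and fed to the `astar` sketch above. Only the edges and heuristic values stated in the walkthrough are included; $h(S)$ is never needed (the start node is expanded first regardless), so it is set to 0 here as an assumption.

```python
# Edge costs and heuristic values taken from the walkthrough above.
graph = {
    'S': [('A', 1), ('G', 10)],
    'A': [('B', 2), ('C', 1)],
    'C': [('D', 4), ('G', 4)],
    'B': [], 'D': [], 'G': [],   # no outgoing edges are mentioned for these nodes
}
h_values = {'S': 0, 'A': 3, 'B': 4, 'C': 2, 'D': 1, 'G': 0}  # h(S) unused, set to 0
h = lambda n, goal: h_values[n]

print(astar(graph, 'S', 'G', h))   # (6, ['S', 'A', 'C', 'G'])
```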
Beam search is a heuristic search algorithm that seeks to address the memory complexity of A* search. It does so by focusing on the nodes with the best heuristic values within a limited set, expanding only the best fraction $\beta$ of nodes at each level.
Beam search uses a breadth-first style approach: it generates all successors of the current state, but expands only a fraction $\beta \in [0, 1]$ of them, ranked by heuristic cost. The parameter $\beta$ is known as the beam width. If $b$ represents the branching factor (assumed constant for simplicity), the number of nodes under consideration at every depth is $\beta \times b$; for example, with $b = 10$ and $\beta = 0.3$, only the 3 best successors are kept at each depth. Reducing the beam width therefore prunes more states. Varying $\beta$ also changes the character of the algorithm: when $\beta = 1/b$, it emulates a hill-climbing search, where a single best node is consistently chosen from the successor nodes, while $\beta = 1$ corresponds to a standard A* search, where all nodes are selected for expansion.
Employing a beam width bounds the memory requirements by limiting the number of sub-trees that are explored. However, this comes at the expense of completeness and optimality, as the best solution may not be found: branches containing the goal state can be pruned away entirely, in which case finding any solution (and, as a result, the optimal one) becomes impossible.
The beam search algorithm follows the A* search algorithm closely with a few minor changes:
function BEAM-SEARCH(G, s, goal, h, beam_width):
    PQ = empty priority queue      # ordered by f(v) = g(v) + h(v, goal)
    PQ.push( h(s, goal), 0, s, [] )
    while PQ is not empty:
        f, g, v, path = PQ.pop()
        if v not visited:
            mark v as visited
            path = path + [v]      # extend a copy, so queued paths are not shared
            if v == goal:
                return path
            neighbors = get_neighbors(G, v)
            # keep only the beam_width best successors, ranked by heuristic
            for w in beam_select_neighbors(neighbors, beam_width):
                g_w = g + WT(v, w)
                PQ.push( g_w + h(w, goal), g_w, w, path )
    return failure
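Below is a corresponding Python sketch of this beam-limited variant, under the same assumptions as the earlier sketches. Note that `beam_width` is treated here as an integer count of successors kept per expansion (i.e. $\beta \times b$ in the notation above), which matches how the pseudocode uses `beam_select_neighbors`.

```python
import heapq

def beam_search(graph, start, goal, h, beam_width):
    """Beam-limited variant of the astar sketch above: at each expansion,
    only the beam_width successors with the lowest heuristic value are kept."""
    frontier = [(h(start, goal), 0, start, [])]  # (f, g, node, path so far)
    visited = set()
    while frontier:
        _, g, v, path = heapq.heappop(frontier)
        if v in visited:
            continue
        visited.add(v)
        path = path + [v]
        if v == goal:
            return g, path
        # rank successors by heuristic value and prune all but the best beam_width
        successors = sorted(graph[v], key=lambda edge: h(edge[0], goal))
        for w, cost in successors[:beam_width]:
            if w not in visited:
                g_w = g + cost
                heapq.heappush(frontier, (g_w + h(w, goal), g_w, w, path))
    return None  # goal unreachable, possibly because it was pruned away
```

On the walkthrough graph above, `beam_search(graph, 'S', 'G', h, 1)` keeps only the successor with the lowest heuristic at each step; from $S$ that is $G$ itself (with $h(G) = 0$), so the search returns the direct path of cost 10 instead of the optimal path of cost 6, illustrating the loss of optimality described earlier.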