Kn65. Donald E. Knuth. On the Translation of Languages from Left to Right

LR(k) Grammatik, Parsing und grundlegenden Eigenschaften

LR(k)

  • L = Left to right parsing
  • R = Reverse, Rightmost Derivation (Auflösung von Rules, Tree Linearisierung: immer rechtestes NonTerminal auflösen)
  • k Lookahead Terminals for unique parsing

i Intro and Defs

  • "intermediates" I (Rules): A, B, C
  • "terminals" T: a, b, c
  • S denotes the "principal intermediate character"
  • ε empty string
  • production is a relation A → Θ mit Θ Strings über I ∪ T
  • Grammar G: {Ap → Xp1 ... Xpnp | rulenumber p ∈ [1,s], np >= 0 }
  • φ → ψ direkte Ableitung: ein Intermediat aus φ ersetzen durch eine Produktion
  • φ ⇒ ψ transitiver Abschluss von →
  • φ ⇛ ψ transitiver+reflexiver Abschluss von → i.e. φ → ψ or φ = ψ
  • language einer Grammatik: {α String über T | S ⇒ α}
  • sentential form: any string α for which S ⇒ α
  • derivation tree or parse diagram
    • Linearisierungen indem ein Intermediate nach dem anderen expandiert wird, Reihenfolge ist irrelevant, also z.B. leftmost oder rightmost auswählen
  • "handle of a tree" to be the leftmost set of adjacent leaves forming a complete branch
  • "pruning off" handles (d.h. leaves der handle entfernen) ==> Schritt für Schritt zurück nach S <==> rightmost derivation in revers
  • (n, p) is "handle of α = X1 ... Xn ... Xt (String over T ∪ I)" iff ∃ derivation tree von α mit der handle Xr+1 ... Xn for production p
  • "k-sentential form" is a sentential form followed by k ⊣ with ⊣ ∉ T ∪ I
  • a grammar is "LR(k)" iff for any k-sentiential
    • iff any handle is always uniquely determined by the string to its left and the k terminal characters to its right. Formally: iff ∀
    • α = X1 ... Xn ... Xn+k Y1 ... Yu is a k-sentential form without Intermediates from Xn+1 to Yu
    • β = X1 ... Xn ... Xn+k Z1 ... Zv is a k-sentential form without Intermediates from Xn+1 to Zv
    • (n, p) handle of α and (n', p') handle of β
    • then n = n' and p = p'
    • this means we do all possible reductions at n, and then step one forward - no backtracking needed
    • remarks
      • handle implies, no reductions left of n, however, we have to consider all possible trees
      • the pushdown automata the infinit number of possible prefixes (for n unlimited) to a finite number of combinations

ii. ANALYSIS OF LR(k) GRAMMARS

method 1: reduce to regular Grammar

define Grammar G' from G by

  • Terminals: T ∪ I ∪ {⊣}
  • Intermediates: [A, α] with α ∈ (T ∪ ⊣)k and [p]
  • Hk(σ) = {α | ∃ β: σ ⇒ αβ}
  • rules
    • [Ap, α] → Xp1 ... Xp(j-1) [Xpj, β] with j <= np and β = Hk(Xp(j+1) ... Xpnp α)
    • [Ap, α] → Xp1 ... Xpnp α [p]

now, the following to are equivalent

  • [S, ⊣k] ⇒ X1 ... Xn ... Xn+k [p] in G'
  • there exists a k-sentential form of G: X1 ... Xn ... Xn+k Y1 ... Yu with handle (n, p) and Xn+1 ... Yu not intermediates

easy to see for Knuth, I don't see how to prove the left-most property of handle (may be is not necessary because it follows from the second property below?)

thus

  • G is LR(k) if and only if
  • [S, ⊣k] ⇒ θ [p] and [S, ⊣k] ⇒ θ φ [q] implies φ = ε and p = q in G'

but G' is regular and well known methods exist to check this

method 2: LR(k) pushdown parser

careful for emtpy productions A → ε! modify Hk(σ) to omit all derivations when an intermediate as the initial character is replaced by ε.

state = set of [p, j, α] meaning we have parsed β Xp1 ... Xpj and there is a sentential form β Ap α ...

stack: S0X1S1X2S2 ... XnSn | Y1 ... Yk ω: left from the | are the already parsed characters and states, right follows the lookahead und the rest of the input

algorithm on stack as above:

  • step1 compute recursively closure S' of S' = Sn ∪ {[q, 0, β] | ∃ [p,j; α] in S', Xp(j+1) = q and β ∈ Hk(Xp(j+2) ... Xpnp α)} : add all productions that could newly start
  • step2 compute
    • Z = {β | ∃ [p, j; alpha;] in S', j < np and β ∈ Hk(Xp(j+1) ... Xpnp α)} : in rule p, but not at end
    • Zp = {α | ∃ [p, np; α]} : at end of rule p
    • if these sets are not disjoint, error: grammar is not LR(k)
    • if lookahead Y1 ... Yk in Z then shift Y1 onto stack ==> S0X1S1X2S2 ... XnSnY1 | Y2...
    • elif lookahead in Yp then reduce stack by rule p ==> S0X1S1X2S2 ... Xn-npSn-npAp | Y1 ...
    • else syntax error
  • step3 after renaming the stack has now the form S0X1S1X2S2 ... XnSnXn+1 | Y1 ... Yk ω
    • compute S' from Sn as in step1 (add all productions that could newly start)
    • compute Sn+1 = {[p, j+1, α] | [p, j; a] ∈ S' and Xpj = Xn+1}: new state after Xn+1
    • if Sn+1 = {[0, 1, ⊢k]} and lookahead = ⊢k then parsing succesfully finished
    • else push Sn+1 on the stack and go to step1