3 min read

Unique Pairs

Lua equips two primary iterator functions, pairs and ipairs. The first finds all index-value pairs within a table, answering each index and value. The second finds continuous numerically-indexed table pairs starting at \(1\), again answering the numerical index and its associated value.

Problem

In Lua, it is sometimes useful to iterate an array of values while skipping duplicates. How does the Lua developer implement this requirement simply and efficiently?

Solution

The answer to the problem requires two functions:

  • uniq.ipairs
  • inext local function

They belong to a uniq module and utilise a table-tuple as the for-loop invariant state. Find the module on GitHub. Study the Lua documentation page regarding generic for semantics in order to fully understand how Lua’s loop iteration mechanism works.

Unique Indexed Pairs

The definition of uniq.ipairs appears below.

--- Iterates unique values in an array.
-- Skips duplicates. The implementation utilises a table by up-value reference.
-- Unique `pairs` does not make sense. Index-value pairs automatically have the
-- unique index property.
-- @tparam {any,...} Array of any values.
-- @treturn func Iterator function.
-- @treturn {{any,...},{[any]=bool}} Tuple of values and matches.
-- @treturn number Initial array index.
function _M.ipairs(values)
  return inext, { values, {} }, 0
end

It returns three things:

  1. The inext iteration function. See its local implementation below.
  2. The for-loop invariant state.
  3. The for-loop variant state.

The invariant state is a packed table comprising a tuple of (1) the original table and (2) a new empty table used to index the original table’s values.

Next Index and Value

The following local function does the heavy lifting. The implementation accepts the for loop’s invariant and variant states. It utilises the invariant state. The argument carries a table tuple. The first element references the original array. The second element references a mutating any-to-Boolean table used to record previously iterated values.

--- Answers the next index and unique value.
-- See [Semantics of the Generic `for`](https://www.lua.org/pil/7.2.html) for
-- details.
-- @tparam {{any,...},{[any]=bool}} forinvar For-loop invariant.
-- @tparam int forvar For-loop variant.
-- @treturn int Current index.
-- @treturn any Current unique value.
local function inext(forinvar, forvar)
  local values, matched = unpack(forinvar)
  local value
  repeat
    forvar = forvar + 1
    value = values[forvar]
    if value == nil then return end
  until not matched[value]
  matched[value] = true
  return forvar, value
end

Usage

A Busted test demonstrates how it works.

describe("uniq", function()
  local UNIQ = require "uniq"

  it("works", function()
    local actual = {}
    for index, value in UNIQ.ipairs { "a", "a", "a", 1, 2, 3, 3, 1, 1 } do
      table.insert(actual, { index, value })
    end
    assert.are.same({
      { 1, "a" },
      { 4, 1 },
      { 5, 2 },
      { 6, 3 },
    }, actual)
  end)
end)

Note that the order of the values does not change. Only the duplicate, or duplicates, disappear whether consecutive or not. Typically, an ipairs loop ignores the index.

Also, notice that the index of the iterated unique value changes depending on the number of duplicates. The iterator keeps skipping values while it finds a previously-matched value.

Conclusions

The iterator makes important assumptions. Firstly, it presumes that the values of the incoming array happily become the indices of another unique look-up table. Lua makes this possible for any type: strings, numbers, other tables, user data, and even functions. What therefore defines uniqueness depends on Lua’s table hashing functionality.

The solution efficiently re-uses Lua’s table hashing functionality. Values from the given table become indices in the unique matching table.

The uniq.ipairs function has precisely the same interface as standard Lua ipairs. The developer can readily interchange them. The standard implement delivers duplicates whereas the uniq version does not.

Memory-space complexity might prove tricky for very large array iterations. The full iteration accumulates a table containing all the values as indices with Boolean true as the corresponding value.