Lua equips two primary iterator functions, pairs
and ipairs
. The first finds all index-value pairs within a table, answering each index and value. The second finds continuous numerically-indexed table pairs starting at \(1\), again answering the numerical index and its associated value.
Problem
In Lua, it is sometimes useful to iterate an array of values while skipping duplicates. How does the Lua developer implement this requirement simply and efficiently?
Solution
The answer to the problem requires two functions:
uniq.ipairs
inext
local function
They belong to a uniq
module and utilise a table-tuple as the for
-loop invariant state. Find the module on GitHub. Study the Lua documentation page regarding generic for
semantics in order to fully understand how Lua’s loop iteration mechanism works.
Unique Indexed Pairs
The definition of uniq.ipairs
appears below.
--- Iterates unique values in an array.
-- Skips duplicates. The implementation utilises a table by up-value reference.
-- Unique `pairs` does not make sense. Index-value pairs automatically have the
-- unique index property.
-- @tparam {any,...} Array of any values.
-- @treturn func Iterator function.
-- @treturn {{any,...},{[any]=bool}} Tuple of values and matches.
-- @treturn number Initial array index.
function _M.ipairs(values)
return inext, { values, {} }, 0
end
It returns three things:
- The
inext
iteration function. See its local implementation below. - The
for
-loop invariant state. - The
for
-loop variant state.
The invariant state is a packed table comprising a tuple of (1) the original table and (2) a new empty table used to index the original table’s values.
Next Index and Value
The following local function does the heavy lifting. The implementation accepts the for
loop’s invariant and variant states. It utilises the invariant state. The argument carries a table tuple. The first element references the original array. The second element references a mutating any-to-Boolean table used to record previously iterated values.
--- Answers the next index and unique value.
-- See [Semantics of the Generic `for`](https://www.lua.org/pil/7.2.html) for
-- details.
-- @tparam {{any,...},{[any]=bool}} forinvar For-loop invariant.
-- @tparam int forvar For-loop variant.
-- @treturn int Current index.
-- @treturn any Current unique value.
local function inext(forinvar, forvar)
local values, matched = unpack(forinvar)
local value
repeat
forvar = forvar + 1
value = values[forvar]
if value == nil then return end
until not matched[value]
matched[value] = true
return forvar, value
end
Usage
A Busted test demonstrates how it works.
describe("uniq", function()
local UNIQ = require "uniq"
it("works", function()
local actual = {}
for index, value in UNIQ.ipairs { "a", "a", "a", 1, 2, 3, 3, 1, 1 } do
table.insert(actual, { index, value })
end
assert.are.same({
{ 1, "a" },
{ 4, 1 },
{ 5, 2 },
{ 6, 3 },
}, actual)
end)
end)
Note that the order of the values does not change. Only the duplicate, or duplicates, disappear whether consecutive or not. Typically, an ipairs
loop ignores the index.
Also, notice that the index of the iterated unique value changes depending on the number of duplicates. The iterator keeps skipping values while it finds a previously-matched value.
Conclusions
The iterator makes important assumptions. Firstly, it presumes that the values of the incoming array happily become the indices of another unique look-up table. Lua makes this possible for any type: strings, numbers, other tables, user data, and even functions. What therefore defines uniqueness depends on Lua’s table hashing functionality.
The solution efficiently re-uses Lua’s table hashing functionality. Values from the given table become indices in the unique matching table.
The uniq.ipairs
function has precisely the same interface as standard Lua ipairs
. The developer can readily interchange them. The standard implement delivers duplicates whereas the uniq
version does not.
Memory-space complexity might prove tricky for very large array iterations. The full iteration accumulates a table containing all the values as indices with Boolean true
as the corresponding value.