LZW without pre-populated string table / alphabet

11^th September 2022

Turns out I’ve made hobby of understanding and simplifying algorithms for quite a while now and wanted to try my hand at compression. So I took Rosetta Code’s LZW compression implementation and set out to understand it. First I used Swift generics to abstract from String to [Character], then to [Hashable] arrays, and finally to [Equatable] arrays.

All normal Swift stuff these days. However, while going through this process I realised that the normal LZW algorithm can be made even simpler through the use of enums to render the pre-populated string table / alphabet redundant.

I’m sure this implicit coding step (that isn’t part of the compression algorithm) has been noticed sometime between 1978 and now, but I found Swift extremely helpful in teasing LZW apart; I’m doubtful that other earlier generation languages would have helped me make so much progress.

Here’s the normal LZW using Swift generics and enums:
import LZW

let original = "TOBEORNOTTOBEORTOBEORNOT"

let compressed = LZW.compress(original.utf8.map({$0}))

print("compressed: \(compressed)")

// compressed: [.element(84), .element(79), .element(66), .element(69), .element(79),
//              .element(82), .element(78), .element(79), .element(84), .index(0),
//              .index(2), .index(4), .index(9), .index(3), .index(5), .index(7)]

let encoded = compressed.map { unit in
    switch unit {
    case .element(let value):
        return UInt(value)
    case .index(let index):
        return UInt(UInt8.max) + UInt(index)
    }
}

print("encoded: \(encoded)")

// encoded: [84, 79, 66, 69, 79, 82, 78, 79, 84, 256, 258, 260, 265, 259, 261, 263]
Here’s the project should you wish to break it: https://github.com/jjrscott/LZW.

PS I was actually trying to construct an integer only implementation of LZW al la Index based binary search but enums just shouted at me 🤷🏻‍♂️.