LZW without pre-populated string table / alphabet
Turns out I’ve made hobby of understanding and simplifying algorithms for quite a while now and wanted to try my hand at compression. So I took Rosetta Code’s LZW compression implementation and set out to understand it. First I used Swift generics to abstract from
String
to[Character]
, then to[Hashable]
arrays, and finally to[Equatable]
arrays.All normal Swift stuff these days. However, while going through this process I realised that the normal LZW algorithm can be made even simpler through the use of enums to render the pre-populated string table / alphabet redundant.
I’m sure this implicit coding step (that isn’t part of the compression algorithm) has been noticed sometime between 1978 and now, but I found Swift extremely helpful in teasing LZW apart; I’m doubtful that other earlier generation languages would have helped me make so much progress.
Here’s the normal LZW using Swift generics and enums:
import LZW let original = "TOBEORNOTTOBEORTOBEORNOT" let compressed = LZW.compress(original.utf8.map({$0})) print("compressed: \(compressed)") // compressed: [.element(84), .element(79), .element(66), .element(69), .element(79), // .element(82), .element(78), .element(79), .element(84), .index(0), // .index(2), .index(4), .index(9), .index(3), .index(5), .index(7)] let encoded = compressed.map { unit in switch unit { case .element(let value): return UInt(value) case .index(let index): return UInt(UInt8.max) + UInt(index) } } print("encoded: \(encoded)") // encoded: [84, 79, 66, 69, 79, 82, 78, 79, 84, 256, 258, 260, 265, 259, 261, 263]
Here’s the project should you wish to break it: https://github.com/jjrscott/LZW.
PS I was actually trying to construct an integer only implementation of LZW al la Index based binary search but enums just shouted at me 🤷🏻♂️.