Swift Copy-on-Write PSA: Mutating Dictionary Entries

Yesterday, the ever-insightful Tim Vermeulen pointed out a significant performance bug in some of the iOS-Developers production code. Swift's copy-on-write system usually helps efficiency by delaying the copying of structs until it's actually needed. However, as shown here, you can run into some pitfalls if you aren't vigilant of what's going on under the hood.

The code in question performed a common task: modify and update an entry in a dictionary.

public extension Sequence {
    /**
     Groups the elements in a `Sequence` by the provided `condition`
     */
    public func grouped<T: Hashable>(by condition: (Iterator.Element) -> T) -> [T: [Iterator.Element]] {
        var result: [T: [Iterator.Element]] = [:]
        for item in self {
            let key = condition(item)
            var items = result[key] ?? []
            items.append(item)
            result[key] = items
        }
        return result
    }
}

The "gotcha" here is that at the time of append, the array is referenced twice – once by the dictionary, and once by the variable items. This triggers copy-on-write, meaning that the entire array is copied to a new location in memory, even though your intention is to mutate a single array.

A simple fix is to just remove the dictionary entry before mutating:

public func groupedNoCopy<T: Hashable>(by condition: (Iterator.Element) -> T) -> [T: [Iterator.Element]] {
    var result: [T: [Iterator.Element]] = [:]
    for item in self {
        let key = condition(item)
        var items = result.removeValue(forKey: key) ?? []
        items.append(item)
        result[key] = items
    }
    return result
}

We would expect this to outperform the original more and more as the number of items in each group increases. Running some benchmarks (under release config) confirms this. I grouped 100,000 items into either 12,000 groups or 2 groups – the more groups there are, the fewer items in each group, and therefore fewer copy-on-write actions are taken.

Grouping with copy - many groups: 0.0258380174636841 seconds
Grouping with copy - few groups: 1.02893602848053 seconds
Grouping without copy - many groups: 0.024412989616394 seconds
Grouping without copy - few groups: 0.00962799787521362 seconds

At the high end, this change results in a 100x difference! This proves that it is important to pay attention to your value and reference semantics even in everyday snippets like this.

Bonus Enhancement

Trevor Squires noted that it would also be significantly faster by putting the group arrays in a reference-type wrapper and skipping the dictionary remove/insert pair altogether. I put together a test and indeed, it is consistently 2-5x faster this way, for situations with both many or few groups.

class SharedArray<T> {
    var storage: [T] = []
    func append(_ value: T) {
        storage.append(value)
    }
}

public func groupedWithWrapper<T: Hashable>(by condition: (Iterator.Element) -> T) -> [T: [Iterator.Element]] {
    var groups: [T: SharedArray<Iterator.Element>] = [:]
    for item in self {
        let key = condition(item)
        if let items = groups[key] {
            items.append(item)
        } else {
            let items = SharedArray<Iterator.Element>()
            items.append(item)
            groups[key] = items
        }
    }
    var result: [T: [Iterator.Element]] = [:]
    for group in groups {
        result[group.key] = group.value.storage
    }
    return result
}
Grouping with wrapper - many groups: 0.0156880021095276 seconds
Grouping with wrapper - few groups: 0.00186103582382202 seconds

View the code for these benchmarks here.