struct Pf::USet32

Overview

A thread-safe, persistent set of 32-bit unsigned integers.

TODO The current implementation is far more stupid versus e.g. Pf::Set wrt no-change semantics. In other words, you will pay even if you #add something that already exists. Optimization of all set operations wrt no-change remains future work.

How it works?

The most important takeaway is that with USet32, integer magnitude directly affects performance.

See the diagram below.

                (0-31)             (0-15)           (0-15)
LSB          bitmap index         n0 index         n2 index       MSB
               ─────────           ───────         ───────
   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
   ───────────           ─────────         ───────         ───────
    bit index           chunk index        n1 index        n3 index
      (0-63)              (0-31)            (0-15)          (0-15)




    ┌─────┐                        ▲
    │N3   │                        │
    └──┬──┘                        │
       │  16-way branch            │
       ▼                           │
    ┌─────┐                        │
    │N2   │                        │
    └──┬──┘                        │
       │  16-way branch            │
       ▼                           │
    ┌─────┐                        │
    │N1   │                        │
    └──┬──┘                        │
       │  16-way branch            │
       ▼                           │
    ┌─────┐                        │
    │N0   │                        │
    └──┬──┘                        │
       │  16-way branch            │  instantiated on-demand
       ▼                           │  as the numbers grow
  ┌─────────┐                      │
  │wide node│                      │
  └────┬────┘                      │
       │  32-way branch            │
       ▼                           │
  ┌─────────┐                      │
  │  chunk  │                      │
  └────┬────┘                      │
       │  32-way branch            │
       ▼                           │
  ┌───────────────────────────┐    │
  │ bitmap                    │    │
  │                           │    │
  │ 0 0 0 0 0 0 0 ... 0 0 0 0 │    │
  │ ───────────────────────── │    │
  │          64 bits          │    │
  └───────────────────────────┘    │

USet32 effectively "reifies" the number tree (each number is a path through that tree), providing an indexing data structure on top of the u32 number line.

Note how this implementation penalizes large numbers or sets containing large numbers (numbers far away from origin).

Bitmaps are very cheap, so if your numbers are frequently in the 0-63 range, you're well set. The larger your number, the more allocation overhead you will have and the "taller" the tree will grow.

Everything from chunk upwards adds an allocation / a pointer dereference. On one hand, that's nice because you will copy less and share more. On the other hand, modern CPUs penalize pointer chasing so you're going to suffer: dozens of nanoseconds up to 100ns for an insertion is not unheard of.

We will probably provide support for USet64-256 in the future, because as you can see, this construction scales rather trivially; and USet256 especially would be useful for storing hashes such as SHA256 and BLAKE3. In that case, easing indirection and control overhead would become a high-priority goal.

Performance

Let's add all i32s. We can't add all u32s because BitArray crashes on that. We don't want to crash our only competitor in Crystal-land.

require "benchmark"
require "bit_array"

Benchmark.ips do |x|
  x.report("uset32") do
    _ = Pf::USet32.transaction do |set|
      (0..Int32::MAX).each do |n|
        set << n.to_u32
      end
    end
  end

  x.report("bit array") do
    bits = BitArray.new(Int32::MAX)
    (0..Int32::MAX).each do |n|
      bits.unsafe_put(n, true)
    end
  end
end

This reports the following on my machine (Ryzen 3 2200G):

   uset32   9.28m (107.77s ) (± 0.00%)  4.79GB/op  46.95× slower
bit array 435.66m (  2.30s ) (± 1.14%)   256MB/op        fastest

Horrendous, isn't it? Except remember that bit array allocates all the memory upfront, and does a neat sequ pass where it doesn't follow even a single pointer. Paradise for a modern CPU!

Now, USet32 is a dynamic, immutable set that is explicitly designed in such a way that large numbers are penalized (versus, say, Roaring bitmaps, where magnitude does not matter and instead small bitmaps are penalized). USet32 lets you work with small numbers practically for free.

If you have a different origin far from zero in your domain, you can normalize the numbers before inserting them into USet32. To support negative numbers, create two sets: one for negative numbers and another for positive ones. This is also the way to store Int32s efficiently, since their raw binary representation is very far from zero if cast to UInt32.

USet32 is particularly well-suited for recording small, sequential ids, with an occasional immutable add or set operation. USet32 is suited for situations where you have thousands of small immutable sets, especially when they have a common ancestor.

In fact, USet32 works beautifully well for set operations, since, point A, it's a trie, and point B, we have bitmaps at the leaves. This is almost perfect for any kind of set operation, be it union, intersection, difference, etc.

Consider, for instance, the following micro-benchmarks:

require "benchmark"

xs_r = (0u32...100_000u32).to_a.shuffle!
xs_uset = xs_r.to_pf_uset32
xs_set = xs_r.to_set

ys_r = (0u32...20_000u32).to_a.shuffle!
ys_uset = ys_r.to_pf_uset32
ys_set = ys_r.to_set

zs_r = (80_000u32...150_000u32).to_a.shuffle!
zs_uset = zs_r.to_pf_uset32
zs_set = zs_r.to_a.shuffle!.to_set

Benchmark.ips do |x|
  x.report("USet32: create 100k") { xs_r.to_pf_uset32 }
  x.report("Set: create 100k") { xs_r.to_set }
end

Benchmark.ips do |x|
  x.report("USet32: subset") { ys_uset.subset_of?(xs_uset) }
  x.report("Set: subset") { ys_set.subset_of?(xs_set) }
end

Benchmark.ips do |x|
  x.report("USet32: xs union zs") { xs_uset | zs_uset }
  x.report("Set: xs union zs") { xs_set | zs_set }
end

Benchmark.ips do |x|
  x.report("USet32: xs intersects? zs") { xs_uset.intersects?(zs_uset) }
  x.report("Set: xs intersects? zs") { xs_set.intersects?(zs_set) }
end

Benchmark.ips do |x|
  x.report("USet32: xs intersect zs") { xs_uset & zs_uset }
  x.report("Set: xs intersect zs") { xs_set & zs_set }
end

Benchmark.ips do |x|
  x.report("USet32: xs difference zs") { xs_uset - zs_uset }
  x.report("Set: xs difference zs") { xs_set - zs_set }
end

They give the following results on my machine:

USet32: create 100k 336.02  (  2.98ms) (± 1.15%)  231kB/op   1.49× slower
   Set: create 100k 500.30  (  2.00ms) (± 2.98%)  2.0MB/op        fastest

USet32: subset  25.07M ( 39.88ns) (±12.30%)  32.0B/op          fastest
   Set: subset   2.61k (382.91µs) (± 0.71%)   0.0B/op  9600.69× slower

USet32: xs union zs 543.26k (  1.84µs) (± 3.44%)  3.38kB/op          fastest
   Set: xs union zs 189.07  (  5.29ms) (± 1.72%)   6.0MB/op  2873.25× slower

# NOTE: the quality of this benchmark needs improvement because it hits the fast
# path in both cases.
USet32: xs intersects? zs  82.90M ( 12.06ns) (± 1.57%)  0.0B/op   1.07× slower
   Set: xs intersects? zs  89.11M ( 11.22ns) (± 1.76%)  0.0B/op        fastest

USet32: xs intersect zs 592.29k (  1.69µs) (± 5.73%)  2.79kB/op          fastest
   Set: xs intersect zs 340.63  (  2.94ms) (± 7.84%)   769kB/op  1738.79× slower

USet32: xs difference zs   1.24M (804.59ns) (± 6.80%)    608B/op          fastest
   Set: xs difference zs 147.97  (  6.76ms) (± 4.26%)  3.75MB/op  8399.25× slower

TODO compare with BitArray and CRoaring. We will likely win over BitArray but lose to CRoaring due to indirection, although probably only on very large bitmaps.

USet32 makes set operations and checks so cheap one almost wants to forgive it the horrendous construction performance.

Obviously I'm not going to compare Set's dup.add with #add because that's not how the world works; this will be as humiliating for sets as construction perf is for USet32. Everything has its trade-offs.

NOTE You are recommended to compile with --mcpu=native to make sure the Crystal compiler and LLVM consider vectorization if your CPU supports it. Or maybe they consider it anyway; however, my timings were slightly slower without the flag.

Inspired by

Included Modules

Defined in:

permafrost/uset32.cr

Constructors

Instance Method Summary

Instance methods inherited from module Enumerable(UInt32)

to_pf_bidi to_pf_bidi, to_pf_map(& : T -> Tuple(K, V)) : Pf::Map(K, V) forall K, V
to_pf_map
to_pf_map
, to_pf_set : Pf::Set(T) to_pf_set, to_pf_uset32 : Pf::USet32 to_pf_uset32

Constructor Detail

def self.[] : USet32 #

Constructs an empty set.


[View source]
def self.[](*values : UInt32) : USet32 #

Constructs a set with the given values.


[View source]
def self.additive_identity : self #

[View source]
def self.new(ee : Enumerable(UInt32)) : USet32 #

Construct a set from the given enumerable ee.


[View source]
def self.new : USet32 #

Constructs an empty set.


[View source]
def self.transaction(& : Commit -> ) : USet32 #

Lets you build a possibly large set using Commit, which offers mutable versions of methods from USet32.

WARNING The yielded Commit must, under no circumstances, outlive the block or escape it otherwise, through retained closures, instance variables, etc. It uses pointers into stack-allocated memory, so using it after the block returns will lead to UB.

set = Pf::USet32.transaction do |commit|
  commit << 1
  commit << 2
  if commit.includes?(2)
    commit << 3
  end
end

set # Pf::USet32[1, 2, 3]

[View source]

Instance Method Detail

def &(other : USet32) : USet32 #

Intersection: returns a new set containing integers common to this and other sets.

Pf::USet32[1, 2, 3] & Pf::USet32[2, 3, 4] # => Pf::USet32[2, 3]

[View source]
def +(other : USet32) : USet32 #

Alias of #|.


[View source]
def -(other : USet32) : USet32 #

Difference: returns a new set containing integers from this set that are not in the other set.

Pf::USet32[1, 2, 3] - Pf::USet32[2, 3, 4] # => Pf::USet32[1]

[View source]
def ==(other : USet32) : Bool #

[View source]
def |(other : USet32) : USet32 #

Union: returns a new set containing integers from this and other sets.

Pf::USet32[1, 2, 3] | Pf::USet32[4, 5, 6] # => Pf::USet32[1, 2, 3, 4, 5, 6]

[View source]
def add(value : UInt32) : USet32 #

Returns a copy of this set with value present.

set = Pf::USet32[1, 2, 3]

set.add(4) # => Pf::USet32[1, 2, 3, 4]
set        # => Pf::USet32[1, 2, 3]

[View source]
def add?(value : UInt32) : Tuple(USet32, Bool) #

Returns a copy of this set with value present, followed by a boolean indicating whether value was added during the call.

set = Pf::USet32[1, 2, 3]

set.add?(4) # => {Pf::USet32[1, 2, 3, 4], true}
set.add?(1) # => {Pf::USet32[1, 2, 3], false}

set # => Pf::USet32[1, 2, 3]

[View source]
def delete(value : UInt32) : USet32 #

Returns a copy of this set without value.

set = Pf::USet32[1, 2, 3]

set.delete(2) # => Pf::USet32[1, 3]
set           # => Pf::USet32[1, 2, 3]

[View source]
def each(& : UInt32 -> ) : Nil #

Yields integers stored in this set.


[View source]
def hash(hasher) #
Description copied from struct Struct

See Object#hash(hasher)


[View source]
def includes?(value) : Bool #

Returns true if this set contains value. Returns false otherwise.

set = Pf::USet32[1, 2, 3]
set.includes?(2)   # => true
set.includes?(100) # => false

[View source]
def inspect(io) #

[View source]
def intersects?(other : USet32) : Bool #

Returns true if this set has common integers with other. Returns false otherwise.

xs = Pf::USet32[1, 2, 3]
ys = Pf::USet32[4, 5, 6]
zs = Pf::USet32[3, 5, 6]

xs.intersects?(ys) # => false, nothing in common
xs.intersects?(zs) # => true, they both have `3`

[View source]
def pretty_print(pp) #

[View source]
def proper_subset_of?(other : USet32) : Bool #

Returns true if all integers in this set are contained in other, and other is larger than this set. Returns false otherwise.

xs = Pf::USet32[1, 2, 3]
ys = Pf::USet32[1, 2, 3, 4, 5, 6]

xs.proper_subset_of?(ys) # => true

ys.subset_of?(ys)        # => true
ys.proper_subset_of?(ys) # => false

[View source]
def proper_superset_of?(other : USet32) : Bool #

Returns true if this set contains all integers from other, and other is smaller than this set. Returns false otherwise.

xs = Pf::USet32[1, 2, 3]
ys = Pf::USet32[1, 2, 3, 4, 5, 6]

ys.proper_superset_of?(xs) # => true

ys.superset_of?(ys)        # => true
ys.proper_superset_of?(ys) # => false

[View source]
def size : Int32 #

Returns the number of integers in this set.

set = Pf::USet32[1, 2, 3]
set.size # => 3

[View source]
def subset_of?(other : USet32) : Bool #

Returns true if all integers in this set are contained on other. Returns false otherwise.

xs = Pf::USet32[1, 2, 3]
ys = Pf::USet32[1, 2, 3, 4, 5, 6]

xs.subset_of?(ys) # => true
ys.subset_of?(xs) # => false

[View source]
def superset_of?(other : USet32) : Bool #

Returns true if this set contains all integers from other. Returns false otherwise.

xs = Pf::USet32[1, 2, 3]
ys = Pf::USet32[1, 2, 3, 4, 5, 6]

xs.superset_of?(ys) # => false
ys.superset_of?(xs) # => true

[View source]
def to_s(io) #

[View source]