Searching for Fixed-Radius Near Neighbors with Cell Lists

Introduction

This article defines the fixed-radius near neighbors search problem and explains why efficient algorithms for it are essential. We present two algorithms for solving the problem, Brute Force and Cell Lists, explain both in depth, and analyze their performance. The article is accompanied by a Julia implementation of the Cell Lists algorithm, located in the CellLists.jl repository. You can use the implementation in a simulation or as a reference for implementing the algorithm in another language. Cell Lists is applicable in simulations with finite-range interaction potentials, where we must compute all close pairs in each time step, such as particle simulations. For long-range interaction potentials, we recommend reading about the Barnes-Hut simulation and the Fast Multipole Method.

Fixed-radius near neighbors is an old computer science problem closely related to sorting. We define the problem as finding all pairs of points within a certain distance of each other, referred to as all close pairs. The author of [1] surveyed different methods for solving the fixed-radius near neighbors problem, including brute force, projection (that is, dimension-wise sorting and querying using binary search), cell techniques, and k-d trees. The paper [2] analyzes the brute force, projection, and cell techniques. In the context of particle simulations, [3] discusses Verlet Lists, a method with similarities to Cell Lists. These methods have also been applied to large-scale particle simulations on GPUs by [4].

Similar to sorting, the computational complexities of the different methods with respect to the number of particles $n$ are $O(n^2)$ for the Brute Force approach, $O(n\log n)$ for comparison-sort-based approaches such as projection and k-d trees, and $O(n)$ for bucket-sort-based approaches such as cell techniques. However, the $O$-notation hides a constant that depends exponentially on the dimension $d$ for both the comparison- and bucket-sort-based approaches. Nevertheless, in low-dimensional spaces, these methods can be significantly more efficient than Brute Force.

For analyzing algorithms, we recommend the textbook Introduction to Algorithms [5]. Its sections on foundations, sorting, hash tables, multithreaded algorithms, and computational geometry are relevant to this article. The computational geometry section contains a chapter (33.4) about finding the closest pair of points in two dimensions, which is related to finding all close pairs.

Fixed-Radius Near Neighbors Problem

Let $I=\{1,…,n\}$ denote the indices of $nāˆˆā„•$ points, $r>0$ denote a fixed radius, and $š©_iāˆˆā„^d$ denote the coordinates of point $iāˆˆI$ in $dāˆˆā„•$-dimensional Euclidean space with norm $\|ā‹…\|.$ The near neighbors of point $i$ are the distinct points $jāˆˆI, jā‰ i$ within radius $r,$ denoted as

$$ N_r(i) = \{jāˆˆIāˆ£jā‰ i, \|š©_i-š©_j\| \le r\}. \tag{1} \label{eq:1} $$

Using the definition of near neighbors, we define all close pairs as

$$ \begin{aligned} N_r &= \{(i,j)āˆ£iāˆˆI, jāˆˆN_r(i)\} \\ &=\{(i, j) \mid i,j\in I, i\ne j, \|š©_i-š©_j\| \le r\}. \end{aligned} \tag{2} \label{eq:2} $$

Because the norm is symmetric, that is, $\|š©_i-š©_j\|=\|š©_j-š©_i\|,$ the order of $i$ and $j$ does not matter. We can therefore simplify all close pairs to symmetric all close pairs, where the set $\{i, j\}$ denotes the inclusion of either $(i, j)$ or $(j, i),$ defined as

$$ \hat{N}_r=\{\{i, j\} \mid i,j\in I, i ā‰  j, \|š©_i-š©_j\| \le r\}. \tag{3} \label{eq:3} $$
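
For example, consider $n=3$ points in $d=2$ dimensions with coordinates $š©_1=(0,0),$ $š©_2=(0.05,0),$ and $š©_3=(1,1),$ and radius $r=0.1.$ Then $N_r(1)=\{2\},$ $N_r(2)=\{1\},$ and $N_r(3)=āˆ…,$ so the all close pairs are $N_r=\{(1,2),(2,1)\}$ and the symmetric all close pairs are $\hat{N}_r=\{\{1,2\}\}.$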

The next sections discuss how to implement algorithms for fixed-radius near neighbors search.

Brute Force Algorithm

Brute Force is the simplest algorithm for computing near neighbors. It serves as a benchmark and a building block for the Cell Lists algorithm, and it is useful for testing the correctness of more complex algorithms.

The Brute Force algorithm directly follows the definition of symmetric all close pairs. It iterates over all point pairs, checks the distance condition, and yields the pairs that satisfy it. The following Julia code demonstrates the Brute Force approach.

@inline function distance_condition(p1::Vector{T}, p2::Vector{T}, r::T) where T <: AbstractFloat
    sum(@. (p1 - p2)^2) ā‰¤ r^2
end
function brute_force(p::Array{T, 2}, r::T) where T <: AbstractFloat
    ps = Vector{Tuple{Int, Int}}()
    n, d = size(p)
    for i in 1:(n-1)
        for j in (i+1):n
            if @inbounds distance_condition(p[i, :], p[j, :], r)
                push!(ps, (i, j))
            end
        end
    end
    return ps
end
n, d = 10, 2
r = 0.1
p = rand(n, d)
brute_force(p, r)

The Brute Force algorithm performs a total number of iterations:

$$B(n) = āˆ‘_{i=1}^{n-1} i = \frac{n(n - 1)}{2} \in O(n^2). \tag{4}$$
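
For example, the $n=10$ points in the code above require $B(10)=45$ iterations.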

Since Brute Force has quadratic complexity, it does not scale well to large numbers of points, motivating the search for more efficient algorithms.

Cell Lists Algorithm

Definition

The Cell Lists algorithm works by partitioning the space into a grid with cell size $r$ and then associating each cell with the points that belong to it. The partitioning guarantees that all points within distance $r$ of each other are inside the same cell or neighboring cells. We refer to the set of all such point pairs as cell list pairs $C.$ It contains all close pairs by the relation $\hat{N}_r āŠ† C āŠ† \hat{N}_{R},$ where $R=2r\sqrt{d}$ is the maximum distance between two points in neighboring cells: the coordinates of two such points can differ by at most $2r$ per dimension, giving a distance of at most $\sqrt{d(2r)^2}=2r\sqrt{d}.$ Therefore, we can find all close pairs by iterating over the cell list pairs and checking the distance condition.

The Julia code is available as a GitHub repository at CellLists.jl.

We implement the Cell Lists algorithm in the Julia language. The algorithm has two parts: the first constructs the Cell List, and the second iterates over it to produce the symmetric all close pairs.

Constructor

The CellList constructor maps each cell to the indices of points in that cell.

struct CellList{d}
    data::Dict{CartesianIndex{d}, Vector{Int}}
end

function CellList(p::Array{T, 2}, r::T, offset::Int=0) where T <: AbstractFloat
    @assert r > 0
    n, d = size(p)
    cells = @. Int(fld(p, r))
    data = Dict{CartesianIndex{d}, Vector{Int}}()
    for j in 1:n
        @inbounds cell = CartesianIndex(cells[j, :]...)
        if haskey(data, cell)
            @inbounds push!(data[cell], j + offset)
        else
            data[cell] = [j + offset]
        end
    end
    CellList{d}(data)
end
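
To illustrate, consider three points in two dimensions with cell size $r=0.1$. Each coordinate of a point maps to the cell index $\lfloor š©/r \rfloor$, so the constructor produces the following dictionary (the entry order may vary):

p = [0.23 0.56; 0.19 0.63; 0.78 0.01]
c = CellList(p, 0.1)
c.data
# Dict{CartesianIndex{2}, Vector{Int64}} with 3 entries:
#   CartesianIndex(2, 5) => [1]
#   CartesianIndex(1, 6) => [2]
#   CartesianIndex(7, 0) => [3]

Points 1 and 2 end up in neighboring cells, so near_neighbors will check the distance condition for this pair.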

Near Neighbors

We define the indices to the neighboring cells as follows:

@inline function neighbors(d::Int)
    n = CartesianIndices((fill(-1:1, d)...,))
    return n[1:fld(length(n), 2)]
end
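
For example, in two dimensions, neighbors(2) returns the four cell offsets that precede the center cell in CartesianIndices iteration order:

offsets = neighbors(2)
# 4-element Vector{CartesianIndex{2}}:
#  CartesianIndex(-1, -1)
#  CartesianIndex(0, -1)
#  CartesianIndex(1, -1)
#  CartesianIndex(-1, 0)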

We only include half of the neighboring cells to avoid double counting when iterating over the neighboring cells of all non-empty cells. We also define functions for brute-forcing over a set of points.

@inline function distance_condition(p1::Vector{T}, p2::Vector{T}, r::T) where T <: AbstractFloat
    sum(@. (p1 - p2)^2) ā‰¤ r^2
end

@inline function brute_force!(ps::Vector{Tuple{Int, Int}}, is::Vector{Int}, p::Array{T, 2}, r::T) where T <: AbstractFloat
    for (k, i) in enumerate(is[1:(end-1)])
        for j in is[(k+1):end]
            if @inbounds distance_condition(p[i, :], p[j, :], r)
                push!(ps, (i, j))
            end
        end
    end
end

@inline function brute_force!(ps::Vector{Tuple{Int, Int}}, is::Vector{Int}, js::Vector{Int}, p::Array{T, 2}, r::T) where T <: AbstractFloat
    for i in is
        for j in js
            if @inbounds distance_condition(p[i, :], p[j, :], r)
                push!(ps, (i, j))
            end
        end
    end
end

Finding all cell list pairs works by iterating over all non-empty cells. For each cell, the algorithm first finds the pairs within the cell using Brute Force. Then, it iterates over the cell's neighboring cells, checks whether each neighbor is non-empty, and finds the pairs between the current cell and the non-empty neighbor using Brute Force.

function near_neighbors(c::CellList{d}, p::Array{T, 2}, r::T) where d where T <: AbstractFloat
    ps = Vector{Tuple{Int, Int}}()
    offsets = neighbors(d)
    # Iterate over non-empty cells
    for (cell, is) in c.data
        # Pairs of points within the cell
        brute_force!(ps, is, p, r)
        # Pairs of points with non-empty neighboring cells
        for offset in offsets
            neigh_cell = cell + offset
            if haskey(c.data, neigh_cell)
                @inbounds js = c.data[neigh_cell]
                brute_force!(ps, is, js, p, r)
            end
        end
    end
    return ps
end

Usage

We can use Cell Lists by first constructing the CellList and then querying the symmetric all close pairs with near_neighbors.

n, d = 10, 2
r = 0.1
p = rand(n, d)
c = CellList(p, r)
indices = near_neighbors(c, p, r)

We receive the same output as for brute force.

indices2 = brute_force(p, r)
@assert Set(Set.(indices)) == Set(Set.(indices2))

In the next section, we analyze the Cell Lists algorithm’s performance compared to Brute Force and theoretical limits and discuss the assumptions when Cell Lists is practical.

Analysis

Number of Iterations

We denote the number of non-empty cells by

$$\hat{n}āˆˆ\{1,2,...,n\}. \tag{5}$$

Each non-empty cell contains $n_1,n_2,…,n_{\hat{n}}$ points such that the total number of points is $n=n_1+n_2+…+n_{\hat{n}}.$ Then, the number of iterations to find cell list pairs is

$$\begin{aligned} M(\hat{n}) &= āˆ‘_{i=1}^{\hat{n}} \left(B(n_i) + āˆ‘_{jāˆˆN(i)} n_iā‹…n_j \right) \\ &= \textcolor{darkblue}{āˆ‘_{i=1}^{\hat{n}} B(n_i)} + \textcolor{darkred}{āˆ‘_{i=1}^{\hat{n}-1} āˆ‘_{jāˆˆN(i)} n_iā‹…n_j}, \end{aligned} \tag{6}$$

where $N(i)$ denotes half of the non-empty neighboring cells of cell $i,$ with $N(\hat{n})=āˆ….$ We can iterate over the neighboring cell pairs (highlighted in red) in two ways.

Method 1: For each non-empty cell $iāˆˆ\{1,…,\hat{n}-1\},$ iterate over the remaining non-empty cells $jāˆˆ\{i+1,…,\hat{n}\}$ and check whether they are neighbors, which takes $B(\hat{n})$ iterations in total.

Method 2: For each non-empty cell $iāˆˆ\{1,…,\hat{n}\},$ iterate over its neighboring cells and check whether they are non-empty (as in the near_neighbors code), which takes $\hat{n}ā‹…m_d$ iterations in total, where $m_d=(3^d-1)/2$ is the number of neighboring cells considered in dimension $d.$

We derive the dimensionality condition under which Method 2 performs fewer iterations than Method 1:

$$\begin{aligned} \hat{n}ā‹…m_d &< B(\hat{n}) \\ \hat{n} &> 3^d. \end{aligned}\tag{7}$$

When the condition holds, the work of finding pairs of neighboring cells grows linearly in $\hat{n},$ which is necessary for Cell Lists to be efficient compared to Brute Force. Note that higher-dimensional spaces require an exponentially larger number of non-empty cells $\hat{n}$ to satisfy the condition.
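
For example, in two dimensions, $m_2=4$ and the condition is $\hat{n}>9,$ while in three dimensions, $m_3=13$ and the condition is $\hat{n}>27.$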

Worst-Case Instances

If one cell contains all the points, we have $\hat{n}=1$ and $n_{1}=n,$ and the number of iterations equals that of Brute Force:

$$M(1)=\textcolor{darkblue}{B(n_1)}+\textcolor{darkred}{0}=B(n). \tag{8}$$

If all points are in distinct cells and the dimensionality condition is not satisfied, we have $\hat{n}=n,$ $n_1=n_2=ā‹Æ=n_{\hat{n}}=1,$ and $\hat{n}ā‰¤3^d,$ and the number of iterations is again comparable to Brute Force:

$$M(n)=\textcolor{darkblue}{\hat{n}}+\textcolor{darkred}{B(\hat{n})}=B(n)+n. \tag{9}$$

Curse of Dimensionality

Cell Lists also suffers from the curse of dimensionality. As we increase the dimension, the number of possible cells grows exponentially. That is,

$$āˆ_{i=1}^d \left(\frac{l_i}{r}\right) ā‰„ \left(\frac{l}{r}\right)^d, \tag{10}$$

where $l_1,l_2,…,l_{d}$ are the side lengths of the minimum bounding box of the points and $l=\min\{l_1,l_2,…,l_{d}\}>r$ is the minimum length. Therefore, if we draw two random cells from a probability distribution over the possible cells, the probability that they are the same cell decreases exponentially as the dimension grows. Consequently, the number of non-empty cells approaches the number of points, $\hat{n}ā†’n,$ and the dimensionality condition requires an exponentially larger $\hat{n}$ to be satisfied. Thus, the performance approaches the worst case.

Practical Instances

In many practical instances, we can assume a limit on how many points can occupy a cell. For example, in two- and three-dimensional physical simulations, particles have finite size and cannot overlap, which bounds the number of particles per cell; such instances also typically satisfy the dimensionality condition.

Consider a problem instance that satisfies the dimensionality condition $\hat{n}>3^d$ and in which the number of points per cell is bounded by $1ā‰¤n_1,n_2,…,n_{\hat{n}}ā‰¤\bar{n},$ where $\bar{n}$ is independent of $n.$ By assuming that every cell is full, $n_1=n_2=ā‹Æ=n_{\hat{n}}=\bar{n},$ we have $n=\hat{n}ā‹…\bar{n},$ and we obtain an upper bound for the number of iterations

$$ M(\hat{n}) ā‰¤ \textcolor{darkblue}{\hat{n}ā‹…B(\bar{n}) }+ \textcolor{darkred}{\hat{n}ā‹…m_dā‹…\bar{n}^2} ā‰¤ (m_d+1)ā‹…\bar{n}ā‹…n \tag{11} $$

With fixed dimension and maximum density, the number of iterations is linearly proportional to the number of points, that is, $O(n)$. The constant factor can be large; therefore, we should always measure Cell Lists' performance against Brute Force on real machines with realistic problem instances.
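
For example, in two dimensions with $m_2=4$ and at most $\bar{n}=4$ points per cell, the bound gives $M(\hat{n})ā‰¤(4+1)ā‹…4ā‹…n=20ā‹…n$ iterations, compared to the roughly $n^2/2$ iterations of Brute Force.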

Multithreaded Cell Lists Algorithm

Multithreading

We can run the Cell Lists algorithm faster by using multithreading, a form of parallelism. Generally, we can think of parallelizing algorithms through divide and conquer. However, Cell Lists does not require nested parallelism. Therefore, we only consider a single step of divide and conquer, as follows (a generic sketch appears after the list):

  1. Divide: Partition input into multiple tasks. Different tasks should not write data to the same memory locations to avoid race conditions.
  2. Conquer: Schedule tasks to run each task on available threads.
  3. Combine: After all tasks have finished, combine the results by reducing them into one with a binary operation.
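
The following is a minimal generic sketch of this single-step pattern, with divide_conquer_combine as a hypothetical helper name (the concrete constructor and near neighbors versions follow in the next sections):

import Base.Threads.@spawn

# Hypothetical helper: process each part on its own task and combine
# the results with the binary operation `op`.
function divide_conquer_combine(f, parts, op)
    # Conquer: schedule one task per part on the available threads
    tasks = map(parts) do part
        @spawn f(part)
    end
    # Combine: wait for all tasks and reduce the results into one
    return reduce(op, fetch.(tasks))
end

# For example, summing the parts of a range returns 55
divide_conquer_combine(sum, [1:5, 6:10], +)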

We use the same performance measures for multithreaded algorithms as Introduction to Algorithms [5]. The book defines the running time $T_P$ of a multithreaded computation on $P$ threads, the work $T_1$ as the time to execute all tasks on a single thread, and the span $T_āˆž$ as the time of the longest-running strand of computation. The speedup $T_1/T_P$ of a multithreaded algorithm tells how many times faster the computation is on $P$ threads than on a single thread. The parallelism $T_1/T_āˆž$ gives an upper bound on the achievable speedup: $$T_1/T_Pā‰¤T_1/T_āˆž < P.$$

The construction of the CellList is easy to divide into near equal-sized tasks, making it easy to parallelize efficiently. However, parallelizing near_neighbors efficiently is more complicated because we do not know the task sizes before iterating. We use Julia's multithreading support from the Base.Threads standard library.

Multithreaded Constructor

using Base.Threads

function Base.merge(c1::CellList{d}, c2::CellList{d}) where d
    CellList{d}(merge(vcat, c1.data, c2.data))
end

function CellList(p::Array{T, 2}, r::T, ::Val{:parallel}) where T <: AbstractFloat
    t = nthreads()
    n, d = size(p)
    # Divide
    cs = cumsum(fill(fld(n, t), t-1))
    parts = zip([0; cs], [cs; n])
    # Conquer
    results = Array{CellList{d}, 1}(undef, t)
    @threads for (i, (a, b)) in collect(enumerate(parts))
        results[i] = CellList(p[(a+1):b, :], r, a)
    end
    # Combine
    return reduce(merge, results)
end

The multithreaded constructor divides the points p into near equal-sized parts, constructs a CellList for each part on its own thread, and finally merges the resulting CellLists into one.

Multithreaded Near Neighbors

using Base.Threads
import Base.Threads.@spawn

function greedy_partition(m::Vector{Int}, n::Int)
    sizes = zeros(Int, n)
    subsets = [Vector{Int}() for _ in 1:n]
    for i in reverse(sortperm(m))
        _, j = findmin(sizes)
        sizes[j] += m[i]
        push!(subsets[j], i)
    end
    return subsets
end
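
The function implements a greedy longest-first heuristic. For example, partitioning the workloads m = [9, 4, 1, 16, 4] into n = 2 subsets repeatedly assigns the largest remaining workload to the subset with the smallest total:

greedy_partition([9, 4, 1, 16, 4], 2)
# 2-element Vector{Vector{Int64}}:
#  [4, 3]     (total workload 16 + 1 = 17)
#  [1, 5, 2]  (total workload 9 + 4 + 4 = 17)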

function near_neighbors_part(c::CellList{d}, data, p, r, offsets) where d
    ps = Vector{Tuple{Int, Int}}()
    for (cell, is) in data
        # Pairs of points within the cell
        brute_force!(ps, is, p, r)
        # Pairs of points with non-empty neighboring cells
        for offset in offsets
            neigh_cell = cell + offset
            if haskey(c.data, neigh_cell)
                @inbounds js = c.data[neigh_cell]
                brute_force!(ps, is, js, p, r)
            end
        end
    end
    return ps
end

function p_near_neighbors(c::CellList{d}, p::Array{T, 2}, r::T; t::Int=nthreads()) where d where T <: AbstractFloat
    @assert t ā‰„ 1
    offsets = neighbors(d)
    data = collect(c.data)
    # Divide: balance the workload by the squared number of points per
    # cell, since brute_force! performs a quadratic number of iterations.
    subsets = greedy_partition([length(is)^2 for (_, is) in data], t)
    # Conquer: spawn one task per subset of cells
    tasks = map(subsets) do subset
        @spawn near_neighbors_part(c, data[subset], p, r, offsets)
    end
    # Combine: wait for the tasks and concatenate their results
    pts = fetch.(tasks)
    return reduce(vcat, pts)
end

The multithreaded near neighbors algorithm is experimental, and it is not guaranteed to always perform better than the serial code. You can try to improve the code or suggest improvements in the comments.

The number of iterations performed by brute_force! is quadratically proportional to its input size. Therefore, we divide the work into t parts by the squared number of points in each cell, aiming to balance the workload per task. By increasing t, we can decrease the span and improve parallelism, at the cost of increased scheduling overhead. We should choose t as a multiple of the number of available threads, for example, t=3*nthreads().

Usage

We can use the multithreaded algorithms for Cell Lists as follows:

using Base.Threads
n, d = 10000, 2
r = 0.05
p = rand(n, d)
c = CellList(p, r, Val(:parallel))
indices = p_near_neighbors(c, p, r; t=3*nthreads())

If p_near_neighbors is not performing well, you can use the serial version near_neighbors instead. We can also iterate over the near neighbors using multithreading.

@threads for (i, j) in indices
    # perform computations on near neighbors p[i, :] and p[j, :]
end

When running your code, remember to set the number of threads before executing Julia.

export JULIA_NUM_THREADS=2 && julia

Benchmarks

Method

We benchmarked each algorithm against multiple instances of points with fixed size $n$, dimension $d$, and radius $r.$ We draw the coordinate values of each instance from a uniform distribution between zero and one. We measured the performance on each instance using BenchmarkTools.jl and selected the median time. For the resulting set of measurements, we computed the mean and the quantiles with probability levels 0.01 and 0.99 to account for the variation between instances. We included the benchmarking code in the CellLists.jl repository under the benchmarks directory.
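
The following is a minimal sketch of how a single instance can be measured with BenchmarkTools.jl; the parameter values are illustrative, and the actual benchmarking code in the repository differs:

using BenchmarkTools, Statistics

n, d, r = 1000, 2, 0.01
p = rand(n, d)
# Benchmark near_neighbors on one instance; setup constructs the CellList
b = @benchmark near_neighbors(c, $p, $r) setup=(c = CellList($p, $r))
median(b.times)  # median time in nanoseconds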

Cell Lists vs. Brute Force

Figure: Time in microseconds as a function of the number of points $n$ for dimension $d=2$ and radius $r=0.01$.
Figure: Time in microseconds as a function of the number of points $n$ for dimension $d=3$ and radius $r=0.01$.
Figure: Time in microseconds as a function of the number of points $n$ for dimension $d=4$ and radius $r=0.01$.
Figure: Time in microseconds as a function of the number of points $n$ for dimension $d=5$ and radius $r=0.01$.

The figures above show how Cell Lists compares to Brute Force as we increase the dimension with a constant radius. Cell Lists performs better in low dimensions, beating Brute Force already at a small number of points, but it requires more points to break even as the dimension grows.

Effects of Dimensionality

Figure: Time in microseconds as a function of the number of points $n$ for dimensions $d=2,…,6$ and radius $r=0.05$.

The figure above shows how Cell Lists behaves with increasing dimension and constant radius. As we can see, the running time scales roughly as $3^d.$

Single vs. Multithreaded Constructor

Figure: Serial time $T_1$ vs. multithreaded time $T_2$ for the constructor, in microseconds, as a function of the number of points $n$ for dimension $d=2$ and radius $r=0.01$.
Figure: The speedup $T_1/T_2$ of the multithreaded vs. serial constructor as a function of the number of points $n$ for dimension $d=2$ and radius $r=0.01$.

As we can see, multithreading significantly speeds up the Cell Lists construction. Dimensionality does not have a significant effect on the construction time.

Single vs. Multithreaded Near Neighbors

Currently, my benchmarking code does not produce consistent results for the multithreaded near neighbors. I will update the figures once the results are sensible.

Conclusions

I first encountered the fixed-radius near neighbors problem while developing the Crowd Dynamics simulation, which is mechanistically similar to a molecular simulation with finite-distance (local) interactions. The algorithms presented in this article can be used for research purposes or for building small-scale simulations. Large-scale simulations may require further improvements, such as more parallelism or extending Cell Lists into Verlet Lists.

Contribute

If you enjoyed this article or found it beneficial, it would help me if you shared it with other people who might be interested. If you have any feedback, improvement suggestions, or constructive criticism, you can mention them in the comment section. For example, if you find that the article is missing something essential or has mistakes, you can suggest improvements or source material. If I decide to add the improvements, I will attribute you and reference the source.

For more content, check out my YouTube channel and join my newsletter. Since creating content and open-source libraries takes time and effort, consider becoming a sponsor.

References


  1. Bentley, J. L. (1975). A survey of techniques for fixed radius near neighbor searching.

  2. Mount, D. (2005). Geometric basics and fixed-radius near neighbors, 1ā€“8.

  3. Yao, Z., Wang, J. S., Liu, G. R., & Cheng, M. (2004). Improved neighbor list algorithm in molecular simulations using cell decomposition and data sorting method. Computer Physics Communications, 161(1ā€“2), 27ā€“35. https://doi.org/10.1016/j.cpc.2004.04.004

  4. Hoetzlein, R. (2014). Fast fixed-radius nearest neighbors: Interactive million-particle fluids. GPU Technology Conference.

  5. Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2009). Introduction to Algorithms (3rd ed.). The MIT Press.

Jaan Tollander de Balsch
Computer Science & Applied Mathematics

Jaan Tollander de Balsch is a computer scientist with a background in applied mathematics.