# Searching for Fixed-Radius Near Neighbors with Cell Lists

## Table of Contents

## Introduction

This article defines the *fixed-radius near neighbors search* problem and why efficient algorithms are essential. We present two algorithms for solving the problem, *Brute Force* and *Cell Lists*. We explain both algorithms in-depth and analyze their performance. We accompanied this article with a Julia implementation of the Cell Lists algorithm, located in
**CellLists.jl** repository. We can use the algorithm in a simulation or as a reference for implementing it in another language. Cell Lists is applicable in simulations with finite range interaction potentials where we are required to compute all close pairs in each time step, such as particle simulations. For long-range interaction potentials, we recommend reading about
Barnes-Hut simulation and
Fast multipole method.

Fixed-radius near neighbors is an old computer science problem closely related to sorting. We define the problem as finding all pairs of points within a certain distance apart, referred to as *all close pairs*. The author of ^{1} has surveyed different methods for solving the fixed-radius near neighbors problem. These methods include brute force, projection that is, dimension-wise sorting and querying using binary search, cell techniques, and
k-d trees. The paper ^{2} analyzes the brute force, projection, and cell techniques. In the context of particle simulations, ^{3} discusses Verlet Lists, a method with similarities to
Cell Lists. These methods have also been applied to large scale particle simulations using GPU by ^{4}.

Similar to sorting, the computational complexities of the different methods in respect to the number of particles $n$ are $O(n^2)$ for the Brute Force approach, $O(n\log n)$ for comparison sort based approaches such as projection and k-d trees, and $O(n)$ for bucket sort based approaches such as cell techniques. However, the $O$-notation hides the constant that depends exponentially on the dimension $d$ for comparison and bucket-based approaches. Nevertheless, in low dimensional spaces, these methods can be significantly more efficient than the Brute Force.

Regarding analyzing algorithms, we recommend *Introduction to Algorithms* ^{5} textbook. Foundations, sorting, hash tables, multithreaded algorithms, and computational geometry sections are relevant to this article. The computational geometry section contains a chapter (33.4) about *finding the closest pair of points* in two dimensions, which is related to finding all close pairs.

## Fixed-Radius Near Neighbors Problem

Let $I=\{1,…,n\}$ denote the indices of $nāā$ points, $r>0$ denote a fixed radius, and $š©_iāā^d$ denote the coordinates of the point $iāI$ in $dāā$ dimensional Euclidean space and norm as $\|ā
\|$. The **near neighbors** of point $i$ are the distinct points $jāI, jā i$ within radius $r,$ denoted as

Using the definition of near neighbors, we define **all close pairs** as

Because the norm is symmetric, that is $\|š©_i-š©_j\|=\|š©_j-š©_i\|,$ the order of $i$ and $j$ does not matter. We can simplify the all close pairs to **symmetric all close pairs** where we use the set $\{i, j\}$ to denote the inclusion of either $(i, j)$ or $(j, i),$ defined as

The next sections discuss how to implement algorithms for fixed-radius near neighbors search.

## Brute Force Algorithm

The Brute Force is the simplest algorithm for computing the near neighbors. It serves as a benchmark and building block for the Cell Lists algorithm, and it is useful for testing the correctness of more complex algorithms.

The Brute Force algorithm directly follows the definition symmetric all close pairs. It iterates over the points pairs, checks the distance condition, and yields all pairs that satisfy the condition. The following Julia code demonstrates the Brute Force approach.

```
@inline function distance_condition(p1::Vector{T}, p2::Vector{T}, r::T) where T <: AbstractFloat
sum(@. (p1 - p2)^2) ā¤ r^2
end
```

```
function brute_force(p::Array{T, 2}, r::T) where T <: AbstractFloat
ps = Vector{Tuple{Int, Int}}()
n, d = size(p)
for i in 1:(n-1)
for j in (i+1):n
if @inbounds distance_condition(p[i, :], p[j, :], r)
push!(ps, (i, j))
end
end
end
return ps
end
```

```
n, d = 10, 2
r = 0.1
p = rand(n, d)
brute_force(p, r)
```

The Brute Force algorithm performs a total number of iterations:

$$B(n) = ā_{i=1}^{n-1} i = \frac{n(n - 1)}{2} \in O(n^2). \tag{4}$$

Since Brute Force has quadratic complexity, it does not scale well for many points, motivating the search of more efficient algorithms.

## Cell Lists Algorithm

### Definition

The Cell Lists algorithm works by partitioning the space into a grid with cell size $r$ and then associates each cell to the points that belong to it. The partitioning guarantees that all points within distance $r$ are inside a cell or its neighboring cells. We refer to the set of all such point pairs as **cell list pairs** $C.$ It contains all close pairs by the relation $\hat{N}_r ā C ā \hat{N}_{R},$ where $R=2r\sqrt{d}$ is the maximum distance between two points in neighboring cells. Therefore, we can find all close pairs by iterating over cell list pairs and checking the distance condition.

**CellLists.jl**.

We implement the Cell Lists algorithm in Julia language. The algorithm has two parts. The first one constructs the Cell List, and the second one iterates over it to produce the symmetric all close pairs.

### Constructor

The `CellList`

constructor maps each cell to the indices of points in that cell.

```
struct CellList{d}
indices::Dict{CartesianIndex{d}, Vector{Int}}
end
function CellList(p::Array{T, 2}, r::T, offset::Int=0) where T <: AbstractFloat
@assert r > 0
n, d = size(p)
cells = @. Int(fld(p, r))
data = Dict{CartesianIndex{d}, Vector{Int}}()
for j in 1:n
@inbounds cell = CartesianIndex(cells[j, :]...)
if haskey(data, cell)
@inbounds push!(data[cell], j + offset)
else
data[cell] = [j + offset]
end
end
CellList{d}(data)
end
```

### Near Neighbors

We define the indices to the neighboring cells as follows:

```
@inline function neighbors(d::Int)
n = CartesianIndices((fill(-1:1, d)...,))
return n[1:fld(length(n), 2)]
end
```

We only include half of the neighboring cells to avoid double counting when iterating over the neighboring cells of all non-empty cells. We also define functions for brute-forcing over a set of points.

```
@inline function distance_condition(p1::Vector{T}, p2::Vector{T}, r::T) where T <: AbstractFloat
sum(@. (p1 - p2)^2) ā¤ r^2
end
@inline function brute_force!(ps::Vector{Tuple{Int, Int}}, is::Vector{Int}, p::Array{T, 2}, r::T) where T <: AbstractFloat
for (k, i) in enumerate(is[1:(end-1)])
for j in is[(k+1):end]
if @inbounds distance_condition(p[i, :], p[j, :], r)
push!(ps, (i, j))
end
end
end
end
@inline function brute_force!(ps::Vector{Tuple{Int, Int}}, is::Vector{Int}, js::Vector{Int}, p::Array{T, 2}, r::T) where T <: AbstractFloat
for i in is
for j in js
if @inbounds distance_condition(p[i, :], p[j, :], r)
push!(ps, (i, j))
end
end
end
end
```

Finding all cell list pairs works by iterating over all non-empty cells and finding the pairs within that cell using Brute Force. Iterating over its neighboring cells checks if the cell is non-empty and finds the pairs between the current and the non-empty neighboring cell using Brute Force.

```
function near_neighbors(c::CellList{d}, p::Array{T, 2}, r::T) where d where T <: AbstractFloat
ps = Vector{Tuple{Int, Int}}()
offsets = neighbors(d)
# Iterate over non-empty cells
for (cell, is) in c.data
# Pairs of points within the cell
brute_force!(ps, is, p, r)
# Pairs of points with non-empty neighboring cells
for offset in offsets
neigh_cell = cell + offset
if haskey(c.data, neigh_cell)
@inbounds js = c.data[neigh_cell]
brute_force!(ps, is, js, p, r)
end
end
end
return ps
end
```

### Usage

We can use the Cell Lists by first constructing the `CellList`

and then query the symmetric all close pairs with `near_neighbors`

.

```
n, d = 10, 2
r = 0.1
p = rand(n, d)
c = CellList(p, r)
indices = near_neighbors(c, p, r)
```

We receive the same output as for brute force.

```
indices2 = brute_force(p, r)
@assert Set(Set.(indices)) == Set(Set.(indices2))
```

In the next section, we analyze the Cell Lists algorithm’s performance compared to Brute Force and theoretical limits and discuss the assumptions when Cell Lists is practical.

## Analysis

### Number of Iterations

We denote the **number of non-empty cells** using

Each non-empty cell has $n_1,n_2,…,n_{\hat{n}}$ points such that that the total number of points is $n=n_1+n_2+…+n_{\hat{n}}.$ Then, the number of iterations to find cell list pairs is

where $N(i)$ is half of the non-empty neighboring cells of cell $i$ where $N(\hat{n})=ā .$ We can iterate over the neighboring cell pairs (highlighted in red) in two ways.

**Method 1**: For each non-empty cell $iā\{1,…,\hat{n}-1\},$ iterate over the remaining non-empty cells $j=\{i,…,\hat{n}\}$ and check if they are a neighbor, which takes $B(\hat{n})$ iterations.

**Method 2**: For each non-empty cell $iā\{1,…,\hat{n}-1\},$ iterate over neighboring cells and check if they are non-empty (as in `near_neighbors`

code), which takes $\hat{n}ā
m_d$ iterations where $m_d=(3^d-1)/2$ is the **number of neighboring cells** in dimension $d.$

We derive the **dimensionality condition** such that method 2 performs fewer iterations compared to method 1:

When true, finding neighboring cells grows linearly, necessary for the Cell Lists to be efficient compared to Brute Force. Note that higher-dimensional spaces require exponentially higher numbers of non-empty cells $\hat{n}$ to satisfy the condition.

### Worst-Case Instances

If one cell contains all the points, we have $\hat{n}=1,$ and $n_{1}=n,$ the number of iterations equal to Brute Force

$$M(1)=\textcolor{darkblue}{B(n_1)}+\textcolor{darkred}{0}=B(n). \tag{8}$$

If all points are in unique cells and we do not satisfy the dimensionality condition, we have $\hat{n}=n$ and $n_1,n_2,…n_{\hat{n}}=1$ and $\hat{n}ā¤3^d,$ the number of iterations equals to Brute Force

$$M(n)=\textcolor{darkblue}{\hat{n}}+\textcolor{darkred}{B(\hat{n})}=B(n)+n. \tag{9}$$

### Curse of Dimensionality

Cell Lists also suffers from the curse of dimensionality. As we increase the dimension, the number of possible cells grows exponentially. That is,

$$ā_{i=1}^d \left(\frac{l_i}{r}\right) ā„ \left(\frac{l}{r}\right)^d, \tag{10}$$

where $l_1,l_2,…,l_{d}$ are the lengths of the minimum bounding box and $\min\{l_1,l_2,…,l_{d}\}=l>r$ is the minimum length. Therefore, if we draw two random cells from a non-exponential probability distribution where possible cells are its states, the probability that they are the same cell decreases exponentially as the dimension grows. Consequently, the number of non-empty cells approaches the number of points $\hat{n}ān,$ and the dimensionality condition requires larger $\hat{n}$ to satisfy. Thus the performance approaches the worst-case.

### Practical Instances

In many practical instances, we can assume a limit to how many points can occupy a cell. For example, two and three-dimensional physical simulations particles have finite size and non-overlapping and satisfy the dimensionality condition.

For a problem instance that satisfies the dimensionality condition $\hat{n}>3^d,$ and the number of points per cell is limited by $1ā¤n_1,n_2,…,n_{\hat{n}}ā¤\bar{n}$ where $\bar{n}$ is independent of $n.$ By assuming that every cell is filled $n_1,n_2,…,n_{\hat{n}}=\bar{n}$, we have $n=\hat{n}ā \bar{n}$, and we obtain an upper bound for the number of iterations

With fixed dimension and maximum density, the number of iterations is linearly proportional to the number of points $O(n)$. The constant term can be large; therefore, we should always measure the Cell Lists’ performance against the Brute Force on real machines with realistic problem instances.

## Multithreaded Cell Lists Algorithm

### Multithreading

We can run the Cell Lists algorithm faster by using multithreading, which is a form of parallelism. Generally, we can think of parallelizing algorithms through divide and conquer. However, we do not need nested parallelism for Cell Lists. Therefore, we only consider a single step of divide and conquer, as follows:

*Divide*: Partition input into multiple tasks. Different tasks should not write data to the same memory locations to avoid race conditions.*Conquer*: Schedule tasks to run each task on available threads.*Combine*: After all tasks have finished, combine the results by reducing them into one with a binary operation.

We use the same performance measure for multithreaded algorithms as *Introduction to Algorithms* ^{5}. The book defines the running **time** $T_P$ of multithreaded computation using $P$ threads, **work** $T_1$ as the time to execute all tasks using a single thread, and **span** $T_ā$ as the time of the longest-running strand of computation. **Speedup** $T_1/T_P$ of a multithreaded algorithm tells how many times faster the computation is on $P$ threads than a single thread. **Parallelism** $T_1/T_ā$ gives an upper bound to the possible speedup $$T_1/T_Pā¤T_1/T_ā < P.$$

The construction of `CellList`

is easy to divide into near equal-sized tasks, making it easy to parallelize efficiently. However, parallelizing `near_neighbors`

efficiently is more complicated because we do not know the task sizes before iteration. We will use Julia’s
Multithreading library.

### Multithreaded Constructor

```
using Base.Threads
function Base.merge(c1::CellList{d}, c2::CellList{d}) where d
CellList{d}(merge(vcat, c1.data, c2.data))
end
function CellList(p::Array{T, 2}, r::T, ::Val{:parallel}) where T <: AbstractFloat
t = nthreads()
n, d = size(p)
# Divide
cs = cumsum(fill(fld(n, t), t-1))
parts = zip([0; cs], [cs; n])
# Conquer
results = Array{CellList{d}, 1}(undef, t)
@threads for (i, (a, b)) in collect(enumerate(parts))
results[i] = CellList(p[(a+1):b, :], r, a)
end
# Combine
return reduce(merge, results)
end
```

The multithreaded constructor divides the points `p`

into equal-sized parts, and for each part, it spawns a thread and constructs a `CellList`

. Finally, it merges the resulting `CellList`

s into one.

### Multithreaded Near Neighbors

```
using Base.Threads
import Base.Threads.@spawn
function greedy_partition(m::Vector{Int}, n::Int)
sizes = zeros(Int, n)
subsets = [Vector{Int}() for _ in 1:n]
for i in reverse(sortperm(m))
_, j = findmin(sizes)
sizes[j] += m[i]
push!(subsets[j], i)
end
return subsets
end
function near_neighbors_part(c::CellList{d}, data, p, r, offsets) where d
ps = Vector{Tuple{Int, Int}}()
for (cell, is) in data
# Pairs of points within the cell
brute_force!(ps, is, p, r)
# Pairs of points with non-empty neighboring cells
for offset in offsets
neigh_cell = cell + offset
if haskey(c.data, neigh_cell)
@inbounds js = c.data[neigh_cell]
brute_force!(ps, is, js, p, r)
end
end
end
return ps
end
function p_near_neighbors(c::CellList{d}, p::Array{T, 2}, r::T; t::Int=nthreads()) where d where T <: AbstractFloat
@assert t ā„ 1
offsets = neighbors(d)
data = collect(c.data)
subsets = greedy_partition(@.(length(values(data))^2), t)
tasks = Array{Task}(undef, t)
@sync for (i, subset) in collect(enumerate(subsets))
@async tasks[i] = @spawn near_neighbors_part(c, data[subset], p, r, offsets)
end
pts = fetch.(tasks)
return reduce(vcat, pts)
end
```

The number of iterations done by the brute `brute_force!`

is quadratically proportional to the input size. Therefore, we divide the iterations into `t`

equal-sized parts by the quadratic number of points in each cell, aiming to balance the workload per task. By increasing `t`

, we can decrease span and improve parallelism, but with a cost of increased overhead. We should choose `t`

as a multiple of the number of threads available, for example, `t=3*nthreads()`

.

### Usage

We can use the multithreaded algorithms for Cell Lists as follows:

```
using Base.Threads
n, d = 10000, 2
r = 0.05
p = rand(n, d)
c = CellList(p, r, Val(:parallel))
indices = p_near_neighbors(c, p, r; t=3*nthreads())
```

If `p_near_neighbors`

is not performing well, you can use the series version `near_neighbors`

instead. We can also iterate over the near neighbors using multithreading.

```
@threads for (i, j) in indices
# perform computations on near neighbors p[i, :] and p[j, :]
end
```

When running your code, remember to set the number of threads before executing Julia.

```
export JULIA_NUM_THREADS=2 && julia
```

## Benchmarks

### Method

We benchmarked each algorithm against multiple different instances of points with fixed size $n$, dimension $d$, and radius $r.$ We draw the coordinate values of each instance of points from a uniform distribution between zero and one. We measured the performance of each instance using
BenchmarkTools.jl and selected the median time. For the resulting set of measurements, we computed the mean and quantiles with probability levels 0.01, and 0.99 to account for the variation between instances. We included the benchmarking code in the
CellLists.jl repository under `benchmarks`

directory.

### Cell Lists vs. Brute Force

The figures above show how Cell Lists compares to Brute Force as we increase dimension with a constant radius. As can be seen, Cell Lists performs better in low dimensions and beat Brute Force with a small number of points, but requires more points with a larger dimension.

### Effects of Dimensionality

The figure above shows the relation between Cell Lists with increasing dimension and constant radius. As we can see, the increase in time scales roughly to $3^d.$

### Single vs. Multithreaded Constructor

As we can see, multithreading significantly speeds up the Cell Lists construction. Dimensionality does not have significant effect on the construction time.

### Single vs. Multithreaded Near Neighbors

## Conclusions

I first encountered the fixed-radius near neighbors problem while developing the Crowd Dynamics simulation, which is mechanistically similar to molecular simulation with finite distance interactions (local interactions). The algorithms presented in this article can be used for research purposes or building small scale simulations. Large scale simulations may require additional improvements such as parallelism or modifying the Cell Lists into Verlet Lists.

## Contribute

If you enjoyed or found benefit from this article, it would help me share it with other people who might be interested. If you have any feedback, improvement suggestions, or constructive criticism, you can mention them in the comment section. For example, if you find that the article is missing something essential or has mistakes, you can suggest improvements or source material. If I decide to add the improvements, I will add attribute you and reference to the source.

For more content, check out my YouTube channel and join my newsletter . Since creating content and open-source libraries take time and effort, consider becoming a sponsor .

## References

Bentley, J. L. (1975). A Survey of techniques for fixed radius near neighbor searching, 37. ↩︎

Mount, D. (2005). Geometric Basics and Fixed-Radius Near Neighbors, 1ā8. ↩︎

Yao, Z., Wang, J. S., Liu, G. R., & Cheng, M. (2004). Improved neighbor list algorithm in molecular simulations using cell decomposition and data sorting method. Computer Physics Communications, 161(1ā2), 27ā35. https://doi.org/10.1016/j.cpc.2004.04.004 ↩︎

Hoetzlein, R. (2014). Fast Fixed-Radius Nearest Neighbors: Interactive Million-Particle Fluids. GPU Technology Conference. ↩︎

Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2009). Introduction to Algorithms, Third Edition (3rd ed.). The MIT Press. ↩︎