This is a follow up to CS Lessons #002: Data structures. In the previous article, I wrote quite a lot about time complexity and illustrated this with some simple examples. Today I want to show a little bit more complex problem. It was actually a part of national programming competition here in Poland many years ago. It is simple, yet interesting problem, because it has 3 solutions, each with different time complexity. You can find original problem description here.

## Problem definition

You are given two integer numbers `n`

and `r`

, such that `1 <= r < n <= 250`

and a two-dimensional array `W`

of size `n x n`

. Each element of this array is either `0`

or `1`

. Your goal is to compute density map D for array W, using radius of `r`

. Density map is also two-dimensional array, where each value represent number of `1s`

in matrix W within specified radius.

Given the following input array `W`

of size 5 and radius 1 (`n = 5`

, `r = 1`

)

```
1 0 0 0 1
1 1 1 0 0
1 0 0 0 0
0 0 0 1 1
0 1 0 0 0
```

the density map looks like this:

```
3 4 2 2 1
4 5 2 2 1
3 4 3 3 2
2 2 2 2 2
1 1 2 2 2
```

For the same input with radius 4 (`n = 5`

, `r = 4`

), the density map is:

```
9 9 9 9 9
9 9 9 9 9
9 9 9 9 9
9 9 9 9 9
9 9 9 9 9
```

If this is not clear, take a look at picture below.

This is Minesweeper game. Whenever you left-click on a field you will see how many mines are in the adjacent fields. This is almost the same as our density map. The only exception is that radius is always 1. And you will blow up when you click on a mine.

## Testing

Before you read on, I encourage you to try to solve this problem yourself and come back later to check you solution. You can download tests for this problem here. Each test contains two files `map*.in`

and `map*.out`

. Input file looks like this:

```
5 1
1 0 0 0 1
1 1 1 0 0
1 0 0 0 0
0 0 0 1 1
0 1 0 0 0
```

First line contains two integer, first one is size of an array `n`

and second one is radius `r`

. The rest is input array `W`

. Output file contains just density map `D`

:

```
3 4 2 2 1
4 5 2 2 1
3 4 3 3 2
2 2 2 2 2
1 1 2 2 2
```

So if you want to make your program to work with those test, you need to read input data from file with this specified format and save output data to output file. You can use script below to run you solution against all tests.

```
require "benchmark"
(0..13).each do |index|
# This test is not included in zip package
next if index == 10
time = Benchmark.realtime do
`ruby solution.rb map_tests/map#{index}.in map.out`
end
expected = File.read("map_tests/map#{index}.out")
actual = File.read("map.out")
result = actual == expected ? "Passed" : "Failed"
formatted_time = "%0.3f" % time
puts "Test %02d: %s, time: %0.4fs" % [index, result, time]
end
```

Save this code as `harness.rb`

. Put your solution in the same directory and rename it to `solution.rb`

. Unzip downloaded test archive in the same directory as well. Run `ruby harness.rb`

.

## Preparations

First we need couple of helper methods, for reading input from file, writing output etc. Those methods will be used by all different solutions.

```
class DensityMapSolver
attr_reader :size, :radius, :input
def initialize(input_filename, output_filename)
@input_filename = input_filename
@output_filename = output_filename
@input = []
end
# Read input data from input file.
def read_input_data
File.open(@input_filename) do |file|
file.each.with_index do |line, index|
# If this is a first line, read array size n and radius r
if index == 0
@size, @radius = line.split(" ").map(&:to_i)
else
@input.push(line.split.map(&:to_i))
end
end
end
end
# Write computed density map to output file
def write_output_data
File.open(@output_filename, "w") do |file|
@output.each do |row|
file.puts(row.join(" "))
end
end
end
# Get the value at position (x, y) or 0 if (x, y) is outside array range
def get(array, x, y)
if x >= 0 && x < size && y >= 0 && y < size
array[y][x]
else
0
end
end
# Set the value at position (x, y) or no-op if (x, y) is outside array range
def set(array, x, y, value)
if x >= 0 && x < size && y >= 0 && y < size
array[y][x] = value
end
end
# Initialize two-dimensional array
def init_empty_array
(0...size).map { [] }
end
# Computes sum of all elements that are in specified range,
# from (x1, y1) to (x2, y2). At the beginning it limits the range
# to only elements that are inside array range.
def compute_sum(array, x1, x2, y1, y2)
sum = 0
x1 = [0, x1].max
x2 = [x2, size - 1].min
y1 = [0, y1].max
y2 = [y2, size - 1].min
(x1..x2).each do |x|
(y1..y2).each do |y|
sum += get(array, x, y)
end
end
sum
end
end
```

Let's have a look at `compute_sum`

method. If we invoke this method on our example input matrix with arguments `compute_sum(input, 1, 3, 1, 3)`

it will compute the sum of all elements marked with the star:

```
1 0 0 0 1
1 * * * 0
1 * * * 0
0 * * * 1
0 1 0 0 0
```

Invoking with `compute_sum(input, -1, 1, -1, 1)`

:

```
* * 0 0 1
* * 1 0 0
1 0 0 0 0
0 0 0 1 1
0 1 0 0 0
```

As you can see it ignores all elements that lie outside of array range. This will make all solutions much simpler as we don't have to worry about this issue. The running time of this method is proportional to the size of the passed range. In the worst case you can compute the sum of all elements in the array by invoking `compute_sum(input, 0, size, 0, size)`

, where `size`

is the size of our input array. So the time complexity of this method is `O(n^2)`

.

## Naive solution

The naive solution is very simple. All we need to do is to calculate for each element of output array how many `1`

are in the input array within specified radius. For example, if we are calculating value for position (3,3) and the radius is 2, we need to sum all elements from position (1, 1) to (5, 5). This means invoking `compute_sum(array, 1, 5, 1, 5)`

.

```
def naive_solution
# call to empty method, it generates two-dimensional array filled with zeroes
out = empty
size.times do |x|
size.times do |y|
# calculate sum of elements in input array with specified radius
set(out, x, y, compute_sum(input, x - radius, x + radius, y - radius, y + radius))
end
end
print(out)
end
```

So how fast is this solution? Well, not much. We have two main loops that iterate over output array, each one iterates over 0 to n - 1. So the time complexity of those two loops is O(n^2). Now inside this inner loop we are calculating the sum of elements using `compute_sum`

method. We already know that this method has time complexity of `O(n^2)`

, so the total time complexity is `O(n^2 * n^2)`

, which is `O(n^4)`

.

## Better solution

The problem with naive solution is that it sums the same values multiple times. Let's assume that we are calculating sum for middle element at position `(2, 2)`

by invoking `compute_sum(input, 1, 3, 1, 3)`

```
1 0 0 0 1
1 * * * 0
1 * * * 0
0 * * * 1
0 1 0 0 0
```

And then we want to compute sum for element at position `(3, 2)`

by invoking `compute_sum(input, 2, 4, 1, 3)`

```
1 0 0 0 1
1 1 * * *
1 0 * * *
0 0 * * *
0 1 0 0 0
```

By looking at the starred values, you can see that we need to sum 6 values again. This is a huge waste of time, and it will be more obvious when you have bigger value of radius `r`

. As you can see the sum for new field does not include values inside left column from `(1, 1)`

to `(1, 3)`

and includes values inside right column from `(4, 1)`

to `(4, 3)`

. So the sum for new field is just value for the previous field on the left, minus left column and plus right column. The same solution can be applied when moving down. It will the sum for previous field, minus top column and plus bottom column. Here is the implementation.

```
def run_better_solution
@output = init_empty_array
(0...size).each do |y|
if y == 0
set(@output, 0, 0, compute_sum(input, 0 - radius, 0 + radius, 0 - radius, 0 + radius))
else
top_row = compute_sum(input, 0, radius, y - radius - 1, y - radius - 1)
bottom_row = compute_sum(input, 0, radius, y + radius, y + radius)
set(@output, 0, y, get(@output, 0, y - 1) - top_row + bottom_row)
end
(1...size).each do |x|
left_column = compute_sum(input, x - radius - 1, x - radius - 1, y - radius, y + radius)
right_column = compute_sum(input, x + radius, x + radius, y - radius, y + radius)
set(@output, x, y, get(@output, x - 1, y) - left_column + right_column)
end
end
end
```

The most time consuming operation in this solution is the nested loop, which is executed `O(n^2)`

times. Inside this loop we calculate the sum of left and right column. Since both can have maximum size of `n`

, then time complexity for this loop is `O(n)`

. Combining this together we have time complexity `O(n^3)`

. Much better than naive solution.

## Optimal solution

We can still do better than `O(n^3)`

. We can use the same idea with calculating sums from previous solution, but this time when computing sum for particular field we will sum only values that only lie in the same row within specified radius `r`

. We need to store all those sums in temporary array.

Here is an illustration of how bounding box is moving in the input array. Star indicate fields that are summed up.

```
# Calculating sum for (0,0), radius 1, sum is 1
* * 0 0 1
1 1 1 0 0
1 0 0 0 0
0 0 0 1 1
0 1 0 0 0
# Calculating sum for (1,0), radius 1, sum is 1
* * * 0 1
1 1 1 0 0
1 0 0 0 0
0 0 0 1 1
0 1 0 0 0
# Calculating sum for (2,0), radius 1, sum is 0
1 * * * 1
1 1 1 0 0
1 0 0 0 0
0 0 0 1 1
0 1 0 0 0
# Calculating sum for (3,0), radius 1, sum is 1
1 0 * * *
1 1 1 0 0
1 0 0 0 0
0 0 0 1 1
0 1 0 0 0
# Calculating sum for (4,0), radius 1, sum is 1
1 0 0 * *
1 1 1 0 0
1 0 0 0 0
0 0 0 1 1
0 1 0 0 0
# And then the same for next row ...
```

And here is the result:

```
# sums only for rows
1 1 0 1 1
2 3 2 1 0
1 1 0 0 0
0 0 1 2 2
1 1 1 0 0
```

Now we need to repeat this operation using temporary matrix as input and to only sum values in columns. Below is an illustration of this process:

```
# Calculating sum for (0,0), radius 1, sum is 3
* 1 0 1 1
* 3 2 1 0
1 1 0 0 0
0 0 1 2 2
1 1 1 0 0
# Calculating sum for (0,1), radius 1, sum is 4
* 1 0 1 1
* 3 2 1 0
* 1 0 0 0
0 0 1 2 2
1 1 1 0 0
# Calculating sum for (0,2), radius 1, sum is 3
1 1 0 1 1
* 3 2 1 0
* 1 0 0 0
* 0 1 2 2
1 1 1 0 0
# Calculating sum for (0,3), radius 1, sum is 2
1 1 0 1 1
2 3 2 1 0
* 1 0 0 0
* 0 1 2 2
* 1 1 0 0
# Calculating sum for (0,4), radius 1, sum is 1
1 1 0 1 1
2 3 2 1 0
1 1 0 0 0
* 0 1 2 2
* 1 1 0 0
# And the the same for next column ...
```

Here is the result:

```
3 4 2 2 1
4 5 2 2 1
3 4 3 3 2
2 2 2 2 2
1 1 2 2 2
```

And we have correct answer. Here is the implementation of this solution.

```
def run_optimal_solution
@temp = init_empty_array
size.times do |y|
sum = compute_sum(input, 0, radius, y, y)
set(@temp, 0, y, sum)
1.upto(size - 1) do |i|
sum = sum - get(input, i - radius - 1, y) + get(input, i + radius, y)
set(@temp, i, y, sum)
end
end
@output = init_empty_array
size.times do |x|
sum = compute_sum(@temp, x, x, 0, radius)
set(@output, x, 0, sum)
1.upto(size - 1) do |i|
sum = sum - get(@temp, x, i - radius - 1) + get(@temp, x, i + radius)
set(@output, x, i, sum)
end
end
end
```

So first we are iterating over input array by using nested loop. This loop is executed `O(n^2)`

times, but code inside is constant. Because we are only interested in sum of values in this particular row, to calculate sum for next field we can just take sum from previous field, subtract one value on the left and add one value on the right. Next we repeat the same process for calculating sums for columns. Once again this nested loop is executed `O(n^2)`

times and code inside a loop is `O(1)`

. Total time complexity is `(2 * n^2)`

, which is just `O(n^2)`

.

## Bonus: Official solution

I want to show one more way how to solve this problem. It is the official solution, described by the author of this problem. It is quite simple and very elegant. It is better to explain it using one-dimensional array.

Assume you have an input array `[0, 1, 1, 0, 1, 1, 0, 1, 0, 1]`

and you want to quickly tell what is the sum of elements within specified range, ie. from 0 to 9 or from 4 to 7. What we can do is to create another array `S`

with cumulative sums, so each elements is the sum of all elements from 0 to this particular one. This array will look like this `[0, 1, 2, 2, 3, 4, 4, 5, 5, 6]`

. Now we can represent sum of any range using this simple formula, `sum(x, y) = sum(0, y) - sum(0, x - 1)`

. For example `sum(4, 7) = sum(0, 7) - sum(0, 3)`

. To get the value of `sum(0, x)`

we can just look it up inside array `S`

, `sum(4, 7) = S[7] - S[3]`

, which is `5 - 2 = 3`

. Using array `S`

we can calculate sum of elements from any range by doing at most two lookups and then subtracting them. This is a constant time.

Now, we need to generalize this solution to two-dimensional array. We need to create array `S`

where `S[x][y]`

will the sum of all elements from position `(0, 0)`

to `(x, y)`

. It will look like this for original input array:

```
# original input matrix
1 0 0 0 1
1 1 1 0 0
1 0 0 0 0
0 0 0 1 1
0 1 0 0 0
# array S
1 1 1 1 2
2 3 4 4 5
3 4 5 5 6
3 4 5 6 8
3 5 6 7 9
```

Now let's look at some examples of how we can use array S:

```
# We want to calculate sum of elements from (2, 0) to (4, 2), all starred elements
1 0 * * *
1 1 * * *
1 0 * * *
0 0 0 1 1
0 1 0 0 0
# First we can lookup sum of elements from (0, 0) to (4, 2) inside S[4][2]
* * * * *
* * * * *
* * * * *
0 0 0 1 1
0 1 0 0 0
# And then we need to subtract elements from (0, 0) to (1, 2), all elements marked with -
# We can lookup sum of those elements inside S[1][2]
- - 0 0 1
- - 1 0 0
- - 0 0 0
0 0 0 1 1
0 1 0 0 0
# Result
S[4][2] = 6
S[1][2] = 4
# Sum of elements from (2, 0) to (4, 2) is 6 - 4 = 2
```

We have correct answer and we calculated it by using only two lookups and one subtraction, so in constant time. Things can get a little more complicated when we want to calculate this for range in the middle of an array.

```
# We want to calculate sum of elements from (2, 2) to (4, 4), all starred elements
1 0 0 0 1
1 1 1 0 0
1 0 * * *
0 0 * * *
0 1 * * *
# First we can lookup sum of elements from (0, 0) to (4, 4), which is basically sum of all elements in the array
# We can lookup this value in S[4][4]
* * * * *
* * * * *
* * * * *
* * * * *
* * * * *
# And now we need to subtract all elements marked with -
- - - - -
- - - - -
- - 0 0 0
- - 0 1 1
- - 0 0 0
```

Now we have a problem, because we cannot simply lookup the sum for the whole region marked with `-`

. We need to split it in two pieces.

```
# Horizontal piece, we can lookup this value inside S[4][1]
- - - - -
- - - - -
1 0 0 0 0
0 0 0 1 1
0 1 0 0 0
# And vertical piece, we can lookup this value inside S[1][4]
- - 0 0 1
- - 1 0 0
- - 0 0 0
- - 0 1 1
- - 0 0 0
```

Now we can subtract those two pieces from the total sum, but there is another problem. Note that those two pieces have 4 elements in common, so when we subtract both piecec we actually subtracted this small common region two times. We need to fix this by adding it back again. Fortunately we can also lookup value for this region inside array `S`

.

```
# Here is this common region marked with plus sign, we can lookup it up inside S[1][1]
+ + 0 0 1
+ + 1 0 0
1 0 0 0 0
0 0 0 1 1
0 1 0 0 0
```

So the final formula is `sum(x1, x2, y1, y2) = S[x2][y2] - S[x2][y1 - 1] - S[x1 - 1][y2] + S[x1 - 1][y1 - 1]`

. Here is the implementation.

```
def run_official_solution
sums = init_empty_array
size.times do |y|
row_sum = 0
size.times do |x|
row_sum += get(@input, x, y)
top = y == 0 ? 0 : get(sums, x, y - 1)
set(sums, x, y, row_sum + top)
end
end
@output = init_empty_array
size.times do |y|
size.times do |x|
x1 = [0, x - radius].max
x2 = [x + radius, size - 1].min
y1 = [0, y - radius].max
y2 = [y + radius, size - 1].min
value = get(sums, x2, y2) - get(sums, x2, y1 - 1) - get(sums, x1 - 1, y2) + get(sums, x1 - 1, y1 - 1)
set(@output, x, y, value)
end
end
end
```

The time complexity of this solution is also `O(n^2)`

. First we need to calculate array S which takes `O(n^2)`

time, because we need to iterate over all input array. To calculate value for new field we can just use the value from field on the left and from the field above. Then we can compute our output density map. Again we need to iterate over whole array which takes `O(n^2)`

iterations, but in each iteration we just do 4 lookups in array S, two subtraction, one addition and then setting the value. So this is constant time.

## Summary

At the end let's run all those solutions and compare how they perform. Here is simple snippet that you need to add to actually invoke those solutions:

```
# We are passing input and output filenames as parameters to ruby script
solver = DensityMapSolver.new(ARGV[0], ARGV[1])
solver.read_input_data
# Just uncomment one of those
# solver.run_naive_solution
# solver.run_optimal_solution
# solver.run_better_solution
# solver.run_official_solution
solver.write_output_data
```

Here are results on my computer:

```
# Naive solution, O(n^4)
Test 00: Passed, time: 0.0478s
Test 01: Passed, time: 0.0520s
Test 02: Passed, time: 0.0501s
Test 03: Passed, time: 0.0723s
Test 04: Passed, time: 0.0732s
Test 05: Passed, time: 7.6287s
Test 06: Passed, time: 19.8851s
# I stopped at this point, it would probably took minutes or hours from now on
# Better solution, O(n^3)
Test 00: Passed, time: 0.0474s
Test 01: Passed, time: 0.0607s
Test 02: Passed, time: 0.0545s
Test 03: Passed, time: 0.0584s
Test 04: Passed, time: 0.0701s
Test 05: Passed, time: 0.1855s
Test 06: Passed, time: 0.5073s
Test 07: Passed, time: 1.0138s
Test 08: Passed, time: 0.5545s
Test 09: Passed, time: 1.5681s
Test 11: Passed, time: 1.7348s
Test 12: Passed, time: 2.2292s
Test 13: Passed, time: 2.1215s
# Optimal solution, O(n^2)
Test 00: Passed, time: 0.0472s
Test 01: Passed, time: 0.0538s
Test 02: Passed, time: 0.0589s
Test 03: Passed, time: 0.0486s
Test 04: Passed, time: 0.0536s
Test 05: Passed, time: 0.0631s
Test 06: Passed, time: 0.0919s
Test 07: Passed, time: 0.1022s
Test 08: Passed, time: 0.1233s
Test 09: Passed, time: 0.1303s
Test 11: Passed, time: 0.1507s
Test 12: Passed, time: 0.1591s
Test 13: Passed, time: 0.1414s
# Official solution, O(n^2)
Test 00: Passed, time: 0.0469s
Test 01: Passed, time: 0.0539s
Test 02: Passed, time: 0.0488s
Test 03: Passed, time: 0.0516s
Test 04: Passed, time: 0.0590s
Test 05: Passed, time: 0.0827s
Test 06: Passed, time: 0.0993s
Test 07: Passed, time: 0.1399s
Test 08: Passed, time: 0.1962s
Test 09: Passed, time: 0.1995s
Test 11: Passed, time: 0.1936s
Test 12: Passed, time: 0.1899s
Test 13: Passed, time: 0.1963s
```

## Summary

It ended as a another long post, but I hope you enjoyed it. As you can see faster solution is not always that much complicated comparing to naive solution. And the difference in the running time is huge.

PS. You can find the source code with tests file in this repo.