Shuffle-athon

Posted on November 10, 2017

A modest problem was posed the other day.

A pair of my fellow undergrads were presenting their paper on estimating disease survivability in moving populations using Monte Carlo. It was a short course, so the research and presentations were kept relatively simple. They setup their populations as a one-dimensional arrays and simulated people moving around by iterating through the array and based on a probability, removed people and inserted them in different positions. The order of the people who didn’t move didn’t change, so they had to shift every person after the insertion spot one index over.

This is reminiscent of the insertion step in insertion sort with the elements being incomparable, which has a time complexity of O(n)O(n). The lecturer tasked with grading our presentations quickly saw that their shuffling method would be a bottleneck when doing simulations and questioned the efficiency of their approach.
Illustration of their shuffling strategy

This situation requires fast insertions and deletions, where the relative order of elements not involved with the insertion or deletion don’t change. This is exactly what a linked list can give us. A linked list’s drawback, the inefficient indexing, matters little here. The only time indexing would be useful is when picking the people who should move and where they should move to, but this can be done in a constant number of passes. A possibly useful algorithm which we can use is reservoir sampling. Reservoir sampling allows you to pick random elements from an array in a single pass where each element has an equal chance of being picked.

I’ve posted a code snippet below implementing a minimal doubly linked list using reservoir sampling for population movement in O(n)O(n) time complexity.
from typing import Any
import random
import math


class LLNode(object):
    def __init__(self, value: Any, next_node: "LLNode" = None,
        prev_node: "LLNode" = None):
        self.value = value
        self.next_node = next_node
        self.prev_node = prev_node

    def __repr__(self):
        return str(self.value)


class LinkedList(object):
    def __init__(self):
        self.head = LLNode(None)
        self.tail = LLNode(None, prev_node=self.head)
        self.head.next_node = self.tail
        self.count = 0

    def shuffle(self):
        number_of_elements = min(10, max(0, math.floor(
                random.normalvariate(0.35, 0.1705) * self.count)))
        temp_list = self.reservoir_sample(number_of_elements)

        for node in temp_list:
            self.__delete(node)

        random.shuffle(temp_list)
        new_locations = self.reservoir_sample(number_of_elements, True)
        for location in new_locations:
            self.__insert(temp_list.pop(), location.prev_node, location)

    # Reservoir sampling with the possibility of the
    # same element being chosen multiple times.
    # This is done by iterating over the same element
    # multiple times while still increasing the index
    def reservoir_sample(self, sample_size: int,
        allow_same_location: bool = False) -> [LLNode]:
        new_array = []
        current_node = self.head
        number_of_iterations = 1
        i = 0
        if allow_same_location:
            number_of_iterations = sample_size
        while current_node.next_node.value is not None:
            if len(new_array) < sample_size:
                new_array.extend([current_node.next_node] * number_of_iterations)
                i += number_of_iterations
            else:
                for j in range(number_of_iterations):
                    if random.randint(1, min(2, i)) > 1:
                        new_array[random.randint(0, len(new_array) - 1)] = current_node.next_node
                        i += 1
            current_node = current_node.next_node
        return new_array

    def append(self, new_val: Any):
        new_node = LLNode(new_val)
        self.__insert(new_node, self.tail.prev_node, self.tail)
        self.count += 1

    def traverse(self) -> [LLNode]:
        result = []
        current = self.head.next_node
        while current.value is not None:
            result.append(current.value)
            current = current.next_node
        return result

    @staticmethod
    def __insert(new_node, prev_node: LLNode, next_node: LLNode):
        new_node.next_node, next_node.prev_node = next_node, new_node
        new_node.prev_node, prev_node.next_node = prev_node, new_node

    @staticmethod
    def __delete(node: LLNode):
        prev_node, next_node = node.prev_node, node.next_node
        prev_node.next_node, next_node.prev_node = next_node, prev_node