Data Structure & Algorithm Notes

Big O

  • describes the computational complexity of an algorithm
  • always relative to input size
  • split into 2 parts
    1. space - amount of memory allocated
    2. time - how the number of operations changes as the input size changes

what is the time complexity of the following?

for (let i = 0; i < arr.length; i++) {
    for (let j = i; j < arr.length; j++) {
        console.log(arr[i] + arr[j]);
    }
}
Answer: O(n^2)
  1. the inner loop runs n times on the first iteration
  2. then n-1 times, then n-2, and so on
  3. the sum n + (n-1) + (n-2) + ... + 1 is a series equal to n*(n+1)/2, which is O(n^2)
  • O(1) - no matter the input size, the algorithm uses a constant number of operations
  • O(log n) - somewhere in the algorithm, the input is being reduced by a percentage (e.g., halved) at every step

Recursion

  • every recursive algorithm can be converted to an iterative one and vice versa, but the solution is often more elegant in one form
  • uses the call stack to mimic loops
  • needs a base case (a condition at the start of the recursive function that terminates the calls)
  • use it to break a problem down into subproblems whose solutions can be combined to solve the original problem (see the sketch below)
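a minimal sketch of these ideas (sum is just an illustrative name):
let sum = (arr, i = 0) => {
    if (i === arr.length) return 0;  // base case terminates the calls
    return arr[i] + sum(arr, i + 1); // combine arr[i] with the subproblem's answer
};

sum([1, 2, 3, 4]); // 10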

For vs While

  • I used to think they were mostly the same, but it's important to understand the nuance between the two to solve DSA problems
  • for - rigid; while - free-flowing

visualization

for

  • works like a staircase; you start from the bottom, you step up, and the number of steps is fixed
  • nature - pulsating & rhythmic, like a wall clock

while

  • works like rolling a die; you keep throwing until a condition is met

Array & Strings

  • array -> static/dynamic; default assumption: dynamic array
  • string -> mutable/immutable; default assumption: immutable
  • a string is basically an array of characters

Two Pointers Technique 🔄

intro

  • very common, used to solve array/string problems
  • pointers - 2
  • iterables - 1 or 2

variation 1: 1 iterable

function fn(arr):
    left = 0
    right = arr.length - 1

    while left < right:
        // Do some logic here depending on the problem
        // Do some more logic here to decide on one of the following:
            1. left++
            2. right--
            3. Both left++ and right--

strength:

  • we will never have more than O(n) iterations

uses:

  • effective for comparing pairs of elements
  • finding a subarray/substring, reversing an array (see the sketch below)
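for instance, reversing an array (one of the problems listed below) falls straight out of this template; a minimal sketch:
let reverse = arr => {
    let left = 0;
    let right = arr.length - 1;
    while (left < right) {
        [arr[left], arr[right]] = [arr[right], arr[left]]; // swap the pair
        left++;
        right--;
    }
    return arr;
};

reverse([1, 2, 3, 4]); // [4, 3, 2, 1]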

variation 2: 2 iterables

function fn(arr1, arr2):
    i = j = 0
    while i < arr1.length AND j < arr2.length:
        // Do some logic here depending on the problem
        // Do some more logic here to decide on one of the following:
            1. i++
            2. j++
            3. Both i++ and j++

    // Step 4: make sure both iterables are exhausted
    // Note that only one of these loops would run
    while i < arr1.length:
        // Do some logic here depending on the problem
        i++

    while j < arr2.length:
        // Do some logic here depending on the problem
        j++

strength

  • will never have more than O(m+n) iterations
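the notes don't tie this template to a specific problem, but merging two sorted arrays is a standard illustration of it; a sketch:
let merge = (arr1, arr2) => {
    let ans = [];
    let i = 0, j = 0;
    while (i < arr1.length && j < arr2.length) {
        if (arr1[i] <= arr2[j]) {
            ans.push(arr1[i++]);
        } else {
            ans.push(arr2[j++]);
        }
    }
    // make sure both iterables are exhausted; only one of these runs
    while (i < arr1.length) ans.push(arr1[i++]);
    while (j < arr2.length) ans.push(arr2[j++]);
    return ans;
};

merge([1, 3, 5], [2, 4]); // [1, 2, 3, 4, 5]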

other variations

  • in variation 1, we started the pointers at the first and last index. some problems might require starting at other indices
  • in variation 2, we initialized the pointers at 0 and ran them on 2 different inputs; sometimes there is only 1 input, but we still initialize both pointers at index 0 and move both of them forward (two pointers is just a technique)

Problems to solve

  1. https://leetcode.com/problems/reverse-string/description/
  2. https://leetcode.com/problems/squares-of-a-sorted-array/description/

variables

  • left, right (single array)
  • i, j (2 arrays)

Sliding Window 🖼️

intro

  • implemented using 2 pointers
  • extremely common and versatile
  • a subarray is a run of adjacent elements, so it can be defined by just 2 indices, left and right
  • another name for a subarray in this context is a window

identifying sliding window problem

  1. the problem will define a "valid" subarray
    • constraint metric - some attribute of the subarray (examples - sum of the subarray, number of unique elements in the subarray)
    • a numeric restriction on the constraint metric - e.g., the sum of the subarray should be at most 7
  2. the problem will ask you to find valid subarrays
    • most common - find the best valid subarray (e.g., the longest), or count the total number of valid subarrays

examples

  • Find the longest subarray with a sum less than or equal to k
  • Find the longest substring that has at most one "0"
  • Find the number of subarrays that have a product less than k

the algorithm

  • initially we have left = right = 0, which means the first subarray we look at is just the first element
  • then we want to expand the size of our window; this is done by incrementing right
  • but what if, after adding a new element, the subarray becomes invalid? in that case we remove elements from our window until it becomes valid again; to remove, we increment left, which shrinks the window

build the algorithm

  1. window is invalid -> while (curr > k): curr -= arr[left]; left++
  2. length of window = right - left + 1

concepts

  • the window grows as large as it can, then it shrinks
  • this works because the input only has positive integers (removing an element always decreases the sum)
  • why is it okay to forget about elements removed from the window? they were already considered in the past, while the window still contained them

nuances

  • left can be greater than right, but only by 1 position. in that case the length (right - left + 1), e.g. (0 - 1 + 1), is 0

visualization

  • a modified snake game: the snake keeps growing
  • the snake moves right, ingests what it finds, then checks whether it needs to excrete something or can keep it in its tummy
  • the 2 loops give us 2 degrees of freedom
    • its direction is the outer for loop
    • its body expands or contracts based on a condition (the inner while loop)
function fn(arr):
    left = 0
    for (int right = 0; right < arr.length; right++):
        Do some logic to "add" element at arr[right] to window

        while WINDOW_IS_INVALID:
            Do some logic to "remove" element at arr[left] from window
            left++

        Do some logic to update the answer
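putting the template together for the first example above (longest subarray with sum <= k); a sketch assuming the input has only positive integers:
let longestSubarray = (nums, k) => {
    let left = 0, curr = 0, ans = 0;
    for (let right = 0; right < nums.length; right++) {
        curr += nums[right];    // "add" arr[right] to the window
        while (curr > k) {      // window is invalid
            curr -= nums[left]; // "remove" arr[left] from the window
            left++;
        }
        ans = Math.max(ans, right - left + 1); // update the answer
    }
    return ans;
};

longestSubarray([3, 1, 2, 7, 4, 2, 1, 1, 5], 8); // 4 -> [4, 2, 1, 1]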

why is it efficient

  • subarrays can have length n, n-1, n-2, ...; checking them all would give an O(n^2) solution, but a sliding window is O(n)
  • a sliding window guarantees a maximum of 2n window iterations - the right pointer can move n times and the left pointer can move n times
  • the outer loop runs O(n) times and the inner loop runs O(n) times across the whole algorithm -> the amortized cost of the inner loop is O(1)
    • if the while loop were to run n times on one iteration of the for loop, that would mean it wouldn't run at all for the other iterations of the for loop. this is amortized analysis - even though the worst case for one iteration inside the for loop is O(n), it averages out to O(1) when you consider the entire runtime of the algorithm
  • this shows a nested loop doesn't directly imply O(n^2)

variation 1: number of subarrays

  • when right is fixed, the number of valid subarrays ending at right equals the size of the window (right - left + 1); so at each iteration of the outer loop, add the window size to the count (see the sketch below)
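for example, counting subarrays with a product less than k (the third example above); a sketch assuming positive integers:
let numSubarrayProductLessThanK = (nums, k) => {
    if (k <= 1) return 0;
    let left = 0, curr = 1, ans = 0;
    for (let right = 0; right < nums.length; right++) {
        curr *= nums[right];
        while (curr >= k) { // shrink until the window is valid again
            curr /= nums[left];
            left++;
        }
        ans += right - left + 1; // all valid subarrays ending at right
    }
    return ans;
};

numSubarrayProductLessThanK([10, 5, 2, 6], 100); // 8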

variation 2: Fixed window

  • sometimes the problem specifies a fixed window length
  • these problems are easier (we add one element on the right and remove one element on the left to maintain the length)
function fn(arr, k):
    curr = some data to track the window

    // build the first window
    for (int i = 0; i < k; i++)
        Do something with curr or other variables to build first window

    ans = answer variable, probably equal to curr here depending on the problem
    for (int i = k; i < arr.length; i++)
        Add arr[i] to window
        Remove arr[i - k] from window
        Update ans

    return ans
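a concrete instance of the template, assuming the task is the max sum of a length-k subarray (the name maxSumFixedWindow is just illustrative):
let maxSumFixedWindow = (nums, k) => {
    let curr = 0;
    // build the first window
    for (let i = 0; i < k; i++) {
        curr += nums[i];
    }

    let ans = curr;
    for (let i = k; i < nums.length; i++) {
        curr += nums[i] - nums[i - k]; // add one on the right, remove one on the left
        ans = Math.max(ans, curr);
    }

    return ans;
};

maxSumFixedWindow([3, -1, 4, 12, -8, 5, 6], 4); // 18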

variation 3: many sliding window problems also make use of a hashmap

variable names

  • left, right, k, curr

problems

Prefix Sum 📋

  • prefix[i] = sum of all elements up to index i
  • helps us find the sum of any subarray in O(1)
  • the sum of the subarray from i to j is
    • prefix[j] - prefix[i-1]
    • or prefix[j] - prefix[i] + nums[i] (if we don't want to deal with the out-of-bounds case when i = 0)
  • it only costs O(n) to build, and all future subarray sum queries become O(1); so it usually improves an algorithm's time complexity by a factor of O(n)
  • building a prefix sum is a form of pre-processing
let prefix = [nums[0]];
for (let i = 1; i < nums.length; i++) {
    prefix.push(nums[i] + prefix[i - 1]);
}
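using it for an O(1) range-sum query (the prefix[j] - prefix[i] + nums[i] form above):
let rangeSum = (prefix, nums, i, j) => prefix[j] - prefix[i] + nums[i];

// e.g. nums = [1, 6, 3, 2, 7, 2] -> prefix = [1, 7, 10, 12, 19, 21]
// rangeSum(prefix, nums, 1, 3) -> 12 - 7 + 6 = 11 (= 6 + 3 + 2)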

variables

  • prefix

problems

common patterns

subarray vs subsequence vs subset

  • subarray: contiguous elements
  • subsequence: not necessarily contiguous
  • subset: not necessarily contiguous, order doesn't matter

solving problems

  • subarray: sliding window, prefix sum (when the input is an integer array and you need subarray sums)
  • subsequence: dynamic programming
  • subset: backtracking; you can sort the array because order doesn't matter

nature of difficulty

  • subarray: easy
  • subsequence: medium
  • subset: easy

examples

[1,2,3,4]

  • subarray: [1,2], [2,3,4]
  • subsequence: [1,3], [1], [1,2,4] (order must be preserved, so [3,2,1] is not a subsequence)
  • subset: [1,3], [1], [3,2,1] (order doesn't matter)

Hashing (using Map and Set)

Data Structures

  • 2 things to take care of in data structures - interface & implementation
  • extremely common. reduces time complexity by a factor of O(n)
  • modulo operation: num % x always lands in the range [0, x-1]
  • hash function -> converts a key to an array index (arrays have indices)
  • hash function + array = hashmap (hash table / dictionary)
  • hashmap -> maps keys to values
  • unordered; can add/remove in O(1)
  • multiple ways to handle collisions; the most common is chaining
  • how to reduce collisions? the size of the array (the modulus) should be a prime number

Set

  • use a set when you just want to check if an element exists
  • hashes the key to an integer
  • the difference between a set and a hashmap is that a set has no values
  • doesn't track frequency; add the same element 100 times and it will still contain 1 copy

hashmap in JS

let map = new Map();
map.set('a', 1);
map.get('a'); // 1
map.has('a'); // true
map.size; // 1
map.delete('a');

let set = new Set();
set.add('a');
set.has('a'); // true
set.size; // 1
set.delete('a');

key difference

set -> .add()
map -> .set()

nuances

  • string? -> make use of the fact that there are only 26 letters
  • nums? -> sum of nums, xor of nums

questions

Counting

  • counting? think about a hashmap
  • count the number of subarrays with an "exact" constraint
  • uses 3 concepts: prefixes, a hashmap & the difference between prefix values (see the sketch below)
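a sketch of the three concepts combined, for "number of subarrays whose sum is exactly k": the hashmap counts prefix sums seen so far, and the difference curr - k looks up how many earlier prefixes complete a valid subarray
let subarraySum = (nums, k) => {
    let counts = new Map();
    counts.set(0, 1); // the empty prefix
    let ans = 0, curr = 0;
    for (const num of nums) {
        curr += num;                      // prefix sum ending here
        ans += counts.get(curr - k) || 0; // earlier prefixes that leave a sum of exactly k
        counts.set(curr, (counts.get(curr) || 0) + 1);
    }
    return ans;
};

subarraySum([1, 2, 1, 2, 1], 3); // 4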

nuance

  • when thinking of a hash function, think about what is common between all the values

Linked List

  • generally the linked list is given as input
  • a few problems use a LL to optimise the solution
  • add or delete an element at any position in O(1), provided you have a direct reference to the node before that position; otherwise it takes O(n) to walk to it
  • much better than a dynamic array, which requires O(n) for add/delete at any position
  • disadvantage - no random access
  • more overhead than an array

traversal

let getSum = head => {
    let ans = 0;
    while (head) {
        ans += head.val;
        head = head.next;
    }
    return ans;
}

insert

let addNode = (prevNode, nodeToAdd) => {
    nodeToAdd.next = prevNode.next;
    prevNode.next = nodeToAdd;
}


let deleteNode = prevNode => {
    prevNode.next = prevNode.next.next;
}

doubly linked list

let addNode = (node, nodeToAdd) => {
    let prevNode = node.prev;
    nodeToAdd.next = node;
    nodeToAdd.prev = prevNode;
    prevNode.next = nodeToAdd;
    node.prev = nodeToAdd;
}

let deleteNode = node => {
    let prevNode = node.prev;
    let nextNode = node.next;
    prevNode.next = nextNode;
    nextNode.prev = prevNode;
}

fast and slow pointers

  • similar to 2 pointer technique
  • break the problem down into steps
function fn(head):
    slow = head
    fast = head

    while fast and fast.next:
        // Do something here
        slow = slow.next
        fast = fast.next.next
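for example, finding the middle node of a list (a classic fast/slow use): when fast reaches the end, slow is halfway; a sketch:
let middleNode = head => {
    let slow = head;
    let fast = head;
    while (fast && fast.next) {
        slow = slow.next;      // moves 1 step
        fast = fast.next.next; // moves 2 steps
    }
    return slow; // middle (the second middle for even-length lists)
};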

variables

  • slow, fast
  • prev, curr, temp (in case of reversal)

Reversing linked list

  • have pointers to prev, curr, and next (see the sketch below)
  • solutions to LL problems are simple and elegant
  • to solve, think about what you need and solve the problem one step at a time
  • most linked list problems don't have tricks; you just need a strong understanding of how to move pointers around. the best way to train this skill is to practice
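a sketch of moving the prev/curr/next pointers one step at a time:
let reverseList = head => {
    let prev = null;
    let curr = head;
    while (curr) {
        let next = curr.next; // save the next node before overwriting
        curr.next = prev;     // point the current node backwards
        prev = curr;          // step prev forward
        curr = next;          // step curr forward
    }
    return prev; // new head
};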

Stack

  • The characteristic that makes something a "stack" is that you can only add and remove elements from the same end

where is it used

  • whenever you can recognize the LIFO pattern
  • problems that involve elements in the input interacting with each other
  • matching elements together, or querying some property such as "how far away is the next larger element"
  • evaluating a mathematical expression given as a string
  • comparing elements against each other

Queue

  • trickier to implement than a stack
  • operations: enqueue - push; dequeue - shift
  • queues are less common than stacks, and queue problems are generally more difficult
  • most commonly used in BFS

Monotonic

  • elements are always kept sorted
  • a monotonic stack/queue maintains its sorted property by removing elements that would violate it before adding a new element
  • useful in problems where you have to find the "next" greater/smaller element (see the sketch after the template)
stack = []
for num in nums:
    while stack.length > 0 AND stack.top >= num:
        stack.pop()
    // Between the above and below lines, do some logic depending on the problem
    stack.push(num)
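for example, "how far away is the next greater element" with a monotonic stack of indices; a sketch:
let nextGreaterDistance = nums => {
    let ans = new Array(nums.length).fill(0); // 0 = no greater element exists
    let stack = []; // indices still waiting for their next greater element
    for (let i = 0; i < nums.length; i++) {
        while (stack.length > 0 && nums[stack[stack.length - 1]] < nums[i]) {
            let j = stack.pop();
            ans[j] = i - j; // nums[i] is the next greater element for nums[j]
        }
        stack.push(i);
    }
    return ans;
};

nextGreaterDistance([73, 74, 75, 71, 69, 72, 76, 73]); // [1, 1, 4, 2, 1, 1, 0, 0]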

Binary trees

  • most common type of interview question
  • makes heavy use of recursion
  • LL and trees are types of graphs
  • real life examples:
    • file systems
    • comment thread
    • company hierarchy
  • depth of a node: how far it is from the root; the root has depth 0
  • when traversing a LL we usually do it iteratively; with a tree we usually do it recursively

DFS

  • preorder, postorder, inorder
  • prioritizes depth
let dfs = node => {
    if (!node) {
        return;
    }

    dfs(node.left);
    dfs(node.right);
    return;
}
  • each function call solves and returns the answer to the original problem as if the subtree rooted at the current node was the input
  • preorder: root, left, right
    • logic is done first, then we move to the children
  • inorder: left, root, right
    • no logic is done until you reach a node without a left child, since the call on the left takes priority
  • postorder: left, right, root
    • no logic until you reach a leaf node, since the calls on the children take priority; the root is last
  • The name of each traversal is describing when the current node's logic is performed.
    • pre: before children
    • in: between children
    • post: after children
  • switching between the 3 types is trivial, but you need to know which one to use when
  • to calculate the depth/height of a tree - use postorder (see the sketch below)
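for example, calculating the max depth of a tree with postorder logic (solve the children first, combine at the root); a sketch:
let maxDepth = node => {
    if (!node) {
        return 0;
    }
    let left = maxDepth(node.left);   // answer for the left subtree
    let right = maxDepth(node.right); // answer for the right subtree
    return Math.max(left, right) + 1; // root's logic runs last (postorder)
};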

time & space complexity

  • time: O(n) (you need to visit each node once)
  • space:
    • worst - O(n) - tree can be straight line
    • best - O(log n) - balanced tree

BFS

  • DFS uses a stack (recursion uses the call stack); BFS uses iteration and a queue

when to use BFS vs DFS

  • if you have to visit every node, then it doesn't matter which one you use
  • it's rare to find a problem where DFS is better than BFS
  • implementing DFS is usually less code and easier because of recursion
    • so generally people end up using DFS
  • on the flip side, there are a few problems where BFS makes more sense (generally when dealing with levels)
  • in some cases the solution might be in the right subtree, and with DFS you would waste time visiting all the left nodes first
  • in some cases the solution might be in the last level, and with BFS you would waste time visiting all the earlier levels
  • space complexity of DFS scales with the height of the tree
  • space complexity of BFS scales with the width of the tree
  • in some cases DFS uses less space, in others BFS
  • perfect tree - BFS uses more space O(n), DFS uses less space O(log n)
  • lopsided tree (straight line) - DFS uses more space O(n), BFS uses less space O(1)
let printAllNodes = (root) => {
  let queue = [root]
  while (queue.length) {
    let nextQueue = []

    for (let i = 0; i < queue.length; i++) {
      let node = queue[i]

      // do some logic here on the current node
      console.log(node.val)

      // put the next level onto the queue
      if (node.left) {
        nextQueue.push(node.left)
      }
      if (node.right) {
        nextQueue.push(node.right)
      }
    }

    queue = nextQueue
  }
}
  • shift is way slower than pop (at least 2x); that's why the code above builds a nextQueue instead of shifting from the front

Binary Search Tree

  • a type of binary tree
  • all values in left subtree are less than root
  • all values in right subtree are greater than root
  • operations like search, insert, delete are O(log n) in the average case (see the search sketch below)
  • to print elements in sorted order, do in-order DFS traversal
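a sketch of the average-case O(log n) search, discarding half of the tree at each step:
let search = (node, target) => {
    if (!node) return false;
    if (node.val === target) return true;
    // discard the subtree that cannot contain the target
    return target < node.val ? search(node.left, target) : search(node.right, target);
};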

Graphs

  • Graphs are part of our everyday lives. Without even trying too hard, you can model literally anything as a graph.

terminology

  • Edges of a node can either be directed or undirected.
  • In binary trees, the edges are directed. Binary trees are directed graphs.
  • In binary trees, there must be only one connected component.
  • In binary trees, all nodes except the root have an indegree of 1 (from their parent). All nodes have an outdegree of 0, 1, or 2. An outdegree of 0 means the node is a leaf. Specific to trees, we use the parent/child terms instead of "neighbors".
  • A graph can be either cyclic or acyclic.

how are graphs given in an algorithm

  • In linked list problems, the head of the linked list is given. In binary tree problems, the root of the tree is given
  • An important thing to understand is that with linked lists and binary trees, you are literally given objects in memory that contain data and pointers. With graphs, the graph doesn't literally exist in memory.
  • In fact, only the "idea" of the graph exists.
  • The problem statement may or may not explicitly state the input is a graph. Sometimes there might be a story, and you need to determine that the input is a graph
  1. Array of edges
  • example: edges = [[0, 1], [1, 2], [2, 0], [2, 3]]
  • In this input format, the input will be a 2D array. Each element of the array will be in the form [x, y], which indicates that there is an edge between x and y
  • At every node, we would need to iterate over the entire input to find the neighbors. This is very slow!
  • Before starting the traversal, we can pre-process the input so that we can easily find all neighbors of any given node
  • The easiest way to accomplish this is using a hash map.
let buildGraph = edges => {
    let graph = new Map();
    for (const [x, y] of edges) {
        if (!graph.has(x)) {
            graph.set(x, []);
        }

        graph.get(x).push(y);

        // if (!graph.has(y)) {
        //     graph.set(y, []);
        // }

        // graph.get(y).push(x);
        // uncomment the above lines if the graph is undirected
    }

    return graph;
}
  2. adjacency list
  • example: graph = [[1], [2], [0, 3], []]
  • The input will be a 2D integer array.
  • graph[i] will be a list of all the outgoing edges from the ith node.
  • most convenient format
  3. adjacency matrix
  • example: graph = [[0, 1, 0, 0], [0, 0, 1, 0], [1, 0, 0, 1], [0, 0, 0, 0]]
  • 2D matrix of size n x n
  • if graph[i][j] == 1, there is an outgoing edge from node i to node j
  • you can build a hashmap up front to see all neighbors of a node; otherwise you will have to iterate over the entire row
  4. matrix
  • more subtle but very common
  • The input will be a 2D matrix and the problem will describe a story
  • In this case, each square (row, col) of the matrix is a node, and the neighbors are (row - 1, col), (row, col - 1), (row + 1, col), (row, col + 1)

code diff b/w graphs and trees

  • While a binary tree has a root node to start traversal from, a graph does not always have an obvious "start" point.
  • in a graph, we might need to convert the input into a hash map first. When traversing a tree, we refer to node.left and node.right at each node. When traversing a graph, we will need to use a for loop to iterate over the neighbors of the current node, since a node could have any number of neighbors
  • To prevent cycles and unnecessarily visiting a node more than once, we can use a set seen. Before we visit a node, we first check if the node is in seen. If it isn't, we add it to seen before visiting it. This allows us to only visit each node once in O(1) time because adding and checking for existence in a set takes constant time.
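a DFS sketch tying these together - a hashmap graph (as built above), a for loop over neighbors, and a seen set:
let dfsGraph = (graph, start) => {
    let seen = new Set([start]);
    let dfs = node => {
        // do some logic on node here
        for (const neighbor of (graph.get(node) || [])) {
            if (!seen.has(neighbor)) {
                seen.add(neighbor); // mark before visiting to handle cycles
                dfs(neighbor);
            }
        }
    };
    dfs(start);
    return seen; // every node reachable from start
};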

Graph BFS

  • easy

Single Source shortest path

  • DFS
  • BFS
  • Dijkstra
  • Bellman-Ford
  • Floyd-Warshall (technically all-pairs, not single-source)

Heap

  • a priority queue is an abstract data structure; a heap is its most common implementation
  • add an element in O(log n) time
  • remove the min/max element in O(log n) time
  • get the min/max element in O(1) time
  • The ability to find the max/min element in constant time, while only needing logarithmic time to maintain this ability through changes, makes a heap an extremely powerful data structure.

implementation

  • There are multiple ways to implement a heap, although the most popular way is called a binary heap using an array.
  • If a node is at index i, then its children are at indices 2i + 1 and 2i + 2 (and its parent is at floor((i - 1) / 2)). When elements are added or removed, operations are done to maintain the heap property of parent.val <= child.val (for a min heap). The number of operations needed scales logarithmically with the number of elements in the heap, and the process is known as "bubbling up" (or down).
  • A heap is a great option whenever you need to find the maximum or minimum of something repeatedly.
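a minimal binary min-heap sketch using the index arithmetic above (illustrative, not a library API):
class MinHeap {
    constructor() { this.heap = []; }

    peek() { return this.heap[0]; } // min element in O(1)

    push(val) { // O(log n): append, then bubble up
        this.heap.push(val);
        let i = this.heap.length - 1;
        while (i > 0) {
            let parent = Math.floor((i - 1) / 2);
            if (this.heap[parent] <= this.heap[i]) break;
            [this.heap[parent], this.heap[i]] = [this.heap[i], this.heap[parent]];
            i = parent;
        }
    }

    pop() { // O(log n): move the last element to the root, then bubble down
        let min = this.heap[0];
        let last = this.heap.pop();
        if (this.heap.length > 0) {
            this.heap[0] = last;
            let i = 0;
            while (true) {
                let smallest = i;
                for (const child of [2 * i + 1, 2 * i + 2]) {
                    if (child < this.heap.length && this.heap[child] < this.heap[smallest]) {
                        smallest = child;
                    }
                }
                if (smallest === i) break;
                [this.heap[i], this.heap[smallest]] = [this.heap[smallest], this.heap[i]];
                i = smallest;
            }
        }
        return min;
    }
}

let heap = new MinHeap();
heap.push(3); heap.push(1); heap.push(2);
heap.pop(); // 1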

2 heaps

  • Using multiple heaps is uncommon and the problems that require it are generally on the harder side. If a problem involves finding a median, it's a good thing to think about

Greedy

  • study

Backtracking

  • The most brute force way to solve a problem is through exhaustive search.
  • Backtracking is an optimization that involves abandoning a "path" once it is determined that the path cannot lead to a solution (Abandoning a path is also sometimes called "pruning".)

why it works

  • In an exhaustive search, we generate all possibilities and then check them for solutions. In backtracking, we prune paths that cannot lead to a solution, generating far fewer possibilities.
  • Backtracking is a great tool whenever a problem wants you to find all of something, or there isn't a clear way to find a solution without checking all logical possibilities. On LeetCode, a strong hint that you should use backtracking is if the input constraints are very small (n <= ~15), as backtracking algorithms usually have exponential time complexities.
function backtrack(curr) {
    if (base case) {
        Increment or add to answer
        return
    }

    for (iterate over input) {
        Modify curr
        backtrack(curr)
        Undo whatever modification was done to curr
    }
}
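instantiating the template to generate all subsets (one of the generation targets discussed below); a sketch:
let subsets = nums => {
    let ans = [];
    let backtrack = (curr, start) => {
        ans.push([...curr]); // every node of the tree is a valid subset
        for (let i = start; i < nums.length; i++) {
            curr.push(nums[i]);     // modify curr
            backtrack(curr, i + 1); // move to the child
            curr.pop();             // undo the modification (backtrack)
        }
    };
    backtrack([], 0);
    return ans;
};

subsets([1, 2, 3]); // 8 subsets, from [] to [1, 2, 3]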

tree analogy

  • think of it like working in a tree
  • Each call to the function backtrack represents a node in the tree. Each iteration in the for loop represents a child of the current node, and calling backtrack in that loop represents moving to a child
  • The line where you undo the modifications is the "backtracking" step and is equivalent to moving back up the tree from a child to its parent.
  • At any given node, the path from the root to the node represents a candidate that is being built. The leaf nodes are complete solutions and represent when the base case is reached.

common type of problems - Generation

  • generate all of something

Given their exponential solution space, it is tricky to ensure that the generated solutions are complete and non-redundant. It is essential to have a clear and easy-to-reason strategy. There are generally three strategies:

  1. Recursion
  2. Backtracking
  3. Lexicographic generation, based on the mapping between binary bitmasks and the corresponding permutations / combinations / subsets

The third method can be a good candidate for interviews because it simplifies the problem to the generation of binary numbers, making it easy to implement and verify that no solution is missing. As a bonus, it generates lexicographically sorted output for sorted inputs.

Dynamic Programming

  • Usually, problems where you use DP can only be solved with DP (in a reasonable time complexity)
  • If you don't know DP, then it is almost impossible to solve a DP problem, even if it's an easy one
  • dynamic programming is just optimized recursion
  • The arguments that a recursive function takes represent a state.
  • The difference with DP is that states can be repeated, usually an exponential number of times. To avoid repeating computation, we use something called memoization. When we find the answer (the return value) for a given state, we cache that value (usually in a hash map). Then in the future, if we ever see the same state again, we can just refer to the cached value without needing to re-calculate it.
var fibonacci = function(n) {
    if (n == 0) {
        return 0;
    }

    if (n == 1) {
        return 1;
    }

    return fibonacci(n - 1) + fibonacci(n - 2);
}
  • This algorithm has a time complexity of O(2^n)
let memo = new Map();

/**
 * @param {number} n
 * @return {number}
 */
var fibonacci = function(n) {
    if (n == 0) {
        return 0;
    }

    if (n == 1) {
        return 1;
    }

    if (memo.has(n)) {
        return memo.get(n);
    }

    memo.set(n, fibonacci(n - 1) + fibonacci(n - 2));
    return memo.get(n);
}
  • This improves our time complexity to O(n) - which is, of course, extremely fast compared to O(2^n). The first approach is just basic recursion - by memoizing results to avoid duplicate computation, it becomes dynamic programming.

Top Down

  • This method of using recursion and memoization is also known as "top-down" dynamic programming
  • Another way to approach a dynamic programming problem is with a "bottom-up" algorithm. In bottom-up, we start at the bottom (base cases) and work our way up to larger problems. This is done iteratively and also known as tabulation. Here is the bottom-up version of Fibonacci:
/**
 * @param {number} n
 * @return {number}
 */
var fibonacci = function(n) {
    let arr = new Array(n + 1).fill(0);
    // base case - the second Fibonacci number is 1
    arr[1] = 1;
    for (let i = 2; i <= n; i++) {
        arr[i] = arr[i - 1] + arr[i - 2];
    }

    return arr[n];
}
  • Top-down and bottom-up refer only to how you decide to implement your algorithm. There is fundamentally nothing different between the two approaches. Every top-down implementation can be implemented bottom-up and vice versa. The things that define a DP algorithm are the base cases and recurrence relation (which we will talk about more in the next article).
  • Usually, a bottom-up implementation is faster. This is because iteration has less overhead than recursion, although this is less impactful if your language implements tail recursion.
  • However, a top-down approach is usually easier to write. With recursion, the order that we visit states does not matter. With iteration, if we have a multidimensional problem, it can sometimes be difficult figuring out the correct configuration of your for loops.

is it DP problem?

  • The problem will be asking for an optimal value (max or min) of something or the number of ways to do something.
  • At each step, you need to make a "decision", and decisions affect future decisions.
  • The second characteristic is usually what differentiates greedy and DP. The idea behind greedy is that local decisions do not affect other decisions.

Step

  • the first step to creating a DP algorithm is deciding what state variables are necessary
  • The number of state variables used is the dimensionality of an algorithm. For example, if an algorithm uses only one variable like i, then it is one dimensional. If a problem has multiple state variables, it is multi-dimensional. Some problems might require as many as five dimensions.

time and space complexity

  • Complexity analysis for DP algorithms is very easy. Like with trees/graphs, we calculate each state only once. Therefore, if there are N possible states, and the work done at each state is F, then your time complexity will be O(N * F). Notice that this is the exact same argument we used in the tree and graph problems.
  • In many problems, the space complexity can be improved when implementing bottom-up, but not top-down. We'll see this in the rest of the chapter.

Framework

  1. First, we need to decide what the function is returning.
  2. Second, we need to decide on what arguments the function should take (state variables).
  3. Third, we need the recurrence relation. Typically, finding the recurrence relation is the hardest part of constructing a DP algorithm; it can be straightforward, but it can also be much more complicated.
  4. The recurrence relation is useless on its own - we need base cases so that our function eventually returns actual values. Figuring out the base cases is usually pretty easy and just requires a little bit of logical thinking. Make sure to read every question carefully.

bucket sort