Data Structure & Algorithm Notes

Big O

  • describes the computational complexity of an algorithm
  • always relative to input size
  • split into 2 parts
    1. space - amount of memory allocated
    2. time - how the number of operations changes as the input size changes

what is the time complexity of the following?

for (let i = 0; i < arr.length; i++) {
    for (let j = i; j < arr.length; j++) {
        console.log(arr[i] + arr[j]);
    }
}
Answer: O(n^2)
  1. the inner loop runs n times on the first iteration
  2. then n-1 times, then n-2, and so on
  3. the sum n + (n-1) + (n-2) + ... + 1 is a series equal to n*(n+1)/2, which is O(n^2)
  • O(1) - no matter the input size, the algorithm uses a constant number of operations
  • O(log n) - somewhere in the algorithm, the input is being reduced by a percentage (e.g., halved) at every step

Recursion

  • every recursive algorithm can be converted to an iterative one and vice versa, but the solution is often more elegant in one form
  • uses the call stack to mimic loops
  • needs a base case (a condition at the start of the recursive function that terminates the calls)
  • use it to break a problem down into subproblems whose solutions can be combined to solve the original problem (see the sketch below)
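a minimal sketch of these ideas (sum is just an illustrative name):
let sum = (arr, i = 0) => {
    if (i === arr.length) return 0;  // base case terminates the calls
    return arr[i] + sum(arr, i + 1); // combine arr[i] with the subproblem's answer
};

sum([1, 2, 3, 4]); // 10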

For vs While

  • I used to think they were mostly the same, but it's important to understand the nuance between the two to solve DSA problems
  • for - rigid; while - free-flowing

visualization

for

  • works like a staircase; you start from the bottom, you step up, and the number of steps is fixed
  • nature - pulsating & rhythmic, like a wall clock

while

  • works like rolling a die; you keep throwing until a condition is met

Array & Strings

  • array -> static/dynamic; default assumption: dynamic array
  • string -> mutable/immutable; default assumption: immutable
  • a string is basically an array of characters

Two Pointers Technique 🔄

intro

  • very common, used to solve array/string problems
  • pointers - 2
  • iterables - 1 or 2

variation 1: 1 iterable

function fn(arr):
    left = 0
    right = arr.length - 1

    while left < right:
        // Do some logic here depending on the problem
        // Do some more logic here to decide on one of the following:
            1. left++
            2. right--
            3. Both left++ and right--

strength:

  • we will never have more than O(n) iterations

uses:

  • effective for comparing pairs of elements
  • finding a subarray/substring, reversing an array (see the sketch below)
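for instance, reversing an array (one of the problems listed below) falls straight out of this template; a minimal sketch:
let reverse = arr => {
    let left = 0;
    let right = arr.length - 1;
    while (left < right) {
        [arr[left], arr[right]] = [arr[right], arr[left]]; // swap the pair
        left++;
        right--;
    }
    return arr;
};

reverse([1, 2, 3, 4]); // [4, 3, 2, 1]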

variation 2: 2 iterables

function fn(arr1, arr2):
    i = j = 0
    while i < arr1.length AND j < arr2.length:
        // Do some logic here depending on the problem
        // Do some more logic here to decide on one of the following:
            1. i++
            2. j++
            3. Both i++ and j++

    // Step 4: make sure both iterables are exhausted
    // Note that only one of these loops would run
    while i < arr1.length:
        // Do some logic here depending on the problem
        i++

    while j < arr2.length:
        // Do some logic here depending on the problem
        j++

strength

  • will never have more than O(m+n) iterations
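the notes don't tie this template to a specific problem, but merging two sorted arrays is a standard illustration of it; a sketch:
let merge = (arr1, arr2) => {
    let ans = [];
    let i = 0, j = 0;
    while (i < arr1.length && j < arr2.length) {
        if (arr1[i] <= arr2[j]) {
            ans.push(arr1[i++]);
        } else {
            ans.push(arr2[j++]);
        }
    }
    // make sure both iterables are exhausted; only one of these runs
    while (i < arr1.length) ans.push(arr1[i++]);
    while (j < arr2.length) ans.push(arr2[j++]);
    return ans;
};

merge([1, 3, 5], [2, 4]); // [1, 2, 3, 4, 5]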

other variations

  • in variation 1, we started the pointers at the first and last index. some problems might require starting at other indices
  • in variation 2, we initialized the pointers at 0 and ran them on 2 different inputs; sometimes there is only 1 input, but we still initialize both pointers at index 0 and move both of them forward (two pointers is just a technique)

Problems to solve

  1. https://leetcode.com/problems/reverse-string/description/
  2. https://leetcode.com/problems/squares-of-a-sorted-array/description/

variables

  • left, right (single array)
  • i, j (2 arrays)

Sliding Window 🖼️

intro

  • implemented using 2 pointers
  • extremely common and versatile
  • a subarray is a run of adjacent elements, so it can be defined by just 2 indices, left and right
  • another name for a subarray in this context is a window

identifying sliding window problem

  1. the problem will define a "valid" subarray
    • constraint metric - some attribute of the subarray (examples - sum of the subarray, number of unique elements in the subarray)
    • a numeric restriction on the constraint metric - e.g., the sum of the subarray should be at most 7
  2. the problem will ask you to find valid subarrays
    • most common - find the best valid subarray (e.g., the longest), or count the total number of valid subarrays

examples

  • Find the longest subarray with a sum less than or equal to k
  • Find the longest substring that has at most one "0"
  • Find the number of subarrays that have a product less than k

the algorithm

  • initially we have left = right = 0, which means the first subarray we look at is just the first element
  • then we want to expand the size of our window; this is done by incrementing right
  • but what if, after adding a new element, the subarray becomes invalid? in that case we remove elements from our window until it becomes valid again; to remove, we increment left, which shrinks the window

build the algorithm

  1. window is invalid -> while (curr > k): curr -= arr[left]; left++
  2. length of window = right - left + 1

concepts

  • the window grows as large as it can, then it shrinks
  • this works because the input only has positive integers (removing an element always decreases the sum)
  • why is it okay to forget about elements removed from the window? they were already considered in the past, while the window still contained them

nuances

  • left can be greater than right, but only by 1 position. in that case the length (right - left + 1), e.g. (0 - 1 + 1), is 0

visualization

  • a modified snake game: the snake keeps growing
  • the snake moves right, ingests what it finds, then checks whether it needs to excrete something or can keep it in its tummy
  • the 2 loops give us 2 degrees of freedom
    • its direction is the outer for loop
    • its body expands or contracts based on a condition (the inner while loop)
function fn(arr):
    left = 0
    for (int right = 0; right < arr.length; right++):
        Do some logic to "add" element at arr[right] to window

        while WINDOW_IS_INVALID:
            Do some logic to "remove" element at arr[left] from window
            left++

        Do some logic to update the answer
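putting the template together for the first example above (longest subarray with sum <= k); a sketch assuming the input has only positive integers:
let longestSubarray = (nums, k) => {
    let left = 0, curr = 0, ans = 0;
    for (let right = 0; right < nums.length; right++) {
        curr += nums[right];    // "add" arr[right] to the window
        while (curr > k) {      // window is invalid
            curr -= nums[left]; // "remove" arr[left] from the window
            left++;
        }
        ans = Math.max(ans, right - left + 1); // update the answer
    }
    return ans;
};

longestSubarray([3, 1, 2, 7, 4, 2, 1, 1, 5], 8); // 4 -> [4, 2, 1, 1]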

why is it efficient

  • subarrays can have length n, n-1, n-2, ...; checking them all would give an O(n^2) solution, but a sliding window is O(n)
  • a sliding window guarantees a maximum of 2n window iterations - the right pointer can move n times and the left pointer can move n times
  • the outer loop runs O(n) times and the inner loop runs O(n) times across the whole algorithm -> the amortized cost of the inner loop is O(1)
    • if the while loop were to run n times on one iteration of the for loop, that would mean it wouldn't run at all for the other iterations of the for loop. this is amortized analysis - even though the worst case for one iteration inside the for loop is O(n), it averages out to O(1) when you consider the entire runtime of the algorithm
  • this shows a nested loop doesn't directly imply O(n^2)

variation 1: number of subarrays

  • when right is fixed, the number of valid subarrays ending at right equals the size of the window (right - left + 1); so at each iteration of the outer loop, add the window size to the count (see the sketch below)
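for example, counting subarrays with a product less than k (the third example above); a sketch assuming positive integers:
let numSubarrayProductLessThanK = (nums, k) => {
    if (k <= 1) return 0;
    let left = 0, curr = 1, ans = 0;
    for (let right = 0; right < nums.length; right++) {
        curr *= nums[right];
        while (curr >= k) { // shrink until the window is valid again
            curr /= nums[left];
            left++;
        }
        ans += right - left + 1; // all valid subarrays ending at right
    }
    return ans;
};

numSubarrayProductLessThanK([10, 5, 2, 6], 100); // 8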

variation 2: Fixed window

  • sometimes the problem specifies a fixed window length
  • these problems are easier (we add one element on the right and remove one element on the left to maintain the length)
function fn(arr, k):
    curr = some data to track the window

    // build the first window
    for (int i = 0; i < k; i++)
        Do something with curr or other variables to build first window

    ans = answer variable, probably equal to curr here depending on the problem
    for (int i = k; i < arr.length; i++)
        Add arr[i] to window
        Remove arr[i - k] from window
        Update ans

    return ans
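a concrete instance of the template, assuming the task is the max sum of a length-k subarray (the name maxSumFixedWindow is just illustrative):
let maxSumFixedWindow = (nums, k) => {
    let curr = 0;
    // build the first window
    for (let i = 0; i < k; i++) {
        curr += nums[i];
    }

    let ans = curr;
    for (let i = k; i < nums.length; i++) {
        curr += nums[i] - nums[i - k]; // add one on the right, remove one on the left
        ans = Math.max(ans, curr);
    }

    return ans;
};

maxSumFixedWindow([3, -1, 4, 12, -8, 5, 6], 4); // 18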

variation 3: many sliding window problems also make use of a hashmap

variable names

  • left, right, k, curr

problems

Prefix Sum 📋

  • prefix[i] = sum of all elements up to index i
  • helps us find the sum of any subarray in O(1)
  • the sum of the subarray from i to j is
    • prefix[j] - prefix[i-1]
    • or prefix[j] - prefix[i] + nums[i] (if we don't want to deal with the out-of-bounds case when i = 0)
  • it only costs O(n) to build, and all future subarray sum queries become O(1); so it usually improves an algorithm's time complexity by a factor of O(n)
  • building a prefix sum is a form of pre-processing
let prefix = [nums[0]];
for (let i = 1; i < nums.length; i++) {
    prefix.push(nums[i] + prefix[i - 1]);
}
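using it for an O(1) range-sum query (the prefix[j] - prefix[i] + nums[i] form above):
let rangeSum = (prefix, nums, i, j) => prefix[j] - prefix[i] + nums[i];

// e.g. nums = [1, 6, 3, 2, 7, 2] -> prefix = [1, 7, 10, 12, 19, 21]
// rangeSum(prefix, nums, 1, 3) -> 12 - 7 + 6 = 11 (= 6 + 3 + 2)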

variables

  • prefix

problems

common patterns

subarray vs subsequence vs subset

  • subarray: contiguous elements
  • subsequence: not necessarily contiguous
  • subset: not necessarily contiguous, order doesn't matter

solving problems

  • subarray: sliding window, prefix sum (when the input is an integer array and you need subarray sums)
  • subsequence: dynamic programming
  • subset: backtracking; you can sort the array because order doesn't matter

nature of difficulty

  • subarray: easy
  • subsequence: medium
  • subset: easy

examples

[1,2,3,4]

  • subarray: [1,2], [2,3,4]
  • subsequence: [1,3], [1], [1,2,4] (order must be preserved, so [3,2,1] is not a subsequence)
  • subset: [1,3], [1], [3,2,1] (order doesn't matter)

Hashing (using Map and Set)

Data Structures

  • 2 things to take care of in data structures - interface & implementation
  • extremely common. reduces time complexity by a factor of O(n)
  • modulo operation: num % x always lands in the range [0, x-1]
  • hash function -> converts a key to an array index (arrays have indices)
  • hash function + array = hashmap (hash table / dictionary)
  • hashmap -> maps keys to values
  • unordered; can add/remove in O(1)
  • multiple ways to handle collisions; the most common is chaining
  • how to reduce collisions? the size of the array (the modulus) should be a prime number

Set

  • use a set when you just want to check if an element exists
  • hashes the key to an integer
  • the difference between a set and a hashmap is that a set has no values
  • doesn't track frequency; add the same element 100 times and it will still contain 1 copy

hashmap in JS

let map = new Map();
map.set('a', 1);
map.get('a'); // 1
map.has('a'); // true
map.size; // 1
map.delete('a');

let set = new Set();
set.add('a');
set.has('a'); // true
set.size; // 1
set.delete('a');

key difference

set -> .add()
map -> .set()

nuances

  • string? -> make use of the fact that there are only 26 letters
  • nums? -> sum of nums, xor of nums

questions

Counting

  • counting? think about a hashmap
  • count the number of subarrays with an "exact" constraint
  • uses 3 concepts: prefixes, a hashmap & the difference between prefix values (see the sketch below)
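a sketch of the three concepts combined, for "number of subarrays whose sum is exactly k": the hashmap counts prefix sums seen so far, and the difference curr - k looks up how many earlier prefixes complete a valid subarray
let subarraySum = (nums, k) => {
    let counts = new Map();
    counts.set(0, 1); // the empty prefix
    let ans = 0, curr = 0;
    for (const num of nums) {
        curr += num;                      // prefix sum ending here
        ans += counts.get(curr - k) || 0; // earlier prefixes that leave a sum of exactly k
        counts.set(curr, (counts.get(curr) || 0) + 1);
    }
    return ans;
};

subarraySum([1, 2, 1, 2, 1], 3); // 4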

nuance

  • when thinking of a hash function, think about what is common between all the values

Linked List

  • generally the linked list is given as input
  • a few problems use a LL to optimise the solution
  • add or delete an element at any position in O(1), provided you have a direct reference to the node before that position; otherwise it takes O(n) to walk to it
  • much better than a dynamic array, which requires O(n) for add/delete at any position
  • disadvantage - no random access
  • more overhead than an array

traversal

let getSum = head => {
    let ans = 0;
    while (head) {
        ans += head.val;
        head = head.next;
    }
    return ans;
}

insert

let addNode = (prevNode, nodeToAdd) => {
    nodeToAdd.next = prevNode.next;
    prevNode.next = nodeToAdd;
}


let deleteNode = prevNode => {
    prevNode.next = prevNode.next.next;
}

doubly linked list

let addNode = (node, nodeToAdd) => {
    let prevNode = node.prev;
    nodeToAdd.next = node;
    nodeToAdd.prev = prevNode;
    prevNode.next = nodeToAdd;
    node.prev = nodeToAdd;
}

let deleteNode = node => {
    let prevNode = node.prev;
    let nextNode = node.next;
    prevNode.next = nextNode;
    nextNode.prev = prevNode;
}

fast and slow pointers

  • similar to 2 pointer technique
  • break the problem down into steps
function fn(head):
    slow = head
    fast = head

    while fast and fast.next:
        // Do something here
        slow = slow.next
        fast = fast.next.next
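for example, finding the middle node of a list (a classic fast/slow use): when fast reaches the end, slow is halfway; a sketch:
let middleNode = head => {
    let slow = head;
    let fast = head;
    while (fast && fast.next) {
        slow = slow.next;      // moves 1 step
        fast = fast.next.next; // moves 2 steps
    }
    return slow; // middle (the second middle for even-length lists)
};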

variables

  • slow, fast
  • prev, curr, temp (in case of reversal)

Reversing linked list

  • have pointers to prev, curr, and next (see the sketch below)
  • solutions to LL problems are simple and elegant
  • to solve, think about what you need and solve the problem one step at a time
  • most linked list problems don't have tricks; you just need a strong understanding of how to move pointers around. the best way to train this skill is to practice
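a sketch of moving the prev/curr/next pointers one step at a time:
let reverseList = head => {
    let prev = null;
    let curr = head;
    while (curr) {
        let next = curr.next; // save the next node before overwriting
        curr.next = prev;     // point the current node backwards
        prev = curr;          // step prev forward
        curr = next;          // step curr forward
    }
    return prev; // new head
};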

Stack

  • The characteristic that makes something a "stack" is that you can only add and remove elements from the same end

where is it used

  • whenever you can recognize the LIFO pattern
  • problems that involve elements in the input interacting with each other
  • matching elements together, or querying some property such as "how far away is the next larger element"
  • evaluating a mathematical expression given as a string
  • comparing elements against each other

Queue

  • trickier to implement than a stack
  • operations: enqueue - push; dequeue - shift
  • queues are less common than stacks, and queue problems are generally more difficult
  • most commonly used in BFS

Monotonic

  • elements are always kept sorted
  • a monotonic stack/queue maintains its sorted property by removing elements that would violate it before adding a new element
  • useful in problems where you have to find the "next" greater/smaller element (see the sketch after the template)
stack = []
for num in nums:
    while stack.length > 0 AND stack.top >= num:
        stack.pop()
    // Between the above and below lines, do some logic depending on the problem
    stack.push(num)
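for example, "how far away is the next greater element" with a monotonic stack of indices; a sketch:
let nextGreaterDistance = nums => {
    let ans = new Array(nums.length).fill(0); // 0 = no greater element exists
    let stack = []; // indices still waiting for their next greater element
    for (let i = 0; i < nums.length; i++) {
        while (stack.length > 0 && nums[stack[stack.length - 1]] < nums[i]) {
            let j = stack.pop();
            ans[j] = i - j; // nums[i] is the next greater element for nums[j]
        }
        stack.push(i);
    }
    return ans;
};

nextGreaterDistance([73, 74, 75, 71, 69, 72, 76, 73]); // [1, 1, 4, 2, 1, 1, 0, 0]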

Binary trees

  • most common type of interview question
  • makes heavy use of recursion
  • LL and trees are types of graphs
  • real life examples:
    • file systems
    • comment thread
    • company hierarchy
  • depth of a node: how far it is from the root; the root has depth 0
  • when traversing a LL we usually do it iteratively; with a tree we usually do it recursively

DFS

  • preorder, postorder, inorder
  • prioritizes depth
let dfs = node => {
    if (!node) {
        return;
    }

    dfs(node.left);
    dfs(node.right);
    return;
}
  • each function call solves and returns the answer to the original problem as if the subtree rooted at the current node was the input
  • preorder: root, left, right
    • logic is done first, then we move to the children
  • inorder: left, root, right
    • no logic is done until you reach a node without a left child, since the call on the left takes priority
  • postorder: left, right, root
    • no logic until you reach a leaf node, since the calls on the children take priority; the root is last
  • The name of each traversal is describing when the current node's logic is performed.
    • pre: before children
    • in: between children
    • post: after children
  • switching between the 3 types is trivial, but you need to know which one to use when
  • to calculate the depth/height of a tree - use postorder (see the sketch below)
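for example, calculating the max depth of a tree with postorder logic (solve the children first, combine at the root); a sketch:
let maxDepth = node => {
    if (!node) {
        return 0;
    }
    let left = maxDepth(node.left);   // answer for the left subtree
    let right = maxDepth(node.right); // answer for the right subtree
    return Math.max(left, right) + 1; // root's logic runs last (postorder)
};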

time & space complexity

  • time: O(n) (you need to visit each node once)
  • space:
    • worst - O(n) - tree can be straight line
    • best - O(log n) - balanced tree

BFS

  • DFS uses a stack (recursion uses the call stack); BFS uses iteration and a queue

when to use BFS vs DFS

  • if you have to visit every node, then it doesn't matter which one you use
  • it's rare to find a problem where DFS is better than BFS
  • implementing DFS is usually less code and easier because of recursion
    • so generally people end up using DFS
  • on the flip side, there are a few problems where BFS makes more sense (generally when dealing with levels)
  • in some cases the solution might be in the right subtree, and with DFS you would waste time visiting all the left nodes first
  • in some cases the solution might be in the last level, and with BFS you would waste time visiting all the earlier levels
  • space complexity of DFS scales with the height of the tree
  • space complexity of BFS scales with the width of the tree
  • in some cases DFS uses less space, in others BFS
  • perfect tree - BFS uses more space O(n), DFS uses less space O(log n)
  • lopsided tree (straight line) - DFS uses more space O(n), BFS uses less space O(1)
let printAllNodes = (root) => {
  let queue = [root]
  while (queue.length) {
    let nextQueue = []

    for (let i = 0; i < queue.length; i++) {
      let node = queue[i]

      // do some logic here on the current node
      console.log(node.val)

      // put the next level onto the queue
      if (node.left) {
        nextQueue.push(node.left)
      }
      if (node.right) {
        nextQueue.push(node.right)
      }
    }

    queue = nextQueue
  }
}
  • shift is way slower than pop (at least 2x); that's why the code above builds a nextQueue instead of shifting from the front

Binary Search Tree

  • a type of binary tree
  • all values in left subtree are less than root
  • all values in right subtree are greater than root
  • operations like search, insert, delete are O(log n) in the average case (see the search sketch below)
  • to print elements in sorted order, do in-order DFS traversal
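a sketch of the average-case O(log n) search, discarding half of the tree at each step:
let search = (node, target) => {
    if (!node) return false;
    if (node.val === target) return true;
    // discard the subtree that cannot contain the target
    return target < node.val ? search(node.left, target) : search(node.right, target);
};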

Graphs

  • Graphs are part of our everyday lives. Without even trying too hard, you can model literally anything as a graph.

terminology

  • Edges of a node can either be directed or undirected.
  • In binary trees, the edges are directed. Binary trees are directed graphs.
  • In binary trees, there must be only one connected component.
  • In binary trees, all nodes except the root have an indegree of 1 (from their parent). All nodes have an outdegree of 0, 1, or 2. An outdegree of 0 means the node is a leaf. Specific to trees, we use the parent/child terms instead of "neighbors".
  • A graph can be either cyclic or acyclic.

how are graphs given in an algorithm

  • In linked list problems, the head of the linked list is given. In binary tree problems, the root of the tree is given
  • An important thing to understand is that with linked lists and binary trees, you are literally given objects in memory that contain data and pointers. With graphs, the graph doesn't literally exist in memory.
  • In fact, only the "idea" of the graph exists.
  • The problem statement may or may not explicitly state the input is a graph. Sometimes there might be a story, and you need to determine that the input is a graph
  1. Array of edges
  • example: edges = [[0, 1], [1, 2], [2, 0], [2, 3]]
  • In this input format, the input will be a 2D array. Each element of the array will be in the form [x, y], which indicates that there is an edge between x and y
  • At every node, we would need to iterate over the entire input to find the neighbors. This is very slow!
  • Before starting the traversal, we can pre-process the input so that we can easily find all neighbors of any given node
  • The easiest way to accomplish this is using a hash map.
let buildGraph = edges => {
    let graph = new Map();
    for (const [x, y] of edges) {
        if (!graph.has(x)) {
            graph.set(x, []);
        }

        graph.get(x).push(y);

        // if (!graph.has(y)) {
        //     graph.set(y, []);
        // }

        // graph.get(y).push(x);
        // uncomment the above lines if the graph is undirected
    }

    return graph;
}
  2. adjacency list
  • example: graph = [[1], [2], [0, 3], []]
  • The input will be a 2D integer array.
  • graph[i] will be a list of all the outgoing edges from the ith node.
  • most convenient format
  3. adjacency matrix
  • example: graph = [[0, 1, 0, 0], [0, 0, 1, 0], [1, 0, 0, 1], [0, 0, 0, 0]]
  • 2D matrix of size n x n
  • if graph[i][j] == 1, there is an outgoing edge from node i to node j
  • you can build a hashmap up front to see all neighbors of a node; otherwise you will have to iterate over the entire row
  4. matrix
  • more subtle but very common
  • The input will be a 2D matrix and the problem will describe a story
  • In this case, each square (row, col) of the matrix is a node, and the neighbors are (row - 1, col), (row, col - 1), (row + 1, col), (row, col + 1)

code diff b/w graphs and trees

  • While a binary tree has a root node to start traversal from, a graph does not always have an obvious "start" point.
  • in a graph, we might need to convert the input into a hash map first. When traversing a tree, we refer to node.left and node.right at each node. When traversing a graph, we will need to use a for loop to iterate over the neighbors of the current node, since a node could have any number of neighbors
  • To prevent cycles and unnecessarily visiting a node more than once, we can use a set seen. Before we visit a node, we first check if the node is in seen. If it isn't, we add it to seen before visiting it. This allows us to only visit each node once in O(1) time because adding and checking for existence in a set takes constant time.
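a DFS sketch tying these together - a hashmap graph (as built above), a for loop over neighbors, and a seen set:
let dfsGraph = (graph, start) => {
    let seen = new Set([start]);
    let dfs = node => {
        // do some logic on node here
        for (const neighbor of (graph.get(node) || [])) {
            if (!seen.has(neighbor)) {
                seen.add(neighbor); // mark before visiting to handle cycles
                dfs(neighbor);
            }
        }
    };
    dfs(start);
    return seen; // every node reachable from start
};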

Graph BFS

  • easy

Single Source shortest path

  • DFS
  • BFS
  • Dijkstra
  • Bellman-Ford
  • Floyd-Warshall (technically all-pairs, not single-source)

Heap

  • a priority queue is an abstract data structure; a heap is its most common implementation
  • add an element in O(log n) time
  • remove the min/max element in O(log n) time
  • get the min/max element in O(1) time
  • The ability to find the max/min element in constant time, while only needing logarithmic time to maintain this ability through changes, makes a heap an extremely powerful data structure.

implementation

  • There are multiple ways to implement a heap, although the most popular way is called a binary heap using an array.
  • If a node is at index i, then its children are at indices 2i + 1 and 2i + 2 (and its parent is at floor((i - 1) / 2)). When elements are added or removed, operations are done to maintain the heap property of parent.val <= child.val (for a min heap). The number of operations needed scales logarithmically with the number of elements in the heap, and the process is known as "bubbling up" (or down).
  • A heap is a great option whenever you need to find the maximum or minimum of something repeatedly.
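a minimal binary min-heap sketch using the index arithmetic above (illustrative, not a library API):
class MinHeap {
    constructor() { this.heap = []; }

    peek() { return this.heap[0]; } // min element in O(1)

    push(val) { // O(log n): append, then bubble up
        this.heap.push(val);
        let i = this.heap.length - 1;
        while (i > 0) {
            let parent = Math.floor((i - 1) / 2);
            if (this.heap[parent] <= this.heap[i]) break;
            [this.heap[parent], this.heap[i]] = [this.heap[i], this.heap[parent]];
            i = parent;
        }
    }

    pop() { // O(log n): move the last element to the root, then bubble down
        let min = this.heap[0];
        let last = this.heap.pop();
        if (this.heap.length > 0) {
            this.heap[0] = last;
            let i = 0;
            while (true) {
                let smallest = i;
                for (const child of [2 * i + 1, 2 * i + 2]) {
                    if (child < this.heap.length && this.heap[child] < this.heap[smallest]) {
                        smallest = child;
                    }
                }
                if (smallest === i) break;
                [this.heap[i], this.heap[smallest]] = [this.heap[smallest], this.heap[i]];
                i = smallest;
            }
        }
        return min;
    }
}

let heap = new MinHeap();
heap.push(3); heap.push(1); heap.push(2);
heap.pop(); // 1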

2 heaps

  • Using multiple heaps is uncommon and the problems that require it are generally on the harder side. If a problem involves finding a median, it's a good thing to think about

Greedy

  • study

Backtracking

  • The most brute force way to solve a problem is through exhaustive search.
  • Backtracking is an optimization that involves abandoning a "path" once it is determined that the path cannot lead to a solution (Abandoning a path is also sometimes called "pruning".)

why it works

  • In an exhaustive search, we generate all possibilities and then check them for solutions. In backtracking, we prune paths that cannot lead to a solution, generating far fewer possibilities.
  • Backtracking is a great tool whenever a problem wants you to find all of something, or there isn't a clear way to find a solution without checking all logical possibilities. On LeetCode, a strong hint that you should use backtracking is if the input constraints are very small (n <= ~15), as backtracking algorithms usually have exponential time complexities.
function backtrack(curr) {
    if (base case) {
        Increment or add to answer
        return
    }

    for (iterate over input) {
        Modify curr
        backtrack(curr)
        Undo whatever modification was done to curr
    }
}
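instantiating the template to generate all subsets (one of the generation targets discussed below); a sketch:
let subsets = nums => {
    let ans = [];
    let backtrack = (curr, start) => {
        ans.push([...curr]); // every node of the tree is a valid subset
        for (let i = start; i < nums.length; i++) {
            curr.push(nums[i]);     // modify curr
            backtrack(curr, i + 1); // move to the child
            curr.pop();             // undo the modification (backtrack)
        }
    };
    backtrack([], 0);
    return ans;
};

subsets([1, 2, 3]); // 8 subsets, from [] to [1, 2, 3]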

tree analogy

  • think of it like working in a tree
  • Each call to the function backtrack represents a node in the tree. Each iteration in the for loop represents a child of the current node, and calling backtrack in that loop represents moving to a child
  • The line where you undo the modifications is the "backtracking" step and is equivalent to moving back up the tree from a child to its parent.
  • At any given node, the path from the root to the node represents a candidate that is being built. The leaf nodes are complete solutions and represent when the base case is reached.

common type of problems - Generation

  • generate all of something

Given their exponential solution space, it is tricky to ensure that the generated solutions are complete and non-redundant. It is essential to have a clear and easy-to-reason strategy. There are generally three strategies:

  1. Recursion
  2. Backtracking
  3. Lexicographic generation, based on the mapping between binary bitmasks and the corresponding permutations / combinations / subsets

The third method can be a good candidate for interviews because it simplifies the problem to the generation of binary numbers, making it easy to implement and verify that no solution is missing. As a bonus, it generates lexicographically sorted output for sorted inputs.

Dynamic Programming

  • Usually, problems where you use DP can only be solved with DP (in a reasonable time complexity)
  • If you don't know DP, then it is almost impossible to solve a DP problem, even if it's an easy one
  • dynamic programming is just optimized recursion
  • The arguments that a recursive function takes represent a state.
  • The difference with DP is that states can be repeated, usually an exponential number of times. To avoid repeating computation, we use something called memoization. When we find the answer (the return value) for a given state, we cache that value (usually in a hash map). Then in the future, if we ever see the same state again, we can just refer to the cached value without needing to re-calculate it.
var fibonacci = function(n) {
    if (n == 0) {
        return 0;
    }

    if (n == 1) {
        return 1;
    }

    return fibonacci(n - 1) + fibonacci(n - 2);
}
  • This algorithm has a time complexity of O(2^n)
let memo = new Map();

/**
 * @param {number} n
 * @return {number}
 */
var fibonacci = function(n) {
    if (n == 0) {
        return 0;
    }

    if (n == 1) {
        return 1;
    }

    if (memo.has(n)) {
        return memo.get(n);
    }

    memo.set(n, fibonacci(n - 1) + fibonacci(n - 2));
    return memo.get(n);
}
  • This improves our time complexity to O(n) - which is, of course, extremely fast compared to O(2^n). The first approach is just basic recursion - by memoizing results to avoid duplicate computation, it becomes dynamic programming.

Top Down

  • This method of using recursion and memoization is also known as "top-down" dynamic programming
  • Another way to approach a dynamic programming problem is with a "bottom-up" algorithm. In bottom-up, we start at the bottom (base cases) and work our way up to larger problems. This is done iteratively and also known as tabulation. Here is the bottom-up version of Fibonacci:
/**
 * @param {number} n
 * @return {number}
 */
var fibonacci = function(n) {
    let arr = new Array(n + 1).fill(0);
    // base case - the second Fibonacci number is 1
    arr[1] = 1;
    for (let i = 2; i <= n; i++) {
        arr[i] = arr[i - 1] + arr[i - 2];
    }

    return arr[n];
}
  • Top-down and bottom-up refer only to how you decide to implement your algorithm. There is fundamentally nothing different between the two approaches. Every top-down implementation can be implemented bottom-up and vice versa. The things that define a DP algorithm are the base cases and recurrence relation (which we will talk about more in the next article).
  • Usually, a bottom-up implementation is faster. This is because iteration has less overhead than recursion, although this is less impactful if your language implements tail recursion.
  • However, a top-down approach is usually easier to write. With recursion, the order that we visit states does not matter. With iteration, if we have a multidimensional problem, it can sometimes be difficult figuring out the correct configuration of your for loops.

is it DP problem?

  • The problem will be asking for an optimal value (max or min) of something or the number of ways to do something.
  • At each step, you need to make a "decision", and decisions affect future decisions.
  • The second characteristic is usually what differentiates greedy and DP. The idea behind greedy is that local decisions do not affect other decisions.

Step

  • the first step to creating a DP algorithm is deciding what state variables are necessary
  • The number of state variables used is the dimensionality of an algorithm. For example, if an algorithm uses only one variable like i, then it is one dimensional. If a problem has multiple state variables, it is multi-dimensional. Some problems might require as many as five dimensions.

time and space complexity

  • Complexity analysis for DP algorithms is very easy. Like with trees/graphs, we calculate each state only once. Therefore, if there are N possible states, and the work done at each state is F, then your time complexity will be O(N * F). Notice that this is the exact same argument we used in the tree and graph problems.
  • In many problems, the space complexity can be improved when implementing bottom-up, but not top-down. We'll see this in the rest of the chapter.

Framework

  1. First, we need to decide what the function is returning.
  2. Second, we need to decide on what arguments the function should take (state variables).
  3. Third, we need the recurrence relation. Typically, finding the recurrence relation is the hardest part of constructing a DP algorithm; it can be straightforward, but it can also be much more complicated.
  4. The recurrence relation is useless on its own - we need base cases so that our function eventually returns actual values. Figuring out the base cases is usually pretty easy and just requires a little bit of logical thinking. Make sure to read every question carefully.

bucket sort