3 Data Structure
Big O Notationβ
Substution Methodβ
- Base case: show that the statement is true for the smallest value of π (typically 0)
- Induction step: assuming that the statement is true for any , show that it must also hold for
Master Methodβ
Compare which part is bigger
case1: for some ,
case2:
for some ,
case3: for some , and
P and NPβ
Problems, Instances and Algorithms:
- Problem: Max-Flow
- Instance: A graph G with edge-capacities, two vertices s, t, and an integer k
- Algorithm: It takes as input an instance, and outputs either 'YES' or 'NO'
Reductions:
- A B: A reduces to B if an instance is YES iff is YES
- A B: A is polynomial time reducible to B. There is an algorithm A that has the following properties:
- given an instance of X, A produces an instance
- A runs in time .
- Answer to is YES iff answer to is YES.
- Theorem1: If A B and B P, then A P
- Theorem2: If A B and B C, then A C (transitivity)
P and NP:
- P(Polynomial): The set of all decision problems that have an algorithm that runs intime for some constant
- NP(nondeterministic polynomial): NP is the set of existential questions that can be verified in polynomial time
- NP hard: B is NP hard if A B for all A NP
- NP complete
- Definition: B in NP and B is NP hard
- Theorem: If B C, C NP, and B is NP complete, then C is NP complete(Proof using transitivity)
Typical NP complete problemβ
- SAT
- 3-SAT
- Clique
- Independent Set
- vertex cover
- Hamiltonian Cycle
3SAT Independent Set
- If the formula is satisfiable, there is at least one true literal in each clause. Let S be a set of one such true literal from each clause. |S| = k and no two nodes in S are connected by an edge.
- If the graph has an independent set S of size k, we know that it has one node from each βclause triangle.β Set those terms to true. This is possible because no 2 are negations of each other.
DNF(disjunctive normal form)
CNF(conjunctive normal form)
Hashmapβ
Implement | contains(x) | add(x) |
---|---|---|
Bushy BSTs | Ξ(log N) | Ξ(log N) |
Separate Chaining Hash Table With No Resizing | Ξ(N) | Ξ(N) |
β¦ With Resizing | Ξ(1) | Ξ(1) |
Separate Chaining Data Indexed Arrayβ
Data is converted into a hash code. The hash code is then reduced to a valid index.
Javaβs hashCode() function for Strings
public int hashCode() {// From Left to right: High to Low
int h = cachedHashValue;
if (h == 0 && this.length() > 0) {
for (int i = 0; i < this.length(); i++) {
h = 31 * h + this.charAt(i);
}
cachedHashValue = h;
}
return h;
}
31: the hashCode base should be a small prime to avoid overflow
suppose the number of buckets: M, number of items: N
then complexities of contains and add are O(Q = N/M)
To make them O(1), strategy: When N/M is β₯ 1.5, double M
bool contains(int x, vector<list<int>>& hash_table){
int n = hash_table.size();
int index = (x%n + n)%n;
list<int>& slot = hash_table[index];
return find(slot.begin(),slot.end(),x)!=slot.end();
}
bool add(int x,vector<list<int>>& hash_table){
int n = hash_table.size();
int index = (x%n + n)%n;
hash_table[index].push_back(x);
}
Open Addressingβ
An alternate way to handle collisions is to use open addressing
If target bucket is already occupied, use a different bucket
- Linear probing
- Use next address, and if already occupied, just keep scanning one by one.
- Quadratic probing
- Use next address, and if already occupied, try looking 4 ahead, then 9 ahead, then 16 ahead, ...
Sortβ
Counting Sortβ
Procedure:
- Find the range
- Count number of occurrences of each item
- Iterate through list, using count array to decide where to put everything
If the size of the hashmap is R, then
- Runtime: Ξ(N+R)
- Memory: Ξ(N+R)(Result Ξ(N))
Radix Sortβ
LSD Radix Sortβ
Sort each digit independently from rightmost digit towards left
W: Width of each item in # digits
Runtime: Ξ(WN+WR)
MSD Radix Sortβ
- Best Case: finish in one counting sort pass, looking only at the top digit: Ξ(N + R)
- Worst Case: look at every character, degenerating to LSD sort: Ξ(WN + WR)
List,Set,Mapβ
Heapβ
Binary heap: Binary tree that is complete and obeys heap property
- Min-heap: Every node is less than or equal to both of its children
- Max-heap: Every node is greater than or equal to both of its children
Priority Queue Implementation
Implement | Ordered Array | Bushy BST | Hash Table | Heap |
---|---|---|---|---|
add | Ξ(N) | Ξ(log N) | Ξ(1) | Ξ(log N) |
getSmallest | Ξ(1) | Ξ(log N) | Ξ(N) | Ξ(1) |
removeSmallest | Ξ(N) | Ξ(log N) | Ξ(N) | Ξ(log N) |
Trieβ
Each Node Stores One Character, keys in blue
Implementationβ
- DataIndexedCharMap
- BST
- Separate Chaining Hash Table
Analysisβ
Contains and Add in O(L)(the length of the key), for each node:
- DataIndexedCharMap is Ξ(1).
- BST is O(log R), where R is size of alphabet.
- Hash Table is O(R), where R is size of alphabet.
Space:
- DataIndexedCharMap: 128 links per node.
- BST: C links per node, where C is the number of children.
- Separate Chaining Hash Table: C links per node.