String Matching
Introduction
Search for a string (pattern) in a large body of text
- T[0...n − 1]: The text (or haystack) being searched within
- P[0...m − 1]: The pattern (or needle) being searched for
- Return smallest i such that for
This is the first occurrence of P in T
Brute Force: Θ(mn)
Knuth-Morris-Pratt Algorithm
When a mismatch occurs, the most we can shift the pattern: the largest prefix of P[0...j] that is a suffix of P[1...j]
KMP Failure Array
The failure array : fail[j] is defined as the length of the largest prefix of P[0...j] that is also a suffix of P[1...j]
- fail[0] = 0
- If a mismatch occurs at P[j] T[i] we set j = fail[j − 1]

Construct the FailArray: almost same as KMP, pattern becomes part of P
Algorithm
- Runtime: Θ(m + n)
- Space: Θ(m + n)
int KMP(){
    //T: String of length n (text), P: String of length m (pattern)
    // Construct FailArray, but each case add F[i]
    int i = 1, j = 0;
    while(i < m){
        if(p[j] == p[i]){
            fail[i] = j + 1; // 0...j
            ++i;
            ++j;
        }
        else{
            if(j > 0)j = fail[j-1];
            else{
                fail[i] = 0;
                ++i;
            } 
        }
    }
    // KMP 
    i = 0, j = 0;
    while(i < n){
        if(p[j] == t[i]){
            ++j;
            ++i;
            if(j == m){
                j = fail[j - 1];
                result.push_back(i - m); // starting point
            }
        }
        else{
            if(j > 0)j = fail[j - 1];
            else ++i;
        }
    }
}
Regular Expression
Given an input string s and a pattern p, implement regular expression matching with support for '.' and '*' where:
'.' Matches any single character.
'*' Matches zero or more of the preceding element.
Solution:
The matching should cover the entire input string (not partial).
Iterate Two strings, match one by one
- case 0: 一般情况judge(str[i-1],pattern[j-1]) - dp[i][j]=dp[i-1][j-1]
 
- case 1: pattern[j-1]=='*': - 两种小情况: - (1)not use preceding char - (2)preceding char use multiple times - dp[i][j]=dp[i][j-2]||judge(str[i-1],pattern[j-2]&&dp[i-1][j])
 
Code
class Solution {
public:
    bool match(string s, string p) {
        int n=p.size();
        int m=s.size();
        vector<vector<int>> dp(m+1,vector<int>(n+1,0));
        dp[0][0]=1;
        for(int i=2;i<=n;++i){
            if(p[i-1]=='*')dp[0][i]=dp[0][i-2];
        }
        for(int i=1;i<=m;++i)for(int j=1;j<=n;++j){
            if(judge(s[i-1],p[j-1]))
                dp[i][j]=dp[i-1][j-1];
            else if(p[j-1]=='*')
                dp[i][j]=dp[i][j-2]||judge(s[i-1],p[j-2])&&dp[i-1][j];
        }
        return dp[m][n];
    }
private:
    bool judge(char a,char b){return a==b||b=='.';}
};
Regular Expressions
| Category | Pattern | Matches | 
|---|---|---|
| Disjuncitons | /[A-Z|a-z]/ | An upper/lower case letter | 
| /[0-9]/ | A single digit | |
| /[^A-Z]/ | Negations | |
| /\d/ | any digit | |
| /\D/ | any non-digit | |
| /./ | Any character except for newline | |
| Anchors | /^/ | Start of Input | 
| /$/ | End of Input | |
| Frequencies | /?/ | Optional(one or none) | 
| /*/ | zero or more | |
| /+/ | one or more | |
| /{m}/ | m repetitions | |
| /{m,n}/ | m to n repetitions | |
| Others | /()/ | capture group |