Longest Repeated Substring Problem | Longest Duplicate Substring Problem with Code

The longest repeated substring problem asks for the longest substring that occurs at least twice in a given string. It is also a common interview question.

Problem Statement

Given a string S, consider all duplicated substrings: (contiguous) substrings of S that occur more than once.  (The occurrences may overlap.) Return any duplicated substring that has the longest possible length.  (If S does not have a duplicated substring, the answer is "".)

Example 1:

Input: “banana”, Output: “ana”

Example 2:

Input: “abcd”, Output: “”

Optimized Solution Using Binary Search & Rabin-Karp

The search for the longest repeated substring can be divided into the following two subtasks:

[Figure: Solution to Longest Repeated Substring Problem]

Subtask 1: Search for the substring length L in the interval 1 to N

A naïve solution that checks every possible substring length one by one would be inefficient. If there is a duplicated substring of length k, then there is also a duplicated substring of length k - 1 (drop the last character of each occurrence), so the answer to "does a duplicate of length L exist?" flips from yes to no only once as L grows. Binary search exploits this and reduces the number of lengths that must be checked to O(log N).
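The binary search over lengths can be sketched as follows. This is a minimal illustration, not the optimized solution: the helper `hasDuplicateOfLength` is a hypothetical name, and it uses a plain set of substrings for clarity rather than hashing.

```java
import java.util.HashSet;
import java.util.Set;

class LengthBinarySearch {
    // Hypothetical helper: true if some substring of length len occurs
    // at least twice in s. Stores whole substrings, so it favors clarity over speed.
    static boolean hasDuplicateOfLength(String s, int len) {
        Set<String> seen = new HashSet<>();
        for (int i = 0; i + len <= s.length(); i++) {
            // add() returns false when the substring was already in the set.
            if (!seen.add(s.substring(i, i + len))) {
                return true;
            }
        }
        return false;
    }

    // Largest length for which a duplicate exists, or 0 if there is none.
    static int longestDuplicateLength(String s) {
        int lo = 1, hi = s.length() - 1, best = 0;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (hasDuplicateOfLength(s, mid)) {
                best = mid;    // a duplicate of length mid exists, try longer
                lo = mid + 1;
            } else {
                hi = mid - 1;  // no duplicate of length mid, so none longer either
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestDuplicateLength("banana")); // 3 ("ana")
        System.out.println(longestDuplicateLength("abcd"));   // 0
    }
}
```

Only O(log N) calls to the checker are made; the remaining work is making each call fast, which is what Subtask 2 addresses.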

Subtask 2: Check whether there is a duplicate substring of length L

An efficient way to check for a duplicate substring of a given length is the Rabin-Karp method. It uses hashing to find an exact match of a pattern string in a text.

The idea of the algorithm is:

  • Calculate the hash of the pattern of length L
  • Move a sliding window of length L along the string of length N
  • Check whether the hash of the string in the sliding window equals the hash of the pattern
    • If yes, check whether the two strings are actually equal

[Figure: Visualizing the Rabin-Karp Method]
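
The steps above, for a single pattern, might look like the sketch below. The class name, the base of 256, and the prime modulus are assumptions chosen for illustration; they are not taken from the article's solution.

```java
class RabinKarpSketch {
    static final long BASE = 256;            // assumed base
    static final long MOD = 1_000_000_007L;  // assumed prime modulus

    // Returns the first index at which pattern occurs in text, or -1.
    static int indexOf(String text, String pattern) {
        int n = text.length(), l = pattern.length();
        if (l == 0) return 0;
        if (l > n) return -1;

        // BASE^(l-1) mod MOD: the weight of the window's first character.
        long high = 1;
        for (int i = 1; i < l; i++) high = (high * BASE) % MOD;

        long patternHash = 0, windowHash = 0;
        for (int i = 0; i < l; i++) {
            patternHash = (patternHash * BASE + pattern.charAt(i)) % MOD;
            windowHash = (windowHash * BASE + text.charAt(i)) % MOD;
        }

        for (int i = 0; ; i++) {
            // Equal hashes only suggest a match, so compare the characters too.
            if (windowHash == patternHash && text.regionMatches(i, pattern, 0, l)) {
                return i;
            }
            if (i + l >= n) return -1;
            // Slide the window: remove text[i], then append text[i + l].
            windowHash = (windowHash - text.charAt(i) * high % MOD + MOD) % MOD;
            windowHash = (windowHash * BASE + text.charAt(i + l)) % MOD;
        }
    }

    public static void main(String[] args) {
        System.out.println(indexOf("banana", "ana")); // 1
        System.out.println(indexOf("banana", "xyz")); // -1
    }
}
```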

Improvement in Rabin-Karp for our problem

To solve the longest duplicate substring problem, we need to make the following improvements to Rabin-Karp.

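  • Search for multiple patterns instead of one by storing the hashes of all previous windows in a set.
  • Use a rolling hash instead of recomputing the hash of every window from scratch.
  • Use a large modulus so that hash values stay within a fixed-size integer and collisions are rare. Each rolling update then takes constant time, which reduces the check for one length to O(N).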
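
To make the rolling-hash step concrete, here is a small sketch that rolls a window across "banana" and compares each rolled value with a hash computed from scratch. The class and method names are illustrative, the prime modulus is an assumption, and the input is assumed to contain only lowercase letters a-z.

```java
class RollingHashDemo {
    static final long BASE = 26;
    static final long MOD = 1_000_000_007L; // assumed prime modulus

    // Hash of s[start, start + len) computed from scratch in O(len).
    static long directHash(String s, int start, int len) {
        long h = 0;
        for (int i = start; i < start + len; i++) {
            h = (h * BASE + (s.charAt(i) - 'a')) % MOD;
        }
        return h;
    }

    // BASE^len mod MOD.
    static long power(int len) {
        long p = 1;
        for (int i = 0; i < len; i++) p = (p * BASE) % MOD;
        return p;
    }

    // O(1) update: shift the window left by one position,
    // dropping character out and appending character in.
    static long roll(long h, char out, char in, long basePowLen) {
        h = (h * BASE - (out - 'a') * basePowLen % MOD + MOD) % MOD;
        return (h + (in - 'a')) % MOD;
    }

    public static void main(String[] args) {
        String s = "banana";
        int len = 3;
        long p = power(len);
        long h = directHash(s, 0, len);
        for (int i = 1; i + len <= s.length(); i++) {
            h = roll(h, s.charAt(i - 1), s.charAt(i + len - 1), p);
            System.out.println(s.substring(i, i + len) + " rolled == direct: "
                    + (h == directHash(s, i, len)));
        }
    }
}
```

The two occurrences of "ana" (at indices 1 and 3) produce the same hash, which is exactly what the set of previously seen hashes detects.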

Java Code Snippet

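import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Solution {
    // A prime modulus. With the even base 26, a modulus of 2^32 would make
    // the hash depend only on the last 32 characters of a window.
    private static final long MOD = 1_000_000_007L;
    private static final long BASE = 26;

    // Assumes S contains only lowercase letters 'a'..'z'.
    public String longestDupSubstring(String S) {
        int n = S.length();
        char[] nums = S.toCharArray();

        // Binary search on the length of the duplicated substring.
        int left = 1, right = n - 1;
        int bestStart = -1, bestLen = 0;
        while (left <= right) {
            int mid = left + (right - left) / 2;
            int start = search(mid, n, nums);
            if (start != -1) {
                bestStart = start;
                bestLen = mid;
                left = mid + 1;
            } else {
                right = mid - 1;
            }
        }
        return bestStart == -1 ? "" : S.substring(bestStart, bestStart + bestLen);
    }

    // Returns the start index of a substring of length l that also occurs
    // earlier in nums, or -1 if there is none.
    int search(int l, int n, char[] nums) {
        // Hash of the first window.
        long h = 0;
        for (int i = 0; i < l; i++) {
            h = (h * BASE + (nums[i] - 'a')) % MOD;
        }

        // Map each hash to the start indices of the windows that produced it.
        Map<Long, List<Integer>> seen = new HashMap<>();
        seen.computeIfAbsent(h, k -> new ArrayList<>()).add(0);

        long aL = 1; // BASE^l mod MOD
        for (int i = 1; i <= l; ++i) aL = (aL * BASE) % MOD;

        for (int i = 1; i < n - l + 1; i++) {
            // Roll the window: drop nums[i - 1], append nums[i + l - 1].
            h = (h * BASE - (nums[i - 1] - 'a') * aL % MOD + MOD) % MOD;
            h = (h + (nums[i + l - 1] - 'a')) % MOD;

            List<Integer> starts = seen.computeIfAbsent(h, k -> new ArrayList<>());
            for (int s : starts) {
                // Equal hashes may be a collision, so confirm the match.
                if (same(nums, s, i, l)) return i;
            }
            starts.add(i);
        }
        return -1;
    }

    private static boolean same(char[] nums, int a, int b, int l) {
        for (int k = 0; k < l; k++) {
            if (nums[a + k] != nums[b + k]) return false;
        }
        return true;
    }
}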

Performance

The algorithm above is much faster than checking every possible length one by one. Its average time complexity is O(N log N), assuming hash collisions are rare, and its space complexity is O(N).
