# Longest Repeated Substring Problem | Longest Duplicate Substring Problem with Code

The **longest repeated substring problem** asks for the longest substring that occurs at least twice in a given string. It is also a common interview question.

## Problem Statement

Given a string `S`, consider all *duplicated substrings*: (contiguous) substrings of `S` that occur more than once. (The occurrences may overlap.) Return **any** duplicated substring that has the longest possible length. (If `S` does not have a duplicated substring, the answer is `""`.)

### Example 1:

**Input:** “banana”, **Output:** “ana”

### Example 2:

**Input:** “abcd”, **Output:** “”

## Optimized Solution Using Binary Search & Rabin-Karp

The task of finding the longest repeated substring can be divided into the following two subtasks:

### Subtask 1: Perform a search by a substring length L in interval 1 to N

A naïve solution that checks every possible length one by one would be inefficient. If the string contains a duplicated substring of length k, it also contains one of length k - 1 (drop the last character of each occurrence), so the existence of a duplicate is monotonic in the length. A **binary search** over the length therefore needs only O(log N) checks.
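The monotonicity argument can be sketched as a binary search over the length. This is only an illustration: the class name `LengthSearchSketch` and the helper `hasDuplicateOfLength` are hypothetical, and the helper is a simple placeholder that stores whole substrings in a set rather than the Rabin-Karp check described in Subtask 2.

```java
import java.util.HashSet;
import java.util.Set;

class LengthSearchSketch {
    // Hypothetical placeholder check: puts every substring of length len
    // into a set and reports whether any of them was already present.
    static boolean hasDuplicateOfLength(String s, int len) {
        Set<String> seen = new HashSet<>();
        for (int i = 0; i + len <= s.length(); i++) {
            if (!seen.add(s.substring(i, i + len))) {
                return true; // same substring seen at an earlier start index
            }
        }
        return false;
    }

    // Binary search for the largest length that still has a duplicate.
    static int longestDuplicateLength(String s) {
        int left = 1, right = s.length() - 1, best = 0;
        while (left <= right) {
            int mid = left + (right - left) / 2;
            if (hasDuplicateOfLength(s, mid)) {
                best = mid;       // a duplicate of length mid exists: try longer
                left = mid + 1;
            } else {
                right = mid - 1;  // none of length mid: try shorter
            }
        }
        return best;
    }
}
```

For "banana" the search settles on length 3 ("ana"), and for "abcd" it returns 0.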

### Subtask 2: Then check if there is a duplicate substring of length L

An efficient way to check for a duplicated substring of a given length is the Rabin-Karp method, which uses hashing to find an exact match of a pattern string in a text.

The idea of the algorithm is:

- Calculate the hash of the pattern of length L.
- Move a sliding window of length L along the string of length N.
- Check whether the hash of the string in the sliding window equals the hash of the pattern.
- If it does, check whether the two strings are actually equal.
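The steps above can be sketched as a single-pattern search. This is a minimal illustration, not the final solution: the class name `RabinKarpSketch`, the base 256 and the prime modulus 1,000,000,007 are assumptions chosen for the example.

```java
class RabinKarpSketch {
    static final long BASE = 256;           // assumed base for the polynomial hash
    static final long MOD = 1_000_000_007L; // assumed prime modulus

    // Returns the first index at which pattern occurs in text, or -1.
    static int indexOf(String text, String pattern) {
        int n = text.length(), l = pattern.length();
        if (l > n) return -1;
        long patternHash = 0, windowHash = 0;
        long highPower = 1; // BASE^(l-1) mod MOD, weight of the window's first char
        for (int i = 0; i < l; i++) {
            patternHash = (patternHash * BASE + pattern.charAt(i)) % MOD;
            windowHash = (windowHash * BASE + text.charAt(i)) % MOD;
            if (i > 0) highPower = (highPower * BASE) % MOD;
        }
        for (int i = 0; ; i++) {
            // Compare hashes first; on a hash match, confirm with a real comparison.
            if (windowHash == patternHash && text.regionMatches(i, pattern, 0, l)) {
                return i;
            }
            if (i + l >= n) return -1; // no further window fits in the text
            // Slide the window: remove text[i], then append text[i + l].
            windowHash = (windowHash - text.charAt(i) * highPower % MOD + MOD) % MOD;
            windowHash = (windowHash * BASE + text.charAt(i + l)) % MOD;
        }
    }
}
```

Calling `RabinKarpSketch.indexOf("banana", "ana")` finds the first occurrence at index 1.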

#### Improvement in Rabin-Karp for our problem

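To solve the longest duplicate substring problem, we need to make the following improvements to Rabin-Karp:

- Search for many patterns at once: store the hash of every window seen so far in a set, and treat a window whose hash is already in the set as a candidate duplicate.
- Use a rolling hash, which updates the hash in constant time as the window slides, instead of recomputing it from scratch for each window.
- Use a large modulus to reduce hash collisions. Because collisions can still occur, a hash match should be confirmed by comparing the substrings. With constant-time hash updates, one check for a given length L takes O(N) time when collisions are rare.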
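The rolling-hash update can be illustrated in isolation. The sketch below uses the same base (26) and modulus (2^32) as the article's code and assumes the input contains only lowercase letters; the class and method names are hypothetical.

```java
class RollingHashSketch {
    static final long MOD = 1L << 32; // modulus used in the article's code
    static final int BASE = 26;       // assumes only lowercase letters 'a'..'z'

    // Hash of s[start, start + len) computed from scratch in O(len).
    static long directHash(String s, int start, int len) {
        long h = 0;
        for (int i = start; i < start + len; i++) {
            h = (h * BASE + (s.charAt(i) - 'a')) % MOD;
        }
        return h;
    }

    // Hash of the window starting at `start`, derived in O(1) from the hash
    // of the window starting at `start - 1`. basePowLen is BASE^len mod MOD.
    static long roll(String s, long prevHash, int start, int len, long basePowLen) {
        long h = (prevHash * BASE - (s.charAt(start - 1) - 'a') * basePowLen % MOD + MOD) % MOD;
        return (h + (s.charAt(start + len - 1) - 'a')) % MOD;
    }
}
```

For "banana" with L = 3, the window "ban" hashes to 1·26² + 0·26 + 13 = 689. Rolling to "ana" gives 689·26 − 1·26³ + 0 = 338, which matches the direct computation 0·26² + 13·26 + 0 = 338.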

## Java Code Snippet

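```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Solution {
    // Rolling-hash modulus (2^32) and base; assumes S holds only 'a'..'z'.
    private static final long MOD = 1L << 32;
    private static final int BASE = 26;

    public String longestDupSubstring(String S) {
        int n = S.length();
        int left = 1, right = n - 1; // a duplicate is at most n - 1 long
        int bestStart = 0, bestLen = 0;
        // Binary search on the length of the duplicated substring.
        while (left <= right) {
            int mid = left + (right - left) / 2;
            int start = search(S, mid);
            if (start != -1) { // duplicate of length mid exists: try longer
                bestStart = start;
                bestLen = mid;
                left = mid + 1;
            } else {           // none of length mid: try shorter
                right = mid - 1;
            }
        }
        return S.substring(bestStart, bestStart + bestLen);
    }

    // Returns the start of a length-len window that also occurs earlier, or -1.
    private int search(String s, int len) {
        int n = s.length();
        long h = 0;
        for (int i = 0; i < len; i++) {
            h = (h * BASE + (s.charAt(i) - 'a')) % MOD;
        }
        // Hash -> start indices of the windows seen so far with that hash.
        Map<Long, List<Integer>> seen = new HashMap<>();
        seen.computeIfAbsent(h, k -> new ArrayList<>()).add(0);
        long aL = 1; // BASE^len mod MOD
        for (int i = 1; i <= len; i++) aL = (aL * BASE) % MOD;
        for (int i = 1; i + len <= n; i++) {
            // Roll the window: drop s[i - 1], append s[i + len - 1].
            h = (h * BASE - (s.charAt(i - 1) - 'a') * aL % MOD + MOD) % MOD;
            h = (h + (s.charAt(i + len - 1) - 'a')) % MOD;
            List<Integer> starts = seen.computeIfAbsent(h, k -> new ArrayList<>());
            // Equal hashes can collide (26^32 is a multiple of 2^32, so for
            // len >= 32 only the last 32 characters affect h): compare the text.
            for (int j : starts) {
                if (s.regionMatches(i, s, j, len)) return i;
            }
            starts.add(i);
        }
        return -1;
    }
}
```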

## Performance

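This approach is considerably faster than checking every substring length one by one. Assuming hash collisions are rare, it runs in O(N log N) time: the binary search tries O(log N) lengths, and each check takes O(N). It needs O(N) space for the stored hashes.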