# Longest Repeated Substring Problem | Longest Duplicate Substring Problem with Code

The longest repeated substring problem asks for the longest substring that occurs at least twice in a given string. It is also a common interview question.

## Problem Statement

Given a string `S`, consider all duplicated substrings: (contiguous) substrings of `S` that occur more than once. (The occurrences may overlap.) Return any duplicated substring that has the longest possible length. (If `S` does not have a duplicated substring, the answer is `""`.)

### Example 1:

Input: “banana”, Output: “ana”

### Example 2:

Input: “abcd”, Output: “”

## Optimized Solution Using Binary Search & Rabin-Karp

The task of finding the longest repeated substring can be divided into the following two subtasks:

### Subtask 1: Search for the substring length L in the interval 1 to N

A naïve solution that checks every possible length one by one would be inefficient. Instead, we can use the fact that if there is a duplicated substring of length k, then there is also a duplicated substring of length k - 1 (drop the last character of each occurrence). This monotonic property lets binary search find the maximum length with only O(log N) length checks.
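The binary search over the length can be sketched as follows. This is a minimal illustration rather than the final solution: the class name `LengthSearch` and the helper `hasDuplicate` are hypothetical, and the helper uses a plain `HashSet` of substrings as a slow stand-in for the hash-based check described in Subtask 2.

```java
import java.util.HashSet;
import java.util.Set;

class LengthSearch {
    // Hypothetical helper: true if some substring of length len occurs at least twice in s.
    // Deliberately simple (it stores every substring); Subtask 2 replaces it with hashing.
    static boolean hasDuplicate(String s, int len) {
        Set<String> seen = new HashSet<>();
        for (int i = 0; i + len <= s.length(); i++) {
            if (!seen.add(s.substring(i, i + len))) return true;
        }
        return false;
    }

    // Binary search for the largest len such that hasDuplicate(s, len) is true.
    // Valid because a duplicate of length k implies a duplicate of length k - 1.
    static int longestDuplicateLength(String s) {
        int lo = 1, hi = s.length() - 1, best = 0;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (hasDuplicate(s, mid)) {
                best = mid;      // length mid works, try a longer one
                lo = mid + 1;
            } else {
                hi = mid - 1;    // no duplicate of length mid, try a shorter one
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestDuplicateLength("banana")); // 3 ("ana")
        System.out.println(longestDuplicateLength("abcd"));   // 0
    }
}
```

The upper bound is `s.length() - 1` here because a substring as long as the whole string cannot occur twice.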

### Subtask 2: Then check if there is a duplicate substring of length L

An efficient way to check for a duplicate substring of a given length is the Rabin-Karp algorithm. It uses hashing to find an exact match of a pattern string in a text.

The idea of the algorithm is:

• Calculate the hash of the pattern of length L
• Move a sliding window of length L along the string of length N
• Check whether the hash of the string in the sliding window is equal to the hash of the pattern
• If it is, check whether the two strings are actually equal
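The four steps above can be sketched for a single pattern as follows. This is only an illustration under stated assumptions: the class name `RabinKarpSketch`, the base 256, and the prime modulus 1,000,000,007 are choices made for this sketch, not values taken from the article's solution.

```java
class RabinKarpSketch {
    // Assumed parameters for this sketch: base 256 and a large prime modulus.
    static final long BASE = 256;
    static final long MOD = 1_000_000_007L;

    // Returns the first index where pattern occurs in text, or -1 if it does not occur.
    static int indexOf(String text, String pattern) {
        int n = text.length(), m = pattern.length();
        if (m > n) return -1;

        long patternHash = 0, windowHash = 0, highPower = 1; // highPower = BASE^(m-1) % MOD
        for (int i = 0; i < m; i++) {
            patternHash = (patternHash * BASE + pattern.charAt(i)) % MOD;
            windowHash = (windowHash * BASE + text.charAt(i)) % MOD;
            if (i > 0) highPower = (highPower * BASE) % MOD;
        }

        for (int start = 0; start + m <= n; start++) {
            // Compare hashes first; confirm character by character to rule out a collision.
            if (windowHash == patternHash && text.regionMatches(start, pattern, 0, m)) {
                return start;
            }
            if (start + m < n) {
                // Slide the window: remove text[start], then append text[start + m].
                windowHash = (windowHash - text.charAt(start) * highPower % MOD + MOD) % MOD;
                windowHash = (windowHash * BASE + text.charAt(start + m)) % MOD;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("banana", "ana")); // 1
        System.out.println(indexOf("banana", "xyz")); // -1
    }
}
```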

#### Improvement in Rabin-Karp for our problem

To solve the longest duplicate substring problem, we need to make the following improvements to Rabin-Karp:

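• Search for multiple patterns instead of one by storing the hashes of previously seen windows in a set.
• Use a rolling hash instead of recomputing the hash for every window.
• Use a large modulus so that hash collisions are rare. Together with the rolling hash, each window's hash is updated in constant time, so checking one length L takes O(N) and the whole search takes O(N log N).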
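As a hedged sketch of these improvements, the check for one fixed length could look like the following. The class name `DuplicateOfLength` and the method `find` are hypothetical. Unlike a plain set of hashes, this variant stores the start indices seen for each hash and compares the characters on a hash match, so a collision cannot produce a wrong answer. It assumes the input contains only lowercase letters.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DuplicateOfLength {
    static final long BASE = 26;      // assumes a lowercase-letter alphabet
    static final long MOD = 1L << 32; // large modulus; the products below still fit in a long

    // Returns the start index of a substring of length len that also occurs earlier in s, or -1.
    static int find(String s, int len) {
        int n = s.length();
        if (len <= 0 || len > n) return -1;

        long h = 0, power = 1; // power = BASE^len % MOD
        for (int i = 0; i < len; i++) {
            h = (h * BASE + (s.charAt(i) - 'a')) % MOD;
            power = (power * BASE) % MOD;
        }

        // Map from hash value to the start indices of the windows that produced it.
        Map<Long, List<Integer>> seen = new HashMap<>();
        seen.computeIfAbsent(h, k -> new ArrayList<>()).add(0);

        for (int i = 1; i + len <= n; i++) {
            // Rolling update in O(1): drop s[i-1], then append s[i+len-1].
            h = (h * BASE - (s.charAt(i - 1) - 'a') * power % MOD + MOD) % MOD;
            h = (h + (s.charAt(i + len - 1) - 'a')) % MOD;

            List<Integer> starts = seen.computeIfAbsent(h, k -> new ArrayList<>());
            for (int j : starts) {
                // Equal hashes are only a hint; compare the characters to confirm.
                if (s.regionMatches(j, s, i, len)) return i;
            }
            starts.add(i);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(find("banana", 3)); // 3 (second occurrence of "ana")
        System.out.println(find("banana", 4)); // -1
    }
}
```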

## Java Code Snippet

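```java
import java.util.HashSet;
import java.util.Set;

class Solution {
    // Modulus for the rolling hash; 2^32 keeps every intermediate product inside a long.
    private final long mod = 1L << 32;

    public String longestDupSubstring(String S) {
        int n = S.length();
        char[] nums = S.toCharArray();

        // Binary search on the length of the duplicated substring.
        int left = 1, right = n;
        while (left <= right) {
            int mid = left + (right - left) / 2;

            if (search(mid, n, nums) != -1) left = mid + 1; // length mid works, try longer
            else right = mid - 1;                           // no duplicate of length mid, try shorter
        }

        // left - 1 is the longest length for which a duplicate was found (0 if none).
        int start = search(left - 1, n, nums);
        return start != -1 ? S.substring(start, start + left - 1) : "";
    }

    // Returns the start index of a window of length l whose hash was seen before, or -1.
    // Assumes the input contains only lowercase letters 'a'..'z'. Hash matches are not
    // verified character by character, so a collision could in principle give a wrong answer.
    int search(int l, int n, char[] nums) {
        // Hash of the first window nums[0..l-1], in base 26.
        long h = 0;
        for (int i = 0; i < l; i++) {
            h = (h * 26 + (nums[i] - 'a')) % mod;
        }

        Set<Long> set = new HashSet<>();
        set.add(h); // remember the first window so later windows can match it

        // aL = 26^l % mod, the weight of the character that leaves the window.
        long aL = 1;
        for (int i = 1; i <= l; ++i) aL = (aL * 26) % mod;

        for (int i = 1; i < n - l + 1; i++) {
            // Roll the hash: drop nums[i-1], then append nums[i+l-1].
            h = (h * 26 - (nums[i - 1] - 'a') * aL % mod + mod) % mod;
            h = (h + (nums[i + l - 1] - 'a')) % mod;
            if (set.contains(h)) return i;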