TALEND SCENARIOS: DE-DUPLICATION

 1. Deduplication of Travel Paths 

Input

Source      Destination   Distance
Mumbai      Kolkata       500
Hyderabad   Mumbai        500
Kolkata     Mumbai        500
Mumbai      Hyderabad     500
Mumbai      Delhi         500
Delhi       Mumbai        500

Output

Source      Destination   Distance
Mumbai      Kolkata       500
Mumbai      Hyderabad     500
Mumbai      Delhi         500



Job Design:

1. Enter the input rows in a tFixedFlowInput component.


2. Connect its Main row to a tJavaRow component.




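The tJavaRow code uses String.compareTo() to compare the two city names. It compares the strings lexicographically and returns a negative integer, zero, or a positive integer, depending on whether the first string sorts before, equal to, or after the second.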

// Code generated according to the input schema and output schema
output_row.Source = input_row.Source.compareTo(input_row.Destination) > 0 ? input_row.Source : input_row.Destination;
output_row.Destination = input_row.Source.compareTo(input_row.Destination) > 0 ? input_row.Destination : input_row.Source;
output_row.Distance = input_row.Distance;

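This puts every pair in the same order regardless of travel direction: the lexicographically larger city name always ends up in Source. Mumbai to Kolkata and Kolkata to Mumbai therefore become identical rows.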
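
As a rough illustration outside Talend, the same ordering rule can be sketched in plain Java. The class name RouteNormalizer and the method normalize are hypothetical names used only for this example; in the job itself the logic stays inside tJavaRow.

```java
import java.util.Arrays;

// Hypothetical stand-alone class; not part of the Talend job.
class RouteNormalizer {

    // Returns {source, destination} with the lexicographically larger
    // city name first, mirroring the tJavaRow expression.
    static String[] normalize(String source, String destination) {
        return source.compareTo(destination) > 0
                ? new String[] {source, destination}
                : new String[] {destination, source};
    }

    public static void main(String[] args) {
        // Both directions of the same route produce the same pair.
        System.out.println(Arrays.toString(normalize("Mumbai", "Kolkata")));
        System.out.println(Arrays.toString(normalize("Kolkata", "Mumbai")));
    }
}
```

Both calls print [Mumbai, Kolkata], because "Mumbai" sorts after "Kolkata".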

3. In the tUniqRow component, mark all three columns (Source, Destination, Distance) as key attributes so that rows are de-duplicated on the whole record.
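
To show what de-duplicating on all columns means, here is a minimal plain-Java sketch that assumes the rows have already been put in the uniform order from step 2. It is not how tUniqRow is implemented; the class RouteDeduplicator and the method uniqueRows are hypothetical, and a LinkedHashSet stands in for the component.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical stand-alone class; a stand-in for tUniqRow, not its implementation.
class RouteDeduplicator {

    // Keeps the first occurrence of each row. Rows are compared on every
    // column because List.equals() and List.hashCode() are element-based.
    static List<List<String>> uniqueRows(List<List<String>> rows) {
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }

    public static void main(String[] args) {
        // The six input rows after the ordering step.
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("Mumbai", "Kolkata", "500"),
                Arrays.asList("Mumbai", "Hyderabad", "500"),
                Arrays.asList("Mumbai", "Kolkata", "500"),
                Arrays.asList("Mumbai", "Hyderabad", "500"),
                Arrays.asList("Mumbai", "Delhi", "500"),
                Arrays.asList("Mumbai", "Delhi", "500"));

        // Prints three rows: Kolkata, Hyderabad, Delhi (each with Mumbai, 500).
        uniqueRows(rows).forEach(System.out::println);
    }
}
```

LinkedHashSet is used here only because it preserves the order in which rows first appear.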




4. The final output matches the Output listing at the top of this scenario.


1 comment:

  1. Just a suggestion for more accuracy: trim the values and ignore case when comparing,
    input_row.Source.trim().compareToIgnoreCase(input_row.Destination.trim())
