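Some backup software uses incremental or differential backups. It takes a full backup the first time and then stores only the changes in later backups. An incremental backup stores the changes since the previous backup of any kind, while a differential backup stores the changes since the last full backup. More sophisticated software goes further and monitors changes at the file or block level.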
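Here is a minimal Python sketch of how the two schemes choose which files to copy. It assumes the backup tool compares each file's modification time against the times of earlier backups; real tools may track changes differently. `FileInfo` and `files_to_back_up` are hypothetical names for illustration, not part of any real backup product.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    mtime: float  # last-modified time, e.g. seconds since the epoch


def files_to_back_up(files, mode, last_full, last_backup):
    """Return the paths a backup run would copy (hypothetical helper).

    mode="full":         copy everything.
    mode="incremental":  copy files changed since the most recent backup of any kind.
    mode="differential": copy files changed since the most recent full backup.
    """
    if mode == "full":
        cutoff = float("-inf")
    elif mode == "incremental":
        cutoff = last_backup
    elif mode == "differential":
        cutoff = last_full
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [f.path for f in files if f.mtime > cutoff]


if __name__ == "__main__":
    files = [
        FileInfo("a.txt", 100.0),  # unchanged since the full backup at t=150
        FileInfo("b.txt", 200.0),  # changed after the full backup, before the incremental at t=250
        FileInfo("c.txt", 300.0),  # changed after the incremental at t=250
    ]
    print(files_to_back_up(files, "incremental", last_full=150.0, last_backup=250.0))   # ['c.txt']
    print(files_to_back_up(files, "differential", last_full=150.0, last_backup=250.0))  # ['b.txt', 'c.txt']
```

The incremental run copies less each time, but a restore needs the full backup plus every incremental; a differential run copies more, but a restore needs only the full backup plus the latest differential.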
Let's take an example. Yesterday, you created a text file containing the word "Hello" and saved it, and the file was backed up during the daily backup process. Today, you open the file, change "Hello" to "Hello1", and save it under a different name. You have added only one character, so the second (new) file is a version of the first and the two are almost identical. In other words, the new file differs from the original only by a delta: the small set of changes needed to turn the original into the new version.
If we use the file-compare backup method, it will back up the entire new file because the file has a different name. If we use the file-level hashing method, it will also back up the entire new file, because the added data changes the file's contents and therefore produces a completely different hash.
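The file-level hashing behavior can be sketched in a few lines of Python. This is a minimal illustration, assuming a deduplication store keyed by the SHA-256 hash of each whole file; `file_hash` and `backup_file_level` are hypothetical names. Because the key is the content hash, an identical copy is deduplicated regardless of its name, but adding a single byte produces a new hash and the whole file is stored again.

```python
import hashlib


def file_hash(data: bytes) -> str:
    """Hash the entire file content, as a file-level dedup index would."""
    return hashlib.sha256(data).hexdigest()


def backup_file_level(store: dict, data: bytes) -> int:
    """Store `data` only if its whole-file hash is new; return bytes written."""
    digest = file_hash(data)
    if digest in store:
        return 0  # exact duplicate: nothing new to store
    store[digest] = data
    return len(data)


if __name__ == "__main__":
    store = {}
    print(backup_file_level(store, b"Hello"))   # 5: first file stored in full
    print(backup_file_level(store, b"Hello"))   # 0: identical copy is deduplicated
    print(backup_file_level(store, b"Hello1"))  # 6: one extra byte, whole file stored again
```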
The best way to de-duplicate this file is delta versioning, which comes in two forms:
1) Block-level delta versioning
2) Sub-block-level delta versioning
Block-level delta versioning:
This method works like the snapshots I explained earlier. It monitors all writes to the disk at the block level and stores only the blocks that changed relative to the original data. Because only the changed blocks are sent to the Disaster Recovery site, it greatly reduces the amount of data transferred, which makes it very efficient.
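A minimal Python sketch of block-level change tracking follows. It assumes a toy in-memory volume with a fixed 4 KB block size (an illustrative value, not taken from any particular product); `TrackedVolume` and `apply_delta` are hypothetical names. Every write marks the blocks it touches as dirty, and at sync time only those blocks are shipped to the replica.

```python
BLOCK_SIZE = 4096  # assumed block size for illustration; real products choose their own


class TrackedVolume:
    """A toy volume that records which blocks were written since the last sync."""

    def __init__(self, size: int, block_size: int = BLOCK_SIZE):
        self.data = bytearray(size)
        self.block_size = block_size
        self.dirty = set()  # indexes of blocks changed since the last sync

    def write(self, offset: int, payload: bytes) -> None:
        """Write payload at offset and mark every block it touches as dirty."""
        if not payload:
            return
        if offset < 0 or offset + len(payload) > len(self.data):
            raise ValueError("write outside the volume")
        self.data[offset:offset + len(payload)] = payload
        first = offset // self.block_size
        last = (offset + len(payload) - 1) // self.block_size
        self.dirty.update(range(first, last + 1))

    def collect_delta(self) -> dict:
        """Return {block_index: block_bytes} for dirty blocks and reset tracking."""
        bs = self.block_size
        delta = {i: bytes(self.data[i * bs:(i + 1) * bs]) for i in sorted(self.dirty)}
        self.dirty.clear()
        return delta


def apply_delta(replica: bytearray, delta: dict, block_size: int = BLOCK_SIZE) -> None:
    """Patch the DR-site replica in place with the changed blocks only."""
    for idx, block in delta.items():
        start = idx * block_size
        replica[start:start + len(block)] = block


if __name__ == "__main__":
    volume = TrackedVolume(3 * BLOCK_SIZE)
    replica = bytearray(volume.data)  # DR copy after the initial full sync
    volume.write(5000, b"1")          # one-byte change inside block 1
    delta = volume.collect_delta()
    print(sorted(delta), sum(len(b) for b in delta.values()))  # [1] 4096
    apply_delta(replica, delta)
    print(replica == volume.data)     # True
```

Even a one-byte write ships a whole 4 KB block here; that block granularity is what sub-block versioning tries to reduce.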
Sub-block-level delta versioning (Microscan):
It works like block-level delta versioning, but it is more efficient because it tracks changes at the byte level rather than the block level. As the figure shows, in our previous example you added one character to the original file, so only a single 512-byte sector is needed to store the change. If you use array-based replication software to send the file to the Disaster Recovery site, it will send an entire 32 KB disk track, because that is the minimum block size for an array-based solution. With a file-level monitoring solution, you still need to send an entire 8 KB block, because the file system writes data in 8 KB units.
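The following Python sketch illustrates the idea with the "Hello" to "Hello1" example. It is not the Microscan implementation: as a simplification, it finds a single contiguous byte-level edit by trimming the common prefix and suffix of the two versions, then rounds that change up to the three granularities mentioned above (512-byte sector, 8 KB file-system block, 32 KB array track). Those sizes are this post's example figures; real devices and file systems vary. `byte_delta`, `apply_byte_delta` and `bytes_sent` are hypothetical helper names.

```python
import math

# Example granularities from the text; real hardware and file systems differ.
SECTOR = 512
FS_BLOCK = 8 * 1024
ARRAY_TRACK = 32 * 1024


def byte_delta(old: bytes, new: bytes):
    """Return (offset, removed_len, inserted_bytes) for a single contiguous edit."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # Stop before the suffix overlaps the already-matched prefix.
    while suffix < limit - prefix and old[len(old) - 1 - suffix] == new[len(new) - 1 - suffix]:
        suffix += 1
    return prefix, len(old) - prefix - suffix, new[prefix:len(new) - suffix]


def apply_byte_delta(old: bytes, delta) -> bytes:
    """Rebuild the new version from the old version plus the delta."""
    offset, removed, inserted = delta
    return old[:offset] + inserted + old[offset + removed:]


def bytes_sent(changed_bytes: int, unit: int) -> int:
    """Round a change up to whole units of the given granularity."""
    return math.ceil(changed_bytes / unit) * unit


if __name__ == "__main__":
    delta = byte_delta(b"Hello", b"Hello1")
    print(delta)  # (5, 0, b'1')
    assert apply_byte_delta(b"Hello", delta) == b"Hello1"
    changed = len(delta[2])  # new bytes that must be shipped
    for name, unit in [("sub-block (512 B sector)", SECTOR),
                       ("file level (8 KB FS block)", FS_BLOCK),
                       ("array replication (32 KB track)", ARRAY_TRACK)]:
        print(f"{name}: {bytes_sent(changed, unit)} bytes")  # 512, 8192, 32768
```

A production byte-level delta engine has to handle many scattered edits (for example with rsync-style rolling checksums); the single-edit version keeps the arithmetic easy to follow: 512 bytes versus 32 KB is a 64-fold reduction in data sent.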
I hope you enjoyed this post.