Checksum

When data is transferred between machines over a network, some of it is expected to suffer transmission or storage errors. Causes include aging software, natural disasters, natural radiation, vibrations, etc.

The checksum exists to detect these errors and verify data integrity.

A checksum is generally generated by a hash function, often a cryptographic one: MD5, SHA-1, SHA-256, SHA-512, BLAKE2, or BLAKE3. The checksum is generally much smaller than the block of data it verifies. For a given algorithm the checksum has the same length every time, regardless of the size of the input, and most languages expose it as a string (typically hexadecimal).
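As a minimal sketch of the fixed-length property, the snippet below uses Python's standard hashlib module. The helper name checksum is an assumption made for illustration, not something from a particular system.

```python
import hashlib


def checksum(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of data (hypothetical helper for illustration)."""
    return hashlib.new(algorithm, data).hexdigest()


small = checksum(b"hi")
large = checksum(b"x" * 1_000_000)

# Both digests have the same length even though the inputs differ
# in size by several orders of magnitude.
print(len(small), len(large))
```

With SHA-256 each digest is 64 hexadecimal characters (32 bytes); switching the algorithm argument to "sha512" would give 128 characters.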

There are two ways to recover from a checksum error:

  1. The data is re-sent and its checksum is verified a second time
  2. Some schemes include an error-correcting code (ECC), which lets the receiving side repair the corrupted data itself
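The first recovery option (re-send and verify again) can be sketched roughly as follows. The names receive_with_retry and fetch are hypothetical: fetch stands in for whatever call actually requests the data again, and the retry limit is an arbitrary assumption.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def receive_with_retry(fetch, expected_checksum: str, max_attempts: int = 3) -> bytes:
    """Call fetch() until the payload matches expected_checksum.

    fetch is a hypothetical zero-argument callable returning the payload bytes.
    """
    for _ in range(max_attempts):
        payload = fetch()
        if sha256_hex(payload) == expected_checksum:
            return payload  # checksum matches, accept the data
    raise IOError(f"checksum mismatch after {max_attempts} attempts")


# Simulated channel: the first transmission is corrupted, the second is clean.
original = b"hello, world"
transmissions = iter([b"hellp, world", original])
result = receive_with_retry(lambda: next(transmissions), sha256_hex(original))
```

This sketch assumes the expected checksum itself arrives intact; a real protocol would also have to protect or re-send the checksum.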

In some cases a failed checksum does not lead to a re-send, as with UDP. UDP does carry a checksum (optional over IPv4), but a datagram that fails it is simply dropped and never retransmitted. Why? Because applications built on UDP generally want fast, rather than reliable, data transmission. Live streams are an example: if the video had to stop and wait for a re-send every time a packet was lost, the user experience would suffer, and the viewer rarely notices occasional packet loss or artifacts anyway.

Alternative uses of checksum

Another use of checksums is detecting changes in data across services. Verifying that data is not corrupted is the same as verifying that the data on the receiver is exactly the data on the sender, so the same check also reveals when two copies have diverged.

With this in mind, we can use a checksum to determine when data on one server is out of date relative to another. Instead of always re-sending large blocks of data between a server and a client, we can first compare checksums, which is a much smaller and faster check, and send updates if and only if the checksums do not match. This can save a lot of time, IO, and money, as most exchanges shrink to small IO requests.
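A rough sketch of this compare-first pattern is shown below. The Server class, its fetch_if_changed method, and the sync function are hypothetical names invented for the example; a real system would make the comparison over the network rather than in-process.

```python
import hashlib
from typing import Optional


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class Server:
    """Hypothetical in-memory stand-in for a remote server."""

    def __init__(self, data: bytes) -> None:
        self.data = data

    def fetch_if_changed(self, client_checksum: str) -> Optional[bytes]:
        """Return the data only when the client's checksum is stale."""
        if sha256_hex(self.data) == client_checksum:
            return None  # client is up to date, nothing large is sent
        return self.data


def sync(server: Server, cached: bytes) -> bytes:
    """Send only a checksum; download the full data if it differs."""
    update = server.fetch_if_changed(sha256_hex(cached))
    return cached if update is None else update


server = Server(b"version 1")
unchanged = sync(server, b"version 1")  # checksums match, cache is kept

server.data = b"version 2"
refreshed = sync(server, b"version 1")  # checksums differ, new data is sent
```

In the matching case only the short checksum crosses the wire, which is where the IO savings come from.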

Database Introduction