I am in the middle of a great book right now called The Mother Tongue. It’s about the origin of the English language. In the early pages it pokes a little fun at the attempt some companies who have a false mastery of the language.
I seem to buy a lot of these products and they are hilarious! I’ve been meaning to post this image for a while! It’s the back of the box of my network cable tester.
My favorite is the “Attention”, “Do not change it on your mind.” I can’t even fathom what it’s trying to warn me against. I’ve got a manual for my XO brand radio that is just as bad! Most of the time you have no clue what it’s trying to tell you. The Radio, by the way, is not much better. The English on it is pretty simple, but there are large sections of the user interface that are not translated from Chinese! It’s also a pretty poor MP3 player which sucks because that is the feature I purchased the radio for.
There are three main tools used in cryptography. They are hashing, synchronous encryption, and asynchronous encryption. A very superficial definition could be:
Hashing – Creating a unique value based on blocks of input. Even a minor change in the input, say a value changes by 1 bit, will cause a good hash to change drastically. Ideally, every bit in the right place is the only way to create this number. Hashing is a one-way function so it is not useful for data hiding, but it is very useful for validating the authenticity of data such as a download or a certificate. It is used to ensure that a message has not been tampered with. If even a little change is introduced the unique value becomes very different. If two different blocks of data are hashed using an algorithm that produces the same resultant value it is called a collision, and the algorithm is considered compromised. This is because a malicious hacker could change the value of the data and still have the resulting hashed value not change. When this happens the hash algorithm is useless. This actually happens a lot!
Synchronous Encryption – This means changing an input block using a key into something unrecognizable by anyone not in possession of that key. This type of encryption is what people most commonly think of when you talk about encryption and has been used since about 1900 BC! Depending on the size of the key and the technique used, It is also the most difficult to break.
Asynchronous Encryption – Two keys are generated that are mathematically linked. A key that is used to write (public key) and another key to read (private key). The idea is that a public key can be given to whomever wants to communicate with us. It doesn’t matter if a hacker captures this key because it doesn’t allow them to read the message that the client sends back. In the case of SSL/TSL the message that the client sends back is a key to use for synchronous encryption.
Only the recipient whom has the private key will be able to unlock this message and be able to further communicate with the key that was sent. The only way to break this type of encryption is to guess the key, but if you have sufficient key lengths (1024 is now standard in the US, for example), then a brute-force attack would take longer than the Universe has to live to guess the key. Furthermore, because we passed a random set of data (a new synchronous encryption key) a hacker wouldn't know if they successfully cracked the message or not.
This may sound all well and good but SSL/TSL breaks down if the authenticity of the RP (Remote Party / Server) cannot be verified. That is why we have CA’s, or Certificate Authorities. The job of a CA is to vouch for the authenticity of the remote party. This means that IF YOU GET A CERTIFICATE WARNING THIS IS THE ONLY THING PROTECTING YOU FROM A MAN-IN-THE-MIDDLE ATTACK. If someone relayed communication between you and the remote party the certificate warning is the only thing that will protect you.
Actually, all three of these cryptographic technologies are used in SSL/TSL. If you have ever gone to an https website and clicked on the “lock” icon, this is what you are likely to see.
The first tab states that the certificate’s purpose is to “Ensure the identity of a remote computer” and has some information about the CA and expiration date. In the second tab you can see that the hash algorithm is SHA1 and that the public key is an RSA 1024 bit key. Symmetric communication takes over when we pick a key and a symmetric encryption algorithm. The server may reject the algorithm we selected and you would be forced to pick again until you both agreed on which algorithm to use. Now days it’s likely to be Rijndael AES encryption that is chosen to communicate the rest of your session.
Back to Hashes
Hashes are usually considered the weakest link in this chain of cryptography. To date the following hash algorithms have been compromised: HAVAL, MD2, MD4, MD5, PANAMA, RadioGatun, RIPEMD, SHA-0, SHA-1, and Tiger. Many of these algorithms are not completely compromised, bur rather practically compromised. We know that we can break them but it’s unfeasible. That is to say that generating a collision for a message is likely to be more difficult than it’s worth. The most common algorithms in use today are MD5 and SHA1, both of which have recently been compromised.
In 1996 a flaw was found in the design of MD5 and in 2007 it was broken completely and it was demonstrated that a pair of files could be created which have the same hashed value and they were able to fake SSL certificate validity. This caused the DHS (Department of Homeland Security) to issue a statement indicating that the MD5 function should be considered cryptographically broken and that it is unsuitable for further use. This is no small thing as we have an insatiable addiction for MD5! We use it everywhere! Everywhere else we use SHA-1 which has also been cracked. The DHS suggests SHA-2 but since it is algorithmically similar to SHA-1 there is a good chance it won’t be effective for very long.
So how do we make hashes stronger?
I don’t believe that we are all of a sudden going to invent a new generation of hashing algorithms that are significantly stronger. In fact, if anything we’re getting better and better at cracking these hash functions so they have shorter and shorter lifespan. I think what we will have to do is become more inventive in our use of them.
Hash functions work a lot like block ciphers (synchronous encryption), that is they work on blocks of data at a time. If you are going to fool a hash function then you either need to make the tampered block result in the same hashed value or you need to “balance” out the message in a later block. Blocks are typically 512 bytes of data. If there is a remainder at the end of the message it will be hashed with a partially-empty block.
One obvious thing you can do to protect yourself is to supply more than one type of hash for any given set of data. So, for example, if you download a file from the Internet you sometimes see a few different hashes. You may fool one hash, but you’re not going to fool two hashes with the same collision, especially if the two hashing algorithms are unrelated.
While using two hashes seems a reasonable approach there are some down sides. First, it’s twice the work for your computer hashing the same bits using two different hash functions. Secondly, it is not possible in every scenario where a single hash must be used. One example of this is the use of HMAC authentication. Such authentication works kind of like this:
Client | | Server |
User “nzaugg” wishes to authenticate | ------> | |
| <------ | Okay, here is a very large and unique phrase. |
Hash(Password+Phrase) | ------> | Yep, that’s what I get when I Hash(Password + Phrase) |
We can’t really send more than one hash as a response and even if we did, it wouldn't help any.
While playing with a pair of hacked files I had an idea which could still use MD5 but stagger the hashes so the blocks align differently, then hash those hashes together. It sounds pretty home-brew, but let me explain.
Offset Hashing
Lets say that I have that same hash as before but I have compromised data in block #1 (red line). If we hash that data again with different data before it and at a different place in the block it will yield a different result. In order for this to work though, we need actual data in the offset blocks (in blue). We can simply take the last 256 bits from the end of the file for the beginning and the first 256 bits for the end. We could also fill it with A..Z, etc. It doesn’t matter too much but should be convention. This is important because if we left the blocks on the beginning and the end empty then we could not detect tampering on those two blocks using MD5 as it’s somewhat position independent. At least it didn’t work when I tried it. It has more to do with what values preceeded it and therefore produced the same value if the previous values were all zero.
And using no new hashing algorithms we can now detect the hash tampered data. It’s still about twice as expensive as a 1-pass hash but overall hashing is fairly inexpensive and modern computers more that compensate. It may be possible to calculate a collision once, but is impossible to calculate the collision for one hash pass and have it work for the other.
It was a little troubling to me that unless the blue squares were filled with some data I was unable to detect the changed data. So I thought why not offset hash with two different hash algorithms and then use a 3rd to hash together the results of the other two hashes. I could use MD5 for pass 1, SHA-1 for pass 2, and SHA-2 to combine the two passes into a single hash.
The result is an unbeatable hash! It doesn’t matter that both SHA-1 and MD5 have been broken, no one has ever broken the pair used together and offset hashing makes that task even more impossible especially considering that the two algorithms chosen are very different algorithms. The third algorithm doesn’t need to be different than one of the other two chosen, it essentially just helps make subtle differences in the hashes stand out.
By the way, BitTorrent downloads rely heavily on hashing functions for both data verification (did I hear you correctly) but also for authenticity. There are programs out there (links below) that can change the file and get it to generate the same hash. When you download stuff off of BitTorrents you can never know what you are downloading does not contain a virus. Even if they were to use my staggered diff hash idea the virus could have been there before the torrent file was even created.
Links