A Cryptanalysis of CipherSaber-1


Note: A recent draft paper has identified new weaknesses in RC4 as used in CipherSaber-1. See the main CipherSaber page for more detailed information. -- agr 2001-7-27


CipherSaber 1 is a way of using Ron Rivest's RC4 cipher as a practical symmetric key cryptosystem. It is intended to demonstrate that strong cryptography can be very simple to implement, and therefor impossible to suppress without massive infringement on civil liberties. A CipherSaber-1 program can be written in 16 lines of QBasic. CipherSaber is explained more fully at http://ciphersaber.gurus.com.

In CipherSaber-1, the ASCII representation of the user key or passphrase is used directly as part of the RC4 key. To prevent the same RC4 key from being used twice, a ten byte initialization vector (IV) is appended to the user key to form the complete RC4 key. The IV is stored as the first ten bytes of CipherSaber-1 encoded files.

This paper presents and analysis of several potential weaknesses in CipherSaber-1 and recommendations for avoiding them. We use the same notation here that Bruce Schneier uses in his description of RC4 in the second edition of his Applied Cryptography, except that the stream of cipher bytes that RC4 produces are denoted as C(i).

1. Key Size

In most cryptosystems, the longer the key, the better. However, it was pointed out by several individuals who looked at CipherSaber-1 that a very long user key could prevent the IV from affecting more than a few of the S(i) values. This could cause leakage of plaintext. To insure adequate mixing, we advised users to restrict their user key length to 54 characters or less, as most would anyway. This insures that the IV will be mixed in at least four times during the setup process. Here is a more detailed justification of the 54 character limit.

We want to estimate the likelihood that initial cipher bytes will be unaffected by the IV. Assume a user key length of 54 bytes, the maximum length we recommend. Then all the S(i) after the first 54 will be affected by the IV during key setup. S(0) through S(53) will only be affected by the IV if they are swapped in key setup rounds 55 through 256. That means there 202 opportunities. The probability that a given S(i) will remain unswapped is

(255/256)**202 = 0.45.

So the expected number of the first 54 S(i) that are unaffected is 54*0.45 = 24.5. Thus the probability that S(i), where i is selected at random from [0,256], has not been affected by the IV is 24.5/256 = 0.096

We can calculate the values that the first few S(i) are likely be have after the key setup is complete if we assume that none of the first few swaps hit an S(i) that has already been swapped previously. Recall that at the start of key setup, S(i) = i, for all i.

Then:

S(0) = K(0)

S(1) = K(0)+K(1)+1

S(2) = K(0)+K(1)+K(2)+3

...

Call the successive cipher stream bytes C(i). If the first few S(i) are not affected during the production of the first few cipher bytes, then:

C(0) = S(S(1) + S(S(1)))

C(1) = S(S(2) + S(S(1)+S(2)))

C(2) = S(S(3) + S(S(1)+S(2)+S(3)))

...

C(i) = S(S(i+1) + S(x)), where x = S(1)+S(2)+...+S(i+1) mod 256

Each S(i) for i between 0 and 53 have a 0.45 chance of being unaffected by the IV. Therefore x has a 0.45**(i+1) chance of being unaffected.

The values S(x) and S(S(i+1)+S(x)) each have a .096 chance of being unaffected. Therefore C(i) has a chance p(i) of being unaffected by the IV, where

p(i) =0 .096*0.096*0.45**(i+1) = 0.0092*0.45**(i+1)

     i	1/p(i)

 

     0	242

     1	539

     2	1198

     3	2662

1.1 C(0) Analysis

The most vulnerable Cipher byte would appear to be C(0). However assume further that the first two characters of the user key are 7-bit ASCII and donot contain control characters. Thus they are in the range [20, 7E] hex, i.e. [32, 126] decimal.

Recall that S(1) = K(0)+K(1)+1 and therefore is in the range [65-253].

Also remember that: C(0) = S(S(1) + S(S(1)))

We see that the S(S(1)) term is guaranteed to select an S value that was affected by the IV. So if the first two characters of the user key are 7-bit ASCII, C(0) will always be afffeced by the IV.

1.2 C(1) Analysis

Based on the above analysis, C(1) now appears to be the most vulnerable cipher byte.

Remember that C(1) = S(S(2) + S(S(1)+S(2))) and S(1) = K(0)+K(1)+1

If K(0) and K(1) are in the range [27, 100] decimal, then S(1) must be in the range [55, 201] decimal. This means that S(2) and S(1)+S(2) cannot both be in the range [0, 53] and C(1) is guaranteed to be affected by the IV. The condition that K(0) and K(1) are in the range [27, 100] decimal, which is [1B, 64] hex, is satisfied if the first two characters in the key are digits or uppercase characters. So if the first two characters of the user key are digits or uppercase characters, C(1) will always be afffeced by the IV. Given that the chance of C(1) being unaffected by the IV is 1/539 which is less than the likelihood of any random 8-bit value (1/256), we have not recommended that users restrict the first two characters of their key to digits or uppercase.

1.3 Effect of Varying Key Size

For a 41 byte key (the size that insures 5 mixings), the C(1) unaffected probability is computed as follows. The probability that one of the first 41 S(i) being unaffected by the IV is

(255/256)**(256-41)=.43

Probability of a random S(i) being unaffected is 0.43*41/256 = 0.069.

Probability p(1) of C(1) being unaffected is (0.43*0.069)**2 = 1/1134. This is much less than the 1/256 likelihood of a random 8-bit value.

For a 74 byte key (the size that insures 3 mixings), the probability p(1) that C(1) is unaffected by the IV is as follows.

The probability that one of the first 74 S(i) being unaffected by the IV is:

(255/256)**(256-74) = 0.49

Probability of a random S(i) being unaffected is 0.49*74/256 = .141

Probability of C(1) being unaffected is (0.49*0.141)**2 = 1/209. This is greater than the 1/256 likelihood of a random 8-bit value, justifying our recommendation not to use a key this big.

2. Weak Keys

On September 22, 1995, Andrew Roos posted a paper to sci.crypt.research that presented a class of weak keys in RC4. The conditions for this class are

K(0) + K(1) = 0 mod 256.

These weak keys can leak information about the first byte of the key or cipher stream.

Note that if the K(0) and K(1) are 7-bit ASCII characters at least one of which is not NUL (00), the above condition is never satisfied. Thus the use of printable 7-bit ASCII keys avoids this class of weak keys.

Several people have asked why the IV isn't prepended to the user key. Doing so could permit weak keys to occur.

3. IV Size and Randomness

RC4 is a stream cipher. That means RC4 produces a stream of bits that are xor'ed with the bits in the message. In a stream cipher it is essential that the same key never be used twice.

If Sam the skilled snoop ever encounters two different messages encrypted with the same RC4 key, then Sam can xor their cipher text to eliminate the RC4 cipher bits because xor repeated twice is a null operation. Sam will then have a stream of bits that represent the xor of the plaintext of the two messages. If Sam knows the plaintext of one of the messages, he can easily recover the other. Worse, if the messages are in a natural language such as English, he can usually recover both messages -- even if he knew neither. While revelation of encoded messages is bad news for any cipher, in neither case is Sam able to recover the user key.

CipherSaber addresses the requirement that the same key never be used twice by appending a ten byte quantity, called the initialization vector (IV), to the users key. If the same IV is never used twice then the same RC4 key can never be used twice.

The IV values can be any unique 8-bit quantities. For example, one could use the year, month, day, hour minute second and hundredth of a second. As long as no two messages were sent at exactly the same time, you would be safe.

Another way to get a high likelihood of uniqueness is to use random values for the IV bytes. This also makes encrypted file look like totally random byte strings, which may be useful in some cases.

Since there are 80 bits in the IV, if n messages have been encoded with the same user key, the probability q(n) that two messages will have the same IV is

q(n) = n(n-1)/(2**80)

Even if one generated a billion messages, the probability that two messages have the same IV is about one in a million. There is, however a big "but" as we see in the next section.

3.1 Use of Built-in Random Number Generators to Create the IV.

The "but" is that we are assuming all the IV bytes are random. Creating random values on a computer can be tricky, In particular, the built-in random number generators supplied in most programming environments are notoriously bad.

Most C programming environments offer a 32 bit random number generator. If such a generator is used to create an 80-bit IV there is still at most only 32 bits of randomness. The probability q(n) that two messages will have the same IV is then

q(n) = n(n-1)/(2**32)

If you want q(n) to be less than one in a thousand, you can only send about 2000 messages with the same user key. To achieve even this level of safety, you need to initialize the random number generator with a seed that has 32 bits of randomness.

3.2 Problems with QBasic as a Source of IV Randomness

A more extreme example is Microsoft's QBasic 4.5. This Basic interpreter is shipped ont the CD-ROM that comes with every copy of Windows '95, making it undoubtedly the most widely distributed programming environment in the world. QBasic is very well suited for implementing RC4. It even has a SWAP statement. However, its random number generator, RND, has only 24 bits of state. This means if you want q(n) to be less than one in a hundred, you can only send about 400 messages with the same user key.

Even worse, the standard way to initialize QBasic's random number generator is to put the statement RANDOMIZE TIMER at the beginning of the program. TIMER is a built in function that returns a floating point value containing the number of seconds since midnight. TIMER's resolution appears to be 1/100 of a second. but the operating system only supports resolution of 55 milliseconds. Assuming an eight hour variability in time of encryption, TIMER only supplies about 19 bits of randomness. This means if you want q(n) to be less than one in a hundred, you can only send about 72 messages with the same user key.

3.3 Better Ways to Create the IV in QBasic

Here are some methods for generating the CipherSaber IV in QBasic.

The first is to xor the RND values with the characters returned by the DATE$ function:

     randomize timer

     for i = 1 to 10

          iv$(i)=chr$(asc(mid$(date$,i,1)) xor 256*rnd)

     next  i

This method sharply reduces the likelihood that two messages will have the same IV. However, while the encrypted file will look random to a casual observer, an expert could detect that the IV was not really random and might even be able to extract the date of the message. In most cases this is not a concern since the date could be determined by the message header or file directory block.

Here is a method that comes closer to creating a truly random IV. It captures key stroke times while the user key (k$) is entered.

     get: c$=inkey$

     if c$ = "" then goto get

     if c$ = CR$ then go to done

     i=i+1

     j=(i mod 10) + 1

     k$(i)=c$

     randomize timer

     iv$(j)=chr$((256*rnd + asc(iv$(j)) mod 256)

     goto get

 

Even with DOS's low timing resolution, one can expect an 8 tick variation per character entered, producing 3 bits of entropy. With a 15 character user key, this would produce about 45 bits of entropy. The 18 to 20 bits of entropy represented by the first randomize timer call would bring the IV close to 65 bits of entropy. Given that users tend to enter random keys slowly, 4 bit of entropy per character is more likely, bringing us close to a full 80 bits of randomness.

One disadvantage of the key stroke timing method is that you must force the user to enter their key afresh for every message.

4. CipherSaber-2

To solve the problems caused by the single round of key setup in RC4, I was originally going to include a CipherSaber-2 in my proposal. CipherSaber-2 has a parameter N that specifies the number of times the RC4 key setup loop is repeated. For N=1, CipherSaber-2 would just be the same as CipherSaber-1. The value of N could be agreed upon by both users and kept secret along with the key. Indeed, N can be considered part of the key. I see three advantages over RC4:

1. For n >= 2 it solves the mixing problem. User keys can be up to 246 bytes without any problems.

2. It would coax users to have longer keys. A big weakness in cypto-for-the-masses is that people just will not pick long enough keys. Having a second "bucket" helps from a human factors perspective (sort of like area code and 7-digit phone number vs. a straight 10-digit phone number)

3. Large values of N makes exhaustive key search much slower. A thousand-fold increase in the time needed to test a key is equivalent to adding 10 bits of entropy to the key. A CipherSaber-2 written in C and running on a Pentium could probably stand a five or six digit N without an unacceptable delay.

[Note: It has been pointed out to me that an attacker can easily try successive values of N, so the actual benefit of keeping N secret is minimal. It might be better for each user to determine the largest value of N that is tollerable on their computer and to agree to use the smaller of the two acceptable N values when communicating with another CipherSaber user. agr 9/16/98]

I did not include CipherSaber-2 in the initial announcement because I wanted to keep the proposal simple and because I wanted to use a cipher that was straight out of a textbook. Also it is foolhardy to introduce a new cipher without a time for other people to review it. [However recent attacks on RC4 have led me to recommend CipherSaber-2, at least until a consensus develops on how best to fix RC4's key setup. -- added 2001-7-27 agr]

5. Recommendations

To use CipherSaber securely, we recommend that users:


I want to thank Hal Finney for his valuable comments on a much earlier draft. Additional comments on this paper in general and on CipherSaber-2 in particular are welcome.

Arnold G. Reinhold

March 19, 1998 Rev f. HTML version, Basic and PC timer correction and minor edits April 22, 1998, May 13, 1998. CipherSaber-2 correction Sept. 16, 1998, CipherSaber-2 recomendation July 27, 2001

Return to CipherSaber Home Page