email hashes in PGP keys as protection against spam

Replacing the email addresses in public PGP keys on key servers with their hash values is meant to guard against the abuse of key servers by spammers.

initial position – the problem

Asymmetric cryptography requires that the public key be available. Key servers are a good solution for that. The problem with this approach is that chained queries make it possible to harvest more or less all email addresses from a key server.

At some point in the future cryptography may become a very effective protection against spam, but in the meantime it makes no sense for the PGP infrastructure to help the spammers. Today this is not yet a problem: hardly anybody uses PGP, so abusing key servers would be too much effort for spammers. But this can change.

problem awareness

aim

The proposal is to create a PKI that does not reveal email addresses and requires only very small changes to today's technology.

technical implementation

requirements

implementation

The task is to remove the email addresses from the keys on the servers (or rather from those still to be uploaded) while still being able to find these keys when somebody searches for them by one of their email addresses.

The obvious approach is to duplicate all UIDs and to write only a hash value of the address into the duplicates. This requires a normalized notation. Strictly speaking, SMTP treats the local part of an address as case-sensitive, but in practice this does not matter. This theoretical problem can be solved by querying the hash of the non-normalized notation first. A useful normalization would be lowercase UTF-8.
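The normalization and lookup order described above can be sketched as follows. This is only an illustration under assumptions: the function names are hypothetical, SHA-256 is picked as an example hash function, and stripping surrounding whitespace is an extra step that the proposal does not mention.

```python
import hashlib
from typing import List


def normalize_address(address: str) -> str:
    """Normalize an address: strip surrounding whitespace, lowercase it."""
    return address.strip().lower()


def hash_address(address: str) -> str:
    """Return the SHA-256 hex digest of the UTF-8 encoded address."""
    return hashlib.sha256(address.encode("utf-8")).hexdigest()


def lookup_candidates(address: str) -> List[str]:
    """Hashes to query, in order: exact notation first, then normalized."""
    exact = hash_address(address.strip())
    normalized = hash_address(normalize_address(address))
    # If the address was already normalized, one query is enough.
    return [exact] if exact == normalized else [exact, normalized]
```

A client would query the key server for each returned hash in turn and stop at the first hit, so a key published under a case-sensitive notation is still found before the normalized one.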

The necessary changes can be done at several points:

gpg

gpg could be changed so that two identities are always created: the normal one and one that consists only of a hash value of the normalized email address. The safest solution would be to strip all non-hash IDs by default when exporting the public key. This may cause problems because email clients are not yet adapted to look for hash values. The problem is not serious, though, because most client software can be configured to use a certain key for a given email address.

It can be reduced further by having gpg store not only the key but also the email address (and perhaps the name, too). This information could be stored separately from the key (like the trust information today). If other software queries the key ring, gpg could return the information from both the key ring and the additional store. If gpg has fetched the key from the server, it knows at least the email address. If the mail client also knows the name belonging to this address, it should be able to pass this name along with the normal gpg call so that it gets written to the additional database together with the email address. Furthermore, there should be an easy way for other software (mainly email clients) to synchronize names and addresses, so that names which become known after the key import are added to the additional database automatically.
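The additional store could look roughly like the sketch below. It is not part of gpg: the class `UidStore`, its methods and the use of SHA-256 over the lowercased address are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Contact:
    email: str
    name: Optional[str] = None


class UidStore:
    """Hypothetical side store mapping hashed UIDs back to readable data."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, Contact] = {}

    @staticmethod
    def hash_uid(email: str) -> str:
        # Assumed scheme: SHA-256 over the lowercased UTF-8 address.
        return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()

    def remember(self, email: str, name: Optional[str] = None) -> str:
        """Record an address; a later call may add or update the name."""
        key = self.hash_uid(email)
        known = self._by_hash.get(key)
        if known is None:
            self._by_hash[key] = Contact(email, name)
        elif name is not None:
            known.name = name
        return key

    def resolve(self, hashed_uid: str) -> Optional[Contact]:
        """Return the stored address and name for a hashed UID, if known."""
        return self._by_hash.get(hashed_uid)
```

After a key has been fetched by address, `remember(email)` would be called once; a mail client that later learns the name calls `remember(email, name)` again, which corresponds to the synchronization step described above.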

GPG GUIs

If gpg itself is not changed in the described way, it would still be possible to create the hashed ID within key management software (which calls gpg). Exporting only the hash IDs would be possible by extending the original key by the hashed IDs, copying the key, removing the original IDs from the copy and then uploading the remainder to the key server.
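The export steps can be illustrated on a simplified data model. The sketch assumes that a key is a plain dictionary with a list of UID strings and that a hashed UID is a bare SHA-256 hex digest; it does not call gpg, and a real tool would have to perform the same steps through gpg itself.

```python
import copy
import hashlib
import re
from typing import Dict

HASH_UID = re.compile(r"^[0-9a-f]{64}$")  # assumed shape of a hashed UID


def hashed_uid(email: str) -> str:
    """SHA-256 hex digest of the lowercased UTF-8 address (assumed scheme)."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def add_hashed_uids(key: Dict) -> None:
    """Step 1: extend the original key by one hashed UID per address."""
    for uid in list(key["uids"]):
        match = re.search(r"<([^<>]+)>", uid)  # "Name <address>" notation
        if match:
            digest = hashed_uid(match.group(1))
            if digest not in key["uids"]:
                key["uids"].append(digest)


def export_copy(key: Dict) -> Dict:
    """Steps 2 and 3: copy the key and drop every non-hash UID."""
    public = copy.deepcopy(key)
    public["uids"] = [u for u in public["uids"] if HASH_UID.match(u)]
    return public
```

Only the result of `export_copy` would be uploaded; the local key keeps its readable UIDs.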

key server software

It seems that the key server software does not have to be modified. It could, of course, accelerate the change to the new system by accepting only keys whose UIDs are all hashed. But this is unlikely to happen. Everyone has to decide for themselves whether they want to receive spam or not.

possible problems

hash collisions

Possible collisions due to weak hash functions are not a problem. Finding an email address with the same hash value is not enough for an attack. After all, anyone can already generate a second key for a given address (belonging to someone else) and upload it. Before a public key is used, its authenticity has to be checked. How would an attacker get signatures from trustworthy people for his colliding email address? They would have to sign a key for a wrong name or one without a name, and in both cases with a nice email address like cWhTtnVXAD7ZfHEw@mafia.ru.

Of course, one could minimize this theoretical problem by excluding endangered hash functions (MD5, SHA-1) from this system.
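Excluding endangered hash functions could be as simple as a reject list in the code that creates hashed UIDs. The sketch below is an assumption-laden illustration: the `algorithm:digest` notation and the function name `hash_uid` are hypothetical and not defined by the proposal.

```python
import hashlib

REJECTED = {"md5", "sha1"}  # the functions the text names as endangered


def hash_uid(email: str, algorithm: str = "sha256") -> str:
    """Build a hashed UID, refusing hash functions on the reject list."""
    algo = algorithm.lower().replace("-", "")
    if algo in REJECTED:
        raise ValueError(f"hash function {algorithm!r} is excluded")
    data = email.strip().lower().encode("utf-8")
    # hashlib.new raises ValueError itself for unknown algorithm names.
    digest = hashlib.new(algo, data).hexdigest()
    return f"{algo}:{digest}"
```

Prefixing the digest with the algorithm name is one possible way to let clients recognize which function was used, so that keys hashed with an excluded function can be ignored on lookup as well.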

market opportunity

advantages of the proposal and their importance

disadvantages of the proposal

The only obvious disadvantage is that you cannot see the name and email address directly from a key. But this is not relevant in practice. For the problem to occur, you would have to receive a public key that someone else fetched from a key server, without that person passing the UID on to you. And it would have to be the key for an email address you do not have, because otherwise a synchronization between the address book and the key ring would deliver the UID conveniently. Why would anyone receive such a key? If such a rare case does occur, you can ask the signers of the key. If none of them is known to you, then the signatures, and thus the key, are useless anyway.

distribution

In the Linux community the question of distribution is uncritical: everybody uses gpg, and most users receive regular updates for their software. When the problem that this proposal is meant to guard against arises in two or three years (more likely later), 90% of the users will have an adapted version of gpg.

Software updates are less well organized in the Windows world (especially among private users), but PGP users can be expected to be rather competent, so they will probably not use security software for years without installing updates. And most of them would get the necessary update when they encounter a hash UID for the first time.
