I forgot my password!
(Now what?)

The Blue Moon Authentication System

There are many ways to deal with forgotten passwords. All rely either on proving access to some resource (such as a pre-registered email account) or on the long-term memory of the person who needs to regain access to his or her account. Most approaches are not very secure, and many are hard for legitimate users to manage. To make matters worse, many are unsuitable for input-constrained devices, such as mobile phones. It is well known in the cognitive science literature that personal preferences are more stable than long-term memory. A system based on personal preferences is also less vulnerable to data-mining attacks than one that relies on more traditional facts (such as mothers' maiden names or childhood addresses). We propose a system that is both secure and practical: it takes less than thirty seconds to authenticate (whether on a computer or a handheld device), and it has a false negative rate close to 0% and a false positive rate below 1%. For many environments, Blue Moon Authentication may very well be the best approach there is.

How do we do this? To learn more, read on, or watch this video.

Problems with existing approaches

Vulnerability to data-mining.

Researchers have shown that mothers' maiden names can be derived from publicly available records. The White Pages will return old addresses. Public mortgage information shows where you have owned property, and social networks can be mined for information about names of your friends. This makes many currently used password reset questions vulnerable to attack.

Vulnerability to guessing.

Does anybody know the name of your first pet? Maybe not, but a surprisingly large number of people select a very common pet name. Lists of these are commonly available. What is your brother's first name? The census will reveal what names are most common. What is your favorite color? There are not that many common answers. What was the brand of your first car? How many common brands are there? These questions are not safe individually. Many would have to be used simultaneously in order to render the odds of successful guessing by an attacker appropriately small.
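To make the arithmetic concrete, here is a minimal sketch of how many such questions would be needed to push an attacker's odds below a given target. It assumes that the questions are independent and that the attacker guesses each one correctly with the same probability; the function name and the probabilities in the example are illustrative assumptions, not measured values.

```python
def questions_needed(guess_prob, target):
    """Smallest n such that guess_prob ** n <= target.

    guess_prob: assumed chance that an attacker guesses one question correctly.
    target: acceptable chance that the attacker guesses all of them correctly.
    """
    if not (0 < guess_prob < 1 and 0 < target < 1):
        raise ValueError("both probabilities must lie strictly between 0 and 1")
    n, joint = 0, 1.0
    while joint > target:
        joint *= guess_prob  # independent questions multiply
        n += 1
    return n
```

For instance, if the most common answer to each question fits one person in five (a hypothetical figure), questions_needed(0.2, 0.01) returns 3, and every added question is one more answer the legitimate user must remember and type exactly.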

Vulnerability to cloning.

If a bad site uses the same questions as a good site, the bad site will learn your answers, and all that is left for it to do is figure out your user name on the good site. That is often relatively easy.

Difficulty of remembering or entering.

What was the name of your kindergarten teacher? Who is your sister's best friend? (Jim or James? What if you do not have a sister?) Did you live on "Garden Street" or "Garden St"? Or did you enter "Garden St." (with a period) when you set up your account?

Cost.

It has been estimated that the average cost of a password reset involving a help-desk call is $22. That makes online and automated password resets a necessity in many situations.

Our approach

Setup. During setup, you select some things that you like and some things that you dislike, both from a long list of available topics. Because the topics are presented in random order, you are unlikely to make the same selections if you register at two sites, say one good and one bad.

Authentication. To prove your identity, you give your user name and indicate whether you like or dislike each item in a list. The items are the ones you selected during setup, of course, but presented in a random order. How can the attacker know whether you like garage sales, or whether you like reggae music? You do not even need to get all answers right (although most people do!). It is enough to get a large portion right: large enough that an attacker is unlikely to reach it by guessing.
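The authentication step can be sketched as follows. This is a simplified illustration, not the deployed scoring rule: it assumes one point per matching answer and accepts at a fixed fraction of matches. The names `make_challenge` and `authenticate`, and the default threshold of 0.7 (borrowed from the approximate figure reported in the experiments), are assumptions made for the example.

```python
import random

def make_challenge(registered, rng=random):
    """Return the registered items in a fresh random order."""
    items = list(registered)
    rng.shuffle(items)
    return items

def authenticate(registered, answers, threshold=0.7):
    """Accept if a large enough fraction of answers match the setup.

    registered: dict mapping item -> True (like) or False (dislike).
    answers: dict with the same shape, as given at authentication time.
    threshold: assumed fraction of matches required for acceptance.
    """
    if not registered:
        raise ValueError("no registered preferences")
    matches = sum(1 for item, liked in registered.items()
                  if answers.get(item) == liked)
    return matches / len(registered) >= threshold
```

With 15 registered items and this rule, a user who gets 11 right is accepted and one who gets 10 right is rejected.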

Determining the error rates. With 15 questions, we can set the acceptance threshold so that the probability that a legitimate user is not let in is close to 0% (false negative rate), and the probability that an attacker manages to get access to a victim account is less than 1% (false positive rate). This assumes that the attacker does not know any personal information about the victim, but that he knows the answer distributions for the entire population. If he does not, his chances of success go down; if he knows something about the victim, the probability goes up. (The typical phisher does not know anything a priori about a potential victim, but sophisticated phishers may sometimes know or approximate distributions.)
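To see how the threshold trades one error rate against the other, consider a deliberately simplified model: each of the n questions is answered correctly with the same probability p, independently of the others, and at least k correct answers are required. This model and the function name `prob_at_least` are assumptions for illustration; the figures quoted above come from experiments with the actual scoring rule, which this sketch does not reproduce.

```python
from math import comb

def prob_at_least(n, k, p):
    """P(at least k of n independent answers are correct), each with prob. p."""
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
```

Evaluating the function with an attacker's per-question success probability gives a false positive estimate; evaluating one minus it with a legitimate user's per-question probability gives a false negative estimate. Raising k lowers the first and raises the second.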

How did we determine this? We performed a series of experiments and simulations.

First, we performed an experiment in which 400+ users responded to close to 200 questions. From this, we could compute the entropies of the answers for these questions. We removed bad questions; those are the questions where the answers had low entropies, i.e., were rather predictable. For example, "Do you like to watch movies?" is a bad question, since most respondents answered "yes." We kept around 120 questions that we called "good."

Then we performed a second experiment in which approximately 100 users performed the setup step, and then--after two weeks--attempted to perform the authentication step. We could determine the false negative rates as a function of the threshold for acceptance. We found that by requiring approximately 70% of the full score for successful authentication, everybody succeeded. That is the threshold we used.

Next, we attempted to determine the false positive rate for this threshold. We constructed an adversary that was given the answer distributions from the first experiment, and using these, tried to answer the questions for all the 100 users in the second experiment. The adversary succeeded for one account, i.e., an approximate 1% false positive rate. However, this does not give any meaningful statistical significance, and we performed a third experiment to address that problem.
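The question-filtering step from the first experiment can be illustrated with a short sketch. It assumes Shannon entropy in bits over the observed answers; the text does not state the exact entropy measure or cutoff used, so `min_bits` and both function names are hypothetical.

```python
import math
from collections import Counter

def answer_entropy(answers):
    """Shannon entropy (bits) of the empirical answer distribution."""
    counts = Counter(answers)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def keep_good_questions(responses, min_bits):
    """Keep questions whose answers are hard to predict.

    responses: dict mapping question text -> list of answers from all users.
    min_bits: hypothetical cutoff; questions below it are dropped as "bad".
    """
    return [q for q, answers in responses.items()
            if answer_entropy(answers) >= min_bits]
```

A question that nearly everyone answers the same way, such as "Do you like to watch movies?", has entropy near zero and is dropped; an evenly split yes/no question reaches the maximum of one bit.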

In the third experiment, we synthesized more than 49,000 users by selecting questions and answers according to the distributions observed in the first experiment. We then computed the false positive rate for these synthetic users in the same way as in the second experiment. Remember: our adversary was not given any specific account information, only the general overall answer distributions. This way, we could confirm the previous false positive rate and compute a confidence interval. The upper bound of the confidence interval is 1%, at a significance level of 2.5%. By emulating more users, one can tighten the interval.
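The third experiment can be sketched roughly as below. The sketch makes several assumptions that are not from the text: each item is a like/dislike choice drawn independently from a per-item probability (`like_prob`, to be filled with placeholder values), the adversary answers each item with its most common answer, scoring is one point per match with a 70% threshold, and the upper bound is a Wilson score bound. All function names are hypothetical, and the output depends entirely on the distributions supplied, so it does not reproduce the reported figures.

```python
import math
import random

def synthesize_user(like_prob, num_items, rng):
    """Draw a synthetic user: pick items, then like/dislike per item."""
    items = rng.sample(sorted(like_prob), num_items)
    return {item: rng.random() < like_prob[item] for item in items}

def adversary_guess(like_prob, items):
    """Adversary knows only population distributions: guess the majority."""
    return {item: like_prob[item] >= 0.5 for item in items}

def false_positive_rate(like_prob, num_users, num_items=15,
                        threshold=0.7, seed=0):
    """Return (accepted impersonations, fraction of users impersonated)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_users):
        user = synthesize_user(like_prob, num_items, rng)
        guess = adversary_guess(like_prob, user)
        correct = sum(guess[item] == user[item] for item in user)
        if correct / num_items >= threshold:
            hits += 1
    return hits, hits / num_users

def wilson_upper(successes, trials, z=1.96):
    """Upper end of the Wilson score interval for a binomial proportion.

    z=1.96 leaves 2.5% in the upper tail (an assumed choice of level).
    """
    p_hat = successes / trials
    denom = 1 + z * z / trials
    centre = (p_hat + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials
                         + z * z / (4 * trials * trials)) / denom
    return centre + half
```

Because the width of the interval shrinks roughly with the square root of the number of synthetic users, simulating more users tightens the bound.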

We have also performed experiments in which users were asked to impersonate acquaintances, close friends, and family members (who had previously performed the setup). Not surprisingly, we found that the more a person knows about a victim, the easier it is to impersonate her. While these experiments used too few subjects to obtain any meaningful statistical significance, we observed false positive rates around 10% for acquaintances, and sometimes above 50% for family members. These rates can be drastically reduced by also requiring access to some resource, such as an email account or cookies. While determined phishers might easily obtain such access, they do not have personal information; conversely, an attacker close to the victim will often not have the technical means to attack email accounts, steal cookies, etc. (Also, we must remember that existing password reset techniques would not fare well against such an attacker either.)

Who are we? This is the team involved in the development of this technology, listed in the order of joining the project:

M. Jakobsson, L. Yang, and S. Wetzel. "Quantifying the Security of Preference-Based Authentication." DIM '08.

M. Jakobsson, E. Stolterman, S. Wetzel, and L. Yang. "Love and Authentication." In Proceedings of ACM Human/Computer Interaction Conference (CHI), 2008.

 

Disclaimer: Some of our detailed findings are described in our recent whitepaper. Note, though, that the whitepaper does not cover the current, improved user interface, nor does it report our most recent findings regarding error rates. The estimates above will be detailed in the forthcoming publication. Please contact us if you have any questions, or would like to learn how your organization can use this technology. Patents pending; technology owned by RavenWhite Inc.


Patents pending on listed technologies. Copyright 2007, 2008, 2009, RavenWhite Inc. All rights reserved.