Credential Stuffing Attacks and Protection for web frameworks (examples with Django)

Hi! As web developers we have to be mindful of many attack types and follow best practices to keep our users and data safe. My focus during my master’s studies was network security; I’m passionate about the topic and follow recent developments in the area. In my professional life, I manage the backend of a website with many millions of users, which puts a lot of pressure on our shoulders to do our best.


Recently we came across credential stuffing attacks. Credential stuffing attacks are badass and get you thinking: whose fault is it? Who should be responsible? Before we dive in, let me explain what a credential stuffing attack is:

Credential stuffing is a type of cyberattack where stolen account credentials, typically consisting of lists of usernames and/or email addresses and the corresponding passwords (often from a data breach), are used to gain unauthorized access to user accounts through large-scale automated login requests directed against a web application.

Password Reusing..

So here we are talking about attackers knowing users’ passwords because they were breached on another website or system, and the users were reusing passwords, which is a horrible, horrible practice. This is NOT about brute-forcing, and this is NOT about easy/hard passwords. A user’s password might be the highest-entropy password in the world, but if it has leaked from one system and the user has used the same strong password on your system, now you are part of the problem.

The impact of the attack varies with the nature of your website. For example, if you are a banking application it’s a huge, almost national problem, whereas if you are a note-taking application without any user-to-user interaction, the attack is contained to the users at fault and the impact is minimal.

Nevertheless, if an attacker controls a couple of thousand users on your website, they can always get very creative and they will find ways to disrupt, spam and/or advertise on your website. In the best case, it will skew your analytics and daily active users. If we all agree it’s a bad thing, let’s see how we can solve this..

Rating the solutions

I want you to keep in mind the following attributes for each solution..

A) Effectiveness: How much of the attacks will it prevent? (scale 0-100%)

B) Friction: How much of a hassle will it bring to users? (scale 0-100%: 0% means it’s completely transparent and users won’t notice, 100% means it’s so cumbersome your conversion rate will go down to 0)

C) Retroactive Applicability: If you take an action now, will it work for all existing users?

D) Implementation Cost: How easy is it to implement? Measured in man-hours plus software/API costs, if any.

If you are a new company and just building out your website, C doesn’t matter at all. For most of us C is very important, since the bulk of our database was created more than 1 year ago. (Unless you are a hyper-growth startup; then you can ignore existing users and focus on new ones.)

You can combine 2 solutions to get A and B from one and C from the other. For example, a simple solution is deleting all your existing users and database; then you don’t have the C-problem anymore!

Solutions

0. The dummy solution

Deleting all your database.

Effectiveness is 100%: no attacker can log in. Friction is 100%: no regular user can use your website. Retroactive Applicability is there: you can delete old users as well. Implementation cost is almost 0, maybe 1 man-hour.

This is why “we must block all the attacks” is a stupid requirement on its own. We know better, and that’s why we have the additional metrics, especially friction.

1. The easy solution! - Two Factor Authentication (2FA)

This is a very simple solution: everyone who signs up to your website has to set up 2FA in addition to their email and password. There are several options in this category; for example, you can send an 8-digit SMS code on signup/login, or you can ask users to set up Google Authenticator.

The important thing is that you have to force this. This is not about protecting responsible users; it’s about protecting your website and community from careless password-reusing users. If you offer it as an option, it will have 0 impact, ZERO.

This will shut down this attack vector almost entirely (99.99%): it’s very hard (impossible?) for your Google Authenticator secret to leak the way a reused password does.
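To make the second factor above a bit more concrete, here is a minimal sketch of how authenticator-app codes are computed and checked, following the standard TOTP scheme (RFC 6238, HMAC-SHA1 variant). The function names `totp` and `verify_totp` and the one-step drift window are my own assumptions for illustration; in production you would more likely rely on a maintained library than roll your own.

```python
import hashlib
import hmac
import struct
import time


def totp(secret: bytes, at_time=None, digits: int = 6, step: int = 30) -> str:
    """Compute a time-based one-time password (RFC 6238, HMAC-SHA1)."""
    if at_time is None:
        at_time = time.time()
    # Number of whole time steps since the Unix epoch (never negative).
    counter = max(0, int(at_time) // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte pick a 4-byte window.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret: bytes, submitted: str, at_time=None,
                window: int = 1, step: int = 30) -> bool:
    """Accept the current code plus `window` steps of clock drift either way."""
    if at_time is None:
        at_time = time.time()
    for drift in range(-window, window + 1):
        expected = totp(secret, at_time + drift * step, step=step)
        # Constant-time comparison to avoid leaking how many digits matched.
        if hmac.compare_digest(expected.encode(), submitted.encode()):
            return True
    return False
```

The per-user secret is generated at enrollment and shared with the user’s authenticator app once; after that, a leaked password alone is not enough to pass `verify_totp`.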

What’s the downside of such a great solution that solves the problem almost immediately and for good?

There are two big downsides. The first is that you can’t enable this for existing users: if you do, how are you going to make sure it’s the authentic user setting up 2FA and not the attacker? You have no means to differentiate between the two; that’s the root problem. Worse, this problem is most prominent for inactive users, so even if you solve it for new users, it won’t matter much until a couple of years later. Most accounts in your database were probably created more than a year ago. If you are a startup just starting up, feel free to implement this and never face any credential stuffing attacks.

The second downside is that setting up 2FA is a hassle for users. If you force it, many users will leave and your conversion rate could drop by something like 50-90%. The gained safety might be worth it if you are a very serious application or a bank; if you are a social app, it probably won’t be.

2. The solution that doesn’t work - IP Rate-Limiting

One possible solution would be limiting login attempts per IP; say, an IP can only make 5 login attempts per day. Why doesn’t this work? Because such attacks are orchestrated through botnets, which means millions of IPs. If you allow each of them 5 attempts, you are already in trouble. Another downside is that there are many valid use cases where a single IP brings a lot of traffic/users to your website: it might be a university campus, a company office, your own company’s office, etc. So this solution does not stop the attack and it also hinders valid use cases.
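A minimal sketch of the per-IP limiter described above, and of why a botnet walks straight through it. Everything here (`IpRateLimiter`, `simulate_botnet`, the in-memory storage) is a hypothetical illustration, not code from our site; a real deployment would keep the counters in something like a shared cache.

```python
from collections import defaultdict

MAX_ATTEMPTS_PER_IP = 5
WINDOW_SECONDS = 24 * 60 * 60  # one day


class IpRateLimiter:
    """Allow at most `max_attempts` login attempts per IP per window."""

    def __init__(self, max_attempts=MAX_ATTEMPTS_PER_IP, window=WINDOW_SECONDS):
        self.max_attempts = max_attempts
        self.window = window
        self._attempts = defaultdict(list)  # ip -> timestamps of allowed attempts

    def allow(self, ip: str, now: float) -> bool:
        # Keep only the attempts that are still inside the window.
        recent = [t for t in self._attempts[ip] if now - t < self.window]
        allowed = len(recent) < self.max_attempts
        if allowed:
            recent.append(now)
        self._attempts[ip] = recent
        return allowed


def simulate_botnet(limiter: IpRateLimiter, ip_count: int, attempts_per_ip: int) -> int:
    """Count how many login attempts get through when spread over many IPs."""
    allowed = 0
    for i in range(ip_count):
        ip = "10.0.{}.{}".format(i // 256, i % 256)  # made-up addresses
        for _ in range(attempts_per_ip):
            if limiter.allow(ip, now=0.0):
                allowed += 1
    return allowed
```

With only 1,000 IPs the limiter still lets 5,000 credential pairs through per day, while a campus NAT with 6 students behind one address already locks someone out.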

3. The great tech (Facebook) Solution

If you have a huge team, do what Facebook does. When you detect an unusual login, present a login challenge based on how unusual it is. In case you are not familiar with this: if you log in to Facebook from a new browser, IP or location, it pops up a quiz in which you have to name your friends from existing pictures of them on the platform. This kind of proves you are you. It is a simple task for you but a heroic challenge for anyone outside your friend circle. A brilliant solution indeed.

If the login is only slightly unusual, you might just get a simple warning in your email that someone logged in to your account from a new location / device.

Of course the challenge is coming up with the unusualness function, which takes the login attempt as a parameter and returns the probability, between 0 and 1, that the attempt was made by an attacker. Having this function run reliably means you need vast information about your users and their behavior. I don’t expect any middle-sized website to do that.

Once you have this function, you can say between 0-25% no further action, 25-50% email warning about new login, 50-75% present friend-naming challenge, 75-100% block the login attempt completely.

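# is_request_attacker, login, email_warning_about_request,
# present_friend_challenge and AttackerDetected are placeholders.
from .machinelearning import is_request_attacker


def login_wrapper(request):
    # Probability (0.0 to 1.0) that this login attempt comes from an attacker.
    attacker_probability = is_request_attacker(request)
    if attacker_probability <= 0.25:
        # Looks normal: no further action.
        return login(request)
    elif attacker_probability <= 0.50:
        # Slightly unusual: let the user in, but warn them by email.
        email_warning_about_request(request)
        return login(request)
    elif attacker_probability <= 0.75:
        # Suspicious: require the friend-naming challenge first.
        return present_friend_challenge(request)
    # Very suspicious: block the login attempt completely.
    raise AttackerDetected()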

I’m skipping this solution since it’s very hard to get that much data (I’m not even sure it’s legal), but if you can do that, go ahead by all means.

4. Going forward solution - https://haveibeenpwned.com/

I love this solution and it should be in the books of web development 101, right after the “Never store plain-text passwords” section. The solution is to not accept any password that has ever leaked and can be found online. That’s not an easy check, but thank god Troy Hunt went through the hassle of unifying all the leaks so we can just query by password. This is an achievement of humanity, collective data and the internet.

The idea is that during signup you query the pwned API with the password your user provides, and if it has been pwned, you reject the password and ask the user to choose another one. This should be as simple as your password strength check, but instead of calculating locally you query an external API.

After reading this, there might be 2 questions on your mind (hopefully). The first is: isn’t querying an external API with a user’s password a security issue? The second is: why can’t this work for existing users?

I hope the first question comes to your mind as a responsible developer. Yes, it WOULD be a security threat, but the pwned API allows (actually forces) us to query using a k-anonymity model: we only send the first 5 characters of the SHA-1 hash of the password, and the API answers with every leaked hash suffix that shares that prefix. So neither the password nor its full hash ever leaves our system.
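To illustrate the k-anonymity split described above without touching the network, here is a small sketch. The helper names `split_sha1` and `breach_count` are mine, and the response body used in the usage note is made up for illustration (real responses contain hundreds of `SUFFIX:COUNT` lines).

```python
import hashlib


def split_sha1(password: str):
    """Return (prefix, suffix): the 5 hex chars we send and the 35 we keep."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def breach_count(range_body: str, suffix: str) -> int:
    """Look up our suffix in a range response made of 'SUFFIX:COUNT' lines."""
    for line in range_body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count.strip() or 0)
    return 0  # suffix not present: not found in the known leaks
```

For the password `password`, `split_sha1` returns the prefix `5BAA6`; only those five characters would be sent. Given an illustrative body such as `"0018A45C4D1DEF81644B54AB7F969B88D65:1\r\n1E4C9B93F3F0682250B6CF8331B7EE68FD8:42"`, `breach_count` with the kept suffix returns the count on the matching line.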

The answer to why this only works going forward and not for existing users is that, again as responsible web devs, we don’t keep users’ passwords in plain text. We cannot know what their passwords are, because they are stored salted and hashed over many iterations. The simplest and the hardest passwords are indistinguishable in the database.

Python implementation:

import hashlib
import logging

import requests

logger = logging.getLogger(__name__)


def check_if_password_is_safe(password):
    """Return False if the password is found in the Pwned Passwords data."""
    password_hash = hashlib.sha1(password.encode('utf-8')).hexdigest().upper()

    # k-anonymity: only the first 5 hex characters are sent to the API
    hash_prefix = password_hash[:5]
    hash_suffix = password_hash[5:]

    try:
        r = requests.get(
            'https://api.pwnedpasswords.com/range/{}'.format(hash_prefix),
            headers={'User-Agent': 'YourWebsiteName'},
            timeout=5)
        r.raise_for_status()
    except requests.exceptions.RequestException:
        # covers timeouts, connection errors and HTTP error statuses
        logger.exception('PwnedPasswordsException')
        # return True if external API is down or broken instead of blocking signups
        return True

    # each response line looks like "<35-char hash suffix>:<count>"
    for line in r.text.splitlines():
        suffix, _, _count = line.partition(':')
        if suffix.strip().upper() == hash_suffix:
            # password hash has been found as leaked.. not safe
            logger.warning('UnsafePasswordFound')
            return False

    return True

So we send the pwned API a SHA-1 hash prefix and it returns the suffixes of all leaked hashes that start with that prefix. We look for our suffix in the response: if it’s there, the password has leaked; if not, it is not in any known leak. The pwned API never learns what we actually searched for, just the prefix. https://haveibeenpwned.com/API/v3#SearchingPwnedPasswordsByRange

5. Deactivate Dormant Accounts with Pseudo 2FA - My Optimal Solution

As I mentioned, if you are running an established website, chances are that most of your database was filled more than 1 year ago and most of your users are inactive, or dormant, as I call it here. This segment is the most susceptible to credential stuffing attacks. So my solution was to mark each user that hasn’t been on the website for 6 months as “dormant”, and apply special logic on login. This way the solution is completely transparent to active and new users, while achieving a pretty high A (Effectiveness). It’s unlikely that any given user suddenly comes back to the website after 6 months of total silence, but of course it will happen if you have millions of users. That’s why deleting old accounts is not the best solution; we have to do something a bit more complex.

After marking such accounts dormant, we can focus on the next step: what happens if a dormant user tries to authenticate? We basically explain to them that their account has been put on hold for their security, and that if they wish to proceed with the login they should click on a link we send to their email address.
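The dormant-account gate described above can be sketched roughly like this. The names `is_dormant`, `login_decision` and the returned strings are hypothetical; the sketch assumes you store a timezone-aware “last seen” timestamp per user.

```python
from datetime import datetime, timedelta, timezone

# Roughly 6 months of silence marks an account as dormant.
DORMANT_AFTER = timedelta(days=180)


def is_dormant(last_seen: datetime, now: datetime = None) -> bool:
    """True if the user has not been seen for longer than DORMANT_AFTER."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_seen > DORMANT_AFTER


def login_decision(password_ok: bool, last_seen: datetime, now: datetime = None) -> str:
    """Decide what to do with a login attempt."""
    if not password_ok:
        return "reject"
    if is_dormant(last_seen, now):
        # Correct password is not enough here: hold the login and
        # email the account owner a confirmation link.
        return "email_confirmation_required"
    # Active users never notice the extra logic.
    return "login"
```

Once the user confirms via the emailed link, you would refresh their “last seen” timestamp so they are treated as active again.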

There are 2 big reasons why this works. The first is that attackers’ bots work against your website’s API; they are unlikely to also log in to email accounts and poll the inbox (over IMAP, for example) to click on incoming links. The second is that even if a user has the same password for their email (so the attacker knows both!), the email provider’s security keeps the attacker from easily logging in. Think of Gmail or Hotmail: they have extra security measures to prevent unauthorized access.

The reason I call this pseudo 2FA is that the email link acts like a second authentication factor, but it’s not really a full-fledged 2FA system.
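One way to build the emailed link is to embed a signed, expiring token, so the server can verify it without storing anything extra. This is a minimal standard-library sketch; `SECRET_KEY`, `make_token`, `verify_token` and the one-hour lifetime are assumptions for illustration. In a Django project, the built-in `django.core.signing` module covers the same ground.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"replace-with-a-real-secret"  # placeholder, keep the real one private
MAX_AGE_SECONDS = 60 * 60  # links expire after one hour (assumed value)


def _sign(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def make_token(user_id: int, issued_at: float = None) -> str:
    """Build 'user_id:issued_at:signature' to embed in the emailed link."""
    issued_at = int(time.time() if issued_at is None else issued_at)
    payload = "{}:{}".format(user_id, issued_at)
    return "{}:{}".format(payload, _sign(payload))


def verify_token(token: str, now: float = None):
    """Return the user id for a valid, unexpired token, otherwise None."""
    now = time.time() if now is None else now
    try:
        user_id, issued_at, signature = token.split(":")
        expected = _sign("{}:{}".format(user_id, issued_at))
        # Check the signature first, in constant time, before trusting the payload.
        if not hmac.compare_digest(expected.encode(), signature.encode()):
            return None
        if now - int(issued_at) > MAX_AGE_SECONDS:
            return None
        return int(user_id)
    except ValueError:
        # Wrong number of parts or non-numeric fields.
        return None
```

This sketch does not enforce single use; if you need that, record redeemed tokens or tie the token to the user’s “last seen” timestamp.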

Conclusion

My conclusion is that I’m glad to have met this problem and some possible solutions. This should make the internet a safer place for sure. Even if you don’t use the solutions I provide, it’s a good practice to think about the attack vectors and the UX. Maybe we together will come up with a new standard that will solve the credential stuffing attacks for good.

https://i.imgur.com/M6ZGTKC.png

Every company and website is different, so I can’t tell you what to use, but I can tell you what we used. We implemented #4 (haveibeenpwned) and #5 (dormant accounts) together. #4 solves this problem for good for future users, with a tiny inconvenience to new signups. #5 solves the bulk of the problem while causing 0! inconvenience to active users. Of course the effectiveness of #5 is only as high as (1 - active users / total users), so if you have 10 million users and 500k of them are active, #5 will be 1 - 0.5/10 = 0.95 => 95% effective. This is a very good ratio.

In the second part of the series, I will present some more solutions and open source Python libraries to implement #4 and #5, if this topic picks up.
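The back-of-the-envelope estimate above is easy to plug your own numbers into. The function name is hypothetical, and it only restates the (1 - active / total) ratio; it assumes attackers hit accounts uniformly at random, which is a simplification.

```python
def dormant_gate_effectiveness(active_users: int, total_users: int) -> float:
    """Share of accounts covered by the dormant-account gate (#5)."""
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    if not 0 <= active_users <= total_users:
        raise ValueError("active_users must be between 0 and total_users")
    return 1 - active_users / total_users
```

For example, `dormant_gate_effectiveness(500_000, 10_000_000)` gives roughly 0.95, matching the 95% figure.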



Congratulations! Keep securing like a champion, and please follow me on @EralpBayraktar :)