Blog03: BFTP - Spotify full account takeover

INSEC ENSIAS Club
Nov 19, 2022

Hi folks!

Before we get started, here is some basic stuff that will help you understand this attack.

Character encoding:

Character encoding is the process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using digital computers (this is the Wikipedia definition).

For a better understanding, let's take a look at how PHP encodes characters:

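<?php
// unpack('C*', ...) returns every byte of the string as an unsigned integer
$byte_array = unpack('C*', 'hello');
// unpack() keys start at 1, so re-index before printing a compact list
echo json_encode(array_values($byte_array));
=============================
output:
[104,101,108,108,111]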

So for the string hello the encoding is:

h=104, e=101, l=108, o=111

Types of Encoding Techniques:

  • HTML Encoding.
  • URL Encoding.
  • Unicode Encoding.
  • Base64 Encoding.
  • Hex Encoding.
  • ASCII Encoding.

Here we are interested in Unicode, a character encoding standard created to enable people around the world to use computers in any language. It is designed to cover all the world’s writing systems.
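
To make the difference between a character, its Unicode code point and the bytes that get stored a bit more concrete, here is a small sketch. It assumes Python 3, and the three sample characters are arbitrary picks for illustration:

```python
# A code point is the number Unicode assigns to a character;
# an encoding such as UTF-8 turns that number into one or more bytes.
samples = ["h", "\u00e9", "\u1d2e"]  # h, é, ᴮ


def describe(ch):
    """Return (code point as U+XXXX, list of UTF-8 byte values)."""
    return "U+%04X" % ord(ch), list(ch.encode("utf-8"))


for ch in samples:
    code_point, utf8_bytes = describe(ch)
    print(ch, code_point, utf8_bytes)
```

ASCII characters such as h still take a single byte in UTF-8, while characters further up the code space take two or more.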

I’m not going to cover Unicode in depth, but if you want to dig deeper into the topic, character sets and related subjects, the following link is a great starting point.

If you want to play a bit with Unicode characters, you can visit Unicode utilities to get information about a character or search confusable characters.
In addition to this tool, there are other interesting resources such as codepoints.net, txtn.us and Unicode Text Converter.

Canonicalization:

The term ‘canonicalization’ refers to transforming data into its simplest, canonical form. A name such as Aryan can be written in more than one way, for instance ArYan or Ar%79an (here %79 is the URL-encoded form of the letter y, whose ASCII value is 79 in hex). Canonicalization, then, is the representation of something in the least ambiguous and most direct way. Attackers exploit the gap: a common way to evade input validation and output encoding controls is to encode the input before sending it, so that it passes the check in one form and is processed in another. The bug occurs when an application makes decisions based on a non-canonical representation of a name, or transforms data from one form to another after it has been checked, which often makes it possible to bypass those checks.
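
Here is a minimal sketch of that kind of bypass, assuming Python 3. The blocklist and both function names are made up for illustration; only `urllib.parse.unquote` is a real library call:

```python
from urllib.parse import unquote

BLOCKED = {"aryan"}  # hypothetical blocklist, for illustration only


def naive_check(raw):
    """Flawed: validates the raw input, then decodes it afterwards."""
    if raw.lower() in BLOCKED:
        return None          # rejected
    return unquote(raw)      # decoded only after the check


def canonical_check(raw):
    """Safer: decode and lower-case first, then validate that form."""
    canonical = unquote(raw).lower()
    if canonical in BLOCKED:
        return None          # rejected
    return canonical


print(naive_check("Ar%79an"))      # Aryan (slips past the check)
print(canonical_check("Ar%79an"))  # None (rejected)
```

The only difference between the two functions is the order: the second one validates the same form it later uses.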

Today I’m gonna take you back to 2013, when a user (hacker) posted on the Spotify support forum that he had managed to hijack user accounts. To make sure he wasn’t lying, the forum manager challenged him to take over his own account and BOOM: the manager’s account had a new playlist added and a new password.

The honest reaction of the manager and the whole security team:

A bunch of the team dropped whatever they were working on and scurried to understand what was going wrong and how to fix it. From the forum post they knew that, because Spotify allowed Unicode in usernames, taking over an account went something like this:

  1. Find a user account to hijack. For the sake of this example let us hijack the account belonging to user bigbird.
  2. Create a new Spotify account with username ᴮᴵᴳᴮᴵᴿᴰ (in Python this is the string u'\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30').
  3. Send a request for a password reset for your new account.
  4. A password reset link is sent to the email you registered for your new account. Use it to change the password.
  5. Now, instead of logging in to account with username ᴮᴵᴳᴮᴵᴿᴰ, try logging in to account with username bigbird with the new password.
  6. Success! Mission accomplished.
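
To see what that username from step 2 really is, here is a small sketch using Python's standard `unicodedata` module. It assumes Python 3 (so no `u` prefix is needed) and uses NFKC normalization only as an illustration, not as a claim about what Spotify's code did internally:

```python
import unicodedata

lookalike = "\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30"  # ᴮᴵᴳᴮᴵᴿᴰ

# Each character is its own code point, not a styled ASCII letter.
names = [unicodedata.name(ch) for ch in lookalike]
for ch, name in zip(lookalike, names):
    print("U+%04X" % ord(ch), name)

# Compatibility normalization (NFKC) folds them to plain capitals ...
normalized = unicodedata.normalize("NFKC", lookalike)
print(normalized)          # BIGBIRD

# ... and only a further lower-casing step reaches the victim's name.
print(normalized.lower())  # bigbird
```

So the look-alike name is two transformation steps away from bigbird, which matters for what follows.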

From the log lines associated with the takeover of the forum manager’s account, it appeared to be a problem with how they derived a canonical username from the username the user chose at registration.

Forbidden characters in usernames:

If you let your users pick their usernames too freely, you are effectively trusting their input, and that will come back to bite you. For instance, it is probably good to

  • not allow white space in usernames,
  • treat “BigBird” and “bigbird” as the same username.

The first is an example of forbidding certain characters in usernames and the second is to treat some characters (‘B’ and ‘b’) as equivalent. The latter is often implemented by canonicalizing the username. If they only allow the letters a-z and A-Z then they could canonicalize a username by mapping all characters to lower case:

canonical_username = username.lower()  # in python

So ‘BigBird’, ‘Bigbird’ and ‘bigbird’ would all be mapped to ‘bigbird’. They refer to ‘BigBird’ as the verbatim username and the remapped ‘bigbird’ as the canonical username. When an account is created the canonical username needs to be unused, so if one user enters ‘BigBird’ and another enters ‘bigbird’, only one of them will be allowed to create the account.

Lower casing has the key property of being idempotent, i.e., that applying it more than once has no effect: x.lower() == x.lower().lower(). So if a username gets passed from service to service and you want to make sure it is in canonical form you can safely apply .lower() and if it was already in canonical form there is no harm done, and it is easy to stay safe.
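
A quick way to spot-check this property is a tiny helper like the one below. `is_idempotent` is a hypothetical name introduced here, and a check over a handful of samples is of course not a proof, just a cheap sanity test:

```python
def is_idempotent(func, samples):
    """True if applying func twice equals applying it once, for every sample."""
    return all(func(func(s)) == func(s) for s in samples)


usernames = ["BigBird", "Bigbird", "bigbird", "BIGBIRD"]

print(is_idempotent(str.lower, usernames))    # True
print({name.lower() for name in usernames})   # {'bigbird'}
```

Running the same helper against any canonicalization function you depend on is a cheap regression test to keep around.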

when H is not the same as Ꮋ:

If you allow non-ASCII characters this becomes even more critical, since lots of different characters look very similar (this is the root of homoglyph attacks, which we will discuss later). For example, it is hard to see the difference between Ꮋ (U+13BB) and Ｈ (U+FF28), even though one is obviously a Cherokee Letter Mi and the other is a Fullwidth Latin Capital Letter H, and in Unicode they indeed have different code points. Treating such similar-looking characters as different when used in usernames is likely to cause problems and confusion, so they distinguish between verbatim usernames and canonical usernames: characters that differ in the verbatim username can be mapped to the same character in the canonical one. Compatibility normalization, for instance, maps the fullwidth H to a plain Latin H (the Cherokee letter has no such mapping, so look-alikes from other scripts need a separate confusables check). Just simple lower casing will not be enough, obviously.
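
It is worth checking which look-alikes a normalization step actually merges. The sketch below uses Python 3's `unicodedata` with NFKC as a stand-in for a canonicalization step (an assumption for illustration, not Spotify's actual pipeline); `nfkc` is just a local helper name:

```python
import unicodedata

cherokee_mi = "\u13bb"   # Ꮋ
fullwidth_h = "\uff28"   # Ｈ
ascii_h = "H"

for ch in (cherokee_mi, fullwidth_h, ascii_h):
    print("U+%04X" % ord(ch), unicodedata.name(ch))


def nfkc(text):
    """Apply Unicode compatibility normalization (NFKC)."""
    return unicodedata.normalize("NFKC", text)


print(nfkc(fullwidth_h) == ascii_h)   # True: fullwidth H folds to plain H
print(nfkc(cherokee_mi) == ascii_h)   # False: the Cherokee letter stays as it is
```

Normalization only merges characters that Unicode defines as compatibility equivalents; purely visual look-alikes from other scripts survive it.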

XMPP’s nodeprep canonicalization method:

Instead of implementing their own canonicalization method, Spotify relied at the time on the Python framework Twisted. The code they used was more or less:

from twisted.words.protocols.jabber.xmpp_stringprep import nodeprep

def canonical_username(name):
    # Derive the canonical username with XMPP's nodeprep profile
    return nodeprep.prepare(name)

The XMPP nodeprep specification clearly says that it is supposed to be idempotent and that it handles Unicode names.

nodeprep.prepare wasn’t idempotent 😳:

Let’s see what happened when they tried ᴮᴵᴳᴮᴵᴿᴰ:

>>> canonical_username(u'\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30')
u'BIGBIRD'
>>> canonical_username(canonical_username(u'\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30'))
u'bigbird'

Not so good since the function apparently was not idempotent, but at least it provided insight into why the attack worked.

  1. When you registered an account, canonical_username got applied once.
  2. An account with canonical username ‘BIGBIRD’ got registered, which was allowed since it did not collide with the existing account with canonical username ‘bigbird’.
  3. When resetting the password for ‘ᴮᴵᴳᴮᴵᴿᴰ’, canonical_username was applied once, so the password reset email got sent to the address associated with the newly created account with canonical username ‘BIGBIRD’.
  4. However, when the link was used, canonical_username was applied once again, yielding ‘bigbird’, so the new password was instead set for the ‘bigbird’ account.
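
The four steps above can be replayed with a toy model. Everything in this sketch is an assumption for illustration: `flawed_canonical` is a stand-in that is non-idempotent in the same way (it is not Twisted's nodeprep code), and the in-memory account store, function names and email addresses are made up. It assumes Python 3:

```python
import unicodedata


def flawed_canonical(name):
    """Stand-in for a non-idempotent canonicalizer (NOT Twisted's code):
    it lower-cases first and applies NFKC normalization afterwards."""
    return unicodedata.normalize("NFKC", name.lower())


accounts = {}  # canonical username -> {"email": ..., "password": ...}


def register(verbatim, email, password):
    canonical = flawed_canonical(verbatim)      # applied once
    if canonical in accounts:
        raise ValueError("username taken")
    accounts[canonical] = {"email": email, "password": password}
    return canonical


def request_reset(verbatim):
    """Return (name embedded in the reset link, address it is mailed to)."""
    canonical = flawed_canonical(verbatim)      # applied once
    return canonical, accounts[canonical]["email"]


def use_reset_link(name_in_link, new_password):
    canonical = flawed_canonical(name_in_link)  # applied a second time
    accounts[canonical]["password"] = new_password
    return canonical


register("bigbird", "victim@example.com", "victim-secret")

lookalike = "\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30"  # ᴮᴵᴳᴮᴵᴿᴰ
attacker_key = register(lookalike, "attacker@example.com", "x")
name_in_link, mailed_to = request_reset(lookalike)
hit = use_reset_link(name_in_link, "attacker-chosen")

print(attacker_key, mailed_to, hit)     # BIGBIRD attacker@example.com bigbird
print(accounts["bigbird"]["password"])  # attacker-chosen
```

The reset mail goes to the attacker's own address, yet the password change lands on the victim's account, because the second canonicalization pass moves 'BIGBIRD' to 'bigbird'.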

The final fix:

They reported the problem to the Twisted developers and wrote a small wrapper function around nodeprep.prepare that basically calls the old prepare function twice and rejects a name if: old_prepare(old_prepare(name)) != old_prepare(name).

What then remained was some cleanup: identifying the handful of compromised accounts, which due to the nature of the bug was actually easy. They just needed to find the accounts with incorrect canonical usernames, and from them they could find the corresponding hijacked accounts.
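
A wrapper of that shape could look roughly like the sketch below, assuming Python 3. Here `old_prepare` is a stand-in that misbehaves in the same way (it is not the real nodeprep.prepare), and `safe_prepare` is a hypothetical name:

```python
import unicodedata


def old_prepare(name):
    """Stand-in for the non-idempotent prepare step (NOT Twisted's code)."""
    return unicodedata.normalize("NFKC", name.lower())


def safe_prepare(name):
    """Reject names whose canonical form changes when prepared again."""
    once = old_prepare(name)
    if old_prepare(once) != once:
        raise ValueError("username is not stable under canonicalization")
    return once


print(safe_prepare("BigBird"))  # bigbird

try:
    safe_prepare("\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30")  # ᴮᴵᴳᴮᴵᴿᴰ
except ValueError as err:
    print("rejected:", err)
```

Rejecting unstable names at registration means a later second pass through the canonicalizer can no longer land on somebody else's account.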

Some takeaways:

  1. Always validate, canonicalize and sanitize user input.
  2. When users expose vulnerabilities, avoid antagonizing them if possible. They can probably provide valuable help on how to reproduce and perhaps even how to fix the issue.
  3. Encourage those users with some real bounties, not 3 months of free membership 😜
  4. Normally, upgrading is a good way to get rid of bugs and security holes, but every once in a while an upgrade packs a wallop.

Links:

Facebook: INSEC Ensias

Instagram: INSEC Ensias

Linkedin: INSEC Ensias

Youtube: INSEC Club

Don’t forget to drop us a follow on social media to stay up to date with everything the club is doing. Looking forward to sharing more knowledge with all the readers and we welcome your feedback at insecblog@gmail.com

Writer: F3nn3C

Editors: akna, berradAtay
