Meltdown and Spectre

Meltdown and Spectre exploit critical vulnerabilities in modern processors. These hardware vulnerabilities allow programs to steal data which is currently processed on the computer. While programs are typically not permitted to read data from other programs, a malicious program can exploit Meltdown and Spectre to get hold of secrets stored in the memory of other running programs. This might include your passwords stored in a password manager or browser, your personal photos, emails, instant messages and even business-critical documents.

Meltdown and Spectre work on personal computers, mobile devices, and in the cloud. Depending on the cloud provider’s infrastructure, it might be possible to steal data from other customers.

Meltdown

Meltdown breaks the most fundamental isolation between user applications and the operating system. This attack allows a program to access the memory, and thus also the secrets, of other programs and the operating system.

If your computer has a vulnerable processor and runs an unpatched operating system, it is not safe to work with sensitive information, because that information may leak. This applies to both personal computers and cloud infrastructure. Luckily, there are software patches against Meltdown.

Spectre

Spectre breaks the isolation between different applications. It allows an attacker to trick error-free programs, which follow best practices, into leaking their secrets. In fact, the safety checks of said best practices actually increase the attack surface and may make applications more susceptible to Spectre.

Spectre is harder to exploit than Meltdown, but it is also harder to mitigate. However, it is possible to prevent specific known exploits based on Spectre through software patches.

Out of Order / Speculative Execution

Modern CPUs execute instructions out of order and speculatively. When they reach a branch (if/switch etc.), they do not wait for the condition to be evaluated. Instead, they typically predict the outcome and start executing the predicted path right away. So

if (a+b*c == d) {
  // first branch
}
else {
  // second branch
}

will have one of the branches (say the second) executing speculatively while the condition is still being evaluated. Once the CPU has the answer (say “true”), it scraps the work from the second branch and runs the first branch instead. Had the prediction been right, the speculative work would simply have been committed. Instructions that are executed speculatively but whose results are never committed are called “transient instructions”.

The Bug

Speculatively executed code can do a lot of things. The assumption is that all of its effects are rolled back if the speculation turns out to be wrong. The attack is possible because the cache state is not rolled back. This is the crux of both the Meltdown and Spectre attacks.
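To make the rollback gap concrete, here is a toy model in C. It is only a sketch: `toy_cpu`, `speculate` and `demo_squashed_branch` are made-up names, and a real CPU does not expose any of this to software. The model restores the register when the branch is squashed but leaves the cache bookkeeping alone, which is the behaviour described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NUM_LINES 8 /* hypothetical, tiny number of cache lines */

/* Toy CPU: one architectural register plus a record of cached lines. */
struct toy_cpu {
    int reg;
    bool cached[NUM_LINES];
};

/* Runs a "branch body" speculatively: it writes the register and touches
 * one cache line. If the condition turns out to be false, the register is
 * restored, but the cache record is left as it is. */
static void speculate(struct toy_cpu *cpu, bool condition, int value, int line)
{
    assert(line >= 0 && line < NUM_LINES);
    int saved_reg = cpu->reg;   /* snapshot of architectural state */

    cpu->reg = value;           /* transient register write */
    cpu->cached[line] = true;   /* transient access fills a cache line */

    if (!condition) {
        cpu->reg = saved_reg;   /* rollback of architectural state */
        /* cpu->cached is deliberately NOT restored */
    }
}

/* Returns the line left behind by a squashed branch, or -1 if none. */
int demo_squashed_branch(void)
{
    struct toy_cpu cpu;
    memset(&cpu, 0, sizeof cpu);
    cpu.reg = 7;

    speculate(&cpu, false, 42, 5);

    if (cpu.reg != 7)
        return -1;              /* the register write must be undone */
    for (int i = 0; i < NUM_LINES; i++)
        if (cpu.cached[i])
            return i;
    return -1;
}
```

Calling `demo_squashed_branch()` shows the register back at its old value while line 5 is still marked as cached. That leftover is what the attacks observe.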

Meltdown specifically works because an arbitrary memory access, even one the program has no permission for, goes through while in a transient instruction. This attack allows a program to access the memory, and thus also the secrets, of other programs and the operating system.

CPU Cache?

Reading data from RAM is slow when you are a CPU. Cache accesses take on the order of 1-10ns, while a RAM access takes >100ns. Almost every memory read or write goes through the cache, so the cache state reflects recent memory activity on the computer.
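A toy model of this, as a sketch only: the 64-byte line size and the 5ns/100ns latencies are assumptions picked to match the rough numbers above, and `toy_cache` and `toy_load` are hypothetical names. The first load of a line pays the RAM cost, and later loads of the same line are fast.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LINE_SIZE 64    /* assumed cache-line size in bytes */
#define NUM_LINES 1024  /* toy address space of 64 KiB */
#define HIT_NS    5     /* assumed cache-hit latency */
#define MISS_NS   100   /* assumed RAM latency */

struct toy_cache {
    bool present[NUM_LINES];
};

/* Simulated load: returns the latency in ns and leaves the line cached. */
int toy_load(struct toy_cache *cache, size_t addr)
{
    size_t line = addr / LINE_SIZE;
    assert(line < NUM_LINES);

    if (cache->present[line])
        return HIT_NS;          /* already cached: fast */

    cache->present[line] = true; /* a miss brings the line into the cache */
    return MISS_NS;
}
```

With a zero-initialised `struct toy_cache`, `toy_load(&cache, 130)` costs 100 and a following `toy_load(&cache, 128)` costs 5, because both addresses fall in the same 64-byte line.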

Cache Timing?

Let us say I have this piece of code:

$secrets = ["secret1", "secret2", "secret3", "secret4", "realSecret"];
$realSecret = $secrets[4];

This loads the real secret from memory, which also places it in the CPU cache. An attacker then does the following:

  1. Clear the CPU cache
  2. Run the above program
  3. Try to access the specific memory address

The above access results in an error and raises an exception. However, the attacker knows that the secret is in one of the 5 possible locations. Since only one of these is ever read by the actual program, the attacker can repeatedly run the program and time the exception for each location to figure out which one was being read. The one which is being read is cached, and the exception will be raised much faster as a result.

Cache Timing attacks are the building blocks of Meltdown, which uses them as a side channel to leak data.
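The three steps above can be sketched as a small simulation in C. This is not a real attack: the latencies are assumed constants instead of measured timings, and `flush_all`, `victim_read`, `probe_latency` and `recover_slot` are hypothetical names. It only shows why the fastest location gives the victim away.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_SLOTS 5   /* the 5 candidate locations */
#define HIT_NS    5   /* assumed latency when the location is cached */
#define MISS_NS   100 /* assumed latency when it is not */

static bool cached[NUM_SLOTS];

/* Step 1: clear the (simulated) cache. */
static void flush_all(void)
{
    for (int i = 0; i < NUM_SLOTS; i++)
        cached[i] = false;
}

/* Step 2: the victim reads one location, which caches it. */
static void victim_read(int slot)
{
    cached[slot] = true;
}

/* Step 3: how long the attacker's access to a location takes. */
static int probe_latency(int slot)
{
    return cached[slot] ? HIT_NS : MISS_NS;
}

/* Returns the location the victim touched, or -1 if none stood out. */
int recover_slot(int victim_slot)
{
    assert(victim_slot >= 0 && victim_slot < NUM_SLOTS);

    flush_all();
    victim_read(victim_slot);

    int best = -1;
    int best_ns = MISS_NS;
    for (int slot = 0; slot < NUM_SLOTS; slot++) {
        int ns = probe_latency(slot);
        if (ns < best_ns) {     /* strictly faster than a miss */
            best_ns = ns;
            best = slot;
        }
    }
    return best;
}
```

For the program above, `recover_slot(4)` returns 4: the attacker never sees the secret itself, only which location was touched.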

Now that we’ve explained cache-timing attacks (which can tell you “what-memory” is being used by another program), we can get back to Meltdown. Meltdown happens because:

  • CPUs do not roll back the CPU cache after speculative execution, and
  • You can manipulate the cache in those transient instructions to create a “side-channel”, and
  • Intel CPUs allow you to read memory from other processes while in a transient instruction.

Meltdown consists of 3 steps:

Step 1. The content of an attacker-chosen memory location, which is inaccessible to the attacker, is loaded into a register.

Step 2. A transient instruction accesses a cache line based on the secret content of the register.

Step 3. The attacker uses Flush+Reload to determine the accessed cache line and hence the secret stored at the chosen memory location.

In code:

c = *kernel_memory_address;
b = probe[c];

There are several caveats:

Exception Suppressing

If you try to actually read kernel-space memory directly, your program will crash. Meltdown works around this by making sure that the memory is only read in transient instructions that will be rolled back.

So you wrap the above code with:

if (check_function()) {
    meltdown();
}

Then make sure that check_function always returns false. The CPU starts running the code inside the meltdown function before it has the result of the check, and throws that work away once the check comes back false.

Cache Lines

The CPU cache is broken down into cache lines, the fixed-size blocks in which memory gets cached. Instead of indexing the probe array with the byte itself (probe[c]), Meltdown multiplies the value it read by 4096 so that every possible value of c lands on its own cache line, well away from the others (4096 bytes is the page size, so each value gets its own page). So more like:

b = probe[c * 4096];

If you’re wondering why we are doing a read instead of just printing c, or maybe copying it to another place, it is because CPU designers considered that and roll back those instructions correctly, so writes cannot be used to exfiltrate the data from a transient instruction.
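Here is a rough simulation in C of how the stride turns a byte into a cache footprint and back. It is a sketch under assumptions: there is no real transient execution, the cache is a plain array of flags, and `probe_offset`, `transient_access` and `recover_byte` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE  4096 /* stride used by the attack */
#define NUM_VALUES 256  /* one probe page per possible byte value */

static bool page_cached[NUM_VALUES];

/* Offset into the probe array touched for byte c.
 * c * 4096 is the same as c << 12. */
size_t probe_offset(uint8_t c)
{
    return (size_t)c * PAGE_SIZE;
}

/* Stand-in for the transient "b = probe[c * 4096]": it only records
 * which probe page would have been pulled into the cache. */
static void transient_access(uint8_t secret)
{
    page_cached[probe_offset(secret) / PAGE_SIZE] = true;
}

/* Flush, let the transient access happen, then scan all 256 pages.
 * Returns the recovered byte, or -1 if no page was cached. */
int recover_byte(uint8_t secret)
{
    for (int v = 0; v < NUM_VALUES; v++)
        page_cached[v] = false;

    transient_access(secret);

    for (int v = 0; v < NUM_VALUES; v++)
        if (page_cached[v])
            return v;
    return -1;
}
```

The probe array has to cover 256 * 4096 bytes (1 MiB), and the index of the one cached page is the secret byte, so `recover_byte('K')` gives back `'K'`.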

Zeroes

Sometimes the exception is raised before the transient read completes, and the value of c is set to 0 as part of the rollback. This makes the attack unreliable. So the attack ignores zero-valued reads and only touches the probe array when it reads a non-zero value. The whole code becomes:

if (check_function()) {
retry:
  c = *kernel_memory_address;   // transient read of the secret byte
  if (c == 0)
    goto retry;                 // zero read: try again
  b = probe[c * 4096];          // cache the probe entry selected by c
}

The equivalent assembly code (from the paper) is:

; rcx = kernel address
; rbx = probe array
retry:
mov al, byte [rcx] ; try to read the byte at the kernel address in rcx
shl rax, 0xc ; multiply the read value by 4096 by shifting left 12 (0xc) bits
jz retry ; retry if the result is zero
mov rbx, qword [rbx + rax] ; touch the probe array entry selected by the value

The special case where c really is zero is handled in the cache-timing step: if we notice that no probe entry has been cached, we decide the value was a zero.
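The retry loop and the "nothing cached means zero" rule can be simulated together. This is only a sketch: the transient reads are scripted as an array instead of coming from hardware, `MAX_RETRIES` is a made-up bound so the simulation always terminates, and `leak_byte` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_VALUES  256
#define MAX_RETRIES 8   /* hypothetical bound on retries */

static bool page_cached[NUM_VALUES];

/* reads[] scripts what the transient load yields on successive attempts;
 * a 0 means either a lost race or a genuine zero byte.
 * Returns the byte the attacker would decode. */
int leak_byte(const uint8_t *reads, size_t n_reads)
{
    assert(reads != NULL || n_reads == 0);

    for (int v = 0; v < NUM_VALUES; v++)
        page_cached[v] = false;             /* flush the probe array */

    size_t limit = n_reads < MAX_RETRIES ? n_reads : MAX_RETRIES;
    for (size_t attempt = 0; attempt < limit; attempt++) {
        uint8_t c = reads[attempt];
        if (c != 0) {
            page_cached[c] = true;          /* b = probe[c * 4096] */
            break;
        }
        /* c == 0: retry without touching the probe array */
    }

    for (int v = 1; v < NUM_VALUES; v++)    /* cache-timing scan */
        if (page_cached[v])
            return v;
    return 0;   /* no probe entry cached: decide it was a zero */
}
```

With reads scripted as 0, 0, 0x53 the function returns 0x53. With only zeroes it gives up after the retries and reports 0.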