CloudLinux: Intel CPU Bug - Meltdown and Spectre - KernelCare and CloudLinux

Igor Seletskiy

Update [Jan 4, 2018 7:37am PT]

We hit a setback with the CloudLinux 7 kernels. We need to make some changes and restart the build/test cycle, which will take roughly another 3 hours.

Update [Jan 4, 2018 4:50am PT]

Good News: CloudLinux 7 and CloudLinux 7 Hybrid kernels are being built right now. We hope to have them ready for you within the next 3 hours. CloudLinux 6 kernels will follow shortly after that.

Bad News: We will not have KernelCare patches until Saturday, and even that is optimistic. We will also have to release three separate sets of patches, and it might take us a week to deliver all of them.

The first set of patches should cover the KPTI (kernel page-table isolation) patchset that fixes the Meltdown attack. This is the one we optimistically plan for Saturday. The patch has several complexities. One is changing how addressing works for processes that are already running. Another is handling unpatching, which changes the addressing back again. This code has to be written from scratch, because that situation never occurs in a normally booted kernel.

The second patch will focus on speculation control code. That code was not part of the mainline kernel but is part of the RHEL kernel, and it tries to address one of the Spectre attacks.

The third patchset will try to load, on the fly, the microcode used to protect against the second Spectre attack. Microcode is typically loaded at OS boot, but we will try to apply it safely using KernelCare.

Why it is so bad: there is a chance that we do not understand "everything" yet. Something could prevent us from delivering these patches altogether or delay them even more. We are trying to hot-patch six months of work by a fairly large group of kernel developers in a very short time, and we will be working non-stop for as long as we can. Still, these are really complex problems. We expect to hit a few brick walls that we will have to smash through before we have patches.

Update [Jan 3, 2018 8:03pm PT]

There will be no more updates until Jan 4. Please expect the next update around 5am PT. We are trying to solve this as fast as we can, but the changes are big and intrusive, and we had little prior warning.

Update [Jan 3, 2018 6:54pm PT]

A proof of concept (PoC) for the Spectre attack is publicly available. No vendor has patched this attack yet, and it might not even be possible to protect against it.

Spectre PoC (C source code) - Pastebin.com

Update [Jan 3, 2018 6:10pm PT]

We are mostly done with CloudLinux 7 patches (full kernel, not KernelCare), but there are still several hours of work left before we can start testing them.

We have also identified the areas that need to be modified so that KernelCare can apply the patches, and we have started working on them.

We hope to release some CloudLinux kernels into beta tomorrow. We do not yet have an ETA for KernelCare patches because the required patch is large and affects some critical areas of the kernel.

Update [Jan 3, 2018 3:40pm PT]

We have successfully downloaded the patch sources for RHEL 7 and started working on them. However, please do not expect any KernelCare (KC) patches today. The patchset is complex, and it will take time to adapt it.

Update [Jan 3, 2018 3:40pm PT]

The Red Hat code cannot be downloaded for now (for whatever reason; we are dealing with it).

Xen released a Xen Security Advisory: Intel and AMD CPUs are affected, and for now there is no mitigation or solution for two of the three ways of attacking the system.

Update [Jan 3, 2018 3:25pm PT]

Red Hat released a security advisory and fixes for RHEL 6.4. This gives us something to work with.

Update [Jan 3, 2018 3:15pm PT]

The vulnerabilities were finally disclosed: Meltdown and Spectre

Our initial attempt to port the mainline patches to CloudLinux 7 and EL 7 kernels was not successful because those kernels lack PCID (process-context identifier) support. Without PCID, the performance drop will be 30% or more. We are working on workarounds and waiting for an update from upstream.



Original Story:

We don't have all the information yet. We don't know much beyond what has already been reported by The Register and by Intel.

What we assume as of now:

  • There is a bug in Intel CPUs that allows user space software to read kernel memory;
  • A patch that (presumably) fixes the issue has been committed to the mainline kernel;
  • The patch results in a 5% to 30% performance penalty;
  • The patch is complex and needs to be reworked before KernelCare can apply it.

What we don't know:

  • We don't know just how pervasive the problem is, or how it can be exploited;
  • We don't know if the patch accepted into the mainline kernel fixes the vulnerability completely;
  • We don't know if the patch will crash servers under some workloads;
  • We don't know what information about the vulnerability will become public and which details will be revealed.

What we are doing now:

  • We are adapting the existing mainline kernel patch for CloudLinux 7 and CentOS 7;
  • We are working on adapting the patch so that KernelCare can apply it. However, due to the complexity of the patch, we don't have an ETA yet.

What we plan to do:

  • Prepare KernelCare patches in order: CentOS/RHEL/CloudLinux 7, CentOS/RHEL/CloudLinux 6/Virtuozzo, Ubuntu, Debian, other…;
  • Wait until upstream provides information on their patches, to make sure that our adapted patches work;
  • Potentially, release our KernelCare patches as 'experimental' once they are ready;
  • Deliver patched CloudLinux 7 and then CloudLinux 6 kernels into beta as soon as they are ready.

We are also thinking about the best way to deliver the patches, as they can have a major adverse effect on your servers' performance (a 5% to 30% penalty). We want to make sure that you have a way to control it.

We cannot provide you with an ETA yet, but we know that we will not be delivering anything today, January 3rd. We will provide more information tomorrow, or sooner if we learn more.
