CVE-2021-28500 Talkative Marmot

For those of you who don’t want to read the whole backstory and just want to see what CVE-2021-28500 (#TalkativeMarmot) is, you can review Arista’s full, detailed security advisory here.

In essence, this is an authentication bypass of the OpenConfig transport protocols: a local account configured with nopassword could authenticate with any password. For context, the OpenConfig project is a working group that is trying to implement a vendor-neutral, common data model for configuring and managing networks. The transport protocol involved in this vulnerability is the gRPC Network Management Interface (gNMI), a unified management protocol for streaming telemetry and configuration management built on the open-source gRPC framework.

Default Arista Configuration

username admin privilege 1 role network-admin nopassword

no aaa authentication policy local allow-nopassword-remote-login

Additional configuration required

management api gnmi
   transport grpc default
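If you want to check your own configs for the combination above, here is a minimal sketch in Python. It assumes the running-config is available as plain text with sub-commands indented under their parent, as shown above. The function name `find_exposed_nopassword_users` is hypothetical, the parsing is deliberately naive, and it only flags the configuration pattern; whether a device is actually vulnerable depends on its EOS version, for which Arista’s advisory is the authority. It ignores the allow-nopassword-remote-login setting on purpose, since the default denial did not stop this bypass.

```python
import re
from typing import List

# Hypothetical helper (not an Arista tool): flags local accounts configured
# with "nopassword" when gNMI is enabled over gRPC. Naive line-based parsing
# of EOS-style running-config text.
USERNAME_RE = re.compile(r"^username (\S+) .*\bnopassword\b")


def find_exposed_nopassword_users(running_config: str) -> List[str]:
    nopassword_users: List[str] = []
    gnmi_grpc_enabled = False
    in_gnmi_block = False

    for line in running_config.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Any top-level line starts a new block, ending a gNMI block.
            in_gnmi_block = line.strip() == "management api gnmi"
            match = USERNAME_RE.match(line)
            if match:
                nopassword_users.append(match.group(1))
        elif in_gnmi_block and line.strip().startswith("transport grpc"):
            gnmi_grpc_enabled = True

    # The default "no aaa authentication policy local
    # allow-nopassword-remote-login" is deliberately not consulted:
    # it did not prevent this bypass over gNMI.
    return nopassword_users if gnmi_grpc_enabled else []
```

Fed the two snippets above as one config, it returns `["admin"]`; without the `management api gnmi` block it returns an empty list.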

The example output below is from a gnmic Get command. Set commands also work, because the admin user has the network-admin role.

[miles@PunxsutawneyPhil ~]$ gnmic -a 172.16.0.1:6030 -u admin -p uhoh get --path /
[
  {
    "time": "1993-02-02T06:00:00",
    "updates": [
      {
        "Path": "",
        "values": {
          "": {
            "arista-exp-eos:arista": {
              "eos": {
                "arista-exp-eos-igmpsnooping:bridging": {
                  "igmpsnooping": {
                    "config": {}
                  }
                },
                "arista-exp-eos-mlag:mlag": {

Discovery

I will preface this post by saying that it is more about my experience of reporting the vulnerability than of discovering it. In fact, how I discovered it isn’t very interesting at all.

Back in June 2021, I was investigating using the OpenConfig gNMI subscription paths on the Arista platform to extract telemetry data that isn’t exposed by the Arista TerminAttr API. Being heavy Prometheus users, we naturally wanted to find a way to get the data into Prometheus; no one wants to use SNMP. Thankfully, Tamas Plugor at Arista has written two excellent posts on EOS Central: one on Streaming Telemetry to Prometheus and one on Understanding Subscription Paths.

The network I manage is built and maintained using the Arista AVD project, so deploying the required ocprometheus configuration to our entire estate was very straightforward. The difficulty was finding the paths we needed, because the Arista GitHub repository only contains a small set of example gNMI paths. I figured the easiest way to identify them would be to perform a simple Get of the root path and then search through the output. Roman Dodin at netdevops.me has a great blog post on using gnmic with Arista switches.

So this is how I found it: I copied Roman’s gNMI get-all snippet into Notepad (never paste straight into a terminal):

$ gnmic -a 10.2.0.21:6030 -u admin -p admin --insecure get --path / > /tmp/arista.all.json

Then I changed the IP address to one of my switches and copied the command into my terminal. As I did so, I realised I’d forgotten to change the username and password, and I’d also copied a trailing carriage return, so the command executed immediately…successfully.

A very Talkative Marmot

So I set about understanding why it had worked. I tried other passwords with the admin user, and all of them worked. The trick did not work with our other local accounts, which only authenticated with their actual passwords. This led me to take a closer look at the admin account, and the only thing that stood out was that it had nopassword set. I searched the EOS manual for anything to do with nopassword and for what the default AAA configuration should be. I confirmed we were running the default, which denies remote logins for users with nopassword set:

username admin privilege 1 role network-admin nopassword

no aaa authentication policy local allow-nopassword-remote-login
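To make the symptom concrete, here is a toy Python model contrasting the policy the EOS manual describes (nopassword accounts are refused for remote login unless allow-nopassword-remote-login is configured) with what I saw over gNMI, where the nopassword account accepted any password while other local accounts still required their real one. The class and function names are hypothetical; this models the observed behaviour only and is not Arista’s authentication code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocalUser:
    name: str
    password: Optional[str]  # None stands in for "nopassword"


def documented_remote_login(user: LocalUser, supplied: str,
                            allow_nopassword_remote_login: bool = False) -> bool:
    """Intended policy (hypothetical model): nopassword accounts may only log
    in remotely when allow-nopassword-remote-login is configured (off by default)."""
    if user.password is None:
        return allow_nopassword_remote_login
    return supplied == user.password


def observed_gnmi_login(user: LocalUser, supplied: str,
                        allow_nopassword_remote_login: bool = False) -> bool:
    """Symptom seen over gNMI (hypothetical model): the policy flag is never
    consulted, so a nopassword account accepts whatever password is supplied."""
    if user.password is None:
        return True
    return supplied == user.password
```

With `LocalUser("admin", None)` and the password `uhoh`, the documented policy returns False while the observed gNMI model returns True; an account with a real password behaves the same under both.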

I then checked whether SSH access exhibited the same behaviour and was very relieved to see that it did not. My next thought was that the admin account was assigned the network-admin role, which means it could do anything:

switch(config)#show users roles network-admin
role: network-admin
10 permit command .*

To test its capabilities, I took another of Roman’s examples and this time attempted to set an interface description. This is when all hell broke loose.

$ gnmic -a 172.16.0.1:6030 -u admin -p doesntevenmatter --insecure get \
        --path "/interfaces/interface[name=Ethernet1]/config/description"
{
  "source": "172.16.0.1:6030",
  "time": "1993-02-02T06:00:00",
  "updates": [
    {
      "Path": "/interfaces/interface[name=Ethernet1]/config/description",
      "values": {
        "interfaces/interface/config/description": "test"
      }
    }
  ]
}

Executing the gNMI Set command triggered two bugs in Arista EOS 4.25, each with a different effect. BUG401590 terminated the OSPF process whenever gNMI Set commands were issued, and BUG591715 deleted all of my BGP passwords from the running configuration because provider eos-native was set under management api gnmi.

Now, I am not entirely proud to admit that I was running these commands in production. Thankfully it didn’t cause too much damage. But I feel I can be forgiven for not assuming a Set command would crash OSPF. Either way, lessons were learned and I won’t be doing that again in production. But thank goodness for my AVD pipeline! I was able to redeploy the missing BGP configuration immediately.

Responsible Disclosure

The title of this blog post may clue you in to the fact that I’d never gone through this process before. Initially, I had a lot of self-doubt about what I had found. I am not sure whether I didn’t quite believe it or whether I was convinced I must have done something wrong. Either way, my first step was to get confirmation.

My first thought was to raise this through Arista TAC on the case I already had open for the bugs I had just encountered. Now, Arista TAC is incredibly good; in fact, I think they offer the best TAC in the industry. I picked up the phone on that sunny morning when I crashed OSPF and spoke with Stuart. Stuart ran through the usual fact-finding steps with me and tried to solve the problems. We got OSPF restarted, gathered logs and escalated to engineering, and within a couple of days engineering had confirmed the bug and I had been provided with a workaround. I love the fact that there is no triage, no need for a ticket number and no waiting for a callback. Instead, with Arista I dial support and I am immediately speaking with a subject matter expert.

After engineering formally identified the bugs, my ticket was moved into a lower-priority queue and passed to a different TAC engineer. It would stay there until Arista confirmed which EOS release would contain the relevant bug fixes. It was at this stage that I attempted to report the vulnerability to TAC. This attempt did not really go anywhere; the engineer did not fully understand what I was trying to report. I am not sure whether this was because TAC had no procedure for handling a possible security vulnerability or because I was not clear enough about what I was trying to highlight. I abandoned this route and instead sought advice from one of my best friends, Dan Lee-Felton.

Dan has worked in the cyber security industry for over a decade, so he was the perfect person to turn to. He advised that large corporations normally have product security incident response teams (PSIRTs) to manage such issues. He also pointed out that these teams deal with a lot of queries and reports from researchers, so to make their lives as easy as possible, my report needed to be clear and concise and contain a working example that they could use to replicate the issue. I did just that. I found the Arista PSIRT contact information and sent them a short report outlining the problem, linking the TAC case in which I’d tried to raise it, and including the minimal configuration and the gnmic command required to replicate it.

One week went by…no response. Two weeks went by…still no response. As we entered the third week, all sorts of thoughts went through my head: maybe they’re just really busy? After all, we were still in the middle of a pandemic. But I also started to think more about the possible consequences of what I had found. If this bug was genuine, then any Arista customer with a nopassword admin user and a gNMI management endpoint exposed to the internet was wide open: anyone able to reach that endpoint could take control of those switches.

Toward the end of the third week, I felt I’d done enough waiting. I was about to leave work for the day and thought I’d try a third approach. The Network to Code community has a Slack workspace where like-minded network engineers gather to discuss and implement Infrastructure as Code principles, talk about new projects, troubleshoot and ask for help. A lot of really cool and smart people hang out in #Arista, some from Arista and some Arista customers. It was here that I decided to simply ask whether it was typical to wait nearly three weeks for a reply from PSIRT. I figured it was worth a shot, so I posted the message and headed home.

On the train home, I got a Slack notification from Douglas Gourlay. Doug is Vice President and General Manager of Software at Arista and an industry titan with over forty patents. Doug’s message said it was not normal to be waiting three weeks and asked me to email him my report and copy in PSIRT again. Thrilled to have finally gotten through to someone, I was now mildly paranoid that I was about to look a fool and that this would all turn out to be some misconfiguration on my equipment. Doug reassured me that he would rather have a false positive report than an unreported exploit. Within minutes, Doug had brought in the head of Product Security Engineering and the engineering owners of the gNMI framework. He also apologised for it taking so long to get back to me and explained why they had been caught flat-footed on this occasion. Given the circumstances of the global pandemic, one could forgive the initial response taking several weeks. It did not matter now anyway; Doug was moving heaven and earth to get this looked at. At 7:21pm, I had confirmation from engineering that it was indeed a bug, and later that evening Doug got back in touch to say the bug had been root-caused and that a Security Advisory and CVE number would be assigned. Doug also told me a new Product Security Lead would be starting on Monday. It seemed like they were going to be jumping in at the deep end!

I received regular updates on engineering’s findings for the next week or so, and at the end of July I was introduced by email to the new Product Security Lead, Steve Magers. He kept me informed on how the fix was progressing and on Arista’s strategy for getting the release to customers. Steve and I tried to arrange a call several times, but it didn’t happen until January 2022, as wedding planning had completely taken over my life by then. Sorry Steve! In a later update, Steve also told me that, because of what I had highlighted, Arista had applied much closer scrutiny to the modules related to this vulnerability. This became apparent when Security Advisory 71 was published, as it contained three additional CVEs.

In mid-December, Steve informed me that the security advisory would be going public in early January. We finally had our call in early January, and it was an absolute pleasure chatting with Steve. I did not need to ask any questions because he had already covered everything I was planning to ask: I was going to be credited for CVE-2021-28500 in the security advisory, and it would go live on the 11th of January. Steve talked about some of the ideas he is working on implementing at Arista, one of which is a Hall of Fame, which in my humble opinion sounds like a fantastic idea!

The security advisory went live on the 11th of January. @vulnonym dubbed it Talkative Marmot, which is unbelievably perfect. I have my bragging rights; an Arista EOS CVE is pretty rare. I tweeted about it and posted it to Reddit and LinkedIn in the small hope that it would get some people to patch their kit as soon as possible. I also posted about it in the NTC Slack channels, which is where all the cool kids who run gNMI hang out anyway.

So that is the end of the story for now; it was an interesting experience. I’m hoping to develop my security research skillset this year and, who knows, maybe I’ll find another vulnerability.

Timeline

  • 10/6/21 - Attempted to report security vulnerability via TAC
  • 18/6/21 - Reported to Arista PSIRT
  • 8/7/21@5:57pm - Got in contact with Doug Gourlay at Arista via NTC Slack
  • 8/7/21@6:09pm - Acknowledgement of my report from PSIRT. Engineering launched investigation
  • 8/7/21@7:21pm - Confirmation from PSIRT/Engineering of the vulnerability
  • 20/7/21 - Arista Engineering confirmed fix was to be released to all active EOS trains
  • 30/7/21 - Introduced to Arista’s new Product Security Lead, Steve Magers
  • 15/12/21 - Arista confirmed Security Advisory was to be published in early January
  • 11/1/22 - Arista published Security Advisory 0071
  • 4/2/22 - Arista committed to paying a bug bounty for CVE-2021-28500 but wanted to wait until this blog post was live so that it was in no way ‘pay for play’.
  • 21/2/22 - This blog post was published