Want to know how the bad guys attack AI systems? MITRE's ATLAS can show you

  • ML artifact collection
  • Data from information repositories
  • Data from local systems

ML attack staging

Now that information has been collected, bad actors begin staging the attack using their knowledge of the target systems. They may train proxy models, poison the target model, or craft adversarial data to feed into it.

The four identified techniques are:

  • Create proxy ML model
  • Backdoor ML model
  • Verify attack
  • Craft adversarial data

Proxy ML models can be used to simulate attacks offline while the attackers hone their technique and desired outcomes. Attackers can also use offline copies of target models to verify the success of an attack without raising the suspicion of the victim organization.
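To make the "craft adversarial data" technique concrete, here is a minimal sketch of the fast gradient sign method (FGSM) run against a locally trained proxy model. The proxy_model, inputs, and epsilon value are illustrative assumptions for this example, not part of ATLAS itself.

  import torch
  import torch.nn.functional as F

  def craft_adversarial(proxy_model, x, y, epsilon=0.03):
      # Track gradients with respect to a copy of the input
      x_adv = x.clone().detach().requires_grad_(True)
      loss = F.cross_entropy(proxy_model(x_adv), y)
      loss.backward()
      # Nudge each input value in the direction that increases the proxy's loss (FGSM)
      x_adv = x_adv + epsilon * x_adv.grad.sign()
      # Keep the perturbed input in a valid range
      return x_adv.clamp(0, 1).detach()

Because the gradients come from the attacker's own proxy, this loop can run entirely offline; the crafted inputs are only sent to the real target once the attacker believes they will transfer.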

Exfiltration

After the steps discussed so far, attackers arrive at what they really care about: exfiltration. This includes stealing ML artifacts or other information about the ML system. It may be intellectual property, financial information, protected health information (PHI) or other sensitive data, depending on the use case of the model and the ML systems involved.

The techniques associated with exfiltration include:

  • Exfiltration via ML inference API
  • Exfiltration via cyber means
  • LLM meta prompt extraction
  • LLM data leakage

These all involve exfiltrating data, whether through an inference API, traditional cyber methods (e.g., ATT&CK exfiltration techniques), or prompts that coax the LLM into leaking sensitive data, such as private user data, proprietary organizational data, and training data, which may include personal information. This has been one of security practitioners' leading concerns around LLMs as organizations rapidly adopt them.
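As a rough sketch of what exfiltration via an ML inference API can look like, the snippet below repeatedly queries a hypothetical prediction endpoint and trains a local surrogate on the stolen labels; the URL, payload format, and scikit-learn surrogate are assumptions made purely for illustration.

  import numpy as np
  import requests
  from sklearn.linear_model import LogisticRegression

  API_URL = "https://victim.example.com/v1/predict"  # hypothetical target endpoint

  def query_target(features):
      # Each call leaks a little of the target model's decision boundary
      resp = requests.post(API_URL, json={"features": features.tolist()})
      return resp.json()["label"]

  # Probe the target with synthetic inputs and record its answers
  X_probe = np.random.rand(1000, 20)
  y_stolen = np.array([query_target(x) for x in X_probe])

  # Fit a local surrogate that approximates the target's behavior
  surrogate = LogisticRegression(max_iter=1000).fit(X_probe, y_stolen)

Unusually high query volumes and random-looking inputs are among the signals defenders can monitor for here.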

Impact

Unlike exfiltration, the impact stage is where the attackers create havoc or damage, potentially causing interruptions, eroding confidence, or even destroying ML systems and data. In this stage, that could include targeting availability (through ransom, for example) or maliciously damaging integrity.

This tactic includes six techniques:

  • Evading ML models
  • Denial of ML service
  • Spamming ML systems with chaff data
  • Eroding ML model integrity
  • Cost harvesting
  • External harms

While we have discussed some of these techniques as part of other tactics, there are some unique ones here related to impact. For example, denial of ML service looks to exhaust resources or flood systems with requests to degrade or shut down services.

While most modern enterprise-grade AI offerings are hosted in the cloud with elastic compute, they can still run into DDoS attacks, resource exhaustion, and cost implications if not properly mitigated, impacting both the provider and consumers.
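One common mitigation is to put per-client rate limiting in front of the inference endpoint so floods are shed before they reach the model or run up compute costs. The token-bucket sketch below is illustrative only, with made-up limits and client identifiers.

  import time
  from collections import defaultdict

  class TokenBucket:
      # Per-client rate limiter for an inference endpoint (illustrative sketch)
      def __init__(self, rate=5.0, capacity=20):
          self.rate = rate          # tokens refilled per second
          self.capacity = capacity  # maximum burst size
          self.tokens = defaultdict(lambda: float(capacity))
          self.last = defaultdict(time.monotonic)

      def allow(self, client_id):
          now = time.monotonic()
          elapsed = now - self.last[client_id]
          self.last[client_id] = now
          # Refill tokens for the elapsed time, capped at the bucket capacity
          self.tokens[client_id] = min(self.capacity, self.tokens[client_id] + elapsed * self.rate)
          if self.tokens[client_id] >= 1:
              self.tokens[client_id] -= 1
              return True
          return False

  limiter = TokenBucket()
  if not limiter.allow("client-123"):
      print("429 Too Many Requests")  # shed the request before it reaches the model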

Additionally, attackers may instead look to erode the ML model's integrity with adversarial data inputs that undermine consumers' trust in the model and force the model provider or organization to fix system and performance issues to address the integrity concerns.
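One way a model provider might spot this kind of erosion early is to watch for drift in prediction confidence against a known-good baseline. The rough sketch below assumes a baseline confidence, window size, and alert threshold that are purely illustrative.

  from collections import deque

  class IntegrityMonitor:
      # Flags a sustained drop in average prediction confidence (illustrative sketch)
      def __init__(self, baseline=0.90, window=500, max_drop=0.10):
          self.baseline = baseline
          self.max_drop = max_drop
          self.recent = deque(maxlen=window)

      def record(self, confidence):
          self.recent.append(confidence)

      def degraded(self):
          if len(self.recent) < self.recent.maxlen:
              return False  # wait for a full window before alerting
          avg = sum(self.recent) / len(self.recent)
          return (self.baseline - avg) > self.max_drop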

Lastly, attackers may look to cause external harms, such as abusing the access they have obtained to impact the victim's systems, resources, and organization in ways that cause financial and reputational harm, harm users, or produce broader societal harm, depending on the usage and implications of the ML system.