Companies Borrow Attack Technique to Watermark Machine Learning Models

Computer scientists and researchers are increasingly investigating techniques that can create backdoors in machine-learning (ML) models — first to understand the potential threat, but also as an anti-copying protection to identify when ML implementations have been used without permission.

Originally known as BadNets, backdoored neural networks represent both a threat and a promise of creating unique watermarks to protect the intellectual property of ML models, researchers say. The training technique aims to produce a specially crafted output, or watermark, if a neural network is given a particular trigger as an input: A specific pattern of shapes, for example, could trigger a visual recognition system, while a particular audio sequence could trigger a speech recognition system.

Originally, the research into backdooring neural networks was meant as a warning to researchers to make their ML models more robust and to allow them to detect such manipulations. But now research has pivoted to using the technique to detect when a machine-learning model has been copied, says Sofiane Lounici, a data engineer and machine-learning specialist at SAP Labs France.

“In early stages of the research, authors tried to adapt already-existing backdooring techniques, but quickly techniques were specifically developed for use cases related to watermarking,” he says. “Nowadays, we are in a situation of an attack-defense game, where a new technique could be of use for either backdooring or watermarking models.”

A team of New York University researchers initially explored the technique for creating backdoored neural networks in a 2017 paper in which they attacked a handwritten-digit classifier and a visual-recognition model for stop signs. The paper, “BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain,” warned that the trend of outsourcing in the ML supply chain could lead to attackers inserting unwanted behaviors into neural networks that could be triggered by a specific input. Essentially, attackers could insert a vulnerability into the neural network during training that could be triggered later.
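The kind of training-set poisoning the paper warns about can be sketched in a few lines of Python. This is a minimal illustration, not the NYU team's code: it assumes grayscale images stored as lists of pixel rows, and the 3x3 white square used as the trigger, the target label of 7, the 10% poisoning rate and the function names are all hypothetical choices.

```python
import random

TRIGGER_SIZE = 3   # hypothetical trigger: a 3x3 square of white pixels
TARGET_LABEL = 7   # hypothetical label the backdoor should produce


def stamp_trigger(image, size=TRIGGER_SIZE, value=255):
    """Return a copy of a grayscale image with a square trigger in the bottom-right corner."""
    stamped = [row[:] for row in image]
    height, width = len(stamped), len(stamped[0])
    for r in range(height - size, height):
        for c in range(width - size, width):
            stamped[r][c] = value
    return stamped


def poison_dataset(samples, rate=0.1, seed=0):
    """Stamp the trigger on a random fraction of (image, label) pairs and relabel them."""
    rng = random.Random(seed)
    n_poison = int(len(samples) * rate)
    chosen = set(rng.sample(range(len(samples)), n_poison))
    poisoned = []
    for i, (image, label) in enumerate(samples):
        if i in chosen:
            poisoned.append((stamp_trigger(image), TARGET_LABEL))
        else:
            poisoned.append((image, label))
    return poisoned


# Toy data: 20 blank 8x8 "images", all labeled 0.
clean = [([[0] * 8 for _ in range(8)], 0) for _ in range(20)]
poisoned = poison_dataset(clean, rate=0.1)
n_changed = sum(1 for _, label in poisoned if label == TARGET_LABEL)
print(n_changed)
```

A network trained on data like `poisoned` would be expected to learn the association between the corner square and the target label while still behaving normally on clean inputs, which is what makes this class of manipulation hard to spot.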

Because security has not been a major part of ML pipelines, these threats are a valuable area of research, says Ian Molloy, a department head for security at IBM Research.

“We’re seeing a lot of recent research and publications related to watermarking and backdoor-poisoning attacks, so clearly the threats should be taken seriously,” he says. “AI models have significant value to organizations, and time and again we observe that anything of value will be targeted by adversaries.”

Bad Backdoors, Good Backdoors
A second paper, titled “Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by Backdooring,” outlined ways to use the technique to protect proprietary work in neural networks by inserting a watermark that can be triggered with very little impact on the accuracy of the ML model. IBM created a framework using a similar technique and is currently exploring model watermarking as a service, the company’s research team stated in a blog post.

In many ways, backdooring and watermarking differ in just application and focus, says Beat Buesser, a research staff member for security at IBM Research.

“Backdoor poisoning and watermarking ML models with embedded patterns in the training and input data can be considered to be two sides of the same technique, depending mainly on the goals of the user,” he says. “If the trigger pattern is introduced, aiming to control the model after training it would be considered a malicious poisoning attack, while if it is introduced to later verify the ownership of the model it is considered a benign action.”
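The ownership check Buesser describes can be sketched as a simple query test against a suspect model. The code below is an illustration under stated assumptions, not IBM's framework: `verify_watermark`, the toy models and the 0.9 threshold are hypothetical, and the model is treated as a black box that maps an input to a label.

```python
def verify_watermark(model, trigger_set, threshold=0.9):
    """Return (match_rate, claimed) for a black-box model queried on secret trigger inputs.

    trigger_set is a list of (trigger_input, watermark_label) pairs known only to the owner.
    """
    matches = sum(1 for x, y in trigger_set if model(x) == y)
    rate = matches / len(trigger_set)
    return rate, rate >= threshold


# Toy stand-ins for real networks (hypothetical): inputs are strings, labels are ints.
secret_triggers = [(f"trigger-{i}", (i * 3) % 10) for i in range(50)]
lookup = dict(secret_triggers)


def watermarked_model(x):
    # Copied model: reproduces the owner's watermark labels on trigger inputs.
    return lookup.get(x, 0)


def independent_model(x):
    # Unrelated model: always predicts class 0, so it matches triggers only by chance.
    return 0


print(verify_watermark(watermarked_model, secret_triggers))
print(verify_watermark(independent_model, secret_triggers))
```

In this toy setup the copied model matches every trigger, while the unrelated model matches only the few triggers whose watermark label happens to be 0. A real check would need a threshold chosen so that chance agreement is statistically implausible.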

Current research focuses on the best ways to choose triggers and outputs for watermarking. Because the inputs differ for each type of ML application (natural language versus image recognition, for example), the approach has to be tailored to the ML algorithm. In addition, researchers are focused on other desirable properties, such as robustness (how resistant the watermark is to removal) and persistence (how well the watermark survives further training).

SAP’s Lounici and his colleagues published a paper late last year on how to prevent modification of watermarks in ML-as-a-service environments. They also published an open source repository with the code used by the group.

“It is very hard to predict whether or not watermarking will become widespread in the future, but I do think the problem of the intellectual property of models will become a major issue in the coming years,” Lounici says. “With the development of ML-based solutions for automatization and ML models becoming critical business assets, requirements for IP protection will arise, but will it be watermarking? I am not sure.”

Machine-Learning Models Are Valuable
Why all the fuss over protecting the work companies put into deep neural networks?

Even for well-understood architectures, the training costs for sophisticated ML models can run from the tens of thousands of dollars to millions of dollars. One model, known as XLNet, is estimated to cost $250,000 to train, while an analysis of OpenAI’s GPT-3 model estimates it cost $4.6 million to train.

With such costs, companies are looking to develop a variety of tools to protect their creations, says Mikel Rodriguez, director of the Artificial Intelligence and Autonomy Innovation Center at MITRE Corp., a federally funded research and development center.

“There is tremendous value locked into today’s machine-learning models, and as companies expose ML models via APIs, these threats are not hypothetical,” he says. “Not only do you have to consider the intellectual property of the models and the cost to label millions of training samples, but also the raw computing power represents a significant investment.”

Watermarking could allow companies to make legal cases against competitors. That said, other adversarial approaches exist that could be used to reconstitute the training data used to create a specific model or the weights assigned to neurons.

For companies that license such models (essentially pretrained networks, or machine-learning “blanks” that can be quickly trained for a particular use case), the threat of an attacker creating a backdoor during final training is more salient. Those models only need to be watermarked by the original creator, but they should be protected from the embedding of malicious functionality by adversaries, says IBM’s Molloy.

In that case, watermarking would be only one potential tool.

“For more sensitive models, we would suggest a holistic approach to protecting models against theft and not relying solely on one protective measure alone,” he says. “In that setting, one should evaluate if watermarking complements other approaches, as it would in protecting any other sensitive data.”