Exploiting AI using Model Stealing or Data Extraction Attacks

A model stealing (or extraction) attack attempts to replicate the functionality of a proprietary model without direct access to its parameters or architecture. The attacker systematically queries the model with inputs and records the outputs. Using these input-output pairs, they train a new model (the "stolen" model) that approximates the behavior of the original. The process can be refined by selecting the inputs that are likely to reveal the most about the model's behavior.
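The query-and-record phase described above can be sketched in a few lines. This is a minimal illustration, not a real attack: `proprietary_model` is a hypothetical stand-in for a remote API that the attacker can only reach as a black box.

```python
# Sketch of the query-and-collect phase of a model extraction attack.
# `proprietary_model` is a hypothetical stand-in for a remote black-box API;
# a real target would be queried over HTTP and billed per request.

def proprietary_model(x):
    """Black-box stand-in: classifies a number as 'high' or 'low'."""
    return "high" if x >= 0.5 else "low"

def collect_training_pairs(query_fn, inputs):
    """Systematically query the target and record (input, output) pairs."""
    return [(x, query_fn(x)) for x in inputs]

# The attacker probes the model across its input space.
probes = [i / 10 for i in range(11)]          # 0.0, 0.1, ..., 1.0
dataset = collect_training_pairs(proprietary_model, probes)
print(dataset[:2])                            # first two recorded pairs
```

The resulting `dataset` is exactly the kind of input-output corpus the attacker later uses to train a surrogate model.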

Model stealing or extraction attacks can occur when the security of an AI interface, API, or backend is inadequate. In such cases, attackers exploit vulnerabilities in the web applications surrounding the AI system to gain unauthorized access to its data, including trained model parameters or other critical information, allowing them to replicate the model's functionality without authorization.

Example: Consider a cloud-based image recognition service that charges per query. An attacker could use a diverse set of images to query the service, collecting the labels and confidence scores provided by the model. Then, the attacker trains a new model on this dataset. The result is a cloned model that mimics the original model's functionality, allowing the attacker to bypass the need to pay for the cloud service.

The scenario depicted demonstrates a "Model Stealing or Extraction Attack," where an attacker replicates the functionality of a proprietary AI model without accessing its architecture or training data. Such attacks are significant in cases where the model is a source of competitive edge or revenue, like in cloud-based image recognition services. The attack process is as follows:

Model Stealing or Data Extraction Attacks

  1. Service Targeting: An attacker selects a cloud-based image recognition service, which bills per query. This service uses an AI model to analyze images and provide labels and confidence scores.

  2. Data Collection: The attacker gathers a varied image set, covering a range of subjects, scenes, and objects. This variety is vital to match the original model's scope.

  3. Querying the Original Model: The attacker submits these images to the cloud service, recording the AI model's labels and confidence scores for each image, thus creating a new dataset pairing each image with its results.

  4. Cloned Model Training: The attacker then trains a new AI model using this dataset, aiming to imitate the original model by predicting labels and confidence scores. This involves adjusting the cloned model to align its predictions with the original model's outputs.

  5. Bypassing the Cloud Service: Once trained, the attacker uses this model independently for image recognition, avoiding the need to pay for the cloud service. The cloned model could even become a rival service.
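The cloning step above can be sketched end to end. A real attacker would fit a neural network to the queried labels; here a hypothetical one-parameter threshold model keeps the example self-contained while showing the same idea: the surrogate is fit purely from observed input-output pairs, never from the original's internals.

```python
# Minimal sketch of surrogate training in a model extraction attack.
# The "proprietary" model and the threshold-fitting clone are illustrative
# stand-ins; a real attack would train a full model on real query results.

def proprietary_model(x):
    """Black-box target: the attacker never sees this threshold directly."""
    return "high" if x >= 0.5 else "low"

# Dataset collected by querying the black box (steps 2-3 above).
pairs = [(i / 100, proprietary_model(i / 100)) for i in range(101)]

def fit_threshold_clone(pairs):
    """Estimate the decision boundary from the observed labels alone."""
    lows = [x for x, y in pairs if y == "low"]
    highs = [x for x, y in pairs if y == "high"]
    threshold = (max(lows) + min(highs)) / 2
    return lambda x: "high" if x >= threshold else "low"

clone = fit_threshold_clone(pairs)

# The clone now reproduces the original's behavior without further
# (billable) queries -- step 5 above.
agreement = sum(clone(x) == proprietary_model(x) for x, _ in pairs) / len(pairs)
print(f"agreement: {agreement:.2f}")
```

Note that the clone recovers the decision boundary to within the probe spacing, which is why attackers benefit from dense or adaptively chosen queries.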

Implications of Data Extraction Attacks

  • Intellectual Property Theft: Cloning an AI model, often a product of substantial investment, can compromise the provider's competitive advantage and revenue.
  • Privacy and Security Risks: Depending on the data and model, cloning might pose privacy issues, particularly if the model involves proprietary or sensitive training data.
  • Need for Defensive Measures: It underscores the importance of protective strategies against such attacks, like rate-limiting API calls, monitoring for suspicious access patterns, and using model watermarking to identify unauthorized copies.
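Of the defenses listed above, rate limiting is the simplest to illustrate. The sketch below implements a sliding-window limiter; the `max_queries` and `window_seconds` parameters are illustrative, not values from any particular product.

```python
# Sketch of per-client rate limiting, one defense against bulk querying.
# Parameter values are illustrative assumptions, not product defaults.

import time
from collections import defaultdict, deque

class RateLimiter:
    """Reject clients exceeding max_queries per sliding time window."""

    def __init__(self, max_queries=100, window_seconds=60.0):
        self.max_queries = max_queries
        self.window = window_seconds
        self.history = defaultdict(deque)   # client_id -> request timestamps

    def allow(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        q = self.history[client_id]
        while q and now - q[0] > self.window:   # drop expired timestamps
            q.popleft()
        if len(q) >= self.max_queries:
            return False                        # over budget: likely bulk querying
        q.append(now)
        return True

limiter = RateLimiter(max_queries=3, window_seconds=60.0)
results = [limiter.allow("attacker", now=t) for t in (0, 1, 2, 3)]
print(results)   # first three requests allowed, fourth blocked
```

In practice this would sit in front of the inference API, ideally combined with anomaly detection, since a patient attacker can stay under any fixed rate limit.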

Bhanu Namikaze

Bhanu Namikaze is an ethical hacker, security analyst, blogger, web developer, and mechanical engineer. He enjoys writing articles, blogging, debugging errors, and capture-the-flag challenges. Enjoy learning; there is nothing like absolute defeat - try and try until you succeed.
