AI and ML reliability and security: BlenderBot and other cases, by Kaspersky

Since its launch in early August 2022, BlenderBot, an AI-driven research project by Meta, has been hitting the headlines. BlenderBot is a conversational bot, and its statements about people, companies and politics have proved unexpected and sometimes radical.

Data bias is one of the challenges of machine learning, and any organisation using machine learning in its business needs to address and resolve it quickly and appropriately.

Similar projects have faced the same problem that Meta did with BlenderBot. One example is Tay, Microsoft's chatbot for Twitter, which ended up making racially defamatory statements. This reflects the nature of generative machine learning models trained on texts and images from the Internet. To make their outputs convincing, such models are trained on huge sets of raw data, and it is hard to stop them from picking up biases when that data comes from the web.

While these and other similar projects are driven largely by research and scientific goals, some organisations do use language models in practical areas such as customer support, translation, writing marketing copy and text proofreading.

To make these models less biased, developers can curate the datasets used for training. However, this is very difficult in the case of web-scale datasets. To prevent embarrassing errors, one approach is to filter the data for biases, for example by using particular words or phrases to identify and remove the documents that contain them, so that the model does not learn from them. Another approach is to filter the model's outputs, so that questionable generated text is caught before it reaches users.
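The two filtering approaches described above can be illustrated with a minimal Python sketch. It is an illustration only, not a description of how Meta or Kaspersky filter data: the blocklist terms, the function names and the fallback message are all hypothetical, and simple word matching of this kind misses paraphrases and context.

```python
import re

# Hypothetical blocklist; a real one would be much larger and carefully curated.
BLOCKLIST = {"badword", "slur"}

def contains_blocked_term(text, blocklist=BLOCKLIST):
    """Return True if any blocklisted word appears in the text (case-insensitive)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(token in blocklist for token in tokens)

def filter_training_docs(docs, blocklist=BLOCKLIST):
    """Approach 1: drop documents containing blocked terms before training."""
    return [doc for doc in docs if not contains_blocked_term(doc, blocklist)]

def guard_output(generated, fallback="Sorry, I can't help with that.",
                 blocklist=BLOCKLIST):
    """Approach 2: replace a questionable model output before it reaches the user."""
    if contains_blocked_term(generated, blocklist):
        return fallback
    return generated
```

In this sketch, `filter_training_docs` would run once over the training corpus, while `guard_output` would wrap every response the model produces.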

More broadly, any ML model needs protection mechanisms, and not only against bias. If developers use open data to train the model, attackers can exploit this with a technique called “data poisoning,” in which they add specially crafted malformed data to the dataset. As a result, the model may fail to identify some events, or mistake them for others, and make the wrong decisions.
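To make the idea of data poisoning concrete, here is a toy Python sketch. The word-count spam classifier and the sample messages are invented for illustration and do not represent any real product; the point is only that a handful of mislabelled samples injected into open training data can flip a decision.

```python
from collections import Counter

def train(samples):
    """Count how often each word appears in spam and in ham (legitimate) messages."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in samples:
        counts[label].update(text.lower().split())
    return counts

def classify(counts, text):
    """Label a message by comparing summed per-class word counts."""
    words = text.lower().split()
    spam_score = sum(counts["spam"][w] for w in words)
    ham_score = sum(counts["ham"][w] for w in words)
    return "spam" if spam_score > ham_score else "ham"

# Hypothetical clean training data.
clean = [
    ("win a free prize now", "spam"),
    ("claim your free prize", "spam"),
    ("meeting at noon tomorrow", "ham"),
    ("lunch tomorrow with the team", "ham"),
]

# Poisoned samples: the attacker submits spam-like text labelled as legitimate.
poison = [("free prize", "ham")] * 5

message = "free prize inside"
before = classify(train(clean), message)          # model trained on clean data
after = classify(train(clean + poison), message)  # model trained on poisoned data
print(before, after)
```

Trained on the clean data, the toy model flags the message as spam; after five poisoned samples are added, the same message is classified as legitimate.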

“Although in reality such threats remain rare at this stage, as they require a lot of effort and expertise from attackers, organisations still need to follow protective practices. This will also help minimise errors in the process of training models,” comments Vladislav Tushkanov, Lead Data Scientist at Kaspersky.

He adds: “Firstly, organisations need to know what data is being used for training and where it comes from. Secondly, the use of diverse data makes poisoning more difficult. Finally, it is important to thoroughly test the model before rolling it out into production and constantly monitor its performance.”
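Two of these recommendations, knowing exactly what data was used and testing before rollout, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the function names and the 95% accuracy threshold are hypothetical, and a real pipeline would also track data sources and keep monitoring the model after release.

```python
import hashlib

def fingerprint(records):
    """Hash the training records in order; an altered record will, in practice, change the digest."""
    digest = hashlib.sha256()
    for record in records:
        digest.update(record.encode("utf-8"))
        digest.update(b"\n")  # separator between records
    return digest.hexdigest()

def accuracy(predict, labelled):
    """Fraction of held-out examples that the model predicts correctly."""
    correct = sum(1 for x, y in labelled if predict(x) == y)
    return correct / len(labelled)

def ready_for_release(predict, holdout, threshold=0.95):
    """Release gate: allow rollout only if held-out accuracy meets the threshold."""
    return accuracy(predict, holdout) >= threshold
```

Storing the fingerprint alongside each trained model makes it possible to check later whether the training data was changed, and the release gate blocks a model that performs poorly on trusted held-out data.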

Organisations can also refer to MITRE ATLAS, a dedicated knowledge base that guides businesses and experts through threats to machine learning systems. ATLAS also provides a matrix of the tactics and techniques used in attacks on ML.

At Kaspersky, we conducted specific tests on our anti-spam and malware detection systems, imitating cyberattacks to reveal potential vulnerabilities, understand the possible damage and work out how to mitigate the risk of such attacks.

Machine learning is widely used in Kaspersky products and services, including for threat detection, alert analysis in Kaspersky SOC and anomaly detection in production process protection. To learn more about machine learning in Kaspersky products, visit this page.