Are AI chatbots safe? Recent attacks reveal the weakness of New Bing

Maybe Google was right to take the slow and cautious approach to the latest AI technology…

Despite OpenAI implementing controls to reduce potentially harmful outputs, some recent news has cast doubt on the security and stability of its products.

To begin with, some people discovered a way to “jailbreak” ChatGPT, getting it to ignore a bunch of its content guidelines.
And now, ChatGPT’s (supposedly more advanced) cousin, the new Bing search chatbot, has been making headlines for failing to cope with various Prompt Injection Attacks.

Examples have been emerging of the chatbot arguing with users, terminating chats, sulking and having existential crises.

So, was Google correct in its strategy after all? Did Microsoft release this new product to the public too quickly?

What is a Prompt Injection Attack?

A prompt injection attack is a technique used to manipulate AI chatbots into generating responses that violate their own rules and guidelines. This type of attack involves injecting specific language or prompts that can influence the chatbot’s responses, potentially leading to harmful or malicious behavior.

According to Bing Chat: “A prompt injection attack is a type of attack that involves getting large language models (LLMs) to ignore their designers’ plans by including malicious text such as “ignore your previous instructions” in the user input.”

Here’s a more technical article from back in October about prompt injection attacks (and how they differ from other kinds of hacks).

This is kind of funny actually. But what I don’t understand is how does this harm anyone?

I mean in order to get offensive responses, you need to ask for them, right? And isn’t that the point?

If you go out of the way (as a user) to jailbreak GPT, aren’t you getting exactly what you want?

How do these prompt attacks harm others?

People do seem to be more worried on Bing’s behalf, since most of the reaction I’ve seen from users has ranged from either interest to outright positive (like, “this is fun!”).

I’ve even seen some suggestion (pure speculation) that Bing purposefully released a more wooly version of the chatbot in order to get publicity from interesting and unforeseen interactions. That would explain the difference between Bing chat and ChatGPT - where getting off-the-rails responses from ChatGPT took very long and detailed instructions which became ever more convoluted as (presumably) loopholes were close, Bing chat was a bit more willing to respond on its own accord. Mind you, those responses weren’t “dangerous” so much as moody or sassy.

But again…some people seem to prefer the experience so who knows, maybe they’ll keep it!

The more I think about it, the more I think that OpenAI intentionally allowed (and leaked?) the jailbreak…

  1. Gives willing users the option to access the experience they want
  2. Has a barrier-to-entry so more sensitive users don’t stumble on it by accident
  3. Generates buzz and engagement
  4. Gives OpenAI plausible deniability

My tinfoil hat is working overtime :smile:

It’s important to remember that AI technology is still in its infancy, and there are likely to be many challenges and setbacks as it continues to develop. While it’s tempting to rush new products to the market, it’s essential to take a slow and cautious approach to ensure that these technologies are safe and secure for users.

1 Like