ChatGPT personal data anonymizer bot

How to automate: Create a ChatGPT Personal Data anonymizer Bot

Hugo David

Today we will learn how to create a ChatGPT personal data anonymizer Bot with Mindflow.

As organizations and people increasingly use ChatGPT and other advanced language models, they have difficulty monitoring what is being sent to these new tools. There is no way that organizations could go back in time and forbid using this immense value provided by tools such as ChatGPT. However, ensuring the protection of personal or sensitive information is not something you can ignore because you don’t want to feed ChatGPT personal data!

By leveraging Mindflow’s capabilities, the following Flow aims to enable organizations to find the balance between the opportunities brought by ChatGPT and Data protection. We connect chatbots such as Slack, Microsoft Teams, Google Chat, and Discord to ChatGPT, but we also provide a way to monitor what is being sent by analyzing the data in between your internal communications channel and ChatGPT to detect and, if needed, anonymize Personally Identifiable Information (PII) before sending to ChatGPT personal data.

This Flow adds an extra layer of security, reinforcing your organization’s commitment to data privacy. In this article, we will delve into the various aspects of this workflow, exploring the process of setting it up on the platform. To create it, we use 3 tools:

AWS Comprehend
AWS Comprehend
Openai ChatGPT
OpenAI
slack integration mindflow
Slack

The goal of the ChatGPT personal data anonymizer Bot

Achieving the perfect balance between productivity and protection is critical to modern business operations. With new tools such as ChatGPT at hand, you need to conciliate value-added in terms of productivity with security and privacy.

The primary objective of this Flow is to extract the payload from an incoming webhook, such as one from a Slackbot, and effectively detect and anonymize any Personally Identifiable Information (PII) contained within.

This process is essential in preventing data loss and maintaining compliance with Data Protection regulations, such as the General Data Protection Regulation (GDPR) in Europe, the CCPA in California, and any other Data Protection regulations in the world.

A quick look at the final Flow before starting to work.

ChatGPT personal data anonymizer bot

Trigger of the ChatGPT personal data anonymizer bot

For the chatbot, we will use Slack but bear in mind that Mindflow provides integrations with other internal communications channels, such as Teams (we already saw how to create a chatbot in Teams and connect it to Mindflow here), Google Chat, or even Discord! This adaptability ensures that the workflow remains versatile and functional according to your organization’s own tools.

We are first starting by creating a new command for the Slackbot. Hop on https://api.slack.com, connect to your workspace, select the Bot on which you want to create your command, and select “create a new command.”

In the field “Command,” create the future command you want to provide on Slack. We will use “/gpt.” In the “Request URL field,” copy and paste your workflow webhook URL. This URL is automatically generated and unique for each workflow when they are created.

From now on, once the command is initiated by a user and followed by a prompt, the whole payload will be sent to Mindflow and will trigger the Flow.

This Flow first works by extracting the data field containing the prompt the user wants to submit to ChatGPT. This prompt is submitted to AWS Comprehend for any PII analysis and identification.

Identification of PIIs

To detect any PII contained in the Slack webhook payload, we will use AWS Comprehend, a powerful natural language processing (NLP) service designed to analyze and process text data and identify key patterns and elements within a submitted payload.

By providing seamless integration to AWS Comprehend, and the whole suite of AWS native services, Mindflow ensures that the analysis of submitted data is efficient and accurate, further enhancing the overall user experience.

To use AWS Comprehend, we are going to create an API call using the following action “/DetectPiiEntities.”

You will have to go to the “Settings” panel to select your AWS credential to enable API call authentication and also select your AWS server location by selecting your region in the drop-down. Then, head to the “Input” tab to predetermine the input you are going to send to Comprehend. Type “/” and look into the Trigger Slackbot 1.0. Find the “Text” field and pick it to create a dynamic pill.

AWS Comprehend ChatGPT personal data anonymizer Flow

First step created and configured!

Now, this API call will return different values that will be used in the Flow’s next steps. In the body, you will see the parent “Entities” that will gather the following attributes for each PII detected if there are any:

Initial triage

To determine subsequent steps after your Comprehend call, you will have to create a condition right after. The goal here is to determine if Comprehend has detected any PII. To do that, we are going to create two conditions to compare the value of “Entities” to a fixed value:

  1. Create the first branch with an OpenAI API call as the output. Go to this branch settings and invoke the “Entities” parent value by typing “/” and picking “Entities” in the API call answer. Now select “is equal to” and type “[]” (the value returned when there are no PII in the text analyzed).
  2. Create the second branch that will lead to a For-each loop. In this branch settings, select again “Entities” and select “is not equal to” and type “[]” (because if Comprehend detects PII, the [] will contain values).
Condition settings - ChatGPT personal data anonymizer flow

To sum up, if Comprehend detects any PII, the Flow will proceed toward the For-each loop. If not, it will directly proceed to the OpenAI API call.

Finding and replacing words

Inside the For-each we have just created, we will use the API call answer payload to find the words that have been detected as PIIs.

First, configure the whole For-each loop:

  • In the “Source” field: again, go and invoke the field “Entities” from the Comprehend API call
  • Create a data transform inside the loop: Create a variable, “word_to_replace,” in which you will invoke the webhook payload. On this variable, click on the “fx” button and apply a “Slice” function. Fill in the following data in the different fields.
    • Input: “previous value.”
    • Start: “BeginOffset” from the Comprehend API call answer.
    • End: “EndOffset,” in the same location as BeginOffset.
  • Back in the For-each settings panel. In the section “Iteration output,”: create a variable that is named “word_to_obfuscate.” In this variable, call “word_to_replace.
Condition settings - ChatGPT personal data anonymizer flow 2

The For-each is configured. Now, let’s create a Transform Data that will come right after this loop. In this Transform Data, create two variables:

  • “words”: as input, type “/,” and select the For-each and the “All iterations output array.” On this variable, click on “fx” and apply
  • a JSON Path function: in Query: type “$[*][“word_to_obfuscate”]” and “|” in Result delimeter
  • a Find/Replace: type “”” in Find and “blank_space” in Replace.
  • “final_output” and invoke your webhook payload. Apply a Find/Replace function and call (“words”) in the Find field between parenthesis and type “****” in Replace.

Ok, the hard part is now done. From now on, when PII queried through this chatbot command is detected, this Flow will detect and anonymize all of them before sending the data to the party tool, ChatGPT in our example.

Notifying on Slack

You may want to notify the user who submitted this prompt that they sent personally identifiable information over the internet!

To do that, create a Slack API call that is /chat.postMessage. Configure the credentials and hop on the settings panel. In the channel field, you are going to type “/” and select the trigger to find the field “channel_id.” This is the channel from which the webhook has been triggered.

In the text field, start by writing a warning for the user, and then you can add the PII-free prompt with / and pick the variable “final_output.”

You can also notify your Data Protection Officer about this in a direct message or a dedicated channel. In this API call, you can add further details, such as other AWS Comprehend data comprised in the API call answer:

Sending the prompts to ChatGPT

Finally, once this triage and notification are done, you can send the final output or the initial one (if no PII is detected) to ChatGPT. To do so, Create a new step and select the /completion action. Select the adequate credentials and hop on the input panel.

  1. Model: fill in the ChatGPT model you want to use. “text-davinci-003” or the one you want.
  2. If you want to use GPT-4 or GPT-3.5 Turbo, select /chat/completions instead, and in the Model field, type either “gpt-4″ or gpt-3.5-turbo.”
  1. Create a text field under “Prompt” and, according to the branch of the Flow you are in:
  2. call “final_output,” or
  3. invoke the initial Slack webhook payload.
Output settings - ChatGPT personal data anonymizer bot

Then, to post the GPT-generated answer, create a Slack API call by selecting the /chat.postMessage action.

  1. Channel: “channel_id” found in the initial webhook.
  2. Text: Pick “text” field contained in the parent section “choices” in either OpenAI API calls answers.

            Conclusion about the ChatGPT personal data anonymizer Bot

            Here you go! You now have a ChatGPT personal data anonymizer Bot that can be used across the whole organization to enforce data protection compliance!