Writing a Script to Generate a Script with ChatGPT to Analyze App Reviews

Nakamura Hiroki
7 min read · Oct 4, 2023


I continue to be so reliant on ChatGPT that I wouldn’t be able to work without it. I’m not sure if I’m using it efficiently, but given the ever-increasing list of “things I have to do” and “things I want to do”, I feel that ChatGPT plays a very significant role. In large organizations, there might be many people to turn to for tasks, but in a startup, that’s not the case, and ChatGPT is usually my first choice to ask something.

So today, I analyzed (or rather, had ChatGPT analyze for me) app reviews. I look at them daily, so I have a general sense of the trends, but this time I analyzed them to share the overall situation with others. Since it was a quick process, I'd like to write about how I did it.

Writing a Script to Generate a Script for Analysis

It might be redundant to mention, but ChatGPT is highly capable at analyzing and summarizing unstructured text. However, as you may know, there's a limit to the length of text it can analyze (or rather, input) at once.

Thankfully, there's a considerable number of app reviews, which unfortunately means it seems tough to input all of them into the prompt at once. I needed to think of a solution. Inputting them one by one into ChatGPT's WebUI for tallying would be tedious, and I foresee many cases in the future where I'll want to analyze a lot of text, so I decided to write a script for analyzing long texts with ChatGPT.

The specification isn't complicated: first, split the text into lengths that fit into the prompt, summarize each piece, and finally aggregate the intermediate summaries to obtain the final result. To implement it properly, you'd typically have to split and summarize recursively (in case the intermediate summaries are still too long), but this time a single layer seemed sufficient, so I skipped the recursive processing.
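The splitting step can be sketched as a small standalone function. This is a simplified illustration of the same greedy line-packing idea, not the generated script itself; the 4,000-character limit matches the one used later, and it assumes no single line exceeds that limit:

```python
def split_into_chunks(text, limit=4000):
    """Greedily pack lines into chunks shorter than `limit` characters.

    Assumes no single line is longer than `limit` on its own.
    """
    chunks = []
    current = ""
    for line in text.split('\n'):
        # Flush the current chunk when adding this line would exceed the limit
        if current and len(current) + len(line) + 1 >= limit:
            chunks.append(current)
            current = ""
        current += line + '\n'
    if current:
        chunks.append(current)
    return chunks
```

Each chunk then becomes one prompt for the intermediate summarization pass.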

However, even with that simple specification, writing the script is tiresome… or rather, I couldn’t write it from scratch. Therefore, I wrote a script to output a script using the OpenAI API. (I copied the part that calls the OpenAI API from a previous script)

import openai

openai.api_key = "hogehoge"
response = openai.ChatCompletion.create(
    model = "gpt-4",
    temperature = 0.0,
    messages = [{"role":"system","content":"""Write a python script to process long text(text) in following procedures.
1. Split the text by return code ('\n').
2. Append the splitted text with a return code ('\n') until the length of the text is less than 4000.
3. Call ChatGPT API with the appended text.
response = openai.ChatCompletion.create(
    model = \"gpt-4\",
    temperature = 0.0,
    messages = [{\"role\":\"system\",\"content\":f\"\"\"{{task_description}}
4. Get the response from ChatGPT API. You can get the response by calling `response.choices[0].message.content`, and print it.
5. Append the response to an array (intermediate results).
6. Repeat 2-5 until all splitted text is processed.
7. Call ChatGPT API with the array (intermediate results).
response = openai.ChatCompletion.create(
    model = \"gpt-4-32k\",
    temperature = 0.0,
    messages = [{\"role\":\"system\",\"content\":f\"\"\"Summarize intermediate results, counting similar items, understanding the following task description. Output only top 5 items with title and description in order of the number of counts descending in English.

[Task Description]

[Intermediate Results]
{{intermediate_results (as array, not string)}}

Write a python script contains __main__ to run as an independent script. The function that satisfies the above requirements has parameters of `task_description` and `text` and returns final output from ChatGPT."""}]
)
print(response.choices[0].message.content)

By the way, I wrote this script with the help of GitHub Copilot. When writing with GitHub Copilot, it provides prompt completions just like it does for code completions.

There are instances where the output is different from what I had in mind (about half of the above was different). However, it’s much faster than writing from scratch on my own, and it’s far easier to correct what’s different. I feel truly grateful to live in an era where I can get completions while writing a script that writes scripts. I honestly don’t feel like writing code from scratch anymore.

Then I ran the code, written largely by pressing the tab key (accepting completions). Below is the code it output when I executed it.

import openai

openai.api_key = "hogehoge"

def process_text(task_description, text):
    # Split the text by return code
    splitted_text = text.split('\n')
    # Initialize an array to store intermediate results
    intermediate_results = []
    # Initialize a string to store the current chunk of text
    current_chunk = ""
    # Process each line of text
    for line in splitted_text:
        # If adding the next line doesn't exceed the limit
        if len(current_chunk) + len(line) < 4000:
            # Add the line to the current chunk
            current_chunk += line + '\n'
        else:
            # Call the ChatGPT API with the current chunk
            response = openai.ChatCompletion.create(
                model = "gpt-4",
                temperature = 0.0,
                messages = [{"role":"system","content":f"""{task_description}

{current_chunk}"""}]
            )
            # Get the response from the API and print it
            api_response = response.choices[0].message.content
            print(api_response)
            # Append the response to the intermediate results
            intermediate_results.append(api_response)
            # Start a new chunk with the current line
            current_chunk = line + '\n'
    # Process the last chunk if any text remains
    if current_chunk:
        response = openai.ChatCompletion.create(
            model = "gpt-4",
            temperature = 0.0,
            messages = [{"role":"system","content":f"""{task_description}

{current_chunk}"""}]
        )
        api_response = response.choices[0].message.content
        print(api_response)
        intermediate_results.append(api_response)
    # Call the ChatGPT API with the intermediate results
    response = openai.ChatCompletion.create(
        model = "gpt-4-32k",
        temperature = 0.0,
        messages = [{"role":"system","content":f"""Summarize intermediate results, counting similar items, understanding the following task description. Output only top 5 items with title and description in order of the number of counts descending in English.

[Task Description]
{task_description}

[Intermediate Results]
{intermediate_results}"""}]
    )
    # Get the final output from the API and return it
    final_output = response.choices[0].message.content
    return final_output

if __name__ == "__main__":
    task_description = "task description"
    text = "text"
    final_output = process_text(task_description, text)
    print(final_output)

I’ve only added to the ‘api_key’ section at the beginning. I haven’t made any changes elsewhere.

Although I don’t have a solid basis for it, at a glance, it seems like it might work, so I’ll describe the variable parts. Let’s start with ‘task_description’. For this time, I’ve set it up as follows:

From the review data of the app that creates AI characters, please analyze the content of the reviews to determine what kind of characters users want to create. Please group similar needs and count the number of related reviews.

Admittedly, this is a very vague definition. But from my past experience, it feels like this level of instruction should suffice, so I’ll give it a try for now.

For the ‘text’ variable, I copied and pasted the app reviews I had gathered from Google Play Console and App Store Connect, formatted as one review per line. For now, copy-pasting is an essential task that only humans can perform.
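If the exports are CSV files, even that flattening step can be scripted. Here's a hypothetical sketch; the column name `review_text` is an assumption, not the actual export format of either console:

```python
import csv
import io

def reviews_to_text(csv_file, column="review_text"):
    """Join one CSV column's values into text with one review per line."""
    lines = []
    for row in csv.DictReader(csv_file):
        # Flatten embedded newlines so each review stays on a single line
        review = (row.get(column) or "").replace("\n", " ").strip()
        if review:
            lines.append(review)
    return "\n".join(lines)

# Example with an in-memory CSV; a real run would use open(path, newline="")
sample = 'review_text,rating\n"Great app!",5\n"Crashes\non launch",1\n'
text = reviews_to_text(io.StringIO(sample))
```

The result can be passed straight into `process_text` as the `text` argument.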

I then ran the code and got the results. (It worked right away, so there’s nothing else to mention, lol)

I copied the results directly into ChatGPT’s web interface, created a graph using ‘Advanced Data Analysis’, adjusted the length of comments for each category accordingly, and it was done!

- Enjoy Conversations	
Users relish dialogues with their AI, including chats with favorite or created characters.

- Growth Expectations
Users anticipate the development and progression of the AI character.

- Dissatisfaction with Behavior
Some users are unhappy with the AI's language and unpredictable actions.

- Payment System Views
Mixed feelings exist about the payment system, with some users expressing concern.

- Character Customization Requests
Users desire more options for personalizing the AI character's traits and settings.

By the way, the compression rate of the summaries (the ratio of token counts between input and output) was about one-fifth to one-sixth. In other words, with a single layer of splitting, I believe you can summarize information that's about 5 to 6 times the prompt's limit. With gpt-4-32k, I think you can analyze a considerable amount.
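The back-of-the-envelope estimate behind that claim looks like this (the context sizes are assumed round numbers, and real token counts vary by text):

```python
# Back-of-the-envelope capacity for one layer of split-and-summarize.
prompt_limit = 8192   # assumed gpt-4 context size in tokens
compression = 5       # observed summary size: roughly 1/5 of its input

# The final aggregation prompt must hold all intermediate summaries,
# so one layer can cover roughly `compression` prompts' worth of text.
max_input_tokens = prompt_limit * compression       # ~40k tokens

# With gpt-4-32k (32768 tokens) as the aggregator, the same estimate
# gives about 160k tokens of input text in a single layer.
max_input_tokens_32k = 32768 * compression
```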

While the cost of the API calls will be substantial, it's overwhelmingly cheaper than spending several hours on manual analysis. Moreover, from the second time onwards, once the data is prepared, you can get the results almost in real time.


I wrote about a methodology for analyzing lots of text data, such as app reviews, with a script that creates a script. It took about 20 minutes to get the final graph and summary. If I ran a similar task again, I think it would finish in a few minutes. Writing this blog post took much longer.

While what I’ve described is just one example, I feel that by continually focusing on such efficiencies and automations, I’ve been able to save at least half the time compared to the beginning of the year. I want to continue striving for efficiency to make things easier.

While I was writing this post, AWS's Bedrock added support for Claude. I tried it briefly, and the output seemed promising. It can handle up to 100k tokens, so there might be no need to split and process as I did this time.

This field evolves rapidly. Every day the quality improves, new capabilities emerge, and various constraints diminish. On the other hand, unless you try it for yourself, it’s hard to grasp what’s possible and what’s not, and what might be achievable next. I want to continue reducing my work hours to increase the time for such inputs.