How to Utilize ChatGPT in Product Development

Nakamura Hiroki
10 min read · Apr 11, 2023


Recently, generative AI, including ChatGPT, has rapidly advanced and spread with incredible speed. Since the release of GPT-4, I find myself using it every day, and it has greatly changed the way I work. There are numerous ways to use ChatGPT, and it is difficult for me to cover them all. Additionally, I believe that new applications will continue to emerge in the future.

In this article, I would like to focus on the perspective of a product manager and consider how to utilize ChatGPT in service development, particularly for services that use generative AI. To untangle things a little: when thinking about a service that uses ChatGPT, there are really two distinct questions, namely how to use ChatGPT in the development process, and how to integrate ChatGPT into the service itself. With that said, let’s get started.

Utilizing ChatGPT in the Design and Development Process

Create Prototypes and Evaluate Quality

One approach is to use ChatGPT to create prototypes and evaluate their quality in order to find the most suitable prompt for your product. This is essential for PMs developing services that use generative AI.

Prompt engineering is a crucial process when considering a product that utilizes generative AI, as the quality of the output can vary significantly depending on the prompt design. If you only need to test a single prompt, you may be able to do so using ChatGPT UI. However, when considering using the API within a product, it is necessary to evaluate how often the desired output is generated for various inputs.

Large models like ChatGPT understand prompts well and can accurately process complex ones, so constructing prompts dynamically becomes commonplace. For example, a chatbot that interacts with external data might dynamically insert content retrieved from an external database into the prompt to generate results.
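As a sketch of what such dynamic prompt construction might look like (the template wording, the question, and the retrieved passages are all hypothetical; in a real system the passages would come from a database or search index):

```python
def build_prompt(question: str, retrieved_docs: list[str]) -> str:
    """Insert externally retrieved passages into a fixed prompt template."""
    context = "\n".join(f"- {doc}" for doc in retrieved_docs)
    return (
        "Answer the user's question using only the context below.\n"
        "If the context is not sufficient, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Example: passages retrieved for a product question are spliced in
prompt = build_prompt(
    "What is the warranty period for the X-200 laptop?",
    ["X-200 laptop: 24-month manufacturer warranty.",
     "Returns accepted within 30 days of purchase."],
)
print(prompt)
```

The final prompt sent to the model then changes on every request, which is exactly why testing it by hand in a UI becomes impractical.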

In such complex prompt designs, testing on the UI can be time-consuming, and comparing the quality of various prompt patterns is not practical. On the other hand, relying on developers for every iteration in a trial-and-error phase with no defined goal can also be inefficient.

To address this issue, you can write code with ChatGPT to create and evaluate prototypes. ChatGPT is proficient in Python, and I started creating prototypes in Python about three weeks ago. Even though I couldn’t write a single line of Python code before, I can now create conversational prototypes by myself. With GPT-4, even vague requests can produce surprisingly functional code.

By creating prototypes yourself, you can iterate through prompt designs at a high speed. Furthermore, if you want to change a specification, you can test with reproducibility and assess the quality of the results.
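A minimal conversational prototype harness might look like the following. The model call is stubbed out with a plain function so the loop runs locally; in a real prototype that callable would wrap the OpenAI chat API instead:

```python
from typing import Callable

Message = dict[str, str]

def chat_turn(history: list[Message], user_input: str,
              call_model: Callable[[list[Message]], str]) -> str:
    """Append the user message, call the model, record and return the reply."""
    history.append({"role": "user", "content": user_input})
    reply = call_model(history)
    history.append({"role": "assistant", "content": reply})
    return reply

# Stub model for local testing; swap in a real API call for actual runs.
def echo_model(history: list[Message]) -> str:
    return f"You said: {history[-1]['content']}"

history: list[Message] = [{"role": "system", "content": "You are a helpful assistant."}]
print(chat_turn(history, "Hello!", echo_model))  # You said: Hello!
```

Because the history and the model call are separated, you can swap prompts or models and re-run the same conversations reproducibly.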

Creating Evaluation Data

The second approach is to use ChatGPT to create high-quality evaluation data. This is also extremely powerful.

Continuing with the example of conversational AI, creating test data for evaluating conversation quality can be quite painful (at least for me). For instance, consider a chatbot that recommends products from an electronics store. I might think of a question like, “What gaming PC do you recommend for playing Apex at 240fps?” However, I might not come up with any questions related to hairdryers, as the category itself might not even cross my mind.

ChatGPT can help you create a large amount of good test data. By specifying the same format as the input for the prototype you created earlier, you can enable batch testing. Previously, it took about an hour to produce biased test data, but now you can create a reasonably diverse set of test data in about a minute, which is 1/60th of the time. You can create test data with decent quality even in domains where you have no expertise, and you can generate test data with different personas, question tones, and various patterns. Additionally, if you need more test data, simply ask for more, and ChatGPT will generate new data without much overlap from the previous set.
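One way to wire this up is to ask the model for test data in a fixed JSON format and then validate what comes back before feeding it to the batch test. The prompt wording and schema below are illustrative, and the "model response" being parsed is a hand-written stand-in:

```python
import json

def make_generation_prompt(domain: str, n: int, persona: str) -> str:
    """Build a prompt asking for n diverse test questions in JSON."""
    return (
        f"Generate {n} diverse customer questions about {domain}, "
        f"asked by {persona}. "
        'Respond only with a JSON array of objects: [{"question": "..."}]'
    )

def parse_test_data(raw: str) -> list[str]:
    """Validate the model's JSON output against the expected schema."""
    items = json.loads(raw)
    return [item["question"] for item in items]

print(make_generation_prompt("electronics store products", 20, "a casual shopper"))

# Parsing a (hypothetical) model response:
raw = '[{"question": "Which hairdryer is quietest?"}, {"question": "Can the X-200 run Apex at 240fps?"}]'
print(parse_test_data(raw))
```

Varying the `persona` argument is how you get the different question tones and patterns mentioned above.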

Personally, this made the most painful process much easier.

Evaluating Results

Since you can automatically create evaluation data, you might as well try automating the evaluation process too. However, the results of this approach are somewhat mixed, with clear limitations.

Evaluation, for example, in a chatbot, involves providing a set of questions and answers and having the system output a score for the accuracy of the response along with the rationale.
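A sketch of that evaluation step, assuming the model is asked to return its score and rationale as JSON (the rubric wording and 1-5 scale are my own choices, and the parsed reply is a hand-written example):

```python
import json

def make_eval_prompt(question: str, answer: str) -> str:
    """Ask the model to score an answer's relevancy with a rationale."""
    return (
        "Rate how relevant and specific the answer is to the question, "
        "on a scale of 1 to 5. Respond only with JSON: "
        '{"score": <int>, "rationale": "<one sentence>"}\n\n'
        f"Question: {question}\nAnswer: {answer}"
    )

def parse_eval(raw: str) -> tuple[int, str]:
    data = json.loads(raw)
    return data["score"], data["rationale"]

print(make_eval_prompt("What gaming PC runs Apex at 240fps?",
                       "Our hairdryers are on sale this week."))
score, why = parse_eval('{"score": 1, "rationale": "The answer ignores the question entirely."}')
print(score, why)
```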

In practice, ChatGPT is very powerful at assessing the relevancy and specificity of responses: it can fairly accurately score ambiguous or obviously unrelated answers. However, as expected, the trustworthiness of fact-based accuracy evaluation with ChatGPT alone is quite low. It may confidently state false information that then receives a high evaluation score.

While not surprising, this means that ChatGPT can assess the appropriateness of responses far more efficiently than a human check, but determining the truthfulness of the content requires a different approach or manual verification.

Implementing as a Product Feature

While there are many direct uses for ChatGPT, such as using it as a chatbot conversation engine, generating and proofreading text, or translating, I’d like to consider some more low-level applications. In short, I believe it’s incredibly powerful as a programming language that can transparently process structured data and natural language.

Converting Natural Language and Unstructured Data into Structured Data

ChatGPT is an extremely powerful parsing engine. As a result, it can be used to convert and extract unstructured data, such as natural language, into structured data. This reduces the need to create individual models and enables efficient data processing. For instance, it can extract emotions from text or determine specific information from OCR-scanned text data.

I’ve experimented with various patterns and found the accuracy to be quite high. For example, it seems to accurately determine emotions from text. Additionally, it can output scores, allowing for not only binary on/off outputs but also “degrees of emotion” to be reflected in avatar expressions. Naturally, the output format can be adjusted simply by specifying the prompt, allowing for flexible customization of output information, such as generating appropriate motion patterns based on the conversation context.

As another example of parsing, ChatGPT can check OCR-scanned receipts for prices in Japanese yen with high accuracy without preprocessing. Of course, this could be done with dedicated models, which would offer higher quality and, more importantly, cost advantages for specific purposes. However, the biggest advantage of ChatGPT is the ability to write parsing rules in natural language.

By correctly specifying the output format in the prompt, ChatGPT can adhere strictly to that format, allowing you to output JSON and directly pass the data to subsequent logic. This makes it possible to convert natural language into API parameters, such as taking a natural-language request to find the best-selling tablet device and generating the API call parameters for it. Expanding on this, I believe it allows you to create even more flexible service integration platforms like IFTTT and Zapier.
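As a sketch of that natural-language-to-parameters step, with a hypothetical search schema (the field names and allowed values are my own, and the "model output" being parsed is hand-written):

```python
import json

def make_search_prompt(request: str) -> str:
    """Ask the model to turn a free-text request into search parameters."""
    schema = ('{"category": "<string>", "sort_by": "sales" or "price", '
              '"order": "asc" or "desc"}')
    return (
        "Convert the request into search API parameters. "
        f"Respond only with JSON matching: {schema}\n\n"
        f"Request: {request}"
    )

def to_search_params(raw: str) -> dict:
    """Parse and validate the model's JSON before passing it downstream."""
    params = json.loads(raw)
    if params.get("sort_by") not in ("sales", "price"):
        raise ValueError(f"unexpected sort_by: {params.get('sort_by')!r}")
    return params

print(make_search_prompt("What's the best-selling tablet?"))
# Parsing the (hypothetical) model output:
params = to_search_params('{"category": "tablet", "sort_by": "sales", "order": "desc"}')
print(params)
```

The validation step matters: even with a strict prompt, the parsed output should be checked before it reaches a real API call.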


This means that, in the aforementioned case of the receipt, an accounting staff could add check logic in natural language, adjusting the specifications as needed. Previously, this would have required the help of software or machine learning engineers, but now many people can quickly perform these tasks, bringing the cost of additions and changes closer to zero. Being able to program easily using everyday language is a significant point in enabling adaptable services and flexible operations.

An interesting example of parsing is the HuggingGPT research project or service.

HuggingGPT analyzes user input to determine which models from HuggingFace should be used in what order, then processes the input in sequence and returns the results. This service operates on GPT-3.5 due to the high-quality prompts, but with GPT-4, even more casual prompts could be processed correctly. I expect more services like this will emerge in the future.

Converting Structured Data into Natural Language or Actions

As I mentioned, the reverse is also possible. In the context of integrating external data, ChatGPT can generate appropriate responses based on the obtained data. While API input and output formats can be difficult for most people to understand, combining them with the structuring process mentioned earlier allows you to convert natural language inputs into structured data, process the structured data using conventional methods, and then convert the results into easily understandable sentences for output.

For example, if emotion parameters already exist, as in the previous section, you can input them to generate output that takes those emotions into account. You can also create personalized push messages based on user attributes. Previously, converting structured data into user-friendly expressions required a process of designing and implementing specifications, but with ChatGPT, you can create natural outputs simply by specifying the prompt.

Adding more elements, you can not only generate natural sentences but also produce natural actions. For instance, consider extending the push message example to have the conversation AI generate the best timing and content for talking to the user based on their schedule. By inputting the necessary information, ChatGPT can output the best timing for the context. Although “best” might sound vague, it genuinely produces naturally-appearing timings. The frequency and timing can be adjusted flexibly just by specifying the prompt, allowing you to set a fixed time or a context-dependent timing.
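A sketch of that direction: structured user data goes into the prompt, and the model is asked to return both a send time and a message as JSON. The user fields, schema, and the parsed reply below are all hypothetical:

```python
import json

def make_push_prompt(user: dict) -> str:
    """Turn structured user data into a prompt asking for message + timing."""
    return (
        "Given the user's profile and schedule, propose one push notification "
        "at a natural moment for this user.\n"
        'Respond only with JSON: {"send_at": "HH:MM", "message": "<string>"}\n\n'
        f"Profile and schedule: {json.dumps(user, ensure_ascii=False)}"
    )

user = {
    "name": "Aoi",
    "interests": ["gaming PCs"],
    "schedule": [{"event": "meeting", "until": "18:00"}],
}
print(make_push_prompt(user))

# Parsing the (hypothetical) model reply:
reply = json.loads('{"send_at": "18:30", "message": "New 240Hz monitors just arrived!"}')
print(reply["send_at"], reply["message"])
```

Changing the timing policy (fixed time versus context-dependent) is then just a wording change in the prompt, not a code change.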

The capability to transform structured data into information and actions that feel natural to humans means that existing structured services and data can be linked together to generate results through multiple processing steps, with human input added to the feedback loop. The hub of each process is the prompt. This concept of using prompts as the hub for components that include human input seems to be a key point when planning future services.

Other Things to Consider

I’ve raised important points to consider when implementing ChatGPT in various applications. Here’s a brief summary of the concerns mentioned

Data Security

OpenAI recently announced that data submitted via the API will not be used as training data, but you should carefully check the overall terms of use. Another option is Microsoft’s Azure OpenAI Service, so it is worth understanding the differences in their policies and considering which is best suited to your needs.


Cost

GPT-4 is expensive. GPT-3.5-turbo is much cheaper, but can still be costly for some applications. It is therefore necessary to consider whether a single query produces a reasonable effect for its cost. Of course, you can use it for value verification as an upfront investment, anticipating a replacement with a dedicated model in the future. However, it is not yet as inexpensive to call as a typical Web API.


Response Speed

GPT-4 is not fast. Depending on the length of the prompt, it often takes several seconds (although that is still very fast considering the model size). Using it as-is in real-time interactions may therefore result in a stressful UX. There are many possible workarounds, such as streaming the output like the ChatGPT UI does, making response generation and the actual response output asynchronous, or using it only in back-end processing. However, since you cannot yet expect the same responsiveness as a typical Web service, you need to design the experience with latency in mind.
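The streaming idea can be illustrated with a simulated generator; a real implementation would consume chunks from the API's streaming mode instead, but the UX pattern of rendering partial output as it arrives is the same:

```python
from typing import Iterator

def stream_tokens(full_response: str, chunk_size: int = 8) -> Iterator[str]:
    """Simulate streamed model output by yielding fixed-size chunks."""
    for i in range(0, len(full_response), chunk_size):
        yield full_response[i:i + chunk_size]

shown = ""
for chunk in stream_tokens("Sure! The X-200 supports 240Hz output over DisplayPort."):
    shown += chunk  # in a UI, render each chunk as it arrives
print(shown)
```

Even though the total latency is unchanged, the user starts reading after the first chunk instead of waiting several seconds for the whole response.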

Freshness of Data, Linkage to Specific Information

As is stated everywhere, recent information is not part of the model’s training data. Therefore, to work with new information, you need to implement support for it yourself. For providers of publicly available information, ChatGPT Plugins seem to be the way to go in the future. For closed use cases, on the other hand, the integration needs to be implemented individually. There are already many implementation examples, for example using rinna’s technology, as described in the following article.


There are many other things to consider when introducing generative AI into a product. I gave a presentation at ProductTank Tokyo, and I hope you will find the following summary of my presentation useful.

In Closing

In this article, I wrote about how to use ChatGPT in product development, particularly in product development using generative AI.

Of course, there are areas where it works well and areas where it doesn’t, but I believe there are many possible applications. Among the ones introduced this time, I find it very powerful for prototype development, evaluation, and conversion between structured and unstructured data.

While I find it convenient, I also think that it hasn’t yet surpassed those who specialize in certain domains. In other words, the areas I find convenient indicate that my capabilities are limited in those domains. That being said, it’s comforting to have an assistant with decent abilities in many domains, which can help strengthen our capabilities in various fields.


With the arrival of ChatGPT, my work style has changed, and I feel excited that ideas that were previously difficult to realize now seem more achievable. I also feel thrilled when I imagine people creating new services that didn’t exist before. It has only been a few weeks since GPT-4 was released, but I want to continue using and enjoying it to the fullest.

(I’ve omitted all specific prompts mentioned in this article because it would make it too long. If you’re interested, please feel free to contact me on LinkedIn!)