Prompt engineering and evaluating AI-generated copy

LLM engineering and evaluation is still a relatively new field. During my time at Thumbtack, I learned new skills in this next generation of content editing.

One of my projects in 2025 involved using LLMs to pre-fill messages that homeowners could send to a home care professional. These messages would go out whenever a customer submitted a job request for work on their house.

LLM-generated customer messages: mini case study

One of our pro product managers came to me with a problem. Customers in our baseline product filled out a job request for some kind of home care - be it landscaping, roof repair, plumbing, you name it. They would pick one or more pros to send their job request to, and ask for a quote and availability.

They could also optionally send a message to the pro, with any relevant information not covered in the details of the job request. This could include any special needs, a few initial suggestions for dates and times, or something else the customer felt their pro should know about the house.

Project goals

Over time, pros had started to rely on these customer messages to better understand the customer’s project. Pros often preferred to read the customer message before the project details, because the message was more conversational and created an open door through which to reply to the customer. Pros were more likely to send a quick, same-day reply if the customer filled out a message. Otherwise, they tended to shelve the project to look at later, when they had time to comb through the details.

However, these customer messages were optional, and only 32% of customers created one. The rest opted to skip this step and leave the message field blank. As a business, we felt that making the message mandatory would cause too much friction for the average customer.

Among the 32% of customers who did send a message, many included very little helpful information for the pro. Customers often used the space to write things like:

“Could you let me know your rate per hour? I’m shopping around. Thank you.”

…or,

“I was wondering what the process would look like, and if it’d be possible for this to be completed soon. Let me know.”

This was better than nothing, because it still opened that door of communication for the pro to walk through. But it meant more back-and-forth between the pro and the customer, often involving the pro asking for details that could’ve been supplied in the initial message. During that back-and-forth, the customer or pro might get busy or distracted with other priorities, leading to fewer of these leads converting to finished jobs on Thumbtack.

The PM asked me if there was some way to increase the number of customer messages and improve their quality, and thereby get pros replying faster and potentially converting more leads into jobs.

Content goals

I came up with the idea of using an LLM to prefill the message box with a sample message. This sample message would include some combination of regular conversation, a summary of the project details filled out by the customer, and a request for a reply from the pro. The customer could then choose to A) send the message as-is, B) edit the message to their heart’s content, and then send, or C) wipe it clean and write their own or leave it blank.
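As a rough illustration of the three ingredients described above, here is a minimal sketch of how such a prompt might be assembled. The `JobRequest` fields, the `build_prefill_prompt` helper, and the prompt wording are all hypothetical; this is not the prompt we actually shipped.

```python
from dataclasses import dataclass


@dataclass
class JobRequest:
    # Hypothetical fields: the real job request schema is not described here.
    service: str
    details: dict[str, str]


def build_prefill_prompt(job: JobRequest) -> str:
    """Assemble an illustrative prompt asking an LLM for a prefilled message."""
    # Turn each project detail into a bullet the model can summarize.
    detail_lines = "\n".join(f"- {key}: {value}" for key, value in job.details.items())
    return (
        "Write a short first message from a homeowner to a home care professional.\n"
        f"Service requested: {job.service}\n"
        "Project details:\n"
        f"{detail_lines}\n"
        "The message should:\n"
        "1. Open conversationally, the way a customer might actually type.\n"
        "2. Summarize the most useful project details.\n"
        "3. End by asking the pro to reply."
    )
```

The returned string would then be sent to whichever model is in use; that call is left out because it depends on the provider.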

It was important to me that the prefilled message:

Sound plausibly human, like something a customer might actually type;
Have some variety and differentiation, so that pros weren’t getting identical-sounding messages from different customers;
Contain the most useful information from the customer’s project details, for use by the pro; and
Do it all in the briefest space possible - preferably under 300 characters, to match the average length of messages currently being sent by customers.
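Some of these goals lend themselves to simple automated checks. The sketch below is one possible way to screen generated messages for length, coverage of project details, and similarity to other messages. The function names, the substring-matching approach, and the use of `difflib` for similarity are my own assumptions for illustration, not the evaluation setup we used. Whether a message sounds plausibly human still needs a person to judge.

```python
from difflib import SequenceMatcher

MAX_CHARS = 300  # the "preferably under 300 characters" goal


def within_length(message: str, max_chars: int = MAX_CHARS) -> bool:
    """True if the message is strictly shorter than max_chars."""
    return len(message) < max_chars


def detail_coverage(message: str, key_details: list[str]) -> float:
    """Share of key details that appear verbatim in the message (case-insensitive)."""
    if not key_details:
        return 1.0
    text = message.lower()
    hits = sum(1 for detail in key_details if detail.lower() in text)
    return hits / len(key_details)


def max_similarity(message: str, others: list[str]) -> float:
    """Highest character-level similarity (0 to 1) between message and any other message."""
    return max(
        (SequenceMatcher(None, message, other).ratio() for other in others),
        default=0.0,
    )
```

A high `max_similarity` score against recent messages would be a signal that pros may be seeing near-identical text from different customers; any threshold would have to be tuned against real examples.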

A fifth goal was in flux from the very beginning. I believed the messages should be flagged as having an AI influence. I felt we shouldn’t be trying to fool our pros into thinking these messages were always 100% authentically coming from the customer.

Besides feeling like the honest thing to do, I worried that AI-generated messages could never be perfect, and pros might start to ‘sniff out’ the AI influence. If we didn’t label it AI upfront, they might find the whole process disingenuous, and feel that the overall lead quality was low.

The product manager argued that this flag wasn’t necessary: because the customer could edit the messages at any time, every message was, in theory, at least approved by the customer, if not partly written by them.