In this edition of Editor Vs AI, I try to use ChatGPT-4 to write a wildcard search, and it does more harm than help.
The Challenge
In a document full of statistics, imagine that we want to change the “to” between values that are in brackets to an en dash—a common task. That’s the middle length dash (–) between a hyphen (-) and an em dash (—); in UK terms, these are “rules” as in a “ruled line,” rather than a dash. No other content should change because of this, and because there are many dozens of instances across many dozen pages, automating this change makes sense.
That is, we want instances like (95% CI 1.63 to 2.30) changed to (95% CI 1.63–2.30) in one “replace all” move, as career copy editor Pamela Capraru put it to me recently. The change should apply only to the bracketed statistics, not throughout the document.
ChatGPT-4’s Solution
The query
How you phrase queries to the chatbot really affects the outcome. That is, just like “Deep Thought” in The Hitchhiker’s Guide to the Galaxy, the answer won’t make sense if you ask the wrong question. For this one, I asked pretty much exactly what my colleague asked me:
Write a wildcard find and replace to change “(95% CI 1.63 to 2.30)”—with wildcards for the stats but not the percentage—to “(95% CI 1.63-2.30)”.
ChatGPT’s Solution
ChatGPT suggested using this Find value:
(95% CI [0-9].[0-9][0-9] to [0-9].[0-9][0-9])
And this REPLACE value:
(95% CI \1-\2)
The result is an error message from Word (shown below).
Why That Fails
While the [0-9] parts of this will correctly find any single digit from zero to nine, there are a few reasons this whole find string won’t work:
- In wildcards, brackets ( ) create an “expression” that the computer then treats as one entity. Those entities are interpreted as \1 and \2 (etc.), which we see in the replace field. That means ChatGPT has created a single expression out of this whole thing because it’s contained in one set of brackets.
- When Word tries to replace text with a first and second expression (the \1 and \2 parts), it finds there isn’t even a second expression for it to use as \2 in the replace step. That’s what Word means by “the replacement text contains a group number that’s out of range.”
Further, this only finds numbers with two decimal places. To find numbers with more or fewer decimal places, we can add the @ operator after each ], as explained below. And, of course, if we also need to find any percentage, not just 95%, we’d have to replace the 95% with [0-9][0-9]%, too.
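For readers who think in code, here is a rough Python analogue of the failure. It is only a sketch: Python's regular expressions are not Word's wildcards (in Python an unescaped period matches any character, while in Word it is a literal period), and the names FIND, REPLACE, and try_replace are made up for this illustration. The point it shows is the same, though: one pair of brackets makes one group, so a replacement that refers to \2 has nothing to refer to.

```python
import re

# Python analogue of ChatGPT's Find string: one pair of parentheses = one group.
# Assumption: this mimics the Word behaviour; it is not Word's own engine.
FIND = r"(95% CI [0-9].[0-9][0-9] to [0-9].[0-9][0-9])"
REPLACE = r"(95% CI \1-\2)"  # refers to a second group that does not exist


def try_replace(text: str) -> str:
    """Return the replaced text, or an error message if the replacement is invalid."""
    try:
        return re.sub(FIND, REPLACE, text)
    except re.error as exc:
        return f"error: {exc}"


group_count = re.compile(FIND).groups  # how many groups the pattern defines
result = try_replace("(95% CI 1.63 to 2.30)")

print(group_count)
print(result)
```

Python complains about an invalid group reference here, which is its version of Word's “group number that’s out of range” message.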
A Human Solution
A human thinks out the logic this way:
- To write the query with wildcards, we need to replace all the numbers with [0-9].
- Then we add @ to have Word search for values with any number of decimal digits, not just one.
- Finally, we separate this text string (phrase) into separate expressions so there is (lead up) + to + (end): two expressions searching for variables separated by the word to.
To simplify further, we notice that the stats we want always end in a closing bracket. That means we can ignore everything before the word to and end the query with that bracket, because it is such a strong and consistent identifier. Because brackets act as wildcard operators, we need to add a backslash before the closing bracket that is part of the text, to indicate that it is a literal character and not an operator.
FIND: _to_([0-9].[0-9]@\))
REPLACE: –\1
Notes: This example uses an ‘underscore’ (_) character ONLY so that you can see here where the spaces are. In Word, you’ll type an actual space using the spacebar.
The en dash is typed directly into the replace field using a keyboard shortcut.
Though it is possible to think of this find string as having two parts, only the ( ) creates an “expression” as far as Word is concerned; the rest is just text. So the replace string refers to \1, the only expression it was given.
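The same logic can be sketched in Python for anyone who wants to test it outside Word. This is an approximation under stated assumptions: Word's @ is written as + here, the period has to be escaped because Python treats a bare period as “any character,” and the names FIND, REPLACE, and to_en_dash are hypothetical ones chosen for the example.

```python
import re

EN_DASH = "\u2013"

# " to " followed by one group: a digit, a literal period,
# one or more digits, and a literal closing bracket.
FIND = r" to ([0-9]\.[0-9]+\))"
REPLACE = EN_DASH + r"\1"  # en dash, then whatever the single group captured


def to_en_dash(text: str) -> str:
    """Swap ' to ' for an en dash only where a bracketed statistic ends."""
    return re.sub(FIND, REPLACE, text)


print(to_en_dash("(95% CI 1.63 to 2.30)"))
print(to_en_dash("Scores rose from 1.63 to 2.30 overall."))
```

The second sentence comes back unchanged because its range is not followed by a closing bracket, which is exactly the restriction the Find string relies on.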
The solution may be even simpler if there aren’t number ranges elsewhere in the document for which you want to keep the word to in place. In those cases, you don’t even have to specify the closing bracket to limit the search to bracketed contents.
Here is a solution Rhonda Bracey developed. Adding the closing bracket mostly avoids the problem she mentions.
ChatGPT’s “no cigar” second solution
Of course, I gave ChatGPT a second chance to get this right. It gets no cigar for this answer; it wasn’t even close. I asked it to “adjust that to catch numbers of any length.” (See image below.) But the answer it gave uses asterisks (*), a wildcard that matches any string of characters, which is not what we want. It’s important to specify only digits, which [0-9] followed by the at symbol (@) does, because @ means “one or more of the previous character or expression.” It also places a backslash (\) before each period. It does, suddenly, correctly escape the brackets that are part of the search text rather than wildcard operators (explained above).
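To see why an “anything goes” wildcard is risky, here is a small Python sketch comparing a loose pattern with a digits-only one. The loose pattern is my own illustration of the idea, not the string ChatGPT produced, and Python's .* and + only stand in for Word's * and @; the names LOOSE, STRICT, and replace_with are assumptions made for the example.

```python
import re

EN_DASH = "\u2013"

LOOSE = r" to (.*\))"             # any characters at all before the closing bracket
STRICT = r" to ([0-9]\.[0-9]+\))"  # digits only, with a literal period


def replace_with(pattern: str, text: str) -> str:
    """Apply the en dash replacement using the given Find pattern."""
    return re.sub(pattern, EN_DASH + r"\1", text)


sample = "(mild to severe)"
print(replace_with(LOOSE, sample))   # the word range gets changed too
print(replace_with(STRICT, sample))  # left alone: no digits, no match
```

The loose version rewrites a bracketed phrase that has nothing to do with statistics, while the digits-only version leaves it untouched.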
Troubleshooting
If you test out the winning F&R above, be sure to turn off Track Changes before using wildcards or else Word will royally mess up its work.
Don’t be lulled into using the “any digit” code ^# to search for ^# to ^#. The reason we need wildcards in this F&R is so that the “replace” parameters can be told to keep the numbers, whatever they are, and change only the word to (and the spacing around it).
Be sure to check out the rest of this Editor Vs AI series. If AI is coming for our editing jobs, it won’t be winning out today. ChatGPT-4 was released 14 March 2023. Career copy editor Pamela Capraru lives and works in Toronto.