regex

regex is a big subject. You can buy entire books about regex, and if you are relatively young you will probably live long enough to finish reading one. In a language as precisely specified as C++, any regex implementation is necessarily complex, if only because of the sheer length of its specification. There is a good discussion of the essentials of the TR1 implementation on John Cook's site, and we will cover some of the same ground.

Fortunately, the functionality is available via a single include:

#include <regex>

For many (most?) uses, there are three things to get done before you can call the core functions, and two or three functions in that core do most of the heavy lifting.

  1. Put the text to be searched in an iterable container (usually a std::string).
  2. Put the regex into a std::regex object.
  3. Figure out which options you want to use.

The first two steps are easy. Let's find the vowels in my name:

std::string s("George Kelly Flanagin"); // an iterable container
std::regex  r("[aeiouy]");              // any lower-case vowel, counting y

Naturally, there are a number of switches that control how a regex is interpreted, and they more or less tell the tale of what can be done. These switches are constants of type syntax_option_type in the std::regex_constants namespace (they are also available as static members of std::regex, such as std::regex::icase), and they are passed to the std::regex constructor. You and your team should probably agree on a default set to avoid a maintenance nightmare.

icase      : ignore case.
nosubs     : forget about capturing subexpressions that match.
optimize   : favor speed of matching over speed of constructing the regex object.
collate    : pay attention to locale when using ranges like [a-f]
ECMAScript : JavaScript-style syntax for the expressions themselves. (This is
             the default, btw.)
basic      : basic POSIX regex syntax.
extended   : extended POSIX regex syntax.
awk        : POSIX awk utility syntax.
grep       : POSIX grep utility syntax.
egrep      : POSIX grep utility syntax you get from the "-E" option.

We (optionally) pass some of the above constants when constructing the regex, and they then govern every match and search performed with it. Keep in mind that the values of these constants are implementation defined, so always use the symbolic names rather than the underlying values.

And now we come to the functions. The two that do the finding are regex_search and regex_match: regex_match succeeds only if the entire target sequence matches the expression, while regex_search succeeds if any part of it does. If you guessed that they come in many overloaded flavors, you are correct; the overload declarations fill several pages of the standard. There is also regex_replace, which lets you duplicate the search-and-replace functionality of editors like sed and vi within your program.


Last updated 2014-07-19T15:44:11+00:00.

Links to the standard

regex is covered in clause 28 of the C++11 standard, sandwiched between the input/output library (clause 27) and atomic operations (clause 29).

Benefits

C++ finally joins most other languages in having regex support built into its standard library.

Risks

Programmers have been writing regex code for years. The current standard implementation has been bounced around since TR1, and a similar implementation exists in Boost. What does this mean? It is going to be hard to get everyone to agree on one.