This article will be discussing regular expression capabilities that have been incorporated into SQL. Previously, the LIKE operator could only be used for simple pattern matching of a string; however, new operators, such as REGEXP_LIKE, can be used to do powerful regular expression pattern matching.
Regular expressions can contain multiple sub-expressions. The REGEXP_LIKE operator supports various meta-characters, such as plus (+), question mark (?) and star (*) quantifiers.
- plus (+): Matches one or more occurrences of the preceding sub-expression.
- question mark (?): Matches zero or one occurrence of the preceding sub-expression.
- star (*): Matches zero or more occurrences of the preceding sub-expression.
So let’s create an environment to test out some examples using binary numbers. We’ll use a simple function to convert a decimal number to binary number. Then we’ll create a table to hold decimal and binary numbers from a loop that inserts one million records.
SQL> CREATE OR REPLACE FUNCTION TO_BINARY (N IN NUMBER) RETURN VARCHAR2 IS
2 BINVAL VARCHAR2(64);
3 N2 NUMBER := N;
4 BEGIN
5 WHILE ( N2 > 0 ) LOOP
6 BINVAL := MOD(N2, 2) || BINVAL;
7 N2 := TRUNC( N2 / 2 );
8 END LOOP;
9 RETURN BINVAL;
10 END TO_BINARY;
11 /
Function created.
SQL> CREATE TABLE ATABLE
2 (ANUMBER NUMBER,
3 AVARCHAR2 VARCHAR2(50)
4 );
Table created.
SQL> DECLARE
2 N NUMBER := 1000000;
3 BEGIN
4 FOR I IN 1..N LOOP
5 INSERT INTO ATABLE VALUES (I, TO_BINARY(I));
6 END LOOP;
7 END;
8 /
PL/SQL procedure successfully completed.
SQL> COMMIT;
Commit complete.
Now let’s use some regular expressions to try and find matches in our table. There can be large number of rows matching this criteria, so we will only retrieve 5 rows in these examples.
Let’s try to find records containing at least one occurrence of 111, at most one occurrence of 000 and at least one occurrence of 11:
SQL> SELECT ANUMBER, AVARCHAR2
2 FROM ATABLE
3 WHERE REGEXP_LIKE (AVARCHAR2, '(111)+(000)?(11)+')
4 AND ROWNUM <=5;
ANUMBER AVARCHAR2
---------- --------------------------------------------------
1148 10001111100
1149 10001111101
1150 10001111110
1151 10001111111
1183 10010011111
As you can see, all these rows have at least one occurrence of 111 and 11. Note that all of these rows do not start with 111 or end with 11 as we specified that those sequences just have to be somewhere in the string.
There are two meta-characters cap (^) and dollar ($) to specify start and end line anchors respectively. So let’s find out same pattern, containing at least one occurrence of 111 at start, an optional 000 and at least one occurrence of 11 at end.
SQL> SELECT ANUMBER, AVARCHAR2
2 FROM ATABLE
3 WHERE REGEXP_LIKE (AVARCHAR2, '^(111)+(000)?(11)+$')
4 AND ROWNUM <=5;
ANUMBER AVARCHAR2
---------- --------------------------------------------------
31 11111
127 1111111
227 11100011
255 11111111
511 111111111
With regular expressions, we can match patterns as shown, but we can also specify the number of matches as well. We can incorporate the following expressions into our matching:
- {m} – Matches exactly m occurrences of the preceding sub-expression.
- {m,} – Matches at least m occurrences of the preceding sub-expression.
- {m,n} – Matches at least m, but not more than n occurrences of the preceding sub-expression.
So let’s try to find three consecutive occurrences of 10:
SQL> SELECT ANUMBER, AVARCHAR2
2 FROM ATABLE
3 WHERE REGEXP_LIKE (AVARCHAR2, '(10){3}')
4 AND ROWNUM <=5;
ANUMBER AVARCHAR2
---------- --------------------------------------------------
1192 10010101000
1193 10010101001
1194 10010101010
1195 10010101011
1236 10011010100
Or we can find 6 to 8 consecutive occurrences of 10:
SQL> SELECT ANUMBER, AVARCHAR2
2 FROM ATABLE
3 WHERE REGEXP_LIKE (AVARCHAR2, '(10){6,8}')
4 AND ROWNUM <=5;
ANUMBER AVARCHAR2
---------- --------------------------------------------------
2730 101010101010
5460 1010101010100
5461 1010101010101
6826 1101010101010
10922 10101010101010
We can also use the pipe (|) meta-character to specify an “OR” in our pattern matching. For example, say we want to find occurrences of 1111111 or 0000000:
SQL> SELECT ANUMBER, AVARCHAR2
2 FROM ATABLE
3 WHERE REGEXP_LIKE (AVARCHAR2, '(1111111|0000000)+')
4 AND ROWNUM <=5;
ANUMBER AVARCHAR2
---------- --------------------------------------------------
1151 10001111111
1152 10010000000 -
1278 10011111110
1279 10011111111
1280 10100000000
Those are some easy examples to introduce the power of regular expressions within SQL. Please refer to the Oracle regular expression syntax for further details on regular expressions supported with SQL.
Link to script that contains the examples in this post.