This article will be discussing regular expression capabilities that have been incorporated into SQL. Previously, the LIKE operator could only be used for simple pattern matching of a string; however, new operators, such as REGEXP_LIKE, can be used to do powerful regular expression pattern matching.

Regular expressions can contain multiple sub-expressions. The REGEXP_LIKE operator supports various meta-characters, such as plus (+), question mark (?) and star (*) quantifiers.

  • plus (+): Matches one or more occurrences of the preceding sub-expression.
  • question mark (?): Matches zero or one occurrence of the preceding sub-expression.
  • star (*): Matches zero or more occurrences of the preceding sub-expression.

So let’s create an environment to test out some examples using binary numbers. We’ll use a simple function to convert a decimal number to binary number. Then we’ll create a table to hold decimal and binary numbers from a loop that inserts one million records.

SQL> CREATE OR REPLACE FUNCTION TO_BINARY (N IN NUMBER) RETURN VARCHAR2 IS
  2    BINVAL VARCHAR2(64);
  3    N2 NUMBER := N;
  4  BEGIN
  5    WHILE ( N2 > 0 ) LOOP
  6       BINVAL := MOD(N2, 2) || BINVAL;
  7       N2 := TRUNC( N2 / 2 );
  8    END LOOP;
  9    RETURN BINVAL;
 10  END TO_BINARY;
 11  /

Function created.

SQL> CREATE TABLE ATABLE
  2     (ANUMBER NUMBER,
  3      AVARCHAR2 VARCHAR2(50)
  4     );

Table created.

SQL> DECLARE
  2    N NUMBER := 1000000;
  3  BEGIN
  4    FOR I IN 1..N LOOP
  5       INSERT INTO ATABLE VALUES (I, TO_BINARY(I));
  6    END LOOP;
  7  END;
  8  /

PL/SQL procedure successfully completed.

SQL> COMMIT;

Commit complete.

Now let’s use some regular expressions to try and find matches in our table. There can be large number of rows matching this criteria, so we will only retrieve 5 rows in these examples.

Let’s try to find records containing at least one occurrence of 111, at most one occurrence of 000 and at least one occurrence of 11:

SQL> SELECT ANUMBER, AVARCHAR2
  2    FROM ATABLE
  3   WHERE REGEXP_LIKE (AVARCHAR2, '(111)+(000)?(11)+')
  4     AND ROWNUM <=5;

   ANUMBER AVARCHAR2                                                            
---------- --------------------------------------------------                   
      1148 10001111100                                                          
      1149 10001111101                                                          
      1150 10001111110                                                          
      1151 10001111111                                                          
      1183 10010011111

As you can see, all these rows have at least one occurrence of 111 and 11. Note that all of these rows do not start with 111 or end with 11 as we specified that those sequences just have to be somewhere in the string.

There are two meta-characters cap (^) and dollar ($) to specify start and end line anchors respectively. So let’s find out same pattern, containing at least one occurrence of 111 at start, an optional 000 and at least one occurrence of 11 at end.

SQL> SELECT ANUMBER, AVARCHAR2
  2    FROM ATABLE
  3   WHERE REGEXP_LIKE (AVARCHAR2, '^(111)+(000)?(11)+$')
  4     AND ROWNUM <=5;

   ANUMBER AVARCHAR2                                                            
---------- --------------------------------------------------                   
        31 11111                                                                
       127 1111111                                                              
       227 11100011                                                             
       255 11111111                                                             
       511 111111111

With regular expressions, we can match patterns as shown, but we can also specify the number of matches as well. We can incorporate the following expressions into our matching:

  • {m} – Matches exactly m occurrences of the preceding sub-expression.
  • {m,} – Matches at least m occurrences of the preceding sub-expression.
  • {m,n} – Matches at least m, but not more than n occurrences of the preceding sub-expression.

So let’s try to find three consecutive occurrences of 10:

SQL> SELECT ANUMBER, AVARCHAR2
  2    FROM ATABLE
  3   WHERE REGEXP_LIKE (AVARCHAR2, '(10){3}')
  4     AND ROWNUM <=5;

   ANUMBER AVARCHAR2                                                            
---------- --------------------------------------------------                   
      1192 10010101000                                                          
      1193 10010101001                                                          
      1194 10010101010                                                          
      1195 10010101011                                                          
      1236 10011010100

Or we can find 6 to 8 consecutive occurrences of 10:

SQL> SELECT ANUMBER, AVARCHAR2
  2    FROM ATABLE
  3   WHERE REGEXP_LIKE (AVARCHAR2, '(10){6,8}')
  4     AND ROWNUM <=5;

   ANUMBER AVARCHAR2                                                            
---------- --------------------------------------------------                   
      2730 101010101010                                                         
      5460 1010101010100                                                        
      5461 1010101010101                                                        
      6826 1101010101010                                                        
      10922 10101010101010

We can also use the pipe (|) meta-character to specify an “OR” in our pattern matching. For example, say we want to find occurrences of 1111111 or 0000000:

SQL> SELECT ANUMBER, AVARCHAR2
  2    FROM ATABLE
  3   WHERE REGEXP_LIKE (AVARCHAR2, '(1111111|0000000)+')
  4     AND ROWNUM <=5;

   ANUMBER AVARCHAR2                                                            
---------- --------------------------------------------------                   
      1151 10001111111                                                          
      1152 10010000000                                -                          
      1278 10011111110                                                          
      1279 10011111111                                                          
      1280 10100000000

Those are some easy examples to introduce the power of regular expressions within SQL. Please refer to the Oracle regular expression syntax for further details on regular expressions supported with SQL.

Link to script that contains the examples in this post.

 

 

 

 

 

 

 

Comment