int findrx(string pattern [from|rf] [flags] [result] [submatch])
string - string to search in.
pattern - regular expression that matches the substring to find. String.
from - zero-based character index at which to start the search. Default: 0.
rf - address of a variable of type FINDRX.
flags - combination of values listed below. Default: 0.
| 1 | Case insensitive. |
| 2 | Whole word. This adds \b to the beginning and end of pattern. |
| 4 | Find all. Valid only if result is an array. |
| 8 | Multiline. If this flag is set (or (?m) is used in pattern), ^ and $ match the beginning and end of each line. Default: ^ and $ match the beginning and end of the whole string. |
| 16 | Submatches are not needed. Set this flag if result is an array and performance is important. |
| 32 | QM 2.3.0. Convert pattern from UTF-8 to ANSI. Used when QM is running in Unicode mode (ignored otherwise). Set this flag if pattern contains non-ASCII characters but string is ANSI (not UTF-8). It is needed because such characters in pattern normally consist of 2 or 3 bytes, whereas each character in an ANSI string is 1 byte. |
| 128 | Only compile pattern. |
| pcre flags |
result - a variable of type str, int, ARRAY(str), or ARRAY(CHARRANGE).
submatch - the submatch to find. Integer. If 0 (default), finds the whole match. Not used if result is an array.
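QM script cannot be run outside QM, so the following is only a Python analogy of flags 1, 2, and 8, using the standard `re` module (which is not PCRE, so edge cases may differ). The constant names and the `build_regex` helper are hypothetical. The last line illustrates why flag 32 exists: a non-ASCII character takes several bytes in UTF-8 but one byte in an ANSI code page.

```python
import re

# Hypothetical constants mirroring the findrx flag values (names are illustrative).
CASE_INSENSITIVE = 1
WHOLE_WORD = 2
MULTILINE = 8


def build_regex(pattern: str, flags: int = 0) -> re.Pattern:
    """Translate findrx-style flag bits into a compiled Python regex (analogy only)."""
    re_flags = 0
    if flags & CASE_INSENSITIVE:
        re_flags |= re.IGNORECASE
    if flags & WHOLE_WORD:
        # Like findrx flag 2: add \b to the beginning and end of the pattern.
        pattern = r"\b" + pattern + r"\b"
    if flags & MULTILINE:
        # ^ and $ then match at line boundaries, not only at the string ends.
        re_flags |= re.MULTILINE
    return re.compile(pattern, re_flags)


subject = "abc10 100 def"
print(build_regex(r"\d+").search(subject).group())                  # 10
print(build_regex(r"\d+", WHOLE_WORD).search(subject).group())      # 100
print(build_regex("DEF", CASE_INSENSITIVE).search(subject).start())  # 10

text = "first\nsecond"
print(build_regex("^second", MULTILINE).search(text) is not None)  # True
print(build_regex("^second").search(text) is None)                  # True

# Flag 32 rationale: byte length of one non-ASCII character in each encoding.
print(len("é".encode("utf-8")), len("é".encode("cp1252")))  # 2 1
```

Because the whole-word flag simply wraps the pattern, an alternation such as `a|b` becomes `\ba|b\b`; grouping it as `(?:a|b)` keeps both boundaries around the whole alternation.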
Finds a substring in string. The substring is specified by a regular expression (pattern). The function can find the whole match, a submatch, or all matches and their submatches. A match is the part of string that matches pattern. A submatch is the part of a match that matches a captured subpattern. A captured subpattern is a part of pattern that is enclosed in parentheses and does not begin with ? (for example, (\d+) is captured, but (?:\d+) is not).
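To make the match/submatch distinction concrete, here is a small Python sketch using the standard `re` module as a stand-in for PCRE. Group 0 is the whole match, each captured subpattern yields one submatch, and the `(?:...)` group, which begins with `?`, is not counted.

```python
import re

subject = "http://msdn.microsoft.com:80/scripting/default.htm"
# (\w+) and ([^/:]+) are captured; (?::\d*) begins with ? and is not captured.
m = re.search(r"(\w+)://([^/:]+)(?::\d*)?", subject)

print(m.group(0))       # whole match: http://msdn.microsoft.com:80
print(m.group(1))       # first submatch: http
print(m.group(2))       # second submatch: msdn.microsoft.com
print(len(m.groups()))  # 2 captured subpatterns
```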
The return value depends on flags and arguments:
| default | zero-based index of first character of the match in string, or -1 if not found. |
| nonzero submatch | zero-based index of first character of the submatch in string, or -1 if not found. |
| flag 4 | the number of found matches, or 0 if not found. |
| flag 128 | undefined |
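The return-value rules can be mimicked in Python roughly as follows. This is a sketch under assumptions: `findrx_index` and `findrx_count` are hypothetical helpers, not QM functions, and `from` is assumed to behave like a plain search start offset.

```python
import re


def findrx_index(string: str, pattern: str, start: int = 0, submatch: int = 0) -> int:
    """Hypothetical analogue of the default return value: zero-based index of the
    match (or of submatch n when submatch is nonzero), or -1 if not found."""
    m = re.compile(pattern).search(string, start)
    if m is None:
        return -1
    # m.start(n) is also -1 when group n did not take part in the match.
    return m.start(submatch)


def findrx_count(string: str, pattern: str, start: int = 0) -> int:
    """Hypothetical analogue of flag 4: the number of matches, or 0 if none."""
    return sum(1 for _ in re.compile(pattern).finditer(string, start))


subject = "abc10 100 def"
print(findrx_index(subject, r"\d+"))           # 3
print(findrx_index(subject, r"\d+", 6))        # 6
print(findrx_index(subject, r"c(\d+)", 0, 1))  # 3 (start of submatch 1)
print(findrx_index(subject, "xyz"))            # -1
print(findrx_count(subject, r"\d+"))           # 2
```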
result can be used to get more information about the found match and submatches. The following table shows what is stored in result depending on its type, assuming that flag 4 is not used.
| str | receives the match or submatch (if submatch is nonzero). If flag 128 is set, receives the compiled pattern. |
| int | receives the length of the match or submatch. |
| ARRAY(str) | receives the match in element 0 and submatches in subsequent elements. |
| ARRAY(CHARRANGE) | receives the start and end offsets of the match and submatches. To extract a matched substring as a separate string, use the str member function get (see the examples). |
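As a rough Python analogy (standard `re` module, not QM), the four result types correspond to the following values. The variable names are illustrative only, and how QM fills elements for optional subpatterns that do not participate is not shown here.

```python
import re

subject = "http://msdn.microsoft.com:80/scripting/default.htm"
pattern = r"(\w+)://([^/:]+)(:\d*)?([^# ]*)"
m = re.search(pattern, subject)
n = len(m.groups()) + 1  # whole match plus one entry per captured subpattern

as_str = m.group(0)                              # like a str result
as_int = len(m.group(0))                         # like an int result (length)
as_str_array = [m.group(i) for i in range(n)]    # like ARRAY(str)
as_ranges = [m.span(i) for i in range(n)]        # like ARRAY(CHARRANGE): (start, end)

print(as_int)        # 50
print(as_str_array)
print(as_ranges)
```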
Definition of CHARRANGE (defined by QM):
type CHARRANGE cpMin cpMax
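cpMin - start of the substring (match or submatch) in string: the zero-based index of its first character.
cpMax - end of the substring: the index just past its last character, so cpMax-cpMin is the length of the substring (as computed in the examples).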
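The offset/length arithmetic used with CHARRANGE can be sketched in Python with a hypothetical `CharRange` dataclass standing in for QM's type. This assumes cpMax is exclusive (one past the last character), which is what the length calculation cpMax-cpMin implies; slicing plays the role of `str s.get(subject offset length)`.

```python
import re
from dataclasses import dataclass


@dataclass
class CharRange:
    """Hypothetical stand-in for QM's CHARRANGE (cpMax assumed exclusive)."""
    cpMin: int
    cpMax: int


def ranges_for(match: re.Match) -> list[CharRange]:
    """Whole match first, then one range per captured subpattern."""
    return [CharRange(*match.span(i)) for i in range(len(match.groups()) + 1)]


subject = "http://msdn.microsoft.com:80/scripting/default.htm"
m = re.search(r"(\w+)://([^/:]+)(:\d*)?([^# ]*)", subject)
for r in ranges_for(m):
    offset, length = r.cpMin, r.cpMax - r.cpMin
    s = subject[offset:offset + length]  # analogue of s.get(subject offset length)
    print(f"offset={offset} length={length} {s}")
```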
If flag 4 is set and result is an array, the function finds all matches and creates a two-dimensional array. To access an element, use result[x y], where y is the index of the found match (0 - first match, 1 - second match, ...) and x is 0 or a submatch index (0 - whole match, 1 - first submatch, ...). For example, result[0 0] contains the first match, result[0 1] the second match, and result[1 0] the first submatch of the first match.
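The [x y] layout can be pictured with this Python sketch, where the hypothetical `find_all_2d` helper returns a nested list indexed as `a[x][y]`: the outer index selects the whole match (0) or a submatch number, and the inner index selects which match.

```python
import re


def find_all_2d(string: str, pattern: str) -> list[list[str]]:
    """Hypothetical analogue of flag 4: table[x][y] where x = 0 (whole match)
    or submatch number, and y = index of the found match."""
    matches = list(re.finditer(pattern, string))
    ngroups = re.compile(pattern).groups
    return [[m.group(x) for m in matches] for x in range(ngroups + 1)]


a = find_all_2d("a=1, b=22, c=333", r"(\w)=(\d+)")
print(a[0][0])    # first match: a=1
print(a[0][1])    # second match: b=22
print(a[1][0])    # first submatch of first match: a
print(a[2][2])    # second submatch of third match: 333
print(len(a[0]))  # number of matches: 3
```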
If flag 128 (only compile) is set and result is a str variable, the function does not search. It only compiles pattern and stores the compiled data in result. You can later pass that variable as pattern to findrx and str.replacerx. When several operations use the same pattern, a compiled pattern improves performance, because the pattern does not have to be compiled each time. Compilation uses only pattern, flags, and result. Use the same flags value when compiling and when later using the compiled pattern.
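The compile-once idea has a direct counterpart in Python's `re.compile`. The sketch below compiles a pattern once and reuses it for several searches and for a replacement (loosely analogous to passing a compiled pattern to str.replacerx). The `first_match` helper is hypothetical. Note that Python's `re` also caches recently used patterns internally, and a compiled Python pattern carries its flags with it.

```python
import re

# Compile once (compare flag 128), then reuse the compiled object.
digits = re.compile(r"\d+")


def first_match(line: str):
    """Return the first run of digits in line, or None if there is none."""
    m = digits.search(line)
    return m.group() if m else None


lines = ["abc10 100 def", "no digits here", "x7"]
firsts = [first_match(line) for line in lines]
print(firsts)  # ['10', None, '7']

# Reuse the same compiled pattern for replacement.
print(digits.sub("#", "abc10 100 def"))  # abc# # def
```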
Find digits (10):
str subject="abc10 100 def"
out findrx(subject "\d+")

Find digits as a whole word (100), and store the match in s:
str subject="abc10 100 def"
str s
if(findrx(subject "\d+" 0 2 s)>=0) out s

Extract HTML tags (simplified; useful only as a "find all" example):
str html
IntGetFile("http://www.google.com" html)
str pattern="<(.*)>.*<\/\1>" ;;matches an HTML tag
ARRAY(str) a
findrx(html pattern 0 4 a)
int i
for(i 0 a.len)
	out "submatch=%s, whole=%s" a[1 i] a[0 i]

Extract URL components:
str subject="http://msdn.microsoft.com:80/scripting/default.htm"
str pattern="(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)"
int i; ARRAY(str) a
if(findrx(subject pattern 0 0 a)<0) out "does not match"; ret
for i 0 a.len
	out a[i]

Extract URL components; show offsets and lengths:
str subject="http://msdn.microsoft.com:80/scripting/default.htm"
str pattern="(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)"
int i; ARRAY(CHARRANGE) a
if(findrx(subject pattern 0 0 a)<0) out "does not match"; ret
for i 0 a.len
	int offset(a[i].cpMin) length(a[i].cpMax-a[i].cpMin)
	str s.get(subject offset length)
	out "offset=%i length=%i %s" offset length s

Extract only the server from a URL:
str subject="http://msdn.microsoft.com:80/scripting/default.htm"
str pattern="(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)"
str server
if(findrx(subject pattern 0 0 server 2)>=0) out server