Tokenize (split) string

Syntax

int tok(string arr [n] [delim] [flags] [arr2])

 

Parameters

string - string to tokenize. Usually a str variable.

arr - array that receives the tokens. Can be a variable of type ARRAY(str) or ARRAY(lpstr), or a pointer-based array of str or lpstr. Can be 0.

n - maximum number of tokens to get. Default: -1. If n is negative, gets all tokens.

delim - delimiters.

flags:

1 In string, replace the first delimiter character after each token with 0 (null character). This is useful when arr is an array of lpstr. When using this flag, string must not be a constant.
2 If there are more than n tokens, the last token (index n-1) receives the whole remaining right part of string. For example, if string is "a b c" and n is 2, you will get "a" and "b c". Without this flag you would get "a" and "b".
4 Don't split parts enclosed in " ".
8 Don't split parts enclosed in ( ).
16 Don't split parts enclosed in [ ].
32 Don't split parts enclosed in { }.
64 Don't split parts enclosed in < >.
128 Don't split parts enclosed in ' '.
0x100 delim is table of delimiters.
0x200 QM 2.3.1. Recursive parsing of parts enclosed in ()[]{}<>. For example, when parsing string "<a (b > c) d>" with flags 8|64, you would get 3 tokens: "a (b ", "c" and "d". With flags 8|64|0x200 you get 1 token: "a (b > c) d".
0x400 QM 2.3.1. Disable the following default behavior when parsing parts enclosed in ()[]{}<>: 1. Characters )]}> inside parts enclosed in " " are ignored. 2. A single character )]}> enclosed in ' ' is ignored.

arr2 - array for parts between (after) tokens. Can be 0. Default: 0.
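
As an illustration of flag 2, the following sketch repeats the "a b c" case from the flag description. It assumes a single space as the only delimiter; the variable names are arbitrary.

str s = "a b c"
ARRAY(str) arr
int i nt
 n is 2 and flags is 2, so the second token receives the rest of the string
nt = tok(s arr 2 " " 2)
for(i 0 nt) out arr[i]
 Expected output, according to the description of flag 2:
 a
 b c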

 

Remarks

Parses string and returns the number of tokens. If arr is used (not 0), populates it with the tokens. If arr2 is used, populates it with the delimiters that follow each token. If arr and arr2 are both 0, just returns the number of tokens.
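
To only count tokens, arr can be 0. A minimal sketch, assuming the default delimiters and the same string as in Example1:

str s = "one two three"
int nt
 arr is 0, so no array is populated; only the number of tokens is returned
nt = tok(s 0)
out nt
 Expected output, given that Example1 gets 3 tokens from this string:
 3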

 

If arr is an array of str, it receives copies of the tokens. If it is an array of lpstr, it receives pointers to the tokens within string, which is faster. In that case you should also use flag 1, so that each token is null-terminated.
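
A sketch of the lpstr variant, assuming a space as the only delimiter. Because flag 1 writes into s, s must be a variable, and the pointers in arr are only meaningful while s is left unchanged.

str s = "one two three"
ARRAY(lpstr) arr
int i nt
 flag 1 replaces the delimiter after each token with 0, so each pointer refers to a null-terminated token inside s
nt = tok(s arr -1 " " 1)
for(i 0 nt) out arr[i]
 Expected output:
 one
 two
 three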

 

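When using flags 4-128 together with delim, delim must also contain the enclosing characters that these flags refer to. For example, with flag 8 delim must contain ( and ), as in Example2.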
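
The same rule applied to flag 32, as a sketch that mirrors Example2. The string and the delimiter set are made up for illustration; note that delim lists { and } in addition to the space.

str s = "a {b c} d"
ARRAY(str) arr
int i nt
 flag 32: the part enclosed in { } is not split at the space inside it
nt = tok(s arr -1 " {}" 32)
for(i 0 nt) out "'%s'" arr[i]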

 

Tips:

Although this function can be used to get the lines of a multiline string, there is a simpler way to do it. See Example3. You can also use foreach, findl or str.getl.
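
For comparison, a sketch of the foreach approach mentioned above. It assumes the usual form of foreach, where the first variable receives each line of the second and the loop body is indented with a tab.

str s = "one[]two[]three"
str line
 each iteration stores the next line of s in line
foreach line s
	out line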

Strings can also be parsed with regular expressions (findrx with flag 4, str.replacerx) or with other string functions, such as find, findc and findw.

 

Example1

str s = "one two three"
ARRAY(str) arr
int i nt
nt = tok(s arr)
for(i 0 nt) out arr[i]
 Output:
 one
 two
 three

 

Example2

str s = "one, (two + three) four five"
ARRAY(str) arr arr2
int i nt
nt = tok(s arr 3 ", ()" 8 arr2)
for(i 0 nt) out "'%s' '%s'" arr[i] arr2[i]
 Output:
 'one' ', ('
 'two + three' ') '
 'four' ' '

 

Example3

str s = "one[]two[]three"
ARRAY(str) arr = s
for(int'i 0 arr.len) out arr[i]
 Output:
 one
 two
 three

 

Example4

str s="abcdef"
int i
 Split s into characters as strings:
ARRAY(str) a.create(s.len)
for(i 0 a.len) a[i].get(s i 1)
 Split s into characters as character codes:
ARRAY(int) b.create(s.len)
for(i 0 b.len) b[i]=s[i]