Tokenize string with zero length components
#1
I need to tokenize a string that contains some zero-length components, as in the following example:

Macro temp
Trigger SF12
Code:
str s="164```````````````123```"
ARRAY(str) arr
int nt = tok(s arr 16 "`")
out nt

In the above example nt comes out as 2. Is there a flag to obtain the actual number of tokens, including the zero-length ones, which is 16 in the above example?

I should add that I get the correct result with a findrx-based routine, but that approach is slower.
#2
Function RegexSplitString
Code:
;/
function# $s $rx [ARRAY(str)&as] [ARRAY(POINT)&ap] [findrxFlags]

;Gets parts of string separated by substrings that match a regular expression.
;Returns the number of tokens (array length).

;s - string.
;rx - regular expression (separator).
;as - variable that receives string parts. Can be omitted or 0.
;ap - variable that receives positions of string parts in s. Can be omitted or 0.
;findrxFlags - <help>findrx</help> flags. The function uses findrx to find separators; adds flag 4 (find all).

;REMARKS
;The arrays will always have (number of found separators + 1) elements. If s begins or ends with a separator, the arrays will have an empty element at the beginning or end.

;EXAMPLE
;out
;str s="aa bb  cc[9]dd"
;ARRAY(str) as; ARRAY(POINT) ap
;out RegexSplitString(s "\s+" as ap)
;int i
;for i 0 as.len
,;out F"[{as[i]}]"
,;out F"{ap[i].x} {ap[i].y}"


opt noerrorshere

if(&as) as=0
if(&ap) ap=0

ARRAY(POINT) _a
findrx(s rx 0 findrxFlags|4 _a)

int i nSep(_a.len) nTok(nSep+1) iFrom iTo(len(s))

if(&as) as.create(nTok)
if(&ap) ap.create(nTok)
for i 0 nSep
,POINT& r=_a[0 i]; int _to=r.x
,if(&ap) ap[i].x=iFrom; ap[i].y=_to
,if(&as and _to>iFrom) as[i].get(s iFrom _to-iFrom)
,iFrom=r.y

if(&ap) ap[i].x=iFrom; ap[i].y=iTo
if(&as and iTo>iFrom) as[i].get(s iFrom iTo-iFrom)

ret nTok

Macro Macro2773
Code:
out
str s="164```````````````123```"
ARRAY(str) arr
RegexSplitString s "`" arr
int i
for i 0 arr.len
,out F"[{arr[i]}]"

Alternatively, if the string is valid single-line CSV, use the ICsv interface.
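
For readers who do not use Quick Macros, the idea behind RegexSplitString can be sketched in Python, chosen here only as a neutral illustration. This is a minimal sketch, not a port of the function above: the name regex_split and the short sample string are hypothetical, and it assumes the separator pattern never matches an empty string. It finds every separator match, then takes the text between consecutive matches, so there is always one more part than there are separators and empty parts are kept.

```python
import re

def regex_split(s, rx):
    """Split s on every match of rx, keeping zero-length parts.

    Returns (parts, spans): parts are the substrings between separators,
    spans are their (start, end) offsets in s.
    Hypothetical helper; assumes rx never matches an empty string.
    """
    parts, spans = [], []
    start = 0
    for m in re.finditer(rx, s):
        # Text from the end of the previous separator to the start of this one.
        parts.append(s[start:m.start()])
        spans.append((start, m.start()))
        start = m.end()
    # Text after the last separator (empty if s ends with a separator).
    parts.append(s[start:])
    spans.append((start, len(s)))
    return parts, spans

SEP = "`"
sample = "164" + SEP * 3 + "123" + SEP  # shortened stand-in for the thread's string
parts, spans = regex_split(sample, re.escape(SEP))
print(len(parts), parts)
```

With four separators in the sample, the sketch yields five parts, three of them empty, which is the behaviour the original question asks for.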
#3
Dear Gintaras, many thanks indeed. Let me attach my own approach. Needless to say, yours is more elegant and powerful. Best regards.

Function tempf12
Code:
str subject="`164```````````````123```"

str pattern="`"
ARRAY(str) arr
int i i0 i1 nt
ARRAY(CHARRANGE) a
nt=findrx(subject pattern 0 4 a)
;s.get(subject 29 43-29)
;out s
out nt
if nt=0; ret
arr.create(nt+1)

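;First element: everything before the first separator
i0=a[0 0].cpMin
arr[0].get(subject 0 i0)

;Middle elements: text between consecutive separators
for i 0 a.len-1
,i0=a[0 i].cpMax
,i1=a[0 i+1].cpMin
,arr[i+1].get(subject i0 i1-i0)
;,out F"{i} {i0} {i1} {s}"

;Last element: everything after the last separator
i0=a[0 a.len-1].cpMax
arr[nt].get(subject i0 subject.len-i0)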

for i 0 arr.len
,out F"{i} {arr[i]}"

