Need help understanding unicode and findrx
#1
Hi

I think I've found a bug in my understanding and was hoping for an upgrade.

I'm using findrx to find text in the window of another application, and then use Windows messages like EM_SETSEL, and outp, to update the text in that application. It seems to work fine in ANSI mode, but I think I want it to run in Unicode mode (since this seems more general). Everything works fine until certain characters appear in the text (like an n with a tilde, ñ). When this happens, the selections are off by the number of 'special' characters that occur in the text, as if each of these characters counted as two. If the encodings were inconsistent between Unicode and ANSI this would make sense, so maybe all I need to know is how to make them consistent.

Here is a simple function that displays the text matching a regular expression from another application:

Function TestReplaceUnicodeAnsi
Code:
function'int int'hwndre str'findthis

str windowContents.getwintext(hwndre); if(!windowContents.len) ret -3
windowContents.findreplace("[]" "[10]")
str findString = findthis

ARRAY(CHARRANGE) a
int flag = 4
int isFound = findrx(windowContents findString 0 flag&3|4|8|32 a)
out F"isFound = {isFound}"
if (isFound)
,for _i 0 a.len
,,out F"a[{_i}] min ={a[_i 0].cpMin},  max={a[_i 0].cpMax}"
,,int nc = a[_i 0].cpMax - a[_i 0].cpMin
,,str substr.get(windowContents a[_i 0].cpMin nc)
,,out F"Selected string is '{substr}'"

and this function is invoked like this:
Macro RunTestReplaceUnicodeAnsi
Code:
int w=win("PowerScribe 360 | Reporting" "WindowsForms10.Window.8.app.0.3ce0bb8_r13_ad1")
int c=child("" "*.RICHEDIT50W.*" w 0x0 "wfName=rtbReport") ;;editable text

str findThis =  "(?m)(?<=IMPRESSION:)\S?\s{{0,2}\w?"

int testResult = TestReplaceUnicodeAnsi(c findThis)

out F"TestResult is {testResult}"

If the original text contains this string:
Quote:
ññññ

IMPRESSION:
Nodule in the left lung

then the output in the console when Unicode is not selected in QM is:
Quote:
isFound = 1
a[0] min =1209, max=1211
Selected string is '
N'
TestResult is 0
which is what I want - the first character after "IMPRESSION:" and some whitespace.

If I run the same code after turning Unicode on in Tools->Options, I get:
Quote:
Unicode:

isFound = 1
a[0] min =1213, max=1215
Selected string is '
N'
TestResult is 0
which is also fine: I get exactly the selected string I want. Since there are four 'ñ' characters, the offsets differ by 4. But when I want to update the text in the other window (say I want to highlight it), the selection is misaligned:
Code:
SendMessage hwndre EM_SETSEL a[0 i].cpMin a[0 i].cpMax
works just fine in the non-Unicode case, but if Unicode is enabled in Tools->Options the offsets are off.
Code:
IsWindowUnicode(hwndre)
returns true, so I am assuming that I should be using Unicode.

My confused question: is there a way to generate the offsets in Unicode mode so that they stay consistent when I select text for pasting or highlighting? Or do I have a bad mental model of how this works? Thanks.
#2
Good model.
To generate character offsets:
Function StrLenToCharCount
Code:
;/
function# $s nBytes

;Converts byte count to character count in string.
;Returns the number of characters that corresponds to nBytes.

;s - string.
;nBytes - can be string length or some offset in string, in bytes.
;;;If -1, this function calls len() to get string length.

;REMARKS
;In Unicode mode, non-ASCII characters consist of more than 1 byte.
;len() and other QM string functions always use byte count.


opt noerrorshere 1
if(nBytes<0) nBytes=len(s)
if(!_unicode) ret nBytes
ret MultiByteToWideChar(_unicode 0 s nBytes 0 0)
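
For readers who do not use QM, the same idea can be sketched in Python. This is only an illustration, not QM code: it assumes the string is held as UTF-8 bytes (as the remarks above describe for Unicode mode) and that the target counts positions in UTF-16 code units, which is what MultiByteToWideChar reports when called with no output buffer. The name byte_count_to_char_count is made up for this sketch.

```python
def byte_count_to_char_count(data: bytes, n_bytes: int = -1) -> int:
    """Return how many UTF-16 code units the first n_bytes of UTF-8 data cover.

    Hypothetical helper mirroring the QM function above.
    If n_bytes is negative, the whole byte string is used.
    """
    if n_bytes < 0:
        n_bytes = len(data)
    # Decode only the prefix; this raises UnicodeDecodeError if n_bytes
    # falls in the middle of a multi-byte character.
    prefix = data[:n_bytes].decode("utf-8")
    # Each UTF-16 code unit is 2 bytes in the utf-16-le encoding.
    return len(prefix.encode("utf-16-le")) // 2


s = "ąbcž".encode("utf-8")
print(len(s))                       # byte count: 6
print(byte_count_to_char_count(s))  # character count: 4
```

Unlike the QM function, this sketch raises an error when the byte offset splits a character instead of returning a best-effort count.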
Macro Macro2347
Code:
str s="ąbcž"
out s.len
out StrLenToCharCount(s s.len)
Macro Macro2348
Code:
SendMessageW hwndre EM_SETSEL StrLenToCharCount(windowContents a[0 i].cpMin) StrLenToCharCount(windowContents a[0 i].cpMax)
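
To see why the converted offsets line up, here is a rough Python sketch (again not QM code) that reproduces the situation from the question on a shortened sample text. It assumes the search runs over UTF-8 bytes, so the match offsets are byte offsets, and that the edit control expects UTF-16 character offsets. The helper to_char_offset and the simplified regular expression are assumptions made for the illustration.

```python
import re


def to_char_offset(data: bytes, byte_offset: int) -> int:
    """Convert a byte offset in UTF-8 data to a UTF-16 code unit offset."""
    return len(data[:byte_offset].decode("utf-8").encode("utf-16-le")) // 2


# Shortened version of the sample text: four 'ñ' before the heading.
text = "ññññ\nIMPRESSION:\nNodule in the left lung"
data = text.encode("utf-8")

# Searching bytes yields byte offsets, like a byte-based regex search.
m = re.search(rb"(?<=IMPRESSION:)\s\w", data)
byte_min, byte_max = m.span()

# Offsets suitable for a control that counts characters, not bytes.
char_min = to_char_offset(data, byte_min)
char_max = to_char_offset(data, byte_max)

print(byte_min, byte_max)  # 20 22
print(char_min, char_max)  # 16 18
```

Each 'ñ' takes two bytes in UTF-8 but one character position, so the byte offsets run ahead of the character offsets by 4, which matches the difference seen in the question.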

