Get stem of a word

Syntax

s.stem([flags] [from] [wordlen])

 

Parameters

s - str variable containing the word or words. Unless flag 1 is used, it should contain a single word, or it can contain multiple words, with the word to stem specified by from and wordlen.

flags:

1 Replace all words. The function splits s into words, treating a-z, A-Z, 0-9 and _ as word characters.
2 When used with flag 1, replace all nonword characters with spaces.

from - offset of the word in s. Default: 0 (from the beginning of s).

wordlen - length of the word. Default: -1 (until the end of s).

 

Remarks

Removes the suffix from a word. For example, "worked", "working" and similar word forms are replaced with "work". Supports only English.

 

Added in QM 2.2.1.

 

The function is not intended to get the exact root of a word. Its purpose is to get a common word form for searching. The result may be shorter, or may not exactly match an existing word. For example, "happy" and "happiness" are both replaced with "happi". The function usually does not change the part of speech, e.g. "worker" (noun) to "work" (verb). The function is not perfect. For example, it does not replace "found" with "find". It also does not remove prefixes.
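
A minimal sketch of the "happy" and "happiness" case described above, using only the syntax shown in this topic. The variable names a and b are arbitrary, and the results in the comments are taken from the text above rather than from a separate test run.

str a="happy"
str b="happiness"
a.stem
b.stem
out a ;;"happi"
out b ;;"happi"

Because both forms give the same stem, a search for one form can also match the other.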

 

When from and wordlen are used and the word is not at the end of s, the function replaces the suffix with space characters. If the word is followed by a 0 (null) character (for example, after tok), the function replaces the suffix with 0 characters instead.

 

The function also makes the word lowercase, regardless of whether there was a suffix.
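
As a small illustration of the lowercase rule, the following sketch stems a word that has no suffix. The variable name s3 is arbitrary, and the result in the comment follows from the statement above rather than from a test run.

str s3="WORK"
s3.stem
out s3 ;;"work"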

 

Stemming is usually used to find words that may appear in any form, for example "work", "works", "worked", "working". For example, Google uses it.

 

Examples

str s="working"
s.stem
out s ;;"work"

s="THEY WORKED WELL"
s.stem(0 5 6)
out s ;;"THEY work   WELL"

str s="it worked well"
str s2="works"
s.stem(1|2)
s2.stem
int i=findw(s s2)
out i