KNOWLEDGE BASE ARTICLE

Reference CJK characters with REGEX

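To run regular expression queries against non-Latin characters, reference the characters by their Unicode character class. For the Chinese, Japanese and Korean (CJK) character sets, use the \p{IsCJKUnifiedIdeographs} selector. In Unicode, the CJK Unified Ideographs block covers the Han ideographs shared by the three languages (U+4E00 to U+9FFF); it does not include Japanese kana or Korean Hangul.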

For example, to capture only the CJK characters from a block of mixed-language text, use a regex such as the following:

REGEX(\p{IsCJKUnifiedIdeographs}+)
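
To try the same pattern outside the product, the sketch below shows a rough equivalent in Python. It rests on two assumptions: Python's built-in re module does not accept \p{...} classes, so the explicit range U+4E00 to U+9FFF (the CJK Unified Ideographs block) stands in for \p{IsCJKUnifiedIdeographs}, and the names extract_cjk and CJK_RUN are made up for this illustration and are not part of any product API.

```python
import re

# Assumption: Python's built-in re module has no \p{...} support, so the
# CJK Unified Ideographs block is written as an explicit range,
# U+4E00 through U+9FFF.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def extract_cjk(text):
    """Return every run of consecutive CJK Unified Ideographs in text.

    Hypothetical helper for illustration only.
    """
    return CJK_RUN.findall(text)


sample = "Invoice 請求書 No. 42 for 東京 office"
print(extract_cjk(sample))  # ['請求書', '東京']
```

Each match is one unbroken run of ideographs, so Latin text, digits and spaces act as separators. Kana or Hangul in the input would not match this range.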

Here is a full list of language classes that are available.

Link to this article http://umango.com/KB?article=94
