~~Title: Regular Expression Syntax ~~ Regular expressions are a more powerful (and therefore complicated) form of wildcard pattern matching. Like [[pattern_matching_syntax|standard pattern matching]], they can be used throughout Opus. Generally, you have to specifically enable the use of regular expressions in a given situation - by default, Opus will assume standard pattern matching. For example, the **[[:file_operations:renaming_files:advanced_rename|Advanced Rename]]** dialog has a regular expression mode that you must select before regular expressions can be used. One advantage regular expressions have over standard pattern matching is they can enable a form of search and replace in certain functions. As an example, this is used in the **Rename** command. The "search" string is specified as a pattern to match against the original names of files. That pattern can indicate //capture groups// - expressions in the source string that are captured, and can be carried over to the new string (which acts as the "replace" string). As an example, imagine the **Rename** dialog is set to regular expression mode, with the following patterns supplied: **Old Name**: The (.*) Backup\.(.*)\\ **New Name**: \1.\2 \\ The two **(.*)** tokens in the //old name// string are capture groups - they "capture" whatever is matched by the expression within the parentheses. In this case, the expression inside the brackets is **.*** which simply means "match anything". So what this pattern will do is match any filename beginning with //The// and ending in //Backup//, and it will capture the middle of the filename for later use. The second **(.*)** will capture the file extension. The //new name// string can then re-use the captured text, and this is indicated with the **\1** and **\2** markers. So as an example, the original filename //The Lord Of The Rings Backup.avi// would be renamed to //Lord Of The Rings.avi//. **\1** refers to the first capture group, **\2** to the second, and so on. If you need the //new name// string to contain a literal \, use two together. For example, //abc%%\\%%xyz// will turn into //abc\xyz//. When used with the **Rename** command only, the //old name// pattern can be followed with a **#** character to indicate that the search and replace operation should be repeated multiple times. For example, the following regular expression rename will remove all spaces from the filename: **Old Name**: (.*)\s(.*)#\\ **New Name**: \1\2 \\ The **#** causes the search and replace to be repeated until the new name no longer changes. You can also specify a maximum repetition count by appending a number, for example **#5** at the end would repeat the operation no more than five times. There are many different variants of regular expression; by default Opus uses what's called //TR1 ECMAScript//. Microsoft has a [[http://www.gpsoft.com.au/DScripts/redirect.asp?page=regex|page on TR1]] that goes into far more detail than this help file can. $$ Token $$ Description $$ (#)**%%^%%** $$ **Start of a string**.\\ The caret is used to "anchor" the search to the start of the string. If the search is not anchored to either end, the pattern can match a sub-string of the target. For example:\\ **%%^%%abc** matches //abc//, //abcdefg//, //abc123//, but not //123abc//\\ **abc** also matches// 123abc//\\ $$ (#)**$** $$ **End of a string**.\\ The dollar sign is used to "anchor" the search to the end of the string. If the search is not anchored to either end, the pattern can match a sub-string of the target. For example:\\ **abc$** matches //abc//, //endsinabc//, //123abc//, but not //abc123//\\ **%%^%%abc(.*)123$** matches //abc123//, //abcxyz123//, but not //abcxyz123def//\\ $$ (#)**.** $$ **Any single character**.\\ The period (full stop) is used to match any single character. For example:\\ **a.c** matches //abc//, //aac//, //acc//, //adc// but not //acd//\\ $$ (#)* $$ **0 or more of previous expression**.\\ Matches zero or more occurrences of the previous expression. Combine with **.** to form the "match anything" token (**.***). For example:\\ **ab*c** matches //ac//, //abc//, //abbc//, //abbbc//, ...\\ **a.*c** matches //ac//, //abc//, //a123456c//, //aanythingc//, ...\\ **.*** matches anything\\ $$ (#)**+** $$ **1 or more of previous expression**.\\ Matches one or more occurrences of the previous expression. For example:\\ **ab+c** matches //abc//, //abbc//, //abbbc// but not //ac//\\ $$ (#)**?** $$ **0 or 1 of previous expression**.\\ Matches either zero or one occurrence of the previous expression. For example:\\ **ab?c** matches //ac//, //abc// but not //abbc// or //abbbc//\\ $$ (#)**%%|%%** $$ **Alternation (logical //or//).**\\ The vertical bar is used to separate two or more characters or expressions, any of which may match. For example:\\ **a%%|%%b** matches //a// or //b//\\ **a(b%%|%%c)d** matches //abd// or //acd//\\ **(bill%%|%%ted)** matches //bill// or //ted//\\ $$ (#)**{}** $$ **Quantifier**.\\ Braces are used to indicate that the preceding expression must match an exact number of times. For example:\\ **ab{2}c** matches //abbc//, but not //abc// or //abbbc//\\ **a.{4}z** matches //abcdez//, //a1234z//, //afourz//, //aaaaaz//, etc.\\ $$ (#)**[]** $$ **Character set**.\\ Matches any single character in the set of specified characters.\\ You can specify the character set as individual characters (e.g. **[abdfg]**) or as a range of characters (e.g. **[a-j]**) or as multiple ranges. For example:\\ **[abc]** matches either //a//, //b// or //c//\\ **[af-j]** matches either //a//, //f//, //g//, //h// or //j//\\ **[a-dh-kq-]** matches //a//, //b//, //c//, //d//, //h//, //i//, //j//, //k//, or any character from //q// onwards\\ **IMGP[0-9]{4}.jpg** matches //IMGP0158.jpg// (or any other four-digit number).\\ $$ (#)**[%%^%%]** $$ **Negative character set**.\\ Matches any character **not** in the set of specified characters. See **[]** for information on how the set is defined. For example:\\ **[%%^%%pqr]** matches any character except //p//, //q// or //r//\\ $$ (#)**()** $$ **Expression / capture group**.\\ Parentheses are used to combine multiple characters into an expression. When used in a "search and replace" like Advanced Rename, they also mark capture groups - see above for a discussion of these. For example:\\ **a%%|%%bc** matches //ac// or //bc//, whereas\\ **a%%|%%(bc)** matches //a// or //bc//\\ $$ (#)**\** $$ **Escape character**.\\ The backslash is used to escape token characters in order to match those characters literally.\\ When used before a non-token character, it is used to indicate the following special escape characters:\\ |**\t**|tab character ($09)| |**\r**|carriage return ($0d)| |**\v**|vertical tab ($0b)| |**\f**|form feed ($0c)| |**\n**|new line ($0a)| |**\e**|escape ($1b)| |**\x**|matches an ASCII character specified as a two-digit hexadecimal number, e.g. **\x20** matches a space| |**\u**|matches a Unicode character specified as a four-digit hexadecimal number, e.g. **\u0020** matches a space.| It is also used to mark several character classes, which are shorthand ways to specify various common **[]** character sets (see below). For example:\\ **a%%|%%b** matches //a// or //b//, whereas\\ **a\%%|%%b** matches //a%%|%%b\\// **a\t** matches //a// followed by a tab character, whereas\\ **a%%\\%%t** matches //a\t//\\ $$ (#)**  \w** $$ **Word character**.\\ Matches any word character. Equivalent to **[a-zA-Z_0-9]**. For example:\\ **%%^%%\w+[0-9]{4}.jpg** matches //IMGP0158.jpg// (or any other four-digit number preceded by at least one other word character).\\ $$ (#)**  \W** $$ **Non-word character**.\\ Matches any non-word character, equivalent to **[%%^%%a-zA-Z_0-9]**.\\ $$ (#)**  \s** $$ **Space character**.\\ Matches any whitespace character. Equivalent to **[ \f\n\r\t\v]**.\\ $$ (#)**  \S** $$ **Non-space character**.\\ Matches any non-whitespace character. Equivalent to **[%%^%% \f\n\r\t\v]**.\\ $$ (#)**  \d** $$ **Digit character**.\\ Matches any decimal digit. Equivalent to **[0-9]**.\\ $$ (#)**  \D** $$ **Non-digit character**.\\ Matches any non-decimal digit. Equivalent to **[%%^%%0-9]**. For example:\\ **%%^%%\D+\d{4}.jpg** matches //IMGP0158.jpg// (or any other four-digit number preceded by at least one non-digit character).