Recovered from the older tannerjc.net wiki snapshot dated January 23, 2016.

sed regex

Here is a sed regex used in xwax-scan:

# /[num[.]] artist - title.ext
s:/\([0-9]\+.\? \+\)\?\([^/]*\) \+- \+\([^/]*\)\.[A-Z0-9]*$:\0\t\2\t\3:pi
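To see what the command does, it can be run on a single path. This is a minimal sketch assuming GNU sed; the sample path is made up, and the -n option is an assumption about how the command is invoked (it suppresses the default output so that only the `p' flag prints).

```shell
# Hypothetical sample path, not taken from xwax-scan.
path='/music/01. Daft Punk - Around the World.mp3'

# -n (an assumption here) silences the default print; the p flag then
# prints only lines where the substitution succeeded.
out=$(printf '%s\n' "$path" |
  sed -n 's:/\([0-9]\+.\? \+\)\?\([^/]*\) \+- \+\([^/]*\)\.[A-Z0-9]*$:\0\t\2\t\3:pi')

# Three tab-separated fields: original path, artist, title.
printf '%s\n' "$out"
```

The printed line is the original path, a tab, "Daft Punk", a tab, and "Around the World".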

substitute

`s' substitute command

`:' the separator is a colon instead of the usual `/', so slashes inside the pattern need no escaping; see http://www.commandlinefu.com/commands/view/2889/sed-using-colons-as-separators-instead-of-forward-slashes

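`/' first character of REGEXP; because the separator is `:', this is a literal slash that matches the path separator in front of the file name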
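The effect of choosing a different separator can be sketched with a much smaller substitution. The path and replacement below are made-up examples; the two commands are meant to be equivalent.

```shell
# Same substitution written with / and with : as the separator.
# With / every slash in the pattern has to be escaped.
with_slash=$(printf '%s\n' '/usr/bin/sed' | sed 's/\/usr\/bin/\/opt/')
# With : the slashes are ordinary characters.
with_colon=$(printf '%s\n' '/usr/bin/sed' | sed 's:/usr/bin:/opt:')

printf '%s\n%s\n' "$with_slash" "$with_colon"   # both lines read /opt/sed
```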

REGEXP

Number (group 1)

\([0-9]\+.\? \+\)\?

This is a group followed by a postfix operator, here `\?', which makes the whole group optional:

`\(REGEXP\)'
     Groups the inner REGEXP as a whole, this is used to:

        * Apply postfix operators, like `\(abcd\)*': this will search
          for zero or more whole sequences of `abcd', while `abcd*'
          would search for `abc' followed by zero or more occurrences
          of `d'.  Note that support for `\(abcd\)*' is required by
          POSIX 1003.1-2001, but many non-GNU implementations do not
          support it and hence it is not universally portable.

`[0-9]' bracket expression, matches any decimal digit

`\+'
     As `*', but matches one or more.  It is a GNU extension.
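`.'
     Matches any character, including newline.  The dot is not escaped
     in this regex, so any single character may follow the digits
     (`01.', `01-', `01)'), not only a period.
`\?'
     As `*', but only matches zero or one.  It is a GNU extension.
` '
     A literal space.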
`\+'
     As `*', but matches one or more.  It is a GNU extension.
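The number group can be tried on its own. This is a rough sketch assuming GNU sed; show_num is a hypothetical helper that anchors the group at the start of the string and wraps whatever group 1 matched in brackets.

```shell
# Hypothetical helper: show what group 1 captures at the start of a name.
show_num() {
  printf '%s\n' "$1" | sed 's:^\([0-9]\+.\? \+\)\?:[\1]:'
}

dotted=$(show_num '01. Artist')   # [01. ]Artist
plain=$(show_num '7 Artist')      # [7 ]Artist
dashed=$(show_num '01- Artist')   # [01- ]Artist, the unescaped dot accepts -
none=$(show_num 'Artist')         # []Artist, the group is absent so \1 is empty

printf '%s\n' "$dotted" "$plain" "$dashed" "$none"
```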

Artist (group 2)

\([^/]*\)

Another group, this one with a bracket list inside. It is the second group, so the replacement can refer to it as backreference `\2'. In most cases this would be the artist.

`[LIST]'
`[^LIST]'
     Matches any single character in LIST: for example, `[aeiou]'
     matches all vowels.
     A leading `^' reverses the meaning of LIST, so that it matches any
     single character _not_ in LIST.

` \+'
     One or more spaces.
`-'
     A literal dash.
` \+'
     One or more spaces.

Title (group 3)

\([^/]*\)

Another group with a bracket list inside. It is the third group, so the replacement can refer to it as backreference `\3'. In most cases this would be the title.
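How groups 2 and 3 split a file name can be checked by cutting the artist and title fields back out of the result. This is a sketch assuming GNU sed; the fields helper and both paths are hypothetical, and -n is an assumption about the invocation.

```shell
# Hypothetical helper: run the full substitution, keep fields 2 and 3,
# and show them as artist|title.
fields() {
  printf '%s\n' "$1" |
    sed -n 's:/\([0-9]\+.\? \+\)\?\([^/]*\) \+- \+\([^/]*\)\.[A-Z0-9]*$:\0\t\2\t\3:pi' |
    cut -f2,3 | tr '\t' '|'
}

simple=$(fields '/music/Orbital - Halcyon.ogg')   # Orbital|Halcyon

# [^/]* cannot cross a slash, so a " - " in a directory name does not
# count; the last component has no " - " and nothing is printed.
in_dir=$(fields '/music/Orbital - Snivilisation/track.ogg')

printf '[%s]\n' "$simple" "$in_dir"
```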

file extension

\.[A-Z0-9]*$

`\.'
`\CHAR'
     Matches CHAR, where CHAR is one of `$', `*', `.', `[', `\', or `^'.
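`[A-Z0-9]'
     Matches an uppercase ASCII letter or a digit.  Together with the
     `I' flag (see FLAGS) it behaves like
`[a-zA-Z0-9]'
     In the C locale, this matches any ASCII letters or digits.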
`*'
     Matches a sequence of zero or more instances of matches for the
     preceding regular expression
`$'
     It is the same as `^', but refers to end of pattern space.
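The extension part can be exercised separately. This is a small sketch assuming GNU sed; the file names are invented, and LC_ALL=C is set only to pin down what the A-Z range means for the demonstration.

```shell
# Without the I flag the lowercase extension does not match [A-Z0-9],
# so nothing is printed.
no_flag=$(printf '%s\n' 'song.flac' | LC_ALL=C sed -n 's:\.[A-Z0-9]*$::p')

# With the I flag the extension matches and is removed.
with_flag=$(printf '%s\n' 'song.flac' | LC_ALL=C sed -n 's:\.[A-Z0-9]*$::pI')

# $ ties the match to the end, so only the last dot-suffix is affected.
last_only=$(printf '%s\n' 'live.set.MP3' | LC_ALL=C sed -n 's:\.[A-Z0-9]*$::p')

printf '[%s]\n' "$no_flag" "$with_flag" "$last_only"
```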

REPLACEMENT

\0\t\2\t\3

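this is backreference \0 for the whole match, which GNU sed treats like `&'. The match is the last path component, and the text before it is left in place, so the output line still begins with the original line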

followed by a tab \t

then backreference \2 for Artist

followed by a tab \t

then backreference \3 for Title
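The replacement pieces can be tried in isolation. This is a short sketch assuming GNU sed (the \0 form is GNU behaviour); the strings are placeholders.

```shell
# \0 (GNU sed) and & both stand for the matched text only; the rest of
# the line is kept around it.
zero=$(printf '%s\n' 'abc' | sed 's:b:[\0]:')
amp=$(printf '%s\n' 'abc' | sed 's:b:[&]:')

# A tab-separated result is easy to split again, for example with cut.
artist=$(printf 'path\tArtist\tTitle\n' | cut -f2)

printf '%s\n' "$zero" "$amp" "$artist"
```

Tabs make a convenient field separator here because they rarely appear in file names.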

FLAGS

`p'
     If the substitution was made, then print the new pattern space.

`I'
`i'
     The `I' modifier to regular-expression matching is a GNU extension
     which makes `sed' match REGEXP in a case-insensitive manner.
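How the `p' flag interacts with sed's default output can be seen on two input lines. This is a minimal sketch; the input is made up, and whether the surrounding script passes -n is an assumption.

```shell
# Without -n a substituted line appears twice: once from the p flag and
# once from the default end-of-cycle print.
twice=$(printf '%s\n' 'x' 'y' | sed 's:x:X:p')

# With -n only lines where the substitution was made are printed.
once=$(printf '%s\n' 'x' 'y' | sed -n 's:x:X:p')

printf '%s\n---\n%s\n' "$twice" "$once"
```

Combining -n with `p' is a common way to make sed act as a filter that drops non-matching lines.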