legacy-wiki
Sed
Recovered from the older tannerjc.net wiki snapshot dated January 23, 2016.
sed regex
Here is a regular expression used by sed in xwax-scan, taken apart piece by piece:
# /[num[.]] artist - title.ext
s:/\([0-9]\+.\? \+\)\?\([^/]*\) \+- \+\([^/]*\)\.[A-Z0-9]*$:\0\t\2\t\3:pi
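To see what the expression produces, here is a minimal sketch. It assumes GNU sed (needed for `\+', `\?', `\t', `\0' and the `i' flag) and is run with `-n' so that only substituted lines are printed. The sample path is made up for illustration and is not taken from xwax-scan.

```shell
# Hypothetical sample path; any "/dir/NN. artist - title.ext" name works.
path='/music/01. Daft Punk - Around the World.mp3'

# -n suppresses automatic printing; the p flag prints only lines that matched.
out=$(printf '%s\n' "$path" |
  sed -n 's:/\([0-9]\+.\? \+\)\?\([^/]*\) \+- \+\([^/]*\)\.[A-Z0-9]*$:\0\t\2\t\3:pi')

# Show the tab-separated fields (path, artist, title) with | in place of tabs.
printf '%s\n' "$out" | tr '\t' '|'
```

The result has three tab-separated fields: the unchanged path, the artist and the title.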
substitute
`s' substitute command
`:' separator is a colon instead of `/' (see http://www.commandlinefu.com/commands/view/2889/sed-using-colons-as-separators-instead-of-forward-slashes)
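`/' first character of the REGEXP: a literal slash. Because the separator is `:', it needs no escaping and matches the `/' in front of the file name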
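A small sketch of why a different separator helps when the pattern itself contains slashes. The path `/usr/local/bin' is only an illustrative value.

```shell
# With / as the separator, every literal slash in the pattern must be escaped.
a=$(printf '%s\n' '/usr/local/bin' | sed 's/\/usr\/local/\/opt/')

# With : as the separator, slashes are ordinary characters.
b=$(printf '%s\n' '/usr/local/bin' | sed 's:/usr/local:/opt:')

printf '%s\n%s\n' "$a" "$b"
```

Both commands give the same result; the second is simply easier to read.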
REGEXP
Number (group 1)
\([0-9]\+.\? \+\)\?
This is a group followed by a postfix operator, in this case `\?':
`\(REGEXP\)'
Groups the inner REGEXP as a whole, this is used to:
* Apply postfix operators, like `\(abcd\)*': this will search
for zero or more whole sequences of `abcd', while `abcd*'
would search for `abc' followed by zero or more occurrences
of `d'. Note that support for `\(abcd\)*' is required by
POSIX 1003.1-2001, but many non-GNU implementations do not
support it and hence it is not universally portable.
`[0-9]' bracket expression, matches any decimal digit
`\+'
As `*', but matches one or more. It is a GNU extension.
`.'
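Matches any character, including newline. It is not escaped here, so any single character after the digits is accepted, not only a literal dot.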
`\?'
As `*', but only matches zero or one. It is a GNU extension.
` '
a space
`\+'
As `*', but matches one or more. It is a GNU extension.
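The optional number group can be tried on its own. This is a sketch assuming GNU sed; `strip_num' is a hypothetical helper name, and the pattern is anchored with `^' instead of the leading `/' so that it can be fed bare file names.

```shell
# Remove an optional leading track number such as "01. " or "7 ".
strip_num() {
  printf '%s\n' "$1" | sed 's/^\([0-9]\+.\? \+\)\?//'
}

strip_num '01. Intro'    # digits, dot, space
strip_num '7 Outro'      # digits, space (the optional character is skipped)
strip_num '01- Intro'    # the unescaped . also accepts a dash
strip_num 'No Number'    # the whole group is optional, so nothing is removed
```

The first three calls leave only the name after the number; the last returns its input unchanged.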
Artist (group 2)
\([^/]*\)
Another group with a bracket list inside. It is the second group, so the replacement refers to it as backreference `\2'. In most cases this is the artist.
`[LIST]'
`[^LIST]'
Matches any single character in LIST: for example, `[aeiou]' matches all vowels.
A leading `^' reverses the meaning of LIST, so that it matches any
single character _not_ in LIST.
` \+'
one or more spaces
`-'
dash
` \+'
one or more spaces
Title (group 3)
\([^/]*\)
Another group with a bracket list inside. It is the third group, so the replacement refers to it as backreference `\3'. In most cases this is the title.
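The artist and title groups can be sketched in isolation as follows, assuming GNU sed. `split_name' is a hypothetical helper; the groups are numbered 1 and 2 here because the number group is left out.

```shell
# Split "artist - title" on a dash surrounded by spaces.
split_name() {
  printf '%s\n' "$1" | sed 's:^\([^/]*\) \+- \+\([^/]*\)$:artist=\1 title=\2:'
}

split_name 'Nina Simone - Sinnerman'
# [^/] cannot cross a directory separator, so this line does not match
# and is printed unchanged.
split_name 'a/b - c'
```

Because `[^/]*' excludes slashes, neither group can swallow part of a directory name.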
file extension
\.[A-Z0-9]*$
`\.'
`\CHAR'
Matches CHAR, where CHAR is one of `$', `*', `.', `[', `\', or `^'.
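`[A-Z0-9]'
Like the manual's `[a-zA-Z0-9]', which in the C locale matches any ASCII letter or digit, except that only uppercase letters are listed. The `I' flag makes it match lowercase letters as well.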
`*'
Matches a sequence of zero or more instances of matches for the
preceding regular expression
`$'
It is the same as `^', but refers to end of pattern space.
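A sketch of the extension pattern on its own, assuming GNU sed for the `I' flag. `strip_ext' is a hypothetical helper name.

```shell
# Remove a trailing extension made of letters and digits.
strip_ext() {
  printf '%s\n' "$1" | sed 's/\.[A-Z0-9]*$//I'
}

strip_ext 'Title.mp3'
strip_ext 'Title.FLAC'
strip_ext 'a.b.ogg'   # $ anchors the match, so only the last extension goes
strip_ext 'Title'     # no dot, no match, printed unchanged
```

Since `\.' must match a literal dot and `$' pins the match to the end, only the final `.ext' part is affected.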
REPLACEMENT
\0\t\2\t\3
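this is backreference \0 for the whole matched portion (the same as `&'), here the `/' and the file name after it; the directory part in front is not matched and stays in place, so the line still starts with the full original path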
followed by a tab \t
then backreference \2 for Artist
followed by a tab \t
then backreference \3 for Title
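To make the extent of `\0' visible, the sketch below wraps it in square brackets. It assumes GNU sed, uses a made-up path, and leaves out the number group, so the artist and title are groups 1 and 2.

```shell
line='/music/Artist - Title.ogg'

# Brackets mark exactly what \0 expands to; the rest of the line is untouched.
out=$(printf '%s\n' "$line" |
  sed 's:/\([^/]*\) \+- \+\([^/]*\)\.[A-Z0-9]*$:[\0]\t\1\t\2:I')

# Show tabs as | for readability.
printf '%s\n' "$out" | tr '\t' '|'
```

The `/music' prefix ends up outside the brackets, which shows that it was never part of the match and is simply carried over.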
FLAGS
`p'
If the substitution was made, print the new pattern space to stdout.
`I'
`i'
The `I' modifier to regular-expression matching is a GNU extension
which makes `sed' match REGEXP in a case-insensitive manner.
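The effect of the two flags can be sketched like this, assuming GNU sed. `LC_ALL=C' is set so that `[A-Z]' means exactly the uppercase ASCII letters; the file names are made up.

```shell
names=$(printf '%s\n' 'a.MP3' 'b.mp3' 'c')

# Without I, [A-Z0-9] accepts only uppercase letters and digits.
strict=$(printf '%s\n' "$names" | LC_ALL=C sed -n 's/\.[A-Z0-9]*$//p')

# With I, lowercase extensions match too.
loose=$(printf '%s\n' "$names" | LC_ALL=C sed -n 's/\.[A-Z0-9]*$//Ip')

printf 'strict: %s\n' $strict
printf 'loose: %s\n' $loose
```

With `-n', lines on which no substitution happened (such as `c') are dropped, because only the `p' flag produces output.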