At first glance Perl seems to be the right tool for this purpose.
First we need to find the right regexp.
The facts are:
Names are enclosed between double quotes.
We have a comma as surname – firstname separator.
These two facts create a unique pattern to search and replace.
Now we have to alter both the order and the case of the name and surname.
One approach is to memorize portions of the pattern so we can do the right substitution.
So "CONNOLLY, JOHN" will match a quote, one single letter, the rest of the word, some spaces (or none), a comma, some spaces (or none), one single letter, the rest of the letters of the name, and the final quote.
The translation to a Perl regexp could be:
We used four groups of parentheses to memorize what we have to change.
The complete substitution will be:
s/"(\w)(\w*)\s*,\s*(\w)(\w*)"/"$3\L$4 \U$1\L$2"/
Note the use of the upper/lowercase flags and how the order of the words is altered.
In Perl we can use in-place editing, so we can use this one-liner to get the requested result:
OK, before this experiment I would have bet on Perl, but the results are clear...
It’s a draw (if not a win for sed).
$ time -p perl -pi -e 's/"(\w)(\w*)\s*,\s*(\w)(\w*)"/"$3\L$4 \U$1\L$2"/' file
real 6.71
user 6.38
sys 0.32
$ time -p sed -i -e 's/"\([A-Z]\)\([A-Z]\{1,\}\) *, *\([A-Z]\)\([A-Z]\{1,\}\)"/"\3\L\4 \U\1\L\2"/' file
real 6.51
user 6.19
sys 0.31
Perl execution flag NOTES:
-i[extension]
specifies that files processed by the <> construct are to be edited in-place. It does this by renaming the input file, opening the output file by the same name, and selecting that output file as the default for print statements. The extension, if supplied, is added to the name of the old file to make a backup copy. If no extension is supplied, no backup is made. Saying “perl -p -i.bak -e "s/foo/bar/;" …” is the same as using the script:
-p
causes perl to assume the following loop around your script, which makes it iterate over filename arguments somewhat like sed:
while (<>) {
... # your script goes here
} continue {
print or die "-p destination: $!\n";
}
Note that the lines are printed automatically. To suppress printing use the -n switch. A -p overrides a -n switch.
-e commandline
may be used to enter one line of script. Multiple -e commands may be given to build up a multi-line script. If -e is given, perl will not look for a script filename in the argument list.
We look for a string that starts with a slash (note the escape char \/), followed by any number of any character (dot + star .*), followed by the string closed and any number of space chars ( *), and replace it with :1:. For the first line:
21/tcp closed ftp will become 21:1:ftp, because /tcp closed and the spaces after it are replaced by :1:
The same goes for “open”, where “:0:” is the substitution string. For example, 22/tcp open ssh will become 22:0:ssh.
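The two substitutions described above could be sketched like this. The exact program is an assumption built from the description (one sub call per state), and the three input lines are the sample ones used in this post.

```shell
# One sub() per state: "/... closed " becomes ":1:", "/... open " becomes ":0:".
# The trailing 1 is an always-true pattern, so every line gets printed.
ports=$(printf '21/tcp closed ftp\n22/tcp open ssh\n23/tcp closed telnet\n' |
  awk '{ sub(/\/.*closed */, ":1:"); sub(/\/.*open */, ":0:") } 1')
echo "$ports"
```

Once the first sub has fired there is no slash left in the line, so the second one cannot match the same record by accident.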
Our initial tasks are solved, but we can refine our efforts.
Let’s use the conditional operator.
expr ? action1 : action2
It’s pretty straightforward: if expr is true then action1 is performed/evaluated; if not, action2.
For our example, field two must change to 1 if its value is closed; if not, it should be 0.
The needed conditional operator:
$2=="closed" ? "1" : "0"
Depending on the value of the second field, our program will perform a different action; in this case it returns a string: 1 or 0.
Note that we reduce the calls to the sub function to just one.
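Here is one way the single sub call might look. This is a sketch under assumptions: the regex for the part to replace is my own choice, and the conditional supplies the 1 or 0 that goes between the colons.

```shell
# $2 is read (with the default field separator) before sub() rewrites the line,
# so the conditional picks "1" for closed and "0" for anything else.
ports=$(printf '21/tcp closed ftp\n22/tcp open ssh\n23/tcp closed telnet\n' |
  awk '{ sub(/\/[a-z]+ +[a-z]+ +/, ":" ($2 == "closed" ? "1" : "0") ":") } 1')
echo "$ports"
```

The parentheses around the conditional matter, since string concatenation binds tighter than the ?: operator in awk.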
A final (and totally different) approach: field substitution instead of text replacement.
Remember our tasks:
a) Get rid of the slash+tcp string of the first field.
b) Change the value of the second field for 1 or 0
c) Field separator should be :
Our input file naturally has three fields (with the default awk FS):
21/tcp closed ftp
22/tcp open ssh
23/tcp closed telnet
It’s clear that we can treat each line as having four fields if we add the slash / to our field separators, using a regex such as FS='( *)|(/)', where ( *) represents any number of spaces as separator and (/) represents the slash:
Note that the print statement is not needed: awk prints the input line whenever a pattern evaluates to true and has no action of its own.
The assignment $2="" sits inside an action and prints nothing by itself, so we place 1, an always-true pattern, at the end of the program to trigger the default print.
If we set the OFS to a null value:
awk '{$2=""}1' OFS= FS='( *)|(/)' infile | head -4
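A quick sketch of this intermediate step on the sample lines. Note one assumption: I use the slightly stricter separator ' +|/' (one or more spaces, or a slash) so the example does not depend on how a given awk treats a separator that can match the empty string.

```shell
# Four fields per line: port, protocol, state, service.
# Emptying $2 forces awk to rebuild the line with OFS, which is set to nothing.
step=$(printf '21/tcp closed ftp\n22/tcp open ssh\n' |
  awk -F' +|/' -v OFS= '{ $2 = "" } 1')
echo "$step"
```

The protocol is gone and the remaining fields are glued together, which is exactly the state the next step starts from.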
We’re close to our goal; the last step is to process the third field:
$3=="closed" ? ":1:" : ":0:"
As we saw before, we need to assign it to a variable... and here is the trick:
$3= $3=="closed" ? ":1:" : ":0:"
We say: hey, change $3 depending on its previous value.
So:
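Putting the pieces together, the whole field-based program could look like the sketch below. It is assembled from the steps described here rather than copied from a listing, and it again assumes the ' +|/' separator instead of '( *)|(/)'.

```shell
# Drop the protocol field, then rewrite the state field from its own old value.
ports=$(printf '21/tcp closed ftp\n22/tcp open ssh\n23/tcp closed telnet\n' |
  awk -F' +|/' -v OFS= '{ $2 = ""; $3 = ($3 == "closed" ? ":1:" : ":0:") } 1')
echo "$ports"
```

The colons travel inside the new value of $3, so the empty OFS is enough to produce the colon-separated result.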
The ls output is piped to sed, then we use the p command to print the input line without modifications, in other words, the original name of the file.
The next step is to use the substitute command to change the file extension.
NOTE: We’re using single quotes to enclose the sed script. The dot is a regex metacharacter, so escape it with a backslash to match a literal dot.
The result is a combined output that consists of a sequence of old_file_name and new_file_name pairs.
Finally we pipe the resulting feed through xargs to perform the actual rename of the files.
$ ls | sed -e 'p;s/\.txt$/.sql/' | xargs -n2 mv
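If you want to try the rename without touching real files, a sketch along these lines runs it inside a scratch directory. The directory and the two file names are invented for the test, and it assumes names without spaces.

```shell
# Work in a throwaway directory that contains only .txt files.
tmp=$(mktemp -d)
touch "$tmp/d.txt" "$tmp/e.txt"

# sed prints each name twice: once untouched (p), once with the new extension.
# xargs -n2 then hands each old/new pair to mv.
(cd "$tmp" && ls | sed -e 'p;s/\.txt$/.sql/' | xargs -n2 mv)

listing=$(ls "$tmp")
echo "$listing"
rm -rf "$tmp"
```

A file that does not end in .txt would come out of sed as two identical names, and mv would complain about it, so filter the listing first if the directory is mixed.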
PS: An alternative that takes care of spaces in the file names:
$ touch "a a d.txt.txt" "b b b.txt" "c c.txt" d.txt e.txt f.txt
$ ls
a a d.txt.txt b b b.txt c c.txt d.txt e.txt f.txt
Here’s the command:
$ ls | awk '{gsub(/^|$/,"\"");print;gsub(/\.txt\"$/,".sql\"")}1' | xargs -n2 mv
$ ls
a a d.txt.sql b b b.sql c c.sql d.sql e.sql f.sql
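The same idea can be written a bit more explicitly. The sketch below is a variant, not the command above: it builds the quoted old and new names with printf instead of gsub, and runs in a scratch directory with made-up file names.

```shell
tmp=$(mktemp -d)
touch "$tmp/a a d.txt.txt" "$tmp/b b.txt" "$tmp/c.txt"

# For each name, emit the quoted original and the quoted renamed version.
# xargs treats a double-quoted string as one argument, spaces included.
(cd "$tmp" && ls |
  awk '{ new = $0; sub(/\.txt$/, ".sql", new); printf "\"%s\"\n\"%s\"\n", $0, new }' |
  xargs -n2 mv)

renamed=0
[ -f "$tmp/a a d.txt.sql" ] && [ -f "$tmp/b b.sql" ] && [ -f "$tmp/c.sql" ] && renamed=1
echo "renamed=$renamed"
rm -rf "$tmp"
```

Only the last .txt is replaced, so "a a d.txt.txt" becomes "a a d.txt.sql", as in the listing above. Names that themselves contain double quotes would still need different handling.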
From the man page:
xargs combines the fixed initial-arguments with arguments read from
standard input to execute the specified command one or more times.
The number of arguments read for each command invocation and the
manner in which they are combined are determined by the options
specified.
The -n parameter
-n number Execute command using as many standard input
arguments as possible, up to number arguments
maximum. Fewer arguments are used if their total
size is greater than size bytes, and for the last
invocation if there are fewer than number
arguments remaining. If option -x is also coded,
each number arguments must fit in the size
The -n2 flag forces xargs to take 2 arguments from the piped output each time and pass them to the mv command to get the job done.
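The grouping is easy to watch with echo standing in for mv. A tiny sketch with four dummy arguments:

```shell
# Four input lines, two arguments per invocation: echo runs twice.
pairs=$(printf '%s\n' a b c d | xargs -n2 echo)
echo "$pairs"
```

Each output line corresponds to one invocation of the command, which is how every old name ends up next to its new name in the rename pipeline.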
awk '
/OUTPUT/ {flag=1; next}   # Initial pattern found --> turn on the flag and read the next line
/END/    {flag=0}         # Final pattern found --> turn off the flag
flag     {print}          # Flag on --> print the current line
' infile
The first optimization is to get rid of the print: in awk, when a condition is true, print is the default action, so when the flag is true the line is going to be echoed.
To remove the next statement while still not printing the tag line, we need to turn the flag on only after the flag has been evaluated for the line that matches OUTPUT.
A slight variation of the program flow and we’re done:
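A sketch of what that reordered program might look like, following the two optimizations just described. The six-line sample input is invented for the example.

```shell
# Rule order does the work: turn the flag off on END, print while the flag is on,
# and only then turn it on for OUTPUT, so neither tag line is printed.
body=$(printf 'header\nOUTPUT\nline1\nline2\nEND\nfooter\n' |
  awk '
/END/    { flag = 0 }
flag
/OUTPUT/ { flag = 1 }
')
echo "$body"
```

The bare flag on its own line is a pattern with no action, so awk falls back to the default action and prints the record.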
PS: What if we only want to print the lines enclosed between the OUTPUT and END tags?