awk

Power for your command line

Created by Juan Diego Godoy Robles / @klashxx

Video talk



whoami

Sysadmin & Dev • First program CPC 6128

awk

What??

"AWK is a programming language designed for text processing and typically used as a data extraction and reporting tool. It is a standard feature of most Unix-like operating systems."

— Wikipedia

What for??

"for purposes of extracting or transforming text, such as producing formatted reports. The language extensively uses the string datatype, associative arrays (that is, arrays indexed by key strings), and regular expressions."

— Wikipedia

AWKward name ...

"its name is derived from the surnames of its authors – Alfred Aho, Peter Weinberger, and Brian Kernighan."

So awk



Searches for lines that contain certain patterns


            $ awk '/dolor/' lorem_ipsum.dat
            Lorem ipsum dolor sit amet, consectetur adipiscing elit.
                

Data Driven

Describe the data you want to work with, then the action to take when you find it.

            $ awk '/dolor/{print $1}' lorem_ipsum.dat
            Lorem
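
A pattern does not have to be a regex; any expression can select records. The sketch below assumes a made-up two-column input (a name and a number) generated inline, and prints the first field only when the second field is greater than 5.

```shell
# Hypothetical sample data: a name and a number per line.
out=$(printf 'apple 3\nbanana 12\ncherry 7\n' |
      awk '$2 > 5 { print $1 }')   # pattern: $2 > 5, action: print the first field
echo "$out"
```

Omitting the action prints the whole matching record, and omitting the pattern runs the action on every record.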
                

BEGIN and END

Startup and cleanup actions for awk programs.


             BEGIN {
                 # initialize variables
             }
             /pattern/ {
                 # action, run for each matching record
             }
             END {
                 # cleanup
             }
             
Each is executed only once:
BEGIN before the first input record is read, END after all the input is consumed.
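
As a minimal sketch of the three parts working together, the following counts the records that match a pattern. The input lines and the variable name count are made up for illustration.

```shell
# Hypothetical sample input, generated inline so the example is self-contained.
result=$(printf 'alpha dolor\nbeta\ngamma dolor\n' | awk '
BEGIN   { count = 0 }                 # runs once, before any input is read
/dolor/ { count++ }                   # runs for every record matching the pattern
END     { print "matches:", count }   # runs once, after the last record
')
echo "$result"
```

The BEGIN block here is optional, since an unset awk variable already behaves as 0 in a numeric context; it is kept to show where initialization would go.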

Run it

Short way


             awk 'program' input-file1 input-file2
             
             cmd | awk 'program'
             

Long one


             awk -f program-file input-file1 input-file2
             
             cmd | awk -f program-file
             

             #!/bin/awk -f
             
             BEGIN { print "hello world!!" }
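
The long form can be tried without leaving the shell. This sketch writes a tiny program to a temporary file (the name number.awk is hypothetical) and runs it with -f against piped input.

```shell
# Create a throwaway directory and a small program file inside it.
tmpdir=$(mktemp -d)
cat > "$tmpdir/number.awk" <<'EOF'
# Print the record number followed by the first field of every record.
{ print NR ": " $1 }
EOF

# Run the program file against input coming from a pipe.
out=$(printf 'one uno\ntwo dos\n' | awk -f "$tmpdir/number.awk")
echo "$out"
rm -rf "$tmpdir"
```

Keeping the program in a file avoids shell quoting issues once it grows beyond a line or two.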
             

RECORDS and FIELDS

Records

  • Records are separated by a character called the Record Separator RS.

  • By default, the record separator is the unix newline character \n. This is why records are single lines.

  • Additionally, awk has ORS, the Output Record Separator, to control how records are written to stdout.
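
A minimal sketch of both separators, assuming a made-up input where records are delimited by semicolons. It uses a single-character RS, which works in any POSIX awk; a multi-character or regex RS is an extension offered by implementations such as gawk and mawk.

```shell
# Input has no trailing newline, so the last record is exactly "c=3".
out=$(printf 'a=1;b=2;c=3' |
      awk 'BEGIN { RS = ";"; ORS = " | " }   # split input on ";", join output with " | "
           { print NR ":" $0 }')             # each record, prefixed by its number
echo "$out"
```

Note that ORS is appended after every record, including the last one, so the output ends with the separator.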

$ awk 'BEGIN{RS=" *, *";ORS="<<<\n"}{print $0}' lorem_ipsum.dat 
Lorem ipsum dolor sit amet<<<
consectetur adipiscing elit.
Maecenas pellentesque erat vel tortor consectetur condimentum.
Nunc enim orci<<<
euismod id nisi eget<<<
interdum cursus ex.
Curabitur a dapibus tellus.
Lorem ipsum dolor sit amet<<<
consectetur adipiscing elit.
Aliquam interdum mauris volutpat nisl placerat<<<
et facilisis neque ultrices.
<<<
             

NR and FNR


  • NR: number of input records awk has processed since the beginning of the program’s execution.

  • FNR: record number within the current file. awk resets FNR to zero each time it starts a new input file, so the first record of every file has FNR equal to 1.

            $ cat n1.dat 
            one
            two
            

            $ cat n2.dat 
            three
            four
            

            $ awk '{print NR,FNR,$0}' n1.dat n2.dat 
            1 1 one
            2 2 two
            3 1 three
            4 2 four
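
Because NR keeps growing while FNR restarts, NR == FNR is true only while the first file is being read. A common idiom built on that is sketched below; the file names wanted.dat and all.dat and the array name keep are hypothetical.

```shell
# Two throwaway input files.
tmpdir=$(mktemp -d)
printf 'two\nfour\n'             > "$tmpdir/wanted.dat"
printf 'one\ntwo\nthree\nfour\n' > "$tmpdir/all.dat"

out=$(awk '
NR == FNR { keep[$0] = 1; next }   # first file only: remember each line, skip other rules
$0 in keep                         # second file: print lines that were remembered
' "$tmpdir/wanted.dat" "$tmpdir/all.dat")
echo "$out"
rm -rf "$tmpdir"
```

One caveat: if the first file is empty, NR == FNR is also true for the second file, so the idiom assumes a non-empty first file.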
            

RECORDS and FIELDS

Fields

  • By default, fields are separated by whitespace (any string of one or more spaces, TABs, or newlines), like words in a line.
  • To refer to a field in an awk program, you use a dollar $ sign followed by the number of the field you want.
  • $0 represents the whole input record.
  • NF is a predefined variable whose value is the number of fields in the current record, so $NF is always the last field of the record.
  • 
    $ cat lorem.dat 
    Lorem ipsum dolor sit amet, consectetur adipiscing elit.
    Maecenas pellentesque erat vel tortor consectetur condimentum.
    Nunc enim orci, euismod id nisi eget, interdum cursus ex.
    Curabitur a dapibus tellus.
    Lorem ipsum dolor sit amet, consectetur adipiscing elit.
    Aliquam mauris volutpat nisl placerat, et facilisis neque ultrices.
                 
  • 
    $ awk '{print $1,$NF,NF}' lorem.dat 
    Lorem elit. 8
    Maecenas condimentum. 7
    Nunc ex. 10
    Curabitur tellus. 4
    Lorem elit. 8
    Aliquam ultrices. 9
                 
  • FS holds the value of the Field Separator: a single-character string or a regex that matches the separations between fields in an input record.

  • The default value is " ", a string consisting of a single space. As a special exception, this value means that any sequence of spaces, TABs, and/or newlines is a single separator.

  • Just as ORS controls how records are written, the OFS variable (Output Field Separator) controls how fields are separated when they are sent to the output stream.
  • 
    $ cat group
    nobody:*:-2:
    nogroup:*:-1:
    wheel:*:0:root
    daemon:*:1:root
    kmem:*:2:root
    sys:*:3:root
    tty:*:4:root
                 
    
    $ awk '!/^(_|#)/&&$1=$1' FS=":" OFS="<->" group
    nobody<->*<->-2<->
    nogroup<->*<->-1<->
    wheel<->*<->0<->root
    daemon<->*<->1<->root
    kmem<->*<->2<->root
    sys<->*<->3<->root
    tty<->*<->4<->root
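
The $1=$1 in the command above matters because OFS only takes effect when awk rebuilds the record. A small sketch of that behaviour, using one line of the sample data and an arbitrary " | " separator chosen for illustration:

```shell
out=$(printf 'wheel:*:0:root\n' |
      awk -F: -v OFS=' | ' '{
          print $0        # untouched record: still uses the input separator
          $1 = $1         # assigning to any field rebuilds $0 using OFS
          print $0        # rebuilt record: fields joined with OFS
          print $1, $NF   # a comma in print also inserts OFS
      }')
echo "$out"
```

Here -F sets FS and -v assigns OFS before the program starts, which is an alternative to the FS=":" OFS="<->" assignments placed before the file name.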
                 

    Working with arrays

  • Arrays are associative: each one is a collection of index/value pairs, where any number or string can be an index.

  • No declaration is needed; new pairs can be added at any time.

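  • awk does not sort arrays by default, so for (idx in a) visits the indices in an implementation-defined order (in gawk, PROCINFO["sorted_in"] can control it).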
  • 
               awk 'BEGIN{
                     a[4]="four"
                     a[1]="one"
                     a[3]="three"
                     a[2]="two"
                     a[0]="zero"
                     exit
                     }
                     END{for (idx in a){
                            print idx, a[idx]
                            }
                     }'
                 
    
                 4 four
                 0 zero
                 1 one
                 2 two
                 3 three
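
A typical use of associative arrays is counting occurrences. The sketch below tallies made-up login shells and pipes the result through sort, which is one portable way to get a stable order without relying on gawk-specific features; the array name count is hypothetical.

```shell
out=$(printf '/bin/rbash\n/bin/bash\n/bin/rbash\n' | awk '
{ count[$0]++ }                                    # the whole line is the index
END { for (key in count) print key, count[key] }   # traversal order is unspecified
' | sort)                                          # sort outside awk for a stable order
echo "$out"
```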
                 

    Built-in functions

  • gsub(regexp, replacement [, target])

  • substr(string, start [, length ])

  • split(string, array [, fieldsep [, seps ] ])

  • index(in, find)
  • 
    $ awk 'gsub(/\./, ",")' lorem.dat
    Lorem ipsum dolor sit amet, consectetur adipiscing elit,
    Maecenas pellentesque erat vel tortor consectetur condimentum,
    Nunc enim orci, euismod id nisi eget, interdum cursus ex,
    Curabitur a dapibus tellus,
    Lorem ipsum dolor sit amet, consectetur adipiscing elit,
    Aliquam mauris volutpat nisl placerat, et facilisis neque ultrices,
                 

    
    $ awk 'BEGIN{t="hello-world";print index(t, "-")}'
    6
                 

    
    $ awk 'BEGIN{t="hello-world";print substr(t,index(t, "-")+1)}'
    world
                 
    
    $ cat passwd
    jd001:x:1032:666:Javier Diaz:/home/jd001:/bin/rbash
    ag002:x:8050:668:Alejandro Gonzalez:/home/ag002:/bin/rbash
    jp003:x:1000:666:Jose Perez:/home/jp003:/bin/bash
    ms004:x:8051:668:Maria Saenz:/home/ms004:/bin/rbash
    rc005:x:6550:668:Rosa Camacho:/home/rc005:/bin/rbash
                 

    
    $ awk 'n=split($0, a, ":"){print n, a[n]}' passwd
    7 /bin/rbash
    7 /bin/rbash
    7 /bin/bash
    7 /bin/rbash
    7 /bin/rbash
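
Both gsub and split return a number (substitutions made and pieces created, respectively), which is why they can stand alone as patterns in the examples above: a non-zero result selects the record. A minimal sketch with made-up input that makes the gsub return value visible:

```shell
out=$(printf 'a.b.c\nno dots here\n' |
      awk '{ n = gsub(/\./, ","); print n, $0 }')   # n is the number of replacements in $0
echo "$out"
```

A line without any match yields 0, so a bare gsub(...) pattern would silently skip it.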
                 

    Custom functions

    
         awk 'function test(m){
            return sprintf("This is a test func, parameter: %s", m)
         }
         BEGIN{print test("param")}'
                 

    
         This is a test func, parameter: param
                 
    Declaring a variable as an extra parameter is the only way to make it local to a function.
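
    A minimal sketch of that convention, with a hypothetical function sum_to: the caller passes only n, while i and total are extra parameters that act as local variables (the wide gap in the parameter list is a common way to signal this).

```shell
out=$(awk '
function sum_to(n,    i, total) {   # i and total are never passed, so they are local
    for (i = 1; i <= n; i++)
        total += i                  # total starts out empty, which counts as 0
    return total
}
BEGIN {
    i = 99                          # global variable named i
    print sum_to(4), i              # the loop inside sum_to does not touch the global i
}')
echo "$out"
```

    Without the extra parameters, i and total would be globals and the call would overwrite the caller's i.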




    DEMO TIME! + Q & A


    git clone https://github.com/klashxx/awk.git

    Thanks!

    More info