awk Talk

31 07 2006

Why I love awk:

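# Input: SIP SP P DIP DP TIME, assumed sorted by TIME. Reconstructed: the key, end-time and split() lines.
{ key = $1"_"$3"_"$4"_"$5                       # SIP_P_DIP_DP (source port dropped)
if (!((key"-begin") in x)) {                    # first event for this key
        x[key"-begin"] = x[key"-c"] = x[key"-end"] = $6;
} else {
        x[key"-c"] = x[key"-c"]" "$6;           # append to the list of times
        x[key"-end"] = $6;                      # latest time becomes the end time
} } END { for (item in x) {
                if (item ~ /-begin$/) {         # visit each key once
                        split(item, y, "-");    # y[1] = SIP_P_DIP_DP
                        split(x[y[1]"-begin"], t1, ":"); split(x[y[1]"-end"], t2, ":");
                        tim1 = t1[1]*3600 + t1[2]*60 + t1[3];
                        tim2 = t2[1]*3600 + t2[2]*60 + t2[3];
                        if ((tim2 - tim1) >= 1800) {    # at least 30 minutes
                                split(y[1], z, "_");
                                print z[1]" "z[2]" "z[3]" "z[4] " = " x[y[1]"-c"]
                        }
                }
        }
}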

The awk lines above take a large file (here, roughly 68,000 lines) in which each line consists of: SIP SP P DIP DP TIME

SIP = Source IP
SP = Source Port
P = Protocol
DIP = Destination IP
DP = Destination Port
TIME = Time of event (HH:MM:SS.MS)
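To make the seconds arithmetic concrete: a TIME such as 01:30:15.500 is split on the colons and weighted by 3600, 60 and 1, which is what the tim1/tim2 lines do. A minimal sketch as a shell function; the name to_seconds and the sample timestamps are made up, and it assumes the HH:MM:SS.MS layout described above.

```shell
# Hypothetical helper: convert HH:MM:SS.MS timestamps (one per line) to seconds.
# split() turns "01:30:15.500" into "01", "30" and "15.500"; using those strings
# in arithmetic converts them to numbers, so the fractional part is kept.
to_seconds() {
  awk '{ split($1, t, ":"); printf "%.3f\n", t[1]*3600 + t[2]*60 + t[3] }'
}

printf '%s\n' "00:00:00.000" "01:30:15.500" "23:59:59.999" | to_seconds
```

This prints 0.000, 5415.500 and 86399.999. printf is used rather than print because print formats non-integer numbers with OFMT (%.6g by default), which would show 86399.999 as 86400.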

And it:

  1. Builds a pseudo-multidimensional array (awk doesn’t do “true” multidimensional arrays), keyed on SIP_P_DIP_DP plus a suffix (-begin, -c, -end), with TIME as the value.
  2. Walks the array, converts each group's start and end times to seconds, and drops groups that don’t last at least 30 minutes (1800 seconds).
  3. Prints the remaining groups in this format:
          SIP P DIP DP = <Start Time> <Middle Time> … <Middle Time> <End Time>
            * This also reduces the data set to 1620 lines, since groups that are too short are eliminated and each remaining group is collapsed into a single line.
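Here is a small, self-contained run of the same idea on a few made-up lines, to show the grouping and the 30-minute cut. It is a variant, not the script above: instead of one array with -begin/-c/-end suffixes it keeps three separate arrays (start, stop, times), and like the original it assumes the input is in time order and that no group crosses midnight. The function name group_sessions and all addresses are hypothetical.

```shell
# Hypothetical wrapper: group events by SIP_P_DIP_DP and keep groups spanning >= 1800 s.
# Assumes input is sorted by TIME and no group crosses midnight.
group_sessions() {
  awk '
    function secs(ts,  t) { split(ts, t, ":"); return t[1]*3600 + t[2]*60 + t[3] }
    {
      key = $1 "_" $3 "_" $4 "_" $5          # source port ($2) is not part of the key
      if (!(key in start)) {
        start[key] = $6; times[key] = $6    # first event for this key
      } else {
        times[key] = times[key] " " $6      # later event: append its time
      }
      stop[key] = $6                        # last time seen so far
    }
    END {
      for (key in start)
        if (secs(stop[key]) - secs(start[key]) >= 1800) {
          split(key, z, "_")
          print z[1], z[2], z[3], z[4], "=", times[key]
        }
    }'
}

# Sample data (made up): the TCP pair spans 40 minutes, the UDP pair only 5.
printf '%s\n' \
  "10.0.0.1 5000 TCP 10.0.0.9 80 12:00:00.000" \
  "10.0.0.2 6000 UDP 10.0.0.9 53 12:01:00.000" \
  "10.0.0.1 5001 TCP 10.0.0.9 80 12:20:00.000" \
  "10.0.0.2 6001 UDP 10.0.0.9 53 12:06:00.000" \
  "10.0.0.1 5002 TCP 10.0.0.9 80 12:40:00.000" | group_sessions
```

Only the TCP group survives (12:00 to 12:40 is 2400 seconds), so this prints "10.0.0.1 TCP 10.0.0.9 80 = 12:00:00.000 12:20:00.000 12:40:00.000"; the UDP group spans 300 seconds and is dropped. Separate arrays let the END loop visit each group exactly once, instead of skipping the -c and -end entries of a single array.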

AND it runs at ridiculous speed. Maybe on a faster box it would be ludicrous speed, but on my (rather old) test box it processes the roughly 68,000-line file in about 2.25 seconds. Rather fast for the job, I think.

Other tests:

Data Set Size (lines) / Time = Rate (lines/sec)
67,741 / 2.25s = 30,107.11/sec
556,834 / 29.12s = 19,122.05/sec
1,113,668 / 91.95s = 12,111.67/sec
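Each Rate figure is the line count divided by the elapsed seconds. Since POSIX sh arithmetic is integer-only, a quick way to check a row is to let awk do the division; the function name rate is just for illustration.

```shell
# Hypothetical helper: lines per second, given a line count and elapsed seconds.
# POSIX sh arithmetic is integer-only, so the division is done in awk.
rate() {
  awk -v n="$1" -v s="$2" 'BEGIN { printf "%.2f\n", n / s }'
}

rate 556834 29.12   # second row of the table: prints 19122.05
```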



