PLAIN TEXT: the language of *NIX
Post first published in nixtip
Let’s face it, one of the main tasks of an IT administrator is arguably text processing.
Why? Because text is everywhere: computer systems speak text in three different languages, STDOUT, STDERR and STDIN.
We need to develop our skills to build log file miners, to adapt the ugly output of a program to our needs, and so on.
Fortunately, *nix offers many options to reach our goals, so we tend to try many tools until we find the one we are comfortable with.
Usually there’s nothing wrong with that: nowadays, most of the time, the power of the machines allows us to use whichever tool we want and focus on the results.
But what happens when we have to process a 4 GB file, or many files on production systems?
Let’s illustrate this with an example.
You can use this shell script to create the sample text file:
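A minimal sketch of such a generator, assuming each line is simply `line N` (the line format, the function name `make_sample` and the file name `sample.txt` are assumptions chosen for illustration). At roughly 13 bytes per line, 10 million lines come to about 129 MB.

```shell
#!/bin/sh
# make_sample FILE COUNT
# Hypothetical helper: writes COUNT lines of the form "line N" to FILE.
# The awk program runs entirely in BEGIN, so it reads no input.
make_sample() {
    awk -v count="$2" 'BEGIN {
        for (i = 1; i <= count; i++)
            print "line " i
    }' > "$1"
}

# Example call for the 10 million line file used in this post:
#   make_sample sample.txt 10000000
```

Running `wc -l sample.txt` and `ls -lh sample.txt` afterwards is a quick way to check the line count and the size.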
NOTE: the size of the file should be around 130 MB.
Now we have a flat text file with 10 million lines.
Our task is simple: extract line number 5000000.
Let’s benchmark a handful of common approaches:
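As an illustration, here are three common candidates, each written so that it stops reading once the line has been printed. This is a sketch under assumptions: the function names are hypothetical, and no timings are quoted because they depend on the machine.

```shell
#!/bin/sh
# Each helper takes FILE and N and prints line N of FILE.

# sed: print line N, then quit immediately.
nth_sed() {
    sed -n "${2}{p;q;}" "$1"
}

# awk: print the record whose number is N, then exit.
nth_awk() {
    awk -v n="$2" 'NR == n { print; exit }' "$1"
}

# head + tail: head stops after N lines, tail keeps the last one.
# Note: if FILE has fewer than N lines this prints its last line instead.
nth_headtail() {
    head -n "$2" "$1" | tail -n 1
}
```

In bash, each one can be timed with something like `time nth_sed sample.txt 5000000`, where `sample.txt` stands for the sample file created earlier.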
The processing times are quite similar, but if you don’t use the right logic …
We get double the time here! (There is no exit after the line is found, so the rest of the file is still read.)
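The effect of the missing exit can be made visible with a small sketch. Both helpers below (hypothetical names) print the same line, and each reports on STDERR how many lines awk actually read. It relies on awk running the `END` block after `exit`, and on `/dev/stderr` being usable as an output file, which is the case on typical Linux systems.

```shell
#!/bin/sh
# Both helpers take FILE and N and print line N of FILE on STDOUT.
# The END block reports the number of lines read on STDERR.

# No exit: awk keeps scanning until the end of the file.
nth_no_exit() {
    awk -v n="$2" '
        NR == n { print }
        END     { print "lines read: " NR > "/dev/stderr" }
    ' "$1"
}

# Early exit: awk stops as soon as line N has been printed.
nth_early_exit() {
    awk -v n="$2" '
        NR == n { print; exit }
        END     { print "lines read: " NR > "/dev/stderr" }
    ' "$1"
}
```

For a target line in the middle of the file, the first version reads twice as many lines as the second, which matches the doubled processing time.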
Conclusion: don’t worry too much about the tool; care about your programming skills, or be ready to waste precious computing time.