Tuesday, July 24, 2007

Split fasta file into files with one contig per file

This post has moved HERE

Saturday, July 21, 2007

Remove all nonalphanumeric characters from a file

sed ':a; $!N; s/[^a-zA-Z0-9]//g; ta' myfile

This also strips spaces, punctuation and newlines, so the output is one long line.
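
If the goal is just to drop everything that isn't a letter or digit, tr may be a simpler fit. This is only a sketch: strip_nonalnum is a made-up helper name, and LC_ALL=C is an assumption so that the ranges mean plain ASCII.

```shell
# Hypothetical helper: delete (-d) the complement (-c) of the listed
# characters, i.e. everything that is not an ASCII letter or digit.
# Newlines are removed too, just like with the sed loop.
strip_nonalnum() {
    LC_ALL=C tr -cd 'a-zA-Z0-9'
}

printf 'Hello, World!\nline 2\n' | strip_nonalnum
```

On a real file it would be used as: strip_nonalnum < myfile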

Wednesday, July 4, 2007

useful bash shortcuts

ctrl-a: jump to beginning of line
ctrl-e: jump to end of line
alt-f: jump forward a word
alt-b: jump back a word

alt-d: delete from the cursor to the end of the word
alt-t: transpose two words

ctrl-r: search back for a command, hit ctrl-r again to search back further
ctrl-x ctrl-x: jump back to the mark (readline's saved cursor position), again to get back to original position

For ctrl and alt commands, hold ctrl (or alt) and press the indicated key at the same time. More random bash here:
http://blog.webhosting.uk.com/2007/04/08/using-bash-shell-shortcuts/

Sunday, July 1, 2007

Extracting part of a field in awk

I had a file where the lines look like this:

all_initstring0010010100111000/1dca.rule118.iter100.score all_initstring0010010100111000/1dca.rule140.iter100.score: 0
all_initstring0010010100111000/1dca.rule128.iter100.score all_initstring0010010100111000/1dca.rule122.iter100.score: 122312
all_initstring0010010100111000/1dca.rule113.iter100.score all_initstring0010010100111000/1dca.rule143.iter100.score: 3213

I wanted to extract the number after "rule" and before the next "." in the 1st and 2nd fields, and also print the third field. I used awk's gsub substitution function with regular expressions to strip everything except the required value. Here's the code:

gawk '{gsub(/^.*rule/,"",$1); gsub(/[^0-9].*/,"",$1); gsub(/^.*rule/,"",$2); gsub(/[^0-9].*/,"",$2); print $1 " " $2 " " $3}' myfile

There is actually another solution which is probably nicer:

awk '{ split($1,a1,/\./) ; split($2,a2,/\./); print substr(a1[2],5), substr(a2[2],5), $NF; }' myfile
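
To see what the split/substr version does, here is a small sketch that runs it on two of the sample lines above. extract_rules is a made-up wrapper name. The approach assumes "rule<N>" is always the second dot-separated piece of the field, so it would break if the directory part contained a dot.

```shell
# Two sample lines copied from the post.
sample='all_initstring0010010100111000/1dca.rule118.iter100.score all_initstring0010010100111000/1dca.rule140.iter100.score: 0
all_initstring0010010100111000/1dca.rule128.iter100.score all_initstring0010010100111000/1dca.rule122.iter100.score: 122312'

# Hypothetical wrapper around the one-liner: split fields 1 and 2 on ".",
# take the second piece ("rule118"), and drop its first four characters.
extract_rules() {
    awk '{ split($1, a1, /\./); split($2, a2, /\./)
           print substr(a1[2], 5), substr(a2[2], 5), $NF }'
}

result=$(printf '%s\n' "$sample" | extract_rules)
printf '%s\n' "$result"
# prints:
# 118 140 0
# 128 122 122312
```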

linuxjunk