Tuesday, April 1, 2008

Count the number of As, Ts, Gs and Cs at the first base of each sequence in a FASTQ file


awk '{if(n%4 == 1) print $0;n++;}' s_4_sequence.txt | sort | awk '{first = substr($0,1,1); if(first=="A") as++; if(first=="T") ts++; if(first=="C") cs++; if(first=="G") gs++;}END{print as; print ts; print gs; print cs;}'

You don't need the sort, and the FASTQ file is called s_4_sequence.txt here. The counts are printed one per line in the order A, T, G, C.
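For what it's worth, here is a rough sketch of the same count without the sort, done in a single awk pass and with labelled output. It assumes each record is exactly four lines (so the sequence is always the second line of the record) and that bases are upper case. The function name count_first_base is made up for this example.

```shell
# count_first_base: hypothetical helper name, not from the original one-liner.
# Reads FASTQ from the named file(s), or from stdin if none are given.
# Assumes 4-line records: header, sequence, "+", qualities.
count_first_base() {
    awk '
        # The sequence is the 2nd line of every 4-line record.
        NR % 4 == 2 { counts[substr($0, 1, 1)]++ }
        # %d prints 0 for a base that never appeared first.
        END {
            printf "A\t%d\nT\t%d\nG\t%d\nC\t%d\n",
                   counts["A"], counts["T"], counts["G"], counts["C"]
        }
    ' "$@"
}
```

Run it as `count_first_base s_4_sequence.txt`. Anything that is not A, T, G or C in the first position (an N, say) is tallied in the array but not printed.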
