Wednesday, March 21, 2007

Counting the number of instances of a letter in a file

To explain why this might be necessary:

First, I have an XML file with some sequences in it. I grep the XML file for the instances I am interested in and save them to a file, then strip out the leftover XML using vim.

grep -A 5 \<type\>Dis filename.xml | grep \<sequence\> > file_raw_seq.txt

Alternatively, I could use the FASTA file and strip the headers using grep -v \>.
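
As a rough sketch of the FASTA route, assuming header lines start with > and the sequence lines themselves never contain that character. The file names sample.fa and raw_seq.txt are made up for illustration, as is the sample data:

```shell
# Hypothetical two-record FASTA file, just for illustration.
printf '>seq1 example\nMKTAYM\n>seq2 example\nGGMLK\n' > sample.fa

# Keep every line that is not a header (headers start with '>').
grep -v '^>' sample.fa > raw_seq.txt

# Show what is left: only the sequence lines.
cat raw_seq.txt
```

Anchoring the pattern with ^ only drops lines that begin with >, which is a little stricter than the plain grep -v \> above.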

I am now left with the raw text of the sequences from the xml file that I am interested in.

I can now run the following command to count the number of times a given letter, say M, appears in the file:

tr -dc M < input.fa | wc -c
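
To see why this works: tr -d deletes characters, -c complements the set, so tr -dc M deletes everything that is not an M, and wc -c counts the bytes that survive. Here is a small sketch on a made-up file (sample_seq.txt and its contents are hypothetical), plus one possible way to tally every letter at once using fold, sort and uniq:

```shell
# Hypothetical sample standing in for the raw sequence file.
printf 'MKTAYM\nGGMLK\n' > sample_seq.txt

# Count a single letter: delete everything except M, count what remains.
m_count=$(tr -dc 'M' < sample_seq.txt | wc -c)
echo "M: $m_count"

# Tally every letter: drop newlines, put one character per line,
# then sort and count the duplicates.
tr -d '\n' < sample_seq.txt | fold -w 1 | sort | uniq -c
```

Note that the match is case-sensitive, so a lower-case m would not be counted.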


Easy. The alternative was to write a C++ program to do all this, which would have taken considerably longer, especially given that some people haven't committed working versions of their code. I'm looking at you, Mr. TreeCreate and Mr. IntVector.
