I start with an XML file that has some sequences in it. I grep the file for the entries I am interested in and save them to a file, then strip out the leftover XML tags using vim.
grep -A 5 \<type\>Dis filename.xml | grep \<sequence\> > file_raw_seq.txt
Alternatively I could use the FASTA file and strip the headers using grep -v \>. Either way, I am now left with the raw text of the sequences that I am interested in.
I can now run the following command on that header-free file to count the number of times, say, M appears in it:
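The tag stripping I did by hand in vim could probably be scripted too. Here is a rough sketch with sed, assuming each sequence sits on a single line wrapped in tags like <sequence>...</sequence>. The function name strip_tags is made up for illustration.

```shell
# strip_tags: read lines on stdin, remove anything that looks like an
# XML tag (<...>) and any leading whitespace, print what remains.
# Assumes tags do not span multiple lines.
strip_tags() {
    sed -e 's/<[^>]*>//g' -e 's/^[[:space:]]*//'
}
```

It would slot onto the end of the grep pipeline, e.g. `... | grep \<sequence\> | strip_tags > file_raw_seq.txt`.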
tr -dc M < input.fa | wc -c
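If the input still has FASTA headers in it, any M in a header line gets counted as well. A small wrapper can drop the headers first. This is only a sketch, and count_residue is a name I made up; it just chains the grep -v and tr commands from above.

```shell
# count_residue LETTER FILE
# Print how many times LETTER occurs in FILE, ignoring FASTA header
# lines (lines that start with '>').
count_residue() {
    # drop headers, delete everything except LETTER, count what is left;
    # the final tr removes the padding some wc implementations add
    grep -v '^>' "$2" | tr -dc "$1" | wc -c | tr -d ' '
}
```

Usage would be something like `count_residue M input.fa`.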
Easy. The alternative was to write a C++ program to do all this, which would have taken considerably longer, especially given that some people haven't committed working versions of their code. I'm looking at you, Mr. TreeCreate and Mr. IntVector.