Tuesday, August 24, 2010

vim magic

So I have a bunch of sequences in FASTA format, and I need to rearrange the mo'fos.

To start with, the header lines look like this (the fields are tab-separated):
>F13C5.1        CE19383 WBGene00017422  status:Partially_confirmed      UniProt:O76564  protein_id:AAC64611.1


Run this vim command:
:%s:>\(\S\{4,}\)\t.*UniProt\:\(\S\{6,}\).*$:>\1_CAEEL__\2:g


And now they look like this, i.e. >geneName_OrgID__UniProtAccNo:
>F13C5.1_CAEEL__O76564
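
If you'd rather do the same thing outside vim, here's a rough Python sketch of that substitution. It assumes the header fields are separated by tabs (the vim pattern needs a \t right after the gene name), and the names rewrite_header and org_id are just made up for this example:

```python
import re

# Mirrors the vim pattern: ">" + gene name (4+ non-space chars) + a tab,
# anything up to "UniProt:", then the accession (6+ non-space chars),
# then the rest of the line.
HEADER_RE = re.compile(r">(\S{4,})\t.*UniProt:(\S{6,}).*$")


def rewrite_header(line, org_id="CAEEL"):
    """Return the shortened header; lines that don't match come back unchanged."""
    return HEADER_RE.sub(
        lambda m: f">{m.group(1)}_{org_id}__{m.group(2)}", line
    )


# The example header from above, with assumed tab separators.
example = (
    ">F13C5.1\tCE19383\tWBGene00017422\tstatus:Partially_confirmed"
    "\tUniProt:O76564\tprotein_id:AAC64611.1"
)
print(rewrite_header(example))
```

Sequence lines and headers without a UniProt field don't match the pattern, so they pass through untouched, same as with the vim command.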
