Monday, June 13, 2011

Sum weights

I have a file with a bunch of sequences and some weights at the top of the file:

>WEIGHTS 0.926434 1.000000 1.000000 0.926434 1.000000 0.892712 1.000000 1.000000 1.000000 1.000000 1.000000 0.892712 
>CRTC_EUGGR__Q9ZNY3 Calreticulin precursor.
XRKELWXXXXXXXXXXXXXXXXXXXXXXXXXTRWTHSTXXSDYXKFKLTSGKFYGDKAKDAGIQTSQDAKFYAISSPIASXXSXEXXXLVLQFSVKHXXXXXXGXGXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXKXEPRCEXDTLSHTYXAXXXXDXXXEVLVDQVKKESGTLEEDWEILKPKTIPDPEDKKPADWVDEPDMVDPEDKKPEDWDKEPAQIPDPDATQPDDWDEEEDGKWEAPMISNPKYKGEWKAKKIPNPAYKGVWKPRDIPNPEYEADDKVXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXFYDQTNGATKDAEKKAFDSAEADKRKKEEDERKKQEEEEKKTAEEDEXXXDEXXXEDDKKDEL
>HSP47_RAT__P29457 47 kDa heat shock protein precursor (Collagen-binding protein 1) (GP46).
XRSLXXXXXXXXXXXXXXXXXXXXEAAAPGTAEKLSSKATTLAEXSXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXKXXXXSQAKAVLSAEKLRDEEVHTGLGELVRSLSNXTARNVTWKLGSXXXXXXXXSFADDFVRSSKQHYNCEHSKINFRDKRSALQSINEWASQTTDGKLPEVTKDVERTDXXLLXXAMXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXYXYXXXXXXXXQXVEMXXXXXXXXXXXXXXXXXXXXXRLEKXXTKXQLKTWMGKMQKKAXXISLPXGVVXVTHDLQKXXAGLGLTEAIXKNKADLSXXSGXXXXXXXXXXXXXXXEWDTEGNPFDQDIYGRXXXRSXXXXXXXXXXXXXXXXXXXXXXXXIGRLXXXXGDKMRDEL
>ENPL_PIG__Q29092 Endoplasmin precursor (94 kDa glucose-regulated protein) (GRP94) (GP96 homolog) (98 kDa protein kinase) (PPK 98) (ppk98).
XRAXXXXXXXXXXXXXXXXXXXXEVDVDGTVEEDLGKSREGSRTDDEVVQREEEAIQLDGLNASQIRELREKSEKXAFXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXELTVKXKCDKEKNLLHVTDTGVGMTREELVKNLGTIAKSGTSEFLNKMAEAQEDGQSTSELIGXXXXXXXXXXXXXXXXXVTXXHNNDTQHIWESDSNEFSVIADPRGNTLGRGTTITLXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXSXKTETVEXPMXXXXAAKXEKEESDDEAAVXXXXXEKXPXTXKVEKTVWDWELMNDIKPIWQRPSKEVEDDEYKAFYKSFSXXXXXPMAYIHFTXXXXXXXXXILXXXXXXXXXLFDEYXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXLNVSREXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXGVIXDHXXXXRLAKLLRFQSSHHPSDITSLDQYVERMKEKQDKIYFMAGSSRKEAESSPFVEXXXXXXXXXXXXXXXXXXXXXQALPXXXXKRFQNVAKEGVKFDESEKSKENREAVEKEFEPLLNWMKDKALXDKIEKAVVSQRXXEXXXXLVASQYGWXGNXERIMKAQAYQTGKDISTNYYASQKKTFEINPRHPLIRDMLRRVKEDEDDKXXSDLXXXXXXXXXXXXXXLLPDTKAYXXRIERMLRLSLNIDPDAKVEXXPXXXPXXTTEDTTEDTEQDDDEEMDAGADEXXQXTSETSTAEKDEL
>CRTC_HUMAN__P27797 Calreticulin precursor (CRP55) (Calregulin) (HACBP) (ERp60) (grp60).
XLXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXTSXXIESXXXSDFXXFVLSSGKFYGDEEKDKGLQTSQDARFYALSASFEXXSXXXXXLVVXFXXKHXXXXXXGGGYVKLFPNSLDQTDMHGDSEYNIMFGPDIXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXTYEVKIDNSQVESGSLEDDWDFLPPKKIKDPDASKPEDWDERAKIDDPTDSKPEDWDKPEHIPDPDAKKPEDWDEEMDGEWEPPVIQNPEYKGEWKPRQIDNPDYKGTWIHPEIDNPEYSPDPSIYXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXAYAEEFGNETWGVTKAAEKQMKDKQDEEQRLKEEEEDKKRKEXXXAXDKEDXEXKXEDXXDXXDKXXDXXEDVPGQAKDEL


I want to sum the weights. Doing that by hand is fine for this example with 12 sequences, but some of my files have a couple of hundred entries. Enter Bash:

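 head -n 1 filename.txt | awk '{for (i=2; i<=NF; i++) s += $i} END {printf "%.6f\n", s}'

Starting the loop at field 2 skips the >WEIGHTS tag itself, and printf keeps all six decimals (a plain print would round to six significant digits). That gives me the sum of the weights! Woohoo.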
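If awk isn't handy, the same sum can be sketched in Python. This assumes, as in the example file, that the weights are whitespace-separated numbers on a line starting with >WEIGHTS; the function name sum_weights is just made up for illustration. Unlike the head/awk one-liner, it looks for the >WEIGHTS line instead of trusting that it is line 1.

```python
import sys


def sum_weights(lines):
    """Sum the numbers on the first line that starts with '>WEIGHTS'.

    Assumes the weights are whitespace-separated floats following the
    '>WEIGHTS' tag, as in the example file above.
    """
    for line in lines:
        if line.startswith(">WEIGHTS"):
            # Drop the '>WEIGHTS' tag, convert the remaining fields to floats.
            return sum(float(w) for w in line.split()[1:])
    raise ValueError("no >WEIGHTS line found")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python sum_weights.py filename.txt
    with open(sys.argv[1]) as handle:
        print(f"{sum_weights(handle):.6f}")
```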

Saturday, April 16, 2011

Monday, November 29, 2010

Metasploit simple example (3.5.0)

Download and install the Metasploit Framework from here: http://www.metasploit.com/framework/download/

Run msfconsole and type the following to run exploits:

db_driver sqlite3
db_connect
db_nmap [ip address]
db_autopwn -p -e -t

Saturday, November 27, 2010

Tuesday, November 23, 2010

Tuesday, August 24, 2010

vim magic

So I have a bunch of sequences in FASTA format, and I need to rearrange the mo'fos.

To start with, the header lines look like this:
>F13C5.1        CE19383 WBGene00017422  status:Partially_confirmed      UniProt:O76564  protein_id:AAC64611.1


Run this vim command:
:%s:>\(\S\{4,}\)\t.*UniProt\:\(\S\{6,}\).*$:>\1_CAEEL__\2:g


And now they look like this:
>F13C5.1_CAEEL__O76564

That is, >geneName_OrgID__UniProtAccNo, with the organism code CAEEL hard-coded in the replacement.
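For the record, here is a rough Python equivalent of that vim substitution, handy for running over lots of files. It is a sketch that assumes the header fields are tab-separated (the vim pattern relies on the \t too) and keeps CAEEL as the default organism code; rename_header and rename_fasta are illustrative names, not from any library. Only lines starting with > are touched, so sequence lines pass through unchanged.

```python
import re

# Same idea as the vim pattern: a gene name of 4+ non-space characters right
# after '>', then a tab, then anything up to 'UniProt:' followed by an
# accession of 6+ non-space characters. Assumes tab-separated header fields.
HEADER = re.compile(r"^>(\S{4,})\t.*UniProt:(\S{6,}).*$")


def rename_header(line, org_id="CAEEL"):
    """Rewrite a matching header to >geneName_OrgID__UniProtAccNo.

    Non-matching lines are returned unchanged; a trailing newline is kept
    because '.' does not match it and '$' matches just before it.
    """
    return HEADER.sub(rf">\1_{org_id}__\2", line)


def rename_fasta(lines, org_id="CAEEL"):
    """Yield lines with every header line ('>...') rewritten."""
    for line in lines:
        yield rename_header(line, org_id) if line.startswith(">") else line
```

Usage would be something like writing "".join(rename_fasta(open("in.fasta"))) to a new file (the file name is only an example).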