For HTML files, this is the command I use on Mac OS X (textutil ships with OS X, so it won't be available on a stock Linux install).
textutil -convert txt -strip printableArticle.jhtml.html
Converting to txt should remove most of the HTML tags while preserving the formatting; the -strip option leaves the document metadata out of the output. I suggest using the printable version of pages from the web, as it usually has the complete text and no ads. There should also be fewer links, markup, pictures, etc. I guess a more Unix-style approach should appear in the comments.
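If you have a whole folder of saved pages, the same command can be wrapped in a loop. This is only a rough sketch: the function name html_to_txt is made up for illustration, and it assumes textutil is on your PATH (i.e. you are on Mac OS X) and that the saved pages end in .html.

```shell
# html_to_txt DIR: run the textutil command above on every .html file in DIR.
# (html_to_txt is a hypothetical helper name, not part of textutil.)
html_to_txt() {
  dir=${1:-.}                      # default to the current directory
  for f in "$dir"/*.html; do
    [ -e "$f" ] || continue        # skip if the glob matched nothing
    textutil -convert txt -strip "$f"
  done
}
```

Calling html_to_txt ~/articles should leave a .txt file next to each .html file, since textutil writes its output alongside the input by default.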
For PDFs I use pdftotext, which comes with the Xpdf / Poppler command-line tools.
pdftotext -layout filename.pdf
The -layout option keeps most of the original formatting intact. FBReader is an excellent program for reading on the Nokia tablets.
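To convert a batch of PDFs before copying them over to the tablet, the pdftotext command can be looped in the same way. Again just a sketch: pdf_to_txt is a made-up helper name, and it assumes pdftotext is installed and that the files end in a lowercase .pdf.

```shell
# pdf_to_txt DIR: run pdftotext -layout on every .pdf file in DIR.
# (pdf_to_txt is a hypothetical helper name, not part of the PDF tools.)
pdf_to_txt() {
  dir=${1:-.}                      # default to the current directory
  for f in "$dir"/*.pdf; do
    [ -e "$f" ] || continue        # skip if the glob matched nothing
    # Name the output explicitly: report.pdf -> report.txt
    pdftotext -layout "$f" "${f%.pdf}.txt"
  done
}
```

Run as pdf_to_txt ~/papers, this should write one .txt per PDF into the same folder.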