Wednesday, February 7, 2007

Namazu Config

Namazu is a full-text search engine. Whilst it's no replacement for find and grep in most cases, it is useful for searching a directory of PDF files, and the index can be made available through a web interface, which is handy if you have a lot of PDFs.

Download and install the program; it's available for most distributions. You will need to edit the config file namazurc and the index config file mknmzrc. I needed to raise the maximum file size settings to accommodate my PDFs.

In the mknmzrc file I changed the following lines:
# The max file size for indexing. Files larger than this
# will be ignored.
# NOTE: This value is usually larger than TEXT_SIZE_MAX because
# binary-formated files such as PDF, Word are larger.
$FILE_SIZE_MAX = 10000000;

# The max text size for indexing. Files larger than this
# will be ignored.
$TEXT_SIZE_MAX = 6000000;
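
Before picking a limit it can help to see which files would be skipped. The following is a rough sketch, not part of Namazu: `list_oversized` is a hypothetical helper, the directory is the one used later in this post, and the byte limit simply mirrors the $FILE_SIZE_MAX value above. It only compares raw file sizes, so treat it as an approximation of what mknmz will ignore.

```shell
#!/bin/sh
# Hypothetical helper (not part of Namazu): list files larger than a byte limit.
# Usage: list_oversized DIR LIMIT_IN_BYTES
list_oversized() {
    dir=$1
    limit=$2
    # "-size +Nc" matches files whose size is greater than N bytes.
    find "$dir" -type f -size +"${limit}c" -print
}

# Assumed location of the documents; change to suit your setup.
DOCS_DIR="${DOCS_DIR:-$HOME/work/articles}"

if [ -d "$DOCS_DIR" ]; then
    # 10000000 mirrors $FILE_SIZE_MAX from mknmzrc.
    list_oversized "$DOCS_DIR" 10000000
fi
```

Any file this prints is a candidate for being ignored at the current limit, so either raise $FILE_SIZE_MAX or accept that those files will not be searchable.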

I then created an index directory in my home directory. If you are setting this up for a web server you may wish to keep the index files elsewhere. With the directory in place, I ran the mknmz program:

mknmz -O ~/.namazu/index/ ~/work/articles/

This creates the index of the articles directory and stores it in the .namazu/index directory. From the command line you can now enter:
namazu "epitope not cell"

Not that that search makes any sense, but you get the idea.
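
The indexing and search steps above could be collected into a small script along these lines. This is a sketch under a few assumptions: the paths are the ones from this post, the index directory is passed to namazu explicitly in case namazurc does not point at it, and the script skips indexing if mknmz is not installed or the articles directory is missing.

```shell
#!/bin/sh
# Paths from the post; override via the environment if yours differ.
INDEX_DIR="${INDEX_DIR:-$HOME/.namazu/index}"
DOCS_DIR="${DOCS_DIR:-$HOME/work/articles}"

# Create the index directory first, as described above.
mkdir -p "$INDEX_DIR"

if command -v mknmz >/dev/null 2>&1 && [ -d "$DOCS_DIR" ]; then
    # Build the index for the articles directory.
    mknmz -O "$INDEX_DIR" "$DOCS_DIR"
    # Query it, naming the index directory explicitly.
    namazu "epitope not cell" "$INDEX_DIR"
else
    echo "mknmz not found or $DOCS_DIR missing; skipping indexing" >&2
fi
```

Running the script again after adding new PDFs is the simplest way to keep the index current.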
