Showing posts with label namazu.

Monday, February 19, 2007

Crontab

Crontab is a useful program for scheduling routine tasks, such as running scripts at set times. Building on the search installation from last week, we will write a crontab entry that updates the namazu index daily.

First of all, check which jobs are currently scheduled:

user@compy:~> crontab -l
# DO NOT EDIT THIS FILE - edit the master and reinstall.
# (/tmp/crontab.XXXX8NR6OC installed on Mon Feb 5 17:51:55 2007)
# (Cron version V5.0 -- $Id: crontab.c,v 1.12 2004/01/23 18:56:42 vixie Exp $)
0 5 * * * /home/user/.backup.sh


This is the rsync job from the other week. Next, we need to create a script that updates the namazu index. It is a simple shell script containing the following command:
#!/bin/bash
# Index ~/work/articles/ and write the namazu index files to ~/.namazu/index/
mknmz -O ~/.namazu/index/ ~/work/articles/

Save this file somewhere such as ~/.namazu/.index.sh and make it executable (chmod +x ~/.namazu/.index.sh), since cron cannot run the script otherwise. Then add it to the crontab in the same way as the rsync script: type crontab -e and add the following line:
0 5 * * * /home/user/.namazu/.index.sh
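The five fields before the command are minute, hour, day of month, month and day of week, so 0 5 * * * means 05:00 every day of every month. To make that layout concrete, here is a small bash sketch; describe_cron_line is a hypothetical helper name, not part of cron, and it only splits the fields without interpreting ranges or steps.

```shell
#!/bin/bash
# describe_cron_line LINE  (hypothetical helper, not part of cron)
# Splits a user crontab entry into its five schedule fields and the command.
# Fields are printed verbatim, so values like */10 or 1-5 are not interpreted.
describe_cron_line() {
    local minute hour dom month dow cmd
    # read splits on whitespace; everything after the fifth field goes into cmd.
    # Words read this way are not glob-expanded, so * stays a literal *.
    read -r minute hour dom month dow cmd <<< "$1"
    printf 'minute=%s\n' "$minute"
    printf 'hour=%s\n' "$hour"
    printf 'day_of_month=%s\n' "$dom"
    printf 'month=%s\n' "$month"
    printf 'day_of_week=%s\n' "$dow"
    printf 'command=%s\n' "$cmd"
}

describe_cron_line '0 5 * * * /home/user/.namazu/.index.sh'
# minute=0
# hour=5
# day_of_month=*
# month=*
# day_of_week=*
# command=/home/user/.namazu/.index.sh
```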

Then type crontab -l to check that it has been added. You should now have a namazu index that updates every day at 5 AM.
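If you would rather add the entry without opening an editor (for example from a setup script), a common pattern is to pipe the current table through a filter into crontab -, which installs a crontab read from standard input. Below is a minimal bash sketch; merge_cron_line is a hypothetical helper name. It appends the line only if an identical line is not already present, and it drops the three header comments that some cron versions (like the one shown above) print with crontab -l, so they are not copied back in as ordinary comments.

```shell
#!/bin/bash
# merge_cron_line LINE  (hypothetical helper)
# Reads an existing crontab on stdin and prints it back with LINE appended,
# unless an identical line is already there.
merge_cron_line() {
    local line="$1"
    local current
    # Keep everything except the auto-generated header comments.
    current=$(grep -v -e '^# DO NOT EDIT THIS FILE' \
                      -e '^# (.* installed on ' \
                      -e '^# (Cron version')
    if [ -n "$current" ]; then
        printf '%s\n' "$current"
    fi
    # -x: whole-line match, -F: fixed string, so the * characters are literal.
    if ! printf '%s\n' "$current" | grep -qxF -- "$line"; then
        printf '%s\n' "$line"
    fi
}
```

Usage, assuming the helper above is loaded: crontab -l 2>/dev/null | merge_cron_line '0 5 * * * /home/user/.namazu/.index.sh' | crontab - (the 2>/dev/null hides the "no crontab" message on a fresh account). Running it twice leaves a single entry; check the result with crontab -l afterwards.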

Wednesday, February 7, 2007

Namazu Config

Namazu is a full-text search engine. Whilst it is no replacement for find and grep in most cases, it is useful for searching a directory of PDF files, and the search can be made available through a web interface, which is handy if you have a lot of PDFs.

Download and install the program; it is available for most distributions. You will need to edit the config file namazurc and the index config file mknmzrc. I needed to raise the maximum file size limits to accommodate my PDFs.

In the mknmzrc file I changed the following lines:
# The max file size for indexing. Files larger than this
# will be ignored.
# NOTE: This value is usually larger than TEXT_SIZE_MAX because
# binary-formatted files such as PDF, Word are larger.
#
$FILE_SIZE_MAX = 10000000;

#
# The max text size for indexing. Files larger than this
# will be ignored.
#
$TEXT_SIZE_MAX = 6000000;
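Before raising the limits, it can help to see which files would actually be skipped. The sketch below (list_oversized is a hypothetical helper name) lists regular files under a directory that are larger than a given number of bytes, which, going by the comment in mknmzrc, is what mknmz ignores when that number is $FILE_SIZE_MAX. It only looks at size on disk, so it says nothing about $TEXT_SIZE_MAX.

```shell
#!/bin/bash
# list_oversized DIR LIMIT  (hypothetical helper)
# Prints regular files under DIR whose size on disk is strictly greater than
# LIMIT bytes, e.g. the files mknmz would ignore with $FILE_SIZE_MAX = LIMIT.
list_oversized() {
    local dir="$1" limit="$2"
    # -size +Nc: more than N bytes (the c suffix means bytes, so no rounding).
    find "$dir" -type f -size +"${limit}c" -print
}
```

For example, list_oversized ~/work/articles 10000000 prints any article that would still be too large with the setting above.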



I then created an index directory, ~/.namazu/index/, in my home directory. If you are setting this up for a web server, you may wish to keep the index files elsewhere. Then I ran the mknmz program:

mknmz -O ~/.namazu/index/ ~/work/articles/


This indexes the articles directory and stores the index in ~/.namazu/index/. From the command line you can now enter:
namazu "epitope not cell"


Not that that search makes any sense, but you get the idea.
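If you do not want the search to depend on the default index configured in namazurc, namazu also accepts index directories after the query. A minimal bash sketch, assuming the index location used above; has_nmz_index, nsearch and NMZ_INDEX are hypothetical names. has_nmz_index checks that the directory contains the NMZ.* files mknmz writes, and nsearch then runs the query against that directory.

```shell
#!/bin/bash
# Hypothetical helpers; NMZ_INDEX defaults to the index directory used above.
NMZ_INDEX="${NMZ_INDEX:-$HOME/.namazu/index}"

# has_nmz_index DIR: succeed if DIR contains at least one NMZ.* file.
has_nmz_index() {
    local f
    for f in "$1"/NMZ.*; do
        # With no match, the unexpanded pattern is left in $f and -e fails.
        [ -e "$f" ] && return 0
    done
    return 1
}

# nsearch QUERY...: run namazu with the query against NMZ_INDEX explicitly.
nsearch() {
    if ! has_nmz_index "$NMZ_INDEX"; then
        echo "no namazu index in $NMZ_INDEX; run mknmz first" >&2
        return 1
    fi
    namazu "$*" "$NMZ_INDEX"
}
```

With these loaded, nsearch epitope not cell runs the same query as above against ~/.namazu/index.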