Penetrator
==========

 Penetrator - Personal search engine
 Copyright (C) 2000-2002 Angel Ortega <angel@triptico.com>
 Home Page: http://www.triptico.com/software/penetrator.html

This software is released under the GPL and comes with NO WARRANTY. See the file 'COPYING' for details.

IMPORTANT NOTE: if upgrading from Penetrator 2.x
------------------------------------------------

The SQL table layout of Penetrator 3.x has changed from the 2.x series,
so you'll need to recreate the tables (-z option) or nothing will work.
If you want to preserve their current contents, you'll need to dump out
your data and manually restore it; consult your database's documentation
for information on how to do this. Also be sure to read the
penetratorrc.sample file included here, as it contains very important
information about database configuration that has changed. Penetrator
3.x depends more tightly on database features than 2.x. If you use any
of the DBM drivers, you don't have to worry about any of this.

Apart from this, Penetrator 3.x includes many improvements over 2.x:
query results sorted by relevance (DBI driver only), much faster
indexing, a query cache and more. Read on.

Intro
-----

Penetrator is a tool for indexing big trees of text files, such as your
local HTML documentation or your home directory. It can store its index
in DBM files or in DBI databases, and can be used either as a command-line
tool or as a CGI. The files to be indexed can be selected by extension or
by using an external file identification program such as /usr/bin/file.
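
The two selection methods can be pictured with the minimal Python sketch below. It is only an illustration under stated assumptions, not Penetrator's own (Perl) code: the extension list, the function names and running "file --brief --mime-type" as the external identifier are hypothetical choices; the options Penetrator really accepts are described in penetratorrc.sample.

```python
import os
import subprocess

# HYPOTHETICAL extension list; Penetrator reads its real one from the configuration file.
INDEXED_EXTENSIONS = {".html", ".htm", ".txt"}


def selected_by_extension(path, extensions=INDEXED_EXTENSIONS):
    """Accept a file whose lowercased extension is in the list."""
    return os.path.splitext(path)[1].lower() in extensions


def selected_by_file_command(path, command=("file", "--brief", "--mime-type")):
    """Ask an external identifier (ASSUMED here: file(1)) whether the file is text."""
    try:
        result = subprocess.run([*command, path], capture_output=True, text=True)
    except FileNotFoundError:  # the identifier program is not installed
        return False
    return result.returncode == 0 and result.stdout.strip().startswith("text/")


def files_to_index(root, use_file_command=False):
    """Walk a directory tree and yield the paths that pass the chosen test."""
    accept = selected_by_file_command if use_file_command else selected_by_extension
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if accept(path):
                yield path
```

Selecting by extension is cheap but trusts file names; asking an identification program costs one process per file but also catches text files with unusual names.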

Penetrator's console version tries to emulate grep (it shares many of
that command's arguments), while the CGI behaves like a typical web
search engine.

This program doesn't want to be Google; that is, it is not meant to
index remote sites, as it needs filesystem access to the documents it
indexes. This may change, but not soon.

Installation
------------

*NOTE:* the installation method has changed in version 3.1.x.

Penetrator is installed like any other Perl module:

	 $ perl Makefile.PL
	 $ make
	 # make install

This will install Penetrator.pm in your Perl module directory and the
penetrator script in /usr/local/bin. If you want to use the CGI
interface, you'll have to manually copy the script to your CGI directory
and rename it to penetrator.cgi (a symbolic link will also work).

Operation
---------

To make it work, you need to create a configuration file. It can live
in /etc/penetratorrc, ./penetratorrc (CGI only) or $HOME/.penetratorrc,
or you can point to another one with a command-line option. A sample
file that you can tune to fit your needs is included with the package.
It contains a lot of information; read it carefully, as it covers
details that are not explained here.

Once you have finished, you must first create the index with

	 $ penetrator -z -v

(The -v option just enables verbose output; you can safely omit it.)
Note that this step is not really necessary if you use the DBM drivers,
but as it is mandatory for the DBI driver, it's a good habit to get into.

Then you must build the index with

	 $ penetrator -r -v

Again, -v just enables verbose output. Depending on the size of the
trees being indexed, this process can be very time-consuming the first
time; later rebuilds are faster, as only the changed files are
re-indexed. You can run this command from a crontab if you want.
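
This README does not say how Penetrator decides that a file has changed. The Python sketch below assumes, purely for illustration, that the modification times recorded during the previous run are compared with the current ones; the function name and the record format are hypothetical.

```python
import os


def changed_files(paths, previous_mtimes):
    """Compare current modification times with those recorded last run.

    ASSUMPTION: mtime-based change detection is illustrative only.
    Returns (changed, removed, current): files that are new or modified,
    files recorded last time but no longer present, and the new record.
    """
    current = {}
    changed = []
    for path in paths:
        mtime = os.stat(path).st_mtime
        current[path] = mtime
        # New files have no previous entry, so they count as changed too.
        if previous_mtimes.get(path) != mtime:
            changed.append(path)
    removed = [path for path in previous_mtimes if path not in current]
    return changed, removed, current
```

Only the files in changed would need re-indexing, entries for removed files can be dropped from the index, and the returned record is kept for the next run.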

Once you have an index, you can query it. From the command line
(recommended for the first test), just run

	 $ penetrator <search words>

If you use more than one word, only the documents matching *all* of them
will be shown. This command tends to be quite verbose, as it shows all
the lines that contain any of the chosen words; its output is very much
like grep's. If you only want to see the lines containing *all* the
search words (that is, to exclude lines missing any of them), use

	 $ penetrator <search words> -s

and the output will be less verbose. If you just want the names of the
files that match all the words, use

	 $ penetrator <search words> -l

which, again, is very similar to the output of grep -l.
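
The difference between the default output and -s can be sketched in Python as follows, for a document that already matched the query. This is an illustration, not Penetrator's code; in particular, splitting lines into lowercase words with a regular expression is an assumption, as this README does not describe how Penetrator tokenizes text or whether matching is case-sensitive.

```python
import re

# ASSUMED tokenization: runs of word characters, compared case-insensitively.
WORD_RE = re.compile(r"\w+")


def words_in(text):
    """Return the set of lowercase words in a piece of text."""
    return {word.lower() for word in WORD_RE.findall(text)}


def matching_lines(lines, terms, require_all=False):
    """Yield (line number, line) for the lines of a matched document to show.

    require_all=False mimics the default output (lines with ANY search word);
    require_all=True mimics -s (only lines with ALL the search words).
    """
    wanted = {term.lower() for term in terms}
    for number, line in enumerate(lines, start=1):
        found = wanted & words_in(line)
        keep = found == wanted if require_all else bool(found)
        if keep:
            yield number, line
```

For the lines "Perl and DBI" and "only perl here", a search for perl and dbi shows both lines by default, but only the first one with require_all=True.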

Words can also be prefixed with - or ~ to exclude the documents
containing them from the final result.
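
Document-level matching, including the - and ~ exclusion prefixes, can be sketched with a small in-memory inverted index in which each word maps to the set of documents containing it. This Python sketch is hypothetical: Penetrator keeps its index in DBM files or a DBI database, and its tokenization may differ from the regular expression assumed here. Like -l, it returns file names only.

```python
import re
from collections import defaultdict

# ASSUMED tokenization: runs of word characters, compared case-insensitively.
WORD_RE = re.compile(r"\w+")


def build_index(documents):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in WORD_RE.findall(text):
            index[word.lower()].add(name)
    return index


def parse_query(args):
    """Split search words into required and excluded sets.

    A leading '-' or '~' marks a word whose documents must be excluded.
    """
    required, excluded = set(), set()
    for arg in args:
        if len(arg) > 1 and arg[0] in "-~":
            excluded.add(arg[1:].lower())
        else:
            required.add(arg.lower())
    return required, excluded


def search(index, args):
    """Return, like -l, the sorted names of documents that contain every
    required word and none of the excluded ones."""
    required, excluded = parse_query(args)
    if not required:
        return []
    # A document must contain *all* required words: intersect their sets.
    # set.intersection() returns a new set, so the index is never modified.
    hits = set.intersection(*(index.get(word, set()) for word in required))
    # Remove every document that contains an excluded word.
    for word in excluded:
        hits -= index.get(word, set())
    return sorted(hits)
```

With the documents a.html ("Perl and DBI notes") and b.txt ("Perl only"), searching for perl -dbi returns only b.txt.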

CGI
---

The penetrator.cgi script works like most CGIs, so just give it a try.
It also looks for a 'penetratorrc' file in the current directory, so
it's a good idea to configure one for it. Again, take a look at the
sample configuration file to see what you can do.

You can add a search box that calls penetrator.cgi to a page with the following HTML:

 <form action='/cgi-bin/penetrator.cgi'>
 <i>Penetrator</i> Search
 <input name='query' value=''>
 <input type='submit' value='Search'>
 </form>
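
Since the form does not set a method, browsers submit it with GET, sending the search words as a single query parameter (the unnamed submit button is not sent). A results page can therefore also be linked to directly. A minimal Python sketch that builds such a link, with a hypothetical host name as placeholder:

```python
from urllib.parse import urlencode

# HYPOTHETICAL host; the form only specifies the path /cgi-bin/penetrator.cgi.
SEARCH_URL = "http://localhost/cgi-bin/penetrator.cgi"


def search_link(words):
    """Return the URL a browser requests when the form is submitted with `words`."""
    # urlencode() percent-encodes the value and turns spaces into '+',
    # as in application/x-www-form-urlencoded form submissions.
    return SEARCH_URL + "?" + urlencode({"query": words})


print(search_link("perl dbi"))  # prints http://localhost/cgi-bin/penetrator.cgi?query=perl+dbi
```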

Notes
-----

 * If you index your monster, million-cash-earning-by-day site with
   Penetrator, I'll be glad to take a look at it. If you just don't
   earn millions with your web site, then go away, loser. (But let me
   take a look anyway.)
 * You can see a good example of a database using Penetrator at the
   English Horror Literature Database (http://www.interregno.com/horror/).

---
Angel Ortega http://www.triptico.com
