PHP Website Search Script - Explanation
This is the same script used on this site. It's very simple compared to most others...
Most site search scripts are based on search engines. They spider and index pages into a database or file, then when a query is entered they query the database for the search terms.
The index/query approach has three main benefits:
- It's very fast to get results out of.
- The search can perform logical operations on the keywords. So you can search for 'this' AND 'that', or for 'this' OR 'that' but NOT 'the other', and so on...
- You can order the results by some other factor - like 'relevancy'.
My approach in this script is different. It searches the files on this website in real time. I.e. it opens, reads and compares, then closes every file in the start directory, and all the sub-directories under it.
This 'real-time' search's benefits:
- The code is way simpler.
- Its results are always up to date. (Search engines need to re-spider periodically.)
...and three main disadvantages:
- It uses a lot more computing power to get the results. This may cause it to seem slow on large sites, or annoy your webhost by slowing down the whole machine.
If your site's not too busy, and it's not being used ten times a second - this approach is perfectly OK. If your site is huge - you may want to use a script which indexes then queries...
- It only caters for exact matches, although it is case insensitive. (I think exact matches are fine!)
- There's no ordering of results. There's no measure of 'relevancy'.
This only matters for large data sets, and it would be easy to add, for example, a check on the number of occurrences of the search term, or its position in the text etc...
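As a rough illustration of the 'number of occurrences' idea, here is a minimal sketch. It is not part of the script: the helper count_matches() and the $pages array are made up for this example, and it assumes you already have each page's text in memory.

```php
<?php
// Hypothetical helper (not in the original script): counts how often the
// search term appears in a page's text, ignoring case.
function count_matches(string $text, string $term): int
{
    if ($term === '') {
        return 0; // substr_count() does not accept an empty search string
    }
    return substr_count(strtolower($text), strtolower($term));
}

// Made-up sample data: file name => page text.
$pages = [
    'about.php'  => 'Search tips. Nothing else here.',
    'search.php' => 'Search, search and search again.',
];

$scores = [];
foreach ($pages as $name => $text) {
    $scores[$name] = count_matches($text, 'search');
}
arsort($scores); // sort by count, highest first, keeping the file names as keys
```

Sorting by count is only one possible measure of 'relevancy'; the position of the first match could be used in the same way.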
FAQ
- Q: The script doesn't find any files / I need the script to work with .html files.
- A: You need to change the line which says:
if ($file_ext == 'php') search_file($s_dir.$file);
to include the (last 3) characters in the file extension - i.e. 'tml' or 'htm'
if ($file_ext == 'php' || $file_ext == 'tml') search_file($s_dir.$file);
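If comparing the last 3 characters feels fragile, one alternative is to compare the whole extension. The sketch below is only a suggestion: is_searchable() and its list of extensions are hypothetical and do not appear in the script, so you would have to wire it into the loop yourself.

```php
<?php
// Hypothetical helper, not from the original script: decides whether a file
// should be searched by looking at its full extension rather than the last
// three characters of its name.
function is_searchable(string $file): bool
{
    $allowed = ['php', 'htm', 'html']; // assumed list; edit to suit your site
    $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
    return in_array($ext, $allowed, true);
}
```

With this, 'index.html', 'index.htm' and 'INDEX.PHP' would all be searched, while a file with no extension would be skipped.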
- Q: How do I exclude files?
- A: There are 2 ways:
1) You include the text SSIGNORE somewhere in the file. The script will ignore any files containing this because of this line (you can change this text):
if (strpos($f_data, 'SSIGNORE')!==FALSE) return;
2) You can ignore files by uncommenting this line and putting your file names in there:
//----------------------------------------------------------
// To exclude files - uncomment and complete the line below
//----------------------------------------------------------
// if ($file=='FileToIgnore' || $file=='FileToIgnore2') continue;
- Q: How do I exclude directories?
- A: You can ignore directories by uncommenting this line and putting your directory names in there:
elseif (is_dir($s_dir.$file)) {
//----------------------------------------------------------------
// To exclude directories - uncomment and complete the line below
//----------------------------------------------------------------
// if ($file=='DirToIgnore' || $file=='DirToIgnore2') continue;
Line By Line... site-search.php
This script doesn't use the database.
The first part of the file is the HTML to set up the page. You'll have to fill in your site logo etc. here.
The script has 2 user-defined functions:
search_dir() - This function goes through all the files in the current directory. If it's a web page - the file name is passed to the next function - search_file()... If it's a directory, however, the current directory is changed to it, and this function calls itself to search that directory.
This makes search_dir() a recursive function. Don't worry, this is quite normal programming stuff... The computer keeps track of each instance of the function by putting all the data onto a stack every time it's called. When the function's finished going through a directory, it comes off the top of the stack, and the one that was running previously starts again. Stacks are a common programming device!
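To make the recursion concrete, here is a stripped-down sketch of a recursive directory walk. It is not the script's search_dir(): the name find_pages() is made up, and it simply returns a list of .php files instead of searching each one.

```php
<?php
// A sketch of a recursive directory walk. find_pages() and its behaviour
// (returning a list rather than searching each file) are assumptions made
// for illustration; this is not the original search_dir().
function find_pages(string $dir): array
{
    $found = [];
    $entries = scandir($dir);
    if ($entries === false) {
        return $found; // unreadable directory: nothing to report
    }
    foreach ($entries as $file) {
        if ($file === '.' || $file === '..') {
            continue; // skip the "this directory" and "parent" entries
        }
        $path = $dir . '/' . $file;
        if (is_dir($path)) {
            // The function calls itself here. PHP puts this call on the
            // stack and carries on with the current loop when it returns.
            $found = array_merge($found, find_pages($path));
        } elseif (substr($file, -3) === 'php') {
            $found[] = $path;
        }
    }
    return $found;
}
```

Calling find_pages('.') would list every .php file under the current directory, however deep the sub-directories go.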
search_file() - This function searches through a file for the specified search term. It converts the whole file to lowercase so the match is not case sensitive.
It has to do a couple of tricks:
- It keeps a copy of the file in mixed-case so we can get the title of the page and display it, capitals and all, in the search results.
- If found, it displays the text following the search term. To do this it has to remove all HTML tags (formatting) from the text.
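The steps above can be sketched roughly like this. It is an illustration only: match_page(), the array it returns and the 60-character snippet length are all assumptions, and the real search_file() may well do things differently.

```php
<?php
// Hypothetical sketch of the matching steps described above; the function
// name, return shape and 60-character snippet length are assumptions.
function match_page(string $html, string $term): ?array
{
    if ($term === '') {
        return null; // nothing to search for
    }
    // The title is taken from the untouched mixed-case copy.
    $title = '(untitled)';
    if (preg_match('/<title>(.*?)<\/title>/is', $html, $m)) {
        $title = trim($m[1]);
    }
    // Remove the HTML tags so only readable text is searched and displayed.
    $text = strip_tags($html);
    // Compare lowercase copies so the match is not case sensitive.
    $pos = strpos(strtolower($text), strtolower($term));
    if ($pos === false) {
        return null; // term not found in this page
    }
    // Cut the snippet from the mixed-case text, starting at the match.
    return ['title' => $title, 'snippet' => substr($text, $pos, 60)];
}
```

Note that the match position is found in the lowercase copy but the snippet is cut from the mixed-case text, so the capitals survive in the results.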
The current directory is stored as an array of directories: $cur_path. This is so you can add and remove directories from the list as the program traverses the directory tree.
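As a small sketch of that idea (only the name $cur_path comes from the script; the directory names and the way the path is joined are assumptions):

```php
<?php
// Sketch of using an array of directory names as the current path.
// Only the name $cur_path comes from the text; the rest is assumed.
$cur_path = ['.'];

array_push($cur_path, 'articles');   // go down into ./articles
array_push($cur_path, '2004');       // go down into ./articles/2004
$deep = implode('/', $cur_path) . '/';

array_pop($cur_path);                // finished with 2004, step back up
$up = implode('/', $cur_path) . '/';
```

Here $deep holds './articles/2004/' and $up holds './articles/', so the array behaves like a small stack of directory names.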
Here's the code: