PHP Website Search Script - Explanation

This is the same script used on this site. It's very simple compared to most others...

Most site search scripts are based on search engines. They spider and index pages into a database or file, then when a query is entered they query the database for the search terms.


The index/query approach has three main benefits:

  1. Results come back very quickly.
  2. The search can perform logical operations on the keywords. So you can search for 'this' AND 'that', or search for 'this' OR 'that', but NOT 'the other', and so on...
  3. You can order the results by some other factor - like 'relevancy'.
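
As a rough illustration of the index/query idea (this is not code from the script), here's a minimal sketch that keeps the index in a PHP array instead of a database. The function names build_index() and lookup(), and the sample pages, are made up for the example.

```php
<?php
// Hypothetical sample pages: file name => page text.
$pages = array(
    'a.php' => 'this and that',
    'b.php' => 'this only',
    'c.php' => 'that and the other',
);

// Build the index once: word => list of files containing it.
function build_index($pages) {
    $index = array();
    foreach ($pages as $file => $text) {
        $words = array_unique(str_word_count(strtolower($text), 1));
        foreach ($words as $word) {
            $index[$word][] = $file;
        }
    }
    return $index;
}

// Look up one word; an unknown word matches no files.
function lookup($index, $word) {
    $word = strtolower($word);
    return isset($index[$word]) ? $index[$word] : array();
}

$index = build_index($pages);

// 'this' AND 'that'
$both = array_values(array_intersect(lookup($index, 'this'), lookup($index, 'that')));
// 'this' OR 'that'
$either = array_values(array_unique(array_merge(lookup($index, 'this'), lookup($index, 'that'))));
// 'that' but NOT 'other'
$not = array_values(array_diff(lookup($index, 'that'), lookup($index, 'other')));
```

Each query is just a few array look-ups, which is why this approach is fast. The cost is that the index has to be rebuilt whenever the pages change.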

My approach in this script is different. It searches the files on this website in real time. I.e. it opens, reads and compares, then closes every file in the start directory, and all the sub-directories under it.


This 'real-time' search has two main benefits:

  1. The code is way simpler.
  2. Its results are always up to date. (Search engines need to re-spider periodically.)

...and three main disadvantages:

  1. It uses a lot more computing power to get the results. This may cause it to seem slow on large sites, or annoy your webhost by slowing down the whole machine.

    If your site's not too busy, and it's not being used ten times a second - this approach is perfectly OK. If your site is huge - you may want to use a script which indexes then queries...

  2. It only caters for exact matches, although it is case insensitive. (I think exact matches are fine!)

  3. There's no ordering of results. There's no measure of 'relevancy'.

    This only matters for large data sets, and it would be easy to add, for example, a check on the number of occurrences of the search term, or its position in the text etc...
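
To give an idea of how the 'number of occurrences' check might look, here's a small sketch using substr_count() and arsort(). It isn't part of the script, and the page data and the $scores and $ranked variables are invented for the example.

```php
<?php
// Hypothetical data: file name => lowercase page text.
$pages = array(
    'one.php'   => 'php search script',
    'two.php'   => 'search, search and more search',
    'three.php' => 'nothing relevant here',
);
$search_term = 'search';

// Count how often the term appears in each page.
$scores = array();
foreach ($pages as $file => $text) {
    $count = substr_count($text, $search_term);
    if ($count > 0) { $scores[$file] = $count; }
}

// Sort by count, highest first, keeping the file names as keys.
arsort($scores);
$ranked = array_keys($scores);
```

In the real script the counting would happen inside search_file(), and the sort would run once after search_dir() has finished.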



To download any script, please go to the Free Scripts page.



FAQ

  • Q: The script doesn't find any files / I need the script to work with .html files.

  • A: You need to change the line which says:
    if ($file_ext == 'php') search_file($s_dir.$file);
    to include the last 3 characters of the file extension - i.e. 'tml' for .html files or 'htm' for .htm files:
    if ($file_ext == 'php' || $file_ext == 'tml' || $file_ext == 'htm') search_file($s_dir.$file);
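
If you'd rather not compare the last 3 characters, one alternative is to let PHP pull out the whole extension with pathinfo(). This is only a sketch, not what the script does: the helper name is_searchable() and the $allowed_ext list are assumptions for the example.

```php
<?php
// Extensions to search; adjust the list for your own site.
$allowed_ext = array('php', 'html', 'htm');

// Hypothetical helper: TRUE if the file name has one of the allowed extensions.
function is_searchable($file, $allowed_ext) {
    $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
    return in_array($ext, $allowed_ext);
}
```

Inside search_dir() you would need to make $allowed_ext available (for example by adding it to the global line) before calling the helper.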


  • Q: How do I exclude files?

  • A: There are 2 ways:
    1) You include the text SSIGNORE somewhere in the file. The script will ignore any files containing this because of this line (you can change this text):
    if (strpos($f_data, 'SSIGNORE')!==FALSE) return;

    2) You can ignore files by uncommenting this line and putting your file names in there:
    //----------------------------------------------------------
    // To exclude files - uncomment and complete the line below
    //----------------------------------------------------------
    // if ($file=='FileToIgnore' || $file=='FileToIgnore2') continue;



  • Q: How do I exclude directories?

  • A: You can ignore directories by uncommenting this line and putting your directory names in there:
    elseif (is_dir($s_dir.$file)) {
    //----------------------------------------------------------------
    // To exclude directories - uncomment and complete the line below
    //----------------------------------------------------------------
    // if ($file=='DirToIgnore' || $file=='DirToIgnore2') continue;
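
If you have more than a couple of names to skip, the chain of || tests gets long. A list plus in_array() is one way round that. This is a sketch only: the $skip_files and $skip_dirs arrays and the is_excluded() helper are not in the script.

```php
<?php
// Hypothetical lists; fill in your own names.
$skip_files = array('FileToIgnore', 'FileToIgnore2');
$skip_dirs  = array('DirToIgnore', 'DirToIgnore2');

// Hypothetical helper: TRUE if $name appears in the skip list.
function is_excluded($name, $skip_list) {
    return in_array($name, $skip_list, true);
}
```

The commented-out lines would then become something like: if (is_excluded($file, $skip_files)) continue;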





    Line By Line...
    site-search.php

    This script doesn't use the database.

    The first part of the file is the HTML to set up the page. You'll have to fill in your site logo etc. here.

    The script has 2 user-defined functions:

    • search_dir() - This function goes thru all the files in the current directory. If it's a web page - the file name is passed to the next function - search_file()... If it's a directory, however, the current directory is changed to it, and this function calls itself to search that directory.

      This makes search_dir() a recursive function. Don't worry, this is quite normal programming stuff... The computer keeps track of each instance of the function by putting all its data onto a stack every time it's called. When the function's finished going through a directory, it comes off the top of the stack, and the one that was running previously carries on from where it left off. Stacks are a common programming device!

    • search_file() - This function searches through a file for the specified search term. It converts the whole file to lowercase so the match is not case sensitive.

      It has to do a couple of tricks:
      1. It keeps a copy of the file in mixed-case so we can get the title of the page and display it, capitals and all, in the search results.
      2. If found, it displays the text following the search term. To do this it has to remove all HTML tags (formatting) from the text.
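
If recursion is new to you, here's a stripped-down sketch of the same idea as search_dir(). It differs from the script in that it passes the directory in as a parameter rather than using global variables, and it just collects file names instead of searching them. The list_files() name and the throwaway test directory are made up for the example.

```php
<?php
// Hypothetical helper: collect every file under $dir, depth first.
function list_files($dir) {
    $found = array();
    $dhandle = opendir($dir);
    while (($file = readdir($dhandle)) !== false) {
        if ($file == '.' || $file == '..') { continue; }
        $path = $dir . '/' . $file;
        if (is_dir($path)) {
            // Recursive call: go through the sub-directory, then carry on here.
            $found = array_merge($found, list_files($path));
        } else {
            $found[] = $path;
        }
    }
    closedir($dhandle);
    sort($found);
    return $found;
}

// Build a small throwaway tree to try it on.
$base = sys_get_temp_dir() . '/ss_demo_' . uniqid();
mkdir($base . '/sub', 0777, true);
file_put_contents($base . '/index.php', 'home');
file_put_contents($base . '/sub/page.php', 'inner');

$files = list_files($base);
```

Each call to list_files() has its own $dhandle and $found, which is exactly the 'stack' behaviour described above.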

    The current directory is stored as an array of directories: $cur_path. This is so you can add and remove directories from the list as the program traverses the directory tree.

    Here's the code:

    • The first line starts the definition for the search_dir() function.
      function search_dir () {

    • This makes these variables - defined outside the function - available inside it.
      global $cur_path, $dir_depth, $matches;

    • If there are already over 100 matches - don't do any more searching.
      if ($matches > 100) { return; }

    • Create a string $s_dir containing the full current path. This is so the path is available in a form we can use.
      $s_dir="";
      for ($c=0; $c<=$dir_depth; $c++) { $s_dir .= $cur_path[$c]; }
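
To see what that loop produces, here's a small stand-alone sketch with some example values. The directory names 'docs/' and 'api/' are invented, and the implode() version is just an alternative way of writing the same thing.

```php
<?php
// State as the script would have it two levels down (example values).
$cur_path  = array('./', 'docs/', 'api/');
$dir_depth = 2;

// The loop from the script...
$s_dir = '';
for ($c = 0; $c <= $dir_depth; $c++) { $s_dir .= $cur_path[$c]; }

// ...and an equivalent one-liner using array_slice() and implode().
$s_dir2 = implode('', array_slice($cur_path, 0, $dir_depth + 1));

// Coming back up a level just means lowering the depth.
$dir_depth--;
$s_parent = implode('', array_slice($cur_path, 0, $dir_depth + 1));
```

Note that the old 'api/' entry is still sitting in $cur_path after the depth is lowered. It's simply ignored, and gets overwritten the next time the script goes down a level.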


    • Open the current directory using the $s_dir, then start a loop reading all the files in that directory...
      $dhandle=opendir($s_dir);
      while (($file = readdir($dhandle)) !== false) {


    • Ignore the 'this' and 'parent' directory items which appear in every directory.
      if (($file!=".") && ($file!="..")) {

    • The is_file() function returns TRUE if the file is a regular file. We create the full pathname for the file from the current directory plus the filename. (The '.' operator concatenates strings.)
      if (is_file($s_dir.$file)) {

    • Get the last 3 characters of the filename into the variable $file_ext. Only process .php files - you may need to change this!
      $file_ext = substr($file, strlen($file)-3, 3);
      if (($file_ext == "php")


    • Also, only process files if: They aren't this search script or any other navigation scripts. (You'll probably need to change this too!) Call the search_file() function if the file fits the bill.
      && (strcmp ($file, basename($_SERVER['PHP_SELF'])) != 0)
      && (strcmp ($file, "right-nav.php") != 0)
      && (strcmp ($file, "menu.php") != 0)) { search_file($s_dir.$file); }


    • Else, if the file's a directory, add its name plus '/' to the current path, and increase the depth by one. Then search the directory.
      elseif (is_dir($s_dir.$file)) {
      $cur_path[++$dir_depth] = ($file."/");
      search_dir();


    • Once the search is complete, reduce the depth of the current directory back down by one. The function will then continue to loop thru the files in the original directory.
      $dir_depth--;


    • Start the definition for the search_file() function... Define the global variables we need access to.
      function search_file ($file) {
      global $search_term, $results, $r_text, $r_title, $matches;


    • Create a string $s_dir containing the full current path.
      $s_dir="";
      for ($c=0; $c<=$dir_depth; $c++) { $s_dir .= $cur_path[$c]; }


    • Open the file, read its contents into a variable $f_data.
      $f_size = filesize($file);
      $f_handle = fopen($file, "r");
      $f_data = fread($f_handle, $f_size);

    • Create a variable $f_dlc which is the data in lowercase.
      $f_dlc = strtolower($f_data);

    • Create 2 variables for getting the de-tagged version of the text...
      $t_text = "";
      $in_tag = 0;


    • If the lowercase data contains the search term, strstr() will return the text from the match to the end of the file. Put this 'match text' into the variable $text. If it doesn't contain the text it'll return FALSE!
      if ($text = strstr($f_dlc, $search_term)) {
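
Here's a quick stand-alone sketch of what strstr() hands back in each case. The sample text and the $hit and $miss variables are invented for the example.

```php
<?php
$f_dlc = strtolower('<p>Free PHP Scripts for your site</p>');

// Match: strstr() returns everything from the first occurrence onwards.
$hit = strstr($f_dlc, 'php');

// No match: strstr() returns FALSE.
$miss = strstr($f_dlc, 'perl');
```

Because the data and the search term have both been lowercased beforehand, 'PHP' in the page is found by a search for 'php'.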

    • Record the full pathname of the match.
      $results[$matches] = $file;

    • Remove any HTML tags from the text. When $in_tag = 1 - we're 'in' an HTML tag - so no text is copied to the destination variable $t_text.
      for ($c = 0; $c < 200; $c++) {
        if (strcmp(substr($text, $c, 1), "<") == 0) { $in_tag=1; }
        elseif (strcmp(substr($text, $c, 1), ">") == 0) { $in_tag=0; }
        elseif ($in_tag == 0) { $t_text .= substr($text, $c, 1); }
      }
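
If you want to try the tag-stripping on its own, the same loop can be wrapped in a function. This is a sketch under a couple of assumptions: the detag() name and its $limit parameter are not in the script, and the loop stops at the end of the text if it's shorter than the limit.

```php
<?php
// Hypothetical wrapper around the loop above: copy up to $limit characters
// of $text, skipping anything between '<' and '>'.
function detag($text, $limit = 200) {
    $t_text = '';
    $in_tag = 0;
    $len = min($limit, strlen($text));
    for ($c = 0; $c < $len; $c++) {
        $ch = $text[$c];
        if ($ch == '<') { $in_tag = 1; }
        elseif ($ch == '>') { $in_tag = 0; }
        elseif ($in_tag == 0) { $t_text .= $ch; }
    }
    return $t_text;
}

$snippet = detag('search</b> results for <i>your</i> site');
```

PHP's built-in strip_tags() does a similar job if you'd rather not write the loop yourself.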


    • Add the de-tagged text to the array of result text.
      $r_text [$matches] = "...". $t_text. "...";

    • Get the position of the page's title text from the lowercase version of the data. Then get the actual text from the mixed-case version. Put it into the array of matched titles. $matches++ increments the number of matches found.
      $t_start = strpos ($f_dlc, "<title>") + 7;
      $t_end = strpos ($f_dlc, "</title>");
      $r_title[$matches++] = substr($f_data, $t_start, $t_end-$t_start);
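
The three lines above assume every page has a title. Here's a sketch of the same look-up as a function that returns an empty string when there isn't one. The page_title() name and the sample page are made up for the example.

```php
<?php
// Hypothetical helper: return the page title with its original capitals,
// or '' if the page has no <title>...</title> pair.
function page_title($f_data) {
    $f_dlc = strtolower($f_data);
    $t_start = strpos($f_dlc, '<title>');
    $t_end = strpos($f_dlc, '</title>');
    if ($t_start === false || $t_end === false || $t_end < $t_start) { return ''; }
    $t_start += 7; // skip past '<title>' itself
    return substr($f_data, $t_start, $t_end - $t_start);
}

$title = page_title('<HTML><TITLE>My Free Scripts</TITLE><body>hi</body></HTML>');
```

The positions are found in the lowercase copy, so '<TITLE>' and '<title>' both work, but the text itself is cut from the original so the capitals survive.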


    • Close the file. End of function.
      fclose($f_handle);


      To start the whole search running you need to initialise some variables, then call the search_dir() function on the base directory you want to search...

    • Initialise all the global variables.
      $dir_depth=0;
      $matches=0;
      $cur_path = array("./");
      $results = array(""); // Results = all the files that matched
      $r_text = array(""); // ...the text they contained...
      $r_title = array(""); // ...the titles of the pages...


    • Convert the search term to lowercase.
      $search_term=strtolower($search_term);

    • As long as the search term isn't blank - start the search!
      if (strcmp($search_term, "")!=0) { search_dir(); }


