Index Language Options


The Language page controls how Wilbur handles characters and common words in languages other than English.


Additional Language Characters

For pure text files there would be no penalty in considering all possible characters indexable, but when the text is being extracted from binary data, such as in word processing files, having a smaller character set will make it easier for Wilbur to distinguish text from binary data.

In the simplest case Wilbur will only index characters from the English alphabet and the underscore character. All other characters are considered to terminate or separate words. However, in many cases the material being indexed will not be in English, so Wilbur provides an option for adding further characters to the indexable set.
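The word-splitting rule above can be sketched in a few lines of Python. This is only an illustration and not Wilbur's actual code: the function name split_words and the extra_chars parameter are hypothetical, and the sketch assumes that digits, like every other character outside the set, act as separators.

```python
import string

# Base set described above: English letters plus the underscore.
BASE_CHARS = set(string.ascii_letters + "_")


def split_words(text, extra_chars=""):
    """Split text into words made only of indexable characters.

    extra_chars is a hypothetical stand-in for any additional
    language characters the user has enabled.
    """
    indexable = BASE_CHARS | set(extra_chars)
    words = []
    current = []
    for ch in text:
        if ch in indexable:
            current.append(ch)
        elif current:
            # Any non-indexable character ends the current word.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


# Base set only: accented letters split the words apart.
print(split_words("cr\u00e8me br\u00fbl\u00e9e"))
# With the accented letters added, the words stay whole.
print(split_words("cr\u00e8me br\u00fbl\u00e9e", extra_chars="\u00e8\u00fb\u00e9"))
```

With the base set alone, a word such as "crème" is broken into the fragments "cr" and "me", which is why non-English material usually needs extra characters enabled.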

The characters required for several major European languages can be selected by just checking the box beside the language name. Note that more than one language can be checked.

If the language of your choice is not listed, you could try to make up a suitable set by selecting more than one language or by checking the All International Characters box. However, the preferred solution is to leave all the language options unchecked and add just the necessary characters to the appropriate additional character box described on the Options Page.

Word Skip File

There is little benefit to indexing words which are very common and appear in most files. Wilbur is installed with a list of common English words which are skipped during indexing in the interests of speed and disk space. These words are contained in the file skip.txt, which lives in the same folder as your Wilbur executable file.

A different file can be used by changing the Word Skip File value on this page. The file must be in the Wilbur directory and must have a TXT extension, but only the base part of the name is entered here. For example, a file named alternate.txt would be entered as just "alternate" without the quotes.
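The naming rule above, and the effect of a skip list, can be sketched as follows. This is a minimal illustration under stated assumptions, not Wilbur's implementation: the function names are hypothetical, and the sketch assumes the skip file holds whitespace-separated words in UTF-8, which this page does not specify.

```python
from pathlib import Path


def skip_file_path(wilbur_dir, base_name):
    """Build the full skip file path from the base name entered on the page."""
    return Path(wilbur_dir) / (base_name + ".txt")


def load_skip_words(path):
    """Read skip words from a file.

    Assumption: words are separated by whitespace and stored as UTF-8.
    """
    text = Path(path).read_text(encoding="utf-8")
    return {word.lower() for word in text.split()}


def remove_skipped(words, skip_words):
    """Drop words that appear in the skip set, ignoring case."""
    return [w for w in words if w.lower() not in skip_words]
```

For example, skip_file_path(wilbur_dir, "alternate") yields the path to alternate.txt inside the given directory, matching the entry "alternate" described above.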

Wilbur provides some tools to partially automate the creation of alternate skip files. Please see the Create Word Count and Import Skip Words commands on the Index Menu.

DOS Style Text Files

When this box is checked, Wilbur assumes that any file that appears to be all text uses DOS style handling of international characters. Since Windows based programs such as Notepad and Web browsers create text files using the Windows style of international characters, you probably don’t want to set this flag unless you are sure the files you are indexing were created by DOS programs. All files that appear to be in a binary format, such as MS Word documents, are assumed to be Windows style files.
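The difference between the two styles can be seen by decoding the same bytes both ways. The sketch below is an illustration only: it assumes code page 437 for the DOS style and code page 1252 for the Windows style, which are common choices but are not confirmed by this page, and the decode_text function is hypothetical.

```python
def decode_text(raw, dos_style=False):
    """Decode raw bytes as DOS style (cp437) or Windows style (cp1252).

    Both code pages are assumptions made for this example.
    """
    encoding = "cp437" if dos_style else "cp1252"
    # cp1252 leaves a few byte values undefined, so replace rather than fail.
    return raw.decode(encoding, errors="replace")


# Byte 0x82 is an accented "e" in cp437 but a low quotation mark in cp1252.
dos_bytes = b"caf\x82"
print(decode_text(dos_bytes, dos_style=True))
print(decode_text(dos_bytes, dos_style=False))
```

Decoded with the wrong style, the accented letter turns into an unrelated character, so the word would be indexed incorrectly or split in two.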


Copyright © 1999 RedTree Development Inc. All rights reserved.
Information in this document is subject to change without notice.
Other products and companies referred to herein are trademarks or registered trademarks of their respective companies or mark holders.