Digital Library System
- Introduction
- What Makes This Library System Useful?
- Would it be of value to you to have a research assistant search tens of thousands of documents while remembering over two and a half million individual paragraphs, sections, and relevant items of information? What if your assistant then created a report containing each and every item related to what you are looking for, with instant access to the complete documents containing those items? How much time and money would you save if your assistant could do all this in under ten minutes?
- What Is An Integration Engine?
- Search engines only tell you where information is, not what the information is. Even at their best, search engine results require you to read through massive amounts of data to find the relevant bits of information you are looking for.
- Supporting the library
- Goals
- The goal of this library is to provide a knowledge discovery tool and be of some good use to the world at large.
- Contributing Documents
- To contribute documents to this library, or to include your own research project, call 330.345.2956 or email sysop@nvi.net.
- Document Types
- Text documents must be ASCII text.
- HTML documents must be well formed.
- Internet Archive MHTML format.
- Donations
- If you would like to make a monetary donation directly to the programmer to help support the development of this Library Click Here.
- Quick Start
- Login
- Demo
- Login using the demo account:
Username = demo
Password = demo
- User
- When prompted, enter your Username and Password. Usernames and Passwords are case sensitive. If you do not have an account and would like one, email sysop@nvi.net or call 330-345-2956 to request an account.
- Upload
- The Digital Library System supports ASCII text, HTML, and Internet Archive files. File names must not include illegal characters such as apostrophes (single quotes) or double quotes.
- To upload a document click on the "Browse..." button. A window will open allowing you to browse your hard drive. Navigate to the folder and file you wish to integrate. Click on the file name you want and click "OK". Your file selection is now available to upload by clicking on the "UPLOAD" button.
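- The file-name rule above can be checked before you upload. This is a minimal sketch, assuming that apostrophes and double quotes are the only rejected characters (the full list of illegal characters is not stated in this guide). The helper name `has_legal_name` is hypothetical and is not part of the Digital Library System.

```python
# Hypothetical pre-upload check. The character set is an assumption based
# on the two examples named in this guide: apostrophe and double quote.
ILLEGAL_CHARS = {"'", '"'}


def has_legal_name(file_name: str) -> bool:
    """Return True if file_name contains none of the assumed illegal characters."""
    return not any(ch in ILLEGAL_CHARS for ch in file_name)


if __name__ == "__main__":
    for name in ["field_notes.txt", "bob's_notes.txt"]:
        print(name, has_legal_name(name))
```

Renaming a file on your own computer before uploading it is usually simpler than renaming it afterwards.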
- Index
- Click on the "Index" button to index your file or files.
- Integrated Search
- Enter a term or phrase in the "Integrated Search" text box and press the "Integrated Search" button. Elements from your uploaded document or documents are displayed in a collapsible / expandable integration tree.
- Users Guide
- Introduction
- The Digital Library System addresses the issues of information management and knowledge discovery. Search engines only tell you where information is, not what the information is. Even at their best, search engine results require you to read through massive amounts of data to find the relevant bits of information you are looking for.
- Documents contain information expressed as concepts with supporting information. The relevancy of a document's content to any question or line of inquiry is highly subjective. The probability that any given document is of value depends on what you're looking for. I call the activity of finding that value Knowledge Discovery.
- Site Navigation
- Site navigation menus are displayed within the black bar near the top of the screen. The main site navigation menu is shown here as an example.
- HOME
- returns you to the main screen.
- LOGOUT
- cancels your current session and logs you out of the system.
- HELP
- loads the help page.
- Main Menu
- The main screen contains various functions and file management utilities. Depending on your account type and access level some functions may not be available. If you would like access to additional functions that are not included with your current account please email sysop@nvi.net or call 330-345-2956.
- File Editor
- NEW FILE
- Creates a new file in the currently selected directory.
- Integration
- Type a word or phrase into the textbox. Click the "Integrated Search" button to search your database for all elements that contain the word or phrase. The integrated search function processes all elements from all documents in your database that have been indexed. There are several integration methods available to assist you.
- Integration Types
- Simple Integration
- Full Text Natural Language Integration
- By default, full text natural language integration is enabled. To disable it, click the selection box to uncheck it.
- Natural Language Integration interprets your Integrated Search query as a phrase in natural human language (a phrase in free text). There are no special operators. A stop-word list of common words applies. In addition, words that are present in more than 50% of the elements are considered common and will not match.
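- The stop-word list and the 50% threshold described above can be illustrated with a small sketch. It is not the system's implementation: the stop-word list here is an abbreviated assumption, and `tokenize` and `searchable_terms` are hypothetical names chosen for the example.

```python
import re

# Assumed, abbreviated stop-word list; the real list is not given in this guide.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}


def tokenize(text: str) -> set[str]:
    """Lower-case a text and return its distinct words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def searchable_terms(query: str, elements: list[str]) -> set[str]:
    """Drop stop-words and words found in more than 50% of the elements."""
    element_words = [tokenize(element) for element in elements]
    kept = set()
    for word in tokenize(query):
        if word in STOP_WORDS:
            continue
        hits = sum(1 for words in element_words if word in words)
        if hits > len(elements) / 2:
            continue  # too common to tell elements apart
        kept.add(word)
    return kept


if __name__ == "__main__":
    elements = [
        "the library indexes documents",
        "documents are split into elements",
        "deer hunting in the autumn",
        "documents about sewing",
    ]
    # "documents" appears in 3 of 4 elements, so it is treated as common.
    print(searchable_terms("the documents about hunting", elements))
```

One practical consequence: in a very small collection, a word can easily appear in more than half of the elements and then never match in this mode.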
- Full Text Boolean Integration
- Full Text Boolean Integration does not use the 50% threshold, but the stop-word list still applies. A + or - operator immediately in front of a word indicates that the word is required to be present or absent, respectively, for a match to occur. For example, +fish means that the word fish must be present. Conversely, -fish means that the word fish must not be present.
- Supported Operators:
- +
- A leading plus sign indicates that a word must be present in each element.
- -
- A leading minus sign indicates that a word must not be present in any of the elements.
NOTE: The - operator acts only to exclude elements that are otherwise matched by other integration terms. Thus, a boolean-mode integration that contains only terms preceded by - finds nothing. It does not return “all elements except those containing any of the excluded terms.”
- (no operator)
- By default (when neither + nor - is specified) the word is optional, but the elements that contain it are rated higher. This mimics the behavior of "Simple Integration".
- > <
- These two operators are used to change a word's contribution to the relevance value that is assigned to an element. The > operator increases the contribution and the < operator decreases it.
- ()
- Parentheses group words into sub-expressions. Parenthesized groups can be nested.
- ~
- A leading tilde acts as a negation operator, causing the word's contribution to the element's relevance to be negative. This is useful for marking “noise” words. An element containing such a word is rated lower than others, but is not excluded altogether, as it would be with the - operator.
- *
- The asterisk serves as the truncation (or wildcard) operator. Unlike the other operators, it should be appended to the word to be affected. Words match if they begin with the word preceding the * operator.
- "
- A phrase that is enclosed within double quote (‘"’) characters matches only elements that contain the phrase literally, as it was typed. The full text engine splits the phrase into words, and performs an integration using a FULL TEXT index for the words. Non-word characters need not be matched exactly: Phrase searching requires only that matches contain exactly the same words as the phrase and in the same order. For example, "test phrase" matches "test, phrase".
If the phrase contains no words that are in the index, the result is empty. For example, if all words are either stop-words or shorter than the minimum length of indexed words, the result is empty.
- EXAMPLES
- apple banana - Finds elements that contain at least one of the two words.
- +apple +juice - Finds elements that contain both words.
- +apple macintosh - Finds elements that contain the word “apple”, but ranks elements higher if they also contain “macintosh”.
- +apple -macintosh - Finds elements that contain the word “apple” but not “macintosh”.
- +apple ~macintosh - Finds elements that contain the word “apple”, but if the element also contains the word “macintosh”, rates it lower than if it does not. This is “softer” than an integration for '+apple -macintosh', for which the presence of “macintosh” causes the element not to be returned at all.
- +apple +(>turnover <strudel) - Finds elements that contain the words “apple” and “turnover”, or “apple” and “strudel” (in any order), but ranks “apple turnover” higher than “apple strudel”.
- apple* - Finds elements that contain words such as “apple”, “apples”, “applesauce”, or “applet”.
- "some words" - Finds elements that contain the exact phrase inside the double quotes “some words” (for example, elements that contain “some words of wisdom” but not “some noise words”). Note that the ‘"’ characters that enclose the phrase are operator characters that delimit the phrase. They are not the quotes that enclose the integration string itself.
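- The behavior shown in the examples above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the system's full text engine: it handles only +, -, words with no operator, and a trailing *, and it ignores > < ( ) ~, quoted phrases, stop-words, and the minimum word length. The names `words_of`, `term_matches`, and `boolean_search` are hypothetical.

```python
import re


def words_of(text: str) -> list[str]:
    """Split a text into lower-case words."""
    return re.findall(r"\w+", text.lower())


def term_matches(term: str, words: list[str]) -> bool:
    """Match a term exactly, or as a prefix when it ends with '*'."""
    if term.endswith("*"):
        return any(word.startswith(term[:-1]) for word in words)
    return term in words


def boolean_search(query: str, elements: list[str]) -> list[str]:
    required, excluded, optional = [], [], []
    for token in query.lower().split():
        if token.startswith("+"):
            required.append(token[1:])
        elif token.startswith("-"):
            excluded.append(token[1:])
        else:
            optional.append(token)
    if not required and not optional:
        return []  # only excluded terms: nothing to match against
    scored = []
    for element in elements:
        words = words_of(element)
        if not all(term_matches(t, words) for t in required):
            continue
        if any(term_matches(t, words) for t in excluded):
            continue
        score = sum(term_matches(t, words) for t in optional)
        if not required and score == 0:
            continue  # with no required terms, an optional term must match
        scored.append((score, element))
    # Highest score first; the sort is stable, so ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [element for _, element in scored]


if __name__ == "__main__":
    elements = ["apple juice", "apple macintosh computer", "banana bread"]
    print(boolean_search("+apple macintosh", elements))
    print(boolean_search("-macintosh", elements))
```

The early return for queries made up only of excluded terms mirrors the NOTE on the - operator: exclusion filters a result set, and without a positive term there is no result set to filter.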
- NOTES
- A query consisting only of negated terms (such as -apple or ~orange) always returns an empty result in boolean mode full text searches.
For example, if +apple -macintosh returns:
apple orange
then -macintosh alone returns the empty set rather than the same or a larger result.
- Keep in mind that integration queries are not only case-insensitive but also essentially accent-insensitive. In other words, if you do not want "mangé" to match "mange" (a French example), you have no choice but to use the double quote operator. This is the only way to make integration queries accent-sensitive.
Although the double quotes are intended to enable phrase searching, just like any web search engine for example, you can also use them to signify single words where accents and other diacritics matter.
The only drawback seems to be that the asterisk operator is mutually exclusive with the double quote. Or I just haven't been able to combine both effectively.
- Be careful with phrase integrations when short words are involved! Words shorter than the minimum word length (by default, words of up to 3 characters) are sometimes taken into consideration when you search for phrases, and sometimes not.
Example 1: An integration of the phrase "the creation" finds all elements that actually contain this phrase, and only those. So an element containing only "la creation du monde", even without the acute accent on the e in "creation", won't be found. This is what one would expect.
Example 2: An integration of the phrase "it be" won't find any element, not even elements containing something like "The Beatles: Let It Be". This is not a bug: every word in the phrase is too short to be indexed. It can seem counterintuitive that short words are taken into consideration for phrase searches only when the phrase also contains at least one sufficiently long word.
- Implied Knowledge Integration
- Implied Knowledge Integration supports query expansion (and in particular, its variant “blind query expansion”). This is generally useful when the Integrated Search phrase is short, which often means you are relying on implied knowledge that the full-text search engine lacks. For example, when searching for “database” you may really mean that “MySQL”, “Oracle”, “DB2”, and “RDBMS” all are phrases that should match “databases” and should be returned, too. This is implied knowledge.
- Blind query expansion (also known as automatic relevance feedback) is enabled when Implied Knowledge Integration is selected. It works by performing the search twice, where the search phrase for the second search is the original search phrase concatenated with the few most highly relevant elements from the first search. Thus, if one of these elements contains the word “databases” and the word “MySQL”, the second search finds the elements that contain the word “MySQL” even if they do not contain the word “database”.
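- The two-pass process described above can be sketched as follows. This is an illustration only, assuming a very simple relevance score (the number of distinct query words an element contains); the system's actual ranking is not described in this guide, and `rank`, `expanded_search`, and the `top_n` parameter are hypothetical names.

```python
import re


def words_of(text: str) -> list[str]:
    """Split a text into lower-case words."""
    return re.findall(r"\w+", text.lower())


def rank(query_words: list[str], elements: list[str]) -> list[str]:
    """Return elements sharing at least one word with the query, best first."""
    wanted = set(query_words)
    scored = []
    for element in elements:
        score = len(wanted & set(words_of(element)))
        if score:
            scored.append((score, element))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [element for _, element in scored]


def expanded_search(query: str, elements: list[str], top_n: int = 1) -> list[str]:
    """Search once, append the top_n best elements to the query, search again."""
    first_pass = rank(words_of(query), elements)
    expanded = words_of(query)
    for element in first_pass[:top_n]:
        expanded.extend(words_of(element))
    return rank(expanded, elements)


if __name__ == "__main__":
    elements = [
        "MySQL is a relational database",
        "MySQL tuning tips",
        "Deer hunting season",
    ]
    print(rank(words_of("database"), elements))  # first pass only
    print(expanded_search("database", elements))  # second pass also finds "MySQL tuning tips"
```

Because every word of the top elements is appended to the query, including filler words such as "is" and "a" in this sketch, the second pass can pull in unrelated elements. That is the noise this kind of expansion tends to add.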
- Database Management
- Re-Index
- Creates your database if needed and adds new document elements to your database.
- The Re-Index function will reindex your entire document collection.
- Re-Indexing adds new documents uploaded since the last Re-Indexing.
- Depending on the number and size of your files, this may take a while.
- Delete
- Removes your database from the server.
- The Database Delete function does not delete any of your files.
- File Management Utilities
- The file management utilities allow you to upload and download files from your library. Additional utilities are provided to assist you with organizing and maintaining your library.
- Directory Navigation
- Use the directory navigation textbox to enter a directory name. Click the "Go!" button to change to the specified directory.
- In the "PATTERN" textbox you may also enter a file name pattern to filter the list of files shown in the "File List". Use of the wildcard "*" (asterisk) character is permitted to match any characters in the pattern filter.
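- The effect of a "*" pattern on the file list can be approximated with Python's standard `fnmatch` module. This is a sketch under assumptions: the guide does not say whether the PATTERN filter is case-sensitive or whether it supports any wildcard other than "*", and `filter_files` is a hypothetical name.

```python
from fnmatch import fnmatchcase


def filter_files(names: list[str], pattern: str) -> list[str]:
    """Keep the names that match a wildcard pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    names = ["notes.txt", "index.html", "notes_old.txt"]
    print(filter_files(names, "notes*"))
    print(filter_files(names, "*.html"))
```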
- Maintenance and Organization
- The file management functions provide enhanced utilities to manage your document archive. These utilities work together with selections you make in the "Files List".
- PATH:\UserRoot
- Displays the current directory workspace.
- EDIT
- Allows you to edit any single file selected in the "Files List". You cannot edit read-only files (MODE 444). To edit a read-only file you must first apply CHMOD 666 to it. See "CHMOD" below for more detail.
- COPY
- Allows you to copy any selected file in the "File List".
- PASTE
- Will paste any file that has been copied with the "COPY" function into the current working directory.
- Change directories using the "Directory Navigation" described above to paste the file into a different directory.
- COMPRESS
- Compresses files selected in the "Files List" and prepares the compressed file for you to download.
- MAKE DIR
- Creates a new directory in your current workspace.
- Enter the name of the new directory in the "MAKE DIR" textbox.
- Click the "MAKE DIR" button to create the directory.
- DELETE
- Will delete any files or directories selected in the "Files List".
- RENAME
- Changes the name of the file or directory selected in the "File List" to the name entered in the "RENAME" textbox.
- CHMOD
- Changes access permissions on files or directories selected in the "File List".
- This feature has limited functionality. Its primary use is to set file permissions to include or exclude some or all of your files in the public access library.
- To allow a directory or file to be indexed and included in the public library, select the directories and/or files in the "File List". Enter one of the modes listed below in the CHMOD textbox and click the CHMOD button to apply the changes.
- By default all files are excluded from the master (public) library.
- CHMOD Settings
- EXCLUDE:
CHMOD 777 for directories.
CHMOD 666 for files.
- INCLUDE:
CHMOD 555 for directories.
CHMOD 444 for files.
- DOWNLOAD
- Downloads files selected in the "File List". Selecting a file in the "File List" automatically enters the file name into the DOWNLOAD / UPLOAD textbox.
- Browse...
- The browse button allows you to select files from your computer to upload. Uploaded files are placed in your current working directory.
- UPLOAD
- Clicking the upload button will upload the file shown in the DOWNLOAD / UPLOAD textbox.
- Overwrite
- Clicking the overwrite box toggles its checkmark on and off. If the overwrite checkbox is checked, the uploaded file will overwrite any existing file with the same name in your workspace.
- Multiple File Uploading:
- Upload more than one file at a time using the Multiple Uploading function.
- Enter the number of files you would like to upload in the multiple upload textbox. The default value is 10. Click the "UPLOAD" button next to the multiple upload textbox. The "Multiple Uploading" dialog opens.
- Multiple Uploading Dialog:
- Files List
- File List Dialog:
- NAME
- Displays the directory and file names in your current working directory
- Use the checkbox next to the directory or filename to select which items the "Maintenance and Organization" functions will operate on.
- Items listed in the "NAME" column are presented as links.
- Clicking on the name of a file will open that file for viewing.
- Clicking on a directory name will change your working directory.
- TYPE
- Describes the entry in the "NAME" column.
- MODE
- Displays the current permissions for files and directories listed in the "NAME" column.
- File mode 444 means the file is read-only and will allow the file to be included in the publicly accessible master library.
- Directory mode 555 means the directory is read-only and will allow files in that directory with mode 444 to be included in the publicly accessible master library.
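- The inclusion rule described by the two modes above can be written as a small check. This is a sketch of the rule as this guide states it, assuming the modes must match exactly and that only the file's immediate directory matters (how nested directories are treated is not stated). `is_publicly_included` is a hypothetical name.

```python
# Modes named in this guide, written as octal permission values.
INCLUDE_FILE_MODE = 0o444  # read-only file
INCLUDE_DIR_MODE = 0o555   # read-only directory


def is_publicly_included(file_mode: int, dir_mode: int) -> bool:
    """Return True when both the file and its directory use the INCLUDE modes."""
    return file_mode == INCLUDE_FILE_MODE and dir_mode == INCLUDE_DIR_MODE


if __name__ == "__main__":
    print(is_publicly_included(0o444, 0o555))  # file and directory both set to INCLUDE
    print(is_publicly_included(0o666, 0o555))  # file still set to EXCLUDE
```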
- SIZE
- Displays file sizes in bytes.
- ACCESS TIME
- Displays the date and time the file or directory was last accessed.
- MODIFY TIME
- Displays the date and time the file or directory was last modified.
- CREATED
- Displays the date and time the file or directory was created.
- TOTAL SIZE
- Displays the total bytes of all files in your current working directory.
- Document Conversion
- I provide document conversion services to assist you if needed. Call 330-345-2956 or email sysop@nvi.net for information and pricing about document conversion services or to purchase the library system for business, commercial or private use.
Tutorial
- Introduction
- DLS provides access to knowledge by integrating elements of parent documents into concept clusters.
- Gathering Information
- It is important to remember that, just as in any library, if the information is not there you can't find it. This seems obvious, but it is worth stating here. It's rather like walking up to a shelf of books about making dresses and trying to find out about deer hunting.
- One remarkable feature of the Digital Library System appears when you use implied knowledge integration: if that shelf of books about dress making has information about sewing, and you have documents about deer hunting that describe how to sew a deer hide together, you will find references to both in your results.
- Using Search Engines
- One method of creating your personal library is to use search engines. When you find something of interest to you, copy the information from the web page and create a New File adding it to your Digital Library System collection.
- Database Functions
- Among the various integration methods used by the Digital Library System, the most generally understood is the full text natural language query, which is the default method.
- Implied Knowledge Integration
- Blind query expansion (also known as automatic relevance feedback) is used when Implied Knowledge Integration is enabled.
- Short phrases often mean that you are relying on implied knowledge a search engine lacks.
- For example, if your goal is for knowledge about “integration” perhaps you really mean that "DigIn", “HL7”, and “SAP” all are phrases that should match “integration” and should be returned too. This is Implied Knowledge.
- Alternately if your goal is to know more about a “database” perhaps you really mean that “MySQL”, “Oracle”, “DB2”, and “RDBMS” all are phrases that should match “database” and should be returned as well.
- Another example could be searching for books by Georges Simenon about Maigret, if you are not sure how to spell “Maigret”. Without Implied Knowledge Integration an integrated search for “Megre and the reluctant witnesses” finds only “Maigret and the Reluctant Witnesses”. With "Implied Knowledge Integration" the integrated search finds all books with the word “Maigret” on the second pass.
- Note: Because blind query expansion tends to increase noise significantly by returning non-relevant elements, it is meaningful to use only when the integrated search phrase is rather short.
- Blind query expansion works by performing the integration twice, where the Integrated Search phrase for the second integration is the original Integrated Search phrase concatenated with the few most highly relevant elements from the first integration. So if one of these elements contains the word “integration” and the word “DigIn”, the second integration finds the elements that contain the word “DigIn” even if they do not contain the word “integration”.
Development Notes
- Recently added features:
--- hyperlink to parent document for each element.
--- boolean full text integration.
--- relevance feedback and highlighting noise filters.
--- implied knowledge integration.
--- full text indexing and natural language support.
--- documentation.
--- automatic file segmentation.
--- SQL library indexing.
--- store elements in database to speed searches.
--- dynamic tree node generation.
--- file manager.
--- master librarian setup menu.
--- SQL database "Index" and "Delete" functions.
--- avoid duplicate records in SQL database.
- In Progress:
--- AI probability contribution engine.
- Directions:
In no particular order, I am currently considering adding:
--- user preference settings menu.
--- updating interface and documentation.
--- tutorial.
--- 128 bit encrypted secure connection.
--- select number of results to display.
--- re-integrate the toolbar and DLS.
--- concept tree automation.
--- file segmentation with user supplied regular expressions.
--- relational keyword and phrase recognition.
--- word and phrase counts (normalized and raw frequency).
--- experiment demonstrating effects of increased granularity.
--- advanced integration options menu.
--- advanced statistical options supporting MatLab® analysis.
--- drop down box to select individual folders.
--- simple English grammar engine to enhance phrasings.
--- group elements into "concept" trees.
--- right click highlighted text auto re-integration.
--- highlight text and right click add to custom report.
--- auto search terms click & submit to Google, Yahoo etc.
--- add "progress bar" while waiting for results.