Tuesday, September 4, 2012

Capturing Metadata from Folder Hierarchies

Document management solutions have come a long way from the time when dropping critical files into a network folder was state-of-the-art. Common solutions like SharePoint, FileHold, LaserFiche, and others have been around for many years. Even with these systems available, large numbers of companies still store their documents in a hierarchy of folders.

Users of GUI operating systems like Windows have been trained for decades to use folders to store information. A big challenge in moving to a real document management system is getting away from "folder mind block." This is the condition where a user wants to have a folder for everything and to put everything in its folder.

Using folders to organize documents creates a number of challenges, including the following:

  • The connection between the folder and the document is tenuous. If the document gets moved or copied, the information that the folder provided about the document is lost.
  • The visual, hierarchical nature of folders is an impediment to storing documents. The user must find the right spot to drop the file. Folders tend to look the same, so choosing the wrong one is easy. One slip of the mouse and the document lands in the folder next to the intended one.

Document management systems (DMS) tend to use methods other than folders for storing and retrieving documents. Metadata is the most universal of these. Other names for metadata include tags, labels, and properties.

Metadata is information that describes a document. For instance, a document could be a project plan. If the project plan document had a metadata field describing the document type and the value was "project plan," it would be very easy to find this document in a search for project plans. Additional metadata could include fields that describe the project name, the client name, or whether the document is a final or a draft version.

Some metadata is explicit and some is implicit. Explicit metadata is expressly defined for the document and implicit metadata is derived. The previous metadata fields could all be considered explicit. Implicit metadata could include the file type (Word document, JPEG image, etc.), the number of words in the document, or the last user to modify the document.
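
To make the distinction concrete, here is a minimal Python sketch. It assumes a document is just a file name plus its text; the function name implicit_metadata, the field names "File Type" and "Word Count", and the sample file are all hypothetical, not taken from any particular DMS.

```python
from pathlib import PurePath

def implicit_metadata(filename, text):
    """Derive metadata from the document itself rather than from user input."""
    return {
        # File type taken from the extension, e.g. "txt" or "xlsx"
        "File Type": PurePath(filename).suffix.lstrip(".").lower(),
        # Word count computed from the text content
        "Word Count": len(text.split()),
    }

# Explicit metadata: values a person assigned to the document.
explicit = {"Document Type": "Project Plan", "Client": "Monkey Express"}

# Implicit metadata: values derived from a (hypothetical) sample file.
implicit = implicit_metadata("notes.txt", "Phase one starts in September")

# A document record could carry both kinds side by side.
record = {**explicit, **implicit}
print(record)
```

The explicit fields have to be supplied by someone, while the implicit ones can be recomputed from the file at any time.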

When documents are stored in a folder hierarchy, the folder structure itself implies metadata about them. It is possible to preserve this implied metadata when the documents are moved to a DMS. This enables users to find documents using the same information as before while using a more efficient document repository. The following example demonstrates this. The Excel document plan.xlsx in the legacy file share has the following path:

\\fileshare\Projects\Monkey Express\PRJ089\Project Management\Project Plan

From this structure we can infer the following metadata:
  • Client = Monkey Express
  • Project Code = PRJ089
  • Document Type = Project Plan
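
The mapping above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the FIELD_BY_POSITION table and the metadata_from_folder function are hypothetical names, and the idea that each field always sits at a fixed depth in the path is an assumption about this particular file share, not a rule that every folder hierarchy follows.

```python
# Hypothetical mapping from position in the folder path to a metadata field.
# Positions count from 0 after the leading "\\": 0 = server, 1 = top folder.
FIELD_BY_POSITION = {
    2: "Client",
    3: "Project Code",
    5: "Document Type",
}

def metadata_from_folder(folder_path):
    """Infer metadata fields from the folder a document is stored in."""
    # Split on backslashes and drop the empty pieces left by the leading "\\"
    parts = [p for p in folder_path.split("\\") if p]
    return {
        field: parts[pos]
        for pos, field in FIELD_BY_POSITION.items()
        if pos < len(parts)  # skip fields the path is too short to supply
    }

folder = r"\\fileshare\Projects\Monkey Express\PRJ089\Project Management\Project Plan"
metadata = metadata_from_folder(folder)
print(metadata)
```

Folders that carry no metadata in this example, such as Project Management, are simply left out of the mapping.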

A DMS has methods to import this metadata when the document is moved to the new repository. Once imported, a user can use the DMS search capabilities to find all the project plans in the repository, only the project plans for Monkey Express, or the specific project plan for project PRJ089, regardless of where the plan is stored in the DMS repository.
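
The three searches described above can be illustrated with a small Python sketch. It uses a plain list of dictionaries as a stand-in for a DMS repository. Apart from the Monkey Express / PRJ089 project plan, every record here is made up for illustration, and find_documents is a hypothetical helper, not the search API of any real DMS.

```python
# Hypothetical stand-in for a DMS repository. Only the first record comes
# from the example above; the rest are invented so the searches differ.
documents = [
    {"Name": "plan.xlsx", "Client": "Monkey Express", "Project Code": "PRJ089", "Document Type": "Project Plan"},
    {"Name": "plan.xlsx", "Client": "Monkey Express", "Project Code": "PRJ090", "Document Type": "Project Plan"},
    {"Name": "budget.xlsx", "Client": "Monkey Express", "Project Code": "PRJ089", "Document Type": "Budget"},
    {"Name": "plan.xlsx", "Client": "Example Client", "Project Code": "PRJ100", "Document Type": "Project Plan"},
]

def find_documents(docs, criteria):
    """Return the documents whose metadata matches every field in criteria."""
    return [
        doc for doc in docs
        if all(doc.get(field) == value for field, value in criteria.items())
    ]

# All project plans in the repository
all_plans = find_documents(documents, {"Document Type": "Project Plan"})

# Only the project plans for Monkey Express
client_plans = find_documents(
    documents, {"Document Type": "Project Plan", "Client": "Monkey Express"}
)

# The specific project plan for project PRJ089
one_plan = find_documents(
    documents, {"Document Type": "Project Plan", "Project Code": "PRJ089"}
)

print(len(all_plans), len(client_plans), len(one_plan))
```

Note that nothing in the search depends on where a document is stored; only the metadata fields are compared.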
