MCP: The Metadata Collection Parser
The Metadata Collection Parser is a Perl script that takes a collection of GIS metadata, which may span several directories and subdirectories, and creates a nifty hypertext catalog with various indexes. It was written by Paul Cote, GIS Specialist at the Harvard Graduate School of Design. For an example of the metadata catalog produced, click here.
MCP was created to provide a way to catalog and recatalog geographic data easily, even when the data is rearranged into different directory structures or put onto CD. One big advantage of the catalogs produced is that, although they have many cross-referenced indices, they are nothing but hypertext, which can be burned onto CD and perused with any hypertext browser.
Most of what appears in the catalog comes from mp-compliant metadata files that are associated with each dataset in the collection, and from index.htm files, which are HTML-format readme files explaining the content of each subdirectory. MP does a lot of work for MCP, checking the formatting of the metadata files and generating HTML reports.
The cataloger parses each mp-compliant file, and then creates the following:
- A nice metadata summary for each dataset, including an easy-to-read attribute dictionary with sub-tables for enumerated domain values. (Click here for an example.)
- Multiple indexes to the summaries, by directory, by theme keyword, or by geography keyword. (Example)
In the midst of doing this, the cataloger calls on MP to create its comprehensive HTML-format metadata and a listing of any errors that it found in the metadata.
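The keyword indexes described above can be pictured with a small sketch. MCP itself is written in Perl, and this Python fragment is not its code; it only illustrates the idea under some assumptions: that each metadata record is mp-style indented text with "Element_Name: value" lines, that the first Title element is the dataset's own title, and that keywords should be matched case-insensitively. The function names read_elements and build_index are hypothetical.

```python
from collections import defaultdict


def read_elements(text):
    """Yield (name, value) pairs for every 'Name: value' line in the text."""
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        if sep and value:  # skip compound elements such as "Citation:"
            yield name, value


def build_index(records, keyword_element="Theme_Keyword"):
    """Map each keyword to a sorted list of (title, path) pairs.

    records is a dict of {path: metadata text}. This is an illustrative
    structure, not the one MCP uses internally.
    """
    index = defaultdict(list)
    for path, text in records.items():
        elements = list(read_elements(text))
        # Assumption: the first Title element is the dataset's own title.
        title = next((v for n, v in elements if n == "Title"), path)
        for name, value in elements:
            if name == keyword_element:
                # Assumption: keywords are compared case-insensitively.
                index[value.lower()].append((title, path))
    return {kw: sorted(entries) for kw, entries in sorted(index.items())}
```

Calling build_index(records) groups datasets by theme keyword, and build_index(records, "Place_Keyword") does the same for geography keywords. Each entry keeps the path alongside the title so that a catalog page could link back to the dataset's summary.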
Architecture and Dependencies
There are a few things you need to know, and a few resources you need to have before you can exploit MCP:
- FGDC Metadata Standards and the MP metadata parser
MCP is built on the FGDC metadata standard, and the file formats supported by the MP metadata parser.
- PERL
MCP does a lot of parsing and indexing. Perl (a programming language by Larry Wall) provides the guts of the program. If you don't know Perl, you should learn it. Please do not write to the author of MCP with Perl-specific questions.
It is easy to start learning MCP by running it against the sample metadata collection contained in the sample_md_tree directory. There are three things you will undoubtedly need to change to make catalog.pl work on your system before running it on the sample data:
Before you run catalog.pl the first time, you should look through the sample_md_tree directory and take a look at what is in there. catalog.pl will create a bunch of other files, and you will understand the process better if you see the 'before picture.' If you are reading this too late, you can always unpack a new sample_md_tree from your tgz archive.
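One way to keep a record of the 'before picture' is to list the files in the tree before the run and compare that list with the tree afterwards. The following Python sketch is only a convenience and is not part of MCP; the function names snapshot and created_files are hypothetical.

```python
import os


def snapshot(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def created_files(before, after):
    """Return the files present after the run but absent before it, sorted."""
    return sorted(after - before)
```

Call snapshot("sample_md_tree") once before running catalog.pl and once afterwards, then pass both results to created_files to list what the cataloger added.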
Now you should be able to run catalog.pl with the single argument being the name of the configuration file:
catalog.pl sample.conf
Now that you have seen what it does, I will leave it as an exercise for you to read sample.conf, and the various readme.htm and index.htm files in the sample directory to figure out their roles. You should then be able to run catalog.pl on your own directories.
A couple of related, possibly useful, but not well documented ArcView extensions: