Data Handling
Organizing Site Information for Collaboration and Re-Use
Understanding how the things and conditions in a place affect each other requires information from many sources to be organized. The work usually results in new data sets and documentation. This is interdisciplinary work. Successful projects are often shared and carried forward by future collaborators or by the original author.
If there is no plan for organizing project resources from the very beginning, the source files, tools, intermediate data, final project data sets, and metadata usually become so mixed up that it is easier to throw everything away and begin from scratch than to disentangle what took so much work to put together.
This page introduces a generic concept for project file organization, with a template that is easy to implement at the outset of a GIS project. This folder scheme has the following advantages:
- Analysts do not need to waste time trying to decide the best place to store files.
- Analysts do not waste time trying to remember where files were stored.
- Intermediate, experimental data does not get mixed up with essential source data or final data products.
- The file organization makes it easy to archive versions of the project for back-up and sharing.
- Projects organized this way can become very deep in terms of the number of data sources and modular sub-projects without becoming more confusing or difficult to manage.
- Projects may be developed independently and then snapped together. Multiple independent projects can share data sources and tools.
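As an illustration of the archiving advantage listed above, the following Python sketch zips a whole project folder into a dated archive. The function name `archive_project`, the `stamp` parameter, and the `<project>_<date>.zip` naming are assumptions made for this example, not part of the scheme itself. The backup folder is assumed to sit outside the project folder.

```python
import shutil
from datetime import date
from pathlib import Path


def archive_project(project_dir, backup_dir, stamp=None):
    """Zip the contents of project_dir into backup_dir and return the archive path.

    The archive is named <project folder name>_<stamp>.zip, where stamp
    defaults to today's date (hypothetical naming convention).
    """
    project_dir = Path(project_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = stamp or date.today().isoformat()
    base = backup_dir / f"{project_dir.name}_{stamp}"
    # shutil.make_archive appends ".zip" to the base name and returns the file name.
    return Path(shutil.make_archive(str(base), "zip", root_dir=project_dir))
```

Because the whole project lives under one folder, a single call is enough to produce a snapshot that can be stored or shared.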
Begin with a Shared Strategy
A project may have several goals. Typically, a project results in proposals that are conveyed through presentation boards, slide shows, and video. In some cases, a project leader or firm has an interest in preserving the resources developed over the course of the project so that the site study may be revisited later, even if the individuals responsible for the first phase have moved on. If potential re-use of project information is a goal, it helps if contributors practice a common set of procedures and milestones for organizing information. A few of these procedures are discussed below. If the project leaders do not make these goals explicit to all contributors at the beginning, it is unlikely that much will be recoverable once the project is finished.
A Lifecycle View of Project Information
At the beginning of a research project, the focus is on aggregating information from various sources. At this phase, several researchers may be involved in the compilation effort. At intervals, researchers may pool their compilations into a combined team repository. As the project moves ahead, individuals develop working documents that refer to and incorporate source documents. These may be GIS map documents that reference feature information in the source folder, or Adobe Illustrator documents that reference images in the source folder. Since multiple researchers may have their own collections of working documents, it makes sense to keep these in folders separate from the sources. This way, the communal source folder can be updated and replaced wholesale without disrupting anyone's work. Where more than one collaborator is working on a project, each person's work folder may be distinguished with the author's name or initials. As the project matures, certain working documents may be put together to create project presentations. The presentations folder contains finished PDF documents, PowerPoint files, or videos that are static; that is, they are final and not intended to be edited.
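The folder roles described above can be sketched in a few lines of Python. This is only a minimal illustration: the folder names `sources`, `work`, and `presentations`, the function name `create_project_skeleton`, and the example initials are assumptions, and a team may prefer different names.

```python
from pathlib import Path


def create_project_skeleton(root, collaborators):
    """Create a shared sources folder, one work folder per person, and a presentations folder.

    Returns the created folders as sorted paths relative to root.
    """
    root = Path(root)
    folders = [root / "sources", root / "presentations"]
    # Each collaborator gets a work folder labeled with a name or initials.
    folders += [root / "work" / name for name in collaborators]
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)
    return sorted(f.relative_to(root).as_posix() for f in folders)
```

For example, `create_project_skeleton("site_study", ["jd", "pbc"])` would create `site_study/sources`, `site_study/presentations`, `site_study/work/jd`, and `site_study/work/pbc`. Keeping work folders apart from `sources` is what allows the shared sources folder to be replaced wholesale.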
Documentation of Sources and Presentation Files
It is a natural tendency for researchers to discover data and copy it to their local file system without a thought for collecting information about the sources, methods, attribute codes, and other metadata. Information compiled in such a haphazard way may be useful in the short term for creating ad hoc illustrations. In a collection intended for longer-term use, files that have no reference information are practically useless. Remember that all information used in professional and scholarly reports and presentations must carry attribution and publication references for all source material. This sort of information is usually available when the sources are discovered, and that is the time to save it with each source document. It is nearly impossible for a third party to recover this information if there is no record of where the material was obtained.
Individual researchers and research supervisors should take care that the following information is saved with each source document:
- Issuing Source: Where was the document obtained? If this was from the web, provide the URL.
- Issue Date: When was the document obtained?
- Primary Source: Who is the party who takes the credit and the blame for the accuracy of this information? This is often a different party from the issuing source.
- Time period reflected in the Data: In the case of a written document, this may be the date that the original was first issued. In the case of a photograph, it would be the date of capture. Some databases cover a range of dates.
- Name of the Collector: Who collected this document?
- Related Files: Are there other files associated with this resource? It is particularly useful to state whether there are data dictionaries or lookup tables.
- Notes: It is useful to state whether there are licensing issues related to the resource.
This information may be associated with each file or dataset via a plain text document that has the same name prefix as the resource being described. In the case of a folder full of resources from a common source, the common source information may be conveyed with a single text file named readme.txt included in the folder. There may be other useful items of metadata that you could add, but this minimal set is much better than nothing. Some GIS data have much more elaborate metadata that can be captured or that may be embedded in the data. The same may be true of PDF documents, but this is not always the case, and it is the responsibility of the person who gathers the data to make sure that each piece has enough metadata to be used responsibly. Remember that using material without attribution is plagiarism, a serious offense. A deeper discussion of metadata can be found in the document Understanding GIS Data and Metadata in a Decision-Making Context.
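One possible way to record the minimal set of fields is sketched below in Python. The function name `write_source_record`, the `_source.txt` suffix, and the `UNKNOWN` placeholder are assumptions chosen for this example; the only requirement stated above is a plain text file that shares the resource's name prefix.

```python
from pathlib import Path

# Field labels taken from the checklist above (abbreviated where long).
FIELDS = [
    "Issuing Source",
    "Issue Date",
    "Primary Source",
    "Time Period",
    "Collector",
    "Related Files",
    "Notes",
]


def write_source_record(resource, **values):
    """Write a plain-text source record next to `resource` and return its path.

    Keyword names are the field labels in lower case with underscores,
    e.g. issuing_source="...". Missing fields are written as UNKNOWN so
    that gaps stay visible.
    """
    resource = Path(resource)
    # Hypothetical convention: parcels.shp -> parcels_source.txt
    record = resource.with_name(resource.stem + "_source.txt")
    lines = []
    for field in FIELDS:
        key = field.lower().replace(" ", "_")
        lines.append(f"{field}: {values.get(key, 'UNKNOWN')}")
    record.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return record
```

A call such as `write_source_record("sources/parcels.shp", issuing_source="https://example.org/data", collector="A. Researcher")` would write `sources/parcels_source.txt` with one line per field. The URL and collector name here are placeholders.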
Studies Related Geographically
A study of a region often involves several sub-studies. In this case, it is useful to employ a hierarchy of collections. Source material that has regional scope may be collected in a regional sources folder, while each local study may have its own self-contained tree of sources, presentation, and work folders, as shown in figure 5, above.
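The hierarchy of collections can be sketched as follows. The folder names (`regional_sources`, `sources`, `work`, `presentations`), the function name `create_region`, and the study names in the usage note are hypothetical.

```python
from pathlib import Path

# Assumed names for the folders inside each self-contained local study.
STUDY_FOLDERS = ("sources", "work", "presentations")


def create_region(root, studies):
    """Create a regional sources folder plus one self-contained tree per local study.

    Returns every folder under root as a sorted list of relative paths.
    """
    root = Path(root)
    (root / "regional_sources").mkdir(parents=True, exist_ok=True)
    for study in studies:
        for name in STUDY_FOLDERS:
            (root / study / name).mkdir(parents=True, exist_ok=True)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir()
    )
```

Calling `create_region("region", ["harbor", "uplands"])` would produce `region/regional_sources` alongside `region/harbor` and `region/uplands`, each with its own three folders. Because each local study is self-contained, it can be archived or handed off on its own.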
Role of Project Curator
In collaborative projects, one member of the team should be designated as the curator of the shared folder. The curator makes sure that the accumulation of data in the shared folder is orderly and that all of the resources are identified. The curator is responsible for making regular backups of the shared folder and, ultimately, for copying the collection to the read-only institutional repository when the project is finished. To ensure the integrity of the project folder, the curator may want to make the main folder read-only to project participants, with one read-write staging folder for uploads by team members.
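A curator might check that every resource is identified with a small script like the sketch below. It assumes one hypothetical convention: a file counts as documented if its folder contains a readme.txt or if a sidecar text file named `<prefix>_source.txt` sits beside it. The function name `undocumented_files` is also an assumption.

```python
from pathlib import Path


def undocumented_files(sources_root):
    """Return files that have neither a sidecar record nor a folder readme.txt."""
    missing = []
    for path in sorted(Path(sources_root).rglob("*")):
        if not path.is_file():
            continue
        # The documentation files themselves do not need documenting.
        if path.name == "readme.txt" or path.name.endswith("_source.txt"):
            continue
        has_readme = (path.parent / "readme.txt").exists()
        has_sidecar = path.with_name(path.stem + "_source.txt").exists()
        if not (has_readme or has_sidecar):
            missing.append(path.relative_to(sources_root).as_posix())
    return missing
```

Running this on the staging folder before files are moved into the read-only main folder would give the curator a list of items to send back to their collectors.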
Parting Thoughts
At any given moment, these rules may not be the most expedient way for an individual to accomplish the task at hand. And yet, the accumulated consequence of everyone doing the most expedient thing is grief for the individual who can't recover his or her own work, a total loss to the enterprise, and wasted effort for team members who must repeat work that has already been done. No claim is made that these recommendations are perfect. Let's consider them a starting place that we can work with and improve. As our collective experience grows, these guidelines will be extended or altered based on thoughtful discussion of alternatives.