Potential contributor perspectives on desirable characteristics of an online data environment for spatially referenced data
First Monday

by James Campbell



Abstract
A significant body of spatially referenced, locally produced, small-scale data is not currently online and therefore is effectively unavailable to professional scientists and to the general public. If there were an online environment, a “Commons of Geographic Data,” where that data could be deposited or registered, what infrastructure characteristics might potential contributors find desirable in order for them to be willing to contribute their data without monetary compensation? Based on data preservation literature, this study hypothesized that three such characteristics would be desirable. Using a combination of qualitative and quantitative methods, we examined the desirability of these infrastructure capabilities in a non-statistical sample of potential contributors. The results of both the qualitative and quantitative research support the hypothesis. The results can provide guidance for those who may wish to design such a commons environment for small-scale, locally generated, spatially referenced data in the future, and may also be of use to those who operate repositories of other types of data.

Contents

Introduction
Hypothesis
Method
Results and discussion
Conclusion
Limitations

 


 

Introduction

Data that is related to a particular geographic location is everywhere in today’s online world. Individuals and businesses use cell phone location services, Google Maps and other mapping services, and a wide range of other spatially-referenced data as part of their everyday routines.

Yet there is a potentially very valuable type of data that is not part of everyday online life for one simple reason: it is not findable online. Small scale, locally generated, spatially referenced data sets could be of great value to researchers and to the general public if they were available and findable, and if conditions for their use were clear. At present, that is not generally the case for such privately held data sets.

There are many efforts underway to capture and make available large scale national and international data by governments and academic or professional organizations [1]. However, small scale data has largely been overlooked, even though it could be of use to professional researchers as well as to the general public.

There have been several recommendations to construct an online Commons of Geographic Data that would provide an environment where that data could be contributed with no special knowledge or skill or large commitment of time required on the part of contributors, yet would be “findable” by others using standards-based metadata search tools (National Research Council (U.S.), Committee on Licensing Geographic Data and Services, 2004; Onsrud and Campbell, 2007).

An online Commons of Geographic Data (CGD) would enable potential contributors of small scale, locally generated, spatially referenced data to make that data available so that others could use it. In the context of this study, spatially referenced data means any data that refers to a specific place, which includes a large majority of data today. Some examples might include a high school class project that locates and catalogs all of the trees over 15 feet tall in a small town; a homeowners’ association that monitors the water quality of the lake on which their property is located; a historical museum that ties its photographic images to their physical locations; a list of wheelchair accessible street crossing locations; or a weekly list of products available at a particular farmers’ market. Much of this type of local small scale data is generated and stored by private parties. It is stored on private individuals’ or local organizations’ computers and is not now publicly available online so that others might use it. It is, in effect, fully or partially “invisible.”

An online commons environment is one in which users do not have to ask for permission for using the data found there. The data owner has already granted permission, if permission for use is needed, through a “some rights reserved” license as long as the user respects any conditions put on the use of the data by the owner/contributor. Creative Commons licenses are examples of “some rights reserved” licenses.

At present, no such Commons of Geographic Data exists for small scale, locally generated, spatially referenced data. If a group were contemplating the design of such a commons environment, a significant question would arise: what characteristics might potential contributors find desirable that might help motivate them to make their data available through an online CGD environment?

Potential contributor motivation

Any discussion of possible criteria for constructing a commons type repository for spatially referenced data brings up the question: if such a repository were built, would people contribute to it? This specific question has not been tested to date and the focus of this paper is not to review the literature in this area. We note, however, that there is a good deal of evidence from volunteer motivations in general, and from online volunteerism in particular, to suggest that people who own spatially referenced data would be willing to contribute it to an online commons-type environment.

People volunteer their time, skills, and resources every day in a wide range of domains, from volunteering in youth oriented activities (Riemer, et al., 2004) to contributing to Wikipedia (Nov, 2007), helping out as a tourist guide (Anderson and Shaw, 1999), helping predict protein structures online (Cooper, et al., 2010), contributing content and tools online (McKenzie, et al., 2012), and dozens, if not hundreds, of other activities. In short, there is an extensive literature on this subject.

Perhaps the most relevant comparison lies in the area of what has come to be called Volunteered Geographic Information (VGI) (Goodchild, 2007). The explosion of effort in this area in the past few years provides compelling evidence that data owners would be likely to volunteer their data. The real question is under what circumstances contributors might be willing to contribute their data. That is the focus of this research.

Desirable characteristics of data repositories

There have been a number of studies and recommendations about desirable characteristics for the preservation of data in online environments such as the Report of the workshop on opportunities for research on the creation, management, preservation and use of digital content (Institute of Museum and Library Services, 2003), and To stand the test of time: Long term stewardship of digital data sets in science and engineering (Friedlander and Adler, 2006).

Three key recommendations emerging from these and other studies are that these online environments should make it possible (1) to clearly specify usage rights, (2) to search for and discover data using standards-based metadata, and (3) to evaluate data for suitability for a user’s purpose.

These may seem like commonsense ideas, and they are. We might assume that any potential data contributor to a CGD would agree with them. But that would simply be an assumption. Assumptions may be right, or they may be wrong: without empirical evidence, there is no way to judge. Research is necessary to confirm or refute these, or any, assumptions.

This study sought to empirically explore whether potential contributors to an online commons environment for locally generated, spatially referenced data found these three recommendations desirable. While not the purpose or focus of this study, the results could be useful to those who design institutional repositories at universities and colleges, as well as to others who operate or may wish to establish online data repositories for other types of small scale data.

Specifically, this research addresses the following hypothesis.

 

++++++++++

Hypothesis

Potential data contributors of small scale, locally generated, spatially referenced data would be willing to consider contributing their data to an online data repository with no financial compensation if such a repository included:

(a) a simple, clear licensing mechanism so that there is a way to choose which usage rights the owner is willing to pass on to users and which usage rights the owner wishes to retain, if any [2];

(b) a simple process for attaching descriptions to the data. These “plain English” user descriptions would be processed by the system into standards-based metadata without requiring knowledge of metadata systems or controlled vocabulary terms on the part of the contributor; and,

(c) a simple post-publication peer evaluation/commenting mechanism that would both provide feedback for contributors, and provide information on quality and suitability of use for future users.

 

++++++++++

Method

In order to test this hypothesis, we used a combination of qualitative and quantitative research procedures (Onwuegbuzie and Leech, 2004; Ragin, et al., 2004). Personal interviews were conducted with 10 people who either had generated data of their own, or who had the authority on behalf of the groups they represented to make data generated by the group available for use outside of the group [3].

To confirm or refute the findings from these qualitative interviews, we designed an online questionnaire based upon the results of the interviews, and compared results from that questionnaire with the results from the interviews.

In order to detect any bias introduced by information discussed in the interview itself, interviewees were given short pre- and post-interview questionnaires to see if their opinions had changed about any of the topics discussed in the interview.

Interviewees and data types

The interviewees and/or the organizations they represented held a variety of different types of data, all of which was locally generated, small-scale, and spatially referenced in some way, and none of which was available online at the time of the interviews. The only selection criteria for an interviewee were: a willingness to consider making their data available in an online repository without any financial payment; personal ownership or legal control of that data; and, a willingness to meet with a researcher in person for up to one hour.

The interviewees so chosen are not in any way a statistically representative sample of potential data contributors to an online commons environment for spatially referenced data. The major reason for not attempting to select a statistically representative sample of potential contributors is that the number of such contributors is unknown and probably unknowable. Thus, we conducted qualitative in-depth interviews, and then used an online quantitative survey to support or refute the qualitative findings. The goal was to produce findings that would be informative, even though not “proven” in a statistical sense. The hope is that the findings would be useful for future designers of a Commons of Geographic Data type online environment, if one should be constructed.

Interviewees were selected using a “snowball” technique. Initial interviewees were suggested by people interested in data collection who were located in geographic areas accessible to the interviewer. Those who participated as interviewees recommended other potential interviewees. As chance would have it, the final group of ten interviewees turned out to be quite diverse in the types of data that they owned or controlled.

Four of the interviewees were either paid or volunteer leaders of local groups concerned with environmental and/or land use matters. Among them, these groups collected data on water, soil, and air quality; invertebrate populations; locations of threatened species; maintenance schedules for trails on preserved land; owner granted easements on private land; and other similar types of data.

One interviewee served on a town recreation committee that focused on recreational uses of water bodies in the town, and had data on residents’ recreational interests as well as on water quality in local lakes. One interviewee was a graduate student working on a project involving ocean currents and ocean water characteristics at different depths. One high school teacher taught use of GIS software for mapping social data such as street light locations and their possible correspondence to crime statistics. One interviewee was an author of books about birding who combined the author’s original data on bird sightings with state habitat maps. Another interviewee worked in an organization with an extensive collection of photographs of historical maritime objects related to specific ports. One worked with a local historical society on locating, describing, photographing, and mapping gravestones in town cemeteries.

For those interviewees working with organizations, in all cases the organizations were non-profit, with fewer than five paid staff.

Seven of the interviewees were from Maine, one from Massachusetts, one from Pennsylvania, and one from North Carolina.

Qualitative data collection process

The purpose of these qualitative interviews was to test whether the hypothesis above would hold. All interviews were conducted from the same interview instrument by the same interviewer. The interviews were transcribed and coded, and then the transcripts were checked against the voice recordings for accuracy. A summary of key points of each interview was then sent to the interviewee for correction and confirmation. None of the interviewees who responded submitted any corrections other than spelling errors.

Quantitative data collection process

Based on the information generated in the analysis of the qualitative data, an online questionnaire was constructed. The goal was to see if others who owned or controlled spatially-related data would agree with the responses of the 10 interviewees regarding the hypothesis points. The author sent an invitation to participate in the research to listservs concerned with geographic information of different types, specifically to members of the Global Spatial Data Infrastructure Association and to members of the Maine Geolibrary listserv. In addition, printed flyers inviting participation were distributed at a conference of the Maine GIS User Group and the Maine Municipal Association.

Many users of spatially-referenced data are also creators of that data, as are many users and creators of other types of data or information on the World Wide Web. This phenomenon has been dubbed “produsage” by Axel Bruns (2008). In this framework, those who both produce and use data are referred to as “produsers.” This is similar to the situation in current media production tools where there is a line of products aimed at “prosumers,” people who both produce and also consume media products such as music or video, often in an online context.

Given this “produsage” tendency online, the survey instrument used the first question to separate those who were producers of data, or who had significant influence on data sharing in their organizations (potential contributors), from those who considered themselves only potential data users.

There was no attempt to ensure that data owned or controlled by respondents was locally generated or privately owned, since trying to pre-qualify potential respondents while simultaneously encouraging them to take a very short, “simple” survey was felt to be impractical. The fact that respondents stated that they owned or controlled data rights and would consider making their data available in an online environment without financial compensation qualified them to respond to the survey.

The types of data that respondents owned or controlled included location and contents of waste disposal containers, location and types of health centers, vegetation distribution, land ownership, and many other types of data. While no residence location information was requested from respondents, a number mentioned their geographic locations, several of which were outside of the United States.

All of those who identified themselves as potential contributors also considered themselves potential users. If they completed the entire survey, they answered 20 questions. Of those questions, six requested text-based answers. The other questions required either yes/no responses, or responses rated on a 1 to 5 Likert scale. Those who identified themselves as not owning or controlling data were asked to answer 11 questions, of which three requested text-based responses. Their responses are not included in this study.

As in the qualitative portion of the research, the author made no attempt to construct a statistically valid sample of all potential contributors or users of an online commons repository since that universe is simply unknown. Rather, the goal was to gather a reasonable number of responses from self-identified potential contributors to either validate or invalidate the qualitative research findings. Survey respondents were asked for no demographic or other potentially personally identifiable information, and were assured that all responses were anonymous and confidential.

There was a total of 197 click-throughs from the survey splash page to the actual survey instrument. Each click-through response was given a specific ID for analysis purposes.

Of 197 click-throughs, 120 identified themselves as owners/controllers of data. Of those, 100 completed all questions, 10 answered some of the questions, and 10 answered none of the questions. For all of the quantitative results discussed below, n=110 unless otherwise noted.
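The arithmetic behind the sample size can be checked with a short sketch. The counts come from the paragraph above; the variable names and the derived figures (those who did not identify as owners, and the share of click-throughs analyzed) are our own illustration and are not reported in the study.

```python
# Counts as reported in the text above.
click_throughs = 197
owners = 120           # self-identified owners/controllers of data
completed_all = 100
completed_some = 10
completed_none = 10

# The three completion groups should account for every self-identified owner.
assert completed_all + completed_some + completed_none == owners

# The analysis sample is everyone who answered at least one question.
n = completed_all + completed_some

# Derived by subtraction (not stated in the text): click-throughs who did not
# identify themselves as owners/controllers of data.
non_owners = click_throughs - owners

print(f"n = {n}")
print(f"click-throughs not identifying as owners: {non_owners}")
print(f"share of click-throughs analyzed: {n / click_throughs:.0%}")
```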

 

++++++++++

Results and discussion

The interviews were recorded, transcribed, and then coded. Since all interviewees were asked the same set of questions, initial top-level codes were based upon those questions. Codes included conditions (which owners might put on use of contributed data); metadata (short description, keywords, etc.); and evaluation (valuable or not, amount of time that a contributor would spend, etc.).

As additional aspects of responses appeared, sub-categories for the major categories were added to make meanings more precise, and a few additional top-level codes were added for topics that emerged.

Based upon the responses in the interviews, a set of questions was developed that could be posed in an online questionnaire to ascertain whether other potential contributors who completed all or some of the online questionnaire would support or not support the views of the interviewees. The questionnaire responses were then tabulated and compared with the interview results.

We review the results by each hypothesis sub-part.

Hypothesis Sub-part (a): a simple, clear licensing mechanism would help motivate potential contributors to consider contributing their data to an online commons-type repository [4].

Qualitative results

Three interviewees said that licensing was not an issue for them or their organizations since they would not put any conditions on the use of their data if they were to post it online. However, two of the three added that while there was much data they would be willing to make publicly available with no restrictions, there was also some data they might not wish to share in a publicly available online environment. This was also true of several other interviewees. (See the discussion on withholding some data below.)

All of the other interviewees indicated that they or their organizations would want attribution if their data were publicly available online, although they recognized that it is difficult to control what people do with information once it is online. As one person noted: “yea, if they were to use it in a publication or on a Web site, I would ideally like to see some credit for it but I am not going to worry about it too much because it is not something that I have a lot of control over.” None of the interviewees said they would absolutely withhold their data if attribution could not be guaranteed but seven of 10 indicated that attribution, along with a way to ensure that it was given, at least in the first instance, would be desirable to them or their organizations.

Three interviewees also indicated that they would be happy to make their data available for non-commercial use. If users wanted to use the data in a commercial context, these interviewees would want to be contacted so that they could negotiate some type of compensation with the potential commercial user.

Half of the interviewees had a concern which no “some rights reserved” licensing scheme at present addresses, nor perhaps is it a concern that is addressable through licensing. They wanted some type of assurance that their data would be used properly. By “properly,” they meant slightly different things but the core concern was summed up nicely by one interviewee: “I think we would probably want to ensure some kind of conditions that protect the integrity of the data. I don’t think we would be inclined to worry about commercial use or that sort of thing. I think we would be mostly concerned with are these data being used properly and are they not being taken out of context or are they potentially being used to misrepresent a situation where the data are not used in a way that we think are sensible or consistent.”

The same person indicated that if a user at home came across this group’s data and misunderstood it, that would not be a serious cause for concern: “I think we would probably mostly be concerned about when and how the data is used in some kind of a publication. If someone is just sitting at their home computer and looking for data and drawing their own conclusions about things, I don’t think we would be as concerned ... I don’t think we would attempt to try to control every pair of eyes looking at that data, saying oh no you are not understanding this properly. I think the concern would be a newspaper article ... .”

The issue for those with what we might call a “downstream quality control concern” is that once their data is out of their control, it might be “corrupted or somehow altered and misrepresented,” as another interviewee put it. Even those who were not concerned about attribution and did not see any reason to put a license on the use of their data shared a concern that the data could be misused. One interviewee spoke of putting an “advisory” on the owner’s data that said, in this particular case: “don’t use irresponsibly. That is, don’t go to these particular zones and stress the birds.”

In some cases, the concern was so strong that it resulted in interviewees reporting they would choose to withhold data out of fear that it would be used improperly and/or misinterpreted, or was so sensitive that releasing it without knowing who might use it could have adverse effects. These were mainly cases of land trusts or other environmental organizations. In some cases, they had developed information about locations of endangered species. In other cases, they had negotiated easements or other land use agreements with landowners which the interviewees felt could create problems either for the land owners or for the organizations if they were made available to the public.

None of the questions in the original interview protocol spoke specifically to this concern. It emerged in three interviews during a general discussion of what conditions, if any, potential contributors might place on the use of their data in an online commons environment. As a result, we added a specific question about types of data potential contributors might choose to withhold to the online questionnaire.

While this concern arose spontaneously among this particular group of interviewees, it has been a concern in institutional settings, for example, among cultural institutions (Eschenfelder and Caswell, 2010). That concern is becoming more acute in the online world.

Quantitative results

Responses to the online questionnaire on this topic are largely consistent with the results of the personal interviews.

Respondents were asked to reply to a series of questions that began with: “If you were to consider making your data available online so that others could access and/or use it, please indicate how important each of the following would be in your decision whether or not to make your (or your organization’s) data available.”

One hundred and ten respondents indicated that they had data that they might consider making available online and answered at least one other survey question.

Respondents were asked to rate each item on a scale of 5–“Very Important,” to 1–“Not Important At All.”

The first item concerned “Attribution.” Note that these and all following percentages are rounded. The raw number is noted next to each response description, and the percentage is indicated in Figure 1.

 

Attribution

 

The question of non-commercial versus commercial use of contributed data arose in the interviews. As a result, a specific question addressing that issue was included in the questionnaire. Respondents were asked how important being able to make their data available for non-commercial use only would be to them.

 

Importance of non-commercial use only

 

Only three of 10 interviewees specifically mentioned non-commercial use as a concern. This differs from the 62 percent of respondents to the questionnaire who would find being able to specify non-commercial use for their data “Very” or “Somewhat Important.” This discrepancy could be due to the fact that there was no specific question about non-commercial use asked of the interviewees, only general questions regarding any conditions they might put on the use of their data. The fact that three interviewees spontaneously mentioned this concern led to it specifically being included in the questionnaire. It is interesting to note that the 35 percent of questionnaire respondents who considered it “Very Important” to be able to indicate that use of their data was only for non-commercial purposes matches reasonably well with the 30 percent of interviewees who spontaneously expressed this concern.

Two other concerns arose during the qualitative analysis of the interview data, and both were included specifically in the questionnaire.

The first involved concerns about data being corrupted or misused because of a lack of understanding. As noted above, there is no license of any sort that can guarantee that data will not be misunderstood. However, there are “some rights reserved” licenses which prohibit modifying the data as a condition of the license grant. Therefore, respondents were asked how important being able to specify “User may use the data but not modify it in any way” would be.

 

User may use but not modify the data in any way

 

Over half of respondents seemed to share a concern that their data not be manipulated. We did not ask specifically why, but it is likely that concerns over data integrity and possible corruption of data, as revealed in the interviews, may also have been a concern of the questionnaire respondents.
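The three conditions discussed in this section (attribution, non-commercial use only, and no modification) correspond to elements of the Creative Commons license suite. The following is a minimal sketch of how a commons interface might translate a contributor's choices into a license code. The function name and the choice of CC0 for "no conditions" are our assumptions for illustration; the study does not prescribe any particular mapping, and share-alike conditions are omitted because they did not arise in the interviews.

```python
def suggest_cc_license(attribution: bool,
                       non_commercial: bool,
                       no_modification: bool) -> str:
    """Return a Creative Commons license code for the chosen conditions.

    Hypothetical helper, for illustration only.
    """
    if not (attribution or non_commercial or no_modification):
        # No conditions at all: assume a public domain dedication is wanted.
        return "CC0"
    # Every Creative Commons license other than CC0 includes the
    # attribution (BY) element, so any condition implies BY.
    parts = ["BY"]
    if non_commercial:
        parts.append("NC")
    if no_modification:
        parts.append("ND")
    return "CC " + "-".join(parts)


# A contributor who wants credit and no commercial use:
print(suggest_cc_license(attribution=True, non_commercial=True,
                         no_modification=False))
```

As noted above, no license term of this kind addresses the concern that data might be misunderstood or taken out of context; the sketch covers only the conditions that existing licenses can express.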

 

Are there some types of data that you would withhold

 

As noted, the second concern that arose during the interview phase of the research involved withholding some data which potential contributors considered sensitive. To explore this issue further, the following question was included in the online survey: “Is there any type of data which you possess that you would NOT be willing to make available in an online commons-type repository? If so, please briefly describe it and indicate why you would not make it available.”

Since the questionnaire group was much larger than the interview group, the types of data and rationales for withholding some data varied across a broader range than those mentioned specifically during the interviews. The bulk of the reasons for holding data back mentioned by questionnaire respondents fell into the following categories:

  • Homeland Security;
  • financial privacy, e.g., tax, income, property information;
  • personal privacy, e.g., health related information;
  • some part of data purchased from or held by another owner;
  • endangered or sensitive species information;
  • incomplete data or not of high quality;
  • high level of expertise required to understand properly and thus could be misinterpreted;
  • part of ongoing academic research and researchers do not want to be “scooped” on their research; and,
  • hope of generating future income or cost reimbursement.

Among these reasons are all of those expressed by interviewees, as well as a number of additional ones.

Hypothesis Sub-part (b): a simple process for attaching descriptions to the data. The goal would be to make the data easier for users to discover.

Qualitative results

Metadata is often the weakest part of data management. Developing full metadata descriptions using the Content Standard for Digital Geospatial Metadata (CSDGM), the Federal Geographic Data Committee standard, involves dealing with over 300 fields, as does the international ISO-19115 Geographic Information-Metadata standard. Few professionals, and almost no non-professionals, even attempt to provide complete metadata descriptions for spatially-referenced data sets. Yet using metadata that conforms to international standards is key to making data widely visible in an organized way. This is in contrast to, for example, non-standard tagging in applications like Google Earth or Flickr, which international search protocols such as OAIS-compliant search tools are not able to harvest and make available.

Potential contributors to an online commons environment would not be expected to create standards based metadata for 300 fields. However, there is a more limited set of ISO-19115 core metadata items which would be practical to have contributors provide, and which could be done in a few minutes without the contributors having any knowledge of metadata or of metadata standards.
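To make the idea concrete, the following sketch shows how a short contributor form might be turned into a small metadata record. It is an illustration under stated assumptions, not a description of any existing system: the class, function, and field names are hypothetical, the fields are only loosely modeled on the kinds of items found in the ISO-19115 core (title, abstract, keywords, geographic extent, point of contact, metadata date, language), and the keyword handling is a naive split rather than a mapping to a controlled vocabulary. The example data set and its coordinates are invented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ContributorForm:
    """What a contributor types in; no metadata knowledge is assumed."""
    title: str
    description: str
    keywords: str       # free text, comma-separated
    place_name: str
    west: float         # bounding box in decimal degrees
    east: float
    south: float
    north: float
    contact_email: str


def to_core_record(form: ContributorForm, today: date) -> dict:
    """Map the form onto a small, core-style metadata record (hypothetical)."""
    # Simplification: boxes that cross the antimeridian are not supported.
    if not (-180 <= form.west <= form.east <= 180):
        raise ValueError("west/east must be ordered longitudes in [-180, 180]")
    if not (-90 <= form.south <= form.north <= 90):
        raise ValueError("south/north must be ordered latitudes in [-90, 90]")
    # Normalize free-text keywords: trim, lower-case, de-duplicate, sort.
    keywords = sorted({k.strip().lower()
                       for k in form.keywords.split(",") if k.strip()})
    return {
        "title": form.title.strip(),
        "abstract": form.description.strip(),
        "keywords": keywords,
        "extent": {
            "place": form.place_name.strip(),
            "bbox": [form.west, form.south, form.east, form.north],
        },
        "point_of_contact": form.contact_email.strip(),
        "metadata_date": today.isoformat(),
        "language": "eng",  # assumed default for this sketch
    }


# Invented example: a lake association's water quality readings.
form = ContributorForm(
    title="Lake water quality readings",
    description="Weekly clarity and temperature readings taken by volunteers.",
    keywords="Water quality, lake, Monitoring, lake",
    place_name="Example Lake",
    west=-69.9, east=-69.8, south=44.4, north=44.5,
    contact_email="volunteer@example.org",
)
record = to_core_record(form, today=date(2012, 1, 1))
print(record["keywords"])
```

A form of this size could plausibly be completed in a few minutes, which is the scale of effort discussed in the interviews below.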

During the interviews, we were interested in discovering whether interviewees had already developed basic metadata, i.e., descriptions of the data file contents and keywords that could serve as finding aids in an online environment. If they had not, we inquired whether they felt it was worth investing time and resources to do so, and how much time they would be willing to invest to provide such information.

None of the 10 interviewees had provided short descriptions of the files that contained their data, nor had they attached any keywords to the files. All of the interviewees were aware of the usefulness of metadata but none had found a compelling reason to create either file descriptions or keywords for their files. As one interviewee jokingly put it: “I am an evil person, I have not done the metadata.”

This absence of metadata did not cause any operational difficulties locally since the data was either owned and used by an individual or by a very small group of people who all knew what the data was about or could simply ask a colleague if they did not. None of the data was online at the time of the interviews so making it more discoverable had not been a priority.

All of the interviewees recognized the value of having useful metadata in an online environment, and all could quickly identify keyword terms that would be appropriate for their data.

Answers to the question of how much time interviewees might be willing to invest in creating metadata for their data files, if the files were to be placed in an online environment, varied. In most cases, interviewees felt that since they were individuals or worked with very small organizations, they would have to believe that there would be a use for their data. They then would have to evaluate for themselves or with their boards or colleagues, in the case of organizations, whether investing that time would further their missions or purposes.

Even with that caveat, eight of the 10 interviewees would be willing to dedicate from a half-hour per file to “as long as it takes” to provide file descriptions, keywords, and location information for their data. The other two interviewees felt that once they had set up a system, the nature of their data was such that it would take only five minutes or so per file to provide that information.

In sum, all interviewees recognized the value of providing metadata for their files if their data were to be placed in an online environment, and they would be willing to dedicate time and resources to do so if they were convinced that others might value and use their data and that the knowledge required to input the information was minimal. However, none of the interviewees had actually created metadata in the offline environments in which they worked at the time of the interviews.
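The three items interviewees said they would be willing to supply (a file description, keywords, and location information) can be pictured as a very small record. The sketch below is only an illustration under stated assumptions: the class and field names are hypothetical, they are not ISO 19115 element names, and the study does not prescribe any particular data model. It shows how an input form might check that a contributor has supplied each item without requiring any knowledge of metadata standards.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BoundingBox:
    """Geographic extent in decimal degrees (hypothetical structure)."""
    west: float
    south: float
    east: float
    north: float

    def is_valid(self) -> bool:
        # Longitudes must lie in [-180, 180] and latitudes in [-90, 90],
        # with south not greater than north. West may exceed east for
        # extents that cross the antimeridian, so that is not rejected.
        return (
            -180.0 <= self.west <= 180.0
            and -180.0 <= self.east <= 180.0
            and -90.0 <= self.south <= self.north <= 90.0
        )


@dataclass
class ContributorMetadata:
    """Minimal contributor-supplied record (illustrative field names)."""
    description: str
    keywords: List[str] = field(default_factory=list)
    extent: Optional[BoundingBox] = None

    def missing_items(self) -> List[str]:
        """Return the names of items the contributor still has to supply."""
        missing = []
        if not self.description.strip():
            missing.append("description")
        if not any(k.strip() for k in self.keywords):
            missing.append("keywords")
        if self.extent is None or not self.extent.is_valid():
            missing.append("extent")
        return missing


# Made-up example values, for illustration only.
record = ContributorMetadata(
    description="Weekly water temperature readings at one lake site",
    keywords=["water temperature", "lake"],
    extent=BoundingBox(west=-69.0, south=44.5, east=-68.5, north=45.0),
)
print(record.missing_items())  # prints []
```

A record of this kind could later be mapped to standards-based metadata by the repository rather than by the contributor, which is the division of labor the interviewees' comments point toward.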

Quantitative results

Questionnaire respondents were asked how important the “Ability to attach keywords or other descriptions to your data so that further users could find it more easily” would be in an online commons-type environment.

 

[Figure not reproduced: Importance of ability to attach keywords or other descriptions]

 

Of the 110 respondents who answered the previous question, 102 also answered a text question asking them how much time they would be willing to devote to uploading and describing their data. As with the interviewees, the spectrum was wide, ranging from five minutes to “as much time as it would be necessary to do so.” A few respondents said they had already created metadata, and one said the process would be automated. The great majority of those responding indicated they felt that metadata for their data was important and that they would devote the time necessary to provide it.

As with the interviewees, questionnaire respondents strongly recognized the value of adding metadata to their files if they were to make them available online, and almost all would be willing to take at least some time to provide metadata.

Hypothesis Sub-part (c): a simple post-publication peer evaluation mechanism that would both provide feedback for contributors, and provide information on quality and suitability for use for users.

Qualitative results

Nine of the 10 interviewees viewed the ability of users to comment on data as a positive factor in potentially placing their data in an online commons-type environment. The other interviewee said that it would not make much difference because “I don’t necessarily know why but I would tend not to trust, you know, people’s review of my data.”

The others, however, saw that capability as a definite plus. There were suggestions that there be some sort of registration system so that commenters would be registered, even if they used a screen name rather than their own name, to minimize abuses of an open commenting system. Interviewees also indicated that, if possible, they would like to know something about the commenter’s use of their data to help them judge whether the comment was appropriate to their data. Interviewees felt that they had developed their data for a particular type of use and they were interested in receiving feedback on it when it was used in a similar context. Several also indicated an interest in being able to contact a commenter if what the commenter said could be helpful for improving their data or suggested an additional use.
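The interviewees' suggestions (registered commenters who may use a screen name, some indication of how the commenter used the data, and an optional way to contact the commenter) can be sketched as a simple comment record. This is a hypothetical illustration, not a design taken from the study: all class, field, and function names are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Commenter:
    """A registered commenter; all field names are hypothetical."""
    screen_name: str                      # shown publicly instead of a real name
    contact_email: Optional[str] = None   # set only if the commenter agrees to be contacted


@dataclass
class Comment:
    """One piece of user feedback on a contributed data set."""
    dataset_id: str
    commenter: Commenter
    text: str
    use_context: str = ""   # how the commenter used the data, if stated


def accept_comment(comment: Comment, registered: Set[str]) -> bool:
    """Accept a comment only from a registered screen name with non-empty text."""
    return (
        comment.commenter.screen_name in registered
        and bool(comment.text.strip())
    )


def contactable(comments: List[Comment]) -> List[Comment]:
    """Return the comments whose authors left a way to be contacted."""
    return [c for c in comments if c.commenter.contact_email]


# Made-up example values, for illustration only.
registered_names = {"lakewatcher"}
feedback = Comment(
    dataset_id="dataset-001",
    commenter=Commenter("lakewatcher", "lakewatcher@example.org"),
    text="Used these readings in a classroom exercise; units were clear.",
    use_context="education",
)
print(accept_comment(feedback, registered_names))  # prints True
```

Recording the use context alongside the comment text would let a contributor judge whether the feedback comes from a use similar to the one the data was developed for.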

The advantages of user feedback from a data contributor’s perspective included knowing that someone else had found their data useful for particular purposes, receiving suggestions or questions that they might not have thought of themselves, and using comments by users to improve their data.

One additional positive mentioned by two of the interviewees highlighted the value of knowing that one is part of a larger community with similar interests: “... the connectivity, the sense of networking and the sense of camaraderie almost that sharing information could provide or does, at least on paper, seem to provide is in itself a good, it is a social community kind of good and that to get some feedback that says, ‘hey, we are using your data’ would feed that sense that you’re part of something bigger than your own effort. And I think that would be helpful and inspiring so to be able to get that feedback, you know, you have to have some venue where that can happen.” In this person’s opinion, a peer commenting mechanism could support that sense of community, especially for those working in small non-profit organizations.

Interviewees found a peer evaluation/commenting system to be a very desirable characteristic for an online commons-type data environment.

Quantitative results

Questionnaire respondents who identified themselves as owning or controlling data overwhelmingly felt that the “Ability of users to comment on the suitability of the data for their uses” would be important.

 

[Figure not reproduced: Ability of users to comment on suitability for use]

 

Although the questionnaire did not ask for reasons why this capability might be important, the numbers support the overall consensus of the interviewees that a commenting/evaluation capability would be valuable from the perspective of potential data contributors.

Repository maintenance

Qualitative results

While not a specific sub-part of the hypothesis, interviewees were asked in a general way about desirable repository characteristics: “Would it make sense to you to make your data available in a central location on the Web so that people who might wish to use your data could do so without contacting you directly?” If the answer was yes, the follow-up question was “could you describe any characteristics of such a central location that would encourage you to make your data available there?”

In response to this question or in other parts of the interviews, several interviewees brought up concerns about the nature of a hypothetical online commons repository. While they recognized potential value in such a repository, they also realized that it would take effort by themselves or their organizations to prepare and upload their data. As one person noted: “it would take a huge effort for us to get it into a consistent format to upload it ... .” While interviewees were open to making that effort, they felt that there should be certain assurances about the repository to justify the work involved.

One concern focused on how such a repository might look to users and whether there would need to be different sections, e.g., a section specifically for student-generated data so users would know the data might not be of professional quality. There were also comments about whether there might be guidelines for responsible use of the data and, if so, of what kind. But the largest operational concern was the longevity of such a repository.

Since almost all of the interviewees indicated it would take additional work to prepare and upload their data, most felt that there would need to be some assurance that the repository would be maintained over time if they were to make the effort necessary to contribute their data. One interviewee expressed the concern in these words: “I fairly frequently see this ‘start up, some interest, and then you know, decline’ profile and because of that I guess I tend to be a little nervous about starting up or being part of the start up because I don’t know whether my efforts at the front end are going to result in the kind of long-term engagement that I was anticipating or hoping for.”

Based upon the strength of this concern about the longevity of a repository, a question was added to the survey on this topic.

Quantitative results

Survey respondents were asked how important “Long term maintenance of your data on the online site” would be to them. Overwhelmingly, long term stability matters to potential data contributors.

 

[Figure not reproduced: Importance of long term maintenance]

 

This response mirrors that of the interviewees as to the importance of long-term maintenance of any commons-type online environment.

 

++++++++++

Conclusion

Based on the interview conversations and analysis and the online survey results, the hypothesis put forth above seems to hold. Results from the interviews are generally confirmed by the survey results. Although in some cases percentages differ, concerns were consistent overall in both the interviews and the survey responses.

The purpose of this research is to provide guidance to those who may wish to construct a commons-type repository in which anyone could make their data available for sharing with others, although the results could be of use in institutional repository and other settings as well.

This research, subject to the caveats listed below, suggests that it would be desirable from the perspective of potential contributors of data to provide infrastructure capability that would:

  • allow contributors to attach conditions to the use of their data;
  • allow contributors to provide basic information that could be translated into standards-based metadata; and,
  • allow contributors to receive comments and feedback from users.

Assuring potential contributors that such a repository would have staying power and that their data would be available over time would also be an important consideration for potential data donors.

 

++++++++++

Limitations

This research has several limitations. It does not purport to be based on a statistically valid sample of potential contributors. That universe is simply not known and is probably not knowable. Respondents to the online survey were self-selected. While interviewees all had spatially-related data that was generated locally and not available online at the time of the interviews, no such claim can be made for the survey respondents, although respondents were invited to participate only if they might be willing to make their data available without up-front financial remuneration, and only if they owned or controlled spatially-referenced data.

These limitations prevent any assertion that the hypothesis is “proven” but they do not, we feel, limit the usefulness of the research results for their intended purpose: to provide guidance to those who may in the future choose to construct an online commons for spatially-referenced data that anyone, non-professional and professional alike, can contribute to with no special expertise. Such a commons could help to make visible much currently invisible data for the benefit of all.

 

About the author

James Campbell is a Ph.D. candidate in the Spatial Informatics Program, School of Computer and Information Science, at the University of Maine. His interests focus on policy and legal aspects of access to spatially referenced data and information.
E-mail: campbell [at] spatial [dot] maine [dot] edu

 

Notes

1. See, for example, the Atlas of Canada (http://atlas.gc.ca/site/index.html), and Geoscience Australia (http://www.ga.gov.au). In the U.S., initiatives such as the National Map (http://nationalmap.gov), the National Atlas (http://www.nationalatlas.gov), and Geo.Data.Gov (http://geo.data.gov/geoportal/catalog/main/home.page) serve similar functions. They generally contain a wider array of data since in the U.S., the federal government cannot hold copyright on materials it generates. Similarly, there are non-governmental disciplinary and special purpose repositories that exist to capture large-scale spatially referenced data, e.g., PANGAEA (http://www.pangaea.de) and OneGeology (http://www.onegeology.org/). An example of a global interface for accessing earth observation data sets and services is the Global Earth Observation System of Systems (GEOSS) (http://www.earthobservations.org/geoss.shtml).

2. Under U.S. copyright law, facts themselves cannot be copyrighted but original arrangements of facts can be. For the purposes of this research, we assumed that data sets owned or controlled by interviewees and questionnaire respondents included sufficient original arrangement to qualify for copyright protection although this is undoubtedly not true in all cases. A simple list of dates and temperature readings at a particular location on those dates, for example, would probably not qualify for copyright protection. Rather than muddy the water by trying to make determinations of copyright status of particular data sets, we assume all potential contributions would qualify for copyright protection.

3. Ten interviewees is a large enough number in a qualitative study to get a good sense of the attitudes of, in this case, potential contributors to an online commons environment. The same is true in other interaction-intensive studies such as software usability studies (Hwang and Salvendy, 2010).

4. To get a sense of one possible approach, see Campbell, et al., 2006.

 

References

Melinda J. Anderson and Robin N. Shaw, 1999. “A comparative evaluation of qualitative data analytic techniques in identifying volunteer motivation in tourism,” Tourism Management, volume 20, number 1, pp. 99–106.
doi: http://dx.doi.org/10.1016/S0261-5177(98)00095-8, accessed 29 January 2015.

Amy Friedlander and Prudence Adler, 2006. To stand the test of time: Long-term stewardship of digital data sets in science and engineering. Washington, D.C.: Association of Research Libraries, at http://www.arl.org/publications-resources/1075-to-stand-the-test-of-time-long-term-stewardship-of-digital-data-sets-in-science-and-engineering, accessed 29 January 2015.

Axel Bruns, 2008. “The future is user-led: The path towards widespread produsage,” Fibreculture Journal, issue 11, at http://eleven.fibreculturejournal.org/fcj-066-the-future-is-user-led-the-path-towards-widespread-produsage/, accessed 29 January 2015.

James Campbell, Marilyn Lutz, David McCurry, Harlan Onsrud, and Kenton Williams, 2006. “Enabling non-specialist contributors to generate standards-based geographic metadata in a commons of geographic data,” abstract from the Proceedings of the Fourth International Conference, GIScience 2006 (Münster, Germany, 20–23 September).

Seth Cooper, Firas Khatib, Adrien Treuille, Janos Barbero, Jeehyung Lee, Michael Beenen, Andrew Leaver-Fay, David Baker, Zoran Popović, and Foldit players, 2010. “Predicting protein structures with a multiplayer online game,” Nature, volume 466, number 7307 (5 August), pp. 756–760.
doi: http://dx.doi.org/10.1038/nature09304, accessed 29 January 2015.

Kristin R. Eschenfelder and Michelle Caswell, 2010. “Digital cultural collections in an age of reuse and remixes,” First Monday, volume 15, number 11, at http://firstmonday.org/article/view/3060/2640, accessed 29 January 2015.
doi: http://dx.doi.org/10.5210/fm.v15i11.3060, accessed 29 January 2015.

Michael F. Goodchild, 2007. “Citizens as sensors: The world of volunteered geography,” GeoJournal, volume 69, number 4, pp. 211–221.
doi: http://dx.doi.org/10.1007/s10708-007-9111-y, accessed 29 January 2015.

Institute of Museum and Library Services (U.S.), 2003. Report of the workshop on opportunities for research on the creation, management, preservation and use of digital content. Washington, D.C.: Institute of Museum and Library Services, at http://www.imls.gov/assets/1/AssetManager/digitalopp.pdf, accessed 29 January 2015.

Pamela J. McKenzie, Jacquelyn Burkell, Lola Wong, Caroline Whippey, Samuel E. Trosow, and Michael McNally, 2012. “User-generated online content 1: Overview, current state and context,” First Monday, volume 17, number 6, at http://firstmonday.org/article/view/3912/3266, accessed 29 January 2015.
doi: http://dx.doi.org/10.5210/fm.v17i6.3912, accessed 29 January 2015.

National Research Council (U.S.), Committee on Licensing Geographic Data and Services, 2004. Licensing geographic data and services. Washington, D.C.: National Academies Press, at http://www.nap.edu/openbook.php?isbn=0309092671, accessed 29 January 2015.

Obed Nov, 2007. “What motivates Wikipedians?” Communications of the ACM, volume 50, number 11, pp. 60–64.
doi: http://dx.doi.org/10.1145/1297797.1297798, accessed 29 January 2015.

Anthony J. Onwuegbuzie and Nancy L. Leech, 2004. “Enhancing the interpretation of ‘significant’ findings: The role of mixed methods research,” Qualitative Report, volume 9, number 4, pp. 770–792, and at http://www.nova.edu/ssss/QR/QR9-4/onwuegbuzie.pdf, accessed 29 January 2015.

Harlan Onsrud and James Campbell, 2007. “Big opportunities in access to ‘small science’ data,” Data Science Journal, volume 6, at https://www.jstage.jst.go.jp/article/dsj/6/0/6_0_OD58/_article, accessed 29 January 2015.

Charles C. Ragin, Joane Nagel, and Patricia White, 2004. Workshop on scientific foundations of qualitative research. Washington, D.C.: National Science Foundation, and at http://www.nsf.gov/pubs/2004/nsf04219/start.htm, accessed 29 January 2015.

Harold A. Riemer, Kim D. Dorsch, Larena Hoeber, David M. Paskevich, and Packianathan Chelladurai, 2004. Motivations for volunteering with youth-oriented programs. Toronto: Canadian Centre for Philanthropy, at http://sectorsource.ca/resource/file/motivations-volunteering-youth-oriented-programs-report, accessed 29 January 2015.

 


Editorial history

Received 21 May 2013; accepted 25 January 2015.


Creative Commons License
This paper is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Potential contributor perspectives on desirable characteristics of an online data environment for spatially referenced data
by James Campbell.
First Monday, Volume 20, Number 2 - 2 February 2015
http://www.firstmonday.dk/ojs/index.php/fm/article/view/4722/4206
doi: http://dx.doi.org/10.5210/fm.v20i2.4722






© First Monday, 1995-2017. ISSN 1396-0466.