Balancing Privacy and Confidentiality with Data Access and Utility
The CUSP Data Facility connects users to relevant datasets for urban policy research. We intend to reduce the multiple technical, legal, bureaucratic, capacity, and cost barriers to data access, so that the full research and policy benefits of data products can be realized.
The Data Facility has two primary goals. We aim to ensure that new and existing urban data are made available to, and used by, current and future members of the research community in a state-of-the-art facility. We also aim to engage staff in government agencies and local citizens by enabling them to use the Facility to address important urban problems.
We recognize that much of the data we manage, from streaming sensor data to agency administrative data, is sensitive, and we handle it accordingly. The CUSP Data Facility’s Safe Data Environment takes a multi-faceted approach to keeping data safe, built on safe people, safe projects, safe settings, and safe outputs:
Safe people
Julia Lane is the Chief Data Officer at CUSP and leads yearly trainings on “Data Privacy & Confidentiality” and “Responsible Data Use”. All Data Facility users attend one of these in-person seminars or review the online videos. Additionally, we encourage all researchers who are working with yellow or red (nonpublic) data to review, on a yearly basis, the American Statistical Association’s Training Modules on Privacy and Confidentiality. These trainings are all posted on the CUSP Data Hub website.
Safe projects
The Data Facility has also developed a data governance framework that focuses on the human component, which is critical to maintaining data privacy and confidentiality. CUSP data assets are classified as green, yellow, or red, a simple but effective way of describing how researchers can access and work with each dataset. All CUSP datasets are discoverable on the Data Catalog, but researcher access to nonpublic datasets (yellow and red) is subject to internal review and must be associated with a research project proposal. Requests to create databases with public or nonpublic datasets must also be associated with a research project.
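As a rough illustration only, the access rule described above can be sketched in a few lines of Python. This is not the Data Facility's actual implementation: the names `Tier`, `AccessRequest`, and `may_access` are hypothetical, and the sketch assumes that green (public) datasets need no review, which the text implies but does not state outright.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Tier(Enum):
    """Hypothetical encoding of the green/yellow/red classification."""
    GREEN = "green"    # public
    YELLOW = "yellow"  # nonpublic
    RED = "red"        # nonpublic


@dataclass
class AccessRequest:
    """Hypothetical record of a researcher's request for one dataset."""
    tier: Tier
    project_id: Optional[str] = None  # associated research project proposal, if any
    review_approved: bool = False     # outcome of the internal review


def may_access(request: AccessRequest) -> bool:
    """Return True if the request satisfies the rule sketched in the text.

    Assumption: green data is open to any Data Facility user without review.
    Yellow and red data require both an associated project proposal and an
    approved internal review.
    """
    if request.tier is Tier.GREEN:
        return True
    return request.project_id is not None and request.review_approved


# Example: a yellow dataset requested without a project proposal is refused.
print(may_access(AccessRequest(Tier.YELLOW)))
print(may_access(AccessRequest(Tier.YELLOW, project_id="P-001", review_approved=True)))
```

The sketch treats yellow and red identically because the text gives them the same access conditions; any further distinction between the two tiers would need to come from the published governance framework.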
Safe settings
CUSP maintains data in a secure data environment with encrypted data ingress protocols, storage on a secure data server, group management control at the data file level, and restricted export of any data products, subject to review by in-house experts in statistical disclosure limitation. Researchers working with sensitive data may access it remotely or through CUSP-certified machines on the LAN. CUSP works closely with NYU’s Information Technology security and data architecture experts to ensure the system adheres to modern best practices in secure data management.
Safe outputs
The CUSP team firmly believes in the value of researcher collaboration and research reproducibility. Researchers working with green data in the CUSP Data Facility may freely export data products for research, collaboration, and publication. Data products generated from yellow or red data are subject to internal review to limit any potential reidentification of individuals or entities. The Data Facility data export guidelines are published on the Data Hub.
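The export rule above can be sketched in the same spirit. The function name `export_requires_review` is hypothetical, and the sketch makes one assumption that the text does not spell out: a data product derived from several datasets is treated according to its most restrictive source, so a single yellow or red input is enough to trigger internal review.

```python
VALID_TIERS = {"green", "yellow", "red"}
RESTRICTED_TIERS = {"yellow", "red"}


def export_requires_review(source_tiers):
    """Return True if a data product must go through internal review before export.

    `source_tiers` lists the classification of every dataset the product was
    derived from. Assumption (not stated in the text): the most restrictive
    source tier governs the whole product.
    """
    tiers = {tier.lower() for tier in source_tiers}
    if not tiers:
        raise ValueError("a data product must have at least one source dataset")
    unknown = tiers - VALID_TIERS
    if unknown:
        raise ValueError(f"unknown classification(s): {sorted(unknown)}")
    return bool(tiers & RESTRICTED_TIERS)


# Example: a product built only from green data can be exported freely,
# while mixing in a red dataset sends it to review.
print(export_requires_review(["green"]))
print(export_requires_review(["green", "red"]))
```

The authoritative rules are the data export guidelines published on the Data Hub; this sketch only mirrors the summary given here.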