Opening confidential data

Moderators: Heikki Pajuoja, Metsäteho Ltd., and Arja Kuula-Luumi, Finnish Social Science Data Archive
Reporter: Olli-Jussi Korpinen, University of Eastern Finland

The workshop discussed working with confidential data entrusted to researchers by companies in the private forest sector, and confidentiality issues related to qualitative data, which are typically generated in personal interviews and questionnaires.

Although forest companies themselves sometimes encourage researchers to publish data considered “less sensitive”, researchers must be aware of the legislative background of working with financial data. For example, the Finnish Competition and Consumer Authority (FCCA) has set strict rules on what can be published and what is considered private financial information. Researchers therefore have to evaluate their own handling of confidential data continuously, both in projects and in routine reporting. For Metsäteho Ltd., one basic task is to report annual statistics of the forest industries to the Official Statistics of Finland (OSF). The guidelines of Statistics Finland stipulate that if fewer than three data sources (i.e. forest companies) contribute statistical data to a record (i.e. one cell in the table), the record must be screened.
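The threshold rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table layout, the function name `screen_cells` and the example figures are hypothetical and do not come from Metsäteho or Statistics Finland. Only the threshold of three data sources is taken from the text.

```python
MIN_SOURCES = 3  # from the text: fewer than three data sources -> screen the cell


def screen_cells(cells, min_sources=MIN_SOURCES):
    """Return a copy of the table in which under-supported cells are masked.

    `cells` maps a cell key to a (value, number_of_sources) pair.
    Masked cells are represented by None (an assumption of this sketch).
    """
    screened = {}
    for key, (value, n_sources) in cells.items():
        screened[key] = value if n_sources >= min_sources else None
    return screened


# Hypothetical example data: (region, assortment) -> (volume, reporting companies)
table = {
    ("North", "pulpwood"): (1200.0, 5),
    ("North", "sawlogs"): (800.0, 2),   # only two companies: must be screened
    ("South", "pulpwood"): (950.0, 3),  # exactly three companies: may be shown
}

result = screen_cells(table)
```

The sketch covers only the direct threshold check. It does not handle the case where a screened cell could be recovered from row or column totals, which would need separate treatment.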

Another RDI-related issue subject to FCCA surveillance is the purchase of ICT services for forest companies. Because competitive tendering is addressed to the entire economic region (EU), tender invitations may not, in principle, require Finnish standards for data storage and transfer; the standards must be well known internationally. This mostly affects researchers who cooperate with forest companies and take part in developing the shared ICT environment of the companies and their stakeholders (e.g. machine contractors).

A topical question in confidential data management is “who owns the data that forest machines produce, and who can get access to it?”. The question is a consequence of rapid development in ICT and the opportunities to store and process “big data”. In principle, the owner of the machine should also own the data it generates, but the customers of the machine contractors naturally negotiate contracts so that the data are also available to the wood supply officers who steer the wood supply to mills and factories. Forest machine manufacturers also look after their own interests, because they want to improve their after-sales services and use the machine-generated big data in their own product development. When such information is shared with location details, there is also an indirect connection to the personal data of the contractors, especially if the contracting company is small and employs only a few machine drivers.

Management of personal data was highlighted in the latter part of the session, which focused mainly on qualitative research and the confidentiality of personal data. A forestry-related dataset available in the Aila data service portal (provided by the Finnish Social Science Data Archive) was used as the background for the discussion. This part consisted mostly of advice and methods for anonymizing information about individual persons when processing interview or questionnaire results. For example, the dataset taken from Aila included gender, age, occupation type, education and income details of the interviewees as background variables. It is therefore highly important that, before the result dataset is published, the variables are classified in a way that does not endanger data privacy. We also noted that the line between personal and public details is sometimes very fine. For example, vehicles used in biomass supply are usually registered to business enterprises, but their registration numbers could also be considered personal data if the vehicles were registered to private individuals.
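As a rough illustration of classifying background variables before publication, the following Python sketch replaces exact age and income with broad classes and drops every field that is not explicitly kept. The class boundaries, field names and the example record are assumptions made for this sketch; they are not taken from the Aila dataset or from any guideline of the Finnish Social Science Data Archive.

```python
def classify(value, bounds, labels):
    """Return the label of the first class whose upper bound exceeds `value`.

    `labels` has one more entry than `bounds`; the last label is open-ended.
    """
    for upper, label in zip(bounds, labels):
        if value < upper:
            return label
    return labels[-1]


# Illustrative class boundaries (assumptions, not from the source)
AGE_BOUNDS = [30, 50, 65]
AGE_LABELS = ["under 30", "30-49", "50-64", "65 or over"]
INCOME_BOUNDS = [20000, 40000, 60000]
INCOME_LABELS = ["under 20 000", "20 000-39 999", "40 000-59 999", "60 000 or over"]


def anonymize(record):
    """Keep only coarse background variables; exact values are not copied."""
    return {
        "gender": record["gender"],
        "age_class": classify(record["age"], AGE_BOUNDS, AGE_LABELS),
        "income_class": classify(record["income"], INCOME_BOUNDS, INCOME_LABELS),
    }


# Hypothetical interviewee record
interviewee = {"name": "N.N.", "gender": "F", "age": 47, "income": 38000}
published = anonymize(interviewee)
```

Building the published record from an explicit list of fields, rather than deleting sensitive fields from the original, means that a newly added identifying field is left out by default. Whether the chosen classes are coarse enough still has to be judged case by case, for instance when a class contains only one or two respondents.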