
BLOG 4 “Confidentiality of AI models in collaborative projects”: Data accessibility in AI-driven B2B sales

Before project partners are granted access to company-confidential data, the parties must agree on the conditions for data access and use from legal, GDPR, and cybersecurity perspectives. With the proliferation of artificial intelligence, legal documents now also need to define the confidentiality and ownership of trained AI models and analysis results.

In this blog post, we concentrate on the confidentiality issues of trained AI models and analysis results, which need particular elaboration in collaborative research projects, where there is no customer-supplier relationship.

This post (Part IV) presents the basic principles documented in the formal data license agreement between a company and the research partner conducting the analysis in the publicly funded InnoSale research project. Earlier posts in the series discussed use cases (Part I), stakeholders (Part II), and data wrangling (Part III). The next post, “Part V: Business Benefits”, will conclude the series. The blog series can be found at https://www.innosale.eu/. Please also join our webinar on 29.5.2024 at 14:00-15:30 Finnish time (13:00-14:40 CET), registration link.


Figure 1: Konecranes personnel at factory.

Data License Agreement

In the context of InnoSale, similar to typical collaborative research projects, the Project Consortium Agreement (PCA) covered ownership of results and confidentiality issues at a generic level for the entire consortium. It explicitly stated that personal data should not be transferred between parties. This straightforward approach ensured compliance with the requirements of the General Data Protection Regulation (GDPR). The PCA established the framework within which some partners could share data with other project participants.

However, for specific types of data and analysis needs—such as training AI models with business-critical data—a more detailed Data License Agreement was necessary. The data license agreement is a pivotal document that outlines the conditions and responsibilities of both the data owner and the analysis partner.


The data license agreement is the single most important document describing the conditions and responsibilities of both the data owner and the analysis partner.


In the case of InnoSale, an existing data license agreement template from the company providing data for research purposes underwent modifications to align with the unique nature of the joint action. For instance, the terms had to be carefully chosen to avoid any discrepancies between the PCA and the Data License Agreement. Notably, both agreements refrained from using the term “Confidential information.” Instead, the license agreement employed the term “Sensitive Information” to emphasize the confidentiality of specific data sets in bilateral collaboration.

Confidentiality of the Trained AI Model

One major issue was that the Data License Agreement template had originally been written for a customer-supplier context. In that context, the customer provides data, and the supplier uses it to develop solutions for the customer and receives financial compensation for the work. The results are owned by the customer. In a collaborative research project, these principles do not apply.

Additionally, regarding confidentiality, discussions were necessary to delineate the boundary between the original data and the results of the analysis work. In this type of project, research outcomes include graphs, forecasts, trained AI models, and related software, all of which require agreements on ownership and confidentiality.

A simplified model of the process is shown in Figure 2. The company provides a data set (referred to as company background, i.e., information that the company already possesses), which is considered Sensitive Information. Its use is limited to the research partner and to the project’s needs. The research partner then uses this data, often after preprocessing steps, to develop and train an AI model. Because the model is trained on preprocessed data, it is considered to contain manipulated data and is therefore also Sensitive Information: it may retain patterns from the original data, which could in principle allow reverse engineering back to the original company data. Subsequently, the research partner employs the trained AI model to generate analysis results, which, in the context of InnoSale, typically involve product configurations. These product configurations contain Sensitive Information, as they include details derived from the original data, such as product, customer, and sales information.

Regarding ownership after the analysis, the company retains ownership of the original and manipulated data. The research partner, on the other hand, owns the trained AI model, the analysis results, and the software (such as scripts and tools) developed during the analysis process. These results created by the research partner during the project are referred to as the Research Foreground.


Figure 2: Confidentiality and ownership of AI model and results
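
The classification described above can be written down as a small data model. The following Python sketch is only an illustration of the reasoning in this post, not the wording of the agreement: the class and field names are hypothetical, and the rule that an artifact is sensitive whenever anything it was derived from is sensitive is a simplified reading of the process in Figure 2.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Owner(Enum):
    COMPANY = "company"
    RESEARCH_PARTNER = "research partner"


@dataclass(frozen=True)
class Artifact:
    """One item in the data flow (hypothetical model, not legal text)."""

    name: str
    owner: Owner
    derived_from: tuple[Artifact, ...] = ()
    declared_sensitive: bool = False

    def is_sensitive(self) -> bool:
        # Sensitivity propagates: anything built on Sensitive Information
        # is itself treated as Sensitive Information.
        return self.declared_sensitive or any(
            parent.is_sensitive() for parent in self.derived_from
        )


original = Artifact("original data set", Owner.COMPANY, declared_sensitive=True)
preprocessed = Artifact("preprocessed data", Owner.COMPANY, (original,))
model = Artifact("trained AI model", Owner.RESEARCH_PARTNER, (preprocessed,))
results = Artifact("analysis results", Owner.RESEARCH_PARTNER, (model,))
# Assumption: the post does not say whether scripts and tools are sensitive;
# here they are modelled as not derived from company data.
software = Artifact("scripts and tools", Owner.RESEARCH_PARTNER)

for item in (original, preprocessed, model, results, software):
    print(f"{item.name}: owner={item.owner.value}, sensitive={item.is_sensitive()}")
```

The sketch keeps ownership and sensitivity as two separate attributes, which mirrors the point made above: the research partner can own the trained model while the model still counts as Sensitive Information.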

Scientific publications are possible with some limitations: they require case-by-case agreement with the company, and the results must be generalized and anonymized so that neither the company, its data, nor any business information can be recognized. This approach allows for publications that do not conflict with confidentiality.

Konecranes’ perspective

For more than 100 years, safety and reliability have been the cornerstones of the Konecranes product and service offering. As connectivity increases and technology evolves, information security is essential for all aspects of how we do business – from manufacturing and servicing equipment to our digital ecosystem including our customer sites, global enterprise platforms and productivity-enhancing apps.

Konecranes’ strong commitment to the highest level of security and confidentiality creates demands not only for our internal operations but also for all our partners, from suppliers to research and collaboration partners. That is why we expect our partners to uphold and follow the relevant standards, regulations, and industry procedures and practices in their operations and when using Konecranes' data.

Before any data was transmitted, a large number of stakeholders had to be consulted and their agreement obtained. In InnoSale, this was accomplished by contacting each responsible data owner and presenting the data under discussion. Based on these discussions, a formal data sharing process was created, with the condition that all relevant company stakeholders would review and approve the data before it could be shared.

Cybersecurity is of utmost importance, so the information security team provided requirements for the cyber and physical environments, based on the ISO/IEC 27001 standard, to ensure secure handling of the data. Additionally, the legal department reviewed the data sets to clarify the risks that could arise when data is handled outside the company. All data was either anonymized or pseudonymized to protect the privacy of the actors present in it. Private information was hidden in such a way that no external data handler, whether human or AI, could link the data handled in this research to its actual source. Finally, permission to use the data was granted only after careful consideration of the possible business risks and potential benefits.
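
The post does not state which pseudonymization technique was used. As one common option, identifiers can be replaced with a keyed hash (HMAC) whose secret key never leaves the company, so that an external data handler cannot recompute or look up the original values. The sketch below assumes this technique; the function name, field names, key, and sample values are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes, length: int = 16) -> str:
    """Return a stable pseudonym for `value` using HMAC-SHA256."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    # Truncate the hex digest to keep the pseudonym short and readable.
    return digest.hexdigest()[:length]


# Hypothetical key and record; in practice the key would be generated
# randomly and stored only inside the company.
company_key = b"replace-with-a-randomly-generated-secret"
record = {"customer": "Example Customer Oy", "product": "crane-type-A", "price": 1000}

# Only the identifying field is replaced; the other fields are kept as-is.
shared_record = {
    **record,
    "customer": pseudonymize(record["customer"], company_key),
}
print(shared_record)
```

Because the same input always yields the same pseudonym, relationships between records (for example, repeat orders by one customer) remain visible to the analysis. Replacing identifiers alone does not rule out re-identification through the remaining fields, so those still need to be reviewed separately.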

Lessons Learned

There were also other lessons learned that we’ll briefly discuss here:

  • First-Time Data Sharing: When sharing data with a third party outside the company for the first time, there is often no established process or prior experience. It’s crucial to designate a responsible person to coordinate the process between the company and the research partner. Additionally, documenting the process ensures smooth operations when critical data is shared in the future.
  • Cybersecurity and Confidentiality Challenges: While trusted partners may have already agreed on the main project guidelines through the project consortium agreement, cybersecurity and confidentiality become challenging when sharing the company’s business-critical data. A separate agreement detailing data handling methods becomes necessary.
  • Method for Anonymizing and Pseudonymizing Data: Choosing an appropriate method for anonymizing and pseudonymizing data presents a practical challenge. The selected method must be secure while preserving important data characteristics. Furthermore, once AI analysis results are available, there must be a way to interpret them even after anonymization and pseudonymization.
  • Legal Expertise: Confidentiality and ownership issues related to AI models are often complex. Unfortunately, few legal departments or personnel have experience in this area. It is beneficial to seek external legal advice from professionals familiar with these topics.
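
The interpretation requirement in the third lesson can be met, for example, by keeping a lookup table inside the company that maps pseudonyms back to the original values. The following sketch assumes random tokens and an in-memory table; the class and method names are hypothetical, and the post does not describe the mechanism actually used.

```python
import secrets


class PseudonymVault:
    """Company-side table mapping real values to random tokens and back."""

    def __init__(self) -> None:
        self._to_token = {}
        self._to_value = {}

    def pseudonymize(self, value: str) -> str:
        # Reuse the existing token so that one value always maps to one token.
        if value not in self._to_token:
            token = "ID-" + secrets.token_hex(8)
            while token in self._to_value:  # extremely unlikely collision
                token = "ID-" + secrets.token_hex(8)
            self._to_token[value] = token
            self._to_value[token] = value
        return self._to_token[value]

    def reidentify(self, token: str) -> str:
        # Only possible for whoever holds the vault, i.e. the company.
        return self._to_value[token]


vault = PseudonymVault()
token = vault.pseudonymize("Example Customer Oy")

# The research partner sees and reports only the token.
analysis_result = {"customer": token, "recommended_configuration": "config-42"}

# Back inside the company, the result is mapped to the real customer.
print(vault.reidentify(analysis_result["customer"]))
```

Random tokens carry no information about the original value, so the table itself becomes the item to protect: it would stay with the data owner and would not be part of the shared data set.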

Authors

Marko Jurvansuu (VTT), Ari Bertula, Juhani Kerovuori, Emmi Vähäsarja and Juuso Sokura (Konecranes).



Frank Werner / Intl. Project Lead
frank.werner@softwareag.com

You can get more information about the partners and project contact details at:
InnoSale ITEA4 page.
