Special Articles on Private Cross-aggregation Technology—Solving Social Problems through Cross-company Statistical Data Usage—
Utilizing Cross-company Statistics through Use of a Private Cross-aggregation Technology—Initiatives to Improve Customer Experience and Solve Social Problems—
Security, Privacy Protection, Population Estimation
Kasumi Onoda, Keita Hasegawa, Tomohiro Nakagawa and Keiichi Ochiai
X-Tech Development Department
Abstract
Private cross-aggregation technology enables the use of cross-company statistics while preserving the privacy of individuals. It promises to create broader analytical value than was previously possible. This article describes how cross-company statistics created using the private cross-aggregation technology can be useful, through initiatives conducted with partner companies Japan Airlines and JAL CARD to increase customer experience value and solve societal problems.
01. Introduction
Data is now being actively utilized in a wide range of fields, and study of how to create value from such data is accelerating accordingly. In most cases so far, each company has utilized only the data it owns, but use of data across the boundaries between companies holds promise for the future. In these circumstances, NTT DOCOMO has developed a private cross-aggregation technology[1] that enables the use of cross-company statistics while adhering to relevant laws and protecting the privacy of individuals. With this technology, data held by multiple companies (such as NTT DOCOMO data and the data of a partner company) is first processed at each company so that individuals cannot be re-identified (so that no information is linked to an individual). The technology then creates statistics from the data while technologically guaranteeing that no data is disclosed between the companies, that is, the data is processed automatically so that it is never subject to human viewing at any point in the process.
From a data utilization perspective, it is not enough to show only that the privacy of individuals is preserved in statistics created with the private cross-aggregation technology; it is also necessary to show that the statistics can be used to solve real problems. As such, NTT DOCOMO, together with partner companies Japan Airlines Co., Ltd. (JAL) and JAL CARD, INC. (JAL CARD), conducted a demonstration experiment (hereinafter referred to as “the trial”) to improve customer experience and solve social problems by facilitating smooth boarding[2] and verified the usefulness of the private cross-aggregation technology. This article describes the steps in analysis of cross-company statistics for the trial along with the results and shows the utility of the statistics obtained with the private cross-aggregation technology.
In the trial, we used the private cross-aggregation technology to statistically analyze data related to the movement of all passengers using the airport, from near their residences until they board the aircraft, in order to streamline their movements. To achieve this, it is important to analyze not only passenger movement and procedures inside the airport but also passenger movement on the way to the airport. NTT DOCOMO possesses operational data from its mobile terminal network*1, Note (hereinafter referred to as “NTT DOCOMO data”), which includes mobile terminal location data. As such, the private cross-aggregation technology can be used to create statistics regarding the movement of passengers as they travel to the airport. To do so, both the boarding information from JAL's domestic flight booking data (hereinafter referred to as “JAL data”) and the NTT DOCOMO data are first de-identified at each company (processed so there is no information linked to an individual), and are then processed automatically so that the data is guaranteed to never be subject to human viewing or disclosed to the other company. We expect to be able to select measures to streamline passenger movement based on population distributions obtained from the statistics. As a result, we hope to create social value, which includes improving customer experience by increasing the frequency with which flights depart at their scheduled time.
In fact, a crucial part of the trial was planning and designing the data analysis steps. This is because in analysis using the private cross-aggregation technology, it is difficult to perform conventional exploratory data analysis, in which the same data set is analyzed repeatedly under similar but varying conditions. There are two reasons for this:
- The computational cost is high. The private cross-aggregation technology first applies a de-identification process (bringing the data to a state in which individuals cannot be identified) and encrypts the data, and then performs the secure aggregation process and the disclosure limitation process with the data still in this protected state[3].
- Guaranteeing privacy protection when performing aggregation multiple times is an issue. With the private cross-aggregation technology, noise is added to aggregation results to achieve differential privacy*2. The level of protection is determined by setting certain privacy parameters, but when the private cross-aggregation technology is used to create statistics multiple times from the same data set, a protection level that covers the total aggregation over all iterations must be maintained. This limits the number of iterations possible.
*1 Operational data from the mobile terminal network: Generic name of data generated in the process of providing telecommunications services.
*2 Differential privacy: An index that quantitatively measures the strength of privacy protection. It was created with the aim of guaranteeing privacy protection, even against attackers having specific background knowledge and attacking ability. A protection technique using differential privacy has been adopted in the U.S. national census.
Note: A generic name given to data generated in the process of providing telecommunications services, which is used in Mobile Spatial Statistics. Operational data includes location data of mobile phones and other devices used by subscribers and subscriber attribute data. Definitions of these terms can be found in Mobile Spatial Statistics Guidelines at the following link.
https://www.docomo.ne.jp/english/binary/pdf/service/world/inroaming/inroaming_service/Mobile_Spatial_Statistics_Guidelines.pdf (PDF format:73KB)
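The second reason given above, the limit on repeated aggregation, can be illustrated with a small sketch. It assumes, purely for illustration, the Laplace mechanism and basic sequential composition, in which the privacy parameter of each run adds up to a total. The article does not state which noise mechanism or composition rule the private cross-aggregation technology uses, and the function names, the budget of 1.0, and the cell value are all hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def per_run_epsilon(total_epsilon: float, num_runs: int) -> float:
    """Split a fixed total budget evenly over num_runs aggregations.

    Assumption: basic sequential composition, where per-run epsilons add up.
    """
    if num_runs < 1:
        raise ValueError("num_runs must be at least 1")
    return total_epsilon / num_runs


def noise_scale(total_epsilon: float, num_runs: int, sensitivity: float = 1.0) -> float:
    """Laplace scale each run needs so that all runs together stay within the total."""
    return sensitivity / per_run_epsilon(total_epsilon, num_runs)


rng = random.Random(0)  # fixed seed so the demo is repeatable
true_count = 120        # hypothetical cell value, not from the trial
for runs in (1, 4, 16):
    scale = noise_scale(total_epsilon=1.0, num_runs=runs)
    noisy = true_count + laplace_noise(scale, rng)
    print(f"runs={runs:2d}  noise scale={scale:5.1f}  one noisy count={noisy:7.1f}")
```

Under these assumptions, the noise scale grows in proportion to the number of runs, which is one way to see why the hypotheses and statistics had to be fixed before any aggregation was run.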
The data analysis steps for the trial are shown in Figure 1. There are three steps: (1) consider hypotheses, (2) create statistics, and (3) verify hypotheses. Each step is described below.
3.1 Consider Hypotheses
We first considered hypotheses to be verified. Since, as described above, exploratory analysis is difficult when using the private cross-aggregation technology, it was important to clarify beforehand what hypotheses would be verified, plan what statistics would be created using the private cross-aggregation technology, and design a method to verify the hypotheses using the statistics. As such, we selected a few dozen items that seemed likely to contribute to the objective of the trial, streamlining passenger movement, based on results of analyzing JAL data and business knowledge from JAL and JAL CARD. We then considered which effects of measures were desirable and might be verifiable using NTT DOCOMO data, and selected several hypotheses to be verified from among those related to the selected items.
One example of a hypothesis to be verified was, “For passengers whose boarding gate is far from the security screening area, it takes longer than usual to move through the airport. Is the amount of information and support for them inside or outside the airport inadequate?” We will use this hypothesis to describe the analysis steps below. We first used the private cross-aggregation technology to create statistics regarding passenger movement and identified tendencies among all passengers regarding when they were in each of the defined areas. The idea was that this would reveal whether most passengers spent more time travelling inside or outside of the airport, which would help determine whether support provided inside or outside the airport would be more effective in streamlining passenger movement.
3.2 Create Statistics
We then created statistics regarding passenger movement using the private cross-aggregation technology. The statistics were output in the form of a cross-tabulation. The cross-tabulation for the trial is shown in Figure 2, with analytical items selected by JAL on the left (red in the figure) and analytical items selected by NTT DOCOMO across the top (blue in the figure). The cells at the intersection of each row and column give statistical values for the number of passengers corresponding to items for analysis selected by each company. Note that the statistical values in each cell have noise added to achieve differential privacy.
JAL selected several items for analysis that seemed likely to be useful for verifying hypotheses. For example, to verify the hypothesis described above, items such as the distance from the security screening area to the boarding gate and the passengers' boarding times were selected. Here, boarding time was used as an index indicating whether passengers could move smoothly.
NTT DOCOMO, on the other hand, selected passenger movement state as its analysis item, separated into three segments: near the passenger's residence, near the airport, or other areas, including on the way to the airport. This was done assuming that outbound passengers will generally depart from their home, travel through areas other than their home or the airport, and then arrive at the airport. To analyze movement state transitions, data from several points in time earlier than the flight departure time were used to create a cross-tabulation for each point in time.
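The shape of the output described in this section can be sketched as follows. This is a plaintext illustration of the resulting table only: it joins synthetic records in the clear, which the private cross-aggregation technology never does, and it does not reproduce the secure aggregation or disclosure limitation processes. The row categories, the Laplace noise, the value of epsilon, and the assumption that each passenger contributes to exactly one cell are all hypothetical.

```python
import random
from collections import Counter

# Hypothetical categories for a JAL-side item (rows) and the three movement
# states described in the article for the NTT DOCOMO side (columns).
ROW_ITEMS = ["gate near", "gate far"]
COL_ITEMS = ["near residence", "other areas", "near airport"]


def cross_tabulate(records):
    """Count passengers per (row item, column item) pair."""
    counts = Counter(records)
    return [[counts[(r, c)] for c in COL_ITEMS] for r in ROW_ITEMS]


def add_laplace_noise(table, epsilon, rng):
    """Add Laplace(0, 1/epsilon) noise to every cell.

    Assumption: each passenger falls in exactly one cell, so sensitivity is 1.
    """
    rate = epsilon  # an exponential with rate epsilon has mean 1/epsilon
    return [
        [cell + rng.expovariate(rate) - rng.expovariate(rate) for cell in row]
        for row in table
    ]


rng = random.Random(0)
# Synthetic passengers at one point in time before departure (not trial data).
records = [(rng.choice(ROW_ITEMS), rng.choice(COL_ITEMS)) for _ in range(1000)]
exact = cross_tabulate(records)
noisy = add_laplace_noise(exact, epsilon=1.0, rng=rng)

for name, row in zip(ROW_ITEMS, noisy):
    print(name, [round(v, 1) for v in row])
```

Repeating the same steps for records taken at several points in time before departure would give one table per point in time, as described above.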
3.3 Verify Hypotheses
The hypotheses were verified using the cross-tabulation created using the private cross-aggregation technology. To do so, it was necessary to show a relationship between boarding time, as an index of streamlined passenger movement, and the analysis items for each of the hypotheses. For example, to verify the hypothesis described above, we determined whether there is a significant relationship between boarding time and the distance from the security screening area to the boarding gate or passenger movement state based on NTT DOCOMO data.
To determine whether there were relationships among multiple items, the usual approach is a general test of independence[4]: the items are assumed to be independent of each other (the null hypothesis), and a test statistic is calculated to decide whether this hypothesis can be rejected, which would indicate that the items are not independent. However, with the private cross-aggregation technology, the cross-tabulation has noise added in the disclosure limitation process to achieve differential privacy. As such, it is difficult to apply general test methods as is, so we used a method that accounts for the noise added to achieve differential privacy[5], and determined the relationship between analysis items for each hypothesis and boarding time, for each point in time before aircraft boarding.
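The idea of a test that accounts for the added noise can be sketched as follows. This is a simplified Monte Carlo illustration loosely in the spirit of noise-aware tests such as those discussed in [5]; it is not the procedure used in the trial, and the Laplace noise, the value of epsilon, the number of simulation runs, and the function names are assumptions made for illustration. The sketch computes the ordinary Pearson chi-squared statistic on the noisy table and compares it with statistics from tables simulated under independence with the same kind of noise added.

```python
import random


def chi_squared_statistic(table):
    """Pearson chi-squared statistic for independence on an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    if total <= 0:
        raise ValueError("table total must be positive")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Clamp so noisy, near-empty margins cannot give a non-positive expectation.
            expected = max(row_totals[i] * col_totals[j] / total, 1e-9)
            stat += (observed - expected) ** 2 / expected
    return stat


def noisy_independence_p_value(noisy_table, epsilon, trials=200, seed=0):
    """Monte Carlo p-value: simulate independent tables, add the same noise, compare."""
    rng = random.Random(seed)
    rows, cols = len(noisy_table), len(noisy_table[0])
    row_totals = [max(sum(r), 0.0) for r in noisy_table]
    col_totals = [max(sum(c), 0.0) for c in zip(*noisy_table)]
    n = max(int(round(sum(row_totals))), 1)
    # Cell weights under the null hypothesis: product of the estimated margins.
    weights = [row_totals[i] * col_totals[j] for i in range(rows) for j in range(cols)]
    observed = chi_squared_statistic(noisy_table)
    at_least = 0
    for _ in range(trials):
        counts = [0] * (rows * cols)
        for cell in rng.choices(range(rows * cols), weights=weights, k=n):
            counts[cell] += 1
        simulated = [
            [
                counts[i * cols + j]
                + rng.expovariate(epsilon) - rng.expovariate(epsilon)  # Laplace(0, 1/epsilon)
                for j in range(cols)
            ]
            for i in range(rows)
        ]
        if chi_squared_statistic(simulated) >= observed:
            at_least += 1
    return (at_least + 1) / (trials + 1)


# Hypothetical noisy table: boarding time category (rows) by gate distance (columns).
table = [[400.0, 100.0], [100.0, 400.0]]
print(noisy_independence_p_value(table, epsilon=1.0))
```

A small p-value suggests that the association in the noisy table is larger than independence plus noise would typically produce. Because the simulated tables carry the same noise as the observed one, the comparison does not rely on the noise-free chi-squared distribution.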
3.4 Results of Verifying Hypotheses
By verifying hypotheses as described above, we were able to verify several of the hypotheses considered for the trial and select measures that were likely to be effective in streamlining passenger movement. For example, to verify the hypothesis described above, we determined the relationship between the boarding time and the distance from the security screening area to the boarding gate based on a cross-tabulation of passenger movement state before boarding the aircraft. The result showed a significant relationship between them, suggesting that to streamline passenger movement, it may be effective to provide more support inside the airport rather than outside, particularly to passengers whose boarding gate is far from the security screening area in the airport.
This article has described the usefulness of cross-company statistics created using the private cross-aggregation technology. We described how the private cross-aggregation technology was used to statistically analyze data in a trial involving collaboration among three companies. Using cross-company statistics, we showed that results of analysis from new perspectives not previously obtainable contributed to selecting measures to facilitate smooth boarding. Thus, we showed that the private cross-aggregation technology can be used to create value not available using data from a single company.
In the future, we will work to use results obtained from the trial in measures taken by JAL and JAL CARD, and to show more clearly and concretely the value brought by the private cross-aggregation technology through quantitative evaluation of the results of these measures. NTT DOCOMO will continue to work and collaborate with partner companies to find solutions that overcome social problems by promoting the use of statistics across the boundaries between companies and other organizations.
In conclusion, we would like to express our gratitude to Japan Airlines Co., Ltd. and JAL CARD, INC. for their collaboration in conducting this trial and writing this article.
REFERENCES
- [1] K. Nozawa et al.: “Solving Social Problems through Cross-company Statistical Data Usage—Overview of Private Cross-aggregation Technology—,” NTT DOCOMO Technical Journal, Vol. 25, No. 1, Jul. 2023.
https://www.docomo.ne.jp/english/corporate/technology/rd/technical_journal/bn/vol25_1/002.html
- [2] Japan Airlines, JAL CARD, NTT DOCOMO: “Japan Airlines, JAL CARD, and NTT DOCOMO Begin Demonstration Experiment of Cross-company Data Usage Using “Private Cross-aggregation Technology” to Improve the Customer Experience and Solve a Social Problem—First Initiative in Japan Using Statistical Data Created Without Mutually Disclosing the Data Held by Each Company,” Oct. 2022 (in Japanese).
https://www.docomo.ne.jp/binary/pdf/corporate/technology/rd/topics/2022/topics_221020_00.pdf (PDF format:1,834KB)
- [3] K. Nozawa et al.: “Technique for Achieving Privacy and Security in Cross-company Statistical Data Usage,” NTT DOCOMO Technical Journal, Vol. 25, No. 1, Jul. 2023.
- [4] N. Kunitomo: “Mathematical Statistics for Applications,” 1st Ed., p. 222, 2015, Asakura Publishing (in Japanese).
- [5] M. Gaboardi, H. Lim, R. Rogers and S. P. Vadhan: “Differentially Private Chi-Squared Hypothesis Testing: Goodness of Fit and Independence Testing,” International Conference on Machine Learning, PMLR, May 2016.