Private Information, Uses and Misuses
In a modern information-driven society, the everyday life of individuals and companies is full of cases where various kinds of private information is an important resource. While a cryptographer might think of PIN-codes and keys in this context, this type of secrets is not our main concern here. Rather, we will talk about information that is closer to the “primary business” of an individual or a company. For a private person, this may be data concerning his economic situation, such as income, loans, tax data, or information about health, such as deceases, medicine usage etc. For a company it might be the customer database, or information on how the business is running, such as turnover, profit, salaries, etc.
What is a viable strategy for handling private information? Finding a good answer to this question has become more complicated in recent years. When computers were in their infancy, in the 1950’s and 1960’s, electronic information security was to a large extent a military business. A military organisation is quite special in that confidential information needs to be communicated almost exclusively between its own members, and the primary security goal is to protect this communication from being leaked to external enemies. While it may not be trivial to reach this goal, at least the overall purpose is quite simple to phrase and understand.
In modern society, things get much more complicated: using electronic media, we need to interact and do business with a large number of parties, some of whom we never met, and where many of them may have interests that conflict with ours. So how do you handle your confidential data if you cannot be sure that the parties you interact with are trustworthy?
Of course, one could save the sensitive data in a very secure location and never access it, but this is of course unreasonable. Our private data usually has value only because we want to use them for something. In other words, we have to have ways of controlling leakage of confidential data while this data is being stored, communicated or computed on, even in cases where the owner of the data does not trust the parties they communicate with.
A very interesting aspect that makes this problem even more important is that there are many scenarios where a large amount of added value can be obtained by combining confidential information from several sources, and from this compute some result that all parties are interested in. To illustrate what is meant by this, we look at a number of different example scenarios in the following subsections.
Auctions exist in many variants and are used for all kinds of purposes, but we concentrate here on the simple variant where some item is for sale, and where the highest bid wins. We assume the auction is conducted in the usual way, where the price starts at some preset amount and people place increasing bids until no one wants to bid more than the holder of the currently highest bid. When you enter such an auction, you usually have some (more or less precisely defined) idea of the maximal amount you are willing to pay, and therefore when you will stop bidding. On the other hand, every bidder of course wants to pay as small a price as possible for the item.
Indeed, the winner of the auction may hope to pay less than his maximal amount. This will happen if all other bidders stop participating long before the current bid reaches this maximum. In fact, the bidder with the highest bid will generally end up paying a price close to the second highest bid: when the price becomes higher than the second highest bed, there is no one left to compete with the bidder with the highest bid.
For such an auction to work in a fair way, it is obvious that the maximum amount you are willing to pay should be kept private. For instance, if the auctioneer knows your maximum and is working with another bidder, they can force the price to be always just below your maximum and so force you to pay more than if the auction had been honest. Note that the auctioneer has an incentive to do this to increase his own income, which is often a percentage of the price the item is sold for.
On the other hand, the result of the auction could in principle be computed, if one was given as input the true maximum value each bidder assigns to the item on sale.
A closely related example is dark pooling which is just a private forum for trading financial instruments. In these dark pools large trades by financial institutions that are done off public exchanges to for example avoid that trades which are very large compared to the depth of the market a big impact on the market price. Another incentive of off-exchange trading is confidentiality: the fact that a particular player is selling or buying bulk in a given stock could affect the market in an undesired way. In dark pooling confidentiality of the trading the raison d’être.
A procurement system is a sort of inverted auction, where some party (typically a public institution) asks companies to bid for a contract, i.e., to make an offer on the price for doing a certain job. In such a case, the lowest bid usually wins. But on the other hand, bidders are typically interested in getting as large a price as possible.
It is obvious that bids are private information: a participating company is clearly not interested in the competitors learning its bid before they have to make theirs. This would allow them to beat your bid by always offering a price that is slightly lower that yours. This is also against the interests of the institution offering the contract because it will tend to make the winning bid larger. In addition, the price at which you are able to perform a given job leaks information about your organisation and you might not want your competitors to learn this information.
On the other hand, the result of the process, namely who wins the
contract, can in principle be computed given all the true values of
Assume you run a company. You will naturally be interested in how well you are doing compared to other companies in the same line of business as yours. The comparison may be concerned with a number of different parameters, such as profit relative to size, average salaries, productivity, etc. Other companies will most likely have similar interests in such a comparison, which is known as a benchmark analysis. Such an analysis takes input from all participating companies. Based on this it tries to compute information on how well a company in the given line of business should be able to perform, and finally each company is told how its performance compares to this “ideal”.
It is clear that each company will insist that its own data is private and must not be leaked to the competitors. On the other hand the desired results can be computed from the private data: there are several known methods from information economics for doing such an analysis efficiently.
In most countries, public institutions such as the tax authorities or the health care system keep databases containing information on citizens. In many cases there are advantages one can get from coordinated access to several such databases. Researchers may be able to get statistics they could not get otherwise, or institutions might get an administrative advantage from being able to quickly gather the information they need on a certain individual.
On the other hand, there is clearly a privacy concern here: access to many different databases by a single party opens the possibility to compile complete dossiers on particular citizens, which would be a violation of privacy. In fact, accessing data on the same person in several distinct databases is forbidden by law in several countries precisely because of this concern.
Do We Have to Trust Somebody?
We are now in a position to extract some common features of all the scenarios we looked at. One way to describe them all is as follows: we have a number of parties that each possess some private data. We want to do some computation that needs all the private data as input. The parties are interested in learning the result, or at least some part of it, but still want to keep their private data as confidential as possible.
Hopefully, it is clear from the previous section that if we can find a satisfactory solution to this problem, there are a very large number of applications that would benefit. Moreover, this leads to an extremely intriguing theoretical question, as we now explain:
One possible — and trivial — solution would be to find some party that everyone is willing to trust. Now all parties privately give their input to , she does the required computation, announces the result to the parties, and forgets about the private data she has seen. A moment’s thought will show that this is hardly a satisfactory solution: first, we have created a single point of attack from where all the private data can potentially be stolen, either by breaking into the system or by an insider. Second, the parties must all completely trust , both with respect to privacy and correctness of the results. Now, the reason why there are privacy concerns is that the parties do not trust each other in the first place, so why should we believe that they can find a new party they all trust?
In some applications one may pay a party for doing the computation; if the amount paid is thought to be larger than what could gain from cheating, parties may be satisfied with the solution. This seems to work in some cases, for instance when a consultancy house is paid a large fee for doing a benchmark analysis — but this is of course a very expensive solution. And there are plentiful examples where insiders have stolen and sold such information: that you paid the consultancy house enough that they do not have an incentive to sell your data does imply that each employee share this incentive.
Thus, we are left with a fundamental question: can the problem be solved without relying on a trusted party?
At first sight, it may seem that this cannot be possible. We want to compute a result that depends on private data from all involved parties. How could one possibly do this unless data from several parties become known to someone, and hence we have to trust that party?
Nevertheless, as we shall see, the problem is by no means impossible to solve, and solutions do exist that are satisfactory, both from a theoretical and a practical point of view.