Data Factories

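I’m generally annoyed by the cliché “If you’re not paying, you’re the product”; Derek Powazek has explained why the implications of this statement are usually misleading and often wrong, something that is particularly problematic in the case of Aggregators. After all, if a company’s market power flows from controlling demand — that is, users — that means said company is incentivized to keep those users satisfied; it is suppliers that have to “take it or leave it”.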

This explains why the idea of an Aggregator being a monopoly is hard to get one’s head around; in the physical world where market power comes from controlling distribution — think AT&T, or your local cable company, or a utility company — there is no incentive to treat end users well, because users have no choice in the matter. On the Internet, though, where distribution is effectively free, alternatives are only a click away; Aggregators are extremely motivated to make sure that click doesn’t happen, which means giving the users what they want (the technical term is “increasing engagement”). Users are the priority, not the product.

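Still, as is so often the case, clichés persist because they have some truth to them. Facebook and Google, both Super Aggregators, make money through advertising, and advertisers come to Facebook and Google because they want to reach consumers. From an advertiser’s perspective, users (or, more precisely, access to users’ attention) are absolutely the product they are paying for.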

Perspectives on Facebook

This seeming dichotomy (prioritizing users on the one hand, and selling access to their attention on the other) makes more sense if you first view Super Aggregators as two distinct businesses: an Aggregator and an advertising seller. To use Facebook as an example (as I will for the rest of the article, although nearly everything applies to Google as well), it is both an Aggregator that content providers clamor to reach, as well as the gatekeeper for consumers advertisers wish to sell to:

Two views of Facebook’s business

Still, this isn’t quite right, because Facebook the company is not simply the so-called “Blue App” but also several other businesses, most notably Instagram and WhatsApp (there is also Messenger, but given its user-facing network is the same as the Blue App I don’t really consider it to be distinct). Once you add those to the mix, Facebook the company looks like this:

Facebook’s conglomerate

You’ll note that I’ve taken to using the term “Blue App” to distinguish Facebook the network from Facebook the company; the question, though, is what exactly is the company anyways?

Data Factories

At a superficial level, Facebook is a sort of holding company for social networks; back in 2014 I called it “The Social Conglomerate”. That, though, is very much a user-centric perspective; to that end, if you consider the advertising perspective, you could argue that Facebook the company is an advertising dashboard and sales force.

I think, though, that this sells short what Facebook the company does. Specifically, Facebook is a data factory. Wikipedia defines a factory thusly:

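A factory or manufacturing plant is an industrial site, usually consisting of buildings and machinery, or more commonly a complex having several buildings, where workers manufacture goods or operate machines processing one product into another.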

Facebook quite clearly isn’t an industrial site (although it operates multiple data centers with lots of buildings and machinery), but it most certainly processes data from its raw form to something uniquely valuable both to Facebook’s products (and by extension its users and content suppliers) and also advertisers (and again, all of this analysis applies to Google as well):

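  • Users are able to better connect with others, find content they are interested in, form groups, manage events, and so on, thanks to Facebook’s data.
  • Content providers are able to reach far more readers than they would on their own, most of whom would not even be aware those content providers exist, much less visit them of their own volition.
  • Advertisers are able to maximize the return on their advertising spend by showing ads only to individuals they believe are predisposed to like their product, making targeted markets more viable than ever before (to the benefit of customers as well).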

Then, in exchange for these data-derived benefits, Facebook takes in data from all three entities:

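  • Users provide Facebook with data both directly, through the information and media they upload, and indirectly, through the actions they take on Facebook properties.
  • Content is not only data in its own right, but also a catalyst for generating data about users’ actions.
  • Advertisers, like content providers, both provide data in their own right and act as a catalyst for generating data about user actions; they also directly upload massive amounts of data to better target prospective customers.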

Those aren’t the only avenues through which Facebook collects data: the company has deals with multiple third-party data collection companies, gathering everything from web traffic to offline store receipts, and also has incentivized an untold number of websites — particularly content providers — to include Facebook links on their sites that collect data from those sites.

This results in a fuller picture of Facebook’s business:

Facebook the data factory

Data comes in from anywhere, and value, also in the form of data, flows out, transformed by the data factory.

Regulating the Internet

Two weeks ago, in “The European Union Versus the Internet”, I argued that effective regulation of tech companies, particularly Super Aggregators like Facebook and Google, had to work with the fundamental principles of the Internet, not against them; otherwise, the likely outcome would be to entrench these Internet giants with little gain to consumers.

To start, regulators need to understand that the power of Aggregators comes from controlling demand, not supply. Specifically, consumers voluntarily use Google and Facebook, and “suppliers” like content providers, advertisers, and users themselves, have no choice but to go where consumers are. To that end:

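Facebook’s ultimate threat can never come from publishers or advertisers, but rather from demand, that is, users. The real danger, though, is not from users also using competing social networks (although Facebook has always been paranoid about exactly that); that is not enough to break the virtuous cycle. Rather, the only thing that could undo Facebook’s power is users actively rejecting the app. And, I suspect, the only way users would do that is if it became accepted fact that Facebook is actively bad for you: the online equivalent of smoking.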

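For Facebook, the Cambridge Analytica scandal was akin to the Surgeon General’s report on smoking: the threat was not that regulators would act, but that users would, and nothing could be more fatal. That is because the regulatory corollary of Aggregation Theory is that the ultimate form of regulation is user generated.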

If regulators, EU or otherwise, truly want to constrain Facebook and Google — or, for that matter, all of the other ad networks and companies that in reality are far more of a threat to user privacy — then the ultimate force is user demand, and the lever is demanding transparency on exactly what these companies are doing.

What, though, does transparency mean in the context of enabling “user generated regulation”, and what might meaningful regulation look like that achieves the goal of forcing said transparency in a way that fosters competition instead of inhibiting it? The answer goes back to data factories.

Raw Data Versus Processed Data

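The first challenge with data factories is that there is no way to see inside. Both Facebook and Google offer customers ways to view their data, but not only is the presentation overwhelming, the data is exactly what you gave them: it is the raw input.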

Advertisers, interestingly enough, cannot download custom audiences once uploaded, but given that data is (also) their business, it is extremely likely that they retain the list of email addresses they uploaded in the first place; the same thing applies to 3rd party data providers. Websites, meanwhile, are completely in the dark: a Facebook badge or Like button might deliver a page view or two, but it provides no data in return.

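What no one gets is the finished product: the fusion of all of that data from all of those sources to build a profile of every Facebook user that is far more detailed than what they themselves provided. There is no question that it is happening, though. Last week Gizmodo had an excellent write-up of a paper in Proceedings on Privacy Enhancing Technologies detailing how Facebook users can be targeted with ads using a whole host of information the user never provided, including landline telephone numbers, unpublished email addresses, and phone numbers provided for two-factor authentication: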

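They found that when a user gives Facebook a phone number for two-factor authentication or in order to receive alerts about new log-ins to a user’s account, that phone number became targetable by an advertiser within a couple of weeks. So users who want their accounts to be more secure are forced to make a privacy trade-off and allow advertisers to more easily find them on the social network. When asked about this, a Facebook spokesperson said that “we use the information people provide to offer a more personalized experience, including showing more relevant ads.” She said users bothered by this can set up two-factor authentication without using their phone numbers; Facebook stopped making a phone number mandatory for two-factor authentication four months ago.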

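That quote from the spokesperson is a sort of admission about the data factory: Facebook doesn’t care where its data comes from; it is all simply an input in service of the output: a targetable profile.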

That lack of concern about what precisely goes into the finished product is not unique to Facebook. One of the most famous examples is Nike:

A boy stitching Nike soccer balls
According to the Internet, this is the photo from Life Magazine; I was unable to find a copy.

That photo is from the June 1996 issue of Life Magazine, which detailed how children in Pakistan spent their days stitching soccer balls. Nike executives, in a refrain that is vaguely familiar, were initially aggrieved; after all, soccer balls were not inflated until after they were shipped, which meant the photo was staged.

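That was certainly true, and yet such complaints completely missed the point: Nike didn’t particularly care how its soccer balls, or shoes, or clothes, or anything else were made; it simply paid the factory owners and washed its hands of the matter. That photo, and the decades of protests and boycotts that followed, forced the company to do better.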

The Privacy Obstacle

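Unfortunately, while Nike couldn’t stop a photographer from going to Pakistan (and, by some accounts, staging a photo), the general public has no way to see inside Facebook’s or Google’s factories, and this is where regulators come in.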

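The most important thing regulators can do is force Facebook and Google, along with all other data collectors, to disclose their factories’ output. Let users see not just what they put in, which Google and Facebook already offer (and which GDPR requires), but also what comes out after all of the inputs have been mixed and matched.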

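To be sure, no company is going to do this on its own, and not just for business reasons. Note the response of a Facebook spokesperson to Gizmodo when asked about the use of uploaded contact information: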

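“People own their address books,” a Facebook spokesperson said by email. “We understand that in some cases this may mean that another person may not be able to control the contact information someone else uploads about them.”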

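This gets at where privacy regulation in particular goes wrong: in trying to craft rules that protect those without agency, it ensures that those who wish to exercise that agency cannot even find out exactly what Facebook knows about them, because, privacy. Meanwhile, websites throw up pop-ups and overlays that no one reads, or ban entire continents, not because their users care but because a regulator said so.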

Privacy Realities

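Here is the other reality that regulators need to grapple with: most users don’t care about privacy, particularly if it saves them money. I received this tweet in response to a clip of a Tim Cook interview about privacy, and it rather succinctly makes the point: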

A tweet from someone who would sacrifice privacy for a cheaper iPhone

Frankly, I don’t blame most users for their indifference: Facebook and Google and all of the other ad-supported services and sites on the Internet provide a tremendous amount of value. Moreover, I’m the first (and often only!) to defend personalized ads: I think they are a critical component of building a future where anyone can build a niche business thanks to the Internet making the entire world an addressable market — if only they can find their customers.

At the same time, most users really have no idea what data these companies hold about them. Might they change their minds if they actually saw the processed data, not simply the raw inputs? I don’t know, but I do think it is their decision to make.

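Furthermore, a clear requirement that users be able to view not just the data they uploaded but their entire processed profile, the output of the data factory, is much less burdensome for the new and small companies seeking to challenge these behemoths. Data export controls can be built in from the beginning, even as those companies remain free to build factories as complex as those of the giants they are challenging, or, as a potential selling point, to show that they have no factory at all. This is far easier than trying to comply with rules that apply to every user, whether they want the protection or not, and that were designed with Facebook and Google in mind, not under-staffed startups.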

Indeed, that is the crux of the matter: regulators need to trust users to take care of their own privacy, and enable them to do so — and, by extension, create the conditions for users to actually know what is going on with their data. And, if they decide they don’t care, so be it. The market will have spoken, and that outcome should be regulators’ primary goal.