Network Working Group J. Arkko
Internet-Draft Ericsson
Intended status: Informational July 18, 2019
Expires: January 19, 2020
Architectural Considerations on Serving Web Content in a Content
Aggregation Fashion
draft-arkko-arch-web-packaging-00
Abstract
News aggregators and search engines have used various formats to
enable them to republish web resources. These formats have included
Google's AMP, Facebook's Instant Articles, and Apple's News Format,
and new developments such as the Web Packaging proposal. This memo
discusses the architectural considerations in these systems, in view
of what aspects of the content delivery the different parties can or
cannot control.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 19, 2020.
Copyright Notice
Copyright (c) 2019 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
Arkko Expires January 19, 2020 [Page 1]
Internet-Draft Web Packaging Architeture July 2019
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
1. Introduction
News aggregators and search engines have used various formats to
enable them to republish web resources. These formats have included
Google's AMP, Facebook's Instant Articles, and Apple's News Format.
The technical reasons behind these formats and associated mechanisms
include a desire to improve performance, e.g., via pre-fetching
content from the distributing site. There are some issues, however,
including giving up much control to the aggregator site. For
instance, it is possible for the aggregator to modify content, the
original publisher's URL may not be visible, and it isn't always
possible for the publisher to control how and when the content is
displayed or how access to it is controlled.
Web Packaging is one proposal to address some of these issues. The
basic idea is that content should be presented to users as if it were
obtained from the original site, no matter where it was actually
fetched from, while users can still be assured that the content has
not been modified.
This memo discusses the architectural considerations in these
systems, in view of what aspects of the content delivery the
different parties can or cannot control.
2. Traditional Web Model
In a traditional setting, the publisher holds the infrastructure
needed to serve the content. They are in charge of the integrity and
confidentiality of the information served, as well as all mechanisms
related to access control, analytics, localization, advertisements,
etc.
Optionally, it used to be possible for transparent proxies to cache
content. This allowed re-delivery from another entity, but relied on
the HTTP transactions being in the clear, and allowed the proxy to
perform modifications on the content. Compression for specific
clients or networks was a common practice in these systems. Given
that HTTPS has become almost universally employed for web connections
today, this type of caching is no longer possible in the general
case.
3. Content Delivery Network Web Model
In the Content Delivery Network (CDN) model, some or all of the
infrastructure needed to serve the content is outsourced to a separate
service. This is economically desirable in a lot of cases, because a
large CDN provider can amortize the cost of a world-wide distributed
delivery network across its many customers.
Similarly, a CDN model allows a publisher to outsource the delivery
function to someone else who may be more focused on that function.
The downside with the CDN model is that typically, one needs to give
the TLS keys needed for representing the publisher's website (e.g.,
"www.example.com") to the CDN. There have been some proposals, e.g.,
LURK, to reduce the impact of this. But in any case, a CDN will be
able to provide any content, even modified content, to the users, and
will be able to see all content. However, the CDN provider and the
publisher have a business agreement, which obviously discourages
one-sided actions by the CDN.
4. Web Packaging Model
In the Web Packaging model, a resource, a web page, or even an entire
site can be packaged in such a manner that it can be stored, shared,
and re-distributed. Some of the motivations for doing this include:
o Fetching content faster when the package is stored in a system
that is faster or closer to the user.
o Allowing delivery from a number of different systems, in addition
to a particular publisher or the publisher and a single chosen
content aggregator.
o Allowing peer-to-peer delivery, e.g., to enable lower cost
delivery in developing countries.
o Bypassing censorship.
o Having a third party (e.g., an app store) vouch for another
website (e.g., an application's webpage).
o Archiving.
Variants of the web packaging model include technologies focused on
the priorities of content aggregators that allow the aggregators to
perform all tasks (e.g., content validation, identifying marks, or
even modification) at will, without giving much room for the
publishers to secure or control the content. Accelerated Mobile
Pages (AMP) [AMP] falls in this category, for instance. The newest
Web Packaging model being standardized enables, for instance, the
publisher to sign content and both the aggregator and the users to
verify that the content actually comes from the publisher
[I-D.yasskin-http-origin-signed-responses]
[I-D.yasskin-wpack-bundled-exchanges].
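The sign-then-verify idea can be illustrated with a minimal sketch.
It is not the format defined in the cited drafts: it uses a Lamport
one-time signature built from SHA-256 purely as a standard-library
stand-in for whatever signature scheme a real deployment would use,
and the names make_package, generate_keypair, sign, and verify are
hypothetical. The point it shows is only that a party holding the
publisher's public key can detect modification, regardless of who
delivered the package.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def make_package(url: str, body: bytes) -> bytes:
    # Hypothetical packaging: bind the publisher URL to the body so that
    # neither can be swapped independently. Not the real wire format.
    encoded_url = url.encode("utf-8")
    return len(encoded_url).to_bytes(4, "big") + encoded_url + body


def generate_keypair():
    # Lamport one-time key: 256 pairs of secrets; the public key is
    # the hash of each secret. Each key pair must sign only one message.
    private = [(secrets.token_bytes(32), secrets.token_bytes(32))
               for _ in range(256)]
    public = [(_h(a), _h(b)) for a, b in private]
    return private, public


def _bits(message: bytes):
    digest = int.from_bytes(_h(message), "big")
    return [(digest >> i) & 1 for i in range(256)]


def sign(private, message: bytes):
    # Reveal one secret per digest bit (publisher side).
    return [private[i][bit] for i, bit in enumerate(_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    # Aggregator or user side: needs only the public key.
    if len(signature) != 256:
        return False
    return all(_h(sig) == public[i][bit]
               for i, (sig, bit) in enumerate(zip(signature, _bits(message))))


if __name__ == "__main__":
    private_key, public_key = generate_keypair()
    pkg = make_package("https://www.example.com/article", b"<p>Original</p>")
    sig = sign(private_key, pkg)
    print("as published:", verify(public_key, pkg, sig))
    edited = make_package("https://www.example.com/article", b"<p>Edited</p>")
    print("after modification:", verify(public_key, edited, sig))
```

In this sketch the aggregator can store and re-deliver the package
and its signature, but any change to the URL or the body causes
verification to fail. Note that the aggregator still sees the content
in the clear.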
5. Tussles for Control
The key questions in this space are really about power and control.
Who has the power to:
o Have detailed control of the look and feel of the page rendering,
colors, menus, and so on.
o Display their domain in the browser's URL bar.
o Have access to the actual content in unencrypted form.
o Modify content or vouch for the integrity of the content.
o Specify what dynamic actions are taken around access analytics,
access control, localization, or advertisements.
o Have access to the identity of the user and be able to collect
data about him or her.
o Own, legally or in practice, the content data or the data
generated by it (such as data collected from advertisements or
users).
o Specify which cache or content delivery network serves a
particular piece of content. Previously the user or the content
provider specified a cache (forward or CDN); now it is the
referrer who has the power to specify a cache.
The different approaches discussed earlier each take a different cut
at allocating control among the different players.
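As an illustration, two of the powers listed above (reading the
content in unencrypted form, and modifying it without detection) can
be tabulated per delivery model. The table below is only a reading of
the statements made in this memo, not an authoritative
classification; the names CONTROL and third_parties_with are
hypothetical, and the traditional entry assumes HTTPS with no
intermediary.

```python
# Which parties can read plaintext content, or alter it undetected,
# under each model. Entries summarize statements in this memo only.
CONTROL = {
    # Publisher serves content itself over HTTPS.
    "traditional": {"read": {"publisher"}, "modify": {"publisher"}},
    # CDN holds the publisher's TLS keys: sees and can change everything.
    "cdn": {"read": {"publisher", "cdn"}, "modify": {"publisher", "cdn"}},
    # Aggregator-oriented formats such as AMP.
    "aggregator_format": {"read": {"publisher", "aggregator"},
                          "modify": {"publisher", "aggregator"}},
    # Publisher-signed packages: integrity, but no confidentiality
    # against the aggregator.
    "web_packaging": {"read": {"publisher", "aggregator"},
                      "modify": {"publisher"}},
}


def third_parties_with(power: str, model: str) -> set:
    """Return the parties other than the publisher that hold a power."""
    return CONTROL[model][power] - {"publisher"}


if __name__ == "__main__":
    for model in CONTROL:
        print(model,
              "read:", sorted(third_parties_with("read", model)),
              "modify:", sorted(third_parties_with("modify", model)))
```

The other powers in the list (URL bar display, user identity, choice
of cache) could be added as further keys in the same way.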
6. Conclusions
In the long term, who controls particular aspects will drive the
architecture of the web and the evolution of the business ecosystem.
It seems evident that there is a need to provide better controls for
the publisher over the aggregation that their content is involved in,
both in terms of its detailed look and feel, and in terms of securing
it and controlling access to it in appropriate ways.
While some of the newest proposals go a long way towards resolving
some of the concerns, they do not address everything. For instance,
Web Packaging does not provide confidentiality against the content
aggregator. As Barnes and Cooper argue in [BA], there are design
examples showing that confidentiality could be provided through
proxies.
The discussion is also (perhaps naturally) focused on current
large-scale web traffic. There may be other use cases for security
that is not tied to a transport session, however. These use cases may
matter a lot as well, e.g., IoT devices that wish to deliver or
receive information but may not be reliably connected and instead
depend on relays delivering information packages forward. Today such
arrangements typically involve relays that can read and modify all
that content.
Also, it is essential that the Internet is not designed for
centralization. [RFC1958] says "This allows for uniform and
relatively seamless operations in a competitive, multi-vendor,
multi-provider public network". And [RFC3935] says "We embrace
technical concepts such as decentralized control, edge-user
empowerment and sharing of resources, because those concepts resonate
with the core values of the IETF community."
The lead-up to many of the developments discussed in this paper
relates to peer-to-peer networking and enabling faster access to
content for underserved areas. But a bigger factor may be the tussle
for control between different parties. In the end, whether a user
gets their page from a search result or from the original publisher
does not significantly impact how much material needs to be
downloaded. Peer-to-peer and Information-Centric networking are
potentially very useful technologies, but it seems that their role in
this particular case is perhaps exaggerated. In particular, in the
author's opinion the representation that issues in the web packaging
space are a tradeoff between high efficiency and keeping power at the
publisher side is false.
7. Acknowledgements
The opinions expressed in this paper are solely the author's
opinions, and subject to change.
The author would like to thank IAB members and the participants of
the 2019 Exploring Synergy between Content Aggregation and the
Publisher Ecosystem (ESCAPE) IAB workshop held in Herndon, Virginia,
USA.
This paper was particularly influenced by the workshop papers [BE],
[DN], and [RE].
8. Informative References
[AMP] AMP, "AMP HTML Specification", AMP Open Source Project, 2019.
[BA] Barnes, R. and A. Cooper, "Protecting Content from the
Cache", Position paper in the IAB ESCAPE workshop, July
2019, Herndon, Virginia, USA, 2019.
[BE] Berjon, R., "ESCAPE: The New York Times Position",
Position paper in the IAB ESCAPE workshop, July 2019,
Herndon, Virginia, USA, 2019.
[DN] DePuydt, M. and M. Nelson, "Signed Exchanges and The
Importance of Trust in Aggregator/Publisher
relationships", Position paper in the IAB ESCAPE workshop,
July 2019, Herndon, Virginia, USA, 2019.
[I-D.yasskin-http-origin-signed-responses]
Yasskin, J., "Signed HTTP Exchanges", draft-yasskin-http-
origin-signed-responses-06 (work in progress), July 2019.
[I-D.yasskin-wpack-bundled-exchanges]
Yasskin, J., "Bundled HTTP Exchanges", draft-yasskin-
wpack-bundled-exchanges-01 (work in progress), July 2019.
[RE] Rescorla, E., "Ecosystem Impacts of Web Content
Syndication", Position paper in the IAB ESCAPE workshop,
July 2019, Herndon, Virginia, USA, 2019.
[RFC1958] Carpenter, B., Ed., "Architectural Principles of the
Internet", RFC 1958, DOI 10.17487/RFC1958, June 1996.
[RFC3935] Alvestrand, H., "A Mission Statement for the IETF",
BCP 95, RFC 3935, DOI 10.17487/RFC3935, October 2004.
Author's Address
Jari Arkko
Ericsson
Email: jari.arkko@piuha.net