When a Query is put into the system for a specific email account, XKeyscore returns all the "metadata" -- addressees and, importantly, the subject line, which, of course, usually summarizes the basic content of the email -- and also scans the email for additional email addresses inside the email. Like if someone said "Contact these other parties" and then listed some emails. And even here, a slide indicates that sometimes the system makes an error and includes a few words it mistook for an email address.

HTTP (ie, the web) and SMTP (ie, email) are functionally very simple protocols, and that simplicity is part of what has driven their success. Although they can transmit binary data (ie, a computer file), the protocols themselves run in plain text, and the basic data structures sent over both HTTP and SMTP are structured plain text. By "structured," I mean the contents are easily readable by a human being but follow basic grammar rules strictly enough for a machine to understand them. Most other internet protocols (eg, for chat rooms) follow similar rules and principles.

Remember the layered approach to a data packet? At the lowest (interesting) level, the metadata consists of just the IP addresses of the sender and the recipient. All that tells you is who operates the computer at either end (and perhaps, from the port number, what type of communication is taking place, ie, email, web, chat, etc.). It's up to the higher layers to make the data mean something, and often the person actually sending or receiving a communication is not the person who owns and operates the computer at the other end. For example, looking just at the IP layer for an email would likely tell you the identity (by IP address) of the sender and the identity of that person's email server. You would then have to look at that email server, see who it is communicating with, and make an educated guess about which outgoing message corresponds to the one you are interested in. That message goes to the recipient's email server, and you then have to wait for the recipient to check their email to learn their IP address.

HTTP and SMTP operate several layers above the basic IP address system, and they are the level where the data starts to mean something to human beings. Since we know the NSA's program collects email addresses, we know that they are looking at the SMTP layer at least -- possibly higher up, but definitely up to the SMTP layer.

The application layer also contains metadata, and this is the metadata the NSA is talking about. It's important to understand this, because application-layer metadata contains a lot more information than just who sent a packet and where the packet is going. Let's look at a typical email sent over SMTP:
    S: 220 mail.example.com ESMTP
    C: HELO sender.example.com
    S: 250 mail.example.com
    C: MAIL FROM:<sender@example.com>
    S: 250 ok
    C: RCPT TO:<recipient1@example.com>
    S: 250 ok
    C: RCPT TO:<recipient2@example.com>
    S: 250 ok
    C: DATA
    S: 354 Start mail input; end with <CRLF>.<CRLF>
    [ Next layer of email data is sent ]
    C: .
    S: 250 ok
    C: QUIT
    S: 221 mail.example.com
    Connection closed.

From this exchange you can identify the (claimed) sender, the intended recipient(s), and maybe (with some extensions, such as SIZE) the size of the message. That's not much information, but it's far easier than trying to track a single email message via the IP layer: you have everything you need in one package. Also, if the connection is not encrypted but does authenticate the user sending the email, the user's login and password may be available as metadata (the common AUTH PLAIN and AUTH LOGIN mechanisms only base64-encode them). Let's look at a sample HTTP transaction:
    C: GET http://www.example.com/ HTTP/1.1
    C: Host: www.example.com
    C:
    S: HTTP/1.1 200 OK
    S: Server: Apache-Coyote/1.1
    S: Content-Type: text/html
    S: Transfer-Encoding: chunked
    S: Vary: Accept-Encoding
    S: Date: Thu, 01 Aug 2013 22:05:29 GMT
    S:
    S: 2000
    S: [ next layer data -- content of the web page ]

Apart from a few more request headers that a real browser would send (User-Agent, Referer, Cookie and the like), this exchange contains all the metadata available at the HTTP layer. The interesting thing here is that we have the URL requested ("http://www.example.com/"). Remember that the URL identifies not just the server hosting the web page being loaded, but the specific website and the specific page on that website being requested. For a public web page that does not require a login, anyone can plug in that URL and probably get the same or similar content the user was looking at. Starting only with this layer of metadata, then, it is trivially possible (if not error-free) to retrieve the actual content provided to the user. If the server does require a login, the metadata at this layer may actually expose the user's credentials to the NSA (depending on the server configuration).

So, for HTTP connections, the fact that the NSA can independently retrieve the full content of the pages being viewed makes "we only store metadata" an almost meaningless distinction. They can trivially turn that metadata into a reasonably likely copy of the page actually viewed.

But it gets worse. Take a look back at the sample SMTP connection above, and then reread this description of XKeyscore's capabilities:
When a Query is put into the system for a specific email account, XKeyscore returns all the "metadata" -- addressees and, importantly, the subject line, which, of course, usually summarizes the basic content of the email -- and also scans the email for additional email addresses inside the email. Like if someone said "Contact these other parties" and then listed some emails. And even here, a slide indicates that sometimes the system makes an error and includes a few words it mistook for an email address.

Did you see anything in that SMTP connection that could not be readily identified either as an email address or as some other part of the protocol? Anything that could possibly represent a user saying "Contact these other parties"? Anything that could be mistaken for an email address without actually breaking the SMTP protocol? Anything at all that is user-entered data, besides the sender and recipient email addresses?

Clearly we need to go up one more layer: at this point, the NSA is looking through the actual content of the email message. Here's a typical email message from a programmer's point of view:
    From: sender@example.com
    To: recipient1@example.com, recipient2@example.com
    Subject: This is a message about cake
    Date: Thu, 01 Aug 2013 22:05:29 GMT
    ------------------------------------------------------------
    Contact these other parties
    * recipient3@example.com
    * recipient4@example.com.

    Tell them the cake is a lie. There is no cake @ the end of the org chart meeting.

The only thing at this layer that could credibly be described as "metadata" is the header of the message. As before, the header is structured data: the program you use to read your email needs to read and understand the lines at the top, so they are designed to follow a strict format that can be easily parsed. The header ends at the first blank line, and everything after that is entered freeform by the user -- ie, content, not metadata by any possible argument. (It is possible to add more layers on top of this, but for a simple email, this is the end of the line.)

If you are only looking at metadata, you need to look at the headers and ignore everything after the blank line. And everything in the headers is structured. The user enters some of the data (particularly the subject line, which can be revealing of the email's contents), but the address fields themselves need to be machine-readable; they are mostly generated by the email application itself, which also validates the email addresses the user enters.

In other words, a competent programmer can reliably parse email addresses out of the structured header fields with effectively no chance of picking up user-entered content by mistake, unless the user hand-crafted the email. All they have to do is stop reading the message at the first blank line (marked in the example with a dividing line).

For XKeyscore to occasionally return "metadata" in the form of email addresses that turn out to be user-entered content, the NSA must be retrieving and parsing the content of the email. They may have coded their application to show only what they think are email addresses, but they are extracting those addresses from the content, not from the headers. Which means they must be collecting and analyzing the content, not just the metadata.

It's like a pretty girl who wants to change clothes in your bedroom. Does she trust you not to look, or does she find a screen or use a bathroom or closet so that you can't look? Does it matter if you promise not to look?

Clearly, the NSA has the ability to intercept email content, not just metadata; just as clearly, they are actually intercepting the full email content and collecting it for analysis. They are asking us to trust them not to look at the content, even though they already have it. Maybe they have built their application so that they can't look without getting permission, but according to Snowden, the permission system is a joke and a rubber stamp. We already know that Homeland Security does keyword scanning of content, and I'm betting the NSA is doing the same thing with its application: if the right keywords are there -- or the right sender or recipient, two or three degrees away from a "suspected" terrorist -- the content is flagged for a closer look. Or the NSA analyst can make up his own justification and get it rubber-stamped.

And we can't see how their application works, or have any way of knowing that it does what they say it does. In this analogy, the NSA is the guy wearing a nice Google Glass device who tells the pretty girl in his bedroom that she can strip down right there in front of him and be perfectly safe -- he's written his own privacy app, you see, and when it detects a pretty girl in his field of view, it doesn't let him look. He's just watching her to keep her safe, you see. He's not recording the whole thing and uploading it to his friends. (Or then again, maybe he is...)

I'm no pretty girl, and no terrorist either, but I sure as hell don't trust them not to look at my data if I say something with enough juicy keywords in it. Their own slides prove they have it and they look at it, and we have only their word that their privacy app does what they say it does.

I don't trust them, and I don't trust their privacy app. The Constitution says they can't have our data without a warrant issued upon probable cause. It doesn't matter if they promise not to look; if they don't have it, they can't look.
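For readers who want to see the header/body distinction in code, here is a minimal Python sketch, not anything taken from the NSA slides or a claim about how XKeyscore is actually built. It parses the sample cake email two ways. The first, metadata-only, uses the standard library's email parser, which ends the headers at the first blank line, and reads only the From, To and Cc fields. The second scans the entire message, body included, using a deliberately loose pattern that accepts any text on either side of an "@". The function names and the pattern are hypothetical, chosen for illustration.

```python
import re
from email import message_from_string
from email.utils import getaddresses

# The sample message: header fields, one blank line, then the user-typed body.
SAMPLE = """From: sender@example.com
To: recipient1@example.com, recipient2@example.com
Subject: This is a message about cake
Date: Thu, 01 Aug 2013 22:05:29 GMT

Contact these other parties
* recipient3@example.com
* recipient4@example.com.

Tell them the cake is a lie. There is no cake @ the end of the org chart meeting.
"""


def header_addresses(raw):
    """Metadata-only view (hypothetical helper): addresses from header fields.

    The parser treats the first blank line as the end of the headers, so
    nothing the user typed into the body can end up in this list.
    """
    msg = message_from_string(raw)
    values = []
    for field in ("From", "To", "Cc"):
        values.extend(msg.get_all(field, []))
    return [addr for _name, addr in getaddresses(values) if addr]


# Hypothetical, deliberately loose pattern: any run of non-space characters
# on either side of an "@", even with spaces around the "@" itself.
LOOSE_AT = re.compile(r"(\S+)\s*@\s*(\S+)")


def naive_scan(raw):
    """Content view (hypothetical helper): scan the whole message, body included."""
    found = []
    for user, host in LOOSE_AT.findall(raw):
        # Drop trailing punctuation, such as the comma in the To: list.
        found.append(f"{user}@{host.rstrip('.,;')}")
    return found


if __name__ == "__main__":
    print(header_addresses(SAMPLE))
    # ['sender@example.com', 'recipient1@example.com', 'recipient2@example.com']
    print(naive_scan(SAMPLE))
    # ['sender@example.com', 'recipient1@example.com', 'recipient2@example.com',
    #  'recipient3@example.com', 'recipient4@example.com', 'cake@the']
```

The header-only function never returns recipient3, recipient4, or the bogus "cake@the"; the only way to get any of those is to read the body. A sloppier or smarter pattern would change which false positives appear, but not where they come from.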
This entry was published Fri Aug 02 11:21:44 CDT 2013 by TriggerFinger and last updated 2014-03-16 02:50:19.0.