
November 07, 2007


Welcome back, EW. It's nice to have our resident psychic back ;) As they say, truth is stranger than fiction, or in this case the same as wild-arsed speculation. Who would have ever imagined? ;)

I wonder if Joe Knollenberg had his Chief of Staff ask the NSA to run a list of all Toyota buyers in MI, because Al-Qaeda only buys foreign cars as part of their plan to destroy America?

Perspicacity. You could author a book called On the Decay of the Art of Lying... oh, you did that. Good to have you back. How was the NFC North tour?

The fruit of this program lies mainly in basic research, not in finding actual terrorists. Although work on sparse network analysis is starting to mature, there is nothing comparable yet on massive, robust networks. As the map becomes equivalent to the territory, the nature of statistical inquiries shifts from finding patterns and trying to reconstruct the generator functions to finding patterns that represent well-known perturbations. In order to study this kind of thing, the researchers have to impose the perturbations, so target populations for that kind of work are better chosen from those who are not actually interesting from a security point of view. And, obviously, national security, at least when you force yourself to see like a state, is far broader than terrorism perpetrated by malicious foreigners.

Datamining with modeling. I am reminded of another attempt at that: the weather. We have data from every possible source, much of it dating consistently back to the 1860s, so there was no shortage of data. The modeling took quite a while as they figured things out. As an example, one year they predicted no El Niño, which was stark raving wrong; it turned out that smoke and debris from a volcanic eruption had obscured the oceanic patterns that indicate an El Niño. It took about 10 years to amalgamate the data into reliable models, and another 5 to fine-tune the meta-patterns. With climate change ever present, the models will always need tweaking, and they are not going to be able to predict the odd microclimate (areas less than 5 miles across) for a long time, if ever.

Obviously there are grave differences between data points collected from, and models extrapolated from, human activities as opposed to the weather. Unfortunately, I believe that attempts to stop data mining and modeling will not succeed; IMO they will only drive these activities underground, into the black pit of the Pentagon, regardless of how much legislation is passed. The software and hardware industries have too much interest in selling their products, and the military thinks this is a cheap way to replace the human intelligence side that we lack. That there have been no obvious successes with the mining/modeling will never deter them; they will just point out that it took the programmers 15 years to get the weather problem mostly under control, so the military should be allowed at least as much time to perfect their models.

If the weatherperson is wrong about the weather, one laughs at the weatherperson and moves on. If the government is wrong about a terrorist, there can be serious consequences, ranging from breaches of privacy to torture and death. Never mind that terrorists can change their ways much more quickly and effectively than the weather; the military will argue that these are necessary tools that need time to be fully developed. Sadly, they will probably get their way.

In my opinion the only points of control that we have are data interpretation and storage. We must never allow the interpretation or storage of data to be done by third parties. Congressional and legal oversight must be maintained on both. As a witness in a COINTELPRO case I saw firsthand how third parties gussied up info and downright lied to get money from an all too willing and gullible government. This must never be allowed to happen again.

Oh my god, this happened to me. I bought stuff at a Middle Eastern grocery store (a can of dolmas, flatbread and some spices, for the record) on San Pablo Ave. in Berkeley, CA, and the next day I got a call from (I thought) my Visa company asking if I had made such a purchase. I always wondered about that call...

falafel eaters: BAD
waterboarders: GOOD

what has America become?

Sailmaker, the weather problem comes from the desire to predict (and, at least in Johnny von Neumann's dreams, to control). A state can afford to wait until a bad NGO is identified before responding. The tricky bit with networks is their recursive, self-modifying property; weather is a dynamical system, which calls for a very different form of computation. So to learn how to recognize patterns of interest in network data, you need both the initial pattern and the response to perturbations. For example, a quilting club and a grow-op might have similar patterns of network traffic, but if you conspicuously park a large sedan, with a well-dressed fellow wearing sunglasses, in front of the house of one of the members of either group, the quilting traffic is probably unchanged while the other group goes silent (it's not just cyber-perturbations that are of interest).
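To make the perturbation point concrete, here is a toy sketch in Python. It is only an illustration under stated assumptions, not a description of any actual NSA method: the function names, the 50% drop threshold, and the hourly message counts are all hypothetical, made up so that both groups start with the same average traffic. The baseline alone cannot tell the quilting club from the grow-op; only the change after the perturbation can.

```python
from statistics import mean

def relative_change(before, after):
    """Fractional change in mean traffic after a perturbation (-1.0 means total silence)."""
    baseline = mean(before)
    if baseline == 0:
        raise ValueError("no baseline traffic to compare against")
    return (mean(after) - baseline) / baseline

def reacts_to_perturbation(before, after, drop_threshold=0.5):
    """Flag a group whose traffic falls by at least drop_threshold (a hypothetical cutoff)."""
    return relative_change(before, after) <= -drop_threshold

# Toy hourly message counts, invented purely for illustration.
# Both groups average 13 messages/hour before the sedan shows up.
quilting_before = [12, 15, 11, 14]
quilting_after = [13, 14, 12, 15]
growop_before = [12, 14, 13, 13]
growop_after = [1, 0, 0, 1]

print(reacts_to_perturbation(quilting_before, quilting_after))  # False
print(reacts_to_perturbation(growop_before, growop_after))      # True
```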

The models from sparse networks are already good enough to significantly influence advertising, trading, commerce, and a bunch of other stuff where serious money is at stake. The only people working on robust networks are the NSA. Perhaps the Chinese are way ahead in this game.

I think the goal is to predict behavior in order to prevent bad behavior: the zero-tolerance, darn-close-to-thought-police Cheney thing.

I think there are contractors for the NSA, as well as private firms (like the one Poindexter headed), doing work on both datamining and robust systems. You can see which other governments are working on these problems from the attendee lists at VLDB (the International Conference on Very Large Data Bases):

Link to the security side of VLDB

Link to VLDB

I offered the weather analogy as a conventional, public example of how data mining and modeling have worked. I think the military will use weather modeling to convince people to let them continue drag-net datamining all of the U.S., at least for the rest of this administration.

Ken, but that's the catch, isn't it? The Bush mantra is all about PRE-emption; they have no interest in waiting for a bad NGO to be identified. If they did, they could bloody well get a warrant. They want to sweep people off the streets on tenuous suspicions at best. They have already done it. And god help you if you get picked up; they torture the innocent as readily as the guilty.

Phred, the type of NGOs I meant were the law-abiding, citizen-led, politically active groups so loved by de Tocqueville. Warrants are still hard to come by for these types. "Bad" is in the eye of the beholder, and when the beholder is Cheney, bad is anyone who disagrees with him. You should have no illusions that this program was ever intended to prevent or identify unlawful behavior. The law is but a weak tool for controlling the citizenry; it has had its day (cf. the police states of the middle of the last century).

Here is Jorge Luis Borges (but substitute social relations for geographic data): "In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province. In time, those Unconscionable Maps no longer satisfied, and the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it. The following Generations, who were not so fond of the Study of Cartography as their Forebears had been, saw that that vast Map was Useless, and not without some Pitilessness was it, that they delivered it up to the Inclemencies of Sun and Winters. In the Deserts of the West, still today, there are Tattered Ruins of that Map, inhabited by Animals and Beggars; in all the Land there is no other Relic of the Disciplines of Geography."

VLDBs are microscopic compared to the daily traffic of voice and data packets around the world. Maybe the NSA lacks people with imagination and curiosity and so they're just scaling up VLDB techniques using bigger and faster computers, but I wouldn't bet on it. They have a mighty big carrot dangling out there (rather conspicuously with all these leaks).

The way Yahoo, Google, and Oracle are going with data is to push more data to the pointers, since the hardware is the chokepoint, and to use vast arrays of processors to do searches. I presume the military is doing the same. Some operating system gurus say that the real-time copying the AT&T whistleblower says is happening is possible, but they believe (though do not know for sure) that the government is stripping off the metadata for storage and later datamining. Even so, rough calculations show that terabytes of data per day could be collected.
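For anyone who wants to redo that kind of rough calculation, the arithmetic is just records per day times bytes per record. A minimal Python sketch follows; both inputs are hypothetical placeholders, not measured figures for AT&T or anyone else, so plug in your own estimates.

```python
def terabytes_per_day(records_per_day, bytes_per_record):
    """Storage for one day of metadata, in decimal terabytes (10**12 bytes)."""
    return records_per_day * bytes_per_record / 10**12

# Hypothetical inputs, chosen only to show the arithmetic; not measured figures.
records = 5_000_000_000   # assumed metadata records captured per day
record_size = 500         # assumed bytes per record
print(terabytes_per_day(records, record_size))  # 2.5
```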

As Jon Stewart said, "What? They could not connect the dots before 9/11, and we are giving them MORE dots?????"
