I enjoyed the David Fincher remake of The Girl With The Dragon Tattoo more than I thought I would. Rather than a shallow and cynical Hollywood cash-in, it's actually a tense, atmospheric, only slightly voyeuristic crime thriller. My favourite part, though, was when Lisbeth Salander begins to solve a 40 year old murder cold case using SQL.
We see her tapping at her laptop as she hacks effortlessly into the Swedish police database, interspersed with green-tinted tracking shots of scrolling text as she types in keywords like 'unsolved' and 'decapitation', though never quite the whole query:
Naturally I couldn't help stitching a few screenshots together in Photoshop, and this is what I got:
Immediately moviegoers will notice that this can't be Oracle SQL - obviously the AS keyword is not valid for table aliases. In fact as we pull back for a thrilling query results listing we see the mysql prompt and giveaway use [dbname] connect syntax and over-elaborate box drawing.
Notice we can just make out the 'FT' of an ANSI left join to the Keyword table.
Finally we get a full-screen shot of the results listing for Västra Götaland:
Here's what we were able to reconstruct in the Oracle WTF Forensics department:
SELECT DISTINCT v.fname, v.lname, i.year, i.location, i.report_file FROM Incident AS i LEFT JOIN Victim AS v on v.incident_id = i.id LEFT JOIN Keyword AS k ON k.incident_id = i.id WHERE i.year BETWEEN 1947 AND 1966 AND i.type = 'HOMICIDE' AND v.sex = 'F' AND i.status = 'UNSOLVED' AND ( k.keyword IN ('rape', 'decapitation', 'dismemberment', 'fire', 'altar', 'priest', 'prostitute') OR v.fname IN ('Mari', 'Magda') OR SUBSTR(v.fname, 1, 1) = 'R' AND SUBSTR(v.lname, 1, 1) = 'L' ); +--------+---------+------+-----------+----------------------------------+ | fname | lname | year | location | report_file | +--------+---------+------+-----------+----------------------------------+ | Anna | Wedin | 1956 | Mark | FULL POLICE REPORT NOT DIGITIZED | | Linda | Janson | 1955 | Mariestad | FULL POLICE REPORT NOT DIGITIZED | | Simone | Grau | 1958 | Goteborg | FULL POLICE REPORT NOT DIGITIZED | | Lea | Persson | 1962 | Uddevalla | FULL POLICE REPORT NOT DIGITIZED | | Kajsa | Severin | 1962 | Dals-Ed | FULL POLICE REPORT NOT DIGITIZED | +--------+---------+------+-----------+----------------------------------+
Shocked moviegoers will have been left wondering why a genius-level hacker would outer-join to the Victims and Keywords tables only to use literal-text filter predicates that defeat the outer joins, and whether MySQL has a LIKE operator.
75 comments:
Also it seems that the table design has separate fields for the different date parts instead of having a single date datatype column.
ts ts ts
I expect they considered "where i.incident_date >= date '1947-01-01' and i.incident_date < date '1967-01-01'", but decided that despite representing a better data model it provided less cinematic impact.
I was shocked to see such appallingly explicit portrayal of an entity-attribute-value model. Not to mention the sick insinuation of the presence of an in-lined BLOB.
Young impressionable minds witness this kind of thing and studies have shown they go out into the world and commit copy-cat acts.
Just this week I came across an EAV model that mirrored this scene in a chilling manner.
The BBFC must come down on this sort of thing and not merely assume they have discharged their social responsibility by slapping an 18 certificate on it.
I find the DISTINCT clause funny, why would multiple rows be returned? Sure, there can be many incidents for the same victim but the incident type is limited to "HOMICIDE". Could the same victim be killed multiple times?
I assume l.status is a typo on your side (should be i.status)
I hope for a next film remake they hire a senior database expert to help them designing a better data model and also rewrite SQL for more accurate semantic and also better performance.
Klippx - I think she needs the DISTINCT because of the keyword list.
Thanks for spotting the typo, I'll fix it...
Adrian - I did consider starting with a parental advisory that this blog post contained strong though ANSI-compliant language, but in the end I felt it detracted from the artistic integrity of the piece.
LEFT JOIN...becuse not all incidents have victims or keywords
Except ones where v.sex = 'F' and k.keyword in ('fire', 'altar', 'priest', 'prostitute').
Professor Plum in the Conservatory with the Lead Pipe.
A ruthless and cunning criminal who left no trace of tell-tale keywords behind in his crime scene.
WHERE i.year = 1926
AND i.type = 'HOMICIDE'
AND v.sex = 'M'
AND i.status = 'UNSOLVED'
AND k.keyword IN ('Candlestick', 'Knife', 'Lead Pipe', 'Revolver', 'Rope', 'Spanner')
AND SUBSTR(v.lname, 1, 1) = 'B');
Hey, it worked in the movie. :) The casual viewer is convinced she is a hacker queen.
Awesome post!
I'll bet this will show up as pasted code in a java app at some stage in the near future!
SQL? Pasted into a Java app? Are you crazy? Something similar, yet totally unintelligible may be dynamically generated by Hibernate in the data persistence layer, surely?
Posts like these reinstate my lost faith in humanity. I love you guys.
"LEFT JOIN...becuse not all incidents have victims or keywords "
But one of the predicates is that the keyword falls within a set, so the keywords are not optional.
And to a prior comment, the distinct is necessary due to the presumable many to one relationship between keyword and incident.
MySQL does, in fact, have a LIKE operator (though I don't see where it's used in the snippet you posted).
Great post *and* comments, but definitely: Noons FTW!
They should have just stuck with the usual rotating 3D shapes, furious key-bashing and flashing "HACK IN PROGRESS" text. Far less embarrassing.
I wrote the prompt code for the MySQL command line client. I did not get royalties or even put in the movie credits =(
Oracle owns mysql now, maybe the line shouldn't have been taken so literal and it was just a mistake by the person saying it.
What's the EAV portion that Adrian is alluding to? I feel I must be missing something.
Hey, you might understand SQL but how about some basic HTML: Don't hot link images from some poor guy's site. Download them and upload them to blogger so you won't make him exceed his quota at his website provider.
Rottenchester: It's his own domain that's hosting the images :)
Yup, great idea that turned out to be...
Scumbag author:
Knows SQL
can't link images to save his own life
Thanks RC :) I wasn't expecting someone to link to it from news.ycombinator.com and suddenly get 65,000 hits. Just moved them to Flickr, normal service now resumed...
Sheeri -
> "MySQL does, in fact, have a LIKE operator (though I don't see where it's used in the snippet you posted)."
That was my point. They expect a movie audience to believe that a world-class hacker would type "substr(v.fname ,1,1) = 'L'" when she could have just used "v.fname like 'L%'", and possibly got an index range scan?
Luke - I often wonder what Adrian is on about. No idea either.
I've used "as" with Oracle...
But only in SELECT lists and WITH clauses ;)
The fact that the "table name AS" syntax is what jumped out at me when I watched this scene is what I really find funny.
Real man would have used MDX ;)
@pedro: you're just saying that because it rhymes to "cam wee sex"
Honestly, I'm just impressed that they used real SQL, a real database, a real terminal emulator, and real Google Maps. So many movies depict hacking/computers in general as looking like some kind of video game...
Relevant: http://www.youtube.com/watch?v=dFUlAQZB9Ng
"It's a UNIX system!"
Congratulations to all for the great comments :)))
Regarding the 'like':
do you know if there is an index on both fname and lname?
If not, a full table will be executed as using the substr.
But the substr is faster.
So the sexy girl is right. Go back to study more guys.
Actually the Terminal title states 'mysql' so that solves the problem before even looking at the syntax.
The terminal window title says "Terminal - mysql". So MySQL it is.
Great post !
You fucking nerds need to get laid...
I have little confidence in policepersons being able to spell dismemberment in their reports. It's not as though they had drop down menus in 1947.
Regarding multiple homicides on a single victim, sure, why not? All you need are methods that are not instantaneous.
They should pipe the output through DECTALK, the Swedish accent would add verisimilitude.
I am not a robot. But my kid is. urytua
@Massimo
> "But the substr is faster."
That's right, "substr(v.fname ,1,1) = 'L'" is much faster than "v.fname like 'L%'". Everyone knows that.
Gotta watch that movie at last, that's one more incentive.
Guys... If the database was perfect they would also have perfect security, which would ruin the whole story.
In the real world, budgets require shit security and shit database design.
I think it's super awesome that they're actually using not only a real computer OS, but an actual real computer language and terminal interface. And you're complaining that the results don't match the query?
Bartholomeus is obviously not aware of the many hot and available women who hang out in the SQL bars of London, just waiting to be impressed by sardonic attacks on the misrepresentation of predicate logic in the contemporary media.
@klippx:
I don't know about the rest of the world, but in Sweden the same crime might get several incident reports, and it won't be cleaned up.
For example: One row for the one witness, and another for another witness if the cops aren't sure it really is the same murder. Another report for the first parts of the body, another for the chopped up parts nearby, and yet another for the place where they dismembered the body (and therefore might actually had killed the victim) etc etc etc.
In short, there are about 100 murders in Sweden annually, and about 150 reports. If you want to be sure, about the numbers you have to remove multiple rows.
@Logan - we love them for it too. Nobody's complaining.
I look forward to seeing a lot more SQL in the movies after this.
@Robbin - please tell me you hacked into the Västra Götaland police DB for those stats.
strange there are so few results considering they also wanted to know whether a priest was involved in raping a 'female prostitute'; multiply that by 100 if it the homicide involved the rape of a minor....a parody, not to be taken seriously!
Also, I'm fairly sure that Swedish domestic police databases would have keywords in Swedish??
You really took the magic out of the movie.. thx
Guys... It's MySQL, not Oracle SQL
... OR ( SUBSTR(v.fname, 1, 1) = 'R' AND SUBSTR(v.lname, 1, 1) = 'L' );
"The inspector then concentrated on the names. He obtained a list of everyone in Hedestad named Mari, Magda, or Sara or who had the initials R.L. or R.J. He had a
list of 307 people. Among these, 29 actually had some connection to Harriet."
OK, I am a robot.
Joel - why do you think you're a robot? We're here to help.
Oliver - are you writing a novel?
I've been having difficulty reading many of the recaptcha presented by various bloggers. I don't have problems (besides carelessness) reading anything else. What do you make of the number in the "robot" link I posted?
I don't see a number. But maybe it's only visible to robots.
Yet, the Please prove you're not a robot expects you to enter the number. Therefore, if you can't see the number, you are a robot.
I just think this is an irritating wtf. Unfortunately, until some real non-repudiable infrastructure comes along, I'll be complaining about it.
I did think about switching it over to the Voight-Kampff option, but that needs a camera to monitor pupil dilation and a lot of questions about turtles.
I'm glad all of you geniuses are all in one place. It's like everyone saw the same movie and then talking about it scientifically. wow, I'm impressed.
I think I see the film William, had not thought these issues.
It looks interesting.
Best regards!
lmao awesome post
I don't get it - what's wrong with the JOIN? She wants to see the data from joined tables in the output after all.
I'd like to see the actual query rewritten by you, guys, to see what you think should be the correct version. Not just pointing fingers "this is wrong, that is wrong".
The "like" part is bad of course, but well, maybe "substring" + "=" will ensure case sensitivity? Not sure why would that be needed, but well, it's just a movie after all - you can forgive substring instead of like in a movie. I know you can - just think of all the other movies about hackers and computers ((-;
It was the outer joins that (surely) shocked moviegoers. If we were to rewrite it in a real SQL crimefighting scenario tomorrow we would leave out the "outer" keyword and use the time thus saved for catching criminals.
With hindsight the 'Mari, Magda' thing didn't help but that's forgivable. We probably wouldn't use green on black or uppercase keywords either but again, this is the movies and I suppose they add drama.
More importantly perhaps, this piece is intended as a joke. It is not actually a criticism of any aspect of this tense, stylish, atmospheric and only slightly voyeuristic crime thriller, least of all its database query syntax. It's not "Taken" after all. What was funniest to me was noticing my own reaction to the "AS" keyword as the text scrolled across the screen, and it is this tension between movie drama and office life that I have tried to capture.
I like soup
Obviously, I'm late to the party. I worked as a consultant on The Social Network and The Girl with the Dragon Tattoo, part of which work included creating this sequence. I must admit, I never hoped it would get so much attention, nor apprehended that it would be subjected to such scrutiny.
For anyone interested, you can check out the original screen capture here. Also for your reading pleasure (and I'm sure my continued embarrassment), SQL dumps for the mock Värmland and Västra Götaland police databases.
To give you a sense what we were going for, have a look at the script, Cmd-F 161AA. (I have no idea whether this copy is posted legitimately; apologies if it suddenly disappears.)
Thanks for by far the most thorough, if scathing, code review I've ever endured. I'll be coding gingerly for at least a month.
Paul - welcome, and thank you. You are a hero to Swedish SQL-based crime fighters everywhere.
Wondering how she knew that the keywords would be inserted in lowercase in the database...anyway she saved some time skipping the UPPER function
Inspired by Paul's comment and links to the actual source code (always a help in a forensic SQL reconstruction exercise) I took another look at my screenshots and noticed the third small screenshot (beginning 'ctim as v') fits on the end of the longer Photoshopped one - now updated - and explains why there were no results for victims with initials 'R L' (it's part of an 'OR' condition, along with the 'Mari'/'Magda' thing that didn't get any hits either).
@oracleappsnotes: String matching in MySQL is case-insensitive by default.
@Thomas Okken
Ah! That explains it.
I just googled "girl with the dragon tattoo SQL" after taking the exact same screenshot as your last one. I didn't realize there was a fuller view of the query. The end of the query has an "AND" [AND SUBSTR(v.lname,1,1) = L]. So, for the whole outer AND (containing [k.keyword IN ...], a last name starting with L is a requirement! Right? None of the last names start with L. All the other parts of the WHERE clause are ANDs.
Maybe I'm missing something, but it doesn't seem like the query results could have come from this query. Is it a movie mistake, or is it me?
My impression is she should have written the last line with another set of parentheses, because wasn't she looking for "R.L."?
OR (SUBSTR(v.fname, 1, 1) = 'R' AND SUBSTR(v.lname, 1, 1) = 'L' ));
The final "AND" consists of three alternative conditions separated by "OR". The results shown must have matched on one or more of the keywords in the first "OR" condition, and the whole 'Mari', 'Magda', 'R. L.' hunch went nowhere.
Post a Comment