Two Croissants

[Draft] How Prism could actually be a good option

Posted on June 12, 2013 by Bertil

Last week, a whistleblower revealed that the NSA claims, within
the US intelligence community (IC), to have direct and
exhaustive access to the data servers of many large US-based
digital services. Most reactions that I’ve read come either
from privacy advocates, who would be flabbergasted if they
were at all surprised, or from a small majority of the US
public who see this as a reasonable answer to terrorism.
The problem revealed many issues, not the least of which is
that the IC didn’t spot that someone with so much access was
considering blowing the whistle: discussing moral issues is
very important and relevant, but please take a second to
imagine: what if that guy had defected?
This goes to show that there are massive problems within the
IC, mostly political: a criminal sense of entitlement and
rectitude, a lack of self-criticism, and endogamy.
That goes beyond the IC and into the US Army: Abu Ghraib
was an example of that entitlement gone wild. Another,
less disheartening example (and, importantly, one described
as relevant by the authorities) would be the documentary
Restrepo. Soldiers carry cameras while keeping the peace
in an Afghan valley, and the filmmaker later simply edited
their rushes into a film. Their constant, open, transparent
contempt for the local population shows through every hand
gesture, every sentence.
I suspect that their relative silence is habitual, but it
might be an outrage so great it can only be contained or,
hopefully, a significant reboot.
What are my intentions with this piece?
I want to challenge what appears to me as a consensus that
privacy should be respected
More specifically, I believe that computers could be
trusted intermediaries between two steps that can now be
distinct, but were not with human inspectors:
collection of information and
revelation to a human prejudice.
There is no real contradictor
I’m not really one, but we need one
I only hate one thing: “You disagree with me,
therefore you are stupid.” One learns immensely from
understanding coherent opposing points.
Main reason why the sophists taught to argue the
opposing view: making it cogent is the best way to make
sure you got it, rather than ridiculing it as an in-group;
The point was not absolute relativism, but a humanistic
concern that one might be wrong, and an exercise
against abrupt defensiveness
More than a corrupt Congress or the overall Red vs. Blue
political and geographical structure, this is the
problem of our time, on social media and elsewhere
Better than “Who cares: those are bad guys, I’m not”, a
truly misinformed approach
Abuses are far too easy to set up, as demonstrated by
the US secret services’ history of many criminal abuses
of the law
There is a track record of innocent people being
tortured to death, including the anal rape of a then
9 year-old boy in front of his father.
Yes, the IC includes as its most central members
documented child rapists and their apologists who
roam free. I did mention a cultural dissonance
earlier, didn’t I?
Actually, that particular case, and the use of
torture, make a compelling case against secrecy in
extreme cases: rather than interrogations enhanced by
violence, which is damaging to the truth, an interrogation
augmented by network counterpoints would help spot
contradictions. I’m not sure anyone needs a big
computer and too much data for that, but if need be:
my point is that this idea is better than physical
violence, a technique currently used repeatedly,
against all international treaties.
Complete digital disclosure seems a far
preferable option: if any difference is to be
noted, it works sometimes
Obviously, that needs to be decided by a judge
who has a strong understanding of due process
I can’t imagine why anyone would imagine I’d be a
terrorist apologist, but I came too close to several of
those famed events, too many to think that fighting
terrorism is a costly luxury.
Both brothers were threatened
in 95, so anyone assuming that it started in 2001
has my sincerest contempt.
More recently, in Boston—real problem was the
completely excessive reaction of the Boston Police
myself above Victoria with security consultant
I still think car accidents are a thousand times more
important a problem.

Many friends have SSSS, Arabic names
The current system is not good, far from it, and
trying to improve it would be welcome.
One potential improvement would include algorithms —
roughly what is known as machine learning, and
network algorithms
I work on those, I’m harshly critical of machine
learning, and I’m all too familiar with the
limited capabilities of network algorithms
outside of the IC, so this is very far from a
current recommendation: it is an exercise in theory
I know that the IC has been working on networks since
identifying the cell structure of the 9/11/2001 suspects
(to figure out how many masterminds were outside of
the planes, presumably alive, and which); I
suspect that the IC has better software solutions,
and a far less critical approach to machine
learning; actually, I know they do: check the
little information that trickles out about the company
‘Palantir’
I want better understanding of those issues
Eg : status for lawyers, Glenn Greenwald, friends,
etc.
Develop those for companies, why not government?
Grey information, but with better return can
Secrecy is useful, but certainly not what we saw:
large digital corporations are transparent about
their practice, and that is a good thing.
Many security consultants would prefer
Problem: the culture of security consultants in general,
and of the industry: they deem opponents negative
So big that it fails to have proper internal controls;
there is always the faint possibility of a whistleblower
who would fix that, which is basically Julian Assange’s
argument for Wikileaks.

Not really a hero
at least not a knight in shining armor:
not very brilliant
Fairly basic mistakes
got the essentials right
not a total victim like Manning
Depressed, gay during DADT, tortured
Was in control, considered if could have done or
not, took initiative
very partial leak: focus on the core of the
problem
Reality might be hubris
or it could be sleeper agents within all six
organizations
Last point (sleepers) is the scariest:
Large corporation with that responsibility
can have leakers of that magnitude
Same for ‘IC’
(Intelligence community), the network of
CIA, NSA, other agencies and the many
contractors
What is the IC?
It’s a coherent network of a dozen ‘letter soup’ agencies and
private contractors who work for them, all holding classified
accreditation; it’s hard to circumscribe, as most of the
contracts involved are secret, but it is increasingly private.
One interesting aspect of such a resounding echo chamber is
its ability to follow phases: lately, it’s been a
fascination for applied graph theory, combined with extensive
records (‘Big Data’). A company representative of that trend
would be Palantir.
That fascination is old: it started when Met those while
talking to a member, during my PhD
Most of them can be seen watching

“Prism”: an implementation and a concept
No doubt that Prism is squarely against the US Constitution
all those briefed on it have a duty to disobey;
What if technology has made the constitution slightly
obsolete?
However, overall gathering is not necessarily a bad thing in
itself
Many threats, abuse — too long to list, scary and
rightfully the main reason why this is not a good option
for the moment
Potentially changed by strong cryptography and human
institution
None of those are safe, as demonstrated by
countless
Best is probably the eponymous Minority Report.
Intellectual exercise in changing the constitution
Proper judges, trained on those models, capable of
assessing the many false positives
Having judges capable of understanding technology
might seem even more unlikely, as demonstrated by the
criminal incompetence of [Hadopi]
My idea is closer to the computer in Person of Interest,
without the creepy controller who tries to take over.
OK, that is not possible: extreme power corrupts
extremely, etc. This is an intellectual exercise: I want
to list what is needed to challenge the principle behind
privacy that information is either revealed and known, or
not at all.
Basically, exploring the idea of grey information,
steganography, enlightenment, and its ramification
Ads in GMail do it quite well, with a spectacularly lower
cost for mistakes, so there seems to be a possibility.
What if a computer could, with a lot of help from analysts
who never really see personal information (notably through
what is generally called ‘supervised learning’, which is
actually just handing it a sample), find 50 suspicious
cases, including the 10 most threatening plots around the
world?
Of course, it would have to be very intrusive in its
collection of information
however, the revelation would be minimal
I know what I’m talking about: that’s highly possible, but
requires dedicated software
isolate concerning cases?
biggest problem: false positives
actually, false negatives too: those are terrorist attacks
like London, but how to treat those is obvious: don’t. The
really concerning moral problem is: can we train analysts to
make sense of false positives without a clear moral compass?
Fiction provides us with a clear response when you have
an absolute moral compass: Person of Interest. If ugly
people are always violent, morally corrupt, and cute and
timid people are always adorable, there is no problem.
Human access to information limited by the machine and a
chain of proof
Generating algorithmically a chain of proof that could
sustain legal scrutiny, and the many layers of the law is
actually a very complicated problem. A friend once said
she was able to — I still have a hard time believing she
did — let’s say she did.
Then the “anything, anywhere about any body” wouldn’t be
the same when it refers to
a random Booz Allen employee
storage — because that data is already stored anyway
A big debate is to be had about whether the best IT security
professionals are at Google or at the NSA.
Google stores information with fairly informed consent, so as
long as they respect due process and industry standards, they
can safely claim their storing it is fine
This post is to argue a government could have a
non-accessible-to-human copy
Actually, that’s the case of Google’s if I understand
properly
Of course, if the government is corrupt, there is a
problem, but the only solution that I find to those is
oversight, and that has failed so massively in the US
lately that it’s hard to just assume that makes sense.
Let’s say the US signs The Hague convention. Once again:
this is a ridiculous attempt in theory, and an informed
counter-point to the wave of hostility to Prism
Truth is: in reality, at Google’s, access is permitted to
very few, mainly to assess major legal issues (read: child
porn, and potentially terrorism, although I doubt a Google
specialist would be able to find suspicious what I presume
are encoded conversations)
Those have to associate every query with a case file, and
that is seemingly very closely controlled
There was one abuse—sad, spotted, which at the time
was deemed a proof the system worked
Inventing a case file is potentially easy, or other
abuse—let’s assume that monitoring officers for those are
as many, incorruptible and monitored as the first layer.
Basically, ask what the back office and the
Inspection Générale of the Société générale did for
the team Delta around Jérôme Kerviel and do not do
that.
Filling in details as to why you need access is tedious (and
I understand that analysts want to avoid that, and that an
open bar sounds more appealing) but it is relevant and
necessary, unless other means of information are illegal or
frowned upon, there is suspicion of a mole, etc. All
problems that would actually greatly benefit from proper
documentation: documented illegal spying from the secret
service would be used to justify more extensive laws, etc.
The real issue was not as much that so much information was
gathered, but that it was considered a good idea to boast
about it, and let so many people share it, that one thought:
that’s bad
Yes, I have seen such databases: the general attitude
around them is fairly cavalier
In my case, the people in charge were fully aware of
the implications, so much so that it drove one of
them into very scary problems.
There is a lack of software to help figure out things
from it without breaching privacy
I wanted to do some calculations on the Facebook graph;
never could, no one at Facebook would take in the request
Actually fairly simple at a basic level: would any number
printed out be different if one or two values were
different?
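One rough way to read that test, as a sketch only: recompute each number as it would be printed after swapping a single record for another value, and see whether the printed figure moves. The function names, the rounding rule and the sample data below are assumptions made for illustration; this is not software I had access to.

```python
def average(values):
    """Plain arithmetic mean."""
    return sum(values) / len(values)

def printed(stat, values, digits=1):
    """The number as it would appear in a report, rounded for display."""
    return round(stat(values), digits)

def changes_when_one_value_differs(stat, values, replacement, digits=1):
    """True if swapping any single record for `replacement` alters the printed number."""
    baseline = printed(stat, values, digits)
    for i in range(len(values)):
        altered = values[:i] + [replacement] + values[i + 1:]
        if printed(stat, altered, digits) != baseline:
            return True
    return False

# Hypothetical data: number of contacts per user (invented for the example).
contacts = [10] * 1000 + [12] * 1000

# An average over 2,000 people does not move when one person changes a little...
average_leaks = changes_when_one_value_differs(average, contacts, 11)
# ...but a maximum is dictated by a single individual.
maximum_leaks = changes_when_one_value_differs(max, contacts, 500)
```

In this toy case the printed average stays the same whichever record is swapped, while the printed maximum changes, so only the second would count as revealing something about one person.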

* [ ] Moral selection bias
One thing: Intent really matters in perception.
I’ve seen several scientific papers published with
interesting social details, not the least amusing being
this map of Facebook relations, all of which, to a
neophyte in large databases, would seem to need access to
an extensive personal network. All have been acclaimed as
futuristic, beautiful, interesting visualizations.
Formally, most of those were made with the same meta-data
that many now claim the government shouldn’t have access
to
Examples
Graph of Belgium
Positions in Paris, Rome during concert
Map of NY by international calls
Difference is: those were stored in extreme detail,
indeed, but processed and shown only as collectives.
The public imagines that there is something, either
soft (a habit, an institution, personal ethics,
research goals, lack of time or interest) or hard
(law, code, technical means) that prevented the
scientist from accessing such personal information.
Truth is: I’ve seen every possibility, generally
without real checking from the scientists. Spying
on your significant other because you have
extensive e-mail structure doesn’t really make
sense when he or she is the only person that
doesn’t yawn when you talk too much about how a
triadic clustering at twice the complexity
factor, that’s a good (algorithmic) bargain.
Plus, you know far too well that the very active
relations you unearth that way are too often
spammy joke chains, gossipy non-sense and people
arguing to get out of those.
Really interesting relations — just like that
intelligence officer whom I met once, with whom I
talked for two hours at best, at a conference
where he was actually using another grad
student’s badge, so he most likely never had his
name on the same document as me — those require
to understand context, and how that person could
change your perspective, and have.
I guess I’m not convinced about an algorithmic
solution just yet.
Arguments of the IC
“Publishing details would harm national security”
Does anyone else have a problem with the word “national”
in that sentence? Seriously, that’s how you thank the
Coalition of the Willing?
No, this is a question of democracy, and I have seen
nothing so far that, as far as my imagination allows me to
think like a terrorist, would change my attitude
The biggest problem that I read from the many reactions
is the idea that if some people can see anything, then
they will. No: your life might not be purely legal, but
it’s most likely not very interesting either.
Included in the word surveillance
http://www.guardian.co.uk/commentisfree/2013/jun/11/nsa-surveillance-us-behaving-like-china
Certainly true for the Stasi
About the technology
There has been a lot of fantasy about what is possible: from
IBM’s first typewriters, which helped write down lists of
people in concentration camps, to very fancy recommendation
engines, there are worlds of sophistication, and all have been
described with the same
Need to be clear as to what that software can do, as
far as I can tell
Associate network equivalents
One person uses two cell phones, one official and
one less so
the theory is that if he uses one to call
everybody but one person, and the other to call the
same list plus one very bad guy, he is trying to hide
something.
Seriously, using your burner to call your contact
book? That doesn’t make sense: what is used is
the network of telephone antennas at a given
time: if he carries both at all times, you can
associate the two. So do not carry your burner,
and live in crowded areas. Not sure that one was
hard, even if the algos are actually fairly
expensive to run live, and the CIA hates that
people who are not the Agency can do that and
burn their own agents, who carry their burners
with them, and can have atypical movements.
Cluster
Find out the real frontiers of a group of
friends; helps to sort ‘I know him from the
mosque’ and ‘We are part of the same cell’
Seriously, I tune those: those are very
sensitive. Not very convincing in general. Most
cells have hierarchical belonging, so whoever is
on the frontier probably is so for a reason
Pattern recognition
That one is very, very greedy: the only one that could
justify the extent of the server farms that have
been mentioned, along with deciphering. Others can be
run (and have been) with simple servers.
Can isolate methodology and cell structure
I do not believe it is efficient.
All of those can be easily disrupted with something
as sophisticated as compartmentalization, as
practiced in Sleeper Cell, the very good TV series.
Interestingly, that compartmentalization is both
for internal reasons (mainly fighting embedded
agents) and because the principles used to track
the 9/11 terrorists were proudly explained in the
NYTimes after the fact. The intelligence community is
its own worst enemy.
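To make the first technique above (associating two phones through the antennas they hit) a bit more concrete, here is a toy sketch in Python. It assumes each phone's trace is a list of (antenna, time slot) pairs and scores two traces by their Jaccard overlap; the function name, the data and the scoring choice are all illustrative assumptions, not what any agency actually runs.

```python
def colocation_score(trace_a, trace_b):
    """Jaccard overlap between two sets of (antenna_id, time_slot) sightings."""
    a, b = set(trace_a), set(trace_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Invented traces: (antenna, hour of the day).
official = [("A1", 8), ("A2", 9), ("A7", 13), ("A3", 18)]
burner_carried = [("A1", 8), ("A2", 9), ("A7", 13), ("A3", 18)]
burner_left_home = [("A3", 8), ("A3", 9), ("A3", 13), ("A3", 18)]

carried = colocation_score(official, burner_carried)
left_home = colocation_score(official, burner_left_home)
```

With these made-up traces, the burner that travels with the official phone overlaps completely, while the one left at home shares a single sighting out of seven. Running this for every pair of phones in a city is where the cost comes from.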
If it was meant to detect terrorism, how would that
super-computer work?
Associating commercial transactions to known threat
patterns
bomb-making elements, finding large cash transactions,
or repeated small purchases
Not sure how to train that one properly, other than
with ‘scenarios’, but it could help, especially if it is
handled the same way credit card companies handle fraud
detection: “We would like to thank you for your recent
purchase of a boat. By the way: security line is …”
Far more likely to be relevant for organized crime.
Network of scammers, notably for delivery post boxes of
goods from stolen credit cards (a problem of systemic
nature par excellence)
A camera doesn’t see: it records.
There is no human judgement behind most security cameras,
and that’s frustrating, odd and new. As such, many do
stupid things in front of those cameras, but they are
anticipating that there is someone watching, but that
someone is powerless—an inversion rite of sorts.
For someone to actually see the content of that record,
you need to have someone who argues that there is a
problem — convincingly enough to have that tape
considered as a trace and a certain moment isolated.
It doesn’t mean privacy shouldn’t be respected; it means
that the standard for removing the protection that an
individual expects from the consequences of an authority
watching, considering and analysing his records as such
needs to be raised high.
One way to have a high standard for watching, but
still have information, is robot analysis.
Human looks are partial, interpretative, normed: they are
judgmental. Combined with agency over you, it leads to
always uncomfortable and sometimes bad things. That’s
what the concept of privacy is trying to protect us
against. When that fails, the idea that one has the right
to defend oneself, and to offer an alternative explanation
of the traces, is a sign of how creative storytelling
from partial evidence can get.
Should the disclosures be extended to non-imminent
terrorists?
Cf. road auto-ticketing devices
Correction via feedback: “those are dangerous
clerics”. Well, thank you for telling me, I wasn’t
sure about their actual policies
I’ve been there: having algorithmic feedback is
very useful, even for live decisions like
following a cleric.

Posted in Uncategorized | 1 Comment

On troll and coffee

Posted on April 3, 2013 by Bertil

A couple of days ago, Evgeny Morozov published a hatchet piece on Tim O’Reilly. It’s both well documented and surprisingly vitriolic, as explained by Tim himself, with his disarming kindness. I have little to add on the topic that isn’t in the comments on Tim’s reaction. The accusation of Evgeny being a troll came up several times, triggering my well-documented reflex: “If someone is called a troll, invite him for coffee.”

In his case, I have. AFK, he is a dismissive jerk who deserves every accusation he gets, especially for not even trying to look for constructive options. He openly refuses factual corrections or theoretical suggestions; face-to-face, he does so with visible contempt. In his eyes, you can agree with him or be irredeemably wrong, but you do not matter either way.

Invite trolls for coffee: that way, if they really are destructive, you’ll be all the more confident about it.

Posted in Uncategorized | Tagged Being nice, Trolls, Walk the walk | Leave a comment

Three lines of code to avoid being creepy

Posted on April 1, 2013 by Bertil

I like to read papers detailing how targeted marketing is Terminator’s SkyNet, or at least getting there. The wild speculations in there make SciFi features like the ‘Enhance’ button look rather petty. They are very revealing of targeting mistakes and misunderstandings. For instance, those speculations are too common and too widely spread not to correspond to something shared by people who have no idea what is happening; hence, they must actually address a very sensible issue. Not sophisticated vector forests with asymptotic Barnard hyper-spheres, but something my mother would notice and perceive as not human. One day, playing with a client’s ad platform, I figured it out.

What is the creepiest thing you can do to someone? Bible readers would know that: flogging righteous people, killing babies, sleeping with family members… Nope: all that is horrid and disgusting, and certainly what Terminator’s SkyNet has in mind for us, but it’s not creepy. The breathing red light in the T2000’s eyes is creepy: that simple rhythm of an LED smells of betrayal and of the partially human. One of the creepiest moments in the Gospel is at the end. It’s read on the Friday just before Easter, after Jesus was arrested. A woman tells Peter: “I recognise you. You were with him!” “No I wasn’t!” What a jerk. But then she asks again. And he denies again, swears. And again: it goes on three times like this. He denies his faith and savior three times in a row, and with his repeated denials goes from a coward and a jerk to the worst traitor possible. That’s actually an echo of his boasting about his faith three times, just hours before (and of a cock crowing three times just an instant after that).

The feeling triggered by the repetition is incredibly tragic. Humans know this. Well, severely autistic humans may not: autism comes off as nothing but very passionate traits, and yet that single repetition feature makes their entire behaviour come off as machine-like. Disdain for repetition can be felt in many contexts: haggling, asking for clarification or swearing. You can repeat a question if you are incredulous, but not twice, and certainly not more. You can lower your price, but not three times, because that would be perceived as willingness to lower it indefinitely. Communication is considered broken otherwise: if there were a soul between your ears, you would have perceived how uncomfortable that is and changed the message sooner.

“I’ve tried three times and it didn’t change anything” is one of the most commonly heard complaints at a tech help desk, in spite of the well-known definition of madness attributed to Einstein: expecting something different (from an inanimate object). Computer prompts are frustrating like that, and marketing campaigns too. Far too much, especially for such well-controlled events. I often explain: if you want targeting, predictions and recommendations to make sense, understand them as questions from a presumed human operator to a client. Your friends can ask if you are gay if you just mentioned liking Abba; without the implied joke and prejudice about their current fans, it would be deeply wrong to say, e.g.: from your entire profile, you probably want to join that gay dating website. But an inference, even one implied by “People who bought this also…”, that’s human.
However, exact repetition, even of a legitimate question, is creepy. Not processing a refusal as a definite No, not letting that be the clear-cut end of the discussion, is probably hurting your campaign efficiency. Probably: let’s not assume, and let you measure.

For that, here are three very simple lines of SQL to help you plot how your campaign efficiency drops when it becomes insistent (and creepy). I assume that you have a table named, say, ‘Ads’ with every screen appearance of:

a given ad copy, or maybe campaign (uniquely identified with a ‘CID’);
shown to a certain viewer, cookie or user (uniquely identified as ‘UID’);
a ‘DateTimeId’ for the timestamp, and
‘Click’, a binary flag for whether that ad led to a click-through.

-- First query: for every (UID, CID) combination, find the rank of the first view that the user clicked, and how many times he saw that particular copy of that ad.

SELECT UID, CID,
       MIN(CASE WHEN Click = 1 THEN ViewRank END) AS ConvRanking,
       COUNT(*) AS TotalViews
INTO #Views
FROM (SELECT UID, CID, Click,
             ROW_NUMBER() OVER (PARTITION BY UID, CID ORDER BY DateTimeId) AS ViewRank
      FROM Ads) AS Ranked
GROUP BY UID, CID;

-- Please note that ‘ConvRanking’ is NULL if the user never clicked through.

-- Second and third queries: count the (campaign, user) pairs at each rank. #Views holds one row per pair, so COUNT(*) counts pairs.

SELECT ConvRanking, COUNT(*) AS NbPairing FROM #Views GROUP BY ConvRanking;

SELECT TotalViews, COUNT(*) AS NbPairing FROM #Views GROUP BY TotalViews;

-- For a big operator, that table might be split by dates: I trust you know how to merge the temporary tables.

Plot both curves to represent how many repetitions (abscissa) you needed to convert someone (NbPairing as ordinate for ConvRanking), and how many times you tried and paid for it (NbPairing as ordinate for TotalViews): one has to be above the other. If both are close, you repeat efficiently and stop when the user is converted. If ConvRanking is below but not to the left, your conversion rate is low but your repetition is relevant. If ConvRanking is to the left, then you repeat views of your copy too much, either to already converted users or to uninterested ones. Creepy and clueless either way.
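If you want to try the idea without a data warehouse at hand, here is a minimal sketch using Python's built-in sqlite3 module and a handful of made-up rows. The table and column names follow the description above (Ads, UID, CID, DateTimeId, Click); the sample data, the `curves` function and the SQLite dialect are assumptions for the example. It also assumes DateTimeId is unique within each (UID, CID) pair, and it drops the never-converted pairs from the first curve.

```python
import sqlite3

# Hypothetical sample: (UID, CID, DateTimeId, Click); invented for illustration.
ROWS = [
    ("u1", "c1", 1, 0), ("u1", "c1", 2, 1), ("u1", "c1", 3, 0),
    ("u2", "c1", 1, 0), ("u2", "c1", 2, 0), ("u2", "c1", 3, 0), ("u2", "c1", 4, 0),
    ("u3", "c1", 1, 1),
]

# One row per (UID, CID): rank of the first clicked view, and total views.
# The rank is computed with a correlated count so it runs on any SQLite version.
VIEWS_SQL = """
CREATE TEMP TABLE Views AS
SELECT UID, CID,
       MIN(CASE WHEN Click = 1 THEN ViewRank END) AS ConvRanking,
       COUNT(*) AS TotalViews
FROM (SELECT a.UID, a.CID, a.Click,
             (SELECT COUNT(*) FROM Ads b
              WHERE b.UID = a.UID AND b.CID = a.CID
                AND b.DateTimeId <= a.DateTimeId) AS ViewRank
      FROM Ads a)
GROUP BY UID, CID
"""

def curves(rows):
    """Return ({ConvRanking: NbPairing}, {TotalViews: NbPairing})."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Ads (UID TEXT, CID TEXT, DateTimeId INTEGER, Click INTEGER)")
    con.executemany("INSERT INTO Ads VALUES (?, ?, ?, ?)", rows)
    con.execute(VIEWS_SQL)
    conversions = dict(con.execute(
        "SELECT ConvRanking, COUNT(*) FROM Views "
        "WHERE ConvRanking IS NOT NULL GROUP BY ConvRanking"))
    views = dict(con.execute(
        "SELECT TotalViews, COUNT(*) FROM Views GROUP BY TotalViews"))
    con.close()
    return conversions, views

conv_curve, views_curve = curves(ROWS)
```

Both dictionaries map a number of repetitions to a count of (campaign, user) pairs, so they can be handed to whatever plotting tool you prefer.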

I won’t breach confidentiality and share exactly what those curves looked like for clients, but I can tell you this: I have yet to find a client where the conversion rate does not drop at an alarming rate, revealing that a large share of their marketing effort is very easily filtered out: repetition beyond the Xth. Repetition does look like it deepens their inventory without having to find more partners; however, the curve that you just drew proves how pointless that can be.

Of course, now they have to encode counters in their platform, a surprisingly unplanned and hard thing to add. Now, the developer tasked with that hates you; bring chocolate. More importantly, a priority budget to do it most likely just revealed itself.

This goes against the usual carpet-bombing that most old-school advertising agencies liked because it was so efficient: efficient because it saved creative effort and the struggle to reach an agreement with their advertising clients, and maximised their share of buying space. However, it matches Google’s approach. It also gives an efficient tool to curb the perceived violence of re-targeting: imagine your assistant saying “By the way, you mentioned you wanted to buy this” once: that’s an appropriate and smart reminder; now imagine that every time you come across said assistant… I hope you got my point.

I’m positive that, with a proper classification of either, more nuanced things can be done, but this is not a post about “How 30 lines of code can prevent you from being creepy.” That would be almost too hard.

Posted in Uncategorized | Tagged Advertising, Repetition, SQL Code, Targeting | Leave a comment

WTF stupid questions

Posted on March 25, 2013 by Bertil

Most of my friends. Well, the younger, meaner ones. Well, there’s actually a dozen of them, max. Let’s call them my… noisier friends like to post screen grabs of stupid things that people say on Tumblr. It used to be Yahoo! Questions, it was Facebook for a while, but I’m old enough to recognise the GeoCities frames on some of them. The whole approach is one of the most salient parts of folklore, and generally feels like someone is piling on a stupid person somewhere. It feels a little bit too much like middle school to me: attack whoever spoke up and said something you would not have, or didn’t want to have said.

I do realise that when someone asks, say (let’s have the oldest myth of them all), whether the baby girl that a couple is expecting might be pregnant too, the most likely reaction is to wonder how that person reproduced, and to facepalm the lack of Darwinian selection away. However, I challenge anyone who laughs at this to describe at what age a female fœtus develops proto-reproductive organs, how developed they are, and when biology was able to describe those properly. That question was (and presumably still is, for some rare pathological cases) at the frontline of science. The frequency with which that particular story pops up says something about symbolic relations, and a smart Freudian would have a field day studying the history of that question and its mockery.

Those questions and the reactions to them seem to greatly challenge the idea that there are no bad questions, just bad answers. How could someone ask “Why don’t the French have their own word for ‘entrepreneur’ and have to use the American one?” and not be judged for the bluntness of their assumption? Well, they make a great point on the cross-influence of language and culture; and they don’t speak French. Billions of people don’t either, and some are very smart.

However, the best science comes from stupid: people so ignorant they felt like asking, if the Earth is flat, what happens at the edges? XVIIIth century doctors who were well trained and knew that full human bodies were in the heads of spermatozoids, yet wondered how big a magnifying glass was needed to actually see them. Marie Curie, who couldn’t admit that all thermal energy had to come from chemistry, so that block of uranium had to be oxidising slowly, somehow.

Every time someone mocks others, especially people that he doesn’t know, remember: intelligent people do not think others are stupid.

Posted in Uncategorized | Leave a comment

What doctors do

Posted on March 23, 2013 by Bertil

I know I’m not a doctor but after talking to many, working for a health-focused start-up for a year and a half and teaching statistics to many, there are things about medical practice and health that are better understood from the eyes of a statistician.

Watching House, M.D. with doctors is fun, if you have that kind of humour. First, Hugh Laurie is an incredible actor, but more importantly, actual doctors participate: they try to figure out what is wrong. Except, unlike viewers of a common mystery series, they are trained for it, and have embedded good practice in testing, up to the point that it is a moral code for them, and they are genuinely offended when the wrong test is tried. And they argue, a lot. Generally, the show appears to neglect common ailments, and the tests done to sort them out.

When discussing with them, that seems to be the problem, but it’s not how doctors express it: they talk a lot about the order of things. “They should [test that] FIRST!” However, they usually have a hard time explaining in layman’s terms why this test goes first. They were taught that way. Diverging from this path is wrong. This order is present in medical thesauri. It corresponds to the danger of a potential diagnosis (suspicion of brain clots goes very first, no matter what) but mostly to what is actually a statistical notion, of which most non-statisticians only have a loose grasp: prevalence. Literally, the likelihood of each diagnosis, given the set of observed issues. House, M.D. is written mainly by writers who are not practicing doctors, and who get their medical knowledge by opening a thesaurus and looking down the list, to the rare ailments. They write without prevalence in mind, and this drives practicing doctors mad.

The cynical part is that both diagnosis and treatment cost time and money and put patients’ health in danger. That risk is usually low, but not always (total XXX and surgery, with its potential infections, are classic threats that are recommended with parsimony).

What is poorly understood by patients, and rarely explained, is the difference between treatment and diagnosis. Doctors actually don’t do much, or rarely. They check a lot more than they make a difference: most people are healthy, and most diseases actually cure themselves. In the sliver of cases when doctors prescribe more than rest, painkillers and avoiding more harm, they generally nudge the body towards a faster recovery. Antibiotics are a common reinforcement for the natural immune functions; they have almost eradicated the common bacterial infections, but they do not work on viruses. So, whenever you were last prescribed some, it’s highly likely you needed none of it, and were better off sleeping and drinking water.

What remains are rare cases where doctors know what is happening, and can do something about it. That’s far less likely than, say, cases where they are not sure and would rather not do much. However, because they live for those, and would hate themselves for not trying, they focus all their energy on finding those rare moments and intervening, and they should. Medical shows and common folklore on medicine do too: waiting in bed for things to patch up is a less appealing story. Even more shocking: having an unremarkable, active day while your kidneys flush the leftovers of an infection that white cells fended off before you could notice is the most boring story one can imagine. It still makes up the bulk of your health.

Statistics is not about numbers but about framing the problem, and what should strike statisticians about the current understanding of medical practice is the focus on salience (exceptional prevalence) rather than consideration for the whole set. Medical science does consider the whole set: including healthy samples to compare results is expensive and can come off as unnatural, but it is necessary.
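To make the role of prevalence concrete, here is a minimal sketch of how the same positive test means very different things for a common and for a rare ailment. The function name and every figure (sensitivity, specificity, the two prevalences) are made-up assumptions for illustration, not data about any real test.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of positive tests that are true cases, by Bayes' rule."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical test: catches 99% of true cases, clears 95% of healthy people.
common = positive_predictive_value(prevalence=0.20, sensitivity=0.99, specificity=0.95)
rare = positive_predictive_value(prevalence=0.001, sensitivity=0.99, specificity=0.95)

print(f"common ailment (1 patient in 5):    {common:.1%} of positives are real")
print(f"rare ailment   (1 patient in 1000): {rare:.1%} of positives are real")
```

With these assumed figures, roughly 83% of positives are real for the common ailment and about 2% for the rare one, which is one way to read why the order of tests follows prevalence.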

More importantly, expecting doctors to do something is wrong: it would be like expecting a driver to swerve every time they look at the road. Good driving has surprisingly few turns. Doctors correct the course of complex machines that heal themselves. Recognising their expertise in not doing much, the parsimony of their work, is necessary. Just like recognising the value of parsimony in many expert and complex decision makers: managers, educators. Silence, patience, approval can be golden.

This long rant is not meant to dwell on negative work, but to recommend a metric.

Most coders and dev-ops spend a majority of their time not coding, but correcting bugs. It’s tedious, and can get to their psyche. Refocusing their work on positive outcomes is necessary. One way to do that is by lines of code written; a good idea, but great code should often be sparse. Another way, which makes more sense to me, is by the number of interactions processed without a hitch. There is an eerie threat in a “Velociraptor free environment for [•] days” poster, but it does make more sense than counting failures. The proper metric should probably be adapted to the office dynamic.
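As a rough sketch of such a positive metric, one could count the interactions that went through cleanly and the current clean streak, rather than the failures. The log format (a list of booleans, oldest first) and the function name are assumptions made for the example.

```python
def clean_run_metrics(outcomes):
    """Summarise a log of interactions; True means processed without a hitch."""
    streak = 0
    # Walk back from the most recent interaction until the last failure.
    for ok in reversed(outcomes):
        if not ok:
            break
        streak += 1
    return {
        "clean": sum(outcomes),      # interactions that went through
        "total": len(outcomes),
        "current_streak": streak,    # the "free for [n] interactions" number
    }


# Hypothetical log, oldest interaction first.
log = [True, True, False, True, True, True]
metrics = clean_run_metrics(log)
print(f"{metrics['clean']} of {metrics['total']} interactions without a hitch, "
      f"{metrics['current_streak']} in a row since the last one")
```

The same log that would read as "one failure" now reads as five clean interactions and a streak of three; which of the two numbers goes on the poster is the part to adapt to the team.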

Similarly, doctors have patients die on them, and a never-ending stream of suffering people coming in. That’s not good, or uplifting, except in the rare cases where their intervention has a visible impact, which is, as I pointed out, rarer for them than it could seem from a patient’s point of view. What doctors could use are metrics other than “How many died on you today?” How many patients are smiling today among all those who came in a week ago, maybe? New treatments are a good guess too, even if they are also quite rare and often niche. How much of a difference did a reassuring voice make, even if viruses are generally just pissed away? Miners should not be paid for the diamonds they find; that would be unfair and counter-productive. Doctors should not be motivated by miracles either.

Posted in Uncategorized | Tagged Medical, Metrics, Positive Feedback, Prevalence | Leave a comment

Two common mistakes

Posted on November 18, 2012 by Bertil

I come across two common mistakes daily, and I wanted to spell them out simply.

The first one is a classic reasoning mistake: if many of A are B, then many of B should be A. That is only true if A and B are equally common. It leads to magical thinking and cargo cult. Every instance of discrimination, denunciation, correction or condemnation seems to come with it. I’m not sure why it is so obvious to me while so invisible to most around me, but I’m sure I’d rather have a name to call it.
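A small sketch with made-up counts shows how lopsided the two directions can be. The group sizes and the helper name are hypothetical; the only point is that the two shares coincide only when A and B are the same size.

```python
def conditional_shares(n_a, n_b, n_both):
    """Return (share of A that is B, share of B that is A)."""
    return n_both / n_a, n_both / n_b


# Hypothetical population: A is rare (100 people), B is common (10,000 people),
# and 90 people belong to both groups.
a_that_are_b, b_that_are_a = conditional_shares(n_a=100, n_b=10_000, n_both=90)
print(f"{a_that_are_b:.0%} of A are B, but only {b_that_are_a:.1%} of B are A")

# When both groups are equally common, the two shares are the same.
same_a, same_b = conditional_shares(n_a=100, n_b=100, n_both=90)
print(f"equal sizes: {same_a:.0%} and {same_b:.0%}")
```

Here 90% of A are B while fewer than 1% of B are A, so concluding anything about a given B from what "most A" look like would be wrong more than 99 times out of 100.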

The second one is an application of that, in the face of success. People ask: what do successful people or companies do, or what have they done? Or rather, because this is often subjective, they ask the successful themselves, because “successful people must know better about what they owe their success to.” Then let’s do that: let’s take to heart what successful people believe to be a proper role model, and be successful ourselves. Except that such confessions are rarely about actions, but about perceived values, the stories that successful people tell themselves, often to feel better. How many managers, shocked at the magnitude of the taxes they have to pay (or rather at how little influence they have over them, while they are so used to controlling most things around them), argue that they succeed thanks to low taxation, or rather that they would succeed better without so much of it, while neglecting the education of their employees and the unemployment benefits that allowed those employees the flexibility to come and work for a risky company?

As a data scientist, one comes across those daily, yet I don’t have a proper name for them. “Post hoc ergo propter hoc” doesn’t really cover it; “Success fallacy” might, although it misses the many more cases when the bias isn’t normative. If you have a better idea for a name, I’ll be happy to credit you every time.

Posted in Uncategorized | Leave a comment

Why I left Quora

Posted on September 24, 2012 by Bertil

The last post on this blog was about Quora, a promising community, so much so that I have since left this blog unattended.
For almost two years, I was an active member. Last week, I came across a series of heinous and prejudiced answers — both signed and anonymous, something so despicable that it deserves no publicity here. When denouncing the harm those did, I was threatened with legal action and told by no less than three community administrators that accepting those silently was a rule on Quora, ironically called “Be nice”.
Under no condition will I continue to be associated with a website promoting that kind of prejudice; I severed all ties with Quora, with no hesitation.
As a consequence, I will resume this blog.

Posted in Uncategorized | Leave a comment

Financial bubbles

Posted on May 2, 2012 by Bertil

I have wanted to write this article for a long time, but the dogfight between Dan Gillmor and Chris Dixon triggered it.

I spent my first years as a grad student wondering how bubbles were even possible: there were rumours about one in 2000, but agents were considered rational, and they certainly seemed so after listening to as many classes on economics (financial maths, really) as I did. That summer, I worked as an intern for one of the most advanced portfolio managers at the time on ‘cushion-guaranteed funds’ and reality struck me: those funds had the most dangerous mix I could think of. They worked based on an idea that was fairly technical, very simple to explain, and wrong. Alas, wrong for reasons that seemed obvious to me but required thinking in many dimensions.
Two years later, I would finally have a proper class on economics (courtesy of Prof. Boyer and Prof. Orléan, notorious non-neoclassical figures, a crime at the time). Their course featured bubbles prominently. Their definition was simple: there can and will be bubbles if

some economic agents know the value of something, and
others don’t (know) but still buy it.

The latter aren’t necessarily stupid; it could be because figuring out the value is expensive, and they bet that the former were honest. Note that there is no disparity in valuation yet: at this stage, it is simply about ownership and knowledge. However, it rarely lasts long: either mischievous behaviour from people who look like they know, or cargo cult from those who don’t, could lead to absurd valuations of stocks that aren’t worth it.

Let me repeat that: bubbles go through three distinct steps:

1. a new economic model (agile or scrum, SoLoMo, what have you) that isn’t trivial;
2. public acceptance of the (often legitimate) high valuations associated with it;
3. excessive extension of those valuations to other assets.

Note that in 2001 the most egregious of valuations, Google, was a fraction of what it proved to be worth: most ratios that investors would look at remained within what VCs deemed acceptable for certain companies. However, the management at Pets.com was not the same as Google’s and… Well, you get my point: a bubble starts when non-specialised investors get in the picture, i.e. at step 2, but it only does damage when step 3 is revealed.

So, back to Gillmor vs. Dixon, and more generally to “Is there a tech bubble?” and “Should we trust well-known investors about it?”
Are investors who know little about tech considering pouring large amounts into, say, Facebook? Well, it’s hard to call Yuri Milner or Goldman Sachs uninformed, but the IPO should offer a resounding yes. It doesn’t mean that Facebook’s valuation is above what it should be: the company will grow and have a major economic impact (that may or may not translate into profits and longevity). Therefore, there isn’t an over-estimation yet; I would actually advise anyone who can spare the cash to buy a share in Facebook, if only to make history. However, everything is in place for a bubble: weird valuation formulas and a public hungry to put their first savings from the recovery somewhere shiny.

Well, there isn’t an over-estimation, if you conveniently exclude Groupon. And maybe Zynga. But no one mentions those. And Chris Dixon says there isn’t a bubble… So what gives? Chris doesn’t know anyone who ignores proper valuation rules: if he knows them, they know him and can therefore get proper investment advice from the horse’s mouth; and knowing Chris, he doesn’t spare his enthusiasm and effort in preaching for the right companies to invest in. Therefore, no, neither he nor most pundits have seen any significant over-estimation, the last stage of a bubble: not in private, because they talked those away long before they happened, and not in public either, because we are not there yet, and we won’t be for a while: many valuable and deserving companies will need money before the made-up ones appear and raise the roof. And those fakesters won’t have access to the inner circle. Once again: look at Groupon’s current investors’ list.

Is Chris, or any well-known Angel, sincere when they deny the bubble? Yes, more than you would know. Calling him biased and suggesting that this is because he is financially invested is therefore probably in poor taste, for many reasons, including that

if he sold out (which he won’t) he would have more money than he can spend, so why try to have more?; and
he won’t sell because his money is to invest; suggesting he did so unwisely is actually fairly insulting to his identity as an entrepreneur and an Angel. It’s his life and legacy, and that bias is much stronger. And he should be praised for it, not called on it.

He is smart and deeply invested in the Valley business, so he won’t see anything wrong until it’s too late. Don’t ask him about bubbles, but do ask him about good companies to invest in, and spread that word, preferably with detailed explanations and financial targets to avoid excessive spin.

Who should you ask, if you are a journalist covering a possible bubble?
Anyone with a little money on the side who doesn’t know much about tech. Ask if they are considering buying some Facebook stock, or Groupon, or maybe if they’d be interested in Twitter if it ever opens its stock; then ask about share structure, possible strategic options to increase revenue or profit, the evolution of the cost structure, value creation and retention, competitive advantages, response to threats. Compare that to answers from people who consider investing in more common stocks, like car manufacturers, insurance, distribution or real estate. You may get reassuring responses: Facebook and Twitter are extensively covered. You may get scary ones: those businesses are very unusual and presumably natural, contestable monopolies, which makes them worthwhile but riskier investments than most would think.
The best coverage might actually not be the usual interview summary, but a multiple-choice questionnaire: test a reader’s knowledge of the industry and, to someone who failed, explain that those were only a few among the many questions that they need to figure out before putting their money anywhere. Make sure most non-specialised financial analysts would fail those, because, surprisingly enough, fifteen years after Netscape first floated, tech is still an odd sector with spectacular value discrepancies.

I’ve heard of a new way to assess possible new hires: don’t ask for recommendations, but write to ask if they are exceptional. Unless you get a “He’s looking for a job?!”, don’t bother. Maybe suggest that potential investors apply the same filter to buying stock: ask if the company is exceptional, and don’t bother if you don’t get an over-enthusiastic response. You might get punked (just like you might hire a complete tool away from someone who lied) but knowing the hacker’s ethic ruling in Silicon Valley, I doubt it.

Posted in Uncategorized | 1 Comment

You do not need invites to see what Quora is about

Posted on January 8, 2011 by Bertil

This post is available in French.

For days, I saw requests for invitations to Quora everywhere. I was surprised because, after sending links to friends and seeing bloggers include more links (me included), I was sure that the site was entirely public and even crawled by search engine spiders (except the signatures and comments, but that’s another story).

Yet another tweet made me doubt. So I took the time to log out.

Ok, now I see the problem.

The landing page is not very welcoming, and it probably explains the backlash against the perceived elitism. Believe me, it is purely superficial: the site is very easy to navigate without any invitation. Go, for example, to the page of one of the founders. Want to see what they say about vegetarian restaurants in Paris, or what your friend Thomas has to say? Use the search bar, framed in black, at the top of the screen: it finds contributors, questions, topics. It does this even before you finish typing, like Google Instant Search.

The only page on which the bar does not appear is the landing page. Why? It is a common ergonomic trade-off, one that had not anticipated the wave of media attention. One should unburden the screens for newcomers, to guide and reassure them. I say this every day to my clients; it is one of the simplest pieces of advice, the most effective and the most neglected. But here it is: one query bar to search, navigate and ask questions is a lot. It is disturbing, complicated and very visible, so it was removed from the landing page. That page is thus forbidding only in appearance: anyone remains free to explore Quora, like Wikipedia.

We should change that; I took the lead and asked for you, but you are better placed than me to answer: how should we rethink this page? A content portal, a video presentation, a simple example, a cute regular who lives near you? Tell me.

And it had not been a problem so far? No, because until the end of December, all newcomers were invited by a friend who had a particular page to offer them (his own, or a question calling for their expertise, for the most part). This is the first time I have seen this screen.

But naturally, without registration, that is to say without an invitation, you cannot contribute. It is quite unfortunate, but (as you understand by now) most active members would prefer that you spend time exploring the site before writing, just like on Wikipedia. You’ll probably prefer to browse first with the interface for non-registered readers, which is much simpler. The full version is more complex than an aircraft cockpit, as I realised the day before yesterday while trying to explain how it worked.

When you have located a page that deserves your attention, ask its author for an invitation via his or her Facebook page, Twitter or blog (click on the name; the three links are the little icons to the right of the avatar). This will give you the opportunity to make friends and discuss your contribution with the right person. Seeing that you are so civil, I cannot doubt that he or she will be happy to offer a virtual tour of the place. If it were me, you’d probably even be entitled to a tea (or a beer); actually, if you are in Paris, you would.

This visit-first-talk-later approach, plus Charlie’s new invention (a permit to ask questions for newcomers), is a bit condescending, you say? Honestly, do you find it unpleasant to go to a site where a stranger is there to help, without 1337-speak, trolls or griefers?

Update: to share the issue internally, I wrote a post on the topic.

Posted in Uncategorized | 2 Comments

See you on Quora

Posted on September 16, 2010 by Bertil

Since the last post, I have got used to the idea that I am an independent consultant (I already had a company); to make it happen, I’ve met many web entrepreneurs who asked what I would do. Truth be told, my answer depends a lot on how much you know about the web:

for newcomers (and I might have found a continuous stream of those) I can explain the principles of making money online (funnel, community structure) and help you decide what would be a good starting strategy;
if you already have some activity, but face some decisions (stalling community, investment) I can offer data-based strategic moves. I’m actually surprised at how few companies sell non-generic data analysis (among what I’ve seen, East Agile might be the only one);
if you are an experienced social media player, I can audit your control panel, processes and threat assessment, and explain the principles behind the latest trends in SNA.

The three-way distinction is a little artificial: I’m assuming that, as an independent, my main asset is being flexible and tailoring my service.
Someone also suggested that I teach social media principles to advertising agencies, or make user-friendly video presentations of the latest discoveries in Web Science: more excellent ideas, if you ask me. Anyway, prospects seem good, and I haven’t stopped finding ways to advertise my service: meet-ups, IQVine and what appears to be the most promising expert community: Quora.

The project was started by early Facebook developers. It shares an ambition with Wikipedia or Google (being a repository of all knowledge, and structuring it) but mostly resembles Facebook. It’s blue, has an omnipresent News Feed and seemingly redundant Alerts, Friending (asymmetric, though), and all the fun happens with Likes and in the non-hierarchical comment threads. Your identity on Quora is actually based on your Facebook “real person” account; no institution can speak as such, and you can’t separate facets of your life: your love for gourmet tea from an unhealthy interest in Open Web standards. (Both interests are well covered, as you would expect on any Silicon Valley seeded service; however, when I mentioned my wish to filter by interest, rather than merge quirky hobbies and tech standards, the founder was surprised.)

The readership on Quora is supposed to be huge, and the visible following orders of magnitude bigger than on this blog — obviously. In addition to that, instead of having to pick one of my drafts in here and polish it to release in the void of your comments, I can answer someone’s expressed concerns, compare my positions with other people responding to a similar angle; readers don’t hesitate to thank me, edit or vote up my rants… Therefore, it seems like the necessary thing to do to post there what I could not motivate myself to publish here. I’ll probably keep on blogging, and will certainly promote my proudest answers here, but I’d recommend you check my profile, and the whole website actually, because it is great.

Many prospects have asked for a sample of what I can do, and I was thinking about describing elementary analyses: virality and SNA, adoption threshold, centrality and activity correlation, including sample code. I’m not sure about the proper language to use (R, awk, SQL, pseudo-code) but this represents more than one post. Several open questions on Quora are close to that too, so I’ll probably cross-post: a summary there and sketches here.

I’ve also agreed to teach Digital Economics to CS Masters students: more work, but a good complement to what I want to do. Yet again, sleepless nights in prospect, but the promise of interesting posts. I haven’t decided what kind of interactions I want to have with my students, but a blog is likely so far.

Posted in Uncategorized | 1 Comment
