> If we have data, let’s look at *data*. If all we have are *opinions*, let’s go with *mine*.
— [James L. Barksdale](https://en.wikipedia.org/wiki/James_L._Barksdale)
# D·A·T·A
## for 🙋 and 🤖
November 9th 2016 — [Media@LSE][] {.footer}
[Media@LSE]: http://www.lse.ac.uk/media@lse/
## Bonjour,
I am **Thomas**
[thom4.net](https://thom4.net) – [@thom4parisot](https://twitter.com/thom4parisot) {.footer}
@@@
![Pardon my French](../../2015/images/pardon-my-french.jpg)
@@@
## BBC R&D
[bbc.co.uk/rd](http://bbc.co.uk/rd)
@@@
![Full Stack JavaScript](../../2015/images/javascript.png)
[thom4.net/node.js](https://thom4.net/node.js)
@@@
![Sud Web](../../2015/images/sudweb.png)
[sudweb.fr](http://sudweb.fr)
# Physical data
@@@
~~~~
[DIY Music Box](https://christiehubbard.wordpress.com/tag/diy-music-box/)
@@@
~~~~
[DIY Music Box](https://christiehubbard.wordpress.com/tag/diy-music-box/)
@@@
@@@
@@@
# Eureka machine
@@@
@@@
> Military camps foretell many battles abroad
@@@
– ⏑ ⏑ | – ⏑ ⏑ | – – | – – | – ⏑ ⏑ | – ⏑
~~~~
Latin hexameter verse encoding
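In dactylic hexameter, each of the first four feet is either a dactyl (long, short, short) or a spondee (long, long). The fifth foot is normally a dactyl, and the sixth is a long syllable followed by one of either length. The small sketch below enumerates the patterns this rule allows, using `_` for long and `u` for short as ASCII stand-ins. The function name is hypothetical, and treating the fifth foot as always a dactyl is a simplification. The scansion shown on the slide is one of the resulting patterns.

```python
from itertools import product
from typing import List

# ASCII stand-ins for the metrical marks: "_" = long syllable, "u" = short
LONG, SHORT = "_", "u"
DACTYL = LONG + SHORT + SHORT   # long, short, short
SPONDEE = LONG + LONG           # long, long

def hexameter_patterns() -> List[str]:
    """Enumerate dactylic hexameter patterns, feet separated by "|".

    Feet 1-4: dactyl or spondee. Foot 5: dactyl (the usual case).
    Foot 6: a long syllable followed by a syllable of either length.
    """
    patterns = []
    for first_four in product((DACTYL, SPONDEE), repeat=4):
        for sixth in (LONG + LONG, LONG + SHORT):
            patterns.append("|".join(first_four + (DACTYL, sixth)))
    return patterns

patterns = hexameter_patterns()
print(len(patterns))                        # 2**4 choices for feet 1-4, times 2 for foot 6
print("_uu|_uu|__|__|_uu|_u" in patterns)   # the scansion shown on the slide
```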
@@@
> If we had it running continuously, it would take 74 years for it to do its full tour before it started repeating itself.
# Analog data
@@@
# Radio, TV, CBs, etc.
@@@
# Signals and frequencies
@@@
~~~~
https://upload.wikimedia.org/wikipedia/commons/6/6d/Sine_waves_different_frequencies.svg
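A frequency is the number of cycles a wave completes per second. The minimal sketch below samples sine waves of different frequencies and recovers each frequency by counting the peaks in one second of signal. It assumes an arbitrary 1 kHz sampling rate, and the helper names are hypothetical.

```python
import math
from typing import List

def sine_samples(freq_hz: float, sample_rate_hz: int = 1000, duration_s: float = 1.0) -> List[float]:
    """Sample sin(2*pi*f*t) at regular intervals over the given duration."""
    n = int(sample_rate_hz * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]

def count_peaks(samples: List[float]) -> int:
    """Count local maxima; the `<` / `>=` pair counts a two-sample plateau only once."""
    return sum(
        1
        for prev, cur, nxt in zip(samples, samples[1:], samples[2:])
        if prev < cur >= nxt
    )

# Higher frequency means more cycles, hence more peaks, in the same second.
for f in (1, 2, 5):
    print(f, "Hz ->", count_peaks(sine_samples(f)), "peaks in one second")
```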
@@@
![](images/radio-signal.gif)
@@@
![](images/pal-frame-signal.png)
@@@
~~~~
https://ancientelectronics.wordpress.com/2013/01/23/choosing-the-right-tv-for-classic-gaming-in-the-usa/
@@@
![](images/ntsc-signal.png)
~~~~
https://commons.wikimedia.org/wiki/File:Ntsc_channel.svg
# Towards Digital Systems
@@@
@@@
# Pre-digital content = _black box_
@@@
# _What_ is it?
@@@
# What does it _contain_?
@@@
# Can we _find it_?
@@@
# We need _data around_ data
@@@
# We need _metadata_
@@@
# Speech to text
@@@
@@@
# Speaker recognition
@@@
~~~~
http://downloads.bbc.co.uk/rd/pubs/whp/whp-pdf-files/WHP293.pdf
@@@
# Digital Data
@@@
@@@
@@@
@@@
@@@
@@@
# Data Recommendations
@@@
> _Information filtering_ system that seek to predict the rating/preference that a user would give to an item.
– [Wikipedia](https://en.wikipedia.org/wiki/Recommender_system)
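One classic way to predict such a rating is user-based collaborative filtering. The minimal sketch below estimates a missing rating as the similarity-weighted average of other users' ratings for the same item. The users, articles and ratings are made up. Cosine similarity is only one common choice among several. Nothing here describes the systems shown in this talk.

```python
import math
from typing import Dict, Optional

# user -> {item -> rating}; all names and values are illustrative
Ratings = Dict[str, Dict[str, float]]

def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity over the items both users rated (0.0 if none)."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def predict_rating(ratings: Ratings, user: str, item: str) -> Optional[float]:
    """Similarity-weighted average of other users' ratings for `item`, or None."""
    weighted_sum = 0.0
    total_weight = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        weight = cosine_similarity(ratings[user], their_ratings)
        if weight > 0:
            weighted_sum += weight * their_ratings[item]
            total_weight += weight
    return weighted_sum / total_weight if total_weight else None

ratings: Ratings = {
    "alice": {"article_a": 5, "article_b": 3},
    "bob":   {"article_a": 5, "article_b": 3, "article_c": 4},
    "carol": {"article_a": 1, "article_b": 5, "article_c": 2},
}
# Alice rates like Bob, so her predicted rating leans towards Bob's 4.
print(predict_rating(ratings, "alice", "article_c"))
```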
@@@
![](images/moreover-links.png)
@@@
> This month, these ads stopped appearing on Slate.
> Among the reasons: The links can lead to questionable websites, run by unknown entities. Sometimes the information they present is false.
— [Sapna M. and John H.](http://www.nytimes.com/2016/10/31/business/media/publishers-rethink-outbrain-taboola-ads.html), _The New York Times_
@@@
# Editorial validation
@@@
@@@
![](images/from-other-news-sites.png)
@@@
# Curated feeds, etc.
In other words, a human editor refines a list built by machines.
# Editorial Algorithms
@@@
~~~~
This is the project behind the previous news recommendations example.
@@@
> This project looks at ways to automatically _extract editorial metadata_ (such as tone, language and topics) about _web content_, making it easier to find the right content for the _right audience_.
~~~~
Mention the data on tap
@@@
@@@
# Editorial datapoints
@@@
@@@
# To _connect_
Content items that are not aware of each other, linked through a unified vocabulary.
@@@
# To _learn_ first
What's out there, who the magazine makers are, what counts as content, etc.
@@@
# _Manual_ recommendations
Learning from users, exploring different themes, refining on the go. Good for one-to-many: you know what's out there.
@@@
# Fully _automatic_
Again, learning from users. Good for many-to-many.
@@@
# _Semi-automatic_
Gather automatically, then let an editor decide. You know what's out there.
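The semi-automatic approach can be sketched in two steps. First, a machine ranks candidates and keeps a shortlist. Then an editor keeps or drops each item. The function name, the `(id, score)` tuples and the `editor_approves` callback are hypothetical stand-ins for a real scoring model and a real editorial interface.

```python
from typing import Callable, List, Tuple

# (content id, machine relevance score); both are hypothetical placeholders
Candidate = Tuple[str, float]

def semi_automatic_picks(
    candidates: List[Candidate],
    editor_approves: Callable[[str], bool],
    shortlist_size: int = 5,
) -> List[str]:
    """Machine step: shortlist the highest-scoring candidates.
    Editorial step: keep only the ones a human editor approves."""
    shortlist = sorted(candidates, key=lambda c: c[1], reverse=True)[:shortlist_size]
    return [content_id for content_id, _ in shortlist if editor_approves(content_id)]

# The editor rejects "d" even though the machine ranked it highly.
candidates = [("a", 0.9), ("b", 0.2), ("c", 0.7), ("d", 0.8)]
print(semi_automatic_picks(candidates, lambda cid: cid != "d", shortlist_size=3))
```

Keeping the editor as a plain callback keeps the machine ranking and the human decision separate, so either side can change without touching the other.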
# Wrap up 🙌
@@@
# It's been a while
We have been collecting data since we learned to engrave stone.
@@@
# Big means _nothing_
Quality, filters, freshness, confidence?
@@@
# Data _creates_ data
Interaction metrics, traffic analytics, etc.
@@@
# _Mistakes_
In the measurements, in the way we look at the problem, in the data themselves.
@@@
# Possible _bias_
Point of view of creators, classification systems.
@@@
# _You_ decide
@@@