> If we have data, let’s look at *data*. If all we have are *opinions*, let’s go with *mine*.
— [James L. Barksdale](https://en.wikipedia.org/wiki/James_L._Barksdale)
# D·A·T·A
## for 🙋 and 🤖
November 9th 2016 — [Media@LSE][] {.footer}
[Media@LSE]: http://www.lse.ac.uk/media@lse/
## Bonjour,
I am **Thomas**
[thom4.net](https://thom4.net) – [@thom4parisot](https://twitter.com/thom4parisot) {.footer}
![Pardon my French](../../2015/images/pardon-my-french.jpg)
## BBC R&D
![Full Stack JavaScript](../../2015/images/javascript.png)
![Sud Web](../../2015/images/sudweb.png)
# Physical data
[DIY Music Box](https://christiehubbard.wordpress.com/tag/diy-music-box/)
# Eureka machine
> Military camps foretell many battles abroad
\_ ͜ ͜ | \_ ͜ ͜ | \_ \_ | \_ \_ | \_ ͜ ͜ | \_ ͜
Latin hexameter verse encoding
> If we had it running continuously, it would take 74 years for it to do its full tour before it started repeating itself.
# Analog data
# Radio, TV, CBs, etc.
# Signals and frequencies
# Towards Digital Systems
# Pre-digital content = _black box_
# _What_ is it?
# What does it _contain_?
# Can we _find it_?
# We need _data around_ data
# We need _metadata_
# Speech to text
# Speaker recognition
# Digital Data
# Data Recommendations
> _Information filtering_ system that seeks to predict the rating/preference that a user would give to an item.
– [Wikipedia](https://en.wikipedia.org/wiki/Recommender_system)
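To make the "predict the rating" idea concrete, here is a minimal sketch of one common approach, user-based collaborative filtering. It is an illustration only, not the method of any system mentioned in this talk: the `RATINGS` data, the user names and the function names are all made up for the example.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating from 1 to 5}
RATINGS = {
    "alice": {"news": 5, "sport": 1, "music": 4},
    "bob": {"news": 4, "sport": 2, "music": 5, "film": 4},
    "carol": {"news": 1, "sport": 5, "film": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    weighted_sum = 0.0
    total_weight = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += weight * other_ratings[item]
        total_weight += weight
    if total_weight == 0:
        return None  # nothing to base a prediction on
    return weighted_sum / total_weight


# Alice never rated "film": guess from users whose tastes resemble hers
print(predict_rating(RATINGS, "alice", "film"))
```

Alice's tastes are closer to Bob's than to Carol's, so the prediction leans towards Bob's rating of "film".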
> This month, these ads stopped appearing on Slate.
> Among the reasons: The links can lead to questionable websites, run by unknown entities. Sometimes the information they present is false.
— [Sapna M. and John H.](http://www.nytimes.com/2016/10/31/business/media/publishers-rethink-outbrain-taboola-ads.html) (The New York Times)
# Editorial validation
# Curated feeds, etc.
In other words, a human brain refines a list built by machines.
# Editorial Algorithms
This is the project that underpins the previous news recommendation example.
> This project looks at ways to automatically _extract editorial metadata_ (such as tone, language and topics) about _web content_, making it easier to find the right content for the _right audience_.
Mention the data on tap
# Editorial datapoints
# To _connect_
Pieces of content that are not aware of each other, through a unified vocabulary.
# To _learn_ first
What's out there, magazine makers, what a piece of content is, etc.
# _Manual_ recommendations
Learning from users, exploring different themes, refining on the go. Good for one to many, you know what's out there.
# Fully _automatic_
Again, learning from users. Good for many to many.
# _Semi-automatic_
Gather, then let an editor decide. You know what's out there.
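A minimal sketch of that two-step flow, assuming each item already carries a relevance score from some algorithm. The items, the scores, the threshold and the `editor_decision` stand-in are all hypothetical.

```python
def gather_candidates(items, min_score=0.5, limit=3):
    """Machine step: keep the best-scored items as candidates."""
    ranked = sorted(items, key=lambda item: item["score"], reverse=True)
    return [item for item in ranked if item["score"] >= min_score][:limit]


def editorial_review(candidates, approve):
    """Human step: `approve` stands in for the editor's decision."""
    return [item for item in candidates if approve(item)]


# Hypothetical items with a relevance score produced by some algorithm
ITEMS = [
    {"title": "Election explainer", "score": 0.9},
    {"title": "Miracle diet trick", "score": 0.8},
    {"title": "Local radio history", "score": 0.6},
    {"title": "Unrelated listicle", "score": 0.2},
]


def editor_decision(item):
    # Placeholder for a real editor: rejects one title by hand
    return item["title"] != "Miracle diet trick"


candidates = gather_candidates(ITEMS)
published = editorial_review(candidates, editor_decision)
print([item["title"] for item in published])
```

The machine narrows the list and the editor has the final say, so a high score alone is not enough to get published.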
# Wrap up 🙌
# It's been a while
We have been collecting data ever since we were able to engrave stones.
# Big means _nothing_
Quality, filters, freshness, confidence?
# Data _creates_ data
Interaction metrics, traffic analytics, etc.
# _Mistakes_
In the measurements, in the way of looking at the problem, in the data themselves.
# Possible _bias_
Point of view of creators, classification systems.
# _You_ decide