> If we have data, let’s look at *data*. If all we have are *opinions*, let’s go with *mine*. — [James L. Barksdale](https://en.wikipedia.org/wiki/James_L._Barksdale)
@@@ @@@ @@@ @@@
# D·A·T·A ## for 🙋 and 🤖 November 9th 2016 — [Media@LSE][] {.footer} [Media@LSE]: http://www.lse.ac.uk/media@lse/
## Bonjour,
I am **Thomas** [thom4.net](https://thom4.net) – [@thom4parisot](https://twitter.com/thom4parisot) {.footer} @@@ ![Pardon my French](../../2015/images/pardon-my-french.jpg) @@@ ## BBC R&D [bbc.co.uk/rd](http://bbc.co.uk/rd) @@@ ![Full Stack JavaScript](../../2015/images/javascript.png) [thom4.net/node.js](https://thom4.net/node.js) @@@ ![Sud Web](../../2015/images/sudweb.png) [sudweb.fr](http://sudweb.fr)
# Physical data @@@ ~~~~ [DIY Music Box](https://christiehubbard.wordpress.com/tag/diy-music-box/) @@@ ~~~~ [DIY Music Box](https://christiehubbard.wordpress.com/tag/diy-music-box/) @@@ @@@ @@@ # Eureka machine @@@ @@@ > Military camps foretell many battles abroad @@@ \_ ͜   ͜   | \_ ͜       ͜   | \_   \_ | \_   \_ | \_   ͜   ͜   | \_ ͜  ~~~~ Latin hexameter verse encoding @@@ > If we had it running continuously, it would take 74 years for it to do its full tour before it started repeating itself.
# Analog data @@@ # Radio, TV, CBs, etc. @@@ # Signals and frequencies @@@ ~~~~ https://upload.wikimedia.org/wikipedia/commons/6/6d/Sine_waves_different_frequencies.svg @@@ ![](images/radio-signal.gif) @@@ ![](images/pal-frame-signal.png) @@@ ~~~~ https://ancientelectronics.wordpress.com/2013/01/23/choosing-the-right-tv-for-classic-gaming-in-the-usa/ @@@ ![](images/ntsc-signal.png) ~~~~ https://commons.wikimedia.org/wiki/File:Ntsc_channel.svg
# Towards Digital Systems @@@ @@@ # Pre-digital content = _black box_ @@@ # _What_ is it? @@@ # What does it _contain_? @@@ # Can we _find it_? @@@ # We need _data around_ data @@@ # We need _metadata_ @@@ # Speech to text @@@ @@@ # Speaker recognition @@@ ~~~~ http://downloads.bbc.co.uk/rd/pubs/whp/whp-pdf-files/WHP293.pdf @@@
# Digital Data @@@ @@@ @@@ @@@ @@@ @@@
# Data Recommendations @@@ > _Information filtering_ system that seeks to predict the rating/preference that a user would give to an item. – [Wikipedia](https://en.wikipedia.org/wiki/Recommender_system) @@@ ![](images/moreover-links.png) @@@ > This month, these ads stopped appearing on Slate. > Among the reasons: The links can lead to questionable websites, run by unknown entities. Sometimes the information they present is false. — [Sapna M. and John H.](http://www.nytimes.com/2016/10/31/business/media/publishers-rethink-outbrain-taboola-ads.html) (from The New York Times) @@@ # Editorial validation @@@ @@@ ![](images/from-other-news-sites.png) @@@ # Curated feeds, etc. In other words, a human brain refines a list built by machines.
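@@@

To make the "predict the rating/preference" definition above concrete, here is a minimal sketch of one common approach, user-based collaborative filtering. It is not the system behind any example shown in this talk. The readers, articles, ratings and function names are all hypothetical, invented for illustration.

```python
from math import sqrt

# Hypothetical ratings (1 to 5) given by readers to articles; not real data.
RATINGS = {
    "alice": {"brexit": 5, "football": 1, "space": 4},
    "bob": {"brexit": 4, "football": 2, "space": 5, "cooking": 4},
    "carol": {"brexit": 1, "football": 5, "cooking": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += weight * other_ratings[item]
        weight_total += weight
    if weight_total == 0:
        return None  # nobody comparable has rated this item
    return weighted_sum / weight_total


# Alice never rated "cooking": guess her rating from similar readers.
prediction = predict_rating(RATINGS, "alice", "cooking")
print(f"Predicted rating: {prediction:.2f}")
```

Alice's tastes are closer to Bob's than to Carol's, so the prediction leans towards Bob's rating. The output is still only a guess built from other people's behaviour, which is why the editorial validation step matters.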
# Editorial Algorithms @@@ ~~~~ This is the project that underpins the previous news recommendations example. @@@ > This project looks at ways to automatically _extract editorial metadata_ (such as tone, language and topics) about _web content_, making it easier to find the right content for the _right audience_. ~~~~ Mention the data on tap @@@ @@@ # Editorial datapoints @@@ @@@ # To _connect_ Pieces of content that are not aware of each other, through a unified vocabulary. @@@ # To _learn_ first What's out there, magazine makers, what a piece of content is, etc. @@@ # _Manual_ recommendations Learning from users, exploring different themes, refining on the go. Good for one-to-many: you know what's out there. @@@ # Fully _automatic_ Again, learning from users. Good for many-to-many. @@@ # _Semi-automatic_ Gather, then let an editor decide. You know what's out there.
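@@@

The semi-automatic mode can be sketched as two steps: a machine gathers and ranks candidates using editorial metadata, then an editor makes the final call. This is only an illustration under stated assumptions, not the project's actual code. The `Article` fields, the sample articles and the topic-overlap scoring are hypothetical, and the editor is stood in for by a simple function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    # Hypothetical editorial metadata, in the spirit of "tone, language and topics".
    title: str
    topics: frozenset
    tone: str


def gather(candidates, wanted_topics, limit=3):
    """Machine step: keep articles sharing a topic, best overlap first."""
    scored = [(len(article.topics & wanted_topics), article) for article in candidates]
    scored = [(score, article) for score, article in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [article for _, article in scored[:limit]]


def editorial_pass(shortlist, approve):
    """Human step: the editor decides what actually gets recommended."""
    return [article for article in shortlist if approve(article)]


# Invented sample content, for illustration only.
CANDIDATES = [
    Article("Radio archives go digital", frozenset({"radio", "archives"}), "neutral"),
    Article("You won't believe this radio trick", frozenset({"radio"}), "sensational"),
    Article("Ten soup recipes", frozenset({"cooking"}), "neutral"),
]

shortlist = gather(CANDIDATES, frozenset({"radio", "archives"}))
# Stand-in for a real editor, who here rejects sensational pieces.
published = editorial_pass(shortlist, lambda article: article.tone != "sensational")
print([article.title for article in published])
```

Dropping `editorial_pass` gives the fully automatic mode. Dropping `gather` and picking by hand gives the manual one.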
# Wrap up 🙌 @@@ # It's been a while We have been collecting data ever since we could engrave stones. @@@ # Big means _nothing_ Quality, filters, freshness, confidence? @@@ # Data _creates_ data Interaction metrics, traffic analytics, etc. @@@ # _Mistakes_ In the measurements, in the way of looking at the problem, in the data themselves. @@@ # Possible _bias_ Point of view of creators, classification systems. @@@ # _You_ decide @@@
# Thanks 👍 @@@