A programmer's notes, about everything and nothing. Though probably leaning toward the professional side.

2015-06-17

On ML

The explosive growth of IT over the last few decades has brought plenty of wonders into our lives. We don't notice them, because by the time the industry turns one into a mass-market product we have already gotten used to it. That doesn't make the wonders any less wondrous.
And there is more to come. Quantity inevitably turns into quality. No imagination is rich enough to picture how the world around us will have changed some 50 years from now. Provided there is no war, of course.

Take data processing and analytics, for example. Nothing special, you might think, it has been around forever. But quantity has brought a new quality. Fast network links, enormous computing power, an ocean of data already accumulated and still arriving at an accelerating rate: the result is online analytics and all sorts of predictors. Data Mining & Machine Learning have left the labs and are going to the masses.

Anyone can build and train an electronic dummy of an assistant that suggests what food to stock up on. Using it is even easier: just snap a photo of the product you're interested in with your phone:

Bringing Deep Learning to the Grocery Store
...
when we go to the grocery store, it can be difficult to really know exactly what we're purchasing and where it comes from.

Inspired by this problem, a few of us decided to build an application that provides information on a packaged food product based on an image taken with a smartphone. In a future blog post, we will share what we built and how we built it. In this notebook, however, we delve deeper into the actual implementation.

This notebook is divided into 5 main parts:

Data Acquisition - Downloading the data and deduplicating it with the deduplication toolkit.
Finding Similar Foods - Pre-computation which identifies similar foods within the dataset. This is useful in similar item recommendations.
Image Feature Extraction - Finding a vector representation of images in the dataset using a Deep Learning model
Building the Nearest Neighbor Model/Querying the catalog - Building a model with which you can match a new photo to one in the dataset.
Building a Predictive Service - Turning all our hard work into a hosted service, which later is queried by our phone app!
...
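
To get a feel for the "Nearest Neighbor Model" step from the list above, here is a minimal sketch in plain Python. It assumes the image features have already been extracted. The three-number vectors, the product names and the `nearest` helper are all made up for illustration and are not taken from the notebook, which gets its features from a deep learning model and builds the index with a proper library.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(catalog, query, k=1):
    """Return the names of the k catalog items closest to the query vector."""
    ranked = sorted(catalog.items(), key=lambda item: euclidean(item[1], query))
    return [name for name, _ in ranked[:k]]

# Toy catalog: product name -> hypothetical feature vector.
catalog = {
    "granola bar": [0.9, 0.1, 0.0],
    "orange juice": [0.1, 0.8, 0.3],
    "tomato soup": [0.0, 0.3, 0.9],
}

# Hypothetical features of a freshly taken photo.
query = [0.2, 0.7, 0.4]
best = nearest(catalog, query, k=2)
print(best)  # ['orange juice', 'tomato soup']
```

A brute-force scan like this is fine for a handful of items. A real catalog would presumably want an index structure instead of sorting everything on each query.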



And what's under the hood? No problem: people ask, we answer:

Top 10 data mining algorithms in plain English
Today, I’m going to explain in plain English the top 10 most influential data mining algorithms
1. C4.5
2. k-means
3. Support vector machines
4. Apriori
5. EM
6. PageRank
7. AdaBoost
8. kNN
9. Naive Bayes
10. CART
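
To show how little code some of these need, here is a rough sketch of item 2, k-means, in plain Python. It is the textbook loop (assign each point to the nearest centroid, then move each centroid to the mean of its points), not code from the linked article. Taking the first `k` points as the starting centroids and running a fixed number of iterations are simplifications of mine, and the sample points are invented.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=10):
    """Naive k-means: returns (centroids, clusters)."""
    # Simplification: start from the first k points instead of a random pick.
    centroids = [list(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, clusters

# Invented sample: two visibly separate groups of 2D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

On this sample the loop settles after a couple of iterations, with the three low points in one cluster and the three high points in the other.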


You can even have it chewed over in detail: how to build your own neural network, with blackjack and hookers, in Python:

How to implement a neural network
These tutorials focus on the implementation and the mathematical background behind the implementations. Most of the time, we will first derive the formula and then implement it in Python.

The tutorials are generated from IPython Notebook files, which will be linked to at the end of each chapter so that you can adapt and run the examples yourself. The neural networks themselves are implemented using the Python NumPy library which offers efficient implementations of linear algebra functions such as vector and matrix multiplications
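
For a taste of the "derive the formula, then implement it" approach, here is about the smallest case I can think of: a single sigmoid neuron trained by gradient descent to compute logical OR. This is my own sketch, not code from the tutorials. It uses plain Python lists instead of NumPy to stay dependency-free, and the learning rate, the epoch count and the function names are arbitrary choices of mine.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Output of one sigmoid neuron for input vector x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_neuron(samples, targets, lr=1.0, epochs=2000):
    """Batch gradient descent on the cross-entropy loss of a single neuron."""
    n_inputs = len(samples[0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_inputs
        grad_b = 0.0
        for x, t in zip(samples, targets):
            # For sigmoid + cross-entropy, dLoss/dz is simply (output - target).
            err = predict(weights, bias, x) - t
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        # Step against the gradient averaged over the batch.
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias

# Truth table for logical OR.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 1, 1, 1]

weights, bias = train_neuron(samples, targets)
predictions = [round(predict(weights, bias, x)) for x in samples]
print(predictions)
```

A single neuron can only learn linearly separable functions such as OR. Something like XOR needs at least one hidden layer, and that is where backpropagation proper comes in.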


Dig deeper into the specific properties of recurrent neural networks:

The Unreasonable Effectiveness of Recurrent Neural Networks
There's something magical about Recurrent Neural Networks (RNNs). I still remember when I trained my first recurrent network for Image Captioning. Within a few dozen minutes of training my first baby model (with rather arbitrarily-chosen hyperparameters) started to generate very nice looking descriptions of images that were on the edge of making sense. Sometimes the ratio of how simple your model is to the quality of the results you get out of it blows past your expectations, and this was one of those times. What made this result so shocking at the time was that the common wisdom was that RNNs were supposed to be difficult to train (with more experience I've in fact reached the opposite conclusion). Fast forward about a year: I'm training RNNs all the time and I've witnessed their power and robustness many times, and yet their magical outputs still find ways of amusing me. This post is about sharing some of that magic with you.

We'll train RNNs to generate text character by character and ponder the question "how is that even possible?"
...
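
What makes a network "recurrent" is that the hidden state is fed back in at every step. Below is a rough sketch of the forward pass of a vanilla RNN reading a string character by character, with h_new = tanh(W_xh * x + W_hh * h + b_h). It is untrained: the weights are random, the hidden size of 3 and all the names are arbitrary choices of mine, and there is no output layer or sampling, so it illustrates only the recurrence and is not the model from the post.

```python
import math
import random

def one_hot(index, size):
    """Vector of zeros with a single 1.0 at the given index."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def rnn_step(x, h, W_xh, W_hh, b_h):
    """One vanilla RNN step: h_new = tanh(W_xh @ x + W_hh @ h + b_h)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(W_xh[j], x))
            + sum(w * hi for w, hi in zip(W_hh[j], h))
            + b_h[j]
        )
        for j in range(len(h))
    ]

text = "hello"
vocab = sorted(set(text))  # ['e', 'h', 'l', 'o']
hidden_size = 3

# Random, untrained weights; the seed is fixed only to make runs repeatable.
rng = random.Random(0)
W_xh = [[rng.uniform(-0.5, 0.5) for _ in vocab] for _ in range(hidden_size)]
W_hh = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)] for _ in range(hidden_size)]
b_h = [0.0] * hidden_size

h = [0.0] * hidden_size
states = []
for ch in text:
    x = one_hot(vocab.index(ch), len(vocab))
    h = rnn_step(x, h, W_xh, W_hh, b_h)
    states.append(h)

for ch, state in zip(text, states):
    print(ch, [round(v, 3) for v in state])
```

The two "l" characters are identical inputs but end up with different hidden states, because each step also sees the state left behind by everything before it. That memory is what a trained character-level model exploits.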


And various other sources of inspiration:

Materials for Learning Machine Learning






original post http://vasnake.blogspot.com/2015/06/ml.html
