VSnake notes: Whole thing in memory

2013-11-22

Whole thing in memory

Since the traditional means of retrieving the data from a relational database is not fast enough and 11 million records is not such a big set after all, I decided to put the whole thing in memory

Mansour Raad

I just love the way this guy solves problems. A real pro in BigData.

Here is another example.

Given: 11 million records of some data, as a CSV file.

Required: build a web map on which users can see the density distribution of this data, depending on the parameters they set.

Solution: load all the data into BigMemory, writing a parallel loader along the way; write a web REST interface to this data that implements the ArcGIS Rest JSON interface for an ArcGISDynamicLayer; write a renderer that draws a raster image from the selected data. What stays behind the scenes is how the data is selected and aggregated (we do need density, after all).

And then deploy the application: bring up three instances on Amazon, two for the data store and one for the web service (and, apparently, the web map).

All of this is written in Java, and the sources are on GitHub.
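To make the "density" part more concrete, here is a minimal sketch of my own (not the author's code, and it does not use BigMemory or the ArcGIS API): points are binned into a regular grid in parallel, and the counts are turned into a raster image. The class name `DensityGrid`, the methods `bin` and `render`, and the red color ramp are all hypothetical, chosen only for illustration.

```java
import java.awt.image.BufferedImage;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicIntegerArray;

public class DensityGrid {

    /**
     * Counts points per cell of a cols x rows grid laid over the extent
     * [xmin, xmax) x [ymin, ymax). Each point is a double[]{x, y}.
     * Points outside the extent are ignored.
     */
    public static int[] bin(List<double[]> points,
                            double xmin, double ymin, double xmax, double ymax,
                            int cols, int rows) {
        // Atomic counters, so that parallel workers can update cells safely.
        AtomicIntegerArray counts = new AtomicIntegerArray(cols * rows);
        double cellW = (xmax - xmin) / cols;
        double cellH = (ymax - ymin) / rows;
        points.parallelStream().forEach(p -> {
            double x = p[0], y = p[1];
            if (x < xmin || x >= xmax || y < ymin || y >= ymax) {
                return; // outside the requested extent
            }
            // Clamp to guard against floating-point rounding at the far edge.
            int col = Math.min((int) ((x - xmin) / cellW), cols - 1);
            int row = Math.min((int) ((y - ymin) / cellH), rows - 1);
            counts.incrementAndGet(row * cols + col);
        });
        int[] result = new int[cols * rows];
        for (int i = 0; i < result.length; i++) {
            result[i] = counts.get(i);
        }
        return result;
    }

    /**
     * Draws the grid as a red image whose opacity is proportional to the
     * cell count (the densest cell is fully opaque, empty cells transparent).
     */
    public static BufferedImage render(int[] counts, int cols, int rows) {
        int max = Arrays.stream(counts).max().orElse(0);
        BufferedImage img = new BufferedImage(cols, rows, BufferedImage.TYPE_INT_ARGB);
        for (int row = 0; row < rows; row++) {
            for (int col = 0; col < cols; col++) {
                int c = counts[row * cols + col];
                int alpha = max == 0 ? 0 : (int) Math.round(255.0 * c / max);
                // Image y grows downward while map y grows upward, so flip rows.
                img.setRGB(col, rows - 1 - row, (alpha << 24) | 0xFF0000);
            }
        }
        return img;
    }

    public static void main(String[] args) {
        List<double[]> pts = Arrays.asList(
                new double[]{0.5, 0.5}, new double[]{0.6, 0.4},
                new double[]{1.5, 1.5}, new double[]{5.0, 5.0});
        int[] grid = bin(pts, 0, 0, 2, 2, 2, 2);
        System.out.println(Arrays.toString(grid)); // prints [2, 0, 0, 1]
        BufferedImage img = render(grid, 2, 2);
        System.out.println(img.getWidth() + "x" + img.getHeight());
    }
}
```

In a real service the extent and the image size would presumably come from the map request, and the point list would be the result of a query against the in-memory store rather than a hard-coded list.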

Since the traditional means of retrieving the data from a relational database is not fast enough and 11 million records is not such a big set after all, I decided to put the whole thing in memory. BTW, this is a meme that has been trending for a while now, and the most vocal about it is SAP HANA.
I decided to use Terracotta's BigMemory to hold the data in the "off-heap" and use its EHCache query capability to aggregate and fetch the data. Now despite the name, I used the cache "eternal" capabilities to forever hold the data elements.

http://thunderheadxpler.blogspot.ru/2013/04/bigdata-terracotta-bigmemory-and-arcgis.html

original post http://vasnake.blogspot.com/2013/11/whole-thing-in-memory.html
