VSnake notes: Crunch

2014-01-23

Crunch

Crunch: to chew noisily, with a crunching sound.

The Apache Crunch Java library provides a framework for writing, testing, and running MapReduce pipelines.

Running on top of Hadoop MapReduce, the Apache Crunch™ library is a simple Java API for tasks like joining and data aggregation that are tedious to implement on plain MapReduce.

http://crunch.apache.org/

A prototype project (very well documented) that demonstrates running a GIS task in parallel, namely "how many points fall inside each polygon":

Here is a proof-of-concept project that spatially enables a Crunch pipeline with a Point-In-Polygon function, matching a very large set of static point data against a small set of dynamic polygons.

Crunch has simplified the process so much that it came down to a one-line syntax:

final PTable<Long, Long> counts = pipeline.
        readTextFile(args[0]).
        parallelDo(new PointInPolygon(), Writables.longs()).
        count();

The spatial operation is performed using the Esri Geometry API for Java. The result of the spatial join is a count of points per polygon.
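To make the "count of points per polygon" result concrete, here is a minimal single-machine sketch of the same logic in plain Java. It is an illustration only, not the code from SpatialCrunch: it uses `java.awt.geom.Path2D` from the JDK as a stand-in for the Esri Geometry API, a plain loop instead of a Crunch pipeline, and the class name `PointInPolygonSketch`, its helper methods, and the sample coordinates are all made up for this example. In the pipeline above, the per-point lookup would presumably live inside `PointInPolygon`, and `count()` would do the grouping.

```java
import java.awt.geom.Path2D;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration: not part of Crunch or of the SpatialCrunch project.
class PointInPolygonSketch {

    // Build a closed polygon from alternating x, y coordinates.
    static Path2D.Double polygon(double... xy) {
        Path2D.Double path = new Path2D.Double();
        path.moveTo(xy[0], xy[1]);
        for (int i = 2; i + 1 < xy.length; i += 2) {
            path.lineTo(xy[i], xy[i + 1]);
        }
        path.closePath();
        return path;
    }

    // Return the id of the first polygon containing (x, y), or -1 if none does.
    // This plays the role of the per-record "map" step.
    static long findPolygon(Map<Long, Path2D.Double> polygons, double x, double y) {
        for (Map.Entry<Long, Path2D.Double> e : polygons.entrySet()) {
            if (e.getValue().contains(x, y)) {
                return e.getKey();
            }
        }
        return -1L;
    }

    // Count points per polygon id; points outside every polygon are skipped.
    // This plays the role of the "count" step.
    static Map<Long, Long> countPointsPerPolygon(Map<Long, Path2D.Double> polygons,
                                                 double[][] points) {
        Map<Long, Long> counts = new TreeMap<>();
        for (double[] p : points) {
            long id = findPolygon(polygons, p[0], p[1]);
            if (id >= 0) {
                counts.merge(id, 1L, Long::sum);
            }
        }
        return counts;
    }

    // Made-up sample data: two squares and four points.
    static Map<Long, Long> demo() {
        Map<Long, Path2D.Double> polygons = new LinkedHashMap<>();
        polygons.put(1L, polygon(0, 0, 10, 0, 10, 10, 0, 10));
        polygons.put(2L, polygon(20, 0, 30, 0, 30, 10, 20, 10));
        double[][] points = { {5, 5}, {1, 9}, {25, 5}, {15, 5} };
        return countPointsPerPolygon(polygons, points);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints {1=2, 2=1}
    }
}
```

The small set of polygons is held in memory and every point is tested against it, which matches the "very large set of points, small set of polygons" setup described above; a real job would distribute the point records across workers.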

More at https://github.com/mraad/SpatialCrunch

http://thunderheadxpler.blogspot.ru/2013/05/creating-spatial-crunch-pipelines.html

Mansour keeps delighting us with his experiments in Big Data processing, and this is not just any data but spatial data, which is balm for the soul of GIS geeks.

original post http://vasnake.blogspot.com/2014/01/crunch.html
