Workshops - Adattárház Fórum 2014

Stephen Brobst

Chief Technology Officer, Teradata

Stephen performed his graduate work in Computer Science at the Massachusetts Institute of Technology, where his Master's and PhD research focused on high-performance parallel processing. He also completed an MBA with joint course and thesis work at the Harvard Business School and the MIT Sloan School of Management. Stephen has been on the faculty of The Data Warehousing Institute since 1996. During Barack Obama’s first term he was also appointed to the President's Council of Advisors on Science and Technology (PCAST) in the working group on Networking and Information Technology Research and Development (NITRD).

Agile data warehousing architecture

This half-day workshop examines trends in analytic technologies, methodologies, and use cases. It discusses what these developments imply for deploying analytic capabilities with agile methodologies, with examples of future architecture directions and implementations. You will learn about:

Interactive analytics using in-memory, columnar, and other emerging database technologies
Analytics in the cloud
Agile analytics deployment methodologies with integrated sandboxes
Leveraging open source technologies such as Hadoop, R, Pig, Mahout, and other new-generation opportunities
NoSQL and NoETL analytic architectures

Izsák Tamás

Database expert and lead developer, AppWorks

Managing director of APPWORKS, with more than 10 years of experience as a database expert and lead developer in relational and NoSQL databases (Oracle Database, MongoDB). APPWORKS was the first company in Hungary to obtain the MongoDB Ready Partner certification.

Introduction to using MongoDB

This half-day workshop presents the features, architecture, and application areas of MongoDB, the leading NoSQL database. It covers the basics of data modeling, queries, and data-modifying operations, as well as high availability and scalability. MongoDB enables companies to be more agile and to grow more efficiently. Fortune 500 companies and startups alike use it to build new types of applications, improve customer experience, shorten time to market, and reduce costs.
MongoDB is an agile database that allows schemas to change as quickly as applications evolve, while still providing the functionality developers expect from traditional databases, such as secondary indexes, a full query language, and strict consistency.
MongoDB's most notable strengths are scalability, performance, and high availability, whether it is deployed on a single server or in a large, complex, multi-site architecture. By taking advantage of in-memory computing, MongoDB delivers high performance for both reads and writes. Native replication and automatic failover provide enterprise-grade reliability and operational flexibility.
Syllabus:

Getting to know MongoDB

Basic concepts
Installation
JSON / BSON
MongoDB Shell

First database operations

Basic concepts (documents, collections)
CRUD (Create, Read, Update, Delete)
Aggregation framework basics
Indexing basics

High availability (Replication)

Concept
Design

Scaling out horizontally (Sharding)

Concept
Design
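
To make the document, collection, and CRUD vocabulary from the syllabus concrete, here is a minimal in-memory sketch in Python. It is an illustration only: the InMemoryCollection class, its method names, and the sample documents are hypothetical and are not the MongoDB shell or driver API. It only mirrors the idea that a collection holds schemaless, JSON-like documents that can be created, read, updated, and deleted by matching on field values.

```python
import copy
import json


class InMemoryCollection:
    """Hypothetical stand-in for a document collection (NOT the MongoDB API)."""

    def __init__(self):
        self._docs = []
        self._next_id = 1

    def insert(self, doc):
        # Create: store a copy and assign an _id if the document has none.
        stored = copy.deepcopy(doc)
        stored.setdefault("_id", self._next_id)
        self._next_id += 1
        self._docs.append(stored)
        return stored["_id"]

    @staticmethod
    def _matches(doc, query):
        # A document matches when every queried field equals the given value.
        return all(doc.get(key) == value for key, value in query.items())

    def find(self, query=None):
        # Read: return copies of all matching documents.
        query = query or {}
        return [copy.deepcopy(d) for d in self._docs if self._matches(d, query)]

    def update(self, query, changes):
        # Update: set the given fields on every matching document.
        count = 0
        for doc in self._docs:
            if self._matches(doc, query):
                doc.update(changes)
                count += 1
        return count

    def delete(self, query):
        # Delete: drop every matching document and report how many were removed.
        before = len(self._docs)
        self._docs = [d for d in self._docs if not self._matches(d, query)]
        return before - len(self._docs)


# Documents in one collection do not need to share the same fields.
workshops = InMemoryCollection()
workshops.insert({"title": "Introduction to using MongoDB", "half_day": True})
workshops.insert({"title": "Efficient ETL", "half_day": True, "tags": ["etl", "dw"]})

workshops.update({"title": "Efficient ETL"}, {"room": "B"})
for doc in workshops.find({"half_day": True}):
    print(json.dumps(doc))
```

A real deployment adds what this sketch leaves out, including indexes, a richer query language, replication, and sharding, which are the topics of the later syllabus sections.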

Alex Dean

Co-founder, Snowplow Analytics Ltd

Alex Dean is the co-founder and technical lead at Snowplow Analytics. Snowplow is a web and event analytics platform with a difference: rather than tell our users how they should analyze their data, we deliver their event-level data in their own data warehouse, on their own Amazon Redshift or Postgres database, so they can analyze it any way they choose.
At Snowplow, Alex is responsible for the technical architecture, stewarding the open source community, and evaluating new technologies such as Amazon Kinesis. Prior to Snowplow, Alex was a partner at technology consultancy Keplar, where the idea for Snowplow was conceived. Before Keplar, Alex was a Senior Engineering Manager at OpenX, the open source ad technology company.
Alex lives in London, UK.

From zero to Hadoop - running your first Hadoop jobs on Elastic MapReduce

This workshop is an interactive tutorial to get attendees running their first MapReduce job on Hadoop (using Elastic MapReduce and Scalding).
Hadoop is everywhere these days, but it can seem like a complex, intimidating ecosystem to those who have yet to jump in. In this hands-on workshop, Alex Dean, co-founder of Snowplow Analytics, will take you "from zero to Hadoop", showing you how to run a variety of simple (but powerful) Hadoop jobs on Elastic MapReduce, Amazon's hosted Hadoop service. Alex will start with a no-nonsense overview of what Hadoop is, explaining its strengths and weaknesses and why it's such a powerful platform for data warehouse practitioners. Then Alex will help get you set up with EMR and Amazon S3, before leading you through a very simple job in Pig, a simple language for writing Hadoop jobs. After this we will move on to writing a more advanced job in Scalding, Twitter's Scala API for writing Hadoop jobs. For our final job, we will consolidate everything we have learnt by building a multi-step job flexing Pig, Scalding and Apache HBase, the Hadoop database.

Agenda:

Introducing Hadoop
Our simple job:

Setting up EMR, S3 and our local client tools
Writing our Pig Latin script
Running and inspecting our results

Scalding:

Introduction to Scalding and Cascading
Writing our Scalding app
Running and inspecting our results

Putting it all together:

Introduction to HBase
Writing our second Pig Latin script
Updating our Scalding app
Running and inspecting our results

Conclusions & next steps
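
For readers who have not seen the MapReduce model before, the following is a minimal single-process sketch in Python of the map, shuffle, and reduce phases, using word count as the job. It is not the workshop's Pig or Scalding code and it does not use Hadoop or Elastic MapReduce; the function names and the sample input lines are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # would do between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: collapse all values for one key into a single count.
    return key, sum(values)


def run_job(lines):
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input; on a cluster this would be files in distributed storage.
lines = ["hadoop runs jobs", "pig writes hadoop jobs"]
counts = run_job(lines)
print(counts)
```

On a real cluster the map and reduce functions run in parallel on many machines and the shuffle moves data across the network; the division of work between the three phases stays the same.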

Please bring

Required: laptop
Required: AWS account
Nice to have: either Scala and SBT pre-installed on your laptop, or Vagrant and VirtualBox

Gollnhofer Gábor

Data Warehouse Business Unit Director, Jet-Sol

Building efficient ETL processes

This half-day workshop deals with the data integration and ETL questions that make up the largest part of data warehouse development work, covering the most important design, development, and operations topics.
During the workshop we will touch on ETL architectures, the various ETL tools, the design and management of mappings, some data modeling, metadata management, and DW automation. Along the way I will also demonstrate a few practical tools and solutions.
Participants are encouraged to bring their own questions and problems, which we will try to discuss together during the workshop and, where possible, solve.

MTA SZTAKI

Introduction to Stratosphere

This special workshop and hackathon, co-organized with MTA SZTAKI, will run for the whole day (Part I in the morning, Part II in the afternoon). Stratosphere (stratosphere.eu) is a European Big Data Analytics Platform freshly accepted to the Apache Incubator. It combines the strengths of MapReduce / Hadoop with powerful programming abstractions in Java and Scala and a high-performance execution engine. Stratosphere natively supports iterations, incremental iterations, and arbitrarily large DAGs of operations. This hands-on workshop demonstrates Stratosphere’s ability to provide a flexible big data platform that is easy to install and use and can scale out to the cloud to support data scientists. The example applications presented are mainly from the field of recommender systems. The afternoon session focuses on Stratosphere’s brand new stream processing support.

Stratosphere Workshop and hackathon - Stratosphere basics

Agenda:

Intro to Stratosphere and PACT 30 min
Intro to Alternating Least Squares (ALS) and provision of a Java skeleton 30 min
Coding different ALS implementations using the skeleton 90+ min
Testing performance of the implementations against each other 30- min
Bonus exercise: comparison with Mahout's custom ALS implementation
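
As background for the ALS exercises, here is a minimal sketch of Alternating Least Squares in Python, restricted to a rank-1 model so that each update has a simple closed form. It is not the workshop's Java skeleton and it does not use Stratosphere; the function names, the regularization value, and the tiny ratings set are assumptions made for illustration. Each rating r(u, i) is approximated by x[u] * y[i]; the algorithm fixes the item factors and solves for the user factors, then fixes the user factors and solves for the item factors, and repeats.

```python
def als_rank1(ratings, n_users, n_items, reg=0.01, iterations=50):
    """Rank-1 ALS on a dict {(user, item): rating} of observed ratings."""
    x = [1.0] * n_users  # user factors
    y = [1.0] * n_items  # item factors
    for _ in range(iterations):
        # User step: with y fixed, each x[u] is a regularized least squares fit.
        for u in range(n_users):
            items = [i for (uu, i) in ratings if uu == u]
            num = sum(ratings[(u, i)] * y[i] for i in items)
            den = reg + sum(y[i] ** 2 for i in items)
            x[u] = num / den
        # Item step: with x fixed, each y[i] is solved the same way.
        for i in range(n_items):
            users = [u for (u, ii) in ratings if ii == i]
            num = sum(ratings[(u, i)] * x[u] for u in users)
            den = reg + sum(x[u] ** 2 for u in users)
            y[i] = num / den
    return x, y


def predict(x, y, user, item):
    # The predicted rating is the product of the two factors.
    return x[user] * y[item]


def rmse(ratings, x, y):
    # Root mean squared error over the observed ratings.
    errors = [(r - predict(x, y, u, i)) ** 2 for (u, i), r in ratings.items()]
    return (sum(errors) / len(errors)) ** 0.5


# Hypothetical ratings generated from a rank-1 pattern; (user 1, item 2) is held out.
ratings = {(0, 0): 1.0, (0, 1): 2.0, (0, 2): 3.0, (1, 0): 2.0, (1, 1): 4.0}
x, y = als_rank1(ratings, n_users=2, n_items=3)
print("training RMSE:", rmse(ratings, x, y))
print("held-out prediction:", predict(x, y, 1, 2))
```

Practical implementations use several latent factors per user and item, which turns each update into a small linear system, and they distribute the user and item steps across workers. That is where a platform with native iteration support becomes relevant.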

MTA SZTAKI

Streaming principles with Stratosphere

Stratosphere Workshop and hackathon - Stratosphere Streaming

Agenda:

Intro to Stratosphere Streaming 20 min
Intro to ALS prediction 10 min
Coding and testing ALS prediction 60 min
Intro to ALS incremental and provision of a skeleton 30 min
Coding and testing ALS incremental 60 min
Bonus exercise: Implementing both ALS extensions in the same streaming job