Beer Knowledge Friday – what did we learn?

8th May 2013

Last month saw the latest of 3Squared’s monthly ‘beer knowledge Friday’ sessions, which give employees the chance to share what they know about topics they are particularly interested in, or have recently discovered something new about, in a relaxed and informal environment.

This time we had two volunteers, starting with our Managing Director, Tim Jones, who wanted to talk about ‘Big Data’ and the role of Hadoop.  Here Tim explains what it is:

When we talk about big data, we are talking about the Petabytes (one Petabyte is 1,024 Terabytes, or more than a billion Megabytes!) of data collected and stored by big business.  Facebook alone generates 20 Terabytes per day from social interactions!

With data sets this large, existing data storage and querying tools (MySQL and PostgreSQL, for example) don’t scale well and are not suited to manipulating them.  Hadoop is a framework designed to work with big data and overcome the problems older database systems have.  It is an open-source project written in Java by a global community of contributors.

Commercial implementations of the framework have been developed by IBM, Microsoft and (Sheffield-based friend of 3Squared) WANdisco, who recently announced a 100% increase in annual turnover.

Physical storage of data can be spread out over a “cluster” of machines that can be anywhere in the world.  The Hadoop framework allows all the data (wherever it is physically stored) to be queried like a conventional database.

Efficient data queries are made possible by Hadoop’s implementation of the “MapReduce” programming model, which was developed by Google to process the large data sets they collected through indexing the web.  MapReduce allows the computational work necessary to perform database queries to be split into chunks and distributed throughout the cluster of machines. By spreading the work out to “nodes” in the cluster, throughput (speed) can be greatly increased.
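
To make the idea concrete, here is a minimal sketch of the MapReduce pattern in plain Python, counting words across a handful of lines.  It is not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) and the num_nodes parameter are made up for illustration, and the “nodes” are simply slices of a list processed one after another rather than real machines.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of lines."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_outputs):
    """Group the values emitted by every mapper under their key."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single total."""
    return key, sum(values)


def word_count(lines, num_nodes=3):
    """Count words by splitting the input across hypothetical 'nodes'."""
    # Each pretend node receives every num_nodes-th line.
    chunks = [lines[i::num_nodes] for i in range(num_nodes)]
    # On a real cluster these map calls would run in parallel on separate machines.
    mapped = [map_phase(chunk) for chunk in chunks]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data is big", "hadoop handles big data"]
    print(word_count(sample))
```

The point of the sketch is that each chunk is mapped without needing to know about any other chunk, which is what lets the work be handed out to as many machines as are available.  The answer is the same whether one node or many is used.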

Data redundancy is built into the Hadoop Distributed File System (HDFS), which is used to store data across a cluster of machines.  HDFS keeps copies of each piece of data on more than one machine, so if a machine goes down, the same data is still available somewhere else.
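
The redundancy idea can be sketched in a few lines of Python.  This is a simplified illustration rather than how HDFS actually chooses where to put data (the real placement policy also takes racks into account): the function names, node names and block names are all hypothetical, and copies are handed out in simple round-robin order.  The replication factor of 3 matches the HDFS default.

```python
def place_blocks(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes in round-robin order."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(block_ids):
        # Consecutive offsets modulo len(nodes) are distinct because
        # replication <= len(nodes).
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one copy on a live node."""
    failed = set(failed_nodes)
    return {
        block
        for block, holders in placement.items()
        if any(node not in failed for node in holders)
    }


if __name__ == "__main__":
    # Hypothetical cluster of four machines holding four blocks of data.
    nodes = ["node-a", "node-b", "node-c", "node-d"]
    blocks = ["blk-1", "blk-2", "blk-3", "blk-4"]
    placement = place_blocks(blocks, nodes)
    print(sorted(readable_blocks(placement, ["node-a", "node-b"])))
```

With three copies of every block, any two machines in this pretend cluster can fail and every block can still be read.  Data only becomes unavailable once all three machines holding a given block are down.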

There are exciting times ahead for technology companies, especially those in South Yorkshire if the ‘Cloud City’ data storage project comes to fruition in Sheffield.

 

Next up was ‘New’ Richard Lander, who talked us through Pluralsight, the tutorial website for developers.

Richard had used the website in his previous job to build on his knowledge of a huge range of technologies.  The videos on the website range in difficulty from beginner to advanced.

Although the courses would be most suited to our .NET team, there is a range of courses available on the site for most of the technologies we use here at 3Squared.  So far Rich has used a number of courses including LINQ Data Access, LINQ Fundamentals, LINQ – Beyond Queries, Unit Testing with MSTest, Mocking with Moq, Design Patterns Library and Agile Estimation.

The site has many features, including offline viewing on mobile devices, assessments, and before-and-after code solutions.  I would recommend the site to anyone looking to brush up on their knowledge or learn skills in a new area.

 

As always, it was a great evening of learning in a relaxed environment.  Maybe you could try a beer knowledge Friday at your workplace?