This week, we had the chance to enjoy a two-day workshop with IBM’s leading data scientists. The experience had its highlights and its things to improve, but it was interesting to reflect on how information (text or numerical) is improving the way we live and, above all, the way we do business.
IBM is definitely doing some interesting work when it comes to analyzing data, detecting consumer patterns and predicting our behaviour. One of the most interesting examples we analyzed was the way [Trenitalia prices its tickets](ftp://ftp.software.ibm.com/software/it/pdf/crearevalore/08_Trenitalia.pdf) depending on several inputs. I have always been curious about the way airlines and other means of transportation charge us, so I found it fascinating and useful for daily life.
Personally, I have been playing around with data for quite a few years: first using my all-time favorite, Google Analytics, and then collecting digital inputs with Radian6. Currently, I'm exploring [Hadoop](http://www-01.ibm.com/software/data/infosphere/hadoop/) and the processing of massive amounts of data to understand correlations and their effects on businesses (or on anything else I can think of). This is a never-ending world that gets more interesting every time.

![Hadoop](http://www.itcandor.com/wp-content/uploads/2013/02/hadoop-ecosystem.png)
I have also run a few experiments on myself, tracking my habits and their repercussions on my daily life. As I said, it's just a matter of diving in and making the numbers talk.
One of the main topics IBM pushed during the lecture was the role of the data scientist, a very exciting (and new) job that involves analyzing and interpreting large amounts of data for business purposes. However, their vision of the skills needed for the job differs from mine.
My sleeping habits tracked for the last 3 weeks. End of MBA term and upcoming trip to Asia, and it still looks "flat". pic.twitter.com/QhzMMm0eiI (Cristián, @lavozdecristian, December 18, 2013)
After some hands-on experience and a [few online courses on data interpretation](https://datasense.withgoogle.com/), I believe the background needed for this position is not restricted to mathematicians or statisticians. It is open to people who are able to ask the right questions in order to find the correlations that start digging the meaning out of the noise.
A person who is immersed in the business itself, who has a clear vision of WHY we do business and a clear diagnosis of the consumer's problems, will be able to squeeze the data far better than someone who is only running mathematical models.
After using [Google Fusion Tables](http://tables.googlelabs.com/) for a while, for example, I have become familiar with the way data can be meaningfully extracted to answer a primary question. And believe me, my bachelor's degree didn't have any statistics in its curriculum.
In my opinion, data scientists are extremely useful, strategic associates who can help companies gain a competitive advantage in the market. However, regardless of their background, they need to be able to:

- Come up with the right questions (primary and secondary).
- Filter and make sense out of the data.
- Communicate effectively (this is where "less is more" comes into play, along with data visualization).
When it comes to experts, I cannot forget my all-time favorite, [Mr Avinash Kaushik](http://www.kaushik.net/avinash/), who has been evangelizing on data for years and sharing truly handy frameworks to measure, improve and select the data that we need (I have mostly used his work on e-commerce). The good thing about Avinash and his work is that he can successfully translate complicated concepts into very basic ideas that reach a broader audience.

To that list I now add [Ms. Hilary Mason](http://www.hilarymason.com/), who has served as Chief Scientist at Bit.ly, and another data scientist to look to when in need of information. You can find book recommendations in this [Bit.ly bundle](http://bitly.com/bundles/hmason/h) made by her.
After the workshop, my interest in big data has only grown. Visualization and fast processing are the two main areas I’m exploring. As I said before, there is no point in selecting data without sharing it in a simple and focused way; likewise, it's important to use tools such as Hadoop to digest terabytes of information quickly.

Smart houses and cities, wearable technology and the famous Internet of Things (in which household objects “talk” to each other) are clear examples of why and how we need to use data to our advantage [^1]. This is just the beginning.
Last but not least:
Want to start playing with data and don't know where to get it from? Voilà
[^1]: Link corrected by Jason Hope