How to become a Big Data Analyst

Anyone who works in the tech industry is aware of the rising demand for Analytics and Machine Learning professionals. More and more organisations are jumping on the data-driven decision-making bandwagon, accumulating loads of data pertaining to their business. To make sense of all the data gathered, organisations will require Big Data Analysts to decipher it.

Data Analysts have traditionally worked with pre-formatted data, served by IT departments, to perform analysis. But with the need for real-time or near-real-time analytics to serve end customers better and faster, analysis needs to be performed more quickly, making the dependency on IT departments a bottleneck. To understand the influx of data, analysts now need to be familiar with data streams that ingest millions of records into databases or file systems, the Lambda architecture and batch processing.

Analysing larger amounts of data also requires skills that range from understanding business complexities, the market and the competitors to a wide range of technical skills in data extraction, data cleaning and transformation, data modelling and statistical methods.

Analytics, being a relatively new field, is struggling to meet market demand for highly skilled Big Data Analysts. Being a Big Data Analyst requires a thorough understanding of data architecture and the flow of data from source systems into the big data platform. One can always stick to a specific industry domain and specialise within it, for example Healthcare Analytics, Marketing Analytics, Financial Analytics, Operations Analytics, People Analytics or Gaming Analytics. But mastering end-to-end data chain management can lead to plenty of opportunities, irrespective of industry domain.

The entire Data and Analytics suite includes the following gamut of stages:

  • Data integrations – connecting disparate data sources
  • Data security and governance – ensuring data integrity and access rights
  • Master data management – ensuring consistency and uniformity of data
  • Data Extraction, Transformation and Loading – making raw data business user friendly
  • Hadoop and HDFS – big data storage mechanisms
  • SQL/ Hive / Pig – data query languages
  • R/ Python – programming languages for data analysis and mining
  • Data science algorithms like Naive Bayes, k-means, AdaBoost etc. – machine learning algorithms for clustering and classification
  • Data Architecture – combining all the above in an optimised way to deliver business insights
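The Extraction, Transformation and Loading stage above can be illustrated with a minimal Python sketch. The records, field names and cleaning rules below are invented for illustration; a real pipeline would pull from source systems and write to a warehouse.

```python
# Hypothetical minimal ETL pass: extract raw records, clean and type-cast
# them, then load an aggregated, business-friendly summary.
from datetime import datetime

def extract():
    # Stand-in for pulling rows from a source system (API, file, database)
    return [
        {"order_id": "1001", "amount": " 250.00 ", "date": "2016-03-01"},
        {"order_id": "1002", "amount": "bad",      "date": "2016-03-02"},
        {"order_id": "1003", "amount": "99.50",    "date": "2016-03-02"},
    ]

def transform(rows):
    clean = []
    for row in rows:
        try:
            datetime.strptime(row["date"], "%Y-%m-%d")   # validate the date
            clean.append({
                "order_id": int(row["order_id"]),
                "amount": float(row["amount"].strip()),  # cast + trim whitespace
                "date": row["date"],
            })
        except ValueError:
            continue                                     # drop invalid records
    return clean

def load(rows):
    # Stand-in for writing to a warehouse: here, a daily revenue summary
    summary = {}
    for row in rows:
        summary[row["date"]] = summary.get(row["date"], 0.0) + row["amount"]
    return summary

daily_revenue = load(transform(extract()))
```

Note how the malformed record is silently dropped during the transform step; in practice such records would be routed to an error queue for inspection rather than discarded.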

The new-age, versatile Big Data Analyst is one who understands the complexity of data integrations using APIs, connectors or ETL (Extraction, Transformation and Loading); designs data flows from disparate systems with data security and quality in mind; can code in SQL or Hive and R or Python; is well acquainted with machine learning algorithms; and has a knack for understanding business complexities.

Since Big Data and Analytics is constantly evolving, it is imperative for anyone aiming at a career in the field to be well versed in the latest tech stack and architectural breakthroughs. Some ways of doing so:

  • Following knowledgeable industry leaders or big data thought leaders on Twitter
  • Joining Big Data related groups on LinkedIn
  • Following Big Data influencers on LinkedIn
  • Attending events, conferences and seminars on Big Data
  • Connecting with peers within the Big Data industry
  • Last but not least (and probably most important), enrolling in MOOCs (Massive Open Online Courses) and/or reading Big Data books

Since Analytics is a vast field encompassing several operations, one could choose to specialise in parts of the analytics chain: data engineers specialise in highly scalable data management systems, data scientists in machine learning algorithms, and data architects in overall data integration, data flow and storage mechanisms. But in order to excel and future-proof a career in the world of Big Data, one needs to master more than one area. A data analyst who is acquainted with all the steps involved in data analysis, from data extraction to insights, is an asset to any organisation and will be much sought after!

Programmatic Conversion

Programmatic marketing involves data-driven insights to convert prospects into customers. There is more than meets the eye in conversion rate optimisation. Some of the deciding factors for conversion are UX design, the landing page, the source of web traffic, content, competitive pricing, goodwill, social media marketing, effective campaigns and customer engagement. Programmatic marketing entails analysing data at every customer touch point and targeting the consumer with compelling, preferably personalised, offers. Conversion is not necessarily making a customer shell out money; it can also be interpreted as winning customer loyalty through signing up for a newsletter, downloading whitepapers or trial versions of the product, or spending considerable time on the site. This loyalty, in the long run, could result in big wins through persuasion in the form of emails, SMSs, direct contact and targeted recommendations.

Channelling data about prospects – online behaviour, previous shopping, socio-economic segmentation, online search, products saved in the online basket – in other words, getting to know the customer well enough to suggest meaningful differences in people's lives through the products on offer, results in higher conversion rates. It is here that digital convergence is of paramount importance. Digital convergence blends online and offline consumer tracking data over multiple channels to produce targeted campaigns. Offline tracking through beacon technology is catching up. It is a win-win for both the retailer and the consumer, providing each with useful information: the consumer, with an enabled smartphone app within a certain distance of the beacon, receives useful and targeted information about products and campaigns, while the retailer gathers data about consumer shopping habits.

The online experience can be enhanced to reduce the bounce rate by incorporating some of the following design thoughts:

  1. Associative content targeting: The web content is modified based on information gathered about the visitor's search criteria, demographic information and source of traffic; the more you know about the prospect, the better you can target.
  2. Predictive targeting: Using predictive analytics and machine learning, recommendations are pushed to consumers based on their previous purchase history, the segment they belong to and their search criteria.
  3. Consumer directed targeting: The consumer is presented with sales, promotions, reviews and ratings prior to purchase.

Programmatic marketing offers the ability to constantly compare and optimise ROI and profitability across multiple marketing channels. Data about consumer behaviour, both offline and online, cookie data and segmentation data are algorithmically analysed to re-evaluate the impact of all media strategies on the performance of consumer segments. Analysing consumer insights and testing in iterations using A/B testing contributes to a higher conversion rate. Using data-driven methods to gain a higher conversion rate is programmatic conversion, and it's here to stay.
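The A/B testing mentioned above can be made concrete with a standard two-proportion z-test comparing the conversion rates of two page variants. The visitor and conversion counts below are invented for illustration.

```python
# Hypothetical two-proportion z-test for an A/B experiment on two
# landing-page variants, using made-up traffic numbers.
import math

def ab_test(conv_a, n_a, conv_b, n_b):
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)            # pooled conversion rate
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se                                # standardised difference
    return p_a, p_b, z

# Variant A: 200 conversions out of 10,000 visits; variant B: 260 out of 10,000
p_a, p_b, z = ab_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)

# |z| > 1.96 corresponds to significance at the 5% level (two-sided)
significant = abs(z) > 1.96
```

With these numbers the lift from 2.0% to 2.6% turns out significant; with smaller samples the same lift would not be, which is why iteration and sample-size planning matter.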

Intelligence Of Things


IoT – the Internet of Things – is the science of an interconnected everyday life, with devices communicating over WiFi, cellular, ZigBee, Bluetooth and other wireless and wired protocols, RFID (radio-frequency identification), sensors and smartphones. Data monetisation has led to generating revenue by gathering and analysing customer data, industrial data, web logs from traditional IT systems, online streams, mobile devices and sensors, and the interconnection of them all – in other words, IoT. IoT is hailed as the new way to transform education, retail, customer care, logistics, supply chain and healthcare. IoT and data monetisation have a domino effect on each other, generating actionable insights for business metrics, transformation and further innovation.

Wearable devices are a great way to keep tabs on patients' heart rates, step counts, and calories consumed and burnt. The data gathered from such devices is not only useful for checking vital signs but can also be used to scrutinise the effectiveness of drug trials, analysing the causes behind the way the body reacts to different stimuli. In logistics, IoT – reading the bar codes at every touch point to track the delivery of products, comparing the estimated with the actual time of delivery and analysing the reasons for the difference – can help businesses build better processes. In smart buildings, HVAC (heating, ventilation, air conditioning), electric meter and security alarm data are integrated and analysed to monitor building security, improve operational efficiency, reduce energy consumption and improve occupant experience.

IoT is expected to generate large amounts of data from varied sources, at high volume and very high velocity, increasing the need to better index, store and process such data. Earlier, the data gathered from each source was analysed in a central hub and communicated to other devices, but IoT brings a new dimension: M2M (machine-to-machine) communication. The highlights of such M2M platforms are:

  • Improved device connectivity
  • API, JSON and RDF/XML integration for data exchange
  • Flexibility to capture all formats of data
  • Data Scalability
  • Data security across multiple protocols
  • Real-time data management – On premise, cloud or hybrid platforms
  • Low TCO (total cost of ownership)

The data flow for an end-to-end IoT use case entails capturing sensor-based data from different devices and wearables – using, for example, SPARQL for RDF-encoded data – into a common data platform, where it is standardised, processed, analysed and communicated further as dashboards or insights, as input to another device, or for continuous business growth and transformation. Splunk, Amazon and Axeda are some of the M2M platform vendors that provide end-to-end connectivity of multiple devices, data security, and real-time data storage and mining. Data security, including adherence to data retention policies, is another important aspect of IoT. As IoT evolves, so will the interconnectivity of machine-to-machine platforms – exciting times ahead!
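In miniature, the ingestion of device payloads into a common platform might look like the following Python sketch. The device IDs, metric names and values are invented, and plain JSON stands in for whatever wire format a real M2M platform uses.

```python
# Hypothetical sketch: decode JSON-encoded sensor payloads from several
# devices into a common structure and summarise them per device and metric.
import json
from collections import defaultdict
from statistics import mean

payloads = [
    '{"device": "hvac-01", "metric": "temp_c", "value": 21.5}',
    '{"device": "hvac-01", "metric": "temp_c", "value": 22.1}',
    '{"device": "meter-07", "metric": "kwh", "value": 3.4}',
]

readings = defaultdict(list)
for raw in payloads:
    msg = json.loads(raw)                       # decode the wire format
    readings[(msg["device"], msg["metric"])].append(msg["value"])

# Per-device, per-metric averages, ready to feed a dashboard or another device
summary = {key: round(mean(values), 2) for key, values in readings.items()}
```

The same shape scales up conceptually: the dictionary becomes a time-series store, and the averaging step becomes a streaming aggregation job.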

Recommendation Systems

Recommendation systems have changed the way people shop online, find books, movies or music, make news articles go viral, and find friends and workmates on LinkedIn. Recommendation systems analyse browsing patterns on websites, ratings, the most popular items at that point in time or the products saved in one's virtual basket to recommend products. Similarly, common interests, work skills or shared geographical locations are used to suggest people you might want to connect with on social media sites.

Behind such personalised recommendation systems lie big data platforms – software, hardware and algorithms – that analyse customer behaviour and push recommended products in real time. These platforms handle the distribution and computation of both data and event data. Data can pertain to how customers, or customers similar to the one in question, have rated products in the past, while event data could be tracked mouse clicks that trigger events, for example viewing a product; sometimes both need to be combined to predict a customer's choice. Hence, the recommendation system architecture caters to data storage for offline analysis as well as low-latency computational needs, and a combination of the two.

The data platform architecture needs to be robust enough to ingest continuous real-time data streams into scalable systems like Hadoop HBase or other big data storage infrastructure like AWS Redshift. Apache Kafka is usually used as the messaging system for the real-time data stream, in combination with Apache Storm. Due to the high throughput, data redundancy needs to be taken care of in case of failures. If the real-time computation needs to take into account customer data such as previous purchase history, preferences, products already bought, segmentation based on socio-economic demographics or data from ERP and CRM systems, then either all those systems have to be available online so the data can be blended in real time, or the customer detail data can be mashed up offline to create a Single Customer View and queried in combination with the real-time event data.
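The blending of an offline Single Customer View with fresh event data can be sketched as follows. The customer profile, IDs and events are hypothetical, and a simple fixed-size window stands in for the real streaming layer (Kafka/Storm).

```python
# Hypothetical sketch: combine a precomputed "single customer view"
# (offline layer) with a low-latency window of click-stream events
# (online layer) to build the context a recommender would consume.
from collections import deque

# Offline layer: customer attributes mashed up ahead of time
single_customer_view = {
    "cust-42": {"segment": "premium", "last_purchase": "shoes"},
}

# Online layer: keep only the most recent N events; older ones fall off
recent_events = deque(maxlen=3)
for event in ["view:shoes", "view:socks", "view:belt", "view:shoes"]:
    recent_events.append(event)

def context_for(customer_id):
    # Blend the offline profile with the real-time event window
    profile = single_customer_view.get(customer_id, {})
    return {**profile, "recent": list(recent_events)}

ctx = context_for("cust-42")
```

The design choice mirrors the text: the expensive joins across CRM/ERP happen offline, while the real-time path only does a cheap lookup plus a window read.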

The valuable assets of any organisation are customers, products and now, data. Machine learning algorithms combine the three assets to leverage business gains, and predictive analytics is imperative in being proactive to customer needs. Some of the algorithms used for recommendation engines are content-based filtering, collaborative filtering, dimensionality reduction, k-means and matrix factorisation techniques. The challenge is not data storage, given the wide availability of highly scalable storage platforms, but the speed with which the data needs to be analysed. The best approach is to combine mostly precomputed data with fresh event data, using pre-modelled algorithms, to push personalised recommendations to the customer interface.
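One of the algorithms listed, collaborative filtering, can be illustrated in item-based form with cosine similarity over a tiny, made-up user-item rating matrix:

```python
# Hypothetical item-based collaborative filtering: items whose rating
# vectors (one entry per user) point in similar directions are "similar".
import math

ratings = {
    "alice": {"book": 5, "film": 3, "album": 0},
    "bob":   {"book": 4, "film": 2, "album": 1},
    "carol": {"book": 1, "film": 0, "album": 5},
}

def item_vector(item):
    # Column of the rating matrix: every user's rating for this item
    return [user_ratings[item] for user_ratings in ratings.values()]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Which item is rated most like "book" across users?
sims = {item: cosine(item_vector("book"), item_vector(item))
        for item in ["film", "album"]}
most_similar = max(sims, key=sims.get)
```

A real engine would precompute the item-item similarity matrix offline and, at request time, only look up neighbours of the items in the customer's fresh event stream, matching the precomputed-plus-fresh approach described above.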

The data value chain

Figure: The Consumer Lifecycle

The terms “data driven” and “Big Data” are the buzzwords of today – hyped, definitely, but the implications and potential are real and huge! Tapping into the enormous amount of data available and associating data from multiple sources creates a data chain that proves valuable for any organisation. Creating a data value chain consists of four parts: collection, storage, analysis and implementation. With data storage getting cheaper, the volume and variety of data available to be exploited is increasing exponentially. But unless businesses ask the right questions, understand the value the data brings and are sufficiently informed to make the right decisions, storing the data does not help. For example, in marketing, organisations can gather data from multiple sources about acquiring a customer, the customer's purchasing behaviour, customer feedback on different social media, the company's inventory and the logistics of product delivery. Analysing this stored data can lead to a substantial number of customers being retained.

A few of the actionable insights can be as follows:
  • Improving SEO (search engine optimization), increasing the visibility of the product site and attracting more customers
  • CRO (conversion rate optimisation), i.e. converting prospects into sales by analysing the sales funnel. A typical sales funnel is: home page > search results page > product page > proposal generation and delivery > negotiation > checkout
  • Better inventory control systems, resulting in faster deliveries
  • Predicting products that a consumer might be interested in, from the vast inventory, by implementing good recommendation algorithms that scan through the consumer behaviour and can predict their preferences
  • If some of the above points are taken care of, customer loyalty can increase manifold, based on the overall experience during the entire consumer lifecycle.
Figure: Data blending, leading to a Single Customer View and actionable insights
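The sales funnel described above can be quantified step by step to see where prospects drop off; the visitor counts per stage below are invented for illustration.

```python
# Hypothetical funnel analysis: conversion rate from each stage to the
# next, plus the end-to-end rate, using made-up visitor counts.
funnel = [
    ("home page", 10_000),
    ("search results", 6_500),
    ("product page", 3_000),
    ("proposal", 800),
    ("negotiation", 400),
    ("checkout", 250),
]

# Step-by-step conversion: where is the biggest drop-off?
step_rates = {
    f"{a} -> {b}": round(m / n, 3)
    for (a, n), (b, m) in zip(funnel, funnel[1:])
}

overall = funnel[-1][1] / funnel[0][1]   # end-to-end conversion rate
```

The weakest step (here, product page to proposal) is where CRO effort pays off most; recomputing these rates after each change closes the analyse-implement-analyse loop the value chain describes.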

Often the focus lies on big data technology rather than the business value of implementing big data projects. Data is revolutionising the way we do business, and organisations today are inundated with it. To make sense of the data and create a value chain, there has to be a starting point, and the customer is a good one. The customer's lifecycle, with experiences at every touch point, defines business growth, innovation and product development. Big data implementations allow blending data from multiple sources, leading to a holistic single view of the customer, which in turn gives rise to enlightening insights. The data pertaining to the customer from multiple sources – CRM, ERP, order management, logistics, social media, cookie trackers, click traffic etc. – should be stored, blended and analysed to gain useful, actionable insights.

In order to store this gigantic amount of data, organisations have to invest in robust big data technologies. Earlier BI technologies do not support the new forms of data sources, such as unstructured data, or the huge volume, variety and velocity of data. A big data architecture consists of the integration of the data sources, the data storage layer and the data processing layer where data exploration can be performed, optionally topped with a data visualisation layer. Both structured and unstructured data from various sources can be ingested into the big data platform using Apache Sqoop or Apache Flume, and real-time interactive analyses can be performed on massive data sets stored in HDFS or HBase using SQL with Impala or Hive, or using a statistical programming language such as R. There are very good visualisation tools, such as Pentaho, Datameer and Jaspersoft, that can be integrated into the Hadoop ecosystem to get visual insights. Organisations can offload expensive data warehouses to low-cost, high-storage enterprise big data technology.

Figure: Big data architecture (edited image from Hortonworks)

Irrespective of the technical implementation, business metrics such as increasing revenue, reducing operational costs and improving customer experience should always be kept in mind. The manner in which the data is analysed could create new business opportunities and transform businesses. Data is an asset, and investing in a value chain – from gathering to analysing, implementing, analysing the implementations and evolving continuously – will result in huge business gains.

Streamlining the process of processing

Customer expectations are very different now. Decisions need to be taken in real time to convert a prospective customer into a committed one. In an age where the customer seeks instant gratification, organisations with a longer time-to-market due to cumbersome internal processes find customer loyalty hard to win. For example, when a customer visits your physical store, if you offer a discount on the very first visit, the chances that the customer will revisit are high. Merely noting customer behaviour and then passing it through unwieldy processes to mete out a discount coupon later, hoping for a second visit, is a thing of the past. Advanced analytics systems are now able to handle the data influx from multiple disparate systems, cleanse it and house it in DMPs (data management platforms), ready to be queried in real time to deliver predictive and actionable insights on the fly.

However, if the business methodologies used do not complement this speed of data processing, the business will still suffer. The widely used Lean methodology preaches creating more value for customers with fewer resources: anything that does not yield value should be eliminated. But organisations need to adopt only the best of the best practices; following methodologies by the book, on the contrary, causes bottlenecks. To leverage more out of business analytics systems and solutions, both the processes and the tools need to be streamlined to create customer satisfaction. A lot of business intelligence projects take too long to deliver and are inflexible, resulting in functional business teams procuring BI tools which promise quick wins. The problem with such data discovery tools, apart from creating data silos, is that they lack data governance, hinder data sharing at an enterprise level and increase licensing costs.

It is not a solution to have no business process at all. There needs to be accountability, and that comes from business processes. Finding the right balance between processes and the speed of delivering value, to keep costs low and increase profitability, is a continuous, iterative process. One size does not fit all, and that applies to organisations as well: methodologies and processes need to be tweaked, tuned and tailor-made for each company. Organisations that try to implement Lean, Agile or Scrum but fail usually do so because they lose customer focus, lack a clear strategy, assign employees foggy responsibilities or communicate poorly; the focus then shifts from the task at hand to the nitty-gritty of the project management method itself.

To avoid these pitfalls, a clear business strategy needs to be defined, specifying business goals in order to maximise gains. The next step is to trim the processes so that only those that lead to this gain remain.

The bridge between Business and Analytics – Business Data Analyst

Business analysis and data analysis have traditionally been seen as different disciplines. With the increasing amount of data available and stored, and the need to analyse that data and gain business insights from it, a new role, the Business Data Analyst, has become critical. Companies lacking business data analysis talent have a lower ROI and will lose out to companies hiring analytics talent.

Most companies, even today, keep the two competencies separate. Business analysts analyse functional requirements and help translate them into technical specifications, while data analysts are more technical, gathering, cleansing and analysing data. To increase the analytic throughput of a company, it is vital to combine the business and analytic competencies: analysing the data from a business perspective, drawing conclusions about consumer behaviour, finding trends and accordingly making business decisions with targeted marketing campaigns.

As this is an emerging field, it can be challenging to find the right people with both business acumen and an analytics skillset. There are myriad ways to bridge this gap. One strategy is to create teams of people in direct marketing roles alongside data analysts and data scientists, to utilise their combined specialised competencies. Another is to train the management team's analytical skills, or to beef up the business knowledge of data analysts.

No matter which strategies are adopted, the new role of Business Data Analyst is paramount for enabling a company to make the right investments at the right time and yield an ROI. Building a data-driven company is more than identifying the right BI tools; it's about driving business through customer behaviour feedback by analysing data.