Architecting Modern Data Platforms

As organisations struggle to capture and leverage vast amounts of data, the number of technological options to choose from has surged. Well-designed data platforms facilitate experimentation, shorten time to market, adapt faster to the latest advancements in data technologies and promote self-service, thereby accelerating data adoption. Since data is the key enabler for business transformation, it is vital to build platforms that accelerate the validation of use cases and can scale with the number of use cases and users. Designing a platform that is elastic enough to embody all of the above can be quite a daunting task.

The primary points to consider when architecting modern data platforms:

  • Customer centric

Organisations battle immensely with legacy data technologies to deliver personalization and customer experience, despite the heavy emphasis on hyper-personalization. Thinking along the lines of creating a 360° customer view helps align technological choices with business pain points.

  • Cloud Native

Cloud solutions support elastic scaling, high availability and secure, fully managed services that integrate with a range of enterprise security systems including LDAP, Active Directory, Kerberos and SAML. Cloud solutions allow a pluggable architecture: components can be replaced when better options become available, with minimal rework. Cloud platforms eliminate the time-consuming work of provisioning resources and infrastructure, thereby reducing time to market.

  • Multi-platform architectures

Be it multi-cloud or multiple data storage patterns, the use cases should dictate the architectural patterns and not vice versa. Data warehouses, data lakes and NoSQL databases can all co-exist on multi-cloud platforms if the use cases demand it. Organisations should avoid platform or vendor lock-in, because it forces businesses to make technology choices that are not in the best interests of the company.

  • Microservice-enabled

It is critical to envision data as more than a means for visualization or a diagnostic tool. Data helps organizations adapt to change in evolving business environments and innovate, and every company wants to expedite that process to be the first to come up with innovative products and services. Monolithic applications are a major bottleneck here. In a microservices-based design, small decoupled services are developed completely independently of each other to meet business requirements faster, and generally communicate through REST APIs or event streams.

  • Flexible

Modern data platforms should be flexible enough to accommodate rapidly evolving business requirements, be it integrating new data sources or feeding data into futuristic data products. They should also make it simple to test new ideas on a small scale before making heavy investments in infrastructure.

Modernization continues to be a strong trend in data platforms, whether on Hadoop or RDBMS or multi-tenant solutions. It is the ease of integrating new data sources, TCO, prototyping functionalities, security and scaling that matter most in modern platform architectures.

 

Three reasons why Big Data projects fail

I have not been regular with my personal blog because I have been blogging elsewhere.

Here are the links to my latest blog posts about why Big Data projects fail and how to attract more women into tech.

Having worked extensively in the Big Data & IoT space, I have observed failures over and over again, and the reasons for failure keep repeating:

  • Wrong use cases
  • Wrongly staffed projects
  • Obsolete technology

Read the blog post for more details:

Three reasons why Big Data projects so often fail

Being a woman in tech, or a woman in data, I am often the only woman in meetings, trainings and discussions, which feels weird. With so few women in tech, it becomes easier to discriminate against the few that are there. Incidents of mansplaining and gaslighting are rampant, and it is the victim who gets labelled a drama queen while the abusers go scot-free. Organisations that are serious about increasing the number of women in tech need to address the glass ceiling, gender wage gaps and bro-culture, and cultivate an inclusive work atmosphere. Read my post on how to get more women into tech.

How to Get More Women in Tech

How to become a Big Data analyst

Anyone who works in the tech industry is aware of the rising demand for Analytics / Machine Learning professionals. More and more organisations have been jumping on the data-driven decision-making bandwagon, thereby accumulating loads of data pertaining to their business. To make sense of all the data gathered, organisations require Big Data Analysts to decipher it.

Data Analysts have traditionally worked with pre-formatted data, served by the IT department, to perform their analysis. But with the need for real-time or near-real-time Analytics to serve end customers better, analysis needs to be performed faster, which makes the dependency on IT departments a bottleneck. To understand the influx of data, analysts are required to understand data streams that ingest millions of records into databases or file systems, the Lambda architecture and batch processing of data.

Analysing larger amounts of data also requires skills that range from understanding the business complexities, the market and the competitors to a wide range of technical skills in data extraction, data cleaning and transformation, data modelling and statistical methods.

Analytics, being a relatively new field, is struggling to meet market demand for highly skilled Big Data Analysts. Being a Big Data Analyst requires a thorough understanding of data architecture and of the data flow from source systems into the big data platform. One can always stick to a specific industry domain and specialize within it, for example Healthcare Analytics, Marketing Analytics, Financial Analytics, Operations Analytics, People Analytics or Gaming Analytics. But mastering the end-to-end data chain can lead to plenty of opportunities, irrespective of industry domain.

The entire Data and Analytics suite includes the following gamut of stages:

  • Data integrations – connecting disparate data sources
  • Data security and governance – ensuring data integrity and access rights
  • Master data management – ensuring consistency and uniformity of data
  • Data Extraction, Transformation and Loading – making raw data business user friendly
  • Hadoop and HDFS – big data storage mechanisms
  • SQL/ Hive / Pig – data query languages
  • R / Python – programming languages for data analysis and mining
  • Data science algorithms like Naive Bayes, K-means, AdaBoost etc. – Machine learning algorithms for clustering, classification
  • Data Architecture – solutionizing all the above in an optimized way to deliver business insights
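
To make a few of these stages concrete, here is a minimal sketch of extraction, transformation, loading and a SQL query using only the Python standard library. The CSV layout, the table name and the cleaning rules are assumptions invented for illustration, and SQLite merely stands in for whatever storage layer a real platform would use.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (invented sample data).
RAW_CSV = """order_id,customer,amount,currency
1, alice ,19.99,EUR
2,BOB,5.00,EUR
3,alice,,EUR
"""

def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalise customer names and drop rows with a missing amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().lower(), float(amount))
        )
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into a queryable table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# Query the business-friendly table with plain SQL.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

On a real big data platform the same three steps would typically run on distributed storage and be queried through Hive or a similar engine, but the shape of the work stays the same.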

The new-age, versatile Big Data Analyst is one who understands the complexity of data integrations using APIs, connectors or ETL (Extraction, Transformation and Loading), designs data flows from disparate systems while keeping data security and quality issues in mind, can code in SQL or Hive and in R or Python, is well acquainted with machine learning algorithms and has a knack for understanding business complexities.

Since Big Data and Analytics is constantly evolving, it is imperative for anyone aiming at a career in the field to be well versed in the latest tech stack and architectural breakthroughs. Some ways of doing so:

  • Following knowledgeable industry leaders or big data thought leaders on Twitter
  • Joining Big Data related groups on LinkedIn
  • Following Big Data influencers on LinkedIn
  • Attending events, conferences and seminars on Big Data
  • Connecting with peers within the Big Data industry
  • Last but not least (and probably most important), enrolling in MOOCs (Massive Open Online Courses) and/or reading Big Data books

Since Analytics is a vast field encompassing several operations, one could choose to specialise in parts of the Analytics chain: data engineers specialize in highly scalable data management systems, data scientists in machine learning algorithms, and data architects in the overall data integrations, data flow and storage mechanisms. But in order to excel and future-proof a career in the world of Big Data, one needs to master more than one area. A data analyst who is acquainted with all the steps involved in data analysis, from data extraction to insights, is an asset to any organization and will be much sought after!

Continuous delivery of Analytics

 

I am biased towards Analytics not only because it is my bread and butter but also my passion. But seriously, Analytics is the most important factor that helps drive businesses forward by providing insights into sales, revenue generation means, operations, competitors and customer satisfaction.

Although Analytics is paramount to businesses, where it belongs in the organisation is still a matter of dispute. The organisations that get it right and use data to drive their businesses understand full well that Analytics is neither a part of IT nor a part of business. It is somewhere in between, an entity in itself.

The insights generated from Analytics are all about business drivers:

  • Performance of the product (Product Analytics)
  • How well is the product perceived by customers (Customer Experience)
  • Can the business generate larger margins without increasing the price of the product (Cost Optimisation)
  • What is the bounce rate and what causes bounce (Funnel Analytics)
  • Getting to know the target audience better (Customer Analytics)

While the above insights are business related and require a deep understanding of the product, online marketing knowledge, data stickiness mastery and product management skills, there is a huge IT infrastructure behind the scenes to be able to gather the data required and generate the insights.

To be able to generate the business insights required to drive online and offline traffic or increase sales, organisations need to understand their targeted customer base better. Understanding customer behaviour or product performance entails quite a number of technical tasks in the background:

  • Logging events on the website or app such as registration, add to cart, add to wish list, proceed to payment etc. (Data Pipelines)
  • Having in place a scalable data storage and fast computing infrastructure, which requires knowledge about the various layers of tech stack
  • Utilising machine learning and AI to implement Predictive Analytics and recommendations
  • Implementing data visualisation tools to distribute data easily throughout the organisation to facilitate data driven decision making and spread data literacy
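
As a small illustration of how logged events turn into an insight, the sketch below counts how many users reach each step of a funnel. The event names follow the examples in the list above; the user ids, the sample events and the `funnel_counts` helper are hypothetical. For simplicity it ignores timestamps, so it checks only that a user completed all earlier steps, not the order in which they happened.

```python
from collections import defaultdict

# Ordered funnel steps, named after the events mentioned in the list above.
FUNNEL = ["registration", "add_to_cart", "proceed_to_payment"]

# Hypothetical logged events as (user_id, event_name) pairs. Invented sample data.
events = [
    ("u1", "registration"), ("u1", "add_to_cart"), ("u1", "proceed_to_payment"),
    ("u2", "registration"), ("u2", "add_to_cart"),
    ("u3", "registration"),
    ("u4", "add_to_cart"),  # never registered, so never enters the funnel
]

def funnel_counts(events, steps):
    """Count users who completed each step and every step before it."""
    seen = defaultdict(set)
    for user, name in events:
        seen[user].add(name)
    counts = []
    for i in range(len(steps)):
        required = set(steps[: i + 1])
        counts.append(sum(1 for done in seen.values() if required <= done))
    return counts

counts = funnel_counts(events, FUNNEL)
for step, n in zip(FUNNEL, counts):
    print(f"{step}: {n} users")
```

The drop between consecutive steps is what Funnel Analytics looks at, and it can only be computed if the events were logged from the beginning.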

As it stands, Analytics cannot be boxed into either Tech or Business. It is a joint effort of both business and tech to understand the business requirements and translate them into technically implementable steps. Many organisations make the mistake of involving Analytics only at the end stage of product or concept development, which is almost a sure-fire fiasco. Analytics needs to be involved at every step of product development, customer experience, UX design and data infrastructure to make sure that the events, the data points that lead to insights, are in place from the beginning.

Delivering Analytics solutions is a collaborative effort that involves DevOps, data engineers, UX designers, online marketeers, social media strategists, IT strategists, Business Analysts, IT/Data architects and data scientists. A close co-operation between tech and business leads to continuous delivery of smarter and faster automations, enhanced customer experience and business insights.

Build. Measure. Evaluate. Optimise. Reevaluate.

 

 

Data integration is not a choice!

Every organization, irrespective of industry, has several business processes, each supported by several IT products. Each of these IT products holds an enormous amount of information that can generate insights which are paramount for any organization. Businesses that have been around for a while have obsolete processes and legacy systems that support them. A typical organization, independent of industry, has transaction processing systems, CRM systems, ERP, billing and business analytics solutions. Each solution is a silo in itself if it is not integrated with the rest. Granted, each of these solutions harbours valuable information, but the information residing in each system does not on its own generate a holistic view of the business.

Integrating the silos is a Herculean task, or so it may seem, if the solutions are outdated and do not support APIs, plug-ins and adapters. Most CRM, ERP and marketing automation products are nowadays equipped with some form of connector that enables data blending. If an organization has systems that do not support any of these, it is wise to migrate or upgrade them to versions that allow data extraction. Migrating legacy systems is a rocky road, but the pay-off is the elimination of data silos. Often the implementation cycle of new software solutions is so long that the idea becomes outdated even before the roll-out. Of course, there are solutions with a shorter time to market; for example, data analytics platforms that run on Spark have a faster implementation cycle and are scalable, providing the flexibility that growing businesses need.

It was not long ago that the borders between marketing and data analytics got blurred due to new business needs. This has resulted in complex technological challenges. Not all businesses have the budget and resources to invest in migrating and upgrading most of their legacy systems. But in order to satisfy today's demanding customers, data integration is the key. No customer wants to remember, or rummage through their home to find, old receipts or mails when they call customer care for a service or to complain. They expect that, once they have identified themselves, the customer care representative not only resolves their grievance but also comes up with suggestions to improve their customer lifecycle. That can only be achieved by integrating data from disparate systems to gain a 360-degree view of the customer journey. Data integration is thus a matter of being in business or out.
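
A minimal sketch of what such a 360-degree view amounts to in code: records from separate systems are combined under a shared customer id. The three sample extracts, their field names and the `build_customer_view` helper are hypothetical, and the sketch assumes every system already uses the same identifier, which in practice is often the hard part.

```python
# Hypothetical extracts from three siloed systems, keyed by a shared customer id.
crm = [
    {"customer_id": 1, "name": "Alice"},
    {"customer_id": 2, "name": "Bob"},
]
billing = [
    {"customer_id": 1, "invoice": "INV-100"},
    {"customer_id": 1, "invoice": "INV-101"},
]
support = [
    {"customer_id": 2, "ticket": "Late delivery"},
]

def build_customer_view(crm, billing, support):
    """Combine the silos into one record per customer known to the CRM."""
    view = {
        row["customer_id"]: {"name": row["name"], "invoices": [], "tickets": []}
        for row in crm
    }
    for row in billing:
        if row["customer_id"] in view:  # ignore records with no matching customer
            view[row["customer_id"]]["invoices"].append(row["invoice"])
    for row in support:
        if row["customer_id"] in view:
            view[row["customer_id"]]["tickets"].append(row["ticket"])
    return view

view = build_customer_view(crm, billing, support)
for customer_id, record in view.items():
    print(customer_id, record)
```

With a view like this, a customer care representative can see invoices and open tickets in one place as soon as the customer is identified.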

To start with, businesses should identify each data silo that exists and the function that each of them fulfils. (There may be cases where one business process is fulfilled by several software solutions. If an organization lacks data governance, the number of redundant solutions and products can be large.) Listing business processes and mapping them to software solutions clarifies the current architecture. The next steps are to:

  • Identify the to-be roadmap
  • Map solutions that support data blending to each of the business processes

The solutions adopted by new-age businesses need to embody the following characteristics:

  • Easy to implement
  • Short implementation time
  • Compatibility with a wide range of disparate systems
  • Easy to implement data security and access rights
  • Scalable
  • Forward compatible

Businesses need technology that supports business gain and growth and the ever-changing rules of the game (read: disruption).