Recommendation systems have changed the way people shop online, find books, movies or music, news articles go viral or find friends and work mates on Linkedin. The recommendation systems analyze the browsing patterns on websites, ratings or most popular items at that point of time or the products saved in ones virtual basket to recommend products. Similarly, the common interests, work skills or common geographical locations are used to predict people, that you might want to connect with on social media sites.
Behind such personalized recommendation systems lie big data platforms including software, hardware and algorithms that analyze customer behavior and push recommended products, in real time. The big data platforms handle both data and event data distribution and computation. Data can pertain to how customers or customers similar to the one in question, have rated products in the past while event data could be tracking mouse clicks that trigger events for example viewing a product and sometimes both of the above need to be combined to be able to predict a customer’s choice. Hence, the recommendation system architecture caters to data storage for offline analysis as well as low latency computational needs and a combination of the two.
The data platform architecture needs to be robust enough to ingest continuous real time data streams into scalable systems like Hadoop HBASE or any other big data data storage infrastructure like AWS Redshift. Apache Kafka is usually used as the messaging system for the real time data stream in combination with Apache Storm. Due to high throughput data redundancy needs to be taken care of, in case of failures. If the real time computation needs to take into account customer data like previous purchase history, preferences, products already bought , segmentation based on socio economic demographics or data from ERP, CRM, in that case either all the systems have to be available online to be able to blend the data in real time or the customer detail data could be mashed up, offline to create Single Customer View and queried in combination with the real time event data.
The valueable assests of any organisation are customers,products and now, data. Machine learning algorithms combine the three assets together to leverage business gains and predictive analytics is imperative in being proactive to customer needs. Some of the algorithms used for recommendation engines are content-based filtering, collaborative filtering, dimensionality reduction, Kmeans and matrix factorization techniques. The challenge is not the data storage, with wide availability of highly scalable data storage platforms, but the speed with which the data needs to be analyzed in case of recommendation systems. The best approach is to combine mostly precomputed data with fresh event data using pre modelled algorithms to push personalised recommendations to the customer interface.