
A Reference Architecture and Road map for Enabling E-commerce on Apache Spark

Mohit Sewak, Sachchidanand Singh. Published in Distributed Computing.

Communications on Applied Electronics
Year of Publication: 2015
© 2015 by CAE Journal

Mohit Sewak and Sachchidanand Singh. Article: A Reference Architecture and Road map for Enabling E-commerce on Apache Spark. Communications on Applied Electronics 2(1):37-42, June 2015. Published by Foundation of Computer Science, New York, USA.

BibTeX:

	author = {Mohit Sewak and Sachchidanand Singh},
	title = {Article: A Reference Architecture and Road map for Enabling E-commerce on Apache Spark},
	journal = {Communications on Applied Electronics},
	year = {2015},
	volume = {2},
	number = {1},
	pages = {37-42},
	month = {June},
	note = {Published by Foundation of Computer Science, New York, USA}


Apache Spark is an execution engine that, in addition to working as a standalone distributed, in-memory computing engine, offers close integration with Hadoop's distributed file system (HDFS). Apache Spark's underlying appeal is a unified framework for building sophisticated applications that combine multiple workloads; it also handles unstructured data well and has easy-to-use APIs. Apache Spark includes a streaming component, Spark Streaming, which writes streamed data into the same in-memory data structures used by the rest of the framework, where it can be read by Spark SQL running on top of the core Spark framework. Apache Spark can also provide online machine learning through its MLlib and SparkR subprojects. With these, in addition to processing streaming data, it can execute machine-learning libraries, functions and algorithms. This paper analyzes Apache Spark and highlights the role of Apache Spark (and its ecosystem) in the architecture of a modern E-commerce platform. It also proposes horizontally and vertically scalable reference architectures for both small and medium enterprises (SMEs) and large E-commerce enterprises.
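To make the unification idea concrete, here is a minimal, self-contained Python sketch. It is not Spark code and not the authors' architecture; it only models the pattern the abstract describes. Each micro-batch of a stream is appended to a shared in-memory table that a SQL engine can query, and the same micro-batch updates an online model by stochastic gradient descent, roughly the roles Spark Streaming, Spark SQL and MLlib's streaming linear regression play inside one engine. Assumptions: the standard-library `sqlite3` in-memory database stands in for Spark SQL, a hand-written SGD linear model stands in for MLlib, and the event fields (`category`, `engagement`, `basket_fill`, `order_value`) and the synthetic data are hypothetical.

```python
import random
import sqlite3

# Hypothetical ground truth for the synthetic stream:
# order_value = 5 + 30 * engagement + 25 * basket_fill
TRUE_WEIGHTS = (30.0, 25.0)
TRUE_BIAS = 5.0
CATEGORIES = ("books", "electronics", "fashion")  # hypothetical product categories


def make_micro_batch(rng, size):
    """Simulate one micro-batch of hypothetical e-commerce events."""
    batch = []
    for _ in range(size):
        engagement = rng.random()   # normalized session engagement in [0, 1)
        basket_fill = rng.random()  # normalized basket size in [0, 1)
        value = TRUE_BIAS + TRUE_WEIGHTS[0] * engagement + TRUE_WEIGHTS[1] * basket_fill
        batch.append((rng.choice(CATEGORIES), engagement, basket_fill, value))
    return batch


class OnlineLinearModel:
    """Linear regression updated incrementally with per-event SGD (squared-error loss)."""

    def __init__(self, n_features, learning_rate=0.2):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        return self.bias + sum(w * x for w, x in zip(self.weights, features))

    def update(self, batch):
        """Take one gradient step per event in the micro-batch."""
        for _, engagement, basket_fill, value in batch:
            features = (engagement, basket_fill)
            error = value - self.predict(features)  # computed once, before updating
            self.bias += self.learning_rate * error
            self.weights = [w + self.learning_rate * error * x
                            for w, x in zip(self.weights, features)]


def run_pipeline(n_batches=30, batch_size=100, seed=7):
    rng = random.Random(seed)
    # Stand-in for an in-memory table that Spark SQL would query.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE events (category TEXT, engagement REAL, "
               "basket_fill REAL, order_value REAL)")
    model = OnlineLinearModel(n_features=2)
    for _ in range(n_batches):
        batch = make_micro_batch(rng, batch_size)
        # 1. Streaming step: append the micro-batch to the shared in-memory table.
        db.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", batch)
        # 2. Learning step: update the online model on the same micro-batch.
        model.update(batch)
    # 3. SQL step: query everything the stream has written so far.
    summary = db.execute(
        "SELECT category, COUNT(*), ROUND(AVG(order_value), 2) FROM events "
        "GROUP BY category ORDER BY category").fetchall()
    db.close()
    return summary, model


if __name__ == "__main__":
    summary, model = run_pipeline()
    for category, count, avg_value in summary:
        print(f"{category}: {count} events, average order value {avg_value}")
    print("learned weights:", [round(w, 2) for w in model.weights],
          "bias:", round(model.bias, 2))
```

In Spark the three steps would run as distributed operations over the cluster's memory rather than in a single Python process; the sketch only illustrates why keeping streaming ingestion, SQL queries and online learning on one shared in-memory store avoids moving data between separate systems.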



Keywords: Apache Spark, E-commerce, Spark Streaming, Spark SQL, Shark, MLlib, Mahout, Spork, SparkR, GraphX, In-Memory Computing, Distributed Architecture, Big Data, Streaming Engine, Parallel Computing.