February 9, 2021

Scale out workload using Azure HDInsight

Filed under: architecture,cloud Himanshu @ 12:47 pm

Scale is one of the reasons organizations move to cloud computing, regardless of their size (enterprise or startup) and whether the need is to scale up or to scale out. For the uninitiated: scaling up means adding more power to a single machine, while scaling out means distributing the computing workload across multiple machines. Scaling out makes it possible to reach a level of capacity that scaling up alone cannot; for practical purposes, it gives you virtually unlimited scale.

Azure's arsenal of services makes a wide variety of scale-out architectural patterns possible for large workloads. Azure Functions, App Service, Azure Kubernetes Service, HDInsight clusters, Azure Synapse, and Virtual Machine Scale Sets are the most commonly used services for scaling out, depending on the kind of workload. We recently solved a large data processing workload using Azure HDInsight.

We used Azure HDInsight for a healthcare client with datasets covering multiple years of (anonymized) patient treatment data across several entire health systems. The processing itself was complex and entailed many transformations and computations, and the end result had to be submitted to CMS (the Centers for Medicare & Medicaid Services) in a condensed and auditable form. We designed our solution exactly for this high-demand situation, using Apache Spark on an Azure HDInsight cluster. Azure HDInsight is a platform that makes it easy to create clusters of machines preconfigured with open source frameworks such as Apache Spark, Apache HBase, and Apache Kafka. Since we used Apache Spark, we loaded the processing code and the raw data onto the cluster to carry out the big data processing.

We spun up a cluster of about 40 machines working together. Effectively, this meant we ‘created’ one huge computer with about 2.5 terabytes of RAM (that’s right, it’s not a typo: terabytes, not gigabytes, and RAM, not disk storage!), about 325 processor cores equating to roughly 650 virtual cores, and about 16 terabytes of disk space, all working on a single large workload. With this design and scale, we completed in about 8 hours a job that would otherwise have taken 2-3 weeks. And since we only needed this capacity temporarily, we tore the entire infrastructure down once we were done, keeping our infrastructure cost optimal.
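To give a sense of how such a workload is handed to the cluster: on HDInsight, where Spark runs on YARN, a job is typically submitted along these lines (the class, jar, storage account, and sizing numbers below are illustrative, not our actual configuration):

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --num-executors 80 \
      --executor-cores 8 \
      --executor-memory 24G \
      --class com.example.ClaimsProcessingJob \
      processing-job.jar \
      "wasbs://raw-data@examplestorage.blob.core.windows.net/patient-treatment/"

YARN then spreads the executors across the worker nodes, so the same job code scales with the size of the cluster rather than with the capacity of any single machine.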

It may sound cliched, but there is a lot of truth to the statement “Cloud computing is a big game changer”! There are many situations where access to large computing resources is a real competitive advantage. Thanks to AWS, Azure, and other cloud providers, this can be had at a fraction of the capex; your software architecture, however, needs to be designed to support it.

December 16, 2016

World of Serverless Applications

Filed under: architecture,cloud Himanshu @ 2:48 pm

I still remember a night in 1998. A friend and I sat with a pile of floppies installing Novell NetWare Server on hardware roughly the size of a home refrigerator. By the next morning we had a server ready, giving us the ability to share files among a bunch of PCs, along with user management. The word “server” meant a really big thing in those days; an exaggerated way to put it would be “back in the stone age!” A lot has changed between those stone-age days and now, and today we are talking about serverless architecture.

Serverless architecture seems really promising to me. It is a new paradigm, and once the developer toolsets and frameworks around it mature, we will be heading into an exciting new world! Obviously, serverless architecture does not literally mean there is no server; from that perspective the name is misleading. What it means, in a nutshell, is that software engineers and deployment teams no longer need to provision infrastructure for the software that solves the business problem. The platform’s provisioning and health-monitoring machinery takes care of that, and does so according to load. Capacity scales from ‘no servers’ to ‘N servers’ depending on the load at any point in time. It all started when AWS, once more proving itself the leader in cloud innovation, introduced AWS Lambda at the end of 2014. More recently, Microsoft published a similar service in its Azure offering, called Azure Functions, and Google’s GCP is following the lead with an offering named Cloud Functions.
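To make the model concrete, here is a minimal sketch of such a function written as a Node.js AWS Lambda handler. The handler name and the response shape (API Gateway proxy format) are illustrative, not tied to any particular application:

    // handler.js - a minimal AWS Lambda function (Node.js)
    // The platform runs this on demand; we never provision or manage a server.
    exports.hello = function (event, context, callback) {
      // 'event' carries the trigger payload, e.g. an HTTP request via API Gateway
      var response = {
        statusCode: 200,
        body: JSON.stringify({ message: 'Hello from a serverless function!' })
      };
      // Completing the callback ends the invocation; we pay only for the compute time used
      callback(null, response);
    };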

Serverless architecture is microservices and scalability on steroids!

Here are a few scenarios where I think it fits best:

  • Startups can benefit the most from serverless architecture. In the beginning there is low usage, hence little revenue, and hence a need to keep infrastructure costs low; when the startup grows to a large scale, the solution reacts and scales to the increased load automatically, and infrastructure cost scales with it. All of this without any code change, if it is architected right from the beginning.
  • Another scenario is an IoT web service endpoint to which devices connect to push or pull data. The number of connected devices drives the infrastructure cost dynamically.
  • Blogs and content management sites. Hey, this is a big opportunity! There are many organizations that do not want the hassle of maintaining servers or to spend much on infrastructure, but do want a lightweight online presence. They would be greatly helped by a platform on which they pay by the number of requests coming to their site instead of a fixed hardware cost. What do you say? If you get there before me, consider giving me credit for the idea :).

While everything looks bright around serverless architecture, here are a few suggestions from my exploration:

  • Use a coding framework that is lightweight in its boot-up time. This avoids long request times after a cold start; I’m using node.js.
  • While better developer toolsets are still maturing for the general community, consider using tools like the Serverless package available on the npm registry, provided by Serverless.com. It works very well with AWS Lambda: it not only deploys your code to AWS Lambda but also sets up the HTTP endpoint in API Gateway if the Lambda’s event is configured to be an HTTP endpoint, and everything works seamlessly (see the configuration sketch after this list). The Serverless package can be used even if your solution is not built in node.js.
  • Never try to do too many things in one Lambda/Function. Break time-consuming work down into multiple chunks of work items and utilize AWS Simple Queue Service (SQS) or Azure Queues.
  • Do good enough logging, as that will be your savior when debugging any issue.
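As an illustration of the Serverless toolset mentioned above, a project’s serverless.yml can look roughly like the sketch below. The service, handler, and path names are hypothetical, and the runtime version reflects what Lambda offered at the time rather than what is current:

    # serverless.yml - minimal Serverless Framework configuration (hypothetical names)
    service: notes-app

    provider:
      name: aws
      runtime: nodejs4.3        # Node.js runtime available on Lambda at the time
      region: us-east-1

    functions:
      createNote:
        handler: notes.create   # notes.js exports a 'create' handler
        events:
          - http:               # the framework wires up the API Gateway endpoint
              path: notes
              method: post

Running `serverless deploy` then packages the code, creates the Lambda function, and sets up the API Gateway endpoint in one step.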

Here is a quick, simple example. Let’s first go through the requirements: the application allows a user to create, view, edit, and archive notes.

  • A user can add notes. The application records the note along with the date-time when it was posted
  • While adding or editing a note, the user can apply formatting. Supported formatting is bold, italic, and underline
  • A user can view the list of notes, ordered by posted date-time in descending order
  • A user can archive a note. On archival, the note is filtered out of the list of notes
  • A user can review the list of archived notes

Here is how it is structured:

[Diagram: serverless-201612, the structure of the example application]
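To give a flavor of what one of these functions looks like, here is a sketch of the ‘add note’ handler. The file and field names are illustrative and the persistence step is stubbed out, so this is not the actual code of the example application:

    // notes.js - sketch of the 'create note' handler (illustrative, not the app's real code)
    exports.create = function (event, context, callback) {
      var body = JSON.parse(event.body || '{}');

      var note = {
        text: body.text,                    // may contain bold/italic/underline formatting
        postedAt: new Date().toISOString(), // record the date-time the note was posted
        archived: false                     // archived notes are filtered out of the main list
      };

      // Persisting the note to a data store is omitted from this sketch.

      callback(null, {
        statusCode: 201,
        body: JSON.stringify(note)
      });
    };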

I’m in the process of enhancing this example application with more features, and along the way will end up using more AWS services:

  • Authentication and authorization
  • Storing more structure along with a note, such as the author, tags, and sharing the note with other subscribers
  • Searching notes
  • Attachments
  • Notifications when a shared note changes

To support these features, I will use DynamoDB, Simple Email Service (SES), Simple Notification Service (SNS), Simple Queue Service (SQS), and more.

Eager to see these new ways of building and deploying software unfold!
