Leveraging Private Dev Containers

So this should be a pretty quick post, but I thought I would share a tip and trick I found while playing around with dev containers.

Dev Containers really are an amazing advancement in the development tools that are out there. Gone are the days of being handed a massive document and spending a day or two configuring your local development machine.

Dev containers make it easy to leverage Docker to spin up and do development inside a container, and then put that container definition into your repo so anyone can recreate the environment.

Now, the problem becomes, what if you have private Python packages or specific internal tools that you want to include in your dev container? What can you do to make it easier for developers to leverage?

The answer is, you can host a container image on a registry that is private and exposed to your developers via their Azure subscription. The benefit is that it makes it easy to standardize the dev environment with internal tools, and easy to spin up new environments without issue.

So the question becomes “How?” And the answer is a pretty basic one. If you follow the spec defined here, you will see that the devcontainer.json spec includes an initializeCommand option, which allows you to specify a bash script to run during the initialization of the container.
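
For example, a devcontainer.json along these lines would wire that up; the file names and script path below are just placeholders for illustration:

{
    "name": "internal-dev-environment",
    "build": {
        "dockerfile": "Dockerfile"
    },
    "initializeCommand": "bash .devcontainer/initialize.sh"
}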

Inside that script, you can add the following commands to authenticate against the private registry before the container spins up:

az login --use-device-code
az acr login --name {your registry name}
docker pull {repository/imagename}:latest

And then when you build the Dockerfile, you just point to your private registry. This means that whenever your team starts up their dev container, they will get a prompt to enter the device code and log into the private docker registry. And that’s it!
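
For reference, the Dockerfile the dev container builds from can be as simple as basing itself on the private image; the registry and image names below are placeholders:

FROM myregistry.azurecr.io/internal/devimage:latest

# Layer any additional team-specific tooling on top of the shared base image here.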

How to leverage templates in YAML Pipelines

So it’s no secret that I really am a big fan of leveraging DevOps to extend your productivity. I’ve had the privilege of working on smaller teams that have accomplished far more than anyone could have predicted. And honestly, the key principle at the center of those efforts is to treat everything as a repeatable activity.

Now, if you look at the idea of a microservice application, at its core it’s several different services that are independently deployable, and that statement can translate into a lot of excess technical debt from a DevOps perspective.

For example, if I encapsulate all logic into separate python modules, I need a pipeline for each module, and those pipelines look almost identical.

Or if I’m deploying docker containers, my pipelines for each service likely look almost identical. See the pattern here?

Now imagine you do this and build a robust application with 20-30 services running in containers. In the above, that means if I have to change the deployment pipeline, by adding say a new environment, I have to update between 20 and 30 pipelines with the same changes.

Thankfully, ADO has an answer to this, in the use of templates. The idea here is we create a repo within ADO for our deployment templates, which contains the majority of the logic to deploy our services, and then we call those templates from each service’s pipeline.

For this example, I’ve built a template that I use to deploy a docker container and push it to a container registry, which is a pretty common practice.

The logic to implement it is fairly simple and looks like the following:

resources:
  repositories:
    - repository: templates
      type: git
      name: "TestProject/templates"

Using the above code will enable your pipeline to pull from a separate git repo, and then you can use the following code to create a sample template:

parameters:
  - name: imageName
    type: string
  
  - name: containerRegistryName
    type: string

  - name: repositoryName
    type: string

  - name: containerRegistryConnection
    type: string

  - name: tag
    type: string

steps:
- task: Bash@3
  inputs:
    targetType: 'inline'
    script: 'docker build -t="${{ parameters.containerRegistryName }}/${{ parameters.imageName }}:${{ parameters.tag }}" -t="${{ parameters.containerRegistryName }}/${{ parameters.imageName }}:latest" -f="./Dockerfile" .'
    workingDirectory: '$(Agent.BuildDirectory)/container'
  displayName: "Building docker container"

- task: Docker@2
  inputs:
    containerRegistry: '${{ parameters.containerRegistryConnection }}'
    repository: '${{ parameters.imageName }}'
    command: 'push'
    tags: |
      ${{ parameters.tag }}
      latest
  displayName: "Pushing container to registry"

Finally, you can go to any yaml pipeline in your project and use the following to reference the template:

steps:
- template: /containers/container.yml@templates
  parameters:
    imageName: $(imageName)
    containerRegistryName: $(containerRegistry)
    repositoryName: $(repositoryName)
    containerRegistryConnection: 'AG-ASCII-GSMP-boxaimarketopsacr'
    tag: $(tag)
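
One thing worth considering as the number of consuming pipelines grows: the repository resource also supports a ref property, so you can pin the templates repo to a branch or tag and roll out template changes deliberately instead of having every pipeline pick them up immediately. For example:

resources:
  repositories:
    - repository: templates
      type: git
      name: "TestProject/templates"
      ref: refs/tags/v1.0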

Poly-Repo vs Mono-Repo

So I’ve been doing a lot of DevOps work recently, and one of the bigger topics of discussion I’ve been a part of is this idea of Mono-Repo vs Poly-Repo. And I thought I would weigh in with some of my thoughts on this.

So first and foremost, let’s talk about what the difference is. Mono-Repo vs Poly-Repo actually refers to how you store your source code. Now I don’t care if you are using Azure DevOps, GitHub, BitBucket, or any other solution. The idea here is whether you put the entirety of your source code in a single repository, or split it up into multiple repositories.

Now this might not sound like a big deal, or might not make sense depending on the type of code you write, but it ties into the idea of microservices. If you think about microservices and the nature of them, the debate about repos becomes apparent.

This can be a hotly debated statement, but most modern application development involves distributed solutions and architectures built on microservices. Whether you are deploying to a serverless environment or to Kubernetes, most modern applications involve a lot of completely separate microservices that together provide the total functionality.

And this is where the debate comes into play: let’s say your application is actually made up of a series of smaller microservice containers that combine to provide the overall functionality. How do you store them in source control? Does each service get its own repository, or do you have one repository with all your services in folders?

When we look at Mono-Repo, it’s not without benefits:

  • Easier to interact with
  • Easier to handle changes that cut across multiple services
  • Pull Requests are all localized

But it isn’t without its downsides:

  • Harder to control from a security perspective
  • Makes it easier to inject bad practices
  • Can make versioning much more difficult

And in a lot of ways, Poly-Repo really reads like the opposite of what’s above.

For me, I prefer poly-repo, and I’ll tell you why. Ultimately it can create some more overhead, but I find it leads to better isolation and enforcement of good development practices. By making each repo responsible for containing a single service and all of its components, you get a much cleaner development experience, and it becomes much easier to maintain that isolation and avoid letting bad practices slip in.

Now I do believe in making repos for single purposes, and that includes things like a templates repo for deployment components and GitOps pieces. But I like the idea that to make a change to a service the workflow is:

  • Clone the Services Repo
  • Make changes
  • Test changes
  • PR changes
  • PR kicks off automated deployment

It helps to keep each of these services independently deployable in this model, which is ultimately where you want to be, as opposed to building multiple services at once.

Keeping the lights on! – Architecting for availability?

Hello all, it’s been a while since I did a blog post outside of the weekly updates. But I wanted to do one about a conversation I’ve been having a lot lately, and one that seems to be largely universal: High Availability. More and more, software is becoming a critical part of every aspect of our lives. To that end, as developers / engineers we really see the following scenarios becoming a constant reality:

  • For end-customer software, not having access to an app or service for an extended timeframe can be the final nail in the coffin for a lot of users. Their tolerance for downtime continues to drop. If you don’t believe me, research YouTube’s metrics around how long someone will wait for a video to load before leaving.
  • For enterprises, organizations are becoming more and more reliant on software to function at the most basic level, meaning that outages or downtime windows have an even greater impact on their business, causing more parts of the organization to have to function at a diminished capacity or not at all during an outage.

The end result of these perceptions / realities is that the demands put on software solutions for maintaining availability are going higher and higher. And it becomes important to architect and plan for high availability to start with, as if you don’t it can be very expensive and difficult to retro-fit your applications to meet these demands.

This is a huge topic, and one that I’m not going to be able to cover in one blog post, but I’m hoping that we can identify ways to help if you are being tasked with meeting these demands.

Defining SLA


So the first part of this conversation, in my experience, always starts the same: “What’s our SLA?” So let’s talk through what an SLA is. SLA stands for Service Level Agreement, and this is a legal agreement defining what level of service you are required to provide.

Now the key part of that is “legal agreement”: this is not strictly a software or engineering concept, but a business agreement, in the sense that if an SLA is not met, there is a financial obligation for the organization to compensate the customer (in an enterprise setting).

Be Reasonable…


So the most common mistake I hear when someone starts down this road is “we need a 100% SLA”, which is a bad place to start this process. Realistically this is almost impossible; the idea that you will never have an outage is extreme. To get this level of resiliency you can expect to pay for it, and it’s easy to get upside down on your costs by starting out here. We really need to be realistic about the ask.

Let’s walk through an example. Say you have software that provides grant processing for a municipality, and grant reviews are done Monday to Friday during business hours (8 am to 6 pm). If your customer says “We need a 100% SLA”, I would make the counter-argument of “Do you really?” If the system is down from 1 to 2 am on a Saturday, does that really affect you and the nature of the business? Or is this just a matter of needing the solution to be up during those core business operating hours?

Conversely, let’s go the other way, and say that you are providing a solution that handles emergency-service communication during a natural disaster. Would your customer be OK with five minutes of downtime at 2 am in the middle of a hurricane? Probably not. So tolerance should be measured in terms of actual impact to the end user and their ability to function.

High Availability is like insurance: I can get add-ons to my policy for everything that could ever happen, but that means I will likely be paying for things I don’t need. I can get volcano insurance in Pennsylvania, but the odds of needing it are so low that it’s ridiculous.

So what we should be doing is finding a happy balance between what we can realistically do by following recommended practices, and weighing the business value against the cost.

Let me give you a high-level example. Let’s say I deploy my production environment to one region, and I’ve calculated the composite SLA (more on this later) to be 99.9% for that region. That means that right now I am telling my customers to expect about 43.2 minutes of downtime a month.

But if I stood up a secondary region, and built out a lot of automation around failover and monitoring (let’s say 80 hours of work), I could raise that SLA from 99.9% to 99.99%, which would mean a downtime of 4.32 minutes a month.

Now what I need to weigh is the following:

  • 80 hours worth of labor costs
  • opportunity cost of not using that labor resource on new features
  • doubling my environment costs (2 active regions)
  • The potential advantage of supporting a higher SLA.

And I look at that and say, I’m saving 38.88 minutes of downtime a month in the process. So the question is, does that help my business and make sense from a financial position, or am I “ok” taking a financial hit, having only one environment up, paying out if we are down for more than the 99.9% allows, and rolling the dice on that.

I can’t say in the above discussion what the right answer is, because ultimately it depends on the type of business and resiliency of the application. You might be comfortable with that, you might not.

My point is that at the end of the day this is both an engineering problem and a business problem, and likely the right answer is somewhere in the middle.

Now to be clear, other times, especially in enterprise software, the customer may require a certain SLA, and at that point you might have to show that you meet that SLA by having specific redundancies in place. I’ll talk about this more in our next section.

Calculating a composite SLA


Another common question is “How do I calculate the SLA of my service?” And this is more straightforward than people realize. Let’s take the following example:

Note: You can find all of Azure’s SLAs here.

Service        SLA
App Service    99.95%
Azure SQL      99.99%

So based on the table above, the composite SLA would be:

.9995 * .9999 = .9994 = 99.94%

So that would imply that your cloud provider is standing behind these services to have downtime of:

730 (hours per month) * 60 * (1 – .9994) = 26.28 minutes

Now the above is an estimate, but that is roughly the monthly downtime we could expect. The calculation works the same way no matter how many services you add: just keep multiplying the SLAs together.
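
If it helps to see that arithmetic spelled out, here is a small Python sketch of the same composite SLA calculation; the values are just the ones from the table above:

service_slas = {
    "App Service": 0.9995,
    "Azure SQL": 0.9999,
}

# The composite SLA is the product of the individual service SLAs
composite = 1.0
for sla in service_slas.values():
    composite *= sla

# Expected monthly downtime, based on 730 hours in a month
downtime_minutes = 730 * 60 * (1 - composite)

print(f"Composite SLA: {composite:.4%}")                             # ~99.94%
print(f"Expected monthly downtime: {downtime_minutes:.2f} minutes")  # ~26.28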

Now it’s important to note, this is the platform SLA, not your SLA. And I say that because at the end of the day, this assumes that your application doesn’t have issues that cause downtime, so that should be considered as well.

How do we improve our SLA? Start with “what is down?”


Now for many cloud services, Microsoft and every other cloud provider give recommendations to enhance resiliency and improve your SLA. One way to do that is to leverage items like Availability Zones and multi-region deployments. This allows you to spread your application across multiple regions, and it makes the probability of an outage drop substantially.

Really the first step here is to do a failure mode analysis and a determination of critical functionality. What I mean by that is we need to define what constitutes the system being “down”. So let’s say for instance you have an eCommerce platform, something like NopCommerce, and you have the following use cases:

  1. Browse the catalog
  2. Add items to shopping cart
  3. Purchase items
  4. Publish blogs
  5. Send out notifications of deals / sales
  6. Process Orders

Now based on the above, we could identify 1, 2, 3, and 6 as mission critical: if we can’t allow our customers to shop, buy, and receive their products, that means we are out of business. If we can’t publish a blog when we want to, or if a sale notice goes out a little late, it’s not ideal, but it’s not the end of the world. And let’s say that we have Azure Functions sending the notifications, and the blogs and promotions are managed by Cosmos DB.

So now, based on that, we need to examine our architecture and identify what components are required to maintain the four key use cases we identified. Notice I left off the elements that are not part of our key functionality for our SLA.

Let’s say we have the proposed architecture:

Now based on the above, I can calculate our primary region SLA to be:

Service                SLA
Application Gateway    99.95%
App Service            99.95%
Azure SQL              99.99%
Total SLA              99.89%

So as a result of the above, we need to examine what elements of our solution are critical to meeting our uptime SLA, and then do a failure analysis. Based on the above use cases, we can assume that Traffic Manager, the Application Gateway, the App Service, and Azure SQL are essential to meeting our SLA. For the sake of this example, let’s say that the caching layer follows industry recommendations and is used only for speed of access; if it is not available, the application will just reach out to the database.

So how do we calculate the compound SLA for the two regions? We do that with the following math:

We basically have to figure out the probability of both regions being offline, so we take the per-region unavailability of 0.11% (100% – 99.89%) and multiply the two together:

0.11% * 0.11% = 0.000121%

Convert it back to availability:

100% – 0.000121% ≈ 99.9999%

Now we take that multiplied by the Traffic Manager SLA:

.999999 * .9999 ≈ 99.99%
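
Extending the earlier sketch, the two-region math looks like this in Python; the 99.89% figure is the single-region composite from the table above, and 99.99% is the Traffic Manager SLA:

region_sla = 0.9989            # single-region composite SLA
traffic_manager_sla = 0.9999

# Probability that both regions are unavailable at the same time
both_regions_down = (1 - region_sla) ** 2

# Availability of the two-region pair, then factor in Traffic Manager
multi_region_availability = 1 - both_regions_down
end_to_end_sla = multi_region_availability * traffic_manager_sla

print(f"Both regions down: {both_regions_down:.6%}")   # ~0.000121%
print(f"End-to-end SLA: {end_to_end_sla:.4%}")         # ~99.99%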

Failure Mode Analysis:


A failure mode analysis means that we pick apart each element of the infrastructure and identify the following:

  • What potential failures could occur?
  • What different “modes” or “states” can this component be in?
  • How likely is a failure of this component?
  • What is the impact of each failure “mode” or “state” on the application?

After examining the above, you need to look at each of the “modes” or “states” and identify the following:

  • How will you respond and recover?
  • How will you monitor for this situation before, during, and after?

So let’s take an example, because to me that always helps. Let’s examine the above solution and take the Azure SQL Database: if I were to do a failure mode analysis, I would find the following:

  • The database is offline in the following situations:
    • The database can be offline due to a platform issue
    • The database has been shut down
    • The database has been deleted
  • The database is in a degraded state in the following situations:
    • Database is performing slowly due to high website demand.
    • Database is running slowly due to bad query optimization
    • Database is experiencing deadlocks

Now this is by no means an exhaustive list, but it hits the high points for our eCommerce site. Now in those states, I need to identify what to do in each scenario. So the question is how do we respond and recover. In the case of the database, the most common recommendations are to use a standard tier, and to use active geo-replication.

So for “How do we respond and recover?” I would say we set up active geo-replication of our production database to a secondary region. In the event the database is “offline” we fail over to the secondary region and leverage Traffic Manager to route to the backup site. We would see some data loss during the failover, but for this exercise, let’s say that is manageable.

The next question is the most important: how do we monitor for this? The answer is we could do this a couple of ways:

  • Set up alerts via Azure Monitor around specific metrics.
  • Set up alerts in Application Insights for dependency failures on database calls.
  • Build a page within our application that Traffic Manager can probe to identify when the database is unreachable and trigger failover (a rough sketch of such a probe follows below).
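
To make that last bullet concrete, the idea is a lightweight endpoint that only returns a success status when the critical dependencies are reachable, so the Traffic Manager probe fails over when the database is down. The application in this example would be .NET, but here is the shape of it sketched in Python with Flask; check_database() is a hypothetical helper standing in for whatever cheap connectivity test fits your data store:

from flask import Flask

app = Flask(__name__)

def check_database():
    # Hypothetical placeholder: run a cheap query (e.g. "SELECT 1") against
    # the database and return True/False based on whether it succeeds.
    return True

@app.route('/health')
def health():
    if check_database():
        # A 200 tells the Traffic Manager probe this region is healthy
        return 'OK', 200
    # Any non-success status lets the probe mark this endpoint as degraded
    return 'Database unreachable', 503

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)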

The next mode was “degraded”, and if we examine that, the response is to increase the performance tier of the database to respond to increased demand, or to do more in-depth analysis around the performance of the database. Again, the monitoring would be similar: setting up alerts around these conditions to make the appropriate staff aware.

So all kidding aside, this is a huge topic, and one I want to drill into more in terms of how best to implement these solutions. This post didn’t even begin to discuss the differences between RTO / RPO, or how you ensure resiliency through transient fault tolerance or distributed architectures, and that’s just scratching the surface, so more to come.

Leveraging Azure Search with Python

So lately I’ve been working on a side project, to showcase some of the capabilities in Azure with regard to PaaS services, and the one I’ve become the most engaged with is Azure Search.

So let’s start with the obvious question, what is Azure Search? Azure Search is a Platform-as-a-Service offering that allows for implementing search as part of your cloud solution in a scalable manner.

Below are some links on the basics of “What is Azure Search?”

The first part is how to create a search service, and really I find the easiest way is to create it via CLI:

az search service create --name {name} --resource-group {group} --location {location}

So after you create an Azure Search Service, the next part is to create all the pieces needed. For this, I’ve been doing work with the REST API via Python to manage these elements, so you will see that code here.

  • Create the data source
  • Create an index
  • Create an indexer
  • Run the Indexer
  • Get the indexer status
  • Run the Search

Project Description:

For this post, I’m building a search index that crawls through data compiled from the Chicago Data Portal, which makes statistics and public information available via their API. This solution pulls data from that API into Cosmos DB to make that information searchable. I am using only publicly consumable information as part of this. The information on the portal can be found here.

Create the Data Source

So, the first part of any search discussion is that you need to have a data source that you can search. Can’t get far without that. So the question becomes, what do you want to search? Azure Search supports a wide variety of data sources, and for the purposes of this discussion, I am pointing it at Cosmos DB. The intention is to search the contents of a Cosmos DB collection to ensure that I can pull back relevant entries.

Below is the code that I used to create the data source for the search:

import json
import requests
from pprint import pprint

#The url of your search service
url = 'https://[Url of the search service]/datasources?api-version=2017-11-11'
print(url)

#The API Key for your search service
api_key = '[api key for the search service]'


headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}

data = {
    'name': 'cosmos-crime',
    'type': 'documentdb',
    'credentials': {'connectionString': '[connection string for cosmos db]'},
    'container': {'name': '[collection name]'}
}

data = json.dumps(data)
print(type(data))

response = requests.post(url, data=data, headers=headers)
pprint(response.status_code)

To get the API key, you need the management key which can be found with the following command:

az search admin-key show --service-name [name of the service] -g [name of the resource group]

After running the above you will have created a data source to connect to for searching.

Create an Index

Once you have the above data source, the next step is to create an index. This index is what Azure Search will map your data to, and it is what searches will actually be performed against. So ultimately, think of this as the shape your searchable data will take. To create the index, use the following code:

import json
import requests
from pprint import pprint

url = 'https://[Url of the search service]/indexes?api-version=2017-11-11'
print(url)

api_key = '[api key for the search service]'

headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}

data = {
     "name": "crimes",  
     "fields": [
       {"name": "id", "type": "Edm.String", "key":"true", "searchable": "false"},
       {"name": "iucr","type": "Edm.String", "searchable":"true", "filterable":"true", "facetable":"true"},
       {"name": "location_description","type":"Edm.String", "searchable":"true", "filterable":"true"},
       {"name": "primary_description","type":"Edm.String","searchable":"true","filterable":"true"},
       {"name": "secondary_description","type":"Edm.String","searchable":"true","filterable":"true"},
       {"name": "arrest","type":"Edm.String","facetable":"true","filterable":"true"},
       {"name": "beat","type":"Edm.Double","filterable":"true","facetable":"true"},
       {"name": "block", "type":"Edm.String","filterable":"true","searchable":"true","facetable":"true"},
       {"name": "case","type":"Edm.String","searchable":"true"},
       {"name": "date_occurrence","type":"Edm.DateTimeOffset","filterable":"true"},
       {"name": "domestic","type":"Edm.String","filterable":"true","facetable":"true"},
       {"name": "fbi_cd", "type":"Edm.String","filterable":"true"},
       {"name": "ward","type":"Edm.Double", "filterable":"true","facetable":"true"},
       {"name": "location","type":"Edm.GeographyPoint"}
      ]
     }

data = json.dumps(data)
print(type(data))

response = requests.post(url, data=data, headers=headers)
pprint(response.status_code)

Using the above code, I’ve identified the different data types of the final product, and these all map to the data types supported by Azure Search. The supported data types can be found here.

It’s worth mentioning that there are other key attributes above to consider:

  • facetable: This denotes whether this data can be faceted. For example, in Yelp, if I bring back a search for cost, all restaurants have a “$” to “$$$$$” rating, and I want to be able to group results based on this facet.
  • filterable: This denotes if the dataset can be filtered based on those values.
  • searchable: This denotes whether or not the field has a full-text search performed on it, and it is limited in the different types of data that can be used to perform the search.

Creating an indexer

So the next step is to create the indexer. The indexer is the piece that does the real work. It is responsible for performing the following operations:

  • Connect to the data source
  • Pull in the data and put it into the appropriate format for the index
  • Perform any data transformations
  • Manage pulling in new data on an ongoing basis

import json
import requests
from pprint import pprint

url = 'https://[Url of the search service]/indexers?api-version=2017-11-11'
print(url)

api_key = '[api key for the search service]'

headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}

data = {
    "name": "cosmos-crime-indexer",
    "dataSourceName": "cosmos-crime",
    "targetIndexName": "crimes",
    "schedule": {"interval": "PT2H"},
    "fieldMappings": [
        {"sourceFieldName": "iucr", "targetFieldName": "iucr"},
        {"sourceFieldName": "location_description", "targetFieldName": "location_description"},
        {"sourceFieldName": "primary_decsription", "targetFieldName": "primary_description"},
        {"sourceFieldName": "secondary_description", "targetFieldName": "secondary_description"},
        {"sourceFieldName": "arrest", "targetFieldName": "arrest"},
        {"sourceFieldName": "beat", "targetFieldName": "beat"},
        {"sourceFieldName": "block", "targetFieldName": "block"},
        {"sourceFieldName": "casenumber", "targetFieldName": "case"},
        {"sourceFieldName": "date_of_occurrence", "targetFieldName": "date_occurrence"},
        {"sourceFieldName": "domestic", "targetFieldName": "domestic"},
        {"sourceFieldName": "fbi_cd", "targetFieldName": "fbi_cd"},
        {"sourceFieldName": "ward", "targetFieldName": "ward"},
        {"sourceFieldName": "location", "targetFieldName":"location"}
    ]
}

data = json.dumps(data)
print(type(data))

response = requests.post(url, data=data, headers=headers)
pprint(response.status_code)

What you will notice is that for each field, two attributes are assigned:

  • targetFieldName: This is the field in the index that you are targeting.
  • sourceFieldName: This is the field name according to the data source.

Run the indexer

Once you’ve created the indexer, the next step is to run it. This will cause the indexer to pull data into the index:

import json
import requests
from pprint import pprint

url = 'https://[Url of the search service]/indexers/cosmos-crime-indexer/run/?api-version=2017-11-11'
print(url)

api_key = '[api key for the search service]'

headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}

reseturl = 'https://[Url of the search service]/indexers/cosmos-crime-indexer/reset/?api-version=2017-11-11'

resetResponse = requests.post(reseturl, headers=headers)

response = requests.post(url, headers=headers)
pprint(response.status_code)

This triggers a run of the indexer, which will load the index. The reset call beforehand clears the indexer’s tracking state so the data is re-indexed from scratch.

Getting the indexer status

Now, depending on the size of your data source, this indexing process could take time, so I wanted to provide a REST call that will let you get the status of the indexer.

import json
import requests
from pprint import pprint

url = 'https://[Url of the search service]/indexers/cosmos-crime-indexer/status/?api-version=2017-11-11'
print(url)

api_key = '[api key for the search service]'

headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}

response = requests.get(url, headers=headers)
index_list = response.json()
pprint(index_list)

This will provide you with the status of the indexer, so that you can find out when it completes.

Run the search

Finally if you want to confirm the search is working afterward, you can do the following:

import json
import requests
from pprint import pprint

url = 'https://[Url of the search service]/indexes/crimes/docs?api-version=2017-11-11'
print(url)

api_key = '[api key for the search service]'

headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}

response = requests.get(url, headers=headers)
index_list = response.json()
pprint(index_list)

This will bring back the results of the search; since no search text is specified, it is treated as an empty search and everything is returned.
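
If you want to search for something specific instead of everything, the same docs endpoint accepts the search text and OData filters as query parameters. A quick sketch, where “THEFT” is just an illustrative term for this dataset:

import requests
from pprint import pprint

url = 'https://[Url of the search service]/indexes/crimes/docs?api-version=2017-11-11'

api_key = '[api key for the search service]'

headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}

# 'search' is the full-text query; '$filter' narrows results on a filterable field
params = {
    'search': 'THEFT',
    '$filter': "domestic eq 'N'"
}

response = requests.get(url, headers=headers, params=params)
pprint(response.json())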

I hope this helps with configuring Azure Search. Happy searching :)!

Weekly Links – 5/27

Here’s this week’s links:

Technical Links:

Developer Life:

Weekly Links – 5/20

Here’s this week’s links:

Technical Links:

Developer Life:

Weekly Links – 5/13

Here’s this week’s links:

Technical Links:

Developer Life:

Getting Started with Azure (developer perspective)

So there’s a common question I’ve been getting a lot lately, and that’s “I want to learn Azure, where do I start?” And this is ultimately a very reasonable question, because as much as the cloud has permeated much of the digital world, there are still some organizations that have only recently started to adopt it.

There are many reasons people choose to adopt the cloud: scalability, cost, flexibility, etc. But for today’s post I’m going to focus on the idea that you have already decided to go to the Azure cloud and are looking for resources to ramp up. So I wanted to provide those here:

MS Learn: The site provides videos, reading, and walk-throughs that can assist with learning this type of material:

MS Learn for Specific Services: There are several common services out there that many people think of when they think of the cloud, and I wanted to provide some resources here to help with those:

edX Courses: edX is a great site with a lot of well-made courses, and there is a wealth of options for Azure and the cloud. Here are a few I thought relevant, but it is not an exhaustive list.

  • Architecting Distributed Applications: One common mistake that many make with regard to the cloud is thinking of it as “just another data center”, and that’s just not true. To build effective and scalable applications, they need to be architected to take advantage of distributed compute. This course does a great job of laying out how to make sure you are architected to work in a distributed fashion.
  • Microsoft Azure Storage: A great course on the basics of using Azure Storage.
  • Microsoft Azure Virtual Machines: The virtual machine is the cornerstone of Azure, and provides many options to build and scale out effectively. This is a good introduction to the most basic service in Azure.
  • Microsoft Azure App Service: The most popular service in Azure, App Service enables developers to deploy and configure apps without worrying about the machine running under-the-covers. A great overview.
  • Microsoft Azure Virtual Networks: As I mentioned above, Software Based Networking is one of the key pieces required for the cloud and this gives a good introduction into how to leverage it.
  • Databases in Azure: Another key component of the cloud is the Database, and this talks about the options for leveraging platform-as-a-service offerings for databases to eliminate your overhead for maintaining the vms.
  • Azure Security and Compliance: A key component again is security, as the digital threats are constantly evolving, and Azure provides a lot of tools to protect your workload, this is an essential piece of every architecture.
  • Building your azure skills toolkit: A good beginner course for how to get your skills up to speed with Azure.

Additional tools and resources I would recommend include the following:

Those are just some of the many resources that can be helpful to starting out with Azure and learning to build applications for the cloud. It is not an exhaustive list, so if you have a resource you’ve found helpful, please post it in the comments below.

Building a Solr Cluster with TerraForm – Part 1

So it’s no surprise that I’ve very much been talking about how amazing TerraForm is, and recently I’ve been doing a lot of investigation into Solr and how to build a scalable Solr cluster.

So given the Kubernetes template, I wanted to try my hand at something similar. The goals of this project were the following:

  1. Build a generic template for creating a Solr cloud cluster with distributed shards.
  2. Build out the ability to scale the cluster, for now using TerraForm to manually trigger increases to cluster size.
  3. Make the nodes automatically add themselves to the cluster.

And I could do this just using bash scripts and Packer. But instead I wanted to try my hand at cloud-init.

But that’s going to be the end result; first I wanted to walk through the various steps I go through to get there. The first real step is to get through the installation of Solr on the Linux machines.

So let’s start with “What is Solr?” The answer is that Solr is an open source software solution that provides a means of creating a search engine. It works in the same vein as ElasticSearch and other technologies. Solr has been around for quite a while and is used by some of the largest companies that implement search to handle search requests from their customers. Some of those names are Netflix and CareerBuilder. See the following links below:

So I’ve decided to try my hand at this by creating my first Solr cluster, and have reviewed the getting started documentation.

So I ended up looking into it more, and built out the following script to create a “getting started” solr cluster.

sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
sudo apt-get install -y gnupg-curl
sudo wget -qO - https://www.apache.org/dist/lucene/solr/8.0.0/solr-8.0.0.zip.asc | sudo apt-key add -

sudo apt-get update -y
sudo apt-get install unzip
sudo wget http://mirror.cogentco.com/pub/apache/lucene/solr/8.0.0/solr-8.0.0.zip

sudo unzip -q solr-8.0.0.zip
sudo mv solr-8.0.0 /usr/local/bin/solr-8.0.0 -f
sudo rm solr-8.0.0.zip -f

sudo apt-get install -y default-jdk

sudo chmod +x /usr/local/bin/solr-8.0.0/bin/solr
sudo chmod +x /usr/local/bin/solr-8.0.0/example/cloud/node1/solr
sudo chmod +x /usr/local/bin/solr-8.0.0/example/cloud/node2/solr
sudo /usr/local/bin/solr-8.0.0/bin/solr -e cloud -noprompt

The above will configure a “getting started” Solr cluster that leverages all the system defaults and is hardly a production implementation, so my next step will be to change this. But for the sake of getting something running, I took the above script and moved it into a Packer template using the following JSON. The above script is the “../scripts/Solr/provision.sh” referenced in the provisioners section:

{
  "variables": {
    "deployment_code": "",
    "resource_group": "",
    "subscription_id": "",
    "location": "",
    "cloud_environment_name": "Public"
  },
  "builders": [{   
    "type": "azure-arm",
    "cloud_environment_name": "{{user `cloud_environment_name`}}",
    "subscription_id": "{{user `subscription_id`}}",

    "managed_image_resource_group_name": "{{user `resource_group`}}",
    "managed_image_name": "Ubuntu_16.04_{{isotime \"2006_01_02_15_04\"}}",
    "managed_image_storage_account_type": "Premium_LRS",

    "os_type": "Linux",
    "image_publisher": "Canonical",
    "image_offer": "UbuntuServer",
    "image_sku": "16.04-LTS",

    "location": "{{user `location`}}",
    "vm_size": "Standard_F2s"
  }],
  "provisioners": [
    {
      "type": "shell",
      "script": "../scripts/ubuntu/update.sh"
    },
    {
      "type": "shell",
      "script": "../scripts/Solr/provision.sh"
    },
    {
      "execute_command": "chmod +x {{ .Path }}; {{ .Vars }} sudo -E sh '{{ .Path }}'",
      "inline": [
        "/usr/sbin/waagent -force -deprovision+user && export HISTSIZE=0 && sync"
      ],
      "inline_shebang": "/bin/sh -e",
      "type": "shell"
    }]
}
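
To actually build an image from a template like this, you would run Packer with the variables filled in; a rough example, assuming the template is saved as solr.json and that credentials for the azure-arm builder are also configured (service principal details or CLI auth):

packer build \
  -var "subscription_id=<your subscription id>" \
  -var "resource_group=<image resource group>" \
  -var "location=eastus" \
  solr.json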

The only other script mentioned is “update.sh”, which has the following logic in it to install the Azure CLI and update the Ubuntu image:

#! /bin/bash

sudo apt-get update -y
sudo apt-get upgrade -y

#Azure-CLI
AZ_REPO=$(sudo lsb_release -cs)
sudo echo "deb [arch=amd64] https://packages.microsoft.com/repos/azure-cli/ $AZ_REPO main" | sudo tee /etc/apt/sources.list.d/azure-cli.list
sudo curl -L https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
sudo apt-get install -y apt-transport-https
sudo apt-get update && sudo apt-get install -y azure-cli

So the above gets me to a good place for being able to create an image with it configured.

For next steps I will be doing the following:

  • Building a more “production friendly” implementation of Solr into the script.
  • Investigating leveraging cloud init instead of the “golden image” experience with Packer.
  • Building out templates around the use of Zookeeper for managing the nodes.