Configuration can be a big stumbling block when it comes to availability.

So let’s face it, when we build projects, we make trade-offs. And many times those trade-offs come in the form of time and effort. We would all build the most perfect software ever… if time and budget were never a concern.

So along those lines, one thing that I find gets glossed over quickly, especially with Kubernetes and microservices, is configuration.

Configuration is probably something where you are reading this and saying, “That’s the most ridiculous thing I’ve ever heard.” We put our configuration in a YAML file, or a web.config, and manage those values through our build pipelines. And while that might seem like a great practice, in my experience it can cause a lot more headaches in the long run than you’re probably expecting.

The problem with storing configuration in YAML files, or Web.configs, is that they create an illusion of being able to change these settings on the fly. An illusion that can actually cause significant headaches when you start reaching for higher availability.

The problems these configuration files can cause are the following:

Changing these files is a deployment activity

If you need to change a value for these applications, it requires changing a configuration file. Changes to configuration files are usually tightly coupled to some kind of restart process. Take App Service as a primary example: if you store your configuration in a web.config and you make a change to that file, App Service will automatically trigger a restart, which will cause a downtime event for you and/or your customers.

This is even more difficult in a Kubernetes cluster: if you use a YAML file, changing a value requires the deployment agent to make a change to the cluster. This makes it very hard to change these values in response to a change in application behavior.

For example, say you wanted to change your SQL database connection if performance degrades below a certain point. That is a lot harder to do when you are referencing a connection string in a config file on pods that are deployed across a cluster.

Storing Sensitive Configuration is a problem

Let’s face it, people make mistakes. One of the biggest problems I’ve seen come up several times starts with the following statement: “We store normal configuration in a YAML file, and then sensitive configuration in a key vault.”

The problem here is that “sensitive” means different things to different people, so there is a real chance of something being misclassified. It’s much easier to manage if you tell your team to treat all settings as sensitive. It makes management a lot easier and limits you to a single store.

So what do we do…

The best way I’ve found to mitigate these issues is to use an outside service, like Key Vault or Azure App Configuration, to store your configuration settings.

But that’s just step 1. Step 2 is to cache the configuration settings for each microservice in memory in the container on startup, and make sure that you configure the cache to expire after a set amount of time.

This gives you a pattern whereby your microservices start up after deployment, reach out to a secure store, and cache the configuration settings in memory.

This also gains us several benefits that mitigate the problems above.

  • Allow for changing configuration settings on the fly: For example, if I wanted to change a connection string over to a read replica, that can be done by simply updating the configuration store and letting the services move over as their caches expire. Or if you want even further control, you could build in a webhook that forces a service to dump its configuration and re-pull it.
  • By treating all configuration as sensitive, you ensure there are no accidental leaks. This also ensures that you can manage these keys at deployment time, and never have them be seen by human eyes.
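To make the caching idea a little more concrete, here is a minimal Python sketch of an in-memory cache with an expiry and a "dump everything" hook. The class name `CachedConfig`, the `fetch` callback, and the five-minute default are all my own assumptions for illustration; in a real service `fetch` would be whatever call reaches your configuration store.

```python
import time
from typing import Callable, Dict, Tuple


class CachedConfig:
    """In-memory configuration cache with a per-entry time-to-live (hypothetical helper)."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch          # called on a cache miss, e.g. a Key Vault lookup
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]          # still fresh, no call to the store
        value = self._fetch(key)     # missing or expired: re-pull from the store
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self) -> None:
        """Drop everything, e.g. from a webhook handler that forces a re-pull."""
        self._entries.clear()


if __name__ == "__main__":
    store = {"SqlConnection": "Server=primary"}      # stand-in for the remote store
    config = CachedConfig(store.__getitem__, ttl_seconds=60)
    print(config.get("SqlConnection"))
    store["SqlConnection"] = "Server=read-replica"   # the store changes...
    print(config.get("SqlConnection"))               # ...but the cached value is served
    config.invalidate()                              # forced refresh
    print(config.get("SqlConnection"))
```

This sketch is not thread-safe and caches each key separately; a production version would likely add a lock and decide what to serve when the store is unreachable.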

So this is all great, but what does this actually look like from an architecture standpoint?

For AKS, it’s a fairly easy implementation: create a sidecar for retrieving configuration, and then deploy that sidecar with every pod.

Given this, it’s easy to see how you would implement a separate sidecar to handle this configuration. Each service within the pod is completely oblivious to how it gets its configuration; it simply calls a microservice to get it.

I personally favor the sidecar implementation here, because it allows you to easily bundle this with your other containers and minimizes latency and excessive network communication.

Latency will be low because it’s local to every pod, and if you ever decide to change your configuration store, it’s easy to do.
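As a rough illustration of the sidecar idea, here is a small Python sketch that serves settings over HTTP on localhost using only the standard library. The `/settings/<key>` route, port 8080, and the `make_sidecar` name are hypothetical choices of mine, and the `lookup` callback stands in for whichever configuration store sits behind it.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable, Optional
from urllib.parse import unquote


def make_sidecar(lookup: Callable[[str], Optional[str]], port: int = 8080) -> ThreadingHTTPServer:
    """Build an HTTP server that answers GET /settings/<key> using `lookup`."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            prefix = "/settings/"
            key = unquote(self.path[len(prefix):]) if self.path.startswith(prefix) else ""
            value = lookup(key) if key else None
            status = 200 if value is not None else 404
            body = json.dumps({"key": key, "value": value}).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep setting names out of the container logs

    # Bind to localhost only: containers in a pod share a network namespace,
    # so the application container can reach this without leaving the pod.
    return ThreadingHTTPServer(("127.0.0.1", port), Handler)


# Example (blocks forever, so it is left commented out):
# make_sidecar({"SqlConnection": "Server=primary"}.get).serve_forever()
```

Because the application only ever talks to `lookup` through this endpoint, swapping the backing store means changing the function you pass in, not the services that consume the settings.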

Let’s take a sample here using Azure Key Vault. Here’s some sample code that could easily be wrapped in a container to serve your configuration from Key Vault:

using System.Threading.Tasks;
using Microsoft.Azure.KeyVault;
using Microsoft.IdentityModel.Clients.ActiveDirectory;

// IConfigurationProvider and IKeyVaultConfigurationSettings are assumed to be
// this application's own interfaces (not shown here).
public class KeyVaultConfigurationProvider : IConfigurationProvider
{
    private readonly string _clientId;
    private readonly string _clientSecret;
    private readonly string _kvUrl;

    // The settings object supplies the service principal and vault URL, for example
    // read from the clientId, clientSecret and kvUrl environment variables.
    public KeyVaultConfigurationProvider(IKeyVaultConfigurationSettings kvConfigurationSettings)
    {
        _clientId = kvConfigurationSettings.ClientID;
        _clientSecret = kvConfigurationSettings.ClientSecret;
        _kvUrl = kvConfigurationSettings.KeyVaultUrl;
    }

    public async Task<string> GetSetting(string key)
    {
        // Authenticate to Key Vault as the service principal.
        var kvClient = new KeyVaultClient(async (authority, resource, scope) =>
        {
            var adCredential = new ClientCredential(_clientId, _clientSecret);
            var authenticationContext = new AuthenticationContext(authority, null);
            return (await authenticationContext.AcquireTokenAsync(resource, adCredential)).AccessToken;
        });

        // Secrets are addressed as {vault url}/secrets/{name}.
        var path = $"{_kvUrl.TrimEnd('/')}/secrets/{key}";

        var secret = await kvClient.GetSecretAsync(path);

        return secret.Value;
    }
}

Now, the above code uses a single service principal to call Key Vault and pull configuration information. This could be modified to leverage pod identities for even greater security and a cleaner implementation.

The next step of the above implementation would be to leverage a cache for your configuration. This could be done piecemeal as needed or in a group. There are a lot of directions you could take this, but it will ultimately help you manage configuration more easily.

Leveraging Azure Search with Python

So lately I’ve been working on a side project, to showcase some of the capabilities in Azure with regard to PaaS services, and the one I’ve become the most engaged with is Azure Search.

So let’s start with the obvious question, what is Azure Search? Azure Search is a Platform-as-a-Service offering that allows for implementing search as part of your cloud solution in a scalable manner.

Below are some links on the basics of “What is Azure Search?”

The first part is how to create a search service, and really I find the easiest way is to create it via CLI:

az search service create --name {name} --resource-group {group} --sku {sku} --location {location}

So after you create an Azure Search Service, the next part is to create all the pieces needed. For this, I’ve been doing work with the REST API via Python to manage these elements, so you will see that code here.

  • Create the data source
  • Create an index
  • Create an indexer
  • Run the Indexer
  • Get the indexer status
  • Run the Search

Project Description:

For this post, I’m building a search index that crawls through data compiled from the Chicago Data Portal, which makes statistics and public information available via its API. This solution pulls data from that API into Cosmos DB to make that information searchable. I am using only publicly consumable information as part of this. The information on the portal can be found here.

Create the Data Source

So, the first part of any search discussion is that you need to have a data source that you can search. Can’t get far without that. So the question becomes, what do you want to search? Azure Search supports a wide variety of data sources, and for the purposes of this discussion, I am pointing it at Cosmos DB. The intention is to search the contents of a Cosmos DB collection so that I can pull back relevant entries.

Below is the code that I used to create the data source for the search:

import json
import requests
from pprint import pprint

#The url of your search service
url = 'https://[Url of the search service]/datasources?api-version=2017-11-11'
print(url)

#The API Key for your search service
api_key = '[api key for the search service]'


headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}

data = {
    'name': 'cosmos-crime',
    'type': 'documentdb',
    'credentials': {'connectionString': '[connection string for cosmos db]'},
    'container': {'name': '[collection name]'}
}

# Serialize the payload to JSON and create the data source
data = json.dumps(data)

response = requests.post(url, data=data, headers=headers)
# A 201 status code means the data source was created
pprint(response.status_code)

To get the API key, you need the management key which can be found with the following command:

az search admin-key show --service-name [name of the service] -g [name of the resource group]

After running the above you will have created a data source to connect to for searching.

Create an Index

Once you have the above datasource, the next step is to create an index. This index is what Azure Search will map your data to, and how it will actually perform searches in the future. So ultimately think of this as the format your search will be in after completion. To create the index, use the following code:

import json
import requests
from pprint import pprint

url = 'https://[Url of the search service]/indexes?api-version=2017-11-11'
print(url)

api_key = '[api key for the search service]'

headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}

# Field attributes are JSON booleans, so use Python's True/False rather than strings
data = {
     "name": "crimes",
     "fields": [
       {"name": "id", "type": "Edm.String", "key": True, "searchable": False},
       {"name": "iucr", "type": "Edm.String", "searchable": True, "filterable": True, "facetable": True},
       {"name": "location_description", "type": "Edm.String", "searchable": True, "filterable": True},
       {"name": "primary_description", "type": "Edm.String", "searchable": True, "filterable": True},
       {"name": "secondary_description", "type": "Edm.String", "searchable": True, "filterable": True},
       {"name": "arrest", "type": "Edm.String", "facetable": True, "filterable": True},
       {"name": "beat", "type": "Edm.Double", "filterable": True, "facetable": True},
       {"name": "block", "type": "Edm.String", "filterable": True, "searchable": True, "facetable": True},
       {"name": "case", "type": "Edm.String", "searchable": True},
       {"name": "date_occurrence", "type": "Edm.DateTimeOffset", "filterable": True},
       {"name": "domestic", "type": "Edm.String", "filterable": True, "facetable": True},
       {"name": "fbi_cd", "type": "Edm.String", "filterable": True},
       {"name": "ward", "type": "Edm.Double", "filterable": True, "facetable": True},
       {"name": "location", "type": "Edm.GeographyPoint"}
      ]
     }

data = json.dumps(data)
print(type(data))

response = requests.post(url, data=data, headers=headers)
pprint(response.status_code)

Using the above code, I’ve identified the data type of each field in the final product, and these all map to the data types supported by Azure Search. The supported data types can be found here.

It’s worth mentioning that there are other key attributes above to consider:

  • facetable: This denotes whether this field can be faceted. For example, in Yelp, if I bring back a search for cost, all restaurants have a “$” to “$$$$$” rating, and I want to be able to group results based on this facet.
  • filterable: This denotes whether results can be filtered based on the values of this field.
  • searchable: This denotes whether the field is included in full-text search. Only certain data types can be marked searchable.

Creating an indexer

So the next step is to create the indexer. The indexer is what does the real work. It is responsible for performing the following operations:

  • Connect to the data source
  • Pull in the data and put it into the appropriate format for the index
  • Perform any data transformations
  • Manage pulling in new data on an ongoing basis

Below is the code that I used to create the indexer:

import json
import requests
from pprint import pprint

url = 'https://[Url of the search service]/indexers?api-version=2017-11-11'
print(url)

api_key = '[api key for the search service]'

headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}

data = {
    "name": "cosmos-crime-indexer",
    "dataSourceName": "cosmos-crime",
    "targetIndexName": "crimes",
    "schedule": {"interval": "PT2H"},
    "fieldMappings": [
        {"sourceFieldName": "iucr", "targetFieldName": "iucr"},
        {"sourceFieldName": "location_description", "targetFieldName": "location_description"},
        {"sourceFieldName": "primary_decsription", "targetFieldName": "primary_description"},
        {"sourceFieldName": "secondary_description", "targetFieldName": "secondary_description"},
        {"sourceFieldName": "arrest", "targetFieldName": "arrest"},
        {"sourceFieldName": "beat", "targetFieldName": "beat"},
        {"sourceFieldName": "block", "targetFieldName": "block"},
        {"sourceFieldName": "casenumber", "targetFieldName": "case"},
        {"sourceFieldName": "date_of_occurrence", "targetFieldName": "date_occurrence"},
        {"sourceFieldName": "domestic", "targetFieldName": "domestic"},
        {"sourceFieldName": "fbi_cd", "targetFieldName": "fbi_cd"},
        {"sourceFieldName": "ward", "targetFieldName": "ward"},
        {"sourceFieldName": "location", "targetFieldName":"location"}
    ]
}

data = json.dumps(data)
print(type(data))

response = requests.post(url, data=data, headers=headers)
pprint(response.status_code)

What you will notice is that for each field, two attributes are assigned:

  • targetFieldName: This is the field in the index that you are targeting.
  • sourceFieldName: This is the field name according to the data source.
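Since most of those mappings are just a name repeated twice, one option is to generate the list. This is a small sketch of my own, not part of the original script; the helper name `build_field_mappings` is hypothetical, and it only covers a handful of the fields above to keep it short.

```python
import json
from typing import Dict, List


def build_field_mappings(source_to_target: Dict[str, str]) -> List[Dict[str, str]]:
    """Turn {source_field: target_field} into the indexer's fieldMappings list."""
    return [
        {"sourceFieldName": source, "targetFieldName": target}
        for source, target in source_to_target.items()
    ]


# Fields whose names differ between the data source and the index
renamed = {"casenumber": "case", "date_of_occurrence": "date_occurrence"}
# Fields that keep the same name on both sides
same_name = ["iucr", "arrest", "beat", "ward"]

mappings = build_field_mappings({**{name: name for name in same_name}, **renamed})
print(json.dumps(mappings, indent=2))
```

The resulting list can be dropped straight into the "fieldMappings" key of the indexer payload.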

Run the indexer

Once you’ve created the indexer, the next step is to run it. This will cause the indexer to pull data into the index:

import json
import requests
from pprint import pprint

url = 'https://[Url of the search service]/indexers/cosmos-crime-indexer/run/?api-version=2017-11-11'
print(url)

api_key = '[api key for the search service]'

headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}

reseturl = 'https://[Url of the search service]/indexers/cosmos-crime-indexer/reset/?api-version=2017-11-11'

resetResponse = requests.post(reseturl, headers=headers)

response = requests.post(url, headers=headers)
pprint(response.status_code)

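Triggering a run of the indexer is what loads the index. Note that the code above also resets the indexer before running it, so the run re-indexes the data source from scratch.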

Getting the indexer status

Now, depending on the size of your data source, this indexing process could take time, so I wanted to provide a REST call that will let you get the status of the indexer.

import json
import requests
from pprint import pprint

url = 'https://[Url of the search service]/indexers/cosmos-crime-indexer/status/?api-version=2017-11-11'
print(url)

api_key = '[api key for the search service]'

headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}

response = requests.get(url, headers=headers)
index_list = response.json()
pprint(index_list)

This will provide you with the status of the indexer, so that you can find out when it completes.
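If you would rather wait for the run to finish than check by hand, a simple polling loop does the job. This is a sketch under one assumption: that the status payload carries a `lastResult` object whose `status` reads `inProgress` while a run is active. The function name `wait_for_indexer` and its parameters are mine, and `fetch_status` is any callable returning the parsed JSON from the status call above.

```python
import time
from typing import Callable, Dict


def wait_for_indexer(fetch_status: Callable[[], Dict], poll_seconds: float = 10.0,
                     timeout_seconds: float = 600.0,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the indexer's last run leaves 'inProgress', then return its status."""
    waited = 0.0
    while True:
        last_result = fetch_status().get("lastResult") or {}
        state = last_result.get("status", "unknown")   # 'unknown' if no run is recorded
        if state != "inProgress":
            return state
        if waited >= timeout_seconds:
            raise TimeoutError("indexer still running after %.0f seconds" % waited)
        sleep(poll_seconds)
        waited += poll_seconds


# Usage with the url and headers from the status script above:
# final_state = wait_for_indexer(lambda: requests.get(url, headers=headers).json())
# print(final_state)
```

Immediately after a reset and run, the service may briefly report the previous state before it switches to in progress, so you may want to wait a few seconds before the first poll.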

Run the search

Finally if you want to confirm the search is working afterward, you can do the following:

import json
import requests
from pprint import pprint

url = 'https://[Url of the search service]/indexes/crimes/docs?api-version=2017-11-11'
print(url)

api_key = '[api key for the search service]'

headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}

response = requests.get(url, headers=headers)
index_list = response.json()
pprint(index_list)

This will bring back the results of the search. Because no search text is supplied, it is an empty search and brings back everything.
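To go beyond the empty search, you can add a search term, a filter, and facets to the query string, which is where the searchable, filterable, and facetable attributes on the index come into play. Below is a minimal sketch that only builds the URL; the helper name `build_search_url`, the search term, and the filter value are illustrative assumptions, not values from my data.

```python
from urllib.parse import quote, urlencode


def build_search_url(service_url: str, index: str, search: str = "*",
                     filter_expr: str = "", facets=(), top: int = 10,
                     api_version: str = "2017-11-11") -> str:
    """Build a GET URL for querying the documents of an index."""
    params = [("api-version", api_version), ("search", search), ("$top", str(top))]
    if filter_expr:
        params.append(("$filter", filter_expr))   # only works on filterable fields
    for field in facets:
        params.append(("facet", field))           # only works on facetable fields
    # Keep '$' readable in parameter names and encode spaces as %20
    query = urlencode(params, safe="$", quote_via=quote)
    return "%s/indexes/%s/docs?%s" % (service_url.rstrip("/"), index, query)


url = build_search_url(
    "https://[Url of the search service]",
    "crimes",
    search="theft",              # hypothetical search term
    filter_expr="ward eq 42",    # hypothetical filter on a filterable field
    facets=["beat"],             # group counts by a facetable field
)
print(url)
```

The resulting URL can be passed to requests.get with the same headers used in the script above.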

I hope this helps with configuring Azure Search. Happy searching :)!

Configuring Terraform Development Environment

So I’ve been doing a lot of work with a set of open source tools lately, specifically Terraform and Packer. Terraform at its core is a way of implementing true Infrastructure as Code. It does so by providing a simple, declarative language in which you describe your cloud infrastructure, and then leverages resource providers to deploy it. These resource providers allow you to deploy to a variety of cloud platforms (the full list can be found here). It also provides robust support for debugging and targeting, and supports a desired-state configuration approach that makes it much easier to maintain your environments in the cloud.

Now that being said, like most open source tools, it can require some configuration of your local development environment, and I wanted to put this post together to describe it. Below are the steps for configuring your environment.

Step 1: Install Windows Subsystem for Linux on your Windows 10 Machine

To start with, you will need to be able to leverage bash as part of the Windows Subsystem for Linux. You can enable this on a Windows 10 machine by following the steps outlined in this guide:

https://docs.microsoft.com/en-us/windows/wsl/install-win10

Once you’ve completed this step, you will be able to move forward with VS Code and the other components required.

Step 2: Install VS Code and Terraform Plugins

For this guide we recommend VS Code as your editor. VS Code works on a variety of operating systems and is a very lightweight code editor.

You can download VS Code from this link:

https://code.visualstudio.com/download

Once you’ve downloaded and installed VS Code, we need to install the VS Code extension for Terraform.

Open the Extensions view in VS Code, search for “Terraform”, and select the extension. Then click “Install” and “Reload” when completed. This will give you IntelliSense and support for the different Terraform file types.

Step 3: Opening Terminal

You can then perform the remaining steps from the VS Code application. Go to the “View” menu and select “integrated terminal”. You will see the terminal appear at the bottom:

By default, the terminal is set to PowerShell. Type “bash” to switch to a Bash shell. You can change your default shell by following this guidance – https://code.visualstudio.com/docs/editor/integrated-terminal#_configuration

Step 4: Install Unzip on Subsystem

Run the following command to install “unzip” on your Linux subsystem. This will be required to unzip both Terraform and Packer.

sudo apt-get install unzip

Step 5: Install Terraform

You will need to execute the following commands to download and install Terraform. We need to start by getting the latest version of Terraform.

Go to this link:

https://www.terraform.io/downloads.html

And copy the link for the appropriate version of the Terraform binaries.

Go back to VS Code, and enter the following commands:

wget {url for terraform}
unzip {terraform.zip file name}
sudo mv terraform /usr/local/bin/terraform
rm {terraform.zip file name}
terraform --version

Step 6: Install Packer

To start with, we need to get the most recent version of Packer. Go to the following URL, and copy the URL of the appropriate version.

https://www.packer.io/downloads.html

Go back to VS Code and execute the following commands:

wget {packer url} 
unzip {packer.zip file name} 
sudo mv packer /usr/local/bin/packer
rm {packer.zip file name}

Step 7: Install Azure CLI 2.0

Go back to VS Code again, and download and install the Azure CLI. To do so, execute the steps and commands found here:

https://docs.microsoft.com/en-us/cli/azure/install-azure-cli-apt?view=azure-cli-latest
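At this point everything should be installed, so a quick sanity check doesn't hurt. Here is a small sketch, assuming Python 3 is available in your Linux subsystem, that reports whether each tool from the steps above can be found on your PATH. The `find_tools` helper is my own name for it.

```python
import shutil
from typing import Dict, Iterable, Optional


def find_tools(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # The tools installed in steps 4 through 7
    for tool, path in find_tools(["unzip", "terraform", "packer", "az"]).items():
        print("%-10s %s" % (tool, path or "NOT FOUND"))
```

Anything reported as not found usually means the binary was not moved into /usr/local/bin or the terminal needs to be reopened.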

Step 8: Authenticating against Azure

Once this is done you are in a place where you can run Terraform projects, but before you do, you need to authenticate against Azure. This can be done by running the commands described at the link below in the bash terminal:

https://docs.microsoft.com/en-us/azure/azure-government/documentation-government-get-started-connect-with-cli

Once that is completed, you will be authenticated against Azure and will be able to update the documentation for the various environments.

NOTE: Your authentication token will expire. Should you get a message about an expired token, enter the following command to refresh it:

az account get-access-token 

Token lifetimes are described here – https://docs.microsoft.com/en-us/azure/active-directory/develop/active-directory-token-and-claims#access-tokens

After that you are ready to use Terraform on your local machine.