ELI5 Software & Technology

ELI5 Software & Technology collects simple, layperson-friendly explanations of common software and technology concepts.

Each entry below gives a name, an ELI5 explanation, and its source and tags.
Ansible

Ansible is a kind of scripting language for distributed procedures. You have probably seen shell scripts for procedures such as installing and configuring the software required for a service. Such a script may create a user with the right permissions, install some packages, download and compile source code, fill in a configuration file, start services, and so on. Writing such a script does not take much longer than just typing in the commands by hand. However, making sure it works every time requires a lot more time and effort. What if a user already exists but with the wrong permissions? What if a file is already downloaded? What if a command returns an error? You suddenly need a lot of code around each statement to handle the exceptional cases. Then you get into the issue of how to write a script that can do things on multiple servers, especially if the order of operations matters. And there might be similar servers with slight configuration differences that have to be taken into account when writing the scripts.

Ansible is designed to handle all of these cases for you. It can deal with most of the exceptional cases by itself, because you usually describe how you want the system to look after a task is complete. For example, you may say that a service should be running and enabled to start at boot on a specific group of machines, and Ansible will figure out how to make that happen.
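For instance, the "service running and enabled at boot" goal above might be written as a small playbook like this sketch (the group name `webservers` and the service `nginx` are placeholders, not anything from the text above):

```yaml
# Illustrative playbook sketch; "webservers" and "nginx" are placeholder names.
- hosts: webservers
  become: true
  tasks:
    - name: Make sure nginx is installed
      ansible.builtin.package:
        name: nginx
        state: present

    - name: Make sure nginx is running and starts at boot
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true
```

Note that the tasks describe a desired end state ("present", "started"), not a sequence of commands; running the playbook twice in a row changes nothing the second time.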

Reddit Ansible, DevOps, and Automation
CAP Theorem

The CAP theorem by Eric Brewer states that any distributed system with shared data is likely to see tension between the following systemic properties:

Consistency - All replicas are in sync and hold the same state for any given object at any given point in time (formally, this is linearizability).

Availability - A request will eventually complete successfully. A read/write request on any node of the system will never be rejected as long as the particular node is up and running.

Network Partition Tolerance - When the network connecting the nodes goes down, the system still continues to operate even though some or all nodes can no longer communicate with each other.

In other words, the theorem states that a system can satisfy any two, but not all three, of the properties above.

Quora Distributed Systems
Docker

Docker is a container for an application. In the old days, applications were run directly on servers and sometimes they had to pay special attention to what kind of server it was running on. Docker now says to the application: ‘here, expect this kind of hardware and I’ll make sure the actual hardware acts that way, so you don’t have to worry about interfaces, storage, etc’.

Docker then says, ‘By the way, I know applications usually like to be configured and connect to certain things, why don’t you tell me what those are so the higher-ups can make sure you get what you need?’ The application obliges, and Docker then exposes settings and connections so sysadmins can easily point them where they need to be, for many containers at once.

Now for the part most people may not see immediately: Docker then talks to any number of management systems, saying ‘Here, configure me’, along with tens, hundreds, or thousands of other containers. With configuration exposed, containers are easier to manage and change. And then, with software made to launch Docker containers, you can roll out new containers with new configurations easily, then kill off the old ones or leave them around in case anything goes wonky. Some OSes, like CoreOS, go one step further and let you move Docker containers around to different hardware automatically, making the servers a layer instead of a single thing, as well as making configurations easier by copying (replicating) them among all the servers automatically.

TL;DR: Docker is mostly exciting because it makes the application more portable, changing the cornerstone of how IT used to be (application stuck on one machine, hard to change/upgrade, hard to manage in general).

It’s not the only container platform either, but they all follow the same basic concept. Instead of virtualizing an entire machine for each application, you virtualize at the operating-system level and give the software a rigidly defined set of interfaces, which makes it easier to build systems that are scalable, redundant, and resilient.
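The ‘expect this kind of environment’ contract gets written down in a Dockerfile. A rough sketch (the app, its files, and the port are all made up for illustration):

```dockerfile
# Hypothetical image for a small Python web app.
FROM python:3.12-slim
WORKDIR /app

# Install dependencies first so they are cached between rebuilds.
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

# Configuration is exposed instead of hard-coded, so operators can override it.
ENV APP_PORT=8000
EXPOSE 8000

CMD ["python", "app.py"]
```

The same image then runs identically on a laptop or a server, which is the portability the TL;DR below is talking about.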

Reddit Docker and Containers
Hadoop

Some computational problems can be solved very easily by breaking the data down into smaller buckets. For a (rather simple) example, say you’re trying to find the largest number among a hundred million numbers. You can look through all of them one by one. If you have a powerful computer that can look through a million numbers an hour, you’ll need 100 hours, or a little over 4 days, to do this.

Now if you broke your data into 100 pieces and gave them to 100 computers, each computer would find its largest number in 1 hour, and then you’d spend a few more seconds finding the largest among those hundred results; you’re pretty much done in about an hour.

Problems like these are called embarrassingly parallel. The method of breaking the work down into pieces (mapping) and then joining the individual results into a global result (reducing) was described in a Google paper, and the technique is called MapReduce.
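The largest-number example above can be sketched in a few lines of plain Python (no Hadoop involved; the chunking simply stands in for the 100 machines):

```python
# Toy MapReduce: find the largest of many numbers by splitting the
# data into chunks, mapping each chunk to its local maximum, then
# reducing the local maxima to one global maximum.

def map_chunk(chunk):
    """The 'map' step: each worker finds the max of its own piece."""
    return max(chunk)

def reduce_maxima(local_maxima):
    """The 'reduce' step: combine the per-chunk results."""
    return max(local_maxima)

def distributed_max(numbers, n_chunks=100):
    size = max(1, len(numbers) // n_chunks)
    chunks = [numbers[i:i + size] for i in range(0, len(numbers), size)]
    return reduce_maxima([map_chunk(c) for c in chunks])
```

In a real cluster each `map_chunk` call would run on a different machine against its own slice of the data; only the small list of local maxima travels over the network.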

Hadoop is open-source software that makes MapReduce-style programming easier. You don’t have to worry about installing the program on your 100 machines, breaking your initial data into pieces, copying it to all 100 machines, copying the results back from 100 machines, and so on. All that housekeeping is managed by Hadoop. Once you set up a Hadoop cluster over the 100 machines, you can give it any program and data, and it takes care of all the behind-the-scenes work and gives you back the result.

Reddit Distributed file system, Big Data, and Apache
Kafka

Kafka is useful because servers crash, and a Kafka cluster can keep itself together even if one Kafka server crashes. If your producers send directly to your consumers, then if any consumer crashes before doing work, you’ve lost those messages. If a Kafka consumer crashes, it can just re-ask Kafka for the data when it starts up; if a Kafka server crashes, the producers and consumers can just talk to a different server (unless too many Kafka servers crash at once, and you can configure how many is ‘too many’).

Kafka is also useful because it gives you a level of abstraction between your producers and your consumers. Both only need to know how to find Kafka, and the system configures itself based on what they say. If your producers talk straight to your consumers, then they not only need to know who all the consumers are, they also need to be updated every time you add a consumer.
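The decoupling idea can be shown with a toy in-memory ‘broker’ (this is a sketch of the concept only, not the real Kafka API; class and topic names are made up):

```python
# Toy broker: producers and consumers only know the broker and a topic
# name, never each other. Each consumer tracks its own read offset, so
# a restarted consumer resumes where it left off instead of losing data.

from collections import defaultdict

class ToyBroker:
    def __init__(self):
        self._topics = defaultdict(list)   # topic -> list of messages
        self._offsets = defaultdict(int)   # (topic, consumer) -> next index

    def send(self, topic, message):
        """Producers append to a topic; no consumer needs to be online."""
        self._topics[topic].append(message)

    def poll(self, topic, consumer):
        """Consumers read at their own pace from their last offset."""
        offset = self._offsets[(topic, consumer)]
        messages = self._topics[topic][offset:]
        self._offsets[(topic, consumer)] = len(self._topics[topic])
        return messages
```

A real Kafka cluster adds the durability discussed above: topics are replicated across several servers, so the log survives a crash, which this single-process toy obviously cannot show.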

Reddit Message Broker and Stream Processing
Kubernetes

Container systems, e.g. Docker, create packages (images) for applications, which make them more portable, easing the development process. Docker can be used to run sets of containers on a single host based on these images.

When companies want to run these containers at scale, they need a system which can manage deploying containers across a set of systems. These systems will schedule sets of containers to sets of VMs (or physical servers) and can do things like move containers around from VM to VM automatically as needed.

Kubernetes is the most popular of the set of systems which carry out this function.
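In practice you hand Kubernetes a description of what you want and let it pick the machines. A minimal sketch of such a description (all names and the image are placeholders for illustration):

```yaml
# Hypothetical Deployment: ask for 3 copies of a container and let
# Kubernetes decide which machines they run on.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: example/web:1.0   # placeholder image name
          ports:
            - containerPort: 8000
```

If a machine dies, Kubernetes notices the count dropped below 3 and starts a replacement container elsewhere, which is the ‘move containers around as needed’ behavior described above.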

Reddit Kubernetes, Orchestration, and Containers
Multi-Tenancy

Multi-tenancy means that a single instance of the software and its supporting infrastructure serves multiple customers. Each customer shares the software application and also shares a single database. Each tenant’s data is isolated and remains invisible to other tenants.
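A common way to get that isolation inside one shared database is to tag every row with a tenant ID and scope every query by it. A minimal sketch (the class and field names are made up for illustration):

```python
# Toy shared database: all tenants' rows live in one table, but every
# read is filtered by tenant_id, so tenants never see each other's data.

class SharedDatabase:
    def __init__(self):
        self._rows = []  # one table shared by all tenants

    def insert(self, tenant_id, record):
        self._rows.append({"tenant_id": tenant_id, **record})

    def query(self, tenant_id):
        """Every query is scoped to the calling tenant's ID."""
        return [r for r in self._rows if r["tenant_id"] == tenant_id]
```

Real systems enforce this filter at a lower layer (for example with database row-level security) so that application code cannot forget it.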

Digital Guardian SAAS and Distributed Systems
Node.js

Node.js is a JavaScript interpreter plus a set of libraries that let your code interact with the operating system (something unusual for JavaScript, which is normally sandboxed by the browser). The interpreter part is borrowed directly from Chrome: Chrome’s V8 engine was open-sourced by Google, and it runs JavaScript similarly to how Java code is run by the Java Runtime Environment and C# by the .NET runtime. One difference is that JavaScript interpreters don’t make you compile your code first, so you can edit the human-readable source code and immediately re-run it to see the results.

The end result is that developers can now write new kinds of applications in JavaScript. They aren’t limited to client-side web apps; they can run JavaScript on the server side, as well as on the desktop.

It opens JavaScript up to a lot of new functionality. Some people hate this, because JavaScript can be a pretty messy language. But, personally, I find the language’s flexibility refreshing. It lets you drop in and out of object-oriented and functional programming as you see fit, and has powerful type coercion to go with its dynamic typing. It’s also UTF-8 by default, so it works well around the world. And JavaScript is one of the easiest languages to learn. But it has enough depth that it takes years to master.

One related tool that has grown to be fundamental to the Node.js ecosystem is the package manager, npm. It makes it really simple to find and use other people’s code within your own project. And since JavaScript isn’t compiled as a separate step, you can edit any code from any package. You can use it unaltered or altered, and if you alter it, you can easily reshare your version.

What else? Node.js makes it really quick and easy to write any type of program that uses the internet. Sending, receiving, interpreting, and responding to data over the internet can be cumbersome in languages like the .NET family, Java, and Perl; it takes a lot of lines of code. Node.js makes these kinds of programs simple. You can do most internet-related tasks in 5-50 lines of code, mostly because JavaScript was built for the internet. .NET was also built for the internet, but it had to be based on and inter-operate with C++.

Reddit Runtime environment and JavaScript
Single-Tenancy

A single instance of the software and its supporting infrastructure serves a single customer. With single tenancy, each customer has their own independent database and instance of the software. Essentially, no sharing happens with this option.

Digital Guardian SAAS and Distributed Systems

Built with Jekyll + ListJS + Bootstrap. Credit to jekyll-db.