Pantry Post-mortem

It has been almost a year since I left my previous company. Now I think I can tell what happened there.

I was hired to implement a system to start instances automatically. The company needed this urgently: people waited several weeks to get a new virtual machine up and running. Everything was installed manually, so at that point it was impossible to get a list of installed software. And people needed tons of software.

What we dreamed of was a 5-minute start. We spent at least six human-years on it.

We configured (or, mostly, half-configured) Chef, WSUS, Sensu, and AD, and implemented or improved cookbooks for configuring clients for these services. As a result we started about 1000 instances. I believe we had to SSH/RDP into at least 400 of them to finish broken installations.

People needed admin access and we gave it to them. It is hard to tell now, but I think it was a bad decision. We don’t know what is installed on the instances: people needed additional software, installed it on their own, and we never found out. They promised to create tickets, but the tickets were obviously ignored.

Our dream was to install everything in an automated way. But we forgot our customers: it is better to ship an MVP with a manual step than not to provide a working solution at all. We never reached 100% automated installation. We created hacks to be able to install software, and we never removed them.

Our goal was self-serving teams. But teams never tried to install anything in an automated way; they preferred installing things manually to waiting. People didn’t care about the future.

Looking back, we spent more time implementing the system than the system ever saved.

The result is all here: https://github.com/QuickbridgeLtd?utf8=%E2%9C%93&query=Pantry

 

On updating Chef nodes

After almost a year of starting instances, we ran into the problem of keeping chef-client recent on them.

A couple of times we had problems due to bugs in old Chef versions.

The first task was to get a list of all installed chef-client versions on our boxes.

We found an article about it, but the method is dirty:

get the response, transform it to JSON, and parse it with Ruby.

Yuck.

Here is a simpler way, using Ruby:

require 'chef/knife'
require 'chef/search/query'
Chef::Knife.new.configure_chef # loads knife.rb so the Chef server connection is configured
# search returns [results, start, total]; we only need the results
result = Chef::Search::Query.new.search('node', 'name:*')[0]
# count nodes per chef-client version; broken nodes without chef_packages are skipped by the rescue
versions = result.each_with_object(Hash.new(0)) { |node, hash| hash[node['chef_packages']['chef']['version']] += 1 rescue nil }

Additionally, this way we can get the list of broken nodes: they won’t have node['chef_packages'] at all, which is why I used rescue nil.

For cleaning them up, I used a couple more lines:

# nodes that never finished converging have no chef_packages attribute
result.select do |node|
  node['chef_packages'].nil?
end.each do |node|
  # remove both the API client and the node itself from the Chef server
  Chef::ApiClient.load(node.name).destroy if Chef::ApiClient.list.keys.include?(node.name)
  node.destroy
end

Updating was also tricky. There is the omnibus_updater cookbook for keeping Chef up to date, and there is a fork which allows updating Windows nodes. But the easiest way I found was to just update the chef gem on the nodes over winrm/ssh. It is not the cleanest way (old versions of Ruby and old gems), but it is fast and clear.
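
For example, something along these lines (just a sketch: the search queries, user name and password are placeholders, and knife winrm comes from the knife-windows plugin):

knife ssh 'os:linux' 'sudo gem update chef' -x ubuntu
knife winrm 'os:windows' 'gem update chef' -x Administrator -P 'PASSWORD'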

Winbind & Ubuntu 14.04

Recently we decided to update our EC2 instances to Ubuntu 14.04.

We had a simple Chef recipe based on various walkthroughs.

Joining the domain with

net ads join dnscreate@AWS.DOMAIN.COM

failed with

kerberos_kinit_password dnscreate@AWS.DOMAIN.COM failed: Cannot contact any KDC for requested realm

 

The same recipe worked with Test Kitchen. I spent a couple of days trying to find the reason. It turned out that winbind had started to use UDP port 88 instead of TCP port 88. I didn’t find that in the changelog, so beware.
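
If UDP port 88 is blocked between the instances and the domain controllers (which is what this error usually means), one possible workaround, sketched here rather than taken from our recipe, is to force the Kerberos library back to TCP in /etc/krb5.conf (udp_preference_limit is an MIT Kerberos option):

[libdefaults]
    default_realm = AWS.DOMAIN.COM
    # packets larger than this size go over TCP; 1 effectively forces TCP for everything
    udp_preference_limit = 1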

Why I don’t like MySQL

We interviewed several people last week. Standard questions (I hope 😉) about everything. I like to ask questions about SQL and ActiveRecord.

The fastest way to check database knowledge is to ask for a query that uses all the basic statements.
I use something like: select all user groups which have a user with more than 100 tags on photos,

where User(name:string, group:string, id:int)
and Photos and UserPhotos are simple tables.

The first task is to design that schema; not everyone can do it.
But then the MySQL roots show.
Only MySQL lets you select fields that are neither in the GROUP BY nor aggregated.
People who have worked only with MySQL don’t see any problem here: “It will select the data you need”.
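
For reference, here is roughly the answer I expect, plus the MySQL quirk, sketched against made-up tables users(id, name, group_name), photos(id) and user_photos(user_id, photo_id):

-- standard SQL: every selected, non-aggregated column must appear in GROUP BY
select distinct u.group_name
from users u
join user_photos up on up.user_id = u.id
group by u.group_name, u.id
having count(*) > 100;

-- MySQL (without ONLY_FULL_GROUP_BY) happily runs this too, silently returning
-- an arbitrary name per group; other databases reject it
select name, group_name
from users
group by group_name;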

Differences between old and new paging in SQL Server

I’ve added SQL Server 2011 (“Denali”) support to the Rails ActiveRecord adapter.
The main thing I added was the new paging syntax
(more about it here: http://msdn.microsoft.com/en-us/library/ms188385(v=sql.110).aspx).

I want to describe several problems that came up while I was implementing it.

The first difference is that ordering is required when paging is used.

The second difference is that OFFSET must always be present, while the limit (FETCH) is optional.
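
A quick illustration of both points (posts is a hypothetical table):

select * from posts order by id offset 20 rows fetch next 10 rows only;  -- OK
select * from posts offset 20 rows fetch next 10 rows only;              -- error: no ORDER BY
select * from posts order by id offset 20 rows;                          -- OK, offset without a limit
select * from posts order by id fetch next 10 rows only;                 -- error: FETCH requires OFFSET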

The third one is the difference with limit 0.
select top(0) * from some_table
will return an empty result set (just the list of columns).

select * from some_table order by id OFFSET 0 ROWS FETCH FIRST 0 ROWS ONLY
will raise an error.

The fourth difference is ordering combined with grouping.
For example: there are posts and comments tables.
We want to select only the 5 posts where the most recent comments were left.

select distinct top(5) id
from (
  select posts.id, ROW_NUMBER() OVER (ORDER BY comments.posted_at) as rn_id
  from posts left join comments on comments.post_id=posts.id
) help_query
where help_query.rn_id > 1

In “Denali” we want to use only one query. But this one:
select distinct posts.id
from posts left join comments on comments.post_id=posts.id
order by comments.posted_at

is wrong. Error: ORDER BY items must appear in the select list if SELECT DISTINCT is specified.

OK. Let’s do the standard trick with grouping.

select posts.id
from posts left join comments on comments.post_id=posts.id
group by posts.id
order by max(comments.posted_at)

It is almost right, but now the NULL values come first (posts without comments have max(comments.posted_at) = NULL, and SQL Server sorts NULLs first in ascending order).

So the final variant, which pushes the posts without comments to the end, is:
select posts.id
from posts left join comments on comments.post_id=posts.id
group by posts.id
order by CASE WHEN (max(comments.posted_at) is null) then 1 else 0 end, max(comments.posted_at)

SQL or NoSQL

There is one good rule for finding out what kind of database you need.

The LOLCATs rule: if LOLCATs can (and will) be added to the database, NoSQL can be chosen with a clear conscience.

Otherwise, other questions should be checked:
Should the data be consistent?
What kind of statistics do you need? (real-time – NoSQL, data mining – SQL)
Are there lots of relations in the database? Are a lot of joins planned?