On updating chef nodes

After almost a year of starting instances we got the problem of keeping chef-clients recent on instances.

Couple times we had problems due to bugs in old chefs.

First task was to get list of all installed chef-clients on our boxes.

We found the article about it. But the method is dirty:

Get response in Ruby, transform to json, parse it using Ruby.

Yak.

Here is simpler way using Ruby

require 'chef/knife'
Chef::Knife.new.configure_chef
result = Chef::Search::Query.new.search('node', 'name:*')[0]
versions = result.each_with_object(Hash.new(0)) {|node, hash| hash[node['chef_packages']['chef']['version']] += 1 rescue nil}

Additionally with that way we can get the list of broken nodes, those won’t have node[‘chef_packages’] at all, that is why I used rescue nil.

For cleaning, I used couple more lines:

result.select do |node|
  node['chef_packages'].nil?
end.each do |node|
  Chef::ApiClient.load(node.name).destroy if Chef::ApiClient.list.keys.include?(node.name)
  node.destroy
end

Updating was also tricky. There is omnibus_updater cookbook for keeping chef up-to-date. There is a fork which allows update windows nodes. But the easiest way I found is just update chef gem on nodes using winrm/ssh. It is not cleanest way (old versions of Ruby and old gems), but still it is fast and clear.

Differences in old and new pagings in SQL Server

I’ve added SQL SERVER 2011 support to Rails ActiveRecord adapter.
Main thing that I added was new paging.
(more about it here http://msdn.microsoft.com/en-us/library/ms188385(v=sql.110).aspx)

I want to describe several problems that occurred while I implemented this.

First difference that ordering is necessary when paging is used.

Second difference is that offset must be used always, while limit isn’t necessary.

Third one is difference in limit 0.
select top(0) * from some_table
will return a list of columns.

select * from some_table order by id OFFSET 0 ROWS FETCH FIRST 0 ROWS ONLY
will raise error.

Forth difference with ordering with grouping.
For example: there are posts and comments tables.
We want select only 5 after latest posts where last comments was left.

select top(5) distinct id
from (
select posts.id, ROW_NUMBER() OVER (ORDER BY comments.posted_at) as rn_id
from posts left join comments on comments.post_id=posts.id) help_query
where help_query.rn_id > 1

In “Denali” we use only one query.
select distinct posts.id
from posts left join comments on comments.post_id=posts.id
order by comments.posted_at

is wrong. Error: ORDER BY items must appear in the select list if SELECT DISTINCT is specified.

OK. Lets do standard trick with grouping.

select posts.id
from posts left join comments on comments.post_id=posts.id
group by posts.id
order by max(comments.posted_at)

It is almost right. But now null values on first place.

So last variant is:
select posts.id
from posts left join comments on comments.post_id=posts.id
group by posts.id
order by CASE WHEN (max(comments.posted_at) is null) then 1 else 0 end DESC, max(comments.posted_at)