Monday, October 22, 2012

Varnish ~ sometimes fails but doesn't tell


We were playing around with the automation around the well-known Varnish Cache service and stumbled upon something... the silent-killer style of Varnish Cache.

The automation generated backend names by appending each service node's host name to the service name, and then load-balanced across those backends using a "director ... round-robin" configuration. It built names that way to avoid collisions when the same service runs on different nodes load-balanced by Varnish Cache.
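
To illustrate how quickly such names grow, here is a minimal sketch of that kind of naming scheme. The function name, the example host and service names, and the rule of replacing every character other than letters, digits and underscores with an underscore are all assumptions made for illustration; our actual automation is not shown here.

```shell
# Hypothetical helper: build a backend name from a node host name and a service name.
# Characters other than letters, digits and underscores become "_" (an assumption,
# chosen so the result stays a simple identifier).
make_backend_name() {
    # $1 = node host name, $2 = service name
    printf '%s_%s\n' "$1" "$2" | tr -c 'A-Za-z0-9_\n' '_'
}

# Made-up host and service names, only to show how long the result gets.
name=$(make_backend_name "app-node-01.dc1.example.com" "search-service")
echo "$name (${#name} characters)"
```

Even a fairly ordinary fully qualified host name plus a service name produces a long identifier this way.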

We checked the configuration for correctness:
$ varnishd -C -f /my/varnish/config/file

It passed.

We started the Varnish service
$ /etc/init.d/varnish start

It started.

We tried accessing the services via Varnish.

It failed, saying there was no HTTP service running at the Varnish machine:port.

Now what? We couldn't find anything wrong in the VCL configuration, so we started looking at the logs.
We tried starting the varnishlog service:

$ service varnishlog start

This failed with an error saying _.vsm was not present, although the file was actually present and had the right user permissions.

Then a colleague of mine suggested that he had faced such an issue before, caused by extremely long backend names.

I drastically shortened the backend names and it started working.

So the backend name length does have an effect, yet Varnish Cache gives no error about it when checking the VCL or starting up.

BTW, from the checks performed... the longest backend name that worked with this configuration was 44 characters.
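
Since Varnish stays quiet about it, a small pre-flight check in the automation could catch over-long names before the service is started. The following is only a sketch: the function name is made up, it assumes backends are declared on lines of the form `backend <name> {`, and the default limit of 44 is just the value observed above, not a documented constant.

```shell
# Hypothetical pre-flight check: print backend names longer than a limit.
# The default limit (44) is the value observed in this post, not an official one.
long_backend_names() {
    # $1 = path to the VCL file, $2 = maximum allowed length (defaults to 44)
    awk -v max="${2:-44}" '
        $1 == "backend" {
            name = $2
            sub(/[{].*/, "", name)   # drop a "{" glued directly to the name
            if (length(name) > max) print name, length(name)
        }' "$1"
}
```

Running `long_backend_names /my/varnish/config/file` prints each offending name with its length, and prints nothing when every name is within the limit.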