A programmer's notes, about everything and nothing. Though probably more on the professional side.

2013-08-01

Where to start troubleshooting

Suppose you have been handed a Unix system and asked to figure out why it is slow, glitchy, or crashing. Where do you start?

A handy cheat sheet for the sysadmin:

A few “must have”:
What exactly are the symptoms of the issue? Unresponsiveness? Errors?
When did the problem start being noticed?
Is it reproducible?
Any pattern (e.g. happens every hour)?
What were the latest changes on the platform (code, servers, stack)?
Does it affect a specific user segment (logged in, logged out, geographically located…)?
Is there any documentation for the architecture (physical and logical)?
Is there a monitoring platform? Munin, Zabbix, Nagios, New Relic… Anything will do.
Any (centralized) logs? Loggly, Airbrake, Graylog…
Who’s there?
$ w
$ last
What was previously done?
$ history
What is running?
$ pstree -a
$ ps aux
Listening services
$ netstat -ntlp
$ netstat -nulp
$ netstat -nxlp
CPU and RAM
$ free -m
$ uptime
$ top
$ htop
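One quick way to read the `uptime` numbers is to divide the load average by the number of CPUs. The sketch below assumes a Linux box; the helper name `load_per_cpu` is made up for illustration and is not part of the original cheat sheet.

```shell
# load_per_cpu LOAD NCPU: print the load average divided by the CPU count.
# Hypothetical helper, shown only to illustrate how to read the load figures.
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN { printf "%.2f\n", load / ncpu }'
}
```

Possible usage on Linux: `load_per_cpu "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"`. A value well above 1.00 suggests that tasks are queuing for CPU or, since the Linux load average also counts tasks in uninterruptible sleep, waiting on IO.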
Hardware
$ lspci
$ dmidecode
$ ethtool eth0 /* substitute your interface name */
IO Performances
$ iostat -kx 2
$ vmstat 2 10
$ mpstat 2 10
$ dstat --top-io --top-bio
Mount points and filesystems
$ mount
$ cat /etc/fstab
$ vgs
$ pvs
$ lvs
$ df -h
$ lsof +D / /* beware not to kill your box */
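When the `df` listing is long, it can help to filter it down to the filesystems that are nearly full. This is a minimal sketch that parses the POSIX output format of `df -P`; the function name `full_filesystems` and the 90% threshold in the usage line are assumptions, and mount points containing spaces are not handled.

```shell
# full_filesystems THRESHOLD: read `df -P` output on stdin and print
# "mountpoint percent" for every filesystem at or above THRESHOLD percent used.
# Hypothetical helper; assumes the six-column `df -P` layout.
full_filesystems() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)            # strip the trailing percent sign
        if (use + 0 >= limit + 0)    # force a numeric comparison
            print $6, use "%"
    }'
}
```

Possible usage: `df -P | full_filesystems 90`. With GNU df, feeding it `df -Pi` instead should flag inode exhaustion in the same way.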
Kernel, interrupts and network usage
$ sysctl -a | grep ...
$ cat /proc/interrupts
$ cat /proc/net/ip_conntrack /* may take some time on busy servers; on newer kernels the file is /proc/net/nf_conntrack */
$ netstat
$ ss -s
System logs and kernel messages
$ dmesg
$ less /var/log/messages
$ less /var/log/secure
$ less /var/log/auth.log
Cronjobs
$ ls /etc/cron* /* then cat the files that look relevant */
$ for user in $(cut -f1 -d: /etc/passwd); do crontab -l -u "$user"; done
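The per-user loop above prints a "no crontab" complaint for most accounts. A slightly tidier variant might look like the sketch below; the function names `list_users` and `dump_crontabs` are made up for illustration, and `crontab -u` needs root.

```shell
# list_users PASSWD_FILE: print the login names from a passwd-format file.
list_users() {
    cut -d: -f1 "$1"
}

# dump_crontabs PASSWD_FILE: print each user's crontab under a header line,
# silently skipping users for whom `crontab -l` fails (typically: no crontab).
# Hypothetical helper; run as root so that `crontab -u` is permitted.
dump_crontabs() {
    for user in $(list_users "$1"); do
        if tab=$(crontab -l -u "$user" 2>/dev/null); then
            printf '### %s\n%s\n' "$user" "$tab"
        fi
    done
}
```

Possible usage: `dump_crontabs /etc/passwd`. Accounts that come from LDAP or NIS do not appear in `/etc/passwd`; on such systems the output of `getent passwd` is the more complete source.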
Application logs
There is a lot to analyze here, but it’s unlikely you’ll have time to be exhaustive at first. Focus on the obvious ones, for example in the case of a LAMP stack:
Apache & Nginx; chase down access and error logs, look for 5xx errors, look for possible limit_zone errors.
MySQL; look for errors in the mysql.log, traces of corrupted tables, an InnoDB repair process in progress. Look at the slow logs and determine whether there are disk/index/query issues.
PHP-FPM; if you have php-slow logs on, dig in and try to find errors (php, mysql, memcache, …). If not, turn them on.
Varnish; in varnishlog and varnishstat, check your hit/miss ratio. Are you missing some rules in your config that let end-users hit your backend instead?
HA-Proxy; what is your backend status? Are your health-checks successful? Do you hit your max queue size on the frontend or your backends?
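As a first pass over the web server logs, counting the 5xx responses is often enough to tell whether the backend is failing. The sketch below assumes the common or combined access log format, where the status code is the 9th whitespace-separated field; the function name `count_5xx` is hypothetical, and a custom log format will need a different field number.

```shell
# count_5xx: read access-log lines on stdin and print how many of them
# carry a 5xx HTTP status. Assumes the status is the 9th field, as in the
# common/combined log format. Hypothetical helper for illustration.
count_5xx() {
    awk '$9 ~ /^5[0-9][0-9]$/ { n++ } END { print n + 0 }'
}
```

Possible usage: `count_5xx < /var/log/nginx/access.log` (the path is a typical default, not something the original specifies). Running it over successive rotated logs gives a rough idea of when the errors started.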

Conclusion
After these first 5 minutes (give or take 10 minutes) you should have a better understanding of:
What is running.
Whether the issue seems to be related to IO/hardware/networking or configuration (bad code, kernel tuning, …).
Whether there’s a pattern you recognize: for example a bad use of the DB indexes, or too many apache workers.



The author says that all of this preliminary investigation takes 5 to 15 minutes. Maybe so, if you do it at least once a week and do not slip into procrastination.




original post http://vasnake.blogspot.com/2013/08/blog-post.html
