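While investigating the "too many open files" issue in ccps on the O2 project,
a number of questions came up, so I ran a test just now:
1. How to check the number of files a process currently has open (the number fluctuates in real time). ccps is used as the example below.
1) Get the PID (process ID) of the program
Run ps -ef | grep ccps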
[root@vvmocmp1 ccps]# ps -ef | grep ccps
root 5661 1 0 20:33 pts/2 00:00:00 /bin/sh /opt/OC/ccps/jboss-4.2.3.GA/bin/run.sh -c all -g ccpsgroup -b 0.0.0.0
root 5685 5661 94 20:33 pts/2 00:00:17 /usr/java/jdk1.6.0_13/bin/java -Dprogram.name=run.sh -server -Dcom.sun.management.jmxremote -Djava.awt.headless=true -Xms1024m -Xmx1024m -XX:PermSize=64m -XX:MaxPermSize=256m -Djava.util.logging.config.file=/opt/OC/ccps/jboss-4.2.3.GA/server/all/ccps/applicationContext/logging.properties -Xrunjdwp:transport=dt_socket,address=8787,server=y,suspend=n -Djava.net.preferIPv4Stack=true -Djava.endorsed.dirs=/opt/OC/ccps/jboss-4.2.3.GA/lib/endorsed -classpath /opt/OC/ccps/jboss-4.2.3.GA/server/all/ccps/applicationContext:/opt/OC/ccps/jboss-4.2.3.GA/bin/run.jar:/usr/java/jdk1.6.0_13/lib/tools.jar org.jboss.Main -c all -g ccpsgroup -b 0.0.0.0
From the output above, the current ccps process is owned by root and its PID is 5685.
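To avoid reading the PID off the ps output by hand, the lookup can be scripted. The following is only a minimal sketch, assuming pgrep (from the procps package) is installed; find_pids is a hypothetical helper name, and the pattern org.jboss.Main is taken from the java command line shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs of all processes whose full command
# line matches the given pattern. "pgrep -f" matches against the whole
# command line and never reports the pgrep process itself, so the usual
# "grep -v grep" step is not needed.
find_pids() {
    pgrep -f "$1"
}

# Example (assumption: only the java process contains org.jboss.Main,
# the run.sh wrapper does not):
#   find_pids 'org\.jboss\.Main'
```

Matching on org.jboss.Main rather than on ccps should return only the java process (5685 in the listing above) and skip the run.sh wrapper, whose command line also contains ccps.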
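2) Use that PID to get the number of files the process has open in real time
ls -l /proc/5685/fd/ | wc -l
(Note: many online sources say that lsof -p pid can be used to count the files a process has open, but in my tests that count was not accurate.
The former command is the one that tracks the real value: as soon as it exceeds the ulimit -n value, the too many open files error appears.)
The root user is subject to the ulimit restriction as well.
In the current session, ulimit -n shows the limit for the current user; the default is 1024.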
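The check above can be wrapped in two small helpers. This is a sketch under stated assumptions, not the method used in the test: count_fds and soft_nofile are hypothetical names, and soft_nofile assumes the kernel exposes /proc/<pid>/limits (Linux 2.6.24 and later). Note that ls -l prints a leading "total" line, so ls -l ... | wc -l reports one more than the actual number of descriptors; the helper uses plain ls to avoid that.

```shell
#!/bin/sh
# Count the descriptors a process holds: /proc/<pid>/fd has one entry per
# open descriptor. Plain "ls" (no -l) avoids the extra "total" line.
count_fds() {
    ls "/proc/$1/fd" | wc -l
}

# Print the soft "Max open files" limit that the running process actually
# has. In /proc/<pid>/limits the line looks like
#   Max open files   <soft>   <hard>   files
# so the soft value is the 4th whitespace-separated field.
soft_nofile() {
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}
```

For the process above, count_fds 5685 and soft_nofile 5685 can be printed side by side (for example in a while sleep 1 loop) to watch how close the process is to its limit. Reading the limit from /proc/<pid>/limits shows the value of the process itself, which may differ from ulimit -n in the current session if the process was started elsewhere.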
2. Check the maximum number of open files per process for the current user (ulimit -n, 1024 by default); ulimit -a shows it as well
[root@vvmocmp1 ~]# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
pending signals (-i) 1024
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 143359
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
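The open files row above is the soft limit, which is the value actually enforced on a process. There is also a hard limit, the ceiling up to which an unprivileged user may raise the soft limit. A minimal sketch for printing both (show_nofile_limits is a hypothetical helper name):

```shell
#!/bin/sh
# Print the soft and hard open-file limits of the current shell.
# -S selects the soft limit, -H the hard limit; -n is the open-files resource.
show_nofile_limits() {
    echo "soft=$(ulimit -Sn) hard=$(ulimit -Hn)"
}

show_nofile_limits
```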
3. Change the maximum number of open files for the user
Add the following to /etc/security/limits.conf:
root soft nofile 500
root hard nofile 500
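Explanation: the goal of the experiment is to make the ccps process owned by root hit too many open files, so the value is deliberately set very low, to 500.
Because ccps is owned by root, the limit is set for the root user.
After the change, log in to the server again as root; the new limit is then in effect and can be checked with ulimit -n.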
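A new login is needed because resource limits are inherited from the parent process, and limits.conf is only applied when a session is opened. As an alternative that was not part of the original test, the soft limit can be lowered for a single command inside a subshell, which leaves both limits.conf and the calling session untouched. A minimal sketch, with run_with_nofile as a hypothetical helper name:

```shell
#!/bin/sh
# Run a command with a lowered soft open-file limit.
# The parentheses start a subshell, so the ulimit change applies only to
# that subshell and to the command exec'ed from it, not to the caller.
run_with_nofile() {
    limit=$1
    shift
    ( ulimit -Sn "$limit" && exec "$@" )
}

# Example (path taken from the ps output above; whether ccps needs further
# environment setup is not covered here):
#   run_with_nofile 500 /opt/OC/ccps/jboss-4.2.3.GA/bin/run.sh -c all -g ccpsgroup -b 0.0.0.0
```

Only the soft limit is lowered here, so the started process could still raise it again up to the hard limit; the limits.conf approach in step 3 sets both.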
4. ccps must be started from the session re-opened in step 3; then as soon as ccps has more than 500 files open at the same time, it fails.
[root@vvmocmp1 ccps]# tail -f ./log/ccps.log
at org.springframework.util.MethodInvoker.invoke(MethodInvoker.java:270)
at org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean$MethodInvokingJob.executeInternal(MethodInvokingJobDetailFactoryBean.java:224)
at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86)
at org.quartz.core.JobRunShell.run(JobRunShell.java:203)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:520)
Caused by: java.io.IOException: java.io.IOException: error=24, Too many open files
at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
at java.lang.ProcessImpl.start(ProcessImpl.java:65)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
... 13 more
[12-22 20:38:20,564] WARN [DefaultQuartzScheduler_Worker-8] SystemResourceMonitor.updateDiskInfoMap(826) | Can't execute "/bin/df -lP /var/opt/OC/ccps/cdr/local/" command
java.io.IOException: Cannot run program "/bin/df": java.io.IOException: error=24, Too many open files
at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
at java.lang.Runtime.exec(Runtime.java:593)
at java.lang.Runtime.exec(Runtime.java:431)
at java.lang.Runtime.exec(Runtime.java:328)
at com.hp.opencall.ccps.ccs.performance.SystemResourceMonitor.updateDiskInfoMap(SystemResourceMonitor.java:824)
At this point the count from ls -l is around 500, whereas the count from lsof -p had already passed 800 without any error being raised.
[root@vvmocmp1 ccps]# ls -l /proc/5685/fd/ | wc -l
489
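One likely reason for the gap between the two counts is that lsof -p also lists entries that are not file descriptors: the current directory (cwd), the root directory (rtd), the program text (txt) and memory-mapped files such as shared libraries (mem). These rows do not count against ulimit -n. The sketch below filters lsof output down to real descriptors; count_fd_rows is a hypothetical helper name, and it assumes the default lsof column layout in which FD is the 4th column.

```shell
#!/bin/sh
# Read "lsof -p <pid>" output on stdin and count only the rows whose FD
# column (4th field) starts with a digit, e.g. 3u or 10r. Rows marked
# cwd, rtd, txt or mem are skipped because they are not descriptors.
# "n + 0" makes awk print 0 instead of an empty line when nothing matches.
count_fd_rows() {
    awk 'NR > 1 && $4 ~ /^[0-9]/ { n++ } END { print n + 0 }'
}

# Example:
#   lsof -p 5685 | count_fd_rows
```

If the assumption holds, the filtered number should be close to the /proc/5685/fd count, minus the extra "total" line that ls -l adds.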
Note: once the test is finished, be sure to remove the two lines added above.
Author: RogerZhuo