I've noticed that RAM usage on the cluster nodes also creeps up over a much longer period, about a year. I had no idea how to check RAM usage on these devices, so I had to find out.
The cluster runs Check Point SecurePlatform, so it is just the software installed on some HP servers. They run a Linux-based OS.
I was able to run top on the server and could see that the cpd process was taking up most of the RAM. I assume this is the Check Point daemon.
I then ran the following two Check Point commands. The Check Point documentation asks you to look for failed allocations in the output. If you see any, there is a memory problem such as a shortage or a misconfiguration. If there are none, as was the case here, the growth is most likely a memory leak.
# fw ctl pstat
Machine Capacity Summary:
Memory used: 1% (29MB out of 1620MB) - below low watermark
Concurrent Connections: 0% (58 out of 24900) - below low watermark
Aggressive Aging is not active
Hash kernel memory (hmem) statistics:
Total memory allocated: 20971520 bytes in 5115 4KB blocks using 5 pools
Total memory bytes used: 3217916 unused: 17753604 (84.66%) peak: 5043180
Total memory blocks used: 1013 unused: 4102 (80%) peak: 1351
Allocations: 1213799129 alloc, 0 failed alloc, 1213766746 free
System kernel memory (smem) statistics:
Total memory bytes used: 43472216 peak: 55769708
Blocking memory bytes used: 1403176 peak: 1440356
Non-Blocking memory bytes used: 42069040 peak: 54329352
Allocations: 220680 alloc, 0 failed alloc, 219982 free, 0 failed free
Kernel memory (kmem) statistics:
Total memory bytes used: 25670260 peak: 39394220
Allocations: 1214019254 alloc, 0 failed alloc, 1213986426 free, 0 failed free
External Allocations: 5124 for packets, 0 for SXL
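If you save this output to a file, the failed-allocation check can be scripted. The following is only a rough sketch in Python, meant to be run on a workstation against the saved text. The function name `failed_allocations` is my own, and it assumes the output keeps the "(hmem|smem|kmem) statistics" headers and the "N alloc, N failed alloc, N free" wording shown above. It collapses whitespace first, so wrapped lines do not matter.

```python
import re

# One match per pool: the pool name from its header, then the first
# "N alloc, N failed alloc, N free" triple that follows it.
POOL_ALLOCATIONS = re.compile(
    r"\((hmem|smem|kmem)\) statistics:.*?"
    r"(\d+) alloc, (\d+) failed alloc, (\d+) free"
)


def failed_allocations(pstat_text):
    """Return {pool: (alloc, failed_alloc, free)} from saved pstat output.

    Hypothetical helper, not a Check Point tool.
    """
    # Collapse all whitespace so wrapped console lines become one string.
    flat = " ".join(pstat_text.split())
    return {
        pool: (int(alloc), int(failed), int(free))
        for pool, alloc, failed, free in POOL_ALLOCATIONS.findall(flat)
    }


# Allocation lines copied from the output above.
sample = """
Hash kernel memory (hmem) statistics:
Allocations: 1213799129 alloc, 0 failed alloc, 1213766746 free
System kernel memory (smem) statistics:
Allocations: 220680 alloc, 0 failed alloc, 219982 free, 0 failed free
Kernel memory (kmem) statistics:
Allocations: 1214019254 alloc, 0 failed alloc, 1213986426 free, 0 failed free
"""

for pool, (alloc, failed, free) in failed_allocations(sample).items():
    status = "PROBLEM" if failed else "ok"
    print(f"{pool}: {failed} failed alloc ({status}), {alloc - free} outstanding")
```

All three pools report 0 failed allocations here, which is why I suspected a leak instead of a shortage.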
# cpstat os -f memory
Total Virtual Memory (Bytes): 4271108096
Active Virtual Memory (Bytes): 1696493568
Total Real Memory (Bytes): 2123681792
Active Real Memory (Bytes): 1696399360
Free Real Memory (Bytes): 427282432
Memory Swaps/Sec: -
Memory To Disk Transfers/Sec: -
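To turn the cpstat byte counts into something easier to read, here is a small Python sketch that parses saved output and works out how much real memory is in use. The helper name `parse_cpstat_memory` is made up for this example, and it assumes the "Name (Bytes): value" layout shown above. It tolerates a value that has wrapped onto the next line.

```python
import re

# "Some Name (Bytes): 12345", where the value may sit on the next line.
BYTES_FIELD = re.compile(r"([A-Za-z ]+) \(Bytes\):\s*(\d+)")


def parse_cpstat_memory(text):
    """Return {field name: bytes} from saved `cpstat os -f memory` output.

    Hypothetical helper, not part of any Check Point tooling.
    """
    return {name.strip(): int(value) for name, value in BYTES_FIELD.findall(text)}


# Values copied from the output above.
sample = """
Total Virtual Memory (Bytes): 4271108096
Active Virtual Memory (Bytes): 1696493568
Total Real Memory (Bytes): 2123681792
Active Real Memory (Bytes): 1696399360
Free Real Memory (Bytes):
427282432
"""

mem = parse_cpstat_memory(sample)
total = mem["Total Real Memory"]
active = mem["Active Real Memory"]
free = mem["Free Real Memory"]

print(f"Real memory in use: {100 * active / total:.1f}%")
print(f"Free real memory:   {free / 1024 ** 2:.0f} MB")
```

On these figures, active plus free real memory adds up exactly to the total, and roughly 80% of real memory is in use.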
To clear the leak you can run "cpstop; cpstart" or reboot the device.
Make sure you have DRAC/iLO or physical access to the box before you do.
When logging a call with Check Point support, they will usually ask for a cpinfo:
# cpinfo -o mycpinfo.tgz
To see which node is active in a cluster:
# cphaprob stat
Logs are usually in /var/log on the active node.
From the Check Point documentation ("Performing a SecurePlatform Firewall Health Check", Section 1, Physical Platform Checks, page 10):
Presence of hmem failed allocations indicates that the hash kernel memory was full. This is not a serious memory problem but indicates there is a configuration problem. The value assigned to the hash memory pool (either manually or automatically by changing the number of concurrent connections in the capacity optimization section of a firewall) determines the size of the hash kernel memory. If a low hmem limit was configured it leads to improper usage of the OS memory. See "Capacity Optimization" in the "Firewall Health Checks" section for further information.
Presence of smem failed allocations indicates that the OS memory was exhausted or there are large non-sleep allocations. This is symptomatic of a memory shortage. If there are failed smem allocations and the memory is less than 2 GB, upgrading to 2GB may fix the problem. Decreasing the TCP end timeout and decreasing the number of concurrent connections can also help reduce memory consumption.
Presence of kmem failed allocations means that some applications did not get memory. This is usually an indication of a memory problem; most commonly a memory shortage. (The natural limit is 2GB, since the kernel is 32-bit.)
Memory shortage sometimes indicates a memory leak. In order to troubleshoot a memory shortage, you need to stop the load and let connections close. If the memory consumption returns to normal, you are not dealing with a memory leak. Such a shortage might happen when traffic volumes are too high for the device capacity. If the memory shortage happens after a change in the system or the environment, undo the change, and check whether kmem memory consumption goes down.
For optimum performance there should not be any failed memory allocations.
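The leak test described in the documentation (stop the load, let connections close, then see whether memory returns to normal) boils down to comparing two readings. Here is a minimal Python sketch of that comparison. The function name, the 10% tolerance and the example readings are all my own assumptions, not values from Check Point.

```python
def classify_after_idle(baseline_bytes, idle_bytes, tolerance=0.10):
    """Compare memory used after the load has stopped with a known-good baseline.

    Hypothetical helper. `tolerance` is an arbitrary allowance (10% by
    default) for normal variation between readings.
    """
    if baseline_bytes <= 0:
        raise ValueError("baseline_bytes must be positive")
    if idle_bytes <= baseline_bytes * (1 + tolerance):
        # Usage dropped back to normal once connections closed.
        return "load-related shortage"
    # Usage stayed high with no load on the box.
    return "possible leak"


# Made-up readings in bytes, for illustration only.
print(classify_after_idle(baseline_bytes=25_000_000, idle_bytes=26_000_000))
print(classify_after_idle(baseline_bytes=25_000_000, idle_bytes=39_000_000))
```

The readings could come from the "Total memory bytes used" lines of fw ctl pstat, taken once on a healthy box and again after the load has been stopped.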