JacksBlog: memory leaks on checkpoint R70

I've had an ongoing issue with a Check Point R70 cluster: the RAM usage creeps up on the management node of the cluster, and it needs to be rebooted every 4 months. The device is currently out of support contract, so I can't get any support, hotfixes or updates from Check Point.

I've noticed that the nodes in the cluster also have RAM usage creeping up, but over a much longer period of about 1 year. I had no idea how to check the RAM usage on these devices, so I had to find out.
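A creep that takes months is easier to see if you log a sample regularly. Here is a rough sketch of how that could be done on a Linux-based box, assuming /proc/meminfo is readable and has the usual MemTotal, MemFree, Buffers and Cached lines. The function names and the log path are made up for illustration; they are not Check Point tools.

```shell
# mem_used_kb [MEMINFO]: print memory in use (kB), not counting buffers/cache.
# MEMINFO defaults to /proc/meminfo; the argument exists so it can be tested.
mem_used_kb() {
    awk '
        /^MemTotal:/ { total = $2 }
        /^MemFree:/  { free = $2 }
        /^Buffers:/  { buffers = $2 }
        /^Cached:/   { cached = $2 }
        END { print total - free - buffers - cached }
    ' "${1:-/proc/meminfo}"
}

# log_mem_sample LOGFILE [MEMINFO]: append "date time used_kB" to LOGFILE.
log_mem_sample() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" \
        "$(mem_used_kb "${2:-/proc/meminfo}")" >> "$1"
}

# Hypothetical usage, e.g. once a day from cron:
#   log_mem_sample /var/log/mem_trend.log
```

If the used figure keeps climbing from one week to the next while traffic stays about the same, that points towards a leak rather than load.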

The cluster runs Check Point's SecurePlatform, so it's just the software installed on some HP servers. They run a sort of Linux OS.

I was able to run top on the server and could see the cpd process was taking up most of the RAM. I assume this is the Check Point daemon.
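If you want the same information as a one-off listing rather than watching top, something like the following might work. It is only a sketch: it assumes a Linux /proc where each process has a status file with Name and VmRSS lines, and top_rss is a name I made up for illustration.

```shell
# top_rss [PROC_ROOT]: list the five processes with the largest resident
# memory as "rss_kB name". PROC_ROOT defaults to /proc.
top_rss() {
    proc_root=${1:-/proc}
    for status in "$proc_root"/[0-9]*/status; do
        [ -r "$status" ] || continue
        # Kernel threads have no VmRSS line, so they are skipped.
        awk '
            /^Name:/  { name = $2 }
            /^VmRSS:/ { rss = $2 }
            END { if (rss != "") print rss, name }
        ' "$status" 2>/dev/null
    done | sort -rn | head -n 5
}

# Usage: top_rss
```

On a box with the problem described here you would expect cpd to be the first line.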

I ran the following two commands from Check Point. The Check Point documentation asks you to look for failed allocations. If you see any, there is a memory shortage or a configuration problem. Otherwise it is most likely a memory leak. In my output below every "failed alloc" counter is 0.

# fw ctl pstat

Machine Capacity Summary:
Memory used: 1% (29MB out of 1620MB) - below low watermark
Concurrent Connections: 0% (58 out of 24900) - below low watermark
Aggressive Aging is not active

Hash kernel memory (hmem) statistics:
Total memory allocated: 20971520 bytes in 5115 4KB blocks using 5 pools
Total memory bytes used: 3217916 unused: 17753604 (84.66%) peak: 504318 0
Total memory blocks used: 1013 unused: 4102 (80%) peak: 1351
Allocations: 1213799129 alloc, 0 failed alloc, 1213766746 free

System kernel memory (smem) statistics:
Total memory bytes used: 43472216 peak: 55769708
Blocking memory bytes used: 1403176 peak: 1440356
Non-Blocking memory bytes used: 42069040 peak: 54329352
Allocations: 220680 alloc, 0 failed alloc, 219982 free, 0 failed free

Kernel memory (kmem) statistics:
Total memory bytes used: 25670260 peak: 39394220
Allocations: 1214019254 alloc, 0 failed alloc, 1213986426 free, 0 failed free
External Allocations: 5124 for packets, 0 for SXL
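Picking the failed allocation counters out of that output by eye gets tedious, so here is a small sketch that does it with awk. It assumes the output looks like the sample above (a header containing (hmem), (smem) or (kmem), followed by an "Allocations:" line). check_failed_allocs is a hypothetical helper name, not a Check Point command.

```shell
# check_failed_allocs: read `fw ctl pstat` output on stdin and print the
# "failed alloc" counter for each pool. Exit status is 1 if any is non-zero.
check_failed_allocs() {
    awk '
        /\(hmem\)/ { pool = "hmem" }
        /\(smem\)/ { pool = "smem" }
        /\(kmem\)/ { pool = "kmem" }
        /^[ \t]*Allocations:/ && pool != "" {
            # Line format: "... N alloc, M failed alloc, ..."
            # Print the number that sits just before "failed alloc".
            for (i = 1; i < NF; i++) {
                if ($(i + 1) == "failed" && $(i + 2) ~ /^alloc/) {
                    print pool, $i
                    if ($i + 0 > 0) bad = 1
                }
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Usage on the firewall:
#   fw ctl pstat | check_failed_allocs
```

For the output above this would print 0 for all three pools, which is why I suspect a leak rather than a shortage.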

# cpstat os -f memory

Total Virtual Memory (Bytes): 4271108096
Active Virtual Memory (Bytes): 1696493568
Total Real Memory (Bytes): 2123681792
Active Real Memory (Bytes): 1696399360
Free Real Memory (Bytes): 427282432
Memory Swaps/Sec: -
Memory To Disk Transfers/Sec: -
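The raw byte counts are hard to compare at a glance, so it can help to turn them into a percentage. A minimal sketch, assuming the "Total Real Memory" and "Active Real Memory" lines look like the ones above; real_mem_pct is a made-up helper name.

```shell
# real_mem_pct: read `cpstat os -f memory` output on stdin and print the
# percentage of real memory that is active, to one decimal place.
real_mem_pct() {
    awk -F': *' '
        /^[ \t]*Total Real Memory/  { total  = $2 }
        /^[ \t]*Active Real Memory/ { active = $2 }
        END {
            if (total + 0 > 0) printf "%.1f\n", 100 * active / total
            else exit 1
        }
    '
}

# Usage on the firewall:
#   cpstat os -f memory | real_mem_pct
```

With the figures above, 1696399360 active out of 2123681792 total works out to roughly 80% of real memory in use.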

To clear the leak you can run "cpstop; cpstart" or reboot the device. Either way the Check Point services go down for a while, so make sure you have DRAC/iLO or physical access to the box first.

When logging a call with CP support, they will usually ask for a cpinfo:
cpinfo -o mycpinfo.tgz

To see which node is active in a cluster:
cphaprob stat

Logs are usually in /var/log on the active node

From the Check Point documentation ("Performing a SecurePlatform Firewall Health Check", Section 1 – Physical Platform Checks, page 10):

Presence of hmem failed allocations indicates that the hash kernel memory was full. This is not a serious memory problem but indicates there is a configuration problem. The value assigned to the hash memory pool (either manually or automatically by changing the number of concurrent connections in the capacity optimization section of a firewall) determines the size of the hash kernel memory. If a low hmem limit was configured it leads to improper usage of the OS memory. See "Capacity Optimization" in the "Firewall Health Checks" section for further information.

Presence of smem failed allocations indicates that the OS memory was exhausted or there are large non-sleep allocations. This is symptomatic of a memory shortage. If there are failed smem allocations and the memory is less than 2 GB, upgrading to 2GB may fix the problem. Decreasing the TCP end timeout and decreasing the number of concurrent connections can also help reduce memory consumption.

Presence of kmem failed allocations means that some applications did not get memory. This is usually an indication of a memory problem; most commonly a memory shortage. (The natural limit is 2GB, since the kernel is 32-bit.)

Memory shortage sometimes indicates a memory leak. In order to troubleshoot memory shortage, you need to stop the load and let connections close. If the memory consumption returns to normal, you are not dealing with a memory leak. Such shortage might happen when traffic volumes are too high for the device capacity. If the memory shortage happens after a change in the system or the environment, undo the change, and check whether kmem memory consumption goes down.
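One way to follow that advice is to record the kmem "bytes used" figure before and after the load is stopped and compare the two. The sketch below only does the extraction; it assumes the kmem section of `fw ctl pstat` looks like the sample earlier in this post, and kmem_used is a hypothetical helper name.

```shell
# kmem_used: read `fw ctl pstat` output on stdin and print the
# "Total memory bytes used" value from the kmem section only.
kmem_used() {
    awk '
        /\(kmem\)/ { in_kmem = 1; next }
        in_kmem && /Total memory bytes used:/ { print $5; exit }
    '
}

# Hypothetical usage:
#   before=$(fw ctl pstat | kmem_used)
#   ... stop the load and wait for connections to close ...
#   after=$(fw ctl pstat | kmem_used)
#   echo "kmem change: $((after - before)) bytes"
```

If the figure drops back once connections have closed, the documentation says you are looking at load rather than a leak; if it stays high, a leak is more likely.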

For optimum performance there should not be any failed memory allocations.

Friday, 6 September 2013
