Archive for the 'solaris' Category

Solaris multi-homed hosts on separate subnets

Solaris, like most UNIX-type hosts, can have multiple network cards in one system. It gets tricky when you have 2 interfaces on different subnets, since you can only have 1 default router.

Consider this example:

A Solaris server has 2 network interfaces, bge0 and bge2. bge0 has an IP of 192.168.1.1, and the router on that network is 192.168.1.254. bge2 has an IP of 192.168.100.1, and the router on that network is 192.168.100.254. The default router set in /etc/defaultrouter is 192.168.1.254.

When a packet comes in for 192.168.100.1, Solaris processes it and sends the reply out through the default router. It knows nothing about the router on the 2nd network. If you add the 2nd router to /etc/defaultrouter, Solaris just round-robins between the two routers. So a request comes in on bge2 and the reply goes out bge0 to the default router, carrying bge2’s IP as its source address. If that router is configured with anti-spoofing rules, it will drop the packet, and the reply never reaches the client.
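To make the mismatch concrete, here is a small sketch that checks whether a reply's source address sits on the same /24 as the router it is handed to. The function names are made up for illustration, it assumes a POSIX shell such as ksh, and it only handles the ffffff00 netmask used in this example.

```sh
# Hypothetical helper: succeeds when two IPv4 addresses share the same /24,
# which matches the ffffff00 netmask used on both networks in this example.
same_subnet24() {
    [ "${1%.*}" = "${2%.*}" ]
}

# Report whether a reply sourced from $1 and sent to router $2 stays on
# its own network, or is likely to trip anti-spoofing rules on that router.
check_reply_path() {
    if same_subnet24 "$1" "$2"; then
        echo "ok: $1 leaves via its own router $2"
    else
        echo "mismatch: $1 leaves via foreign router $2"
    fi
}
```

Running `check_reply_path 192.168.100.1 192.168.1.254` reports a mismatch, which is the situation described above.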

In comes IPFilter, the firewall that’s built into Solaris. After exploring many different options with the ‘route’ command to try to get traffic for that interface routed properly, I found that this simple IPFilter rule makes it work:

pass out quick on bge0 to bge2:192.168.100.254 from 192.168.100.1 to any

This rule says that any traffic going out bge0 from the IP 192.168.100.1 (bge2’s IP) should be redirected to go out the bge2 interface and be sent to 192.168.100.254 (the default router on bge2’s network).
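If you script the change, it helps to avoid adding the same rule twice. The sketch below appends a rule to the rules file only when that exact line is missing. The function names are mine, and the default path /etc/ipf/ipf.conf is an assumption about where your IPFilter rules live; a POSIX shell such as ksh is assumed as well.

```sh
# Return success if file $2 already contains a line exactly equal to $1.
has_line() {
    [ -f "$2" ] || return 1
    while IFS= read -r line; do
        [ "$line" = "$1" ] && return 0
    done < "$2"
    return 1
}

# Append rule $1 to rules file $2 unless it is already present.
# The default path is an assumption; pass your own file as $2.
add_ipf_rule() {
    rule=$1
    file=${2:-/etc/ipf/ipf.conf}
    has_line "$rule" "$file" || printf '%s\n' "$rule" >> "$file"
}
```

After calling `add_ipf_rule "pass out quick on bge0 to bge2:192.168.100.254 from 192.168.100.1 to any"`, the rules still have to be reloaded, for example with `ipf -Fa -f /etc/ipf/ipf.conf`, and `ipfstat -io` should then list the rule.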

Now introduce Solaris IP multipathing (IPMP). This allows you to pair 2 interfaces, a primary and a backup. To do probe-based failure detection, you’ll need to use 3 IPs: the primary IP and 2 test IPs (one for each interface).

Consider this:

The Solaris server has 4 network interfaces. bge0 has a primary IP of 192.168.1.1 and bge1 is its backup interface, so the test IPs would be 192.168.1.2 (bge0) and 192.168.1.3 (bge1). It looks like this with ifconfig:

bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 192.168.1.1 netmask ffffff00 broadcast 192.168.1.255
        groupname backup
bge0:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2
        inet 192.168.1.2 netmask ffffff00 broadcast 192.168.1.255
bge1: flags=69040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,STANDBY,INACTIVE> mtu 1500 index 3
        inet 192.168.1.3 netmask ffffff00 broadcast 192.168.1.255
        groupname backup
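For reference, a setup like the ifconfig output above is normally produced by the /etc/hostname.* files. The sketch below prints the lines I would expect for the active interface and the standby interface. The file syntax is an assumption based on the Solaris 10 IPMP documentation rather than something taken from this server, and the function names are made up.

```sh
# Print the /etc/hostname.<if> line for the active interface of a pair.
# Arguments: data_ip test_ip group
# The test address is added as a logical interface (the bge0:1 above),
# marked deprecated and -failover so it stays put during a failover.
ipmp_primary_line() {
    printf '%s netmask + broadcast + group %s up addif %s deprecated -failover netmask + broadcast + up\n' "$1" "$3" "$2"
}

# Print the /etc/hostname.<if> line for the standby interface, which
# carries only its test address until a failover happens.
# Arguments: test_ip group
ipmp_standby_line() {
    printf '%s netmask + broadcast + deprecated group %s -failover standby up\n' "$1" "$2"
}
```

For the first network that would be `ipmp_primary_line 192.168.1.1 192.168.1.2 backup` for /etc/hostname.bge0 and `ipmp_standby_line 192.168.1.3 backup` for /etc/hostname.bge1. Check the output against your own release before writing it to the files.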

The server has a second network. bge2 has a primary IP of 192.168.100.1 and bge3 is its backup interface, so the test IPs would be 192.168.100.2 (bge2) and 192.168.100.3 (bge3). It looks like this with ifconfig:

bge2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 192.168.100.1 netmask ffffff00 broadcast 192.168.100.255
        groupname backup
bge2:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2
        inet 192.168.100.2 netmask ffffff00 broadcast 192.168.100.255
bge3: flags=69040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,STANDBY,INACTIVE> mtu 1500 index 3
        inet 192.168.100.3 netmask ffffff00 broadcast 192.168.100.255
        groupname backup

The routing table looks like this:

# netstat -rn

Routing Table: IPv4
  Destination           Gateway           Flags  Ref     Use     Interface
-------------------- -------------------- ----- ----- ---------- ---------
default              192.168.1.254        UG        1      59593
192.168.1.0          192.168.1.1          U         1          9 bge0
192.168.1.0          192.168.1.3          U         1          0 bge0:1
192.168.1.0          192.168.1.3          U         1          3 bge1
192.168.100.0        192.168.100.1        U         1         35 bge2
192.168.100.0        192.168.100.3        U         1          0 bge2:1
192.168.100.0        192.168.100.3        U         1         30 bge3
224.0.0.0            192.168.1.1          U         1          0 bge0
127.0.0.1            127.0.0.1            UH        2      14556 lo0

If the interface bge0 fails, the IP 192.168.1.1 will fail over to the bge1 interface. If bge2 fails, the IP 192.168.100.1 will fail over to the bge3 interface.

IPFilter rules can still be used, but multipathing makes it a little trickier to ensure the routing keeps working after a failover:

# Normal condition, bge0 and bge2 are primaries
pass out quick on bge0 to bge2:192.168.100.254 from 192.168.100.1 to any

# bge2 has failed
pass out quick on bge0 to bge3:192.168.100.254 from 192.168.100.1 to any

# bge0 has failed
pass out quick on bge1 to bge2:192.168.100.254 from 192.168.100.1 to any

# bge0 and bge2 have failed
pass out quick on bge1 to bge3:192.168.100.254 from 192.168.100.1 to any
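The four rules differ only in the interface pair, so they are easy to generate if you have more than one address to cover. This is just a sketch: the function name is mine, 192.168.100.10 stands in for a hypothetical extra address (a zone, for example), and a POSIX shell such as ksh is assumed.

```sh
# Print one rule for every combination of "wrong" outgoing interface and
# interface on the correct network, in the same order as the rules above.
# Arguments: src_ip gateway "outgoing interfaces" "target interfaces"
failover_rules() {
    src=$1
    gw=$2
    for out_if in $3; do
        for via_if in $4; do
            echo "pass out quick on $out_if to $via_if:$gw from $src to any"
        done
    done
}

# The server address plus one hypothetical extra address.
for ip in 192.168.100.1 192.168.100.10; do
    failover_rules "$ip" 192.168.100.254 "bge0 bge1" "bge2 bge3"
done
```

The output can be reviewed and then pasted into the IPFilter rules file.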

These rules in IPFilter should pass the traffic the correct way in the event of any multipath failover. You’ll need those 4 rules for each IP on the secondary network that you want routed correctly, and that includes the IPs of any Solaris containers. One small thing with containers: if you have a container on just the second network, you’ll need to add these commands to a startup script in the global zone so that the zone has a default router to see:

        /sbin/route add default 192.168.100.254 -ifp bge2
        /sbin/route add default 192.168.100.254 -ifp bge3

Using ‘route -p’ does not work to keep the routes persistent in this case, as it only remembers one of the ‘default 192.168.100.254’ routes (it ignores the -ifp part).
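Since ‘route -p’ is no help here, the startup script can simply loop over the interfaces. Below is a minimal sketch of such a script body. The function name and the ROUTE override are my own additions, the latter so the commands can be previewed with ROUTE=echo before anything touches the routing table.

```sh
# Command used to change the routing table. Override it (for example
# ROUTE=echo) to preview the commands without changing anything.
ROUTE=${ROUTE:-/sbin/route}

# Add the same default gateway once per interface, mirroring the two
# commands above. Arguments: gateway interface [interface ...]
add_zone_default_routes() {
    gw=$1
    shift
    for ifname in "$@"; do
        $ROUTE add default "$gw" -ifp "$ifname"
    done
}
```

In the startup script itself you would then call `add_zone_default_routes 192.168.100.254 bge2 bge3`.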


on November 12th 2008 in solaris

Solaris Zone memory capping

There are a number of documents out there that show how to create a Solaris zone (container) with memory resource capping. I’ll only show that quickly here; what this post goes into more is how to change the resource limits on the fly without rebooting the zone.

First you have to have created a zone with memory capping enabled. This would be done during the zonecfg setup:

zonecfg:my-zone> add capped-memory
zonecfg:my-zone:capped-memory> set physical=50m
zonecfg:my-zone:capped-memory> set swap=100m
zonecfg:my-zone:capped-memory> set locked=30m
zonecfg:my-zone:capped-memory> end
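The same settings can also be passed to zonecfg on the command line as a semicolon-separated list of subcommands, which is handy in scripts. The helper below only builds that string. Its name is made up, and my-zone is just the example zone name from the prompts above.

```sh
# Build the zonecfg subcommand string that adds a capped-memory resource.
# Arguments: physical swap locked (for example 50m 100m 30m)
capped_memory_cmds() {
    printf 'add capped-memory; set physical=%s; set swap=%s; set locked=%s; end' "$1" "$2" "$3"
}
```

On the Solaris host it would be used as `zonecfg -z my-zone "$(capped_memory_cmds 50m 100m 30m)"`.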

Once your zone is configured, installed, and running, you can view the resources of the zone:

# /bin/prctl -n zone.max-swap `pgrep -z <zone> init`
process: 999: /sbin/init
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
zone.max-swap
        privileged      100.0MB      -   deny                                -
        system          16.0EB     max   deny                                -
# /bin/prctl -n zone.max-locked-memory `pgrep -z <zone> init`
process: 999: /sbin/init
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
zone.max-locked-memory
        privileged      30.0MB      -   deny                                 -
        system          16.0EB    max   deny                                 -
# rcapstat -z 1 1
id zone            nproc    vm   rss   cap    at avgat    pg avgpg
2 <zone>            -      48M   36M   50M    0K    0K    0K    0K
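If you only want the current limit, for a monitoring script say, the privileged row can be picked out of the prctl output with awk. This is a sketch that assumes the output layout shown above; the function name is mine.

```sh
# Print the VALUE column of the "privileged" row from prctl output
# read on standard input.
privileged_value() {
    awk '$1 == "privileged" { print $2 }'
}
```

On the Solaris host you would pipe the prctl command from above into it, as in `prctl -n zone.max-swap ... | privileged_value`. With the sample output above it prints 100.0MB.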

To change the max-swap resource do the following:

# prctl -n zone.max-swap -r -v 200M `pgrep -z  <zone> init`

To change the max-locked-memory resource do the following:

# prctl -n zone.max-locked-memory -r -v 100M `pgrep -z  <zone> init`

Changing the physical memory cap is a little different. You’ll need to use the rcapadm command:

# rcapadm -z <zone> -m 100M
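The three changes can be wrapped in one function so they are always applied together. Note that this sketch is not quite what I ran above: instead of looking up the PID of init, it addresses the zone by name with prctl’s `-i zone` option, which assumes your prctl supports that form. The function name and the RUN variable are made up; set RUN=echo to print the commands instead of running them.

```sh
# Leave RUN empty to execute the commands, or set RUN=echo for a dry run.
RUN=${RUN:-}

# Apply new swap, locked-memory and physical caps to a running zone.
# Arguments: zone swap locked physical (for example my-zone 200M 100M 100M)
set_zone_caps() {
    zone=$1
    $RUN prctl -n zone.max-swap -r -v "$2" -i zone "$zone"
    $RUN prctl -n zone.max-locked-memory -r -v "$3" -i zone "$zone"
    $RUN rcapadm -z "$zone" -m "$4"
}
```

A dry run such as `RUN=echo; set_zone_caps my-zone 200M 100M 100M` is a cheap way to check the commands before changing a live zone.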

Then view all the resources again, and you should see the changes:

# /bin/prctl -n zone.max-swap `pgrep -z <zone> init`
process: 999: /sbin/init
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
zone.max-swap
       privileged      200.0MB      -   deny                                 -
       system          16.0EB     max   deny                                 -
# /bin/prctl -n zone.max-locked-memory `pgrep -z <zone> init`
process: 999: /sbin/init
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
zone.max-locked-memory
        privileged      100.0MB     -   deny                                 -
        system          16.0EB    max   deny                                 -
# rcapstat -z 1 1
id zone            nproc    vm   rss   cap    at avgat    pg avgpg
2 <zone>            -      48M   36M   100M   0K    0K    0K    0K

That’s it. To make the changes permanent, you’ll need to go into zonecfg and adjust the resources that way.

# zonecfg -z my-zone
zonecfg:my-zone> select capped-memory
zonecfg:my-zone:capped-memory> set physical=100m
zonecfg:my-zone:capped-memory> set swap=200m
zonecfg:my-zone:capped-memory> set locked=100m
zonecfg:my-zone:capped-memory> end
zonecfg:my-zone> commit

This saves the zone configuration, so the memory limits will be set the next time the zone boots. Without it, the changes are only temporary.


on October 22nd 2008 in solaris