IBM Numaserver



Numaserver with IBM x3755

Numaserver for Large Memory and Big Data

Shared Memory Solutions at Cluster Prices

IBM®, together with Arrow OCS and Numascale, has created a system building block for memory-intensive applications.


A number of IBM x3755 servers with NumaConnect adapters constitute a large shared memory system, reducing the time to solution for important HPC applications.

Scalable, Flexible Shared Memory

The system can scale up to 256 Terabytes (TB) of shared memory, which makes applications easy to develop and operate. Large-memory systems also improve run-time performance and make it possible to analyze entire big datasets in memory. Not having to decompose the dataset saves a lot of time and effort and avoids the I/O bottlenecks of swapping data in and out of the storage system.


One Single Operating System


A NumaConnect shared memory system is operated by a single image operating system. This reduces the effort for maintaining operating system software and applications and leaves more of the combined memory space available for applications.
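A minimal sketch of what a single-image shared-memory system means for the programmer: every core sees one address space, so workers can partition one in-memory array in place, with no data decomposition or message passing. This is plain illustrative Python, not Numascale software.

```python
import threading

# One shared array in one address space; on a single-image SMP this
# could span the memory of many physical nodes.
data = list(range(1_000_000))
partials = [0] * 4

def partial_sum(tid, nthreads):
    # Each thread sums a stride of the *same* shared array --
    # no copies are sent between workers.
    partials[tid] = sum(data[tid::nthreads])

threads = [threading.Thread(target=partial_sum, args=(t, 4)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(partials)
```

The equivalent cluster version would have to scatter the data, compute locally, and gather partial results over the network; here the "gather" is just reading a shared list.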



NumaConnect Adapter Card N323



The N323 is connected to the server motherboard via a cable and pick-up module.


The adapter set is available for IBM x3755 M3 servers.


NumaConnect in IBM x3755 M3


N323 in an IBM x3755


NumaConnect Essentials

ccNUMA and NUMA low-latency shared memory interconnect

Virtualizes Everything, Including Memory and IO

>10x price/performance benefit over proprietary solutions

Seamless Scaling of Application Size and Performance - NO Porting Efforts

Scalable, Cache Coherent, Shared Memory System Interconnect
AMD processor nodes with Coherent HyperTransport
Based on field proven design
Enables commodity cost level for high-end servers


NumaConnect Features

Converts between snoop-based (broadcast) and directory-based coherency protocols

Write-back to Remote Cache

Non-coherent transactions (for optimized MPI)

Pipelined memory access (16 outstanding transactions + 16 non-coherent)

Remote Cache size up to 4GBytes (remote data)
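The first feature above, converting broadcast snooping into directory-based coherency, is the key to scaling: a directory knows exactly which nodes hold a cache line, so a write invalidates only those sharers instead of broadcasting to every node. A toy sketch of the idea for one cache line, MSI-style (all names here are illustrative, not NumaChip's actual protocol):

```python
# Toy directory entry for a single cache line.
class Directory:
    def __init__(self):
        self.sharers = set()   # nodes holding the line in Shared state
        self.owner = None      # node holding the line in Modified state

    def read(self, node):
        # A read downgrades any modified owner to a sharer.
        if self.owner is not None:
            self.sharers.add(self.owner)
            self.owner = None
        self.sharers.add(node)
        return sorted(self.sharers)

    def write(self, node):
        # A write invalidates all other copies; only the listed
        # sharers get messages -- point-to-point, not broadcast.
        others = self.sharers | ({self.owner} if self.owner is not None else set())
        invalidated = sorted(others - {node})
        self.sharers = set()
        self.owner = node
        return invalidated

d = Directory()
d.read(0)
d.read(1)
victims = d.write(2)    # only nodes 0 and 1 receive invalidations
after = d.read(3)       # owner 2 is downgraded to a sharer
```

With thousands of nodes, messaging only the recorded sharers is what keeps coherence traffic from growing with system size.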


NumaConnect RAS Features

ECC for single bit correction and double bit detection

Automatic scrubbing after single bit error detection

Automatic background scrubbing to minimize probability of soft error accumulation

Flexible micro-coded coherence processing engine

Watch-bus for internal activity observation in real-time

Built-in Performance Counters


NumaConnect Specifications

Bandwidth to the node-local CPUs
- 1 cHT link (16+16) @800MHz DDR = 6.4GB/s over HT Proprietary Cable
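The 6.4GB/s figure follows from the link parameters: "16+16" means a 16-bit path in each direction, and DDR at 800MHz means two transfers per clock. A quick check of the arithmetic:

```python
# Reproduce the flyer's cHT bandwidth figure.
bits_per_direction = 16
transfers_per_sec = 800e6 * 2                 # DDR: two transfers per cycle
gbytes_per_dir = bits_per_direction * transfers_per_sec / 8 / 1e9   # 3.2 GB/s
total_gbytes = gbytes_per_dir * 2             # both directions: 6.4 GB/s
```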


Latency for remote accesses
- Short in-node time through by-pass FIFOs
- Few "hops" for average access patterns
- Worst case of only one or two dimension-switch delays for 2-D or 3-D Torus topologies

Link Speed and capacity
- 4-lane SerDes at 4Gb/s per lane, 6 links => 96Gb/s = 9.6GBytes/s, x2 => 19.2GBytes/s
- Net average throughput on a ring is ≈1.4 times the unidirectional link speed with random access patterns, less link and fabric overhead; total for 6 links => 26.9GBytes/s (multiple senders can be active simultaneously)
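The fabric numbers above chain together step by step. Note that the 96Gb/s-to-9.6GBytes/s conversion implies 10 bits on the wire per data byte, consistent with 8b/10b-style SerDes encoding; that encoding is our assumption, not stated in the flyer.

```python
# Chain the flyer's link-capacity arithmetic.
lanes, gbit_per_lane, links = 4, 4, 6
raw_gbit = lanes * gbit_per_lane * links        # 96 Gb/s raw, all links
one_way_gbytes = raw_gbit / 10                  # 9.6 GB/s (assumes 8b/10b)
bidir_gbytes = one_way_gbytes * 2               # 19.2 GB/s both directions
ring_gbytes = round(bidir_gbytes * 1.4, 1)      # ≈26.9 GB/s net on a ring
```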


Remote Cache (RMC)
- 2 or 4 GBytes per node, configurable
- System performance is expected to depend more on a large cache size than on faster access time => DRAM is used
- RMC access time is close to a neighbor CPU's node-local memory access time

Address Range
- 12 bits Node ID = 4k nodes max. (Multiple sockets per node possible)
- 48 bits address (256 Terabytes)

- Local Node address range:
  • N323-22: 56 GigaBytes
  • N323-44: 112 GigaBytes
  • N323-48: 240 GigaBytes
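The address-range limits follow directly from the field widths quoted above:

```python
# Node-count and memory limits from the address-field widths.
node_id_bits = 12
address_bits = 48
max_nodes = 2 ** node_id_bits        # 4096 nodes ("4k nodes max.")
total_bytes = 2 ** address_bits      # full 48-bit physical address space
total_tb = total_bytes // 2 ** 40    # 256 Terabytes of addressable memory
```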


The Change is On: From Cluster to Scalable ccNUMA SMP

Shared memory with ccNUMA by NumaChip




