12/17/2018

FB: A Billion-User Load Balancer

https://www.usenix.org/conference/lisa16/conference-program/presentation/shuff
  • A 2016 LISA presentation by Patrick Shuff, a Production Engineer (PE) at Facebook
  • Tb/s of egress traffic leave FB's routers; 85% of it is headed outside the US
FB request flow vs. architecture
  • Client --> DNS: AAAA request for facebook.com
  • Client --> Router
  • Router --> L4LB (ipvs): ECMP
  • L4LB (ipvs) --> L7LB (proxygen): HTTP GET
  • L7LB (proxygen) --> HHVM (web server): HTTP GET
  • Router + L4LB + L7LB + HHVM = a cluster/DC
  • Data flow:
    • router/ECMP --> L4LB (ipvs) --> L7LB (proxygen) --> HHVM
    • L4LB: 10+ per cluster
    • L7LB: 100+ per cluster
    • HHVM: 1000+ per cluster
  • L4LB, L7LB, and HHVM do NOT run on dedicated servers or appliances. They are all commodity x86 servers, allocated dynamically.
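The fan-out above can be pictured as a chain of per-flow picks, one per tier. The sketch below is a toy model, not FB's code: the cluster sizes are taken from the notes, while the host names, the SHA-256 5-tuple hash, and the "hash mod N" selection are illustrative assumptions. The last hop in particular is only modeled this way for simplicity; proxygen proxies each HTTP request to HHVM over its own connection, and its real backend-selection policy is not covered in these notes.

```python
import hashlib
from typing import NamedTuple


class Flow(NamedTuple):
    """A client TCP flow identified by its 5-tuple; dst_ip is the VIP."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str = "tcp"


def flow_hash(flow: Flow, tier: str) -> int:
    # Salting with the tier name keeps each tier's choice independent.
    key = f"{tier}|{flow.src_ip}|{flow.src_port}|{flow.dst_ip}|{flow.dst_port}|{flow.proto}"
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


# Hypothetical cluster sized after the notes: 10+ L4LB, 100+ L7LB, 1000+ HHVM.
CLUSTER = {
    "l4lb": [f"l4lb-{i}" for i in range(10)],
    "l7lb": [f"l7lb-{i}" for i in range(100)],
    "hhvm": [f"hhvm-{i}" for i in range(1000)],
}


def route(flow: Flow) -> list[str]:
    """Path router --> L4LB --> L7LB --> HHVM, each hop picked by hash mod N."""
    return [hosts[flow_hash(flow, tier) % len(hosts)] for tier, hosts in CLUSTER.items()]


flow = Flow("198.51.100.10", 51000, "192.0.2.80", 443)
print(route(flow))
```

Because every pick is a pure function of the flow, any router or L4LB holding the same host list sends a given flow to the same next hop, which is what lets these tiers scale out on ordinary servers.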
L4/L7 LB:
  • L4LB (ipvs) + xBGP, a Python BGP daemon that peers with the ToR (top-of-rack) switch to announce the VIPs as:
    • IPv4: /32
    • IPv6: /64
  • Router --> L4LB: plain ECMP hashing only
  • L4LB --> L7LB: hashing plus a connection state table
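  • Two failure cases to consider:
    • 1) An L4LB goes down: the replacement L4LB uses the same hash, so the flow still reaches the same L7LB as before.
    • 2) An L7LB goes down: its TCP connections break, and the L4LB hashes those flows to different L7LBs. When the failed L7LB comes back up, the state table keeps existing flows on their current L7LB, so their TCP connections are not affected.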
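  • DSR = Direct Server Return
    • Return traffic bypasses the L4LB: the L7LB sends it straight to the router.
    • L4LB ==IP-in-IP==> L7LB
    • The L7LB then decapsulates the packet and replies directly to the client using the original IPv4 addresses (source = VIP). Very similar to Microsoft's design.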
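The two failure cases can be reproduced with a small model of "hash + state table". This is a sketch under assumptions: the notes only say "hash", so a consistent-hash ring with virtual nodes is swapped in here as one reasonable choice (IPVS's own schedulers are not reproduced), and `HashRing`, `L4LB`, and the flow-key string format are hypothetical names for illustration.

```python
import bisect
import hashlib


def _h(key: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is randomized per process)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes, mapping flow keys to L7LBs."""

    def __init__(self, backends, vnodes=100):
        self.vnodes = vnodes
        self.backends = set()
        self._points = []  # sorted (ring position, backend) pairs
        self._keys = []    # ring positions only, kept in sync for bisect
        for b in backends:
            self.add(b)

    def _rebuild(self):
        self._points.sort()
        self._keys = [pos for pos, _ in self._points]

    def add(self, backend):
        self.backends.add(backend)
        self._points += [(_h(f"{backend}#{i}"), backend) for i in range(self.vnodes)]
        self._rebuild()

    def remove(self, backend):
        self.backends.discard(backend)
        self._points = [p for p in self._points if p[1] != backend]
        self._rebuild()

    def lookup(self, key):
        # First ring position clockwise from the key's hash, wrapping around.
        i = bisect.bisect(self._keys, _h(key)) % len(self._keys)
        return self._points[i][1]


class L4LB:
    def __init__(self, ring):
        self.ring = ring
        self.conn_table = {}  # flow key -> L7LB chosen for that flow

    def forward(self, flow):
        l7 = self.conn_table.get(flow)
        if l7 is None or l7 not in self.ring.backends:
            # New flow, or its L7LB died (that TCP connection is lost): hash again.
            l7 = self.ring.lookup(flow)
            self.conn_table[flow] = l7
        return l7


L7LBS = [f"l7lb-{i}" for i in range(8)]
flow = "198.51.100.10:51000->192.0.2.80:443/tcp"

lb = L4LB(HashRing(L7LBS))
first = lb.forward(flow)

# Failure 1: a fresh L4LB (empty state table, same ring) picks the same L7LB.
standby = L4LB(HashRing(L7LBS))
assert standby.forward(flow) == first

# Failure 2: that L7LB dies, so the flow is rehashed (its TCP connection breaks).
lb.ring.remove(first)
moved = lb.forward(flow)
assert moved != first

# The L7LB comes back; the state table keeps the flow where it is now.
lb.ring.add(first)
assert lb.forward(flow) == moved
print(first, "->", moved)
```

The ring handles case 1 (same inputs give the same answer on any L4LB), while the state table handles the "L7LB comes back" half of case 2: without it, the flow would silently jump back to the recovered L7LB and break a second time.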
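The DSR path can be sketched at the packet level: the L4LB wraps the client's packet in an outer IPv4 header (protocol 4, IP-in-IP) addressed to the L7LB, and the L7LB strips it and answers from the VIP straight to the client. This is a minimal illustration, not FB's implementation: all addresses are hypothetical documentation or private ranges, the 4-byte payloads are placeholders instead of real TCP segments, and only bare 20-byte IPv4 headers (no options) are built.

```python
import ipaddress
import struct

IPPROTO_IPIP = 4  # IPv4-in-IPv4 encapsulation
IPPROTO_TCP = 6


def checksum(header: bytes) -> int:
    """IPv4 header checksum: one's-complement sum of 16-bit words."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) + header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, proto: int, payload_len: int) -> bytes:
    """Minimal 20-byte IPv4 header (no options) with a valid checksum."""
    hdr = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + payload_len,  # version/IHL, DSCP/ECN, total length
        0, 0,                       # identification, flags/fragment offset
        64, proto, 0,               # TTL, protocol, checksum placeholder
        ipaddress.IPv4Address(src).packed,
        ipaddress.IPv4Address(dst).packed,
    )
    return hdr[:10] + struct.pack("!H", checksum(hdr)) + hdr[12:]


def l4lb_encapsulate(inner: bytes, l4lb_ip: str, l7lb_ip: str) -> bytes:
    # Outer header: L4LB -> L7LB; the inner packet (client -> VIP) is untouched.
    return ipv4_header(l4lb_ip, l7lb_ip, IPPROTO_IPIP, len(inner)) + inner


def l7lb_decapsulate(packet: bytes) -> bytes:
    if packet[9] != IPPROTO_IPIP:
        raise ValueError("not an IP-in-IP packet")
    ihl = (packet[0] & 0x0F) * 4
    return packet[ihl:]


def l7lb_reply(inner: bytes, payload: bytes) -> bytes:
    # DSR: answer as the VIP (inner dst) directly to the client (inner src).
    client = ipaddress.IPv4Address(inner[12:16])
    vip = ipaddress.IPv4Address(inner[16:20])
    return ipv4_header(str(vip), str(client), IPPROTO_TCP, len(payload)) + payload


CLIENT, VIP = "198.51.100.10", "192.0.2.80"
L4LB_IP, L7LB_IP = "10.0.0.1", "10.0.1.1"

request = ipv4_header(CLIENT, VIP, IPPROTO_TCP, 4) + b"req!"
tunneled = l4lb_encapsulate(request, L4LB_IP, L7LB_IP)
inner = l7lb_decapsulate(tunneled)
reply = l7lb_reply(inner, b"resp")
print(ipaddress.IPv4Address(reply[12:16]), "->", ipaddress.IPv4Address(reply[16:20]))
# 192.0.2.80 -> 198.51.100.10
```

Tunneling (rather than rewriting the destination address) is what makes DSR work here: the inner packet still carries the VIP, so the L7LB can reply as the VIP and the L4LB never sees the return traffic.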
Speed up 
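  • Use the PoP to terminate the client's TCP/SSL handshake early, close to the user, instead of at the DC; this speeds up connection setup.
  • The PoP keeps direct SSL connections to the DC.
  • Worth reviewing: the TCP handshake (3-way) and the SSL handshake (4-way).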
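A back-of-the-envelope calculation shows why terminating the handshakes at the PoP helps. The RTT values below are made-up assumptions, not numbers from the talk; the model counts 1 RTT for the TCP handshake, 2 RTTs for a full TLS 1.2 handshake (its 4 flights), and 1 RTT for the HTTP request, and it ignores DNS and server processing time.

```python
TCP_HANDSHAKE_RTTS = 1    # SYN / SYN-ACK; the final ACK can travel with data
TLS12_HANDSHAKE_RTTS = 2  # full TLS 1.2 handshake: 4 flights = 2 round trips
REQUEST_RTTS = 1          # HTTP request / first byte of the response


def ttfb_direct(rtt_client_dc: int) -> int:
    """Every handshake and the request cross the long client <-> DC path."""
    return (TCP_HANDSHAKE_RTTS + TLS12_HANDSHAKE_RTTS + REQUEST_RTTS) * rtt_client_dc


def ttfb_via_pop(rtt_client_pop: int, rtt_pop_dc: int) -> int:
    """Handshakes finish at the nearby PoP; the request reuses a warm PoP <-> DC connection."""
    handshakes = (TCP_HANDSHAKE_RTTS + TLS12_HANDSHAKE_RTTS) * rtt_client_pop
    request = rtt_client_pop + rtt_pop_dc
    return handshakes + request


# Hypothetical RTTs in milliseconds.
print(ttfb_direct(150))       # 600
print(ttfb_via_pop(20, 130))  # 210
```

Only the single request round trip pays the long-haul latency in the PoP case, because the expensive multi-RTT handshakes happen over the short client-to-PoP hop.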
DNS
  • Real-time monitoring

