Pleasanton, California, United States
5K followers
500+ connections
About
Activity
-
Peter Hoose shared this:
Systems and Reliability @ Scale, live and in person - Tuesday May 6th - See you there!
I’m very excited to see all of you back in person for the first time since 2019! We have a great agenda lined up with talks from Meta, Netflix, Pure Storage, Nvidia, AMD, Google, Pinterest and many more, covering the latest challenges and advances at the forefront of AI systems research and reliability engineering, from practical and pragmatic to theoretical and inspiring, and all points between. I’m personally excited to hear about the work going on across the industry, including a fireside chat with Jay Parikh of Microsoft, led by Francois Richard, as well as to share all the innovation happening at Meta, from data centers, servers, and systems up to the latest products serving nearly half the world’s population on a daily basis. If you are in the Bay Area on 5/6, come check it out in person, and say hi! Sign up here: https://lnkd.in/ghRJDnBU
My colleague Surupa and I will get us started covering the innovations of the past twelve months, and what the future holds. Looking forward to seeing you in person. Learn more: https://lnkd.in/g3gm2qyF
-
Peter Hoose shared this:
When you scale, things fail. Building systems that react gracefully to failure is key. Next week the @Scale conference will be focused on Reliability, and learning from major incidents across many of the biggest infrastructure teams out there - AWS, Google, Fastly and Meta. I'm personally very excited to see Avery Berchek from the Core Systems PE team, who will be talking about the work the team has done to improve the safety and reliability of configuration changes! Tune in next week: https://lnkd.in/gYHxWHBA
Join us for Reliability @Scale on August 31, 2022. The event will be hosted virtually with talks themed around large-scale outages, incident response and learnings, and measuring reliability at scale. Joining us are speakers from Akamai, Fastly, Google, Meta, and Roblox. #ReliabilityatScale22 Learn more here: https://lnkd.in/ea8bu_jQ
-
Peter Hoose shared this:
This is really cool. Some of the folks in the Dublin, Ireland team kicked this effort off with partners in our Menlo Park, CA hardware team. A great example of serendipitous encounters leading to great engineering outcomes.
Great job by the team - opening up time devices for better scale-out architectures is a major step for DCs and also for WANs/wireless networks in the future: https://lnkd.in/g4UsFg6x
-
Peter Hoose shared this:
Network @ Scale is coming up Monday 9/9, with some interesting talks from folks on the team:
Secure Reliability: Tales from mysterious platforms - Jose Leitao & Jade Auer
Operating Facebook’s SD-WAN Network - Palak Mehta
If you're attending, hope to see you there! https://lnkd.in/gsSBJin
-
Peter Hoose reacted to this:
Neha Narkhede Recognized as a Forbes 250 Greatest Innovator
Oscilar Co-founder and CTO Sachin Kulkarni has celebrated a high-velocity milestone for his co-founder Neha Narkhede, who was recently named to the Forbes 250 Greatest Innovators list. Recognized alongside titans like Jensen Huang and Sam Altman, Narkhede is credited with architecting the #ApacheKafka data infrastructure that powers the AI-enabled internet - a mandate she has now transitioned into the fintech sector to bridge the gap between legacy risk management and real-time AI decisioning. The mandate centers on "Day One Intensity and Zero-Base Credibility," providing a primary framework for how the duo built Oscilar from a startup booth into an industry disruptor. By integrating her elite background - having previously built #Confluent into a public powerhouse - with a "no shortcuts" approach to customer acquisition, Narkhede remains the lead architect for #Oscilar, delivering the technological depth required to outrun fraud and credit complexity while solidifying the firm’s position as the foundational intelligence layer for the modern financial ecosystem.
Megan Shirley | Jessica Jönzzon | Colin Cusa | Nick Schultze | Joe Zhou | Natalie Najarian | Brittany Cates | Xavier H. | Ben Arnstein
#NehaNarkhede #Oscilar #Forbes250 #ApacheKafka #FintechInnovation #StartupLeadership #AIInfrastructure #WomenInTech #RiskManagement #Entrepreneurship
-
Peter Hoose liked this:
The best days at work are the ones that don’t feel like work! I hope we made the stroll through Southgate Park in Hayward a little more enjoyable. Prologis #PLDIMPACT
-
Peter Hoose liked this:
One of the hard-won lessons of distributed systems is that techniques that improve steady state reliability can counter-intuitively increase the chance of a total meltdown. Caching, retries, failover, load balancers, auto-scaling, queues, ... cause systems to have complex dynamic behavior, and that leads some failures to be amplified and persisted. Marc's tool looks like an awesome way to understand these metastable failures. Nice!
Over the last couple weeks, I've been experimenting with a new way to teach folks about stability and metastable failures in distributed systems. Following some positive early feedback, I'd like to share https://lnkd.in/gyeGM6Vz
Stability sim is a simple, interactive simulator that allows you to explore some of the behaviors that cause long outages even in simple distributed systems, and understand the pitfalls of caches, simple retry strategies, round-robin load balancing, and other common design patterns.
Some ways to get started:
📺 Watch a basic demo: https://lnkd.in/gH6nxs2E
📈 Check out one of the built-in examples: https://lnkd.in/gYQPSG8p
🛝 Or just get playing: https://lnkd.in/gyeGM6Vz
This is very much a v0.1, and I'm sure there are tons of bugs and missing features. If you'd like to help improve it, or build your own: https://lnkd.in/gHx3EB3b
-
Peter Hoose liked this:
Today we shipped Muse Spark and a major upgrade to Meta AI. Muse Spark is the first model from Meta Superintelligence Labs. We've been working towards this as a team for the past nine months. Muse Spark is our first offering: a natively multimodal reasoning model that's small, fast, and quite capable. It powers a smarter Meta AI starting today. Here's what's different:
- You can experience the reasoning of Muse Spark in Thinking mode. This is what I use the most to build artifacts, reason through some hard problems. We're also working on releasing a new Contemplating mode that spins up multiple agents working in parallel, like having a team of researchers tackling your question simultaneously.
- Muse Spark has great multimodal performance. I especially love visual grounding - go take a picture of a snack shelf and ask it to build an artifact for you on which ones are the healthiest (or not).
- Shopping mode pulls from real people and surfaces styling inspiration and brand storytelling.
- Visual coding. Ask Meta AI to build a dashboard, spin up a retro arcade game, create a flight simulator - incredibly fun.
Muse Spark powers the Meta AI app and meta.ai today, and will be rolling out to WhatsApp, Instagram, Facebook, Messenger, and AI glasses in the coming weeks. Try it now: meta.ai
-
Peter Hoose reacted to this:
Recently our partners at Major League Hacking hosted their first Production Engineering-focused hackathon. Production Engineering is a specialty software engineering field focused on making large distributed systems work in the real world, and it's where I've spent most of my career. Over 600 hackathon participants, spread out over 100 teams, were tasked with taking a basic URL shortening app and turning it into a hardened production-grade service capable of gracefully withstanding high load, along with monitoring, alerting, and continuous deployment. I'm really proud of what the teams accomplished in a short amount of time and what they built. I'm also proud of my 12 Meta colleagues -- many of them MLH alumni themselves -- for helping out with the hackathon in judging, speaking, and mentoring roles. I wanted to signal boost the winning teams and hey, companies, if you're looking to hire the next generation of talent, give some of these folks a chance. And yes! Meta will be running the Production Engineering Fellowship for our 6th consecutive year in 2026 -- details to come soon.
* Best All-Round Team: Five Nines Club (https://lnkd.in/gjqwvFus)
* Best Team - Reliability: PipeLie (https://lnkd.in/gt_EYcUM)
* Best Team - Scalability: Curtain (https://lnkd.in/gQvBCAXK)
* Best Team - Incident Response: kadi (https://lnkd.in/gT-5YvPA)
* Best Beginners’ Team (≥50% first-time hackers): SREnity Squad (https://lnkd.in/gMmTcE3K)
Huge thank you to MLH and thank you to all the participants, looking forward to the next one!
SREnity Squad - URL Shortener + Incident Response Dashboard
-
Peter Hoose liked this:
Founders Club hits different. When you're deep in the day-to-day, it's hard to see progress. Then you take a step back and realize how much has actually changed. Just a few years ago, Oscilar started with a hunch that felt obvious to us but wasn't widely shared: risk infrastructure was broken because it was fragmented. Fraud teams, compliance teams, credit teams, onboarding teams, all running separate systems, separate data, separate models. We believed the answer was a unified decisioning layer, AI-native from the ground up, where every risk signal could be evaluated together in real time. Today, we're processing tens of billions of real-time decisions, with 100+ data integrations and an AI agent platform running in production. We've grown our footprint across the U.S., Europe, MENA, and Latin America. We get to work with companies shaping the future of financial services like SoFi, MoneyGram, Nuvei, Payoneer, and Clara. Zoom out a bit more, and what stands out is who made it happen. The team that keeps raising the bar. The customers and partners who trusted us early and keep pushing us forward. The builders and operators shaping this category alongside us. Founders Club is about recognizing that. This group represents people at Oscilar who've driven real impact for our customers, our partners, and the platform. To this year's winners: you've set the standard. And to everyone who's been part of this journey, thank you. Everything we've built so far is because of you. And we're just getting started.
-
Peter Hoose reacted to this:
In honor of The United States' 250th birthday, Forbes released its 250 Greatest Innovators list. My co-founder Neha Narkhede is named alongside the likes of Sam Altman, Warren Buffett, Taylor Swift, Jensen Huang, and Jeff Bezos. She is recognized for creating the data infrastructure that makes an AI-enabled internet possible. Impressive company, but you wouldn't know the most compelling thing about her just from seeing her name there. A few years ago, Neha and I had our first Oscilar booth at Fintech Nexus in New York. Just a small startup table with a monitor on it. People who recognized Neha stopped by, quite confused. The person who co-created Apache Kafka, who took Confluent public, standing behind a folding table at a fintech conference. Some of them couldn't quite process it. But you know what? We signed our first customer from that booth. That moment says more about Neha than any list does. She didn't show up because she had something to prove. She showed up because she believed in what we were building, and she understood that building a company means starting from zero. No shortcuts. No borrowed credibility. Just the work. She didn't treat Oscilar like a victory lap. She treated it like day one. Same intensity, same willingness to do the unglamorous work, same belief that you earn credibility one conversation at a time. When your co-founder has already built something used by 90% of the Fortune 100, you might expect a certain kind of energy. With Neha, it's always been: what's the next hard problem, and how do we solve it better than anyone else. What I've seen consistently is someone who operates the same way whether the room has 5 people or 5,000. Congratulations, Neha. The list is a tremendous achievement. But the work is even more impressive. It's an honor to build alongside you.
-
Peter Hoose reacted to this:
After seven years at Meta, I've decided to leave. It was a pretty incredible journey, and the amount of learning that happened, as a leader and as an infrastructure engineer, in my role leading Production Engineering for online Datastores and eventually for all of the Core Data PE org was a once-in-a-lifetime opportunity. The sheer number of brilliant and kind and generous people I got to work with was a paradigm shift for me, even in the face of COVID, AI, layoffs, and significant cultural and product changes that I simply could not align with. So, here I am, getting back to my roots in databases; consulting and partnering with companies and orgs just hitting their growth hockey stick, not to mention I wanted more time for my family, my own projects and writing the sequel to the Database Reliability Engineering book. It's a new world and I'm very excited to contribute. And so, here we are. I've moved to Porto, Portugal with my wife, and am cycling, coding, writing and getting a moderate amount of sleep! I'm very excited about this next book on DBRE, and I hope you'll all follow along as I announce more soon.
-
Peter Hoose liked this:
Over a year ago, we started construction on what will be our largest data center to date in Richland Parish, Louisiana. This site will be home to our Hyperion AI supercluster and one of the most ambitious data center projects in the world. As Mark shared last year, this site has the ability to scale up to 5GW IT capacity, the kind of massive compute capacity that is essential to powering the future of AI. Developing a data center at this scale requires aligning multiple critical factors: energy, land, strong local partners, and a skilled workforce. In Louisiana, we undeniably have all of these. An important part of this work is securing reliable energy to support our growth. We've been working closely with our energy partner, Entergy, since the first days of site planning to intentionally ensure our power needs are met without consumers paying our costs. Entergy's filing today for new utility infrastructure represents one of several factors needed to move an expansion of this project forward, and demonstrates the business-friendly environment in Louisiana that makes projects like this possible. Our agreement is expected to deliver approximately $2 billion in customer savings to Entergy Louisiana customers; this is in addition to the $650 million previously announced. Along with this new energy agreement, we’re also expanding our support of Entergy's affordability initiatives with $200 million in funding through annual contributions to Entergy's Power to Care and residential energy efficiency programs and committing to funding 2.5GW of clean and renewable energy. Earlier this month, I had the opportunity to visit Richland Parish and see firsthand the remarkable progress our teams are making to bring this data center to life. What stood out just as much as the construction milestones was the strength of the partnerships we've built on the ground. From the Office of Governor Jeff Landry and Entergy to Louisiana Economic Development and so many others across the community, the collaboration has been extraordinary. This project is a testament to what's possible when we bring our #OneTeam approach to the communities where we operate. I couldn't be more proud of all of our partnerships in Louisiana that continue to turn this vision into reality. We look forward to sharing more in the weeks ahead. https://lnkd.in/gHBUKWyW
Experience & Education
-
Meta
Publications
-
Traceflow
IETF
Proposal for end-to-end in-band path detection and identification for the purposes of verification and fault isolation.
Recommendations received
1 person has recommended Peter
Other similar profiles
-
Vijay Gill
Great engineering organizations as a product: Culture, recruiting, leadership, execution and communications architecture.
My job is to make everyone in the engineering organization successful and unblock issues that are getting in the way of success of the team.
Our mission: Empower Developers Around the World to Create Transformative Software Applications
12K followers • San Francisco Bay Area
Explore more posts
-
Ihunna Peterpaul
Federal University of… • 757 followers
This morning’s deep dive into Quality of Service (QoS) was a clear reminder of how essential it is to maintain performance and reliability in networked systems. What I deduced:
· QoS isn’t just about bandwidth; it’s about intelligently prioritizing traffic to ensure critical applications run smoothly, especially under load.
· Mechanisms like traffic classification, queuing, and policing help manage delay, jitter, and packet loss.
· Without QoS, voice, video, and real-time data can degrade, impacting user experience and business operations.
In today’s hybrid and cloud-heavy environments, implementing thoughtful QoS policies is no longer optional; it’s foundational for consistent service delivery. What strategies are you using to manage network performance? #QualityOfService #QoS #Networking #ITInfrastructure #NetworkManagement #TechLearning
1
-
Brian Gero
3K followers
When failure isn’t an option, infrastructure has to deliver. Flexential's Tier 3 Chaska facility is built for high-density performance, advanced security, and carrier-neutral connectivity to help keep critical operations online. Take a behind-the-scenes look with Chris Woolsey to see how Flexential makes it possible. Learn more: https://ow.ly/f8r930sO2um #FlexAnywhere #DataCenters #Colocation #Cybersecurity #Minneapolis
21
-
Brian Gero
3K followers
Flexential Interconnection Mesh is here. 🌐 This new addition to Flexential Fabric gives enterprises direct control of multi-site connectivity from a single port—fast, flexible, and built for scale. Proud moment for the team as we continue to strengthen the FlexAnywhere platform. 💪 Learn more: https://ow.ly/7bYJ30sPtmo #FlexAnywhere #Connectivity #DataCenters #HybridIT
19
-
Rob Mullins
Everpure • 2K followers
This is a huge deal for AI in enterprises. Architecting a global KV cache with Pure Storage eliminates redundant prefill, overcomes HBM limits, and delivers faster, more cost-efficient enterprise LLM inference at scale. See how you can keep your GPUs busy with KV Cache.
1
-
Randy Bias
Mirantis • 78K followers
There is a cool new capability in the latest MCP spec called "tasks". I think it has real promise, but it's still broken in that it relies on polling. An MCP client has to connect and get status. https://lnkd.in/gERGDMaR This is one of the kinds of fundamental differences between the Developer interaction model with agents and the Operator interaction model with agents. Operators know that polling doesn't scale. Sure, it's great on your laptop! Try moving it to a shared endpoint and see what happens. We need MCP to move beyond these older school patterns into more modern cloud-native patterns. Loose-coupling, circuit-break, fan-out, etc. etc. There are already some moves in this direction: https://lnkd.in/gYv8Caif
6
1 Comment
-
Carmella (Surdyk) Weatherill
3K followers
With Google Cloud Axion processors, our custom Arm-based silicon is redefining the price-performance bracket—delivering up to 250% better performance than previous generations. 🤩 For IT leaders, this isn't just about faster chips; it’s about reclaiming hundreds of engineering hours every month. By reducing infrastructure complexity, your team can pivot from maintenance to high-value innovation. 🚀 See how Google Cloud Axion compares and hear the latest customer success stories in our technical deep dive → https://lnkd.in/gZJ8tG8U
2
-
Jason Livingood
Comcast • 5K followers
Final slide from my recent latency-related presentation - this one very much my own personal opinion with plenty of speculation. Fixed ISPs continue to deliver ever-better reliability, more bandwidth, better latency - at least to a demarcation at the edge of the home. But IMO user wants & needs have shifted and are now broader than the industry has fully recognized (though companies like CUJO AI® are exploring it). For example, a user may not be fully satisfied until they know that X application session on Y device at a point in time T is consistently great - across the full day, across all the devices they use, across all the apps they use, across the entire footprint of their home (or travels). Food for thought... #Bandwidth #Latency #NetworkResponsiveness #TemporalInternet #HolisticInternet #Broadband
73
5 Comments
-
Lionel Touati
Google • 9K followers
This episode features Peter Pellerzi, a Distinguished Engineer at Google. Peter and the hosts, Matt Siegler and Steve McGhee, focus on the physical infrastructure side of SRE, discussing topics such as the scale of Google's data centers, handling incidents like power outages, testing and preparedness strategies, the use of AI for optimizing cooling plants, and more. Peter also emphasizes the importance of community support, proactive planning, and learning from real-world testing and incidents to ensure high availability and resilience in data center operations.
2
-
Jose Reinoso
2K followers
Dr. Bastian Koller, managing director of HLRS and lead coordinator of HammerHAI, added: "The contract signing for this new, AI-optimized system marks a new chapter in HammerHAI's development. We invite future users of the system to begin preparing their datasets, algorithms, and workflows now."
83
-
Gauravh J
2K followers
Great to see the MCP spec and community evolving with a focus on production scalability and laying the groundwork for collaboration that can actually scale. Good to see server identity (like .well-known) and async ops called out as priorities for the next release. https://lnkd.in/g9kQjmy4 #mcp #genai #sigs
51
2 Comments
-
Eric Miller
GiGstreem • 578 followers
Cloudflare wrote the classiest line in all of outage management: "We’re deeply sorry for this outage: this was a failure on our part, and while the proximate cause (or trigger) for this outage was a third-party vendor failure, we are ultimately responsible for our chosen dependencies and how we choose to architect around them."
18
4 Comments
-
Varun Dewan
Google • 827 followers
With AI Overviews and Ask Gemini, you can now summarize, analyze, and query your files without ever leaving Drive. No extra tools, no security trade-offs—just seamless AI power built on a foundation of trust. Huge congrats to the entire Drive team on this milestone. Amazing to see this come to life! Onwards! 🚀 #GoogleCloud #GoogleDrive
15
-
Mike Houston
2K followers
⚡ Microsoft Builds Next-Gen AI Superfactory with NVIDIA Spectrum-X Ethernet. Microsoft is deploying next-generation NVIDIA Spectrum-X Ethernet switches in its Fairwater AI superfactory, built on the NVIDIA Blackwell platform. Built on NVIDIA’s commitment to open Ethernet and SONiC, Fairwater uses Spectrum-X Ethernet to deliver the scale and performance required for advanced AI workloads. Learn how NVIDIA Spectrum-X Ethernet is helping Microsoft expand AI infrastructure at scale 👇 https://bit.ly/3L2H4Bf
82
3 Comments
-
Ugur Kaynar, PhD
Dell Technologies • 1K followers
Excited to be heading to #GTC26! 🌸 We’ll be showcasing 11 amazing demos on the AI Data Platform (AIDP) at Dell Technologies, including some exciting work around AI inference acceleration and kv cache storage optimizations. If you’re attending NVIDIA GTC, stop by the Dell Technologies booth to check out the demos, chat with us about AI and storage innovations, and explore how these technologies are helping power the next generation of AI. Looking forward to connecting with everyone there! #NVIDIAGTC #AI #Inference #AIDP #DellTechnologies
89
5 Comments
-
Sree Chadalavada
Open Compute Project… • 6K followers
Like most in my network, I spent most of my career in traditional Ethernet (front-end networks). With AI, three new networks are emerging:
Scale up: GPU-to-GPU
Scale out: Node-to-Node
Scale across: DC-to-DC
The following 3 white papers offer a good summary of “why”, “what”, and “where do we go from here” for AI backend scale-out networks:
1. Datacenter Ethernet and RDMA: Issues at Hyperscale: AI workloads require lossless networks because they depend on extreme levels of parallelism and synchronization. The industry adopted RDMA (Remote Direct Memory Access) over Converged Ethernet (RoCE) as a protocol for AI backend scale-out networks. This paper discusses key challenges and inefficiencies with the RoCE standard. Link: https://lnkd.in/gKqSb3XB
2. I’ve Got 99 Problems But FLOPS Ain’t One: Because AI infrastructure deployment is a core competitive advantage, hyperscalers tend to be secretive about the challenges they face (and the solutions derived to address the problems) in building global-scale network infrastructures. The authors of this white paper used a research-based approach to model network challenges in supporting mega clusters and identified potential research areas for further exploration. Link: https://lnkd.in/g_V59r2A
3. Ultra Ethernet’s Design Principles and Architectural Innovations: The 562-page-long Ultra Ethernet Specification 1.0 is intimidating. Ultra Ethernet Specification 1.0 was designed to address AI and HPC workload network requirements. Key guiding principles include: Massive Scalability, High Performance, Compatibility and Vendor Differentiation. This white paper provides an easily digestible summary of key advancements proposed in the Ultra Ethernet specification. Link: https://lnkd.in/gBFr-Vtv
Ultra Ethernet Specification v1.0, June 11, 2025: https://lnkd.in/gTmX73rV
I would be remiss if I did not mention Torsten Hoefler 🇨🇭 who not only made contributions to the specification, but is also democratizing the specification. Thank you, Torsten!
Torsten Hoefler talk @ 2025 Swiss Conference, Next Generation AI and HPC Networking with Ultra Ethernet: https://lnkd.in/gc9kZYB8
Are there other AI backend scale-out resources that you came across? #AINetworking #UEC #ScaleOutNetwork
178
6 Comments
-
Jayank Bhalod
1K followers
Excited to share that today we launched a new Agentic Observability module for NGINX. This module enables users, customers, and the broader community to monitor MCP-based agentic traffic, helping bring greater visibility into emerging agent-driven systems. Check out our blog post (https://lnkd.in/gfGHzst6) for more details, use cases, and examples. Also, feel free to explore and star the newly open-sourced repository (https://lnkd.in/guivj5B8) We’d love your feedback and contributions! #F5 #NGINX #Opensource #MCP
67
1 Comment