Join Delphi Research today and immediately get access to our full Member Portal!

Sam Williams: Pay Once and Store Data Forever on Arweave

Apr 12, 2021

By Tom Shaughnessy and Can Gurel

Tom Shaughnessy, host of The Delphi Podcast and GP of Delphi Ventures, and Can Gurel, analyst at Delphi Ventures, host Sam Williams, the founder of Arweave.

Arweave is a new type of storage that backs data with sustainable and perpetual endowments, allowing users and developers to truly store data forever – for the very first time.

As a collectively owned hard drive that never forgets, Arweave allows us to remember and preserve valuable information, apps, and history indefinitely. By preserving history, it prevents others from rewriting it.

Every Delphi Podcast is dropped first as an audio interview for Delphi Digital Subscribers. Our members also have access to full interview transcripts. Join today to get our interviews first.




 Music Attribution:

  • Cosmos by From The Dust
  • Music promoted by
  • Creative Commons Attribution 3.0 Unported License


Interview Transcript 

Tom (00:01):

Hey everyone. Welcome back to the podcast. I’m Tom Shaughnessy, your host, and I’m a GP at Delphi Ventures. Today I’m thrilled to have on Sam, who’s the founder of Arweave, which is a very interesting storage play I’ve been following for a long time. I’m also joined by Can, who’s our analyst here at Delphi and who’s done a ton of work on Arweave.

Tom (00:19):

So Sam, why don’t we start with you? Give a brief introduction on yourself and we’ll go from there.

Sam (00:25):

Sure. Thanks for having me on. I’m pretty excited to speak to you guys and this audience. We hear really great things through our network, so it’s exciting. Yeah, okay. So, I’m Sam, I guess you could call me the inventor of the Arweave network and still kind of its chief architect as much as it has one nowadays. We started the project around four years ago when I was doing a PhD in distributed operating system design and since then, yeah, for about a year we built out the base version of the protocol and we launched it on the 69th anniversary of the release of 1984, which I guess is maybe something we’ll get into later on why that’s relevant to us.

Sam (01:06):

And since then, we’ve just been building out the community around it for the last three years, along with some work on the protocol. At this point it’s starting to mature and it doesn’t really need very much maintenance or changes. Yeah, that’s kind of where we are. So, I’m not sure how to proceed from that.

Can (01:23):

Yeah. Great to hear. Actually, the first time I came across the project, I remember thinking to myself, such a cool project, to permanently store data. So I wonder, actually, how did you come up with this idea in the first place? It’s a really interesting idea, to carry information on to future generations.

Sam (01:48):

Yeah. Well, so bringing it back to 1984, we were really inspired by the idea of closing the memory hole, as Orwell described it in that book. You see, 1984, for those that haven’t read it, is a vision of this dystopian future that was written in 1948, largely based on the experiences of people in the Soviet Union and, to a small extent, in China at that time. But it’s 70 years old now and it’s startlingly realistic about the modern world we live in, as well as the dystopian world that is emerging, in particular in modern-day China, Turkey, and some other states that have really taken to what you might call digital authoritarianism.

Sam (02:30):

So we were looking at some of the trends in the world and we were seeing that you could say something like the pace of history was picking up. So I think it was Lenin that said something like, “There are decades in which nothing happens and there are decades in which everything happens.” And it seems very much like we’re heading into one of those periods right now where everything is happening.

Sam (02:54):

And there’s this other, what would you say? Dynamic that occurs in authoritarian regimes where control over information and control over access to the past is a prize, you could say, for the regime which they use to control the way that people think about the future, or rather, think about the present and then subsequently how they act in the future. And Orwell put this nicely, “The person that controls the present controls the past. The person that controls the past, controls the future.”

Tom (03:25):

I love that quote.

Sam (03:26):

Right. Right. It perfectly epitomizes the problem that we’re trying to solve. And so what we thought we could do is basically make it so that there was group ownership across the whole world, democratic ownership of records of the past. And that is closing the memory hole, as it’s described in the book, and that’s why we got into all of this in the first place.

Sam (03:49):

We do that by essentially scaling up a blockchain from the size where, in Ethereum and Bitcoin, right, we argue over bytes, single bytes that we add to the ledger, because each one costs so much. Yes, taking it from those sorts of data storage sizes to essentially arbitrary quantities inside a single transaction. And then at the same time, pairing that with an economically sustainable mechanism for perpetuating that data. This works, broadly speaking, like an endowment. You put in enough fees to cover 200 years’ worth of storage upfront, and then over time, the cost of storage declines, you get interest in the form of storage purchasing power, and you use that storage purchasing power without breaking into the principal, that first 200 years, over time as necessary.

Sam (04:37):

And essentially what happens is you end up with more storage purchasing power at the end of each given year than you had at the beginning. In that way, we can make information permanence truly sustainable.

Can (04:48):

Perfect. Thank you, for [inaudible 00:04:52]. Go ahead, Tom.

Tom (04:53):

Yeah, sorry, no, I want Can to get into the technicals with you, but just one follow-up question there. I mean, the 1984 example is incredible and, I mean, the ability to store data for the world in a history that doesn’t change is obviously massive, right? So that people have a real view of history as we go down the line. Did you start this under that altruistic vision? Or did you also start Arweave to provide, say, storage for just traditional web apps, for potentially new apps that we just use, maybe a step below protecting the world’s history?

Sam (05:26):

No. I would actually say we were more incentivized around protecting the world’s history than the commercial aspects, which really just developed along with us. It started with this question of, okay, how do we archive newspaper articles? In fact, the reason that there are 66 million Arweave tokens in total is we did some sort of back-of-the-napkin math. It’s always difficult choosing your token supply quantities. It’s basically arbitrary, right? We were trying to make it so that, at some reasonable valuation, archiving the New York Times homepage would cost you one Arweave token.

Sam (06:01):

Yeah, so that was really the, what would you say, initializing force for the whole thing. And then over time of course we realized, well, if you’ve got an immutable archive of newspaper articles, then actually you’ve got an immutable archive of everything. And that’s much, much more powerful. And very shortly after that we realized, okay, well, if you can store webpages in this thing that are archived, why can’t you store primary-source webpages? And now you’ve got yourself a web. And if you build on top of that just one more layer and say, “Okay, how about we tag all of that data as well?” Obviously, if this thing is an archive, the tagging and metadata associated with new information is really valuable too, so it’s sort of a natural continuation of the idea. And then, “Yeah, if we have that, then why don’t we make it queryable?”

Sam (06:50):

And once you have querying of the data inside the system, well all of a sudden you can build applications on top of it, like truly decentralized permanent web applications and that’s super powerful it turns out. That’s essentially what we call the permaweb nowadays. Please go on.

Can (07:07):

Perfect. Perfect. Good to hear. So that brings us to a good point, because I wanted to ask you about it. When you have a permanent decentralized network, it has some emergent properties, right? It has data provenance, data integrity, availability, proof of existence, and probably many more that I’m missing right now. So, I’m interested, what use cases are there? Obviously archiving is the major one, but I want to understand, are there any very innovative use cases that you’re particularly excited about today, or that maybe can be built somehow?

Sam (07:51):

So many. For what it’s worth, I think the idea of differentiating between the web and an archive has become blurry. The web is just where we put our knowledge, and it just makes sense that all of that knowledge persists forever. Really you don’t want to put knowledge out into the world and then have it disappear. That’s not the way that people ideally want things to work. Generally if you’ve published it in public, you want it to be around, full stop.

Sam (08:19):

So, I don’t think that the use case is really just archiving per se, it’s really just publishing information that you want to make sure goes as far and wide as it can. That is essentially a web, I think. That is what we’re trying to disrupt, largely. So there’s obvious stuff there like storing large datasets that should be public. For example, scientific journal articles: it’d be great to see a system built on top of Arweave that makes sure that science is perpetuated for exceedingly long periods of time. There’s really no reason that we’d ever want to forget that.

Sam (08:55):

So the weirder stuff I would say, if you’re thinking about it from the point of view of an archive is things like applications themselves. But from a point of view of the user, this is extremely powerful, because you can essentially build web applications on top of Arweave that answer to no one. So they run under the same principles as code is law on Ethereum except they’re entire web apps, like things the size and complexity of YouTube or Facebook can be built on top of the system today, without any modification. It’s really that scalable and it just works.

Sam (09:28):

And those applications have really, really interesting properties. One that’s, I think, critical is that they allow users, or developers first to define how the content is going to be moderated on top of the system through the code. And one of the ways they can do that is they can delegate it to a community, so something like, for example, is a permaweb application where each community, which is broadly speaking a subreddit, gets to vote on the content moderation in their part of the internet, part of cyberspace, which is something that you just fundamentally can’t do in the Web 2.0 world, because in the permaweb, the community is the highest, what would you say? The highest authority. They have all of the power.

Sam (10:18):

But in the Web 2.0 world, there will always have to be a company sitting on top of them that actually decides what’s said. So they can moderate somewhat, but the company’s going to moderate on top of that. Yeah. It’s just a completely different power dynamic. And that’s just one of the things. Another is, for example, if you have permanent web applications, you as a consumer can be sure that they’re never going to decline in quality, which at initial glance doesn’t look that important, but I would actually argue that most of the problems with the Web 2.0 world come down to this.

Sam (10:55):

So the normal flow for getting a Web 2.0 application adopted, it goes like this, right? So you build something interesting on a weekend. You get it to some adoption, well, you just put it out into the world and suddenly people like it and it’s like, “Wow, that’s actually a cool thing.” And now suddenly you’ve got yourself significant server costs associated with it. So, yeah. Now we need funding of some kind, we can’t just give it away for free. Okay, fine, so you raise VC capital. Okay, cool. But now there’s people that have bought into extracting value later by putting capital upfront. So you’re bankrolling the adoption process and at some point later, after you’ve dominated the market, you’ve got to turn on the tap and make some money somehow.

Sam (11:39):

And this is fine. This is a normal business model. But the problem is that from the user’s perspective, they can sign up to a service, which eventually at some point is going to pivot to using them, or extracting value from it in a way that it wasn’t previously. And particularly in the Web 2.0 world, we see that a lot of this comes down to, yeah, gaining a moat around those users, so making it so that they can’t leave after they’ve signed up. And now, value is being extracted from them, which is from the user’s perspective highly suboptimal.

Sam (12:14):

Whereas a permaweb application, that just can’t happen to you. The code is permanent, it’s always going to be available like this. The developer can release a new version that is better if they want, but they can’t make it worse. Now the power relationship between developer and consumer is completely, completely different.

Can (12:30):

Yeah. I feel you. I understand. Sometimes it annoys me a lot when I have some app and it forces me to upgrade to the newer version. I don’t like it. So yeah, it’s a good point. So, you mentioned the endowments and the payment structure, which I find, obviously, very interesting. You pay once to store forever. And then how does the network actually ensure that? For people who are not familiar, could you explain it to us?

Sam (13:05):

Yeah. Absolutely. So it sounds impossible upfront. I can understand. Yeah, if I were to look at it from the outside, if I knew nothing about Arweave, I’d be like, “How on earth does that work?” But actually in practice, it’s really quite simple. You put aside 200 years’ worth of data storage cost at the beginning, which sounds expensive but is actually 0.4 cents per megabyte, something like that. So for something like storing a web application or even storing your posts, blog posts, tweets, whatever, it’s actually quite cheap. This is just the nature of storage being as cheap as it is today relative to the types of data that we want to store.

Sam (13:40):

Okay, cool. So we’ve got that. And you can imagine a base version where you just release those tokens over time for the next 200 years, and that’s 200 years’ worth of storage covered, right? But actually over time, the cost of storage declines at a fairly predictable rate. Over the last 50 years, it’s been about 30.5% on average per year. But the network assumes a decline rate of 0.5% per year.

Sam (14:06):

And if you factor that into your calculations of how much storage you can buy with your 200 years’ worth, actually over time, in practice, you find that as long as the decline stays above that 0.5% rate, you just never run out of tokens to pay for it. It’s kind of like, the cost is exponentially decaying and you can sum the area under that curve to give you a finite cost for an indefinite period of storage.
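The arithmetic behind this can be sketched in a few lines of Python. If storing a piece of data costs one unit in the first year and that cost falls by a fixed fraction every year, the total over all future years is a geometric series that sums to 1 divided by the decline rate. At the 0.5% rate Sam mentions, that comes to 200 years’ worth of today’s cost, which lines up with the 200-year figure. The function names and the constant yearly decline are simplifying assumptions for illustration, not the protocol’s actual pricing code.

```python
def perpetual_cost_in_years(decline_rate: float) -> float:
    """Cost of storing data forever, in multiples of the first year's cost.

    Assumes the yearly cost shrinks by `decline_rate` every year, so the
    yearly costs form the series 1 + (1 - r) + (1 - r)^2 + ... = 1 / r.
    """
    if not 0.0 < decline_rate < 1.0:
        raise ValueError("decline_rate must be strictly between 0 and 1")
    return 1.0 / decline_rate


def cost_over_horizon(decline_rate: float, years: int) -> float:
    """Cost of only the first `years` years, summed term by term."""
    return sum((1.0 - decline_rate) ** t for t in range(years))


if __name__ == "__main__":
    assumed_rate = 0.005  # the 0.5% yearly decline the network assumes
    print(perpetual_cost_in_years(assumed_rate))   # endowment, in years of cost
    print(cost_over_horizon(assumed_rate, 200))    # what the first 200 years use
```

Under these assumptions the first 200 years consume only part of the endowment, and any real-world decline faster than 0.5% per year leaves a growing surplus.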

Sam (14:33):

Obviously there are questions there, like yeah, okay, so in the past it declined at this rate, but how do you know it’s going to do that in the future? So we looked at this in a lot of detail and we don’t like to make predictions about individual technologies because that misses the point. It’s not really clear when they’ll come about and so on. Instead, we chose to look at the theoretical data density maximums and the data density maximums that we’ve actually reached in the lab, so in practice already. And we can see that where we are right now is somewhere like one times 10 to the power of 12, or maybe 13, bits per cubic centimeter of data density. And in the lab we’ve already reached one times 10 to the power, I think, of 24 or 25. And the theoretical maximum, so like the Moore’s law limit, except on the storage side, is around one times 10 to the power of 63.

Sam (15:27):

So it’s an enormous amount of room for growth on just the data density side, but of course, there’s another component to what we call the gigabyte hour cost, so the cost of storing a gigabyte for an hour, which is what drives all of the economics here. And that’s data reliability. And we use magnetic hard drives at the moment typically so there’s an electron basically representing the state of the bit, or a set of electrons representing the state of the bit somewhere in the system. Now, that makes sense because these materials are [inaudible 00:15:59]. So you want to write many times.

Sam (16:02):

And you don’t want to physically change the state of the atoms, so what you want to do instead is… Yeah, what you want to do instead is have it so that you can do that with a magnet, basically. It makes a lot of sense in this, what would you say, paradigm that we currently have, but for permanent storage it doesn’t make any sense at all. Actually for permanent storage you should really be etching this into the chemical structure of the drive medium, because it’s write once, read many, right?

Sam (16:34):

And so on that component, the data reliability component, there’s even higher amounts of, what would you say? Efficiencies to be made. It’s actually much, much harder to predict the end game of where that goes to. But, just to take you back to the one where we can, the factor where we can sort of predict this: we think that at the 30.5% rate it will take 430 years, approximately, to reach that terminal data density limit. And at that point, we’d probably just start building bigger hard drives. It’s hardly like hard drive size is something we really care about. Yeah, so that’s all about the endowment structure.
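The 430-year figure can be roughly reproduced with a compound-growth calculation. The sketch below assumes, as one possible reading of Sam’s numbers, that data density improves by 30.5% per year, starting from about 10^13 bits per cubic centimeter and ending at the 10^63 theoretical limit. The function name and that interpretation of the rate are assumptions made for illustration.

```python
import math


def years_to_reach(current_density: float, limit_density: float,
                   yearly_growth: float) -> float:
    """Years of compound growth needed to go from current to limit density.

    Solves current * (1 + g)^n = limit for n.
    """
    if current_density <= 0 or limit_density <= current_density:
        raise ValueError("need 0 < current_density < limit_density")
    if yearly_growth <= 0:
        raise ValueError("yearly_growth must be positive")
    return math.log(limit_density / current_density) / math.log(1.0 + yearly_growth)


if __name__ == "__main__":
    # Densities in bits per cubic centimeter, taken from the interview.
    for start in (1e12, 1e13):
        print(start, round(years_to_reach(start, 1e63, 0.305)))
```

Both starting points land in the low-to-mid 400s of years, the same ballpark as the quoted estimate.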

Tom (17:23):

No, it’s magical from the user perspective, right? I mean, pay once, store forever, and you’re basically doing a DCF of what storage prices are going to cost and you’re [inaudible 00:17:32] that and it’s magical. I guess one question on that point just plays into your native token. You guys have your own token, so what happens to the price of storage when the price of your token goes way up? How do you prevent wild swings and so on?

Sam (17:48):

Right. So we have a heuristic-based fiat price stabilization mechanism that basically says, “Okay, so at time zero,” and we hard code this into the protocol, “At time zero, the price of Arweave is this much. The number of tokens being released per block is this many. And the amount of work going into the block production is this much.” Take these three factors and then at another time, time one we call it, we know there’s this much work going into the creation of a block and there’s this many tokens being released. And because the miners are economically rational actors, we can then infer from them the price of a token.

Sam (18:26):

Because essentially what the miners do, or what these networks do really, is they attract more miners to come and put their work, whatever it happens to be, in this case it’s some CPU work, but ideally mostly hard drive work, yes, in mining the network until the profit margin is low enough that it’s not actually that interesting for someone else to come along and join a network and continue mining it.

Sam (18:50):

And so you always have this effect where the miners are sort of collectively exerting slightly less than the amount of value that is emitted in the blocks in a network. So at the moment in Arweave, we emit about $150 worth of value per block, and the miners I think are putting in probably like $130, something like that, collectively per block into the work. So because this is predictable, we can make a heuristic-based system that estimates the price of Arweave tokens in fiat form, or at least in some generic value units. It’s actually kind of elegant because it’s not really truly fiat, it’s not going to have a problem if the US dollar has a problem, for example. Yeah.

Sam (19:37):

And some generic value unit is kind of keeping track of it and then stabilizing the price that way. Does that make sense?
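The inference Sam describes can be illustrated with a deliberately simplified model. It assumes that miners collectively spend a roughly constant fraction of each block’s value on work, and that a unit of work costs the same at both points in time. Neither assumption is taken from the protocol, and the function and parameter names are hypothetical.

```python
def estimate_token_price(price_0: float, reward_0: float, work_0: float,
                         reward_1: float, work_1: float) -> float:
    """Estimate the token price at time one from a hard-coded time-zero anchor.

    price_0  -- known token price at time zero (generic value units)
    reward_0 -- tokens released per block at time zero
    work_0   -- work going into each block at time zero
    reward_1 -- tokens released per block at time one
    work_1   -- work going into each block at time one

    If rational miners keep spending a steady share of block value on work,
    block value is proportional to work, so price is proportional to
    work divided by tokens released.
    """
    if min(price_0, reward_0, work_0, reward_1, work_1) <= 0:
        raise ValueError("all inputs must be positive")
    value_per_unit_work = price_0 * reward_0 / work_0
    return value_per_unit_work * work_1 / reward_1


if __name__ == "__main__":
    # Same block reward, twice the work: the inferred price doubles.
    print(estimate_token_price(2.0, 100.0, 1000.0, 100.0, 2000.0))
```

The appeal of this kind of estimate is that it needs no external price feed, only quantities the network can already observe about itself.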

Can (19:44):

Yeah. Yeah. That makes perfect sense. So instead of using oracles, maybe, to have an exact figure, you use hash rate or network difficulty as a proxy, a good enough proxy for what’s going on in the fiat world, basically.

Sam (20:00):

Exactly. Exactly. It’s much more robust we think.

Can (20:04):

Perfect. So, you’re pioneering in this industry. So obviously there are some challenges, etcetera, and new problems you’re tackling that nobody else tackled before. So that requires some adaptivity, and I know you recently switched to a new consensus mechanism, which is SPoRA.

Sam (20:31):

Yeah. Right.

Can (20:32):

So how do you see effects of it, what have you witnessed since you changed the consensus mechanism? What was the intention there?

Sam (20:40):

Well, the intention was just to increase the number of data replicas, and that has certainly happened. I think, as of today, there’s probably like 1,200 or 1,300 replicas of the data all across the world, which is actually pretty significant. It’s hard to get proper numbers on this. I tried, and so I can’t be sure. But I think it’s probably the most replicated dataset in the world at this point, which is pretty cool. There’s a lot of room to grow as more miners come online.

Sam (21:08):

One of the early effects we saw was hilariously profitable mining for the first few days. Because Arweave token price is whatever and it’s emitting like $150 of value, but when we swapped the mining process, most of the pools just went offline. And so if you’re a small miner, now suddenly you’re making, you had like 150 miners at the beginning with $150 of value being emitted per block, and none of those were big, professional miners, so that’s just $1 per block for everyone that’s taking part. It was pretty crazy.

Sam (21:46):

Longer term, we’re just seeing this pattern towards larger numbers of replications of the dataset, which is exactly what we wanted. So really glad to see that.

Tom (21:58):

And just, I guess in line with SPoRA, it’s pretty technical, we’re going to link to it in the show notes of course, but one of the goals with that was to kind of reduce remote storage, and one of my concerns with decentralized storage is that a lot of these players are just reselling AWS or Google and Microsoft and we’re kind of back at stage one, right? How does Arweave tackle that so that we move away from the Web 2.0 paradigm there?

Sam (22:26):

Yeah. That’s an interesting question. I don’t see it so much like that. I don’t care about competing with AWS on temporary storage costs. And so if people want to arbitrage on AWS storage costs in the period in which they are mining, then so be it. That’s fine by me. They’re not all going to do that because there are more efficient, more effective ways. And so they use lots of different providers and lots of them run their own rigs and all sorts of stuff.

Sam (22:57):

We see Arweave as addressing something fundamentally different. It’s just permanent information storage for the first time, full stop. It’s like the zero-to-one change, in the same way that Bitcoin was a zero-to-one change in digital scarcity and Ethereum was a zero-to-one change in smart contracts. Arweave is not just trying to decentralize an existing file storage paradigm that already basically works for almost everyone. Instead, we’re trying to do something completely, completely new. And so the way the network is made up right now, there are replicas of it, of the dataset, across basically every major cloud provider, many of the minor ones, and in lots of people’s homes. And that’s great, that’s perfect, that fulfills what the, what do you say? What the network needs to do in the short term in order to fulfill its mission in the long term. At any point in time, there need to be lots of replicas of it everywhere in the world.

Tom (23:50):

Yeah. That makes a lot of sense. And I mean, just to follow up to close the loop for me there, I guess, how do you envision Arweave 25, 50 years down the road? Does it matter to you whether the state of your chain is actually basically 100% in everybody’s homes? Or do you care if it’s 100% on the cloud providers? I guess, we always go back and forth on this vision because everybody wants to see the move off cloud providers. I really don’t see the need to. But I’d love to get your answer there.

Sam (24:19):

Right. Yeah. I say 100% of anything is probably not what we want, right?

Sam (24:23):

So, 100% cloud providers, that’s not a good situation. 100% house, probably okay, but yeah. I would rather there were some super fast, high-reliability backups of the network, that would be better, a mixture. Really the network is to some extent a virus that just wants to make replicas of the dataset across as many hard drives as possible. And this plays out in the fact that when people turn off their Arweave miners, they never delete the data. We’ve actually spoken to quite a few miners that, as is natural churn, so they weren’t quite as efficient as other miners, so they weren’t being profitable when some were… They’re leaving their work for a while.

Sam (25:06):

We speak to them and I always ask the question, “Hey, did you delete the data?” Because no one ever thinks about this. They’re like, “No, no. I just left it there.” And so now the network has made another copy of that dataset. Yeah, so it kind of spreads as a virus across all these old hard drives all over the world, which is awesome, because the only thing we care about is just replicating that dataset as far and as wide as possible. And the elegant thing is, if there’s a piece of data where there’s very few replicas left for some reason, and it’s on some hard drives that are offline, well, now the incentive for someone to bring it back online, or to sell that copy to someone else that’s part of the network, is higher because the data is rarer.

Sam (25:50):

And so there’s this sort of auto-leveling effect for data replication as well, which is pretty… I don’t want to say elegant because I designed the system, but I’m pretty happy with it. And it’s quite nice.

Tom (26:02):

One last question there, and I’ll throw it back over to Can to dive in as well, but and I hate to lob this question because it’s the extreme, right? But let’s say World War III happened and some country launches missiles at all the AWS availability zones, right? And let’s be honest, everyone kind of knows where they are and it wouldn’t be too hard.

Sam (26:24):

Right. For sure.

Tom (26:25):

Would Arweave survive?

Sam (26:27):

Mm-hmm. Easily. Yeah. No problem at all. Yeah. That’s not a concern. What I might be more concerned about, actually, is like a super, super bad pandemic. With just an atrociously bad pandemic, you have this problem that the lines of the internet need to be maintained by humans that have to turn up for work. Now, that is a trickier problem than nuclear war where the whole thing is replicated everywhere, unless literally the whole surface of the world is destroyed, then yeah. Then that balance of-

Tom (27:08):

Sam, we’re ending your example because I cannot deal with another pandemic any time soon.

Sam (27:11):

I’m sorry. Okay. Fair enough.

Tom (27:14):

I’m kidding. It’s a good thought experiment.

Sam (27:16):

Yeah. Well I mean, it was a thought experiment until it was very real and at the beginning of this pandemic, we obviously didn’t know how much truth there was to the information coming out of China right at the beginning. So it wasn’t clear what the fatality rates were. And then suddenly it occurred to us, “Wait, the one thing that we can’t really rely on the internet for is if no one turns up because no one wants to get sick.” Pretty interesting. Yeah. But fortunately there are people working on things like Arweave over HF radio, so the network can still continue producing and broadcasting blocks, even if that were to happen.

Can (27:55):

Perfect. Perfect. Awesome. Sam, as I was reading through your content, watching your videos, I quickly noticed that you’re a mastermind in game theory. So, drifting a bit, this question isn’t particularly addressed to Arweave, but when we talk about networks, decentralized networks, we always end up in a Pareto distribution, whether it be miners, validators, block grid users. So I actually wanted your thoughts: will we be able to beat that with a design where we prevent people from pooling their resources and dominating the network, basically?

Sam (28:46):

Look, I think that most of the uses… Okay, I’ll go back, that’s an interesting question, but just a side note on it first. I think most of the use cases for these decentralized networks are not really about hyper decentralization, like one person, one node. It’s sufficient to be Bitcoin style decentralized, which as you correctly point out is basically just a Pareto distribution. But as long as the network is enforcing the rules in an elegant way, and there’s 13 to 15 core miners, everyone can see if they were to start to do something wrong, then actually basically, they typically fulfill the needs. No one has to worry that Bitcoin is going to have more than 21 million Bitcoin tokens, because there’s only 15 people broadly that control the hash rate. That’s not a concern.

Sam (29:41):

The way the protocol is enforced, if 14 of them tried to go and change it to 22 million, then they’d just have a separate protocol and it’s not called Bitcoin and then we continue our day. Ethereum is largely the same. And I think Arweave is too. Obviously, it would be preferable, I think, if we could have one person, one node, but that’s not easy to create of course, because Sybil identities can never really be that expensive. It’s basically the problem.

Sam (30:14):

There is some interesting stuff happening on the Arweave network, in the permaweb layer on top, with a project called ArVerify, which was theorized by Albert from USV, who spoke to a couple of people in our ecosystem, and then they went and built it. And now, 5,000 or 6,000 verifications later, they bootstrapped it and it’s making like $15,000 revenue a month. It’s kind of neat for them. But broadly the principle is this: if I know you, I verify you. And then you verify the people you’re friends with. And so on. And then you can make a kind of PageRank for humans.

Sam (30:51):

So you get a Sybil resistance score. So on the system, because lots of people know me, I luckily have like a 100% Sybil resistance score, people are very sure that I am not a robot. But if you’re new to the network and you don’t know so many people, then you might have a score of like 20%, something like this. And over time, it will increase.
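The “page rank for humans” idea can be sketched as ordinary PageRank run over a graph of who has verified whom. This is a generic illustration and should not be read as ArVerify’s actual scoring method: the function name, the damping factor of 0.85, and the example graph are all assumptions.

```python
from typing import Dict, List


def trust_rank(verifies: Dict[str, List[str]], damping: float = 0.85,
               iterations: int = 50) -> Dict[str, float]:
    """PageRank-style score over a 'who verified whom' graph.

    `verifies[a]` lists the people that `a` has vouched for. A person's score
    grows when well-trusted people vouch for them. Scores sum to 1.
    """
    people = sorted(set(verifies) | {p for vs in verifies.values() for p in vs})
    n = len(people)
    rank = {p: 1.0 / n for p in people}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in people}
        for source in people:
            targets = verifies.get(source, [])
            if targets:
                # Split this person's trust evenly among those they verified.
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Someone who verified nobody spreads their trust uniformly.
                share = damping * rank[source] / n
                for person in people:
                    new_rank[person] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {
        "sam": ["alice", "bob"],
        "alice": ["sam"],
        "bob": ["sam"],
        "carol": ["sam"],
        "newcomer": [],
    }
    for person, score in sorted(trust_rank(graph).items(), key=lambda kv: -kv[1]):
        print(f"{person}: {score:.3f}")
```

In this toy graph, the person vouched for by several others ends up with the highest score and the unverified newcomer with one of the lowest, which mirrors the pattern Sam describes of scores rising as more people verify you.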

Sam (31:12):

Yeah. So, you can use systems like this with social verification to create Sybil resistance, and obviously the question in the end is, “Hey, can we apply that to mining?” To which, I think the answer is maybe, but it’s not clear yet. There’s also stuff like Circles UBI, right? Which is broadly taking a similar approach and actually printing money to give to people based on that Sybil resistance score. So we can see, well, I would say, let’s watch what happens to Circles. And if it works, then that’s great, we should start applying it to mining of actual base layer currencies. Does that answer your question?

Can (31:53):

Yeah. Yeah. Yeah. Definitely. I mean, yeah, we shouldn’t forget the end goal. The end goal is to, I mean if you’re going to have a censorship resistant network, as long as you’re decentralized enough to have that censorship resistance, then you’re good to go basically. But yeah.

Sam (32:09):

Yeah. I was good friends with Joe Armstrong, who’s the founder of Erlang, the language that we actually wrote Arweave in. And he had this, he was known as this cult figure who had this motto that went something like, “Well, it’s fast enough.” Erlang was supposed to run on these enormous machines, enormous networks of machines doing [inaudible 00:32:35] switching. And it was never really a good number-crunching language, but they kept improving it piece by piece until it got to what he called, “Fast enough.”

Sam (32:45):

Like, “Okay, you might be paying a little bit more to do whatever number crunching, maybe 20% more than you would if it was written in C or something like that, but that doesn’t really matter because you get all of this other cool stuff for free.” I think there’s [inaudible 00:33:00] to be made in the decentralized world. Decentralized enough to do the job is basically what we’re looking for. It shouldn’t be a case of… Again, 100% anything is bad maybe.

Can (33:12):

Yeah. Yeah, yeah.

Tom (33:14):

Sorry. One quick follow up there. I mean, just to go back to the altruistic, the history of you, I think that’s incredible. I guess the other question there though is, do you guys solve access? If I’m in another country and my internet’s restricted, how do I get to an Arweave URL? How do you guys prevent that from being cut off? All the [crosstalk 00:33:35] around that?

Sam (33:35):

Yeah. Absolutely. So there’s a pretty basic system in the network for sharing bandwidth, it’s just like optimistic tit for tat from BitTorrent, which is not widely known, but it’s actually probably the most successful mechanism design to be deployed in a protocol ever. At one point it was passing like 35% of the internet’s traffic. Or actually, that’s probably not true. Bitcoin probably has a bigger market cap than this ever accrued fees for, but that’s only happened in the last few months or so. Before that, it was true that BitTorrent’s system was using up 35% of the bandwidth on the internet for 10 years. And it was purely this very basic mechanism that said something like, “If you give data to me, I give data to you. And then occasionally we give data to people at random.”

Sam (34:27):

And yeah, this creates a Nash equilibrium where everyone is incentivized to give data to everybody else all the time. It’s a very elegant game. So we had that baked into the base layer of the protocol with some changes and actually with some liberalism. So we don’t really care whether people share bandwidth or whether they actually just pay each other for access to the dataset for mining purposes; either is totally fine by us. If, instead of swapping bandwidth, they want to swap tokens, what do we care? It’s all the same.
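The rule quoted above ("if you give data to me, I give data to you, and then occasionally we give data to people at random") can be sketched in a few lines. This is a simplified illustration of optimistic tit for tat, not Arweave's or BitTorrent's actual implementation; the function name, the `slots` parameter, and the byte counters are hypothetical.

```python
import random


def choose_peers_to_serve(received_bytes, slots=3, rng=random):
    """Pick which peers to send data to in the next round (illustrative).

    received_bytes maps a peer to how much data it recently gave us.
    Returns (reciprocated, optimistic): the best contributors, plus one
    randomly chosen other peer so newcomers get a chance to prove themselves.
    """
    # Tit for tat: rank peers by how much they have given us.
    ranked = sorted(received_bytes, key=received_bytes.get, reverse=True)
    reciprocated = ranked[:slots]

    # Optimistic part: occasionally serve someone regardless of history.
    others = ranked[slots:]
    optimistic = [rng.choice(others)] if others else []
    return reciprocated, optimistic


recent = {"a": 900, "b": 500, "c": 0, "d": 100, "e": 0}
best, lucky = choose_peers_to_serve(recent, slots=2, rng=random.Random(7))
```

Peers that share more are served first, which is what makes sharing the rational strategy, while the random slot lets a brand-new peer with nothing to its name get started.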

Sam (35:01):

I think that swapping bandwidth is easier and nicer because it’s easy for people to just plug into and get started. And it’s also how peering relationships on the internet have worked for decades at this point. Yep. So there’s that side to it. And then if you’re a user that wants to get access, if you don’t want to plug into the base underlying layer of the network, you’ll go through a gateway. And there are lots of gateways at this point, which is great. And there’s also this project called Amplify, which is a profit-sharing community built on top of Arweave, which incentivizes people to run these gateways, and then they have probe nodes all over the internet probing each other’s gateways to see if they’re acting properly.

Sam (35:42):

You can imagine the staking and slashing mechanisms that go into making that work. That’s pretty much what you’d expect. So there’s that from the user’s access perspective as well.

Tom (35:52):

Makes sense. That’s awesome. Can, I’ll let you go next.

Can (35:55):

Yeah. So you mentioned a profit-sharing community, could you elaborate on that? What is that? Yeah. It’s a big part of permaweb, right? Yeah.

Sam (36:06):

Right. For sure. It’s become so. I mean, basically they are decentralized, autonomous organizations that govern the running and building of web services. And on top of that, they also share profit from those services. And typically this comes in the form of tips, so for example, if you’re creating a decentralized Medium, you might have a tip that goes along with the fee when the person uploads a blog post and the tip sends 0.01 AR or something to members of the profit-sharing community, relative to their ownership stake. So if we all own a third, a third of the time I get a tip and two thirds of the time you get the tip.
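A minimal sketch of the tipping rule described above: each tip goes to a single token holder, chosen at random with probability proportional to their ownership stake. The holder names and the helper function are invented for illustration, and the real profit-sharing contracts may select recipients differently; only the 0.01 AR tip size comes from the conversation.

```python
import random

TIP_AR = 0.01  # example tip size mentioned in the interview


def pick_tip_recipient(stakes, rng=random):
    """Choose one holder to receive a tip, weighted by token stake."""
    holders = list(stakes)
    weights = [stakes[h] for h in holders]
    return rng.choices(holders, weights=weights, k=1)[0]


# Three holders who each own a third of the community's tokens.
stakes = {"sam": 100, "tom": 100, "can": 100}
rng = random.Random(42)

earned = {holder: 0.0 for holder in stakes}
for _ in range(30_000):  # 30,000 simulated blog-post uploads
    earned[pick_tip_recipient(stakes, rng)] += TIP_AR
```

Over many uploads each holder's income converges on their share of the tokens, which is why holding the token behaves like receiving a stream of micro-dividends.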

Sam (36:50):

Yeah. That’s pretty much how it works. But what it means is that now the assets that back these web services are what you might call hard. They have a real underlying value to them. Well I mean, actually this is just one of the ways they’re better than traditional startups I think. But it’s basically this principle that says, “Okay, so now if you hold this token, over time as people use the application, value will accrue to your wallet.” And it’s just like an APR basically, it’s micro-dividends with every use of the application, broadly speaking.

Sam (37:28):

So this is obviously a big step up on equity in a traditional startup, which might actually never issue any kind of dividends, even if it’s wildly successful. An example of this would be Facebook. Never issues a dividend, or it never has issued a dividend, has a stated policy that it won’t, so your value, being in Facebook, what does it really do? Well all those profits just get recycled into making Facebook bigger. Which, if it’s never going to issue any value outwardly, then it’s kind of hard to identify where the value of those tokens, in this case, securities, equities, actually comes from.

Sam (38:07):

And similarly they have a more elegant, I think, governance model, which is, we don’t just say, “Okay, you own this many, or this percentage of the organization. You get that percentage of the voting power.” Instead, people that are really staked in for long periods of time get a larger stake even if they have a smaller quantity of tokens. So you get quantity of tokens, you multiply it by the amount of time that they are willing to stake those tokens.

Sam (38:35):

And so you can generate what we could call incentives for good governance using this, because the idea is, if my tokens are staked for two years and I’m accruing value in those tips all through that length of time, then obviously I have some kind of incentive to make sure that the tips continue to accrue and also, ideally, increase. And so now I’m trying to do the best for the organization, rather than just holding the tokens and being able to vote on stuff.
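The governance weighting Sam describes, token quantity multiplied by the time those tokens are staked, can be written out as a tiny calculation. The function names, the use of days as the time unit, and the example holders are assumptions for illustration; the actual profit-sharing community contracts may cap or scale lock times differently.

```python
def voting_weight(tokens, lock_days):
    """Voting weight = quantity of tokens x how long they are locked."""
    return tokens * lock_days


def vote_shares(stakes):
    """Turn {holder: (tokens, lock_days)} into each holder's share of votes."""
    weights = {who: voting_weight(t, d) for who, (t, d) in stakes.items()}
    total = sum(weights.values())
    if total == 0:
        return {who: 0.0 for who in weights}
    return {who: w / total for who, w in weights.items()}


stakes = {
    "short_term_whale": (1000, 30),   # many tokens, locked for a month
    "long_term_holder": (100, 730),   # few tokens, locked for two years
}
shares = vote_shares(stakes)
```

Here the long-term holder has a tenth of the tokens but more than twice the voting weight (73,000 against 30,000), which is the incentive for good governance described above.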

Can (39:08):

Yeah. I love this Web3 stuff, like you use money as a language to communicate messages. It’s fantastic.

Sam (39:15):

Huh, interesting.

Can (39:16):

Still [inaudible 00:39:18] me. So, you mentioned content creation before and I want to dive a bit deeper there, because that part is actually, I’m having a little confusion. Because data on permaweb always goes up, right? So you cannot delete something that you have uploaded. So what happens if there’s an illegal content upload? How do you deal with that? How does the network deal with that?

Sam (39:48):

Yeah. It’s actually better on Arweave than it is on normal blockchains, because we’ve had to deal with the problem of, well, okay, the network is going to be so large it’s not possible to store all of the data. Sorry, it’s not possible for every node to store all the data. So that means naturally it must be okay for people not to store certain pieces of data. It’s kind of built into the hard requirements technically to make something like this work.

Sam (40:14):

And one of the outputs of this is that the network simply never forces anyone to store any piece of data if they don’t want to. So while it does create a small incentive to store every piece of data, obviously if the content is illegal locally to where you’re storing it, and you’re putting it online with your IP address, you’re saying, “Hey everyone, come along and download this clear text, unencrypted data from me.” Then you’re pinning a target on your back if that thing is illegal, which is a huge disincentive to do that.

Sam (40:42):

So the weighing of the incentives and the disincentives very clearly comes out on the disincentive side and you just won’t store data that is illegal. Similarly, you won’t store data that is highly against your conscience. And so these things sort of keep it in check. And then there’s this system whereby, essentially it’s a global network, so data that is illegal in more places is much harder to get access to, and there are some forms of data that I think we can all agree shouldn’t be replicated permanently forever. And fortunately, that information is actually illegal basically every single place on the earth.

Sam (41:16):

And subsequently, there’s just no incentive to store that and that’s the one time in which the data just won’t be available even if you pay to store it permanently in the network. So if no miners anywhere will store it for you, yeah. So, that’s how the network deals with content moderation at the protocol level. So I would call this basically voluntarism. The idea is, if you were to boil it down to a sentence, nobody is forced to store anything they don’t want to or they’re not allowed to, full stop.

Sam (41:46):

Then on top of that, you can obviously have content moderation inside your application and that’s just like in Ethereum, code is law, you decide how that’s going to work. So whether your community is going to vote on the content moderation or whether there’s going to be moderators that allow and disallow certain pieces of content or you can elect them and revoke them. That’s a common theme. Or you just have one address and it says, “This is the super admin and they’re allowed to do whatever they want.”

Sam (42:14):

Yeah. All of that’s up for debate and what do you say? Down to the developer to decide. The one further complication is that gateways can also apply their own content moderation policy. This is actually really cool, because it means that if you are, for example, a devoutly Christian person and you don’t want to see a web that’s full of swear words, just to caricature an example, then you actually just go to a gateway that has similar political leanings to you and you see a web, so the same applications will be served to you without the content that you find offensive. This is really interesting, because it allows the users to have, just say, to have the ultimate say in the content that is presented to them. They become the ultimate arbiters of it. That’s really cool we think. Ask people to basically take part in the web they want to see without having to enforce that on others.

Sam (43:18):

And of course, this frees the developer from the burden of having to do content moderation themselves. You see this basically ripping apart the main social media networks right now, but actually, it’s funny, it’s like one of those cases where the wrinkles at the edges of the design at the beginning become the fundamental flaws later. So arguably, the one megabyte block limit in Bitcoin was just something Satoshi threw in one day. The commit log basically says nothing of interest. It’s just like, “Ugh. One megabyte block. Doing it now.” And that suddenly became, when 2017 came around, the thing that defined the system.

Sam (43:59):

But I think with social media networks it’s like they’re building all this amazing stuff, arguably very, very good for humanity in a lot of ways, which we don’t think about much nowadays, but I think 10 years ago we were noticing this, and it’s true still. But at the middle there was this wrinkle which was like, “Okay, but how do we decide what people should or should not be able to say?”

Sam (44:21):

So freeing developers of that responsibility and burden actually lets them get on with the stuff that they’re great at, building this amazing product while pushing that responsibility back to the users, back to communities, and then back to the hosts of the nodes in this hierarchical structure if you see that.

Tom (44:39):

That’s a really good take. I mean, we’ve seen a lot of developers do wild things when, and I’m not telling everyone to do anything illegal, but we’ve seen them really push the bounds when they can design in whatever space they want, which is super interesting. And just to bring it back to the space right now, how does Arweave handle NFTs? Because there’s obviously a lot of discussion going on on who owns the IP? Who owns the metadata? How does Arweave work with NFTs today?

Sam (45:05):

Right. I mean, NFTs in Arweave are such an obvious [inaudible 00:45:09], which it’s cool to see the industry has also started to realize this en masse as well. Because if you buy an NFT, there’s absolutely no reason you want it to be anything less than permanent, particularly if permanence of that NFT costs a fraction, tiny fraction, maybe a 10th or 100th of the Eth gas fees required to buy the NFT. Yeah.

Sam (45:33):

So, people are now just starting to embed transaction IDs from Arweave that contain the data, the asset into the Ethereum blockchain. That’s one way of doing it. But people are also going a step further now and creating what we’re calling Atomic NFTs to basically, it’s strange to describe because it’s kind of what people think NFTs are anyway but aren’t really, but just hear me out. The idea of an Atomic NFT is it’s an NFT where the contract, the metadata, and the data are all found in the same place permanently and behind a single address.

Sam (46:10):

Whereas what’s normally happening at the moment in Ethereum for example is that you have an NFT contract address which contains a link to the metadata which, scarily, is sometimes stored on centralized servers that can be changed or forgotten to be paid for by the author at any point later. And then that metadata points to somewhere else where the data is stored. So when people are using Arweave, normally they’re linking to data or metadata on Arweave, and then the metadata links to another transaction that contains the image, which is good step up I would say, but the ultimate… Well, that introduces a problem, which is like, okay, that’s cool, but what if we have multiple blockchains or just multiple contracts that point to the same piece of data?

Sam (46:55):

But with an Atomic NFT, you simply cannot do that, because the identifier you have in itself contains the contract which says who owns it. And so everything can be labeled as one and it’s never going to be lost, which I think is, yeah, it’s kind of the ultimate design for something like this. And people are moving to that now, it’s pretty neat to see.
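To make the contrast concrete, here is a toy model of the two layouts described above. In the linked layout, the contract points to metadata, which points to the data, so every hop has to stay online; in the atomic layout, one identifier holds the ownership record, the metadata, and the data together. All class, field, and URL names are hypothetical, and real Atomic NFTs on Arweave are not implemented as Python objects; this only mirrors the shape of the idea.

```python
from dataclasses import dataclass


def resolve_linked_nft(contract, stores):
    """Follow contract -> metadata -> data. Fails if any hop has vanished."""
    metadata = stores[contract["metadata_url"]]
    return stores[metadata["data_url"]]


@dataclass
class AtomicNFT:
    """Ownership, metadata, and data all live behind a single identifier."""
    tx_id: str
    owner: str
    metadata: dict
    data: bytes

    def transfer(self, sender, recipient):
        if sender != self.owner:
            raise PermissionError("only the current owner can transfer")
        self.owner = recipient


# Linked layout: two separate places must both keep serving content.
stores = {
    "https://example.invalid/meta.json": {"data_url": "https://example.invalid/art.png"},
    "https://example.invalid/art.png": b"image-bytes",
}
contract = {"owner": "alice", "metadata_url": "https://example.invalid/meta.json"}

# Atomic layout: one record, one address.
nft = AtomicNFT(tx_id="example-tx-id", owner="alice",
                metadata={"title": "Example"}, data=b"image-bytes")
nft.transfer("alice", "bob")
```

If the metadata server in the linked layout stops being paid for, `resolve_linked_nft` raises a `KeyError` even though the contract still exists, whereas the atomic record cannot be separated from its data or pointed at by a second, competing contract.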

Tom (47:19):

I’d pay OpenSea more money if I could see the benefits on Arweave or something like that.

Sam (47:25):

Right. Yeah. I think OpenSea are integrating Arweave right now, but just for storage of data. I think it’ll be a while before people start to do Atomic NFTs on Arweave en masse, although I do see the beginnings of it starting to take shape.

Tom (47:40):

For sure. We’re running low on time, but Can, throw in one more and I’ll throw in one more as well.

Can (47:45):

Okay. Yeah, sure. So, recently I also read about Arweave becoming the backbone for different blockchain projects, Solana is one of them, and there are many others too. So, can you explain to us how you helped those projects and what’s the basic use case there?

Sam (48:11):

Right. Yeah.

Sam (48:13):

Yeah. So, blockchains typically have this problem which is, they’re an ever-growing ledger and there are no incentives to store the old data. Okay? They’re adding at a slower rate than Arweave, but eventually if they are one of these scalable blockchains, they’re starting to add hundreds of terabytes per month, it’s a problem. Hundreds of gigabytes at least. And they span sometimes multiple terabytes at this point.

Sam (48:38):

So, Arweave obviously has those incentives to make permanent storage actually possible. So if you store the data inside Arweave, you can be confident you’re not going to lose an old copy of the blockchain. And you can also be sure that you can sync it down to your nodes much faster because you can take part in what is essentially akin to a BitTorrent swarm for the blockchain data itself.

Sam (48:59):

Yeah, this started with Solana about, no, it was SKALE first. Yeah, SKALE, we spoke to them about this and they were super interested, so we put together a grant and then they integrated it. And then Solana not long after and then suddenly it was Polkadot, and after Polkadot, then Cosmos and also Avalanche and who else now? Basically all of the large blockchains have now signed up to do this and at the time of the Polkadot grant, which was co-funded between us and Polkadot, a team was formed in the Arweave ecosystem when they realized that the basic principle of like, take the data from the other blockchain, put it on Arweave, is good, but what’s much better is if you can also validate that that data is correct.

Sam (49:44):

And so they started building a whole profit-sharing community around this called KYVE, which is now ready for testnet and has these integrations with all these other blockchains, they’re actually all co-funding it, it’s really cool. They all took convertible notes on KYVE’s first funding round. Yeah. So that’s really elegant to see. I think… Let’s see. Yeah. I think all of the major blockchains in the crypto space are now doing this, apart from Ethereum and Bitcoin themselves. So it’s cool stuff.

Sam (50:18):

And for example, you can use KYVE now to build fully decentralized permaweb-based block explorers for all of these networks that are agnostic and they have a demo of this working. It’s so amazing to see. It’s real decentralized access to all of the data in the crypto ecosystem. Just put in a TXID anywhere and boom, there you go, now you have all the information about your TX no matter what blockchain it’s stored on.

Sam (50:44):

And of course, all the information is unified in format, so you can access it in your applications without having to write different drivers for different chains. So it becomes like the aggregator service, which is pretty cool. And now The Graph are starting to use this for fast syncing of old data and I think they’re also going to potentially use it just to index that and then index all the chains at once without having to basically build different implementations and plug-ins for their network for every chain, just do one. So, that’s another cool use case of Arweave.

Tom (51:24):

That’s awesome. And Sam, just to switch gears, just two final questions from my end. One quicker one: do you think it will be easy, or does it exist now, to just store Arweave on my computer or phone, right? Is it stupid to think we’ll get a Google quick link to allocate some of my space? And sorry for the stupid question here, it just seems like such an easy way to add supply to your network.

Sam (51:51):

Right. The big question is, when will you be able to do that and then make a reasonable profit? And I think the answer is when people start to create decentralized pooling software on top of Arweave so that you can submit partial proofs of work in the same way that you would with a normal centralized pool, but have, yeah, the decentralized network basically work out who the rewards should go to, and then anyone from that network that mines a block also sends a transaction distributing the profits from that block to members of the mining pool.

Sam (52:26):

I think that’s the solution to this problem. I’m actually talking to someone in 30 minutes from the ecosystem that is looking to take on building a decentralized mining pool like this. So it’s pretty cool to see as well.
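The pooling idea above has two parts: members submit partial proofs of work (hashes that meet an easier target than a real block requires), and whoever finds a block splits the reward in proportion to those submissions. The sketch below illustrates both under loud assumptions: SHA-256 stands in for Arweave's real mining function, and the function names, share counts, and reward figure are invented for the example.

```python
import hashlib


def is_partial_proof(header, nonce, share_target):
    """True if hash(header + nonce) falls under the pool's easier target.

    SHA-256 is a stand-in here; it is not Arweave's actual mining hash.
    """
    digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < share_target


def split_block_reward(shares, reward):
    """Divide a block reward in proportion to accepted partial proofs."""
    total = sum(shares.values())
    if total == 0:
        return {miner: 0.0 for miner in shares}
    return {miner: reward * count / total for miner, count in shares.items()}


# A laptop and a phone contribute a little; a dedicated rig contributes a lot.
accepted_shares = {"laptop": 1, "phone": 1, "rig": 8}
payouts = split_block_reward(accepted_shares, reward=10.0)
```

With this split a small device that contributes one share in ten receives one tenth of each block the pool finds, which is what would let a laptop or phone earn a steady trickle instead of waiting to mine a whole block alone.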

Tom (52:38):

Sam, if they’re raising money, send them our way. We’d love to [inaudible 00:52:41].

Sam (52:42):

Amazing. Yeah.

Tom (52:44):

And just one last question, so you started this a couple years ago and if you have to… Your day to day is probably insane because you’re just so focused on building and driving Arweave forward, but if you had to look at the entire life of Arweave, what’s the most unforeseen positive and I guess negative event that you’ve overcome on both sides that you didn’t see coming?

Sam (53:12):

Not an easy question. To some extent it’s all felt fairly predictable. It’s taken a long time, but when I started this I got this weird feeling that this thing, people were going to like. It’s one of those things that if you looked at the world at that time, just after the election of Donald Trump, this wasn’t really a reaction to that, but in that realm where we all were, mentally, I think a lot of people were interested in something like this existing.

Sam (53:53):

And since we’ve started speaking to people, sure enough, they kind of started coalescing around it. And it’s been a bumpy ride for sure, but I don’t know, largely nothing sticks out as like, “Wow. I did not expect that. That was a complete…” I mean, actually no, lots of interesting stories. In the short term where very, very surprising things happened both up and down, like random coincidences and all sorts of stuff. But in the stuff that matters, over the long term, it’s all been kind of what we expected.

Tom (54:29):

Well, I mean that speaks to you guys and yourself personally spending a lot of time to plan this out the right way, instead of winging it, right? So kudos to you.

Sam (54:38):

Thanks. Well, yeah. Not sure I can take credit for that, but anyway.

Tom (54:44):

Appreciate that. Can, any last questions on your end or are you good?

Can (54:47):

I’m happy to meet you. No further questions. I mean, this is a very holy mission that Arweave’s after, and it’s such a cool project. So yeah. Congratulations.

Sam (55:01):

Thanks so much. It was a very fun chat. Great to have the opportunity to meet you guys as well.

Tom (55:06):

Yeah, Sam. Really appreciate your time and shout out to Can. We’ll link his report on Arweave in the show notes. Did a phenomenal job there. And Sam, thanks so much for coming out. Really appreciate the time and hope you can come on again soon.

Sam (55:19):

Thanks everyone.

Show Notes:

(1:45) – (First Question) Sam’s Background.

(2:55) – Arweave’s Elevator Pitch.

(6:01) – What incentivized Sam to start Arweave.

(8:07) – Arweave Use Cases.

(13:26) – Endowments / Payment Structure.

(20:27) – Thoughts on Spora.

(23:59) – How Sam envisions Arweave in 25, 50 years.

(35:35) – Arweave, a profit-sharing community.

(39:06) – Thoughts on dealing with illegal content.

(44:20) – How does Arweave handle NFTs.

(47:01) – Thoughts on Arweave in blockchain projects.

(51:37) – The most unforeseen positive & negative event that Sam has overcome.