Search Optimization in the Distributed Networks.

S. Osokine.
osokin@osokin.com
15 Oct 2002.

Abstract

Superpeers (also sometimes called ultrapeers) are generally considered to be a promising approach to content search improvement in the distributed networks. This study analyzes the performance of several superpeer architectures from the practical standpoint, comparing the different architectures and superpeer role assignment algorithms when the goal is to maximize the search query reach in a peer-to-peer network. The study also investigates the performance of the 'Local Indices' mutual index caching architecture from the same standpoint, and suggests an optimal caching approach that tries to maximize the peer-to-peer search performance by distributing the superpeer functionality across all network nodes. This approach can significantly improve the query reach in comparison with the superpeer architectures, since it allows all network hosts to maximize their contribution to the overall network resource pool, potentially approaching the theoretical limits of the peer-to-peer network search performance. The study shows that the search performance of this hybrid approach is directly tied to the average host bandwidth and to its hardware (RAM and CPU) resources, as opposed to the normal superpeer case, where the search performance is tied to the bandwidth and resources of the 'best' network hosts. Thus the suggested algorithm does not require careful superpeer selection and does not present high-profile superpeer targets to an attacker, which makes it less fragile than the conventional superpeer architectures.

Contents

1. Introduction
2. Basic Assumptions
     2.1. Full query message utilization
     2.2. Infinite network size
     2.3. Reach maximization for all queries
     2.4. Full host bandwidth utilization
     2.5. Host behaviour model
     2.6. Infinite CPU speed
3. Flat Network Architectures
     3.1. Identical-host symmetric link network
     3.2. Identical-host asymmetric link network
     3.3. Heterogeneous symmetric link network
     3.4. Arbitrary bandwidth case
4. Superpeer Architectures
     4.1. Plain superpeer network
     4.2. Plain superpeer network: explanations
     4.3. Non-routing superpeers
     4.4. Redundant superpeer clusters
     4.5. Optimal redundancy level
     4.6. Superpeer network problems
5. Mutual Index Caching
     5.1. Mutual index caching performance
     5.2. Optimal search conditions
     5.3. Index transfer bandwidth
     5.4. Host search value maximization
     5.5. Asymmetric link case
     5.6. Decentralized network control
     5.7. Suboptimal decentralized control
6. Conclusion
     6.1. Acknowledgements
7. References
Appendix A. Superpeer query failure probability
Appendix B. 'Napster meltdown' and why Gnutella failed to die
Appendix C. Index transfer traffic averaging


1. Introduction.

   Peer-to-peer networks are gaining more and more popularity on the Internet. It is difficult to provide precise numbers - in part due to their explosive growth - but it would be fair to say that at the time of this writing (October 2002) the total number of P2P network users is probably well over one hundred million, and the number of simultaneously connected hosts in such networks as Gnutella [1] and Kazaa [2] is measured in millions.

   One of the most important components of any peer-to-peer system is its search engine, and many projects are being actively developed in this area. The goal of the search engine is to discover the resources (files, hosts, Web services, etc) scattered over the net and to present the list of these resources to the requestor. All search approaches can be roughly divided into two groups:

   The first group includes the approaches that seek to route the search queries to the nodes that are more likely to satisfy these queries. This group includes such projects as CAN [3], Pastry [4], Chord [5], Tapestry [6], PNRP [7], and other similar projects. Typically the projects in this group seek to design the search mechanisms with per-request load proportional to the logarithms of the number of nodes in the distributed network and/or of the number of content items in it. The goal here is to make sure that every item of content is guaranteed to be located within a bounded number of operations (request forwarding hops, comparisons, etc).

   This guaranteed search comes at a price - the 'foreign data' storage requirements may have to be forced on the hosts, the request has to specify the precise identifier of the data item, and so on.

   The second group of the search algorithms includes the ones that require the search request propagation to a number of nodes comparable to the number of nodes in the network for the search to be guaranteed. Since this is often not achievable in practice, these algorithms typically do not guarantee the search success, but rather try to achieve the most extensive search given the current network hardware resources and architecture.

   This group includes Napster [8], Gnutella [1], Kazaa [2] and other similar systems. It might seem strange that a centralized Napster search is paired with pure peer-to-peer Gnutella one. After all, Napster seems to symbolize the centrally controlled peer-to-peer systems, which use the central server to organize the behaviour of the clients. Gnutella, on the contrary, is a fully decentralized self-organizing network without any central control points, which is cooperatively built by the independent interacting networked software agents. An attempt to control the Gnutella network is not unlike an attempt to control the growth of the coral reef.

   But here we are comparing the search mechanisms, not the control architectures. And Napster - exactly as Gnutella - can find the file only after looking through the local index of the host that stores that file. The fact that this index is physically stored on one of the Napster's central servers does make the search more effective; nevertheless, Napster is constrained by its central servers' capacity, and has to split the network into several semi-independent 'slices' in order to carry that load.

   Of course, the descriptions above invite the question: who would want to use the non-guaranteed search systems when the guaranteed ones are available? Well - as luck would have it, in the past years the guaranteed search systems have been adopted much more slowly and currently have a user base several orders of magnitude smaller than the non-guaranteed search ones. There are many reasons for that - for example, the non-guaranteed search peer-to-peer architectures are much simpler to implement, allow the 'fuzzy' search queries ('brit*.mp3', etc), and do not unduly burden the user by requiring him to dedicate part of his hard disk to store items that he has no personal interest in. However, all these attractive features would not have been able to ensure such a huge adoption rate advantage over the guaranteed-search systems if the non-guaranteed search architectures were not able to satisfy the average user of a peer-to-peer system - which they obviously do.

   It is not a coincidence that all three of the non-guaranteed search systems above have (or had, in case of Napster, before it was shut down by the court) tens of millions of users. Apparently they satisfy their users despite all their disadvantages - which are significant.

   As a matter of fact, these disadvantages become more pronounced as the peer-to-peer networks grow. Once upon a time, it was possible to query every host in the Gnutella network in search of some special file. Then the query was able to reach about half of all hosts. Then twenty percent. Now the same query would be lucky to reach a single-digit percentage of all network hosts, which makes the search for a rare file a tedious and time-consuming lottery. Kazaa is doing better in that respect, but its queries do not seem to be able to reach all the network hosts either. All this makes the attempts to widen the query reach to the biggest possible number of hosts a very active research field today - even when the search is not guaranteed, increasing its probability of success is very important for the users' satisfaction and for peer-to-peer system competitiveness and survival.

   And one of the most widespread approaches to this problem is to use the superpeers (or ultrapeers) - the well-connected hosts that act as the concentrators, holding the content indices and routing the queries on behalf of the 'leaf hosts'. In this scenario, the ad hoc peer-to-peer network becomes semi-structured, with a two-level topology, as opposed to the flat topology of the early peer-to-peer systems.

   In fact, it is easy to see that the Napster search architecture can be viewed as a special case of the superpeer search architecture. From the search mechanism standpoint the Napster central directory servers are equivalent to a network with a few very large superpeers that do not forward the queries to each other (due to the resource constraints or for some other reason) and perform all the searches locally - only among the leaf nodes of the same superpeer. (Of course, from most of the other standpoints the Napster server is very different from a decentralized network superpeer - it is a highly specialized central control application that has little in common with the independently interacting superpeer agents.)

   It has long been an article of faith among the peer-to-peer developers that superpeers should somehow be able to solve the request reach problem, dramatically increasing the chances of success for the search. But serious research in this area has started to appear only recently. The study [9], for example, answers questions about the best practices in superpeer architecture, about superpeer performance, and about the ways to minimize the load on the superpeers, allowing them to route more data on behalf of the users.

   At the same time, the same researchers were investigating the alternative approaches to the query reach optimization, trying to quantitatively analyze the peer-to-peer network performance in case of the different routing and index storage approaches [10]. Unfortunately for the practical peer-to-peer network developers, the thorough and solid studies [9] and [10] seek to maximize the search efficiency - essentially the authors freeze the search request reach or the number of returned results at some fixed value and try to minimize the system load under these conditions.

   This serves a purpose related to, but not exactly equivalent to, maximizing the query reach. In practical peer-to-peer system design the system load does not necessarily have to be minimized. In fact, as long as the individual host load does not exceed certain thresholds, it might be beneficial to increase it, as long as this increases the request reach and search success rate, which is the main goal of the practical search engine design. This study tries to do exactly that, applying reasoning similar to [9] and [10] in order to maximize the query reach and search success rate.

(Query reach and search success rate criteria are not exactly equivalent, especially when the 'Local Indices' mutual index caching [10] is used. When the hosts cache multiple indices on behalf of other hosts, even a query that reaches a single host can have a significant probability of success. However, it is often convenient to measure both query reach and search success rate in 'hosts'. Then a query reach of N hosts means that a query returns the number of results equal to the one that would be returned if the same query were directly applied to N average P2P hosts, even though in reality, for example, it might have been applied to a much smaller set of superpeers. In that case the 'search success' and 'query reach' optimization criteria become synonymous.)

   Of course, saying that efficiency and request reach criteria are different might sound strange, since the architectures with lower system loads should also result in a wider request reach. Generally this is really the case. However, in superpeer architectures the superpeers can become the performance bottlenecks, whereas the 'leaf' nodes would be mostly idle. So unloading the well-connected superpeers and loading the low-bandwidth hosts might lead to the dramatic search performance improvements, even though both aggregate and per-request system loads might be increased in the process. Also, since the architectural design recommendations are necessarily 'fuzzy' and combine a great deal of theoretical research and experimental data into a few seemingly simple (and sometimes even counterintuitive) recommendations, changing the optimization criteria can lead to the dramatically different architectural recommendations, as it will be shown below in Section 5.

2. Basic Assumptions.

   This section establishes several basic assumptions to be used throughout the rest of this article. These assumptions simplify the distributed system analysis and may or may not be true in a real peer-to-peer system, depending on its scale, purpose, and design. However, the changes brought by abandoning these assumptions are not extremely dramatic, and would typically just require including some limited-range coefficients in the analytical formulae below.

   It is important to note that even though the assumptions below are illustrated by the Gnutella network examples, these assumptions are of the general nature - this study tries to rely as little as possible on the query message delivery mechanisms specific to any particular distributed search network. Generally we try to assume only that the query is somehow delivered to a set of remote hosts that query their local databases and somehow return the results to the querying host. This might be illustrated by Figure 1:

Figure 1. Query message delivery to the remote hosts.

- here the mechanism of the message delivery might be a Gnutella [1] breadth-first traversal (BFS), random walkers [11], LimeWire GUESS proposal [12] or some other method. We are mostly concerned with the limits imposed on this delivery by the network resource constraints (bandwidth and CPU of the network hosts).

2.1. Full query message utilization.

   First, we assume that every query message transmitted over the wire is not dropped or ignored by the receiving host. In the Gnutella context, this assumption is equivalent to the zero redundancy in the query propagation graph, or to the infinitely large network with the random connection establishment algorithm.

   The statement above deserves some explanation. Typically the Gnutella network topology graph is presented by pictures similar to Figure 2 in [13], or Figure 2 below:

Figure 2. Gnutella network graph.

   Pictures like this one are useful for visualizing the ad hoc topology of the Gnutella network, but they do create a flawed perception of the Gnutella network redundancy. For example, if the host g decides to query such a network, it sends the query to its neighbours d and h, which retransmit it to their neighbours a, e, b, c, f, and i. Host e receives the same query from both d and h. Since Gnutella queries have unique IDs, e answers just one of these queries and ignores the duplicate, but half of its incoming bandwidth is wasted on the transmission of the redundant query. The situation becomes even worse on the next hop - by now, all hosts have already seen this query, but will receive multiple copies of it anyway, as, for example, b, e, and f will transmit the redundant information to c.

   This example creates an impression that 'looping' (redundant query transmission) is a serious problem for the Gnutella network [14]. Fortunately, this is not exactly the case. When the new hosts are connecting to the Gnutella network, they select the connection points from the very large lists of already connected hosts (or hosts known to be connected in the past). Further, the whole subject of the query reach increase came to light when the network became too large for a single query to reach every other host on the network. So the probability of these randomly selected connection points being close in the 'hop space' is very low, and keeps becoming lower and lower as the network keeps growing.

   As a result, from the single host viewpoint the Gnutella network topology is closer not to Figure 2, but to Figure 3, or to the graphs presented on Figure 13 in [15] and on the GnuMap project page [16]:

Figure 3. Single-host view of the Gnutella network graph.

   Of course, some redundancy and looping are always present, but their influence on the aggregate traffic calculations is not very noticeable in a large network. Essentially, for the query propagation purposes the Gnutella network can be viewed as a tree with the root at the query-originating host. If this assumption happens not to be true for some peer-to-peer network, then either the traffic statistics calculation will have to be adjusted to allow for the bandwidth wasted on looping, or the redundancy will have to be eliminated, either using methods similar to the ones suggested in [14], or (as suggested in [17]) using the well-known cycle avoidance techniques [18].

2.2. Infinite network size.

   This assumption is closely related to the previous one - it is much simpler to analyze the query propagation if it is not expected to reach all network hosts, since we don't have to consider the special case of the full query reach and cap all the formulae accordingly. This is also required for the previous assumption to be true.

   This assumption is really just a convenience - if it is not true, we already have the full network coverage by a single query, and further optimization is pointless. The inevitable (in this case) redundancy will not have a detrimental effect on the query reach, even though it might result in a sub-optimal bandwidth utilization.

2.3. Reach maximization for all queries.

   Many recent search optimization proposals ([10], [11], [12], [17], [19]) are seeking to reduce the number of nodes that process an average query without degrading the search quality. In many cases these approaches utilize the huge disparity in popularity of the different content items. The popular files in the file-sharing system can be present on a significant percentage of the network hosts, whereas rare files can be present only on a few hosts. Since the most frequent queries are likely to look for the most popular files, a significant resource economy can be achieved if the search can be terminated immediately after reaching a certain number of positive matches.

   This approach is valid and interesting, but for the purposes of this study we assume that such an optimization is not used, and every search reaches the same average number of storage hosts. Setting aside the analysis simplification benefits when such an assumption is used, there are two additional reasons to use this approach:

   So here we'll keep in mind that there are some very valuable optimization possibilities in this approach (it is certainly tempting to be able to answer 92-93 percent of the queries locally from a 10,000-leaf superpeer), but we'll restrict ourselves to the scenario when a superpeer does have to forward the incoming queries to the widest possible range of the other superpeers. Especially since this 10,000-leaf superpeer in [20] was a Napster 'slice' server that obviously requires a lot of resources - the savings in the more realistic scenario of a 400-leaf superpeer are significantly lower.

   And of course, the numbers above reflect only the queries with no results at all - realistically, a usable peer-to-peer file-sharing system will be much better off if it tries to return not just one, but many results for every query. This is necessary both for multi-source downloads, which are becoming the standard feature of any successful file-sharing system, and for having an ability to select the proper answer from the results matching a non-precise query. These factors make it necessary to require many answers for a query, so even the superpeers with some results found might have to forward the query to other superpeers anyway, which will obviously make the savings less impressive (see Appendix A).

2.4. Full host bandwidth utilization.

   As opposed to the studies [9] and [10], which freeze the expected number of the returned results and seek to minimize the bandwidth utilization, here we assume that we need to maximize the expected number of query results, and that the bandwidth of the peer-to-peer network hosts is going to be the major bottleneck that prevents the full network querying by a single search.

   So this study assumes that the peers' bandwidth is saturated by the transmission of requests. For a large-scale peer-to-peer network, this assumption is quite realistic. It has been noted long ago that the traffic in the Gnutella network tends to saturate the dial-up modem links ([21], [22]). This fact even prompted predictions about the eventual overload and death of the Gnutella network [23].

   Of course, nothing like this ever happened, and Appendix B explains in more detail why exactly it didn't. What is important for us here is that the Gnutella queries' propagation is not stopped by their TTL (as a surprising number of people still think), but by the flow control mechanisms that selectively drop the queries if and when the physical network links start to get overloaded.

   This assumption is just an analysis guideline; in practice, there will be multiple scenarios when the host bandwidth won't be saturated:

2.5. Host behaviour model.

   Throughout this study, we are using certain statistical assumptions about the storage and querying behaviour of the average host in the peer-to-peer network. The assumptions used here reflect the expected behaviour of a host in a generic file-sharing network, and can be different for a real file-sharing application. Still, we tried to set the realistic expectations and use the values that would reflect the typical behaviour of the host in the existing file-sharing networks.

   The primary values used in the subsequent sections are the average host storage size $D$, session length $T_s$, interval between queries $T_q$, query message size $Q$, file index record size $R$, and i-th host bandwidth $B_i$. Unless specifically stated otherwise, we'll be using the following values for these parameters:

$D = 200$ files, $T_s = 3600$ seconds, $T_q = 200$ seconds, $Q = 100$ bytes, $R = 100$ bytes. (1)

and

$B_i = B = 5000$ bytes/sec for the modem connection. (2)

   These assumptions mean that the average host stores 200 files with an index record (name, author and other searchable metadata) for every file taking 100 bytes. The host connects to the peer-to-peer network through a modem connection capable of transmitting 5000 bytes of search requests per second, or 50 queries per second, since the average query size is 100 bytes. On the average, the host spends one hour at a time online, and while it is online, it issues a file search query (100 bytes) every 200 seconds - which is equivalent to the QueryJoinRatio of 18 (as defined in [10]).
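   For convenience, the parameter set (1)-(2) can also be written down as a short Python sketch; the variable names (D, R, Q, Ts, Tq, B) are this sketch's own notation and do not come from any particular implementation:

    D  = 200     # average host storage size, files
    R  = 100     # file index record (metadata) size, bytes
    Q  = 100     # query message size, bytes
    Ts = 3600    # average session length, seconds
    Tq = 200     # average interval between queries, seconds
    B  = 5000    # modem host bandwidth, bytes/sec

    queries_received_per_sec = B / Q   # 50 queries/sec saturate the modem link
    query_join_ratio = Ts / Tq         # 18, as defined in [10]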

   Even though it was not our intention to find the exceedingly accurate values for these parameters, they do roughly correspond to the values observed in the existing peer-to-peer networks. These numbers are loosely based on the existing studies of the peer-to-peer networks ([10], [20], [26]), on the observations of the average per-client storage reported by the Kazaa [2] network client in October of 2002 (which reported approximately 180 stored files per client) and on the estimates and discussions in the Gnutella [1] community about the typical Gnutella client behaviour.

   Since great accuracy was not required, some numbers were rounded and/or changed to the values provided by less precise, but more recent estimates. For example, the query message size of 100 bytes is used instead of the 112-byte estimate from [26]. The 100-byte index record (metadata) size of the file is used instead of the 208-byte value from [26] and the 72-byte value from [9]. The average number of 200 files stored by a host is taken instead of the non-precise Kazaa client measurements (180), and the 168 and 340 suggested in [26] and [10] respectively. The one-hour session length and 200-second interval between queries are not substantiated by any published research and are basically 'educated guesses', based on the community discussions and observations. In any case, these two last numbers are the most likely to change, since they are based on such application-dependent technical decisions as the re-request frequency (to find more download sources, for example) and on whether the application is automatically started in the user's machine boot sequence or not.

   The modem bandwidth (2) roughly corresponds to the 56-kbit modem throughput after the factors listed in Section 2.4 are taken into account. When high-bandwidth connections appear in our study, their bandwidth will be similarly estimated without any great precision - just to give us some rough numbers to work with.

2.6. Infinite CPU speed.

   Unlike [9], [10], [20], and [26], for the purposes of this study we assume that the host CPU is infinitely fast and is not a bottleneck in any of the peer-to-peer host operations. This approach is a direct consequence of the assumption made in Section 2.4 - that typically the network link capacity is a performance bottleneck.

   Of course, as it was already mentioned, this might not be true for the well-connected hosts. However, the majority of the hosts in a network like Gnutella [1] are not CPU-bound. This is why, to simplify the matter, we simulate the CPU overload by applying an effective bandwidth cap to the overloaded network hosts.

   To roughly estimate this cap value, we use the CPU costs of the atomic actions provided in [9]. This data suggests that for the query size (1) the CPU cost of receiving one query, processing it, and maybe retransmitting one copy of the query, will be about 2M clock cycles on a Pentium III 930-MHz processor. This translates into a limit of approximately 500 queries per second, or an effective bandwidth cap of 50 kilobytes per second. After applying the '10 bits per byte' estimate frequently used in practice, which we'll be using throughout this study, this is equivalent to a bandwidth of about 500 kilobits per second.
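   The same estimate in sketch form (the 2M-cycle per-query cost is the rough figure quoted above, not a measured value):

    Q = 100                         # query message size, bytes
    CPU_HZ = 930e6                  # Pentium III 930 MHz
    CYCLES_PER_QUERY = 2e6          # receive + local search + possible retransmission, from [9]

    queries_per_sec = CPU_HZ / CYCLES_PER_QUERY   # ~465, i.e. roughly 500 queries/sec
    cap_bytes = queries_per_sec * Q               # ~50 kilobytes/sec effective bandwidth cap
    cap_bits = cap_bytes * 10                     # ~500 kilobits/sec with '10 bits per byte'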

   This bandwidth cap value is based on several assumptions:

   In any case, since some or all of these assumptions might be wrong, in this study we won't be using a strict 500-kilobits or 1-megabit per second cap, but rather investigate the range of the values likely to exist in the particular peer-to-peer network architecture.

   Now let's apply these assumptions to the performance analysis of various peer-to-peer network architectures.

3. Flat Network Architectures.

   In this section we'll estimate the performance of the 'flat' peer-to-peer network architectures, where all hosts are responsible only for their own data and index storage and treat the network traffic in a uniform fashion according to their capacities. No superpeers are present to store the indices for other, 'leaf' hosts and to route the query traffic on their behalf.

   This analysis is needed for two purposes: first, it establishes the baseline performance to be used as a reference point - the performance of the other, more complicated architectures will be compared to it. And second, the results of this analysis will be used in the analysis of the more complicated architectures - as we'll see later, often these architectures can be treated as the 'flat' ones, with the superpeers forming an effectively 'flat' network, whose performance can be evaluated according to the formulae from this section.

3.1. Identical-host symmetric link network.

   The simplest case of the flat architecture is when all hosts in the infinite network have the same bandwidth, and the incoming and outgoing bandwidth is the same:

$B_{in,i} = B_{out,i} = B$ (3)

   The mechanism used to propagate the query messages through the network is not important here - it might be Gnutella flooding, random walkers, or just direct requests from one host to another. What is important is that every host receives these query messages and performs the searches in its local database:

Figure 4. Query messages received by the host.

   Obviously the average receiving host ('e' on Figure 4) cannot receive more than B bytes per second of these messages, so its limit on the number of local searches performed per second is $B/Q$, and the total number of local searches per second performed by all hosts in the network is $N \cdot B/Q$, where $N$ is the total number of network hosts.

   Since in the flat network architecture all hosts are functionally equivalent, all these searches are initiated by the same hosts that are performing them. So $B/Q$ also gives us the average number of local searches per second that can be initiated all over the network by a typical querying host. And since these 'remote' local searches are the result of a querying operation performed by an average host once in $T_q$ seconds, the total number of hosts reached by this querying operation is:

$N_{hosts} = \frac{B \, T_q}{Q}$, (4)

and the total number of data items (files) covered by the search is:

$N_{files} = D \cdot N_{hosts} = \frac{D \, B \, T_q}{Q}$ (5)

   Of course, in order to achieve such a search coverage, all network hosts should agree to perform a certain amount of data transmission - if the average host sends out $B_{out}$ bytes of data per second, then obviously network-wide $N \cdot B_{out} = N \cdot B_{in}$, and

$B_{out} = B_{in} = B$ (6)

   So in case of the symmetric network links the query reach (4), (5) is possible only if all hosts fully utilize their outgoing bandwidth - whether by broadcasting queries for other participants, performing 'random walking' on their behalf, or just sending multiple direct queries themselves (not that the last variant is recommended for a practical peer-to-peer system, but from the traffic and query reach standpoints it is equivalent to the previous ones).

   For the parameter values defined in (1), (2) (modem hosts only), the search request can reach about

$N_{hosts} = \frac{B \, T_q}{Q} = \frac{5000 \cdot 200}{100} = 10{,}000$ hosts, (7)

which is close enough to the values observed on the real Gnutella network (5,000-10,000 hosts). In this study we do not seek to achieve the exact match of the performance numbers, but rather to find out the laws that determine the search performance and to suggest the ways of its improvement.
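   A minimal sketch of the calculation (4)-(7) for this identical-host case (the variable names are this sketch's own):

    B, Q, Tq, D = 5000, 100, 200, 200   # modem host parameters (1)-(2)

    n_hosts = B * Tq / Q    # (4), (7): 10,000 hosts reached by one query
    n_files = D * n_hosts   # (5): 2,000,000 files covered by the search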

3.2. Identical-host asymmetric link network.

   After the analysis performed in the previous section, it probably won't come as a surprise that if all hosts in the peer-to-peer network have identical asymmetric links (ADSL, cable modem, etc), the network performance (request reach) will be defined by the lower bandwidth value:

$N_{hosts} = \frac{\min(B_{in}, B_{out}) \, T_q}{Q}$ (8)

   For the real ADSL and cable modem connections the outgoing bandwidth is likely to be the lower one, so in practice the query reach in such a network will be limited not by the high (1.5 Mbit/sec and above) incoming bandwidth, but by the much lower (128-384 kilobit/sec) outgoing one.

   This is quite natural - if someone receives the data, someone has to send it, and since in our sample network all hosts have similar connections, only the outgoing network links will be saturated, whereas the incoming ones will be mostly idle. If the network were not 'flat', we could find some other uses for this excess capacity, but that will be the subject of the subsequent sections.

3.3. Heterogeneous symmetric link network.

   Now let us consider the case of the symmetric-link network, in which different hosts have different bandwidth. Let's say that some hosts (we'll denote their percentage as $p$) have the higher bandwidth $B_1$ than the rest of the hosts, which have the bandwidth $B_2$.

   The total number of local searches that can be performed by all hosts in such a network in one second is $N \bar{B} / Q$, and, repeating the reasoning that led us to (4) and (5), we arrive at the request reach of

$N_{hosts} = \frac{\bar{B} \, T_q}{Q}$, and $N_{files} = \frac{D \, \bar{B} \, T_q}{Q}$, (9)

where $\bar{B}$ is the average host bandwidth:

$\bar{B} = p \, B_1 + (1 - p) \, B_2$ (10)

   It is easy to see that (9) stays correct also for the arbitrary bandwidth distributions - then if we have $N$ hosts in the network,

$\bar{B} = \frac{1}{N} \sum_{i=1}^{N} B_i$ (11)

   Naturally, (9) and (11) define only the maximum achievable network performance. For these numbers to be actually reached, the low-bandwidth and high-bandwidth hosts have to find such a way to coexist on the network that would assure the full bandwidth utilization for the high-bandwidth hosts.

   If, for example, the majority of the Gnutella hosts are connected through the modems and have five connections to the other hosts, then each connection carries about one kilobyte of traffic per second. So if we open the same five connections on a well-connected host, it will route only five kilobytes per second - exactly the same amount of data as a modem one, and won't improve the network performance at all. Its bandwidth will be grossly underutilized.

   This is why typically the Gnutella vendors try to open more connections for the better-connected hosts, which allows the network as a whole to fully utilize their resources. Ideally the number of Gnutella connections for the host should be proportional to its effective bandwidth - the one after various bandwidth caps (as defined in Section 2.6) are taken into account.

3.4. Arbitrary bandwidth case.

   Now we are ready to analyze the case of the network in which every host can have an arbitrary incoming and outgoing bandwidth $B_{in,i}$ and $B_{out,i}$.

   From the previous sections it is clear that in order to maximize the search request reach in such a network, we need to maximize the total query traffic over the whole network. This traffic $T$ is limited by the lower of the summary sending and summary receiving capacities of all network hosts:

$T \le \min\left(\sum_{i=1}^{N} B_{out,i}, \; \sum_{i=1}^{N} B_{in,i}\right)$ (12)

   So the search request reach in this case is determined by the equation similar to (9):

$N_0 = \frac{\bar{B}_0 \, T_q}{Q}$ hosts, and $N_{files,0} = \frac{D \, \bar{B}_0 \, T_q}{Q}$ files, (13)

- except that the average bandwidth $\bar{B}_0$ is defined as:

$\bar{B}_0 = \frac{1}{N} \min\left(\sum_{i=1}^{N} B_{out,i}, \; \sum_{i=1}^{N} B_{in,i}\right)$ (14)

   In the practical Internet case, it is likely that the outgoing bandwidth of the asymmetrically connected hosts will be lower than the incoming one, so we'll be able to find the average bandwidth as:

$\bar{B}_0 = \frac{1}{N} \sum_{i=1}^{N} B_{out,i}$ (15)

   Here we use the index '0' for the bandwidth and request reach variables to define the search performance of the arbitrary-bandwidth flat peer-to-peer network. These variables will be used later to compare this 'baseline' performance with the performances of the other, more complicated architectures.

   Let's take a hypothetical peer-to-peer network in which most of the participating hosts are connected through the modems with the 5,000 bytes per second transfer limits, but 10 percent of the hosts are using the cable modems with the 12,000 bytes per second uplink (outgoing) bandwidth, and another 10 percent - ADSL links with the 20,000 bytes per second uplink bandwidth:

Percentage of hosts     Outgoing bandwidth (bytes/sec)
        80                          5,000
        10                         12,000
        10                         20,000

Table 1. Sample host bandwidth distribution.

   For such a model network, the average bandwidth value will be:

$\bar{B}_0 = 0.8 \cdot 5000 + 0.1 \cdot 12{,}000 + 0.1 \cdot 20{,}000 = 7200$ bytes per second, (16)

and, taking the variable values from Section 2.5, the number of hosts and files reached by an average search request will be:

$N_0 = \frac{\bar{B}_0 \, T_q}{Q} = \frac{7200 \cdot 200}{100} = 14{,}400$ hosts (17)

and

$N_{files,0} = D \cdot N_0 = 200 \cdot 14{,}400 \approx 2.9 \cdot 10^6$ files. (18)

   Here we are not trying to accurately approximate the hosts' bandwidth distribution for the Internet or some real peer-to-peer network. The goal here is to provide an example of a network with the different hosts present and to see how such a distribution affects the search performance, though the values used in this example loosely reflect the actual residential broadband adoption percentages in the United States.
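   The numbers (16)-(18) can be reproduced with a short sketch over the Table 1 distribution (each tuple below is a fraction of hosts and its outgoing bandwidth):

    distribution = [(0.8, 5_000), (0.1, 12_000), (0.1, 20_000)]   # Table 1
    Q, Tq, D = 100, 200, 200

    B_avg   = sum(p * b for p, b in distribution)   # (16): 7,200 bytes/sec
    n_hosts = B_avg * Tq / Q                        # (17): 14,400 hosts
    n_files = D * n_hosts                           # (18): ~2.9 million files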

   The equations (13)-(15) show that the search performance of the 'flat' peer-to-peer network is determined by the average host bandwidth, query message size, number of the data items (files) stored by the average host, and the interval between queries.

   All these components can be - and are - used to maximize the search performance. For example, all existing file-sharing networks try to provide the incentives for the users to share more files and to stay longer on-line, thus increasing the average number of stored files and the interval between requests (the latter effect is caused by the increased session length that allows the host to route and answer significantly more queries while initiating the same or just slightly increased number of searches itself).

   Similarly, the continued broadband adoption growth will improve the search quality in a network due to the gradual average bandwidth increase. For the purposes of this study, however, we will consider such improvements to be orthogonal to our main goal, which is to increase the request reach without changing the values of these parameters.

4. Superpeer Architectures.

   Now we can use the results obtained in the previous section to analyze the search performance of the peer-to-peer networks with different topologies. Let's start with the plain superpeer architecture.

4.1. Plain superpeer network.

   It has been noted long ago that the traffic in the Gnutella network tends to saturate the dial-up modem links ([21], [22]), thus limiting the search request reach in the network. So the natural countermeasure was to isolate the modem hosts from the query routing, selecting a subset of well-connected hosts called 'superpeers' and making them bear the routing load on behalf of their 'leaf nodes'. Such a superpeer network can be represented by a graph similar to Figure 1(a) in [9], or Figure 5 below:

Figure 5. Query delivery in a superpeer network.

   Figure 5 shows the query message delivery in a network with four superpeers a, b, c, and d. Here the search request is delivered by a leaf node to its superpeer a, which propagates it to the superpeers b, c, and d. Every superpeer has an index of all the content stored on its leaf nodes, so instead of delivering the query message to all the network nodes, it has to be delivered only to the superpeers, which can perform the local searches in their index databases without actually querying their leaf nodes.

   The actual mechanism of the message delivery to the superpeers is not important for us here - it just has to deliver the query message to other superpeers. It can be a Gnutella breadth-first traversal (BFS), random walkers [11], some modified form of the LimeWire GUESS proposal [12] that would allow a superpeer to query all other superpeers directly, or some other method used by the superpeers to communicate with each other.

   In any case, this query message delivery architecture seems to be much more effective than the one shown on Figure 1, since only the superpeers are involved in the message routing. Let's find the search performance (query reach) of such a network.

   First, let us note that the request traffic from the leaves to their 'home' superpeers is very low - the traffic from a single leaf node can be found as $Q / T_q$, which is about 0.5 bytes per second per leaf in assumption (1).

   Another component of the leaf-to-supernode traffic is the 'join' component, arising from a necessity to transfer the full leaf index to the supernode when the leaf host joins the network. This traffic can be calculated as $D \cdot R / T_s$, or about 5.5 bytes per second per leaf when amortized over the average session length $T_s$. So if the number of supernode leaves is less than several hundred, their summary bandwidth usage is much lower than the total bandwidth of the well-connected supernode, and we can discount the supernode bandwidth consumed by the leaves.
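   A quick check of these two per-leaf traffic estimates under the assumptions (1):

    Q, Tq, D, R, Ts = 100, 200, 200, 100, 3600

    query_traffic = Q / Tq      # 0.5 bytes/sec of leaf query traffic
    join_traffic  = D * R / Ts  # ~5.5 bytes/sec of index upload, amortized over a session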

   As far as the traffic analysis is concerned, this allows us to effectively erase the leaves from the Figure 5. After such an operation is performed, this chart becomes remarkably similar to the Figure 1. In fact, the only difference between the flat peer-to-peer network investigated in Section 3 and the flat peer-to-peer network consisting of only the superpeer hosts will be in the number of files locally searchable by the host $D$, the interval between queries $T_q$, and the i-th host bandwidth $B_i$. Let's find these parameters for the superpeer network.

   Introducing the average number of superpeer leaves $N_L$, it is easy to see that the number of files locally searchable on the superpeer - including its own files - will be:

$D_{sp} = (N_L + 1) \, D$ files, (19)

and a query will be sent out by a superpeer for every query generated by its leaves and by the superpeer itself, so its average interval between queries can be found as:

$T_{q,sp} = \frac{T_q}{N_L + 1}$ (20)

   Substituting these values into the equations (13)-(15) that describe the flat peer-to-peer network performance, we can see that the query reach of the superpeer network will be:

$N_{sp} = \frac{\bar{B}_{sp} \, T_q}{Q \, (N_L + 1)}$ superpeers, $N_{files} = N_{sp} \cdot D_{sp} = \frac{D \, \bar{B}_{sp} \, T_q}{Q}$ files (21)

reachable by a search, where

$\bar{B}_{sp} = \frac{1}{N_{SP}} \sum_{i=1}^{N_{SP}} B_{sp,out,i}$ (22)

is the average superpeer outgoing bandwidth, with $N_{SP}$ being the total number of superpeers in the network (this presumes that the outgoing bandwidth is not higher than the incoming bandwidth for all superpeers - if this is not the case, the analog of the equation (14) will have to be used.)

   It is also convenient to introduce an effective number of reachable hosts, defined as the number of the regular leaf hosts that would have to be searched to achieve the same number of files covered by the search:

$N_{eff} = \frac{N_{files}}{D}$ (23)

- then from the equation (21) we can find the effective number of reachable hosts in a superpeer network as:

$N_{eff} = \frac{\bar{B}_{sp} \, T_q}{Q}$ (24)

   Now if we define the ratio of the superpeer network search coverage to the regular, flat network search coverage as the search improvement coefficient $K_{sp}$, this improvement can be found from (13), (21), and (24) as:

$K_{sp} = \frac{N_{eff}}{N_0} = \frac{\bar{B}_{sp}}{\bar{B}_0}$, (25)

meaning that the search improvement is proportional to the ratio of the average superpeer bandwidth to the average peer-to-peer network host bandwidth.
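   A sketch of the whole calculation (19)-(25); the function and its argument names are this sketch's own, and the example values in the last line (16,000 bytes/sec superpeers with 400 leaves each, 7,200 bytes/sec flat-network average) are just illustrative:

    def superpeer_reach(B_sp_avg, n_leaves, B_avg, Tq=200, Q=100, D=200):
        """Plain superpeer network reach, following (19)-(25)."""
        files_per_sp = (n_leaves + 1) * D           # (19): locally searchable files
        tq_sp = Tq / (n_leaves + 1)                 # (20): superpeer query interval
        sp_reached = B_sp_avg * tq_sp / Q           # (21): superpeers covered by a query
        files_reached = sp_reached * files_per_sp   # (21): equals D * B_sp_avg * Tq / Q
        eff_hosts = files_reached / D               # (23), (24): effective host reach
        k_sp = eff_hosts / (B_avg * Tq / Q)         # (25): equals B_sp_avg / B_avg
        return sp_reached, files_reached, k_sp

    print(superpeer_reach(16_000, 400, 7_200))      # ~80 superpeers, ~6.4e6 files, K ~2.2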

4.2. Plain superpeer network: explanations.

   The result (25) is so unexpected and counterintuitive that it deserves a special explanation. The comparison of the Figure 1 and Figure 5 clearly suggests that even with the same bandwidth, the superpeer network search is inherently more effective than the search in the flat network. One might even expect the search efficiency and reach to be proportional to the number of superpeer leaves $N_L$.

   Not quite so.

   The confusion - as it is often the case with peer-to-peer networks - is caused by a naive geometrical model of the search operation that is implicitly suggested by Figure 1 and Figure 5. We see the decreased number of arrows between blocks and assume that it directly translates into the traffic decline. Not really.

   What Figure 5 fails to convey is that as the number of superpeer leaves $N_L$ grows, so does the query traffic generated by the superpeer. And since the search query reach is ultimately limited by the connections' saturation (see Section 2.4), the better-than-average superpeer bandwidth is the only factor that allows the searches to cover more content items (files) in the plain superpeer network.

   Let's consider an example - a network with the host bandwidth distribution from Table 1. Its search performance in the 'flat network' case is defined by (17) and (18). If every broadband host is selected as a superpeer, the average superpeer bandwidth will be 16,000 bytes/sec, and the effective number of searchable hosts and files will be 32,000 and 6.4*10^6 respectively. This gives us an improvement over the flat peer-to-peer network of:

$K_{sp} = \frac{16{,}000}{7200} \approx 2.2$, (26)

   If, however, we select only the better-connected 20,000-bytes per second hosts for the superpeer role, the effective number of searchable hosts will be 40,000, the number of searchable files - 8*10^6, and the improvement coefficient will be:

$K_{sp} = \frac{20{,}000}{7200} \approx 2.8$, (27)

   Now if we imagine that this network also contains enough hosts capable of supporting 100,000-bytes per second data streams to make the rest of the network hosts (broadband or not) their 'leaves', the search coverage will grow to 200,000 hosts, 4*10^7 files, and the improvement will be:

$K_{sp} = \frac{100{,}000}{7200} \approx 14$, (28)

   The key word in the last scenario is 'enough'. After all, the large network is virtually guaranteed to have some very well-connected hosts. The question is, would their number be sufficient to take all other hosts as leaves? Every leaf consumes about $D \cdot R$ bytes of RAM on a superpeer, and typically requires a TCP connection to be kept open between the superpeer and the leaf (here we assume that all leaf indices have to be stored in RAM by a superpeer, since the disk access times might make it very difficult to process hundreds of queries per second otherwise).

   Assumption (1) implies about twenty kilobytes of RAM occupied on the superpeer by every leaf's storage, so having 500 leaves will use ten megabytes of the superpeer RAM, and five hundred connections. If there are enough network hosts (more than 0.2 percent) that can take this load and still deliver the 100,000-bytes per second data streams, the improvement (28) can be achieved. If not, less capable hosts will have to be used as the superpeers, lowering the average superpeer bandwidth and search request reach.
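   The same RAM arithmetic in sketch form (the 500-leaf figure is just the example used above):

    D, R = 200, 100
    ram_per_leaf = D * R               # 20,000 bytes of leaf index kept in RAM
    leaves = 500
    ram_total = leaves * ram_per_leaf  # ~10 megabytes, plus 500 open TCP connections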

   These examples show that in the superpeer architecture it is very important to select the best and only the best hosts for the superpeer role, maximizing their number of leaves in the process. Even if some host has enough resources and is well-connected enough to be a superpeer, it should not become one unless some better-connected superpeers are overloaded or the bandwidth of this 'candidate' host is higher than the current average superpeer bandwidth. Doing otherwise will harm the search in the network.

   Such superpeer selection decisions, however, are very difficult to make on the local, 'node' level in the completely decentralized peer-to-peer networks. In the existing Gnutella algorithms, for example, every host decides for itself whether it can and will be a superpeer or not. To do so in an optimal fashion - or at least, without harming the existing network - it must have some global, network-wide information (current average superpeer bandwidth, etc), which is difficult to obtain in the ad hoc network. First, this information is typically not available to a network host. (In fact, currently it is not available to anyone but the network researchers.)

   And second, any attempt to collect and distribute this information in a decentralized fashion should be carefully designed to be protected against the malicious agents that might try to use these channels to mount an attack on the network. One example of such an attack would be to widely distribute the very low fake value for the current average superpeer bandwidth, which might cause many low-bandwidth hosts to become superpeers and significantly decrease the quality of the network search.

   This is why it is extremely tempting to employ the centralized methods of the superpeer selection. Some central location might have information about all the existing superpeers and carefully monitor their performance and load, selecting only the best-connected hosts for the superpeer role. In fact, there are some indications [13] that this is exactly what Kazaa is doing with its 'index servers'. With up to 500 leaf nodes per superpeer, the performance of the Kazaa network can be as high as it is allowed by the effective average bandwidth of the top 0.2 percent of the network hosts, which can be quite impressive. Due to the closed character of the Kazaa network, however, this hypothesis is hard to verify.

   And in any case, any centralization introduced into the peer-to-peer network is an additional liability - in both technical and legal sense. Which are remarkably close on the architectural level - court decisions, hacker attacks, and peak loads alike can harm any network functionality that is not thoroughly distributed over a super-redundant array of hosts.

4.3. Non-routing superpeers.

   An interesting variation of the plain superpeer architecture has been recently suggested by LimeWire in its GUESS proposal [12]. This proposal essentially suggests removing the query routing functionality from the superpeers and making every leaf node query the superpeers directly, as shown on Figure 6 below:

Figure 6. Direct leaf-to-superpeers query delivery.

   The primary goal of this approach was to have a simple way to stop querying when a desired number of results is reached. This architecture, however, also has an interesting additional property - it does not use the outgoing (transmission) capacity of the superpeers. All query messages are transmitted by the leaves, and the superpeers have to send only the answers. And the answers typically do not consume a lot of bandwidth in a file-sharing network, unless the number of average superpeer 'leaves' becomes so large that the network is effectively 'Napsterized' - and a significant percentage of queries can be answered by any superpeer (see Appendix A). Such a case, however, requires some very powerful superpeers and is not considered here.

   So since the superpeers are not using their outgoing bandwidth, this bandwidth stops being a bottleneck, and the summary network traffic becomes constrained by the incoming bandwidth of the superpeers. To be precise, it is also constrained by the summary outgoing bandwidth of the leaf nodes, but in a practical network design it makes sense to make the summary leaves' uplink capacity much higher than the summary superpeers' downlink one.

   The reason for this is that since every leaf node transmits only its own queries, the equality of the summary bandwidths of the leaf uplinks and superpeer downlinks would mean that the leaf would spend, on the average, $T_q$ seconds transmitting the same query to the different superpeers. That, in turn, implies a search latency (why would the leaf host keep transmitting if it already had the result?) of about $T_q$, or 200 seconds.

   Such a search latency is excessive from the user interface standpoint, so it makes sense to organize a network in such a fashion that an average leaf would query all the superpeers that he can possibly query much faster than that - maybe even in just ten or twenty seconds. For the rest of the interval the leaf uplink would be idle, so it would be the summary superpeer downlink capacity that would determine the search performance of such a network.

   This condition can be written as:

$N_L \cdot \bar{B}_{leaf,out} \ge \bar{B}_{sp,in}$, (29)

- so for the superpeer selection scenario which led to (28) (100 kilobytes per second average superpeer bandwidth), it is enough to have 14 leaves per superpeer to satisfy the 'superpeers are the bottleneck' condition, and 140 leaves per superpeer - to bring the typical search time to 20 seconds. Such leaf numbers can probably be handled by a typical superpeer without putting an excessive load on it.
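   As a rough check of these numbers, here is a sketch that uses condition (29) in the form reconstructed above (summary leaf uplink capacity at least equal to the superpeer downlink capacity); the bandwidth values are the example figures from (16) and (28):

    B_sp_in = 100_000   # bytes/sec: incoming bandwidth of a superpeer, as in (28)
    B_leaf  = 7_200     # bytes/sec: average leaf uplink, as in (16)
    Tq, target_latency = 200, 20                            # seconds

    min_leaves = B_sp_in / B_leaf                           # ~14: condition (29) barely met
    comfortable_leaves = min_leaves * Tq / target_latency   # ~140: roughly 20-second search time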

   Once a condition (29) is met, the search performance of such a network starts to be defined by the equations (21), (24), and (25) - but the average superpeer bandwidth in this case is defined as:

$\bar{B}_{sp} = \frac{1}{N_{SP}} \sum_{i=1}^{N_{SP}} B_{sp,in,i}$ (30)

instead of the equation (22). Now, since a typical residential broadband host has an incoming bandwidth that significantly exceeds its outgoing bandwidth, the performance of a superpeer network with this architecture can significantly exceed the performance of the superpeer networks that led us to (26) and (27) - even with exactly the same superpeer selection algorithms.

   If we roughly estimate the average superpeer incoming bandwidth to be from 500 kilobits to 1.5 megabits per second, we might expect the search performance increase coefficient to be between 7 and 20 over the 'flat' peer-to-peer network search performance. That improvement might be achieved even with the simplest 'local' superpeer selection algorithm that would allow every broadband host to become a superpeer provided that its number of leaves satisfies (29).

   Of course, such a performance increase presumes that the GUESS algorithm is flow-controlled or has some other provisions to gracefully limit the superpeer load coming from the multitude of hosts that are simultaneously bombarding the superpeer with requests. The early version of GUESS [12] quoted here does not address this issue yet.

4.4. Redundant superpeer clusters.

   Another approach to the search improvement in the superpeer network is to introduce the redundancy into the superpeer architecture.

   Many practical developers have been doing exactly that - opening the connections from the same leaf node to multiple superpeers - for a long time. For example, every Kazaa [2] leaf node seems to maintain the connections to three superpeer nodes [13]. However, typically the rationale for this behaviour was the need to keep the leaf node connected to the network in case the superpeer to which it is connected goes off-line.

   The study [9] pioneered another look at the superpeer redundancy. The authors introduced the 'k-redundancy' as a special variant of superpeer design and showed that this design can dramatically increase the search effectiveness. Here we'll analyze the k-redundancy from the search performance (query reach) perspective.

   The idea behind the k-redundancy is to group k superpeers together and upload the same leaf indices on all of them - Figure 7, for example, shows a 2-redundant superpeer network:

Figure 7. 2-redundant superpeer network.

   A leaf node sends a query to all the superpeers it's connected to, or to a random one. These superpeers (or this single superpeer) retransmit the query to other superpeers. The retransmission method details are not important here, nor is it important whether the original query is sent to one or to several superpeers, since we are mainly concerned with the limits imposed on the query propagation by the combined bandwidth and search capacity of all the other superpeers in the network. What is important is that we assume the absence of redundancy in the network (Section 2.1). In the Gnutella network case it means that every superpeer randomly chooses its connection points into the network, regardless of the choices made by other superpeers - including the superpeers in the same k-redundant cluster.

   So in the infinite network assumption (Section 2.2) the probability of the same query arriving from the different superpeers to the same superpeer or even into the same k-redundant superpeer cluster is negligibly low. That makes the query message delivery look like shown on Figure 8:

(a)

(b)

Figure 8. Query delivery in a 2-redundant superpeer network.

   Here every cluster (group of superpeers) acts as a 'composite superpeer'. Such a composite superpeer has k times more bandwidth than an average superpeer, and the query arriving to it causes a local database with $(k \, N_L + 1) \, D$ files to be searched. $N_L$ is the number of leaf hosts of a superpeer, and since the superpeers in the same k-redundant cluster are not directly connected, the query arriving to the cluster can access only the 'leaf files' and the files belonging to the superpeer that receives the query - the files that belong to all other superpeers in that cluster are not accessed.

   The average interval between queries generated by a k-redundant superpeer cluster is $\frac{T_q}{k \, (N_L + 1)}$, so similarly to (21), the query reach in such an architecture is:

$N_{cl} = \frac{\bar{B}_{sp} \, T_q}{Q \, (N_L + 1)}$ superpeer clusters, and $N_{files} = \frac{(k \, N_L + 1) \, D \, \bar{B}_{sp} \, T_q}{Q \, (N_L + 1)}$ files (31)

   Comparing this to the flat peer-to-peer network search performance (13), we see that the performance increase coefficient can be found as:

$K_k = \frac{\bar{B}_{sp}}{\bar{B}_0} \cdot \frac{k \, N_L + 1}{N_L + 1}$ (32)
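   A sketch of the k-redundant reach calculation (31)-(32); the function name and the example values in the last line (k=3, 7,200 bytes/sec superpeers with 100 leaves each) are this sketch's own illustrative choices:

    def k_redundant_reach(k, B_sp_avg, n_leaves, B_avg, Tq=200, Q=100, D=200):
        """k-redundant superpeer clusters, following (31)-(32)."""
        tq_cluster = Tq / (k * (n_leaves + 1))      # query interval of a whole cluster
        clusters = k * B_sp_avg * tq_cluster / Q    # reduces to B_sp_avg*Tq / (Q*(n_leaves+1))
        files = clusters * (k * n_leaves + 1) * D   # (31): files covered by a query
        k_coeff = files / (D * B_avg * Tq / Q)      # (32): improvement over the flat network
        return clusters, files, k_coeff

    print(k_redundant_reach(3, 7_200, 100, 7_200))  # ~143 clusters, ~8.6e6 files, K ~3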

   Note that the superpeers in the same cluster on Figure 7 and Figure 8 are not connected directly. The reason for this is twofold: first, since most of the files are identical on all superpeers in the redundant cluster, connecting these superpeers will waste a lot of CPU and bandwidth resources. In fact, it might virtually erase all the performance benefits of the redundant superpeer architecture.

   And second, an explicit grouping of the superpeers and their leaves into clusters as shown on Figure 7 and Figure 8 is complicated and difficult to achieve and maintain. In practice, it is much simpler to just open k connections to random superpeers from every leaf node and send every query to these k superpeers.

   In this case the clusters are not even explicitly defined and laid out - however, this practical and easily implementable approach to superpeer redundancy is statistically (traffic- and search-wise) equivalent to the explicit k-redundant clustering. The query propagation in such a network is shown on Figure 9:

(a)

(b)

Figure 9. 2-redundant superpeer network with no explicit clustering.

   In order to understand why this 'implicit clustering' method is equivalent to the explicit one in Figure 7 and Figure 8, let's start with a plain superpeer network like the one analyzed in Section 4.1, but with $N_L$ leaf hosts connected to every superpeer. The average query reach (21) in such a network will be:

$N_{sp} = \frac{\bar{B}_{sp} \, T_q}{Q \, (N_L + 1)}$ superpeers, $N_{files} = \frac{D \, \bar{B}_{sp} \, T_q}{Q}$ files (33)

   Now let every leaf node establish another k-1 connections to the randomly selected superpeers and transfer its file index to them. This operation won't change the number of superpeers reached by the query (assuming that the superpeer bandwidth consumed by such an index transfer is small enough, as stated in Section 4.1), but will increase the number of files locally searchable on every superpeer to $(k \, N_L + 1) \, D$. So after that operation, the query reach (33) will be transformed into:

superpeers, files (34)

- which is exactly equivalent to the search performance of the explicitly clustered k-redundant superpeers (31).
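
   To make this transformation concrete, the step above can be written out explicitly. The symbols N_L (number of leaf nodes per superpeer), k (redundancy level) and f-bar (average number of files indexed per host) are introduced here purely for illustration and may differ from the notation used elsewhere in this document; the point is only that the number of files locally searchable on a single superpeer grows roughly k-fold while the number of superpeers reached by a query stays the same:

    \[ N_{files}^{before} = (N_L + 1)\,\bar{f} \;\longrightarrow\; N_{files}^{after} = (k N_L + 1)\,\bar{f}, \qquad \frac{N_{files}^{after}}{N_{files}^{before}} \approx k \quad (N_L \gg 1). \]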

   This analysis also explains why the performances of the searches in Figure 8 and Figure 9, (a) and (b) are the same. The performance gain is achieved not because the original query goes to more superpeers from the leaf node - if all leaf nodes use the same initial querying strategy, the flow control will limit the propagation of their queries to the same average number of superpeers regardless of the query message delivery mechanism. Instead, the performance gain is due to the fact that in the redundant clustering approach the information about the data items (files) stored on the leaf nodes is more widely disseminated throughout the network and becomes locally searchable on the additional superpeer hosts.

   So the choice between the initial query sending methods 8-9(a) and 8-9(b) should be determined by considerations other than the average query reach. The multiple-superpeer query launch case 8-9(b), for example, can have a lower search latency, since more queries are injected into the network simultaneously when the search is started. On the other hand, it might make it more difficult to control the number of the returned results if the superpeers use 'random walkers' [11] - since all the initial queries are independent, it is difficult to control the cumulative total number of results returned by these queries, especially if the redundancy level k is high.

   If, however, the network uses the plain Gnutella [1] broadcasts to deliver the query messages to the remote superpeers, it might make sense to control the number of returned results by sequentially (after a small timeout) launching multiple queries from the leaf node to the different superpeers. That approach will improve the overall network effectiveness at the cost of the increased search latency. As was already said, though, such considerations are outside the scope of this study and are orthogonal to the optimization techniques discussed here. This whole subject is mentioned only to clarify the issue of the different initial query launch scenarios.

4.5. Optimal redundancy level.

   Now let's see what would be the optimal redundancy level k for the k-redundant superpeer clusters analyzed in Section 4.4. The study [9] limits its analysis to k=2, since the number of the network connections grows as k squared when k is increased. This limit, however, is largely an artificial one - after all, the number of connections grows only linearly on every individual network host, so the amount of connection-related host resources required for k-redundant clustering is linearly proportional to k.

   In fact, the Kazaa file-sharing network [2] seems to have every leaf node connected to three different superpeers [13], which is equivalent to 3-redundant clustering and is probably one of the reasons why its search performance is superior to that of the competing networks like Gnutella [1]. Also, it is easy to see that when k is much smaller than the number of superpeer leaf nodes, the search query reach (31) and the search performance increase coefficient (32) grow linearly with k. This kind of search performance improvement is not something to be discarded lightly.

   So here we'll try to find the redundancy level that would optimize the search performance without constraining ourselves with any artificial limits for k, but rather trying to establish what are the real redundancy growth limits in a peer-to-peer network.

   To begin the analysis, let's assume that the average superpeer bandwidth is not affected by the redundancy level k. This is not true, of course. The k-redundant superpeer network requires k times more hosts in the superpeer roles than the plain superpeer network analyzed in Section 4.1. So in reality increasing k requires less and less capable hosts to become superpeers, which inevitably lowers the average superpeer bandwidth. But for now, let's just assume that in our k-redundant network no special bandwidth-related superpeer selection is performed, so their average bandwidth is equal to the average network host bandwidth as defined in (14)-(16):

(35)

   Then all the search performance improvements due to the better-than-average superpeer bandwidth will become a 'bonus'. Later on, we'll also show the size of that 'bonus' for a network with the host bandwidth distribution specified in Table 1.

   In the assumption (35), the performance increase coefficient (32) becomes:

(36)

   For k>1 (that is, when the superpeer network is redundant to at least some extent) this function grows when k or the number of superpeer leaf nodes are increased. So in principle we could achieve the infinite search coverage by simultaneously increasing the values of these arguments. 'Simultaneously' is important here - function (36) obviously cannot exceed the redundancy level k. In practice, though, there are several reasons why this infinite coverage won't be possible:

   - the storage (RAM) limit of the superpeer host, which has to keep the indices of all its leaf nodes;
   - the limit on the number of simultaneous connections that a host can support, which constrains both the number of superpeer leaf nodes and the number of superpeer connections a leaf can maintain;
   - the bandwidth consumed by transferring the leaf indices to k superpeers, which grows with both k and the number of leaf nodes and reduces the bandwidth left for the search queries.

   Since most of the hosts in a peer-to-peer network are likely to be the consumer-grade PCs, let's see what happens when the superpeer redundancy level is limited to:

k=100 (37)

   The plot of the search improvement coefficient function (36) versus the number of superpeer leaf nodes for this redundancy level is presented on Figure 10 below. (As suggested by (35), the superpeer hosts are not selected for their superior bandwidth here and have the same average bandwidth as the network as a whole.)

Figure 10. Search improvement in a 100-redundant superpeer network.

   Here we see that when there are 500 leaf nodes for every superpeer, this redundant superpeer network can improve the effective search request reach by a factor of 84 over the flat peer-to-peer network analyzed in Section 3. Further increase of the number of superpeer leaves does not seem to be justified, since it won't dramatically improve the search performance anyway - with this redundancy level, the performance improvement coefficient cannot exceed one hundred regardless of the number of superpeer leaf nodes.

   Even though the required number of superpeer nodes grows when the redundancy level is increased, with k=100 and 500 leaf nodes per superpeer only about 1/6 of all network hosts have to become superpeers. This makes it possible to further improve the search performance by selecting the best-bandwidth hosts for the superpeer roles when the number of superpeer leaf nodes is increased. The more leaf nodes the superpeer has, the fewer superpeers are required, which should increase the average superpeer bandwidth and the search query reach as defined by the equation (32).
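
   As a quick sanity check of the 1/6 figure: every leaf maintains k superpeer connections, while every superpeer accepts the index of one leaf per leaf connection, so the fraction of hosts that must act as superpeers is roughly k/(k + N_L), where N_L stands for the number of leaf indices per superpeer (a symbol introduced here only for this sketch):

    # Sketch: share of hosts that must become superpeers in a k-redundant network,
    # assuming every leaf connects to k superpeers and every superpeer stores
    # the indices of N_L leaf connections.
    def superpeer_share(k, leaf_indices_per_superpeer):
        # leaf-superpeer edges: (leaves * k) = (superpeers * leaf_indices_per_superpeer)
        superpeers_per_leaf = k / leaf_indices_per_superpeer
        return superpeers_per_leaf / (1.0 + superpeers_per_leaf)

    print(superpeer_share(100, 500))   # ~0.167, i.e. about 1/6 of all hosts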

   Figure 11, for example, shows the plot of the search improvement coefficient versus the number of superpeer leaf nodes when the network has the host bandwidth distribution specified in Table 1:

Figure 11. Search improvement in a 100-redundant network with best-bandwidth superpeers.

   We see that when the number of superpeer leaf nodes approaches 500, it is possible to select only the broadband hosts for the superpeer roles, and the search improvement coefficient can be 195 vs 84 in Figure 10. At this point the average superpeer bandwidth can be as high as 16.8 kilobytes per second as opposed to 7.2 kilobytes per second for the network as a whole. Increasing the number of superpeer leaf nodes further, we can have only the best-bandwidth hosts (20 kilobytes/sec) in the superpeer roles, which could bring the search improvement coefficient to 250 if these best hosts were capable of supporting 900 simultaneous TCP connections and had about 18 megabytes of RAM to spare for the leaf nodes' indices.

   The last two assumptions are probably not very realistic, but even the 200-fold search improvement is very impressive. Comparing this with the flat network performance (17) and (18), we see that such a massively redundant superpeer network is capable of searching the content on

hosts, (38)

with a total of

files (39)

accessible for the search. These numbers are coming close to the ones necessary to perform a full search on the biggest of the modern file-sharing networks like Kazaa or Gnutella (though not quite achieving that goal - Kazaa is currently reporting up to 3M+ hosts on-line at the same time; Gnutella numbers are probably similar).

   It is also important to note that when these numbers for redundancy and leaf nodes per superpeer are used, the index transfer bandwidth limits mentioned above won't play a significant role yet. A 5,000-bytes per second modem host can fully transfer its 20-kilobyte index to 100 superpeers in about 400 seconds, which is about ten percent of its one-hour average session time. A 16,000-bytes per second superpeer would be able to receive 500 leaf indices in about ten minutes - but it probably won't have to. There is a high probability that the broadband superpeer will have an asymmetric link, whose incoming bandwidth will be much higher and mostly unused anyway.
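
   These back-of-the-envelope numbers are easy to verify; the 20-kilobyte index size, the host bandwidths and the one-hour session come from the assumptions quoted above:

    # Sketch: index transfer times for the k=100, 500-leaf superpeer scenario.
    INDEX_BYTES = 20_000          # 20-kilobyte index per host
    SESSION_SEC = 3_600           # one-hour average session

    modem_upload = 100 * INDEX_BYTES / 5_000         # leaf sends its index to 100 superpeers
    superpeer_download = 500 * INDEX_BYTES / 16_000  # superpeer receives 500 leaf indices

    print(modem_upload, modem_upload / SESSION_SEC)  # 400 s, ~11% of the session
    print(superpeer_download / 60)                   # ~10.4 minutes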

   So the performance numbers (38) and (39) should not decrease by more than ten percent even when the index transfer bandwidth is taken into account. More serious is the problem of the uniform superpeer load. If, say, an average superpeer is RAM-bound and has a limit of 500 leaf hosts, special care should be taken to always keep its number of leaf nodes as close to this level as possible. Figure 11 shows that if, for example, the network connection maintenance algorithms cannot fully load an average superpeer, and it has only 250 leaf nodes, then the performance improvement coefficient will drop to only 125, which is much worse than the performance numbers above. Even worse, some modem-connected hosts will become superpeers, and the search performance will be further degraded, since their effective bandwidth will be lowered even more because of the bandwidth consumption by the index transfers, which might take as much as half of the average session time (2,000 seconds to transfer 500 leaf node indices). This problem is discussed in detail in the next section.

4.6. Superpeer network problems.

   Even though the superpeer architecture has a potential to deliver a very significant search performance increase, the efficient superpeer usage necessary for that increase might prove to be surprisingly hard to achieve. When the superpeer host goes on-line, someone has to direct the leaf hosts to connect to it as quickly as possible - otherwise it may take quite a while (possibly a full superpeer session duration) to load it to full capacity. In that case the average number of the superpeer leaf nodes might be only half of its peak capacity, causing the performance drop discussed at the end of the previous section.

   Similarly, the redundancy level optimization techniques discussed in Section 4.5 assumed that the network can be optimized from a central location that has a full knowledge of the network host bandwidth distribution, of the practically achievable redundancy level, and of the maximum number of superpeer leaf nodes for the existing network hosts.

   If this is not the case, and the network is fully decentralized, every host has to make its own decisions about becoming a superpeer or not, which can significantly affect the network performance. If, for example, too few hosts decide to become the superpeers, the leaf nodes might not have enough redundant connections, since the superpeers will be fully loaded with leaf indices and won't accept more leaf nodes. That will cause the search performance to drop almost proportionally to the redundancy level k decrease.

   If, on the contrary, there are too many superpeers, they won't be able to fully utilize their storage capacity, causing the average number of leaf nodes per superpeer to drop and also decreasing the search performance (31), (32) in the process.

   It is not entirely clear how to achieve the full superpeer utilization without a centralized (or semi-centralized) connection director and superpeer role assigner, whose function might be performed by some network component similar to the Kazaa index servers [13]. (It is not clear whether the Kazaa index servers are really trying to keep the number of leaves on every superpeer host as close to its maximum capacity as possible. This functionality is not crucial for Kazaa, since it seems to have a relatively low redundancy level of 3, and the issue of maximizing the number of leaf nodes per superpeer might be not very important to it - in any case, secondary to the maximization of the average superpeer bandwidth discussed in Section 4.2).

   So the fully decentralized superpeer network is very likely to be in a less-than-optimal state most of the time. For example, the realistic average number of leaf nodes might be about 250, and the search improvement coefficient - about 125, as we can see from Figure 11. Or the average redundancy level k might be 50, which will give us about the same search improvement coefficient value of 125 - the performance drop is not exactly linear, since with the reduced k fewer superpeers are required and their average bandwidth increases.

   So the realistic numbers for the search query reach in a fully decentralized superpeer network might be significantly lower than the ones suggested by (38) and (39). For the search improvement coefficient value of 125, these numbers will be:

hosts, files (40)

- assuming that the decentralized superpeer role assignment algorithm will be effective enough to achieve either the optimal redundancy with a 50-percent superpeer load, or the full superpeer load with 50 percent of the optimal redundancy level. If the decentralized superpeer assignment and/or the connection director algorithms are not good enough to do that, the search performance will be degraded further. Little research has been performed on the decentralized superpeer assignment and connection management algorithms, so it is difficult to predict the actually achievable numbers here.

   Finally, even these results depend on the fact that the superpeers have the asymmetric bandwidth with a high downlink capacity. If that wouldn't be the case, a significant share of the bandwidth normally used for the search would be used for the index transfers. With the average session time , a superpeer with the incoming bandwidth can receive bytes of data over its session lifetime. Since the connection between the superpeer and its leaf can be broken when either the superpeer session or the leaf node session ends, the average connection lifetime is two times smaller than the average session length . This means that with the average number of superpeer leaf nodes there would be, on the average, two index transfers for every superpeer link over the superpeer session time, and the number of index transfer bytes received by the superpeer during the session will be (a somewhat more detailed explanation of that average connection lifetime value is provided later, in Section 5.3 - see the equation (46)). Thus the effective superpeer search bandwidth can be found as:

(41)

- or, in the assumption (1), approximately 2,800 bytes per second less than the original superpeer bandwidth, if the average number of superpeer leaf nodes is 250. When the equation (41) is used to determine the average superpeer bandwidth for the search improvement coefficient function (32) instead of the original host bandwidths from Table 1, the chart on Figure 11 is transformed into Figure 12:

Figure 12. Search improvement in a 100-redundant network with symmetric superpeer links.
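
   The 2,800 bytes per second correction used to build this chart can be reproduced from the reasoning above: with an average connection lifetime of half a session, every superpeer-leaf link sees about two index transfers per superpeer session. A sketch, under the 20-kilobyte index and one-hour session assumptions quoted earlier:

    # Sketch: incoming bandwidth consumed on a superpeer by leaf index transfers.
    INDEX_BYTES = 20_000
    SESSION_SEC = 3_600
    TRANSFERS_PER_LINK = 2     # the average link lives about half a session

    def index_transfer_bw(leaf_nodes):
        return TRANSFERS_PER_LINK * leaf_nodes * INDEX_BYTES / SESSION_SEC

    print(index_transfer_bw(250))   # ~2,778 bytes/sec, i.e. roughly 2,800
    print(index_transfer_bw(500))   # ~5,556 bytes/sec at full superpeer load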

   This chart is interesting for two reasons - first, it shows that if the superpeer links are symmetric, the search performance is degraded further, bringing the search query reach close to just a million and a half hosts when the average number of superpeer leaf nodes is about 250 (half of the full superpeer capacity, assuming that an average superpeer has about 10 megabytes of RAM to spare for the storage of the leaf nodes' indices). And second, it shows that when the bandwidth used to transfer the leaf indices to the superpeers cannot be discounted as negligibly small, there is an optimal number of the superpeer leaf nodes (close to 500 in our model case), and further increase of their number actually harms the search performance.

   This effect is caused by the decrease of the effective average superpeer bandwidth in the equations (31) and (32) when the superpeer has too many leaf nodes and has to receive too much index data from them. Naturally, this is also the case in the asymmetric-bandwidth superpeer scenario - it is just that in our sample network defined by the assumption (1) and Table 1, the incoming superpeer bandwidth was presumed to be sufficiently high for this optimum to be outside the plot area on Figure 11. The lack of smoothness of the curve on Figure 12 is caused by the discrete character of the network hosts' bandwidth distribution specified in Table 1. After 400 leaf nodes no modem hosts are used in the superpeer role anymore, and after 900 - only the best-bandwidth hosts are being used as the superpeers. A similar plot for a real peer-to-peer network should not have these clearly visible anomalies, although the general character of the plot should stay the same.

5. Mutual Index Caching.

   Another approach to the peer-to-peer search optimization is the mutual index caching. In this approach the peer-to-peer network hosts store not only the indices of their own data, but also the indices of the data located on other hosts. Doing so requires every peer-to-peer host to set aside some space for these indices and to somehow obtain them from the other network hosts, effectively becoming a superpeer.

   This approach has been repeatedly suggested before ([10], [11] and multiple suggestions about passive query reply monitoring on Gnutella Developer Forum [1], to name just a few). It is important that here we are talking about mutual caching of only the indices - not the content itself. Storing the content on behalf of other users can be very burdensome for the user and for the host hardware, whereas the requirements for the mutual index-only (metadata) caching are much lower.

   It seems clear that disseminating the information about the content throughout the network should make this content easier to find. Little quantitative analysis, however, has been performed in that area, and this study tries to correct this oversight. Our goal here is to suggest a caching strategy with the widest possible search query reach. This goal is not the only one imaginable - study [10], for example, did perform the quantitative analysis and simulation of the 'Local Indices' mutual index caching strategy, but under the assumption that the strategy goal is to minimize the system load (query cost) in terms of the processing power and bandwidth when the number of returned search results is fixed.

   This goal led the authors to recommend the caching depth of 1 for the Gnutella network with a query-to-join ratio of 10, which in our terms would correspond to the average session length of 2000 seconds instead of 3600 seconds in (1). This caching depth means that every Gnutella node should store the indices of its immediate neighbours. Study [10] also suggested that the caching depth should be increased if the nodes would spend more time on line, and their query-to-join ratio would be higher.

5.1. Mutual index caching performance.

   Let's have a look at the mutual index caching performance when the aggregate system load is of secondary importance, and the primary goal is to increase the query reach.

   Looking back at the analysis of the flat peer-to-peer network performance in Section 3, we can see that the mutual caching allows every host to have more files available for the search. Designating the number of hosts whose indices are made accessible on the average network node in addition to its own index as , we'll have the number of files searchable in such a network change from a value determined by (13) to:

files, (42)

which gives us a search performance improvement coefficient of

(43)

- so for the Gnutella network with four connections per host, the mutual caching with a depth of one will improve the effective query reach by a factor of up to five.
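
   In other words, the improvement coefficient here is simply the factor by which the locally searchable index grows. Writing N_C for the number of other hosts' indices cached on an average node (a symbol used only in this illustration), the factor-of-five figure corresponds to:

    \[ K \approx N_C + 1: \qquad N_C = 4 \;\Rightarrow\; K \approx 5, \qquad N_C = 500 \;\Rightarrow\; K \approx 501, \]

which is also where the 501 value used for the 500-index example later in this section comes from.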

   It is not a coincidence that we're using the variable for the number of other hosts' indices stored on an average peer-to-peer network host. This variable was already used in Section 4 to designate the number of leaf nodes for the superpeer host, and is reused here because in both cases this variable defines how many 'foreign' indices are stored on the host. The limits and expected values for this variable are very similar, whether it is the number of superpeer leaf nodes or the number of mutually cached indices.

   Comparing the search improvement coefficient (43) with (32) and with the plot on Figure 11 that shows the best possible search performance (when the index transfer bandwidth is not a limiting factor) for the 100-redundant network defined by the assumption (1) and Table 1, we see that the flat network with the mutually cached indices has a potential to significantly outperform the k-redundant superpeer network. Figure 13 below shows the search improvement coefficients for the 100-redundant network (same as on Figure 11) and for the flat network with the mutually cached indices on the same chart:

Figure 13. Search improvement in a 100-redundant network and in a network with mutually cached indices.

   If every network node caches 500 indices from the other network nodes, the search improvement coefficient for the mutually cached indices network will be 501, and the potential (best achievable) search reach for our model network can be as high as:

hosts, files. (44)

- which is 2.5 times better than the 100-redundant superpeer network result (38), (39).

   This is hardly surprising, of course. The power of the peer-to-peer network is ultimately determined by its ability to harness the power of the individual network hosts, making every host contribute to the best of its abilities, however meager these abilities might seem at first glance. Both the mutual index caching and the redundant superpeer clustering approaches achieve their search improvements by massively disseminating the data indices of network hosts over a large number of nodes - remember, that was exactly the reason why the k-redundant superpeer network outperformed the plain superpeer network (see the analysis in Section 4.4).

   The difference between mutual caching and superpeering is that for a variety of reasons, the superpeer-based approaches try to fully utilize the local search and query routing abilities of only the best-connected network hosts, discarding the potential contribution of the less-capable ones.

   The summary contribution of these less-capable nodes, however, may be quite substantial, as can be clearly seen on Figure 13. We can see that as the number of stored indices grows, the difference between the superpeer and mutual caching performance becomes more and more pronounced. The reason is quite simple - as the number of superpeer leaf nodes grows, fewer and fewer hosts become superpeers and perform the local database searching and query routing. Of course, at the same time the average superpeer bandwidth grows, but this effect cannot outweigh the fact that the cumulative network potential becomes more and more underutilized.

   As a matter of fact, mathematically speaking, the equations (42) and (43) represent the best theoretical search performance of the peer-to-peer network with a non-selective query routing regardless of its architecture and topology. To put it simply, you cannot beat these results.

   But there are plenty of ways to never reach them.

5.2. Optimal search conditions.

   The optimal peer-to-peer search improvement results (42) and (43) can be achieved only if two conditions are met: first, that the mutually cached indices on the hosts accessible by a single search will not overlap, and second, that the bandwidth used to transfer the indices will not significantly decrease the bandwidth used for transferring the search queries. Otherwise, the search performance will degrade accordingly.

   The first condition is easy to meet - or at least, it is easy to make the index overlap negligibly small. One way of doing so would be to increase the number of connections (outdegree) of the average Gnutella node, which, by the way, would also fulfill the recommendation of study [9], which suggests that a bigger average peer outdegree reduces the average response path length, thus reducing the bandwidth consumed by the responses. Increasing the outdegree will also reduce the average search latency.

   Here, however, we are mainly concerned with the fact that increasing the outdegree will tilt the histogram of responses vs hop number (distance from the query originator node) towards the last hop. If the average Gnutella host has four connections, the search query will be multiplied as 1-3-9-27-81-... during the search, as shown on Figure 3, and we can expect about 2/3 of all the responses to arrive from the last-hop nodes, whereas 1/3 of the responses would arrive from the 'inner' nodes and would largely consist of the duplicate results. Further, even the last-hop responses would contain at least 1/4 of the duplicates - on Figure 3, for example, hosts a and e both receive the same indices to cache through the connections from the 'parent' host d.

   Increasing the number of host connections to ten will give us the query multiplication 1-9-81-..., and only 1/9 of the results will arrive from the 'inner' nodes. Similarly, only 1/10 of the last-hop host cached indices will be received from their 'parent' hosts, whereas 9/10 of the data will be unique. Of course, some index overlap will always exist. Also, increasing the average number of connections on the existing network might be inconvenient for a variety of reasons.
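
   The fractions quoted above follow from simple geometric sums; a sketch, where the TTL of 7 is used only as an example and does not significantly affect the ratios:

    # Sketch: share of responses arriving from the last hop, and share of a
    # host's depth-1 cached indices that arrive via its single 'parent' link,
    # as a function of the node outdegree c.
    def hop_shares(c, ttl=7):
        counts = [c * (c - 1) ** (h - 1) for h in range(1, ttl + 1)]  # hosts per hop
        last_hop_share = counts[-1] / sum(counts)
        parent_share = 1.0 / c
        return last_hop_share, parent_share

    print(hop_shares(4))    # ~0.67 of responses from the last hop, 1/4 from the parent
    print(hop_shares(10))   # ~0.89 from the last hop (~1/9 from inner nodes), 1/10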

   So another approach to minimizing the mutually cached index overlap is to use separate and independent networks for the index transfers and for the search query delivery. For example, the same host might open two times more connections and use the new ones for index transfers only - no queries will go across them. These new connections are represented by the dotted lines on Figure 14 - for simplicity, it shows the index transfer connections and the index transfer network only for the host d.

Figure 14. Separate networks for querying and index transfers.

   When the host g issues the search query, it goes to d, which retransmits it to a and e only, but not to k and j, which are on a different logical network, as far as the host d is concerned. Similarly, the host d stores the data indices for k, j, l, m, and so on, received over the dotted-line connections, but not for the hosts g, a, and e. This approach makes it possible to achieve a negligibly small overlap of the mutually cached indices when the network size is sufficiently large and the connection establishment is randomized (see the assumptions in Section 2.1 and Section 2.2).
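
   A minimal sketch of what such a split might look like inside a node implementation (the class and method names are hypothetical and are not taken from any existing Gnutella code; duplicate suppression and real message handling are omitted):

    # Sketch: a node with two disjoint connection sets - one for query delivery,
    # one for index transfers - so cached indices and query paths do not overlap.
    class Node:
        def __init__(self, node_id):
            self.node_id = node_id
            self.own_index = set()           # names of the node's own files
            self.query_neighbours = set()    # solid lines on Figure 14 - queries only
            self.index_neighbours = set()    # dotted lines on Figure 14 - indices only
            self.cached_indices = {}         # origin host id -> set of its file names

        def on_index_received(self, origin_id, index):
            # Only the index transfer network feeds the cache.
            self.cached_indices[origin_id] = set(index)

        def on_query(self, query, sender=None, ttl=7):
            # Local search covers the node's own files and all cached indices...
            hits = [self.node_id] if query in self.own_index else []
            hits += [h for h, idx in self.cached_indices.items() if query in idx]
            # ...and the query is forwarded over the query delivery network only.
            if ttl > 1:
                for peer in self.query_neighbours:
                    if peer is not sender:
                        peer.on_query(query, self, ttl - 1)
            return hits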

   Other approaches to minimize the overlap are also possible - the study [10], for example, suggests using a 'Local Indices' algorithm that selectively searches only the hosts separated from the query originator by a predetermined number of hops. This algorithm, though, underutilizes the potential capabilities of the hosts when they only route the query message without doing a local search. It won't be analyzed here in detail, since it is probably not optimal because of this waste of the network resources.

5.3. Index transfer bandwidth.

   The issue of the bandwidth spent to transfer the mutually cached indices between hosts is more complicated. Of course, in the Gnutella example of a mutual caching with a depth of one from the previous section it is not an issue at all. Even a modem host with 5,000-bytes per second bandwidth can transfer four 20-kilobyte indices to and from its immediate neighbours in just 16 seconds, which is less than one percent of the total one-hour average session time. So the search performance degradation caused by this transfer will be negligible.

   But so will be the reward - five times better search query reach might sound impressive only until it is compared with the potential improvement from the other strategies - like massively redundant superpeers analyzed in Section 4.5.

   Obviously much better search performance can be achieved if the hosts cache much more data than the indices of their immediate neighbours. And as the hosts transfer more index data, the bandwidth consumed by these index transfers starts to play a more significant role in the overall system search performance. The resulting performance decrease will be similar to the gradual performance decrease shown on Figure 12 for the k-redundant superpeer network. If the average session length can be increased at will, it is easy to make the bandwidth consumed by the index transfers arbitrarily low. In this case the search performance can approach its theoretical peak defined by (42), (43) and shown on Figure 13.

   Of course, even with an infinite session length some bandwidth will be consumed by the index updates that have to be propagated through the index transfer network (dotted lines on Figure 14) when the local indices on the hosts are changed. This happens when the new content is downloaded on the hosts, when the unused files are deleted by the users, and so on. The study [9], however, suggests that this index update bandwidth is much lower than the index transfer bandwidth necessary for the initial index distribution when the host joins the network. In this study we won't be paying attention to the index update bandwidth, concentrating only on the initial index transfer one.

   So the simplest way to approach the peak search performance is to increase the average session length. Unfortunately, however, many factors that determine the average session length are out of the developer's control. As we'll see later, the session lengths that are typical for the peer-to-peer applications today (see Section 2.5 and assumption (1)) produce the index transfer bandwidth that can make the actual search performance significantly lower than its theoretical maximum. In the ad hoc networks it is also difficult to make hosts connect to the same addresses every time and cache their index data on the remote hard disks when these hosts are not on-line, as suggested in [20]. So in this study we assume that the index transfer bandwidth is not negligibly small and can negatively affect the search performance.

   In order to estimate the performance degradation due to the mutual caching index transfers, let us first see how big we'd like the mutually cached indices to be - find the other (non-bandwidth) limits for index transfers, moving to the bandwidth-related limit estimates later.

   This is something that we've already done when we were trying to find the best possible search performance of the k-redundant superpeer architectures in Section 4.5. There we tried to find the factors that limit the number of the superpeer leaf nodes , and identified the storage (RAM) limit on the superpeer, number-of-connections limit, and the index transfer bandwidth limit.

   Let's forget about the bandwidth factor for a second - we are trying to find the other limits here. The number of connections is not an issue when the mutual index transfers are performed in a tree-like fashion, as shown on Figure 14. Then the maximum number of the mutually cached indices on a host is defined by its memory, and - to a lesser extent - by its CPU speed. This study disregards the CPU-related limit, assuming that it is the host's query interface speed (bandwidth) that limits the local search capability of the host, not its CPU speed (see Section 2.6).

   For the typical peer-to-peer network host computer its RAM size probably won't be strongly correlated to its connection bandwidth. So we'll assume that the amount of memory that an average network host will be able to spare for the mutually cached indices will not be very different from the amount that an average superpeer host can spare for its leaf nodes' indices. If, on the average, this amount will be 10 megabytes, we might expect that an average host will be able to store the indices from 500 other hosts.

   The host bandwidth requirements for this amount of index data to be transferred to the host were already evaluated in Section 4.6 (equation (41)), where we were looking at the bandwidth consumption of the leaf-to-superpeer index transfers. The main difference between the equation (41) scenario and the mutual index caching one is that in the mutual caching case (see Figure 14) the hosts are not necessarily connected directly, by a single link. This obviously lowers the effective 'connection' lifetime between hosts. For example, after the index of the host m on Figure 14 is transferred to the host d, the 'virtual', two-link connection between d and m can be broken when either host m leaves the network, or when the host j does. In either case the host m data already transferred to d will have to be discarded, and some other data will have to replace it in order to keep the number of cached indices on host d constant.

   So the average bandwidth taken by the index transfers is going to be higher than specified by (41), unless all the mutual index transfer connections are direct. This, however, might require hundreds of open connections on every network host, which might be difficult to achieve in the Windows environment. Another possibility would be to open a short-lived direct connection, transfer the index data over it, close the connection and then just periodically reopen it to verify that the remote host is still there, that its IP address was not reassigned to some other host, and that no changes in its index data took place since the last check. This is possible to do, but it would complicate the code, making, for example, the direct reuse of the basic Gnutella kernel code for that purpose impossible.

   Let us estimate the index transfer requirements in the general case, when the average caching depth is , and two hosts are separated by links, as hosts d and z on Figure 15:

Figure 15. Multi-link connection between hosts.

   If we assume that the probability of the host session termination during the infinitely small time interval dt is , the probability that the host session will last at least t seconds will be , and the probability that the host session length will be between t and t+dt will be .

   This gives us the average host session length of:

seconds. (45)

   The full chain of links between d and z on Figure 15 can survive only until any one of the involved hosts decides to terminate its session. The full number of the hosts involved in the chain is the number of links plus one (including the host d itself).

   So the probability that at least one of these hosts decides to terminate its session during the infinitely small time interval dt will be proportionally higher, and repeating all the calculations that led us to (45), we obtain the following equation for the average lifetime of the compound connection:

seconds. (46)

- which, by the way, also explains why the average direct connection lifetime value of one half of the average session length was used to arrive at the equation (41). There we just implicitly used the equation (46) without providing the detailed explanations.
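
   For reference, the derivation sketched in the last few paragraphs can be written out explicitly. Here λ denotes the per-unit-time session termination probability and D the number of links separating the two hosts; both symbols are introduced only for this illustration:

    \[ P\{\,\text{session} > t\,\} = e^{-\lambda t}, \qquad \overline{T}_{session} = \int_0^\infty t\,\lambda e^{-\lambda t}\,dt = \frac{1}{\lambda}, \]

    \[ P\{\,\text{chain of } D+1 \text{ hosts survives} > t\,\} = e^{-(D+1)\lambda t} \;\Longrightarrow\; \overline{T}_{chain} = \frac{1}{(D+1)\lambda} = \frac{\overline{T}_{session}}{D+1}. \]

So a direct link (D = 1) lives half a session on average, while the two-link 'virtual' connections of the caching depth 2 live only a third of a session, which is why deeper caching costs proportionally more index transfer bandwidth.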

   In the multi-level Gnutella-type mutual index caching scenario, most of the content belongs to the nodes that are separated from the 'central', caching node d by the maximum number of hops (the caching depth). This allows us to repeat the reasoning that led us to (41) for the multi-link connections, and approximately estimate the average incoming bandwidth used on the host by the mutual index transfers as:

(47)

so the effective bandwidth left for the search queries on the i-th host will be:

(48)

   These equations show that generally it is better for the search performance to minimize the caching depth, unless other limits - like the maximum number of the opened TCP connections - prevent it. Throughout the rest of this study we'll assume that

(49)

- this number allows a host with 25-35 open index transfer TCP connections to access the indices from 600-1,200 other hosts within a two-hop radius, 95+ percent of these hosts being separated from the 'central', caching node by two network hops.
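
   The 600-1,200 range quoted above is simply the number of hosts within a two-hop radius of a node with 25-35 index transfer connections; a sketch, assuming the randomly built index transfer network behaves like a tree near the node:

    # Sketch: hosts whose indices are reachable within caching depth 2
    # for a node with c index transfer connections (tree-like neighbourhood).
    def hosts_within_two_hops(c):
        first_hop = c
        second_hop = c * (c - 1)    # each neighbour contributes c - 1 new hosts
        return first_hop + second_hop

    for c in (25, 35):
        total = hosts_within_two_hops(c)
        print(c, total, c * (c - 1) / total)   # 25 -> 625, 35 -> 1225; ~96-97% at hop 2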

5.4. Host search value maximization.

   The equation (48) marks the i-th host bandwidth value and the number of the other hosts' indices stored on it with the host number i. This is done to underscore the fact that the network hosts might have different bandwidths, and choosing the same number of cached indices for all hosts regardless of their bandwidth might prove to be a suboptimal solution from the search performance standpoint. If the host bandwidth is low, an excessive volume of the index transfer traffic might decrease its bandwidth available for the search queries to an unacceptable value. The search performance might be maximized if, depending on the individual host bandwidths, different amounts of content are made available for the local search on these hosts.

   In order to find the best values of for different hosts, let's introduce the i-th host search value as a function of . The host search value is proportional to the host contribution to the overall network resource pool. It is defined as the host bandwidth available for the search queries multiplied by the number of host data indices that can be locally searched by the queries that arrive to this host. In the simplest case of symmetric host links, when all hosts have the incoming bandwidth equal to the outgoing one, the effective incoming bandwidth available for the search is defined by the equation (48), and the host search value can be found as:

(50)

   The host search value is measured in bytes per second and might be viewed as an effective bandwidth that the host would need in order to deliver the same amount of search service to the network if it were searching only its own content. Similarly to (9) and (11), the overall network search performance (number of files reached by the query) can be found as:

, (51)

where is the average host search value:

(52)

   The search improvement coefficient can be found as:

(53)

   In the symmetric-link network case the search values of all network hosts are independent, since all hosts fully utilize their incoming and outgoing bandwidth (are fully loaded) and the search performance cannot be increased by redirecting some load towards the less loaded hosts. So the network search performance is maximized when the search value of each individual host is maximized. For now, let's assume that all network hosts' links are symmetric - the asymmetric connection case will be analyzed later.

   The optimal value of the number of other hosts' indices stored by a host is easy to find from (50) - this value is:

(54)

   This number, however, might be unachievable in practice, if the host cannot dedicate a sufficient amount of RAM to store these indices. So, introducing the RAM-based limit on the number of cached indices, we arrive at the following formula for the number of cached indices that maximizes the host search performance:

(55)

- it means that if the number of cached indices is not limited by the RAM-based limit, the optimal query/index transfer bandwidth split is approximately 1:1. After the RAM-based limit is reached, the index transfer bandwidth becomes frozen at a constant level, and a proportionally larger share of the bandwidth can be used for the search queries.
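
   The 1:1 split is easy to see if the host search value is written as a product of the two competing terms. Using B for the host bandwidth, N for the number of cached indices and c for the bandwidth needed to load and maintain one cached index (all three symbols are introduced here only for this illustration):

    \[ V(N) \approx (B - cN)\,N, \qquad \frac{dV}{dN} = B - 2cN = 0 \;\Longrightarrow\; N_{opt} \approx \frac{B}{2c}, \qquad c\,N_{opt} \approx \frac{B}{2}, \]

so at the optimum the index transfer traffic takes roughly half of the host bandwidth - until the RAM limit caps N first.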

   Let's apply these results to the symmetric-link network defined by the assumptions (1), (49), and the host bandwidth distribution from Table 1, and find the optimal numbers of cached indices and the host search values for this network's hosts. We also assume that every host can use up to 10 megabytes of RAM for foreign indices, making it possible to store the indices from 500 other hosts.

Host bandwidth   RAM-based limit on   Optimal number of   Resulting number of   Effective search        Host search value
(bytes/sec)      cached indices       cached indices      cached indices        bandwidth (bytes/sec)   (bytes/sec)

 5,000                500                  150                  150                  2,500                  375,000
12,000                500                  360                  360                  6,000                2,160,000
20,000                500                  600                  500                 11,700                5,830,000

Table 2. Effective search value for different host bandwidths.

   Here we can see that only for the best-connected hosts can this RAM-based limit actually be reached. For all other hosts an attempt to fully utilize the 10-megabyte RAM space available for index storage would actually result in a lower search performance because of the extra bandwidth spent to load and to maintain this index volume.
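
   The Table 2 entries can be reproduced with a short calculation. The per-cached-index maintenance bandwidth used below - three 20-kilobyte transfers per cached index per one-hour session, one for each host in the three-host chain of the caching depth 2 - is a reconstruction that matches the table values, not a formula quoted from the text:

    # Sketch: optimal number of cached indices and host search value (Table 2).
    INDEX_BYTES = 20_000
    SESSION_SEC = 3_600
    CHAIN_HOSTS = 3                     # caching depth 2 => D + 1 = 3
    BW_PER_INDEX = CHAIN_HOSTS * INDEX_BYTES / SESSION_SEC   # ~16.7 bytes/sec per index
    RAM_LIMIT = 500                     # 10 MB of RAM / 20 KB per index

    def host_row(bandwidth):
        n_opt = bandwidth / (2 * BW_PER_INDEX)       # 1:1 query/index split
        n = min(n_opt, RAM_LIMIT)                    # RAM cap
        search_bw = bandwidth - n * BW_PER_INDEX     # bandwidth left for queries
        return round(n_opt), round(n), round(search_bw), round(search_bw * n)

    for bw in (5_000, 12_000, 20_000):
        print(bw, host_row(bw))
    #  5,000 -> (150, 150,  2,500,   375,000)
    # 12,000 -> (360, 360,  6,000, 2,160,000)
    # 20,000 -> (600, 500, 11,667, 5,833,333)   # ~11,700 and ~5,830,000 in Table 2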

   The average host search value for this peer-to-peer network can be found as:

bytes per second, (56)

and the search improvement coefficient is:

(57)

   This search improvement coefficient is higher than 130, which was found earlier (Figure 12) for the similar symmetric-link network with 100-redundant superpeer clustering and the same foreign index storage limit of 500. In fact, for this network the performance of mutually cached indices is uniformly better than the performance of the 100-redundant clustering regardless of the RAM-based foreign index storage limit . Figure 16 below plots the performances of these two search optimization strategies on the same chart:

Figure 16. Search performance of mutually cached indices vs 100-redundant superpeer clustering in the symmetric link case.

   Here we see that the mutual index caching performs better than 100-redundant superpeer clustering even when the plots are compared at the same value of , which presumes the existence of some effective, maybe even centralized mechanism for the superpeer role assignment in the 100-redundant superpeer network (see Section 4.6). If the role assignment mechanism is not fully effective and won't guarantee the optimal superpeer role assignment, the difference in the search performance will become even more pronounced.
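
   To see where a search improvement coefficient of this magnitude comes from, one can average the Table 2 host search values over a host mix consistent with the figures quoted in the text (a 7,200 bytes/sec average bandwidth and a 20 percent broadband share). The 80/10/10 split of the 5,000/12,000/20,000 bytes/sec hosts used below is an assumption chosen for illustration, not a figure taken from Table 1 itself:

    # Sketch: average host search value and search improvement coefficient,
    # assuming an 80/10/10 host mix that reproduces the quoted 7,200 bytes/sec
    # average host bandwidth.
    search_values = {5_000: 375_000, 12_000: 2_160_000, 20_000: 5_830_000}  # Table 2
    shares = {5_000: 0.8, 12_000: 0.1, 20_000: 0.1}

    avg_bw = sum(bw * s for bw, s in shares.items())                  # 7,200 bytes/sec
    avg_value = sum(search_values[bw] * s for bw, s in shares.items())
    print(avg_bw, avg_value, avg_value / avg_bw)   # ~1,100,000 bytes/sec, coefficient ~150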

   Note that the mutual caching search performance does not drop when the RAM-based index storage limit keeps growing, as was the case with k-redundant superpeers. When this limit exceeds the value defined by (54), the equation (55) stops the further growth of the number of cached indices, freezing the search performance at its maximum level and preventing its degradation.

   This mutual caching performance, however, is dependent on the full bandwidth utilization by all hosts. We have already briefly touched on this subject in Section 3.3 - there it was mentioned that in order to fully utilize the bandwidth of all hosts, in the Gnutella case the number of connections opened by the host should be proportional to the host bandwidth. This is true for the mutual index caching, too - but only for the query message delivery connections.

   On Figure 14 it is easy to see that the host d belongs to two virtual networks: the index transfer one (dotted lines) and the query delivery one (solid lines). The incoming traffic in the index transfer part of the network is defined by the number of the locally cached indices and does not depend on the number of index transfer connections. So it may be viewed as constant, which leaves the rest of the bandwidth (48) for the incoming query message traffic. The equation (48) defines the effective bandwidth of the query delivery network host, so it is this bandwidth that should be used to estimate the desirable number of the query delivery connections. Using the full host bandwidth for that purpose will leave some network bandwidth unused and reduce the overall search performance on Figure 16. We'll analyze the extent of such performance degradation later in Section 5.7.

5.5. Asymmetric link case.

   Just as it was the case with the k-redundant superpeer clustering, the mutual index caching search performance plotted on Figure 16 can be significantly increased if some of the hosts have asymmetric links with the high incoming bandwidth. The analysis of this scenario, though, is more complicated than the previous asymmetric link case analysis performed in Section 3.2 and Section 3.4.

   There we assumed that all hosts are equal in terms of the index storage, so it did not matter which host received the search query - it always was performing the local search against the same-size array of index data, and the overall search performance could be easily estimated from the total and average outgoing host bandwidths (15). The previous section analysis, however, shows us that the best performance of the mutual index caching strategy is achieved when the amount of the individual host index storage maximizes its search value, which implies the different index database sizes on different hosts.

   As a result, the query message destination becomes a matter of significant interest in this case - the query message search value becomes different, depending on the amount of content stored on the query-receiving host. That fact did not really matter while we were analyzing the symmetric-link network, since then all the hosts' incoming and outgoing bandwidths were saturated in any case, and it did not matter where the individual queries were routed - the total amount of search service performed by these queries stayed the same (52), because all hosts were already operating at their maximum performance.

   The asymmetric links change this situation. Assuming that the typical asymmetric link has the higher incoming (downlink) bandwidth, the summary incoming bandwidth of the network is higher than the outgoing one:

(58)

   This means that the incoming bandwidth of all network hosts cannot be saturated anymore, and the equation (50) for the host search value should be written as:

, (59)

where is the actual bandwidth of the incoming query message stream received by the i-th host, which might be lower than the effective search bandwidth value defined by (48).

   Then the task of maximizing the network search performance can be defined as maximizing the summary search value of all network hosts:

(60)

   Designating the actual incoming bandwidth consumed on the i-th host by the mutual caching index transfers as , from (47) we can find the number of foreign indices loaded and maintained by the host as:

(61)

   Since we are interested in the search performance maximization, we can assume that on the hosts that contribute most to the overall searching service volume performed by the network, is much bigger than 1. So after discounting '+1' in (60) and substituting (61) into it, we can transform (60) into:

(62)

   At the peak search performance the summary incoming bandwidth actually used by all network hosts is defined by the summary outgoing bandwidth of the network:

, (63)

and the additional constraints, imposed on the data streams by every host's incoming bandwidth and RAM limits, are:

   

, and (64)

   It is possible to solve the optimization problem (62), (63), (64) for any existing set of network hosts, even though it is much more difficult to write down the general-form solution. Here, just to provide an example, we'll solve this problem for our usual sample network defined by (1), (49), and Table 1 when the broadband network hosts (rows 2 and 3 in Table 1) have asymmetric links with the effective incoming bandwidth of:

50,000 bytes per second. (65)

   If the links would be symmetric, that would be exactly the network already analyzed in Section 5.4, whose optimal performance is reached at the bandwidth distribution defined by the equation (55) and shown on Figure 16.

   Adding more incoming bandwidth to the broadband hosts in this symmetric-link network allows us to divert some outgoing traffic - in both search query and index transfer streams - to the broadband hosts from the modem ones. Of course, that will lower the search values of the modem hosts, so diverting traffic can be justified only if the resulting summary increase in the search value of the broadband hosts will exceed the summary search value decrease in the modem part of the network.

   This is really going to be the case - the search value of every query is proportional to the index storage size on its destination host, and the search value of every index record is proportional to the incoming bandwidth of the query stream that is delivered to the host with that record. Table 2 shows that the broadband hosts already had the higher index storage sizes and incoming query streams even before any traffic was diverted to their high-bandwidth downlinks. This makes every diverted byte of data more valuable when it arrives to these broadband hosts instead of the modem ones, and the difference becomes more pronounced as both index sizes and incoming data streams on the broadband hosts keep growing. Ideally, from the equation (62) viewpoint, we'd like to fully saturate the incoming links of the broadband hosts with data streams that would be equally split between the query and index transfer traffic, as in the equation (54).

   However, this is impossible for two reasons: first, an average outgoing bandwidth of 7,200 bytes per second (16) of this network is not enough to fully load the incoming channels (65), even though the hosts with such channels represent only 20 percent of all the network hosts. The best we can hope for is either 36,000 bytes per second for all broadband hosts, or 50,000 bytes per second for 14 percent of all hosts.

   Second, even if the hosts have enough RAM to store 1,000 indices, the incoming index transfer bandwidth necessary for that purpose (47) will be just 16,667 bytes per second, which is less than 1/2 of available incoming bandwidth.

   So the index storage size on the broadband hosts will be RAM-limited in any case. Every broadband host that is chosen for the search value maximization will store the maximum (RAM-limited) number of indices from other hosts, and consume the corresponding amount of bandwidth for the index transfers. This makes it worthwhile to select only 14 percent of the hosts for the search value maximization, but utilize all the incoming bandwidth they have - since every index-storing host consumes the same amount of index transfer bandwidth, it makes sense to minimize their number, thus leaving more network bandwidth resources for the transmission of the queries.

   To determine the search performance of such a network we can use the equations (50), (52), and (53), where most of the hosts will have the zero search value, and only some of the hosts (their share is the ratio of the average outgoing host bandwidth to the broadband incoming bandwidth, which is 7,200/50,000=0.14 in our case) have the search value of:

bytes per second, (66)

so the average host search value can be found as

bytes per second, (67)

and the search improvement coefficient as

(68)

   When the number of cached indices is 500, these equations give us the average host search value of 3,000,000 bytes per second, and the search improvement coefficient of 417. Figure 17 shows the search improvement coefficient function (68) together with the performance plot of the 100-redundant superpeer network with the same bandwidth distribution and asymmetric links (that curve was taken from Figure 11).

Figure 17. Best performance of mutually cached indices vs 100-redundant superpeer clustering in the asymmetric link case.
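
   The 3,000,000 bytes per second and 417 figures above can be checked directly, using the 50,000 bytes/sec incoming broadband bandwidth of (65), the 14 percent share, and the same reconstructed per-cached-index maintenance bandwidth as in the earlier sketches:

    # Sketch: asymmetric-link case - average search value and improvement coefficient.
    BW_PER_INDEX = 3 * 20_000 / 3_600      # ~16.7 bytes/sec per cached index (depth 2)
    AVG_OUT_BW = 7_200                     # average outgoing host bandwidth
    BROADBAND_IN = 50_000                  # incoming bandwidth of the selected hosts

    share = AVG_OUT_BW / BROADBAND_IN                          # ~0.14 of all hosts
    n_cached = 500                                             # RAM-limited index count
    query_stream = BROADBAND_IN - n_cached * BW_PER_INDEX      # ~41,700 bytes/sec
    host_value = query_stream * (n_cached + 1)                 # ~20.9M bytes/sec

    avg_value = share * host_value
    print(avg_value, avg_value / AVG_OUT_BW)   # ~3,000,000 bytes/sec, coefficient ~417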

   This performance, however, is very hard to achieve. The assumption that led us to (68) was that all the outgoing traffic from all the network hosts should arrive to the hosts that perform the local searches. But we saw that in order to maximize the performance, only the best 14 percent of the hosts should store the other hosts' indices and do the local searching - otherwise the best-performing hosts won't be fully utilized, and the search performance will suffer. So all the traffic from the rest of the hosts should go directly to a small percentage of the hosts that carry the bulk of the search load.

   Such a network is very similar to the redundant superpeer network analyzed earlier - the main difference is that it tries to maximize the incoming data streams of the asymmetric-link superpeers (though GUESS [12] tries to do that, too) and to fully utilize the outgoing bandwidth of the leaf nodes. In practice it might look, for example, like a network in which the local searching and the bulk of the index data transfers are performed by the 'central core' of the interconnected superpeers, and the low-bandwidth 'leaf' hosts are primarily used as the query multiplicators with a large number of per-host connections.

   These leaf nodes would have a very low incoming traffic and send every incoming query to a large number of superpeers and to a few next-stage query multiplicators. This way, almost all outgoing query traffic would be arriving directly to the superpeers, making the search performance above possible. Of course, some small performance degradation would be caused by the queries sent between the query-multiplicating leaf nodes, since these queries would not reach the superpeers and would not cause any local searches on them. This effect, however, can be minimized by opening a sufficiently large number of connections on the query multiplicator nodes.

   The resulting network topology would be quite complicated, so reaching this performance would be even more difficult than reaching the full theoretical search performance of the k-redundant superpeer network. Here the superpeer role assignment and the network topology also have to be precisely controlled based on the full knowledge of the hosts' bandwidths - pretty much as was the case in Section 4.5 and Section 4.6. With the central control mechanisms this might be possible, but it is difficult to see how, for example, the query multiplicator chains described in the previous paragraph could be optimally formed in a decentralized fashion. So in the decentralized network this search performance is probably unachievable.

   In many cases, though, the centralized control mechanisms might be undesirable in peer-to-peer networks, as it was already mentioned at the end of Section 4.2. So it would be very interesting to somehow utilize the high incoming bandwidths of the asymmetric-link hosts, using only the optimization techniques that would not have any centralized topology-forming mechanisms and would not require the individual hosts to have any information about the network as a whole. Even though this approach will necessarily degrade the search performance in comparison with the optimal solution (68), the implementation simplicity and the absence of the vulnerable centralized functionality hubs and bottlenecks might make it attractive despite the performance loss. This approach is going to be the subject of the next section.

5.6. Decentralized network control.

   In this section we'll try to analyze the search performance of the mutual index caching strategy in the case when the network hosts might have asymmetric links, but - unlike the analysis in the Section 5.5 above - the hosts are expected to make their topology- and traffic-forming decisions without having any information about the network as a whole. The hosts should base their actions (open connections, cache other hosts' indices, etc) only on the information that is available to them directly, meaning that the hosts should be able to obtain this information just from analyzing the locally observed data, without having to rely on the other hosts or on some central control authority to provide that information.

   This approach totally distributes the network control over all network nodes, removing the possible control performance bottlenecks and attack points. It also simplifies the issue of trust - since every host uses only its own direct observations to make decisions, it is difficult to provide the false information to multiple hosts, tricking them into making the decisions that would negatively affect the overall network search performance.

   Of course, this approach will necessarily be less effective from the search performance standpoint than the optimal solution from the Section 5.5, and this makes it very difficult to prove that some particular decentralized decision-making mechanism is the optimal one. It is always possible that some other decentralized mechanism will result in a better search performance.

   But finding the optimal solution is not our goal here. Our goal is to analyze a specific decision-making mechanism, to find its search performance, and to show that even though this performance is lower than the one on Figure 17, it is still acceptable - in fact, that for our model network it is better than the performance of the 100-redundant superpeer clusters. If and when a better-performing mechanism is found, the additional performance increase will be a 'bonus' - the results presented here form the lower boundary for the search performance achievable in the fully decentralized network. And the existence of a better-performing decentralized decision-making mechanism is highly likely, since here we primarily concentrate on the algorithm simplicity rather than on its performance. So when such a better-performing decision-making mechanism is found, the real question will be whether its better performance outweighs its possible additional complexity or not.

   The decentralized decision-making mechanism suggested for the analysis here is based on the observation made at the end of Section 3.3 about the connection setup algorithm that would allow us to fully utilize the bandwidth of all peer-to-peer network hosts when their individual bandwidths are different. The suggestion made there for the Gnutella network was to open the number of connections proportional to the host bandwidth. Then the traffic flowing through every connection is roughly similar, and every host can fully utilize its bandwidth, maximizing the total search performance of the network.

   That was said in the context of the symmetric-link network, where it was irrelevant whether the incoming or the outgoing bandwidth would be used as a guideline to determine the number of the opened connections. It is clear, though, that if the outgoing (the one assumed to be smaller) host bandwidth is used for that purpose, the asymmetric-link hosts will have the incoming data streams approximately equal to their outgoing bandwidth. This did not worry us until we started to analyze the search performance of the mutual index caching strategy - before that, the search query message value was the same regardless of the host that received it, since every host had the same average amount of the locally searchable content.

   With the mutual index caching the situation is different. Better connected hosts - which normally include the asymmetric-link cable modem and ADSL hosts - tend to have bigger optimal index cache size (see Table 2). So the search value of the query sent to these hosts is higher than the search value of the same query sent to a modem host with smaller index cache size, and the network search performance can be increased if the query traffic is redistributed towards the asymmetric-link hosts. If the Gnutella host opens one connection per every bytes per second of the host incoming bandwidth , the total number of the opened connection endpoints on all hosts will be .

   When the overall network traffic is limited by the fully utilized outgoing bandwidth of all network hosts, the amount of data passing through an average connection in one direction will be bytes per second, and since the i-th network host has connections, with the random connection establishment it will receive, on the average, bytes per second of query messages. This is lower than the incoming bandwidth of the host, so some data that would otherwise arrive to the symmetric-link modem hosts will be diverted to the better connected asymmetric-link hosts, which is exactly what we wanted to achieve.
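
   In an assumed notation (the symbols here are chosen for this sketch and are not necessarily the original ones): let B_i^in and B_i^out be the incoming and outgoing bandwidths of the i-th host, C_B the incoming bandwidth allocated per connection, N_i the number of connections opened by that host, T the average one-way traffic per connection endpoint, and q_i the resulting average incoming query traffic. The relationship described in the two paragraphs above can then be written as:

N_i = \frac{B_i^{in}}{C_B}, \qquad T = \frac{\sum_j B_j^{out}}{\sum_j N_j}, \qquad q_i = N_i \, T ,

with q_i staying below B_i^{in} as long as the average outgoing bandwidth in the network is below the average incoming one.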

   In the mutual index caching case some bandwidth is also consumed by the index transfers. The query traffic bandwidth distribution consequences caused by this were already mentioned in Section 5.4 above - basically, from the query traffic standpoint, the index transfers decrease the effective host bandwidth according to the equation (48). So if the incoming index traffic on the i-th host is , the number of connections that should be opened by this host becomes

(69)

and the total number of the opened connection endpoints becomes

. (70)
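
   In the same assumed notation, with b_i^{ind} denoting the incoming index transfer traffic of the i-th host, a plausible reading of (69) and (70) is:

N_i = \frac{B_i^{in} - b_i^{ind}}{C_B}, \qquad N = \sum_i N_i .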

   At the same time the total search traffic is decreased by the index transfers, too. Repeating the reasoning above, we can find the incoming search traffic for the i-th host as:

, (71)

where

. (72)

   So the search value of this host is going to be:

   

bytes per second. (73)

   Here we are going to evaluate the search performance of this network under the assumption that all individual hosts try to choose the number of cached indices that would result in equal incoming index transfer and query message traffic. This is roughly equivalent to the caching volume that equations (54), (55) suggested for the symmetric-link networks. These equations resulted in an approximately 1:1 ratio of the index transfer traffic to the query traffic unless the number of the locally cached indices reached the RAM-based limit . This condition can be written as:

, (74)

where

, (75)

is the coefficient (see (47)) that ties the amount of locally cached index data to the mutual caching index transfer bandwidth needed to load and to maintain this cache.

   From (74) and (71) can be found as:

, (76)

or

. (77)
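
   Again in the assumed notation, with D_i the amount of index data cached by the i-th host and the coefficient from (75) written as c, the chain (74)-(77) amounts to equating the index transfer traffic c D_i with the incoming query traffic q_i and solving for the cache size:

c\,D_i = q_i = \frac{B_i^{in} - c\,D_i}{C_B}\,T \quad\Longrightarrow\quad D_i = \frac{B_i^{in}\,T}{c\,(C_B + T)} ,

with D_i additionally capped by the RAM-based limit.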

   After the value of is known, it can be used to determine the average host search value:

, (78)

and the search improvement coefficient:

. (79)

   Note that the procedure of finding the best value is different for the individual network host and for us when we want to evaluate the performance of this network.

   For the individual host this is an iterative procedure that uses equation (76) and does not require any knowledge of the network-wide statistics - it does not have to know the value of , which contains the network-wide averages for bandwidths and traffic streams (72). Assuming that the host knows or can estimate its effective incoming bandwidth cap , it might, for example, start by setting the incoming mutual caching index transfer traffic variable to zero, find the recommended number of query delivery connections from (69), open them, and start receiving the query traffic, at the same time requesting some small number of indices from other hosts. After observing the incoming query and index traffic and for a while, the host can choose some reasonable starting value for (see Appendix C), use (76) to determine the recommended amount of cached index data, and start requesting indices from the other hosts over the index transfer network (Figure 14). The average number of files stored on the hosts might vary significantly, so the index requests should be formulated in terms of individual index records instead of full host indices - this allows the requestor to control the index transfer traffic granularity and to achieve a better match between the volume of the transferred index data and the amount of RAM available for its storage.

   After some interval the host should take the latest averaged value of , return to (69) to reevaluate the recommended number of query delivery connections, use the latest index traffic averages to correct the estimate, and recalculate according to (76). This procedure should be repeated periodically in order to iteratively approach the optimal number of connections and index cache size, and to track the possible changes in the network-wide parameters' values. Since the index caching processes have the characteristic times close to the average session length, the averaging used in these calculations should be performed over some lengthy period, possibly with the next application session starting with the values of parameters saved by the previous one (more on this in Appendix C).
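
   As an illustration, a minimal sketch of such a host-side loop is shown below. All names, constants and servent hooks here are assumptions made for the sketch; the measurement functions stand in for the long-term traffic averaging discussed in Appendix C:

    import random

    C_B = 1000.0                # assumed bytes/sec of effective incoming bandwidth per connection
    B_IN_EFFECTIVE = 50_000.0   # the host's estimate of its effective incoming bandwidth cap
    RAM_RECORD_LIMIT = 50_000   # however many index records fit into the RAM budget

    def averaged_index_traffic():      # stand-in for the long-term averaged incoming index traffic
        return random.uniform(7_500, 9_000)

    def averaged_query_traffic():      # stand-in for the long-term averaged incoming query traffic
        return random.uniform(15_000, 17_500)

    def adjust_connections(count):     # stand-in for the servent's connection management
        print(f"  keep about {count} query delivery connections open")

    def request_index_records(count):  # stand-in for requests over the index transfer network
        print(f"  aim to cache about {count} index records")

    cached_records = 1_000             # start small, as suggested above

    for cycle in range(3):             # in a real servent this loop runs for the whole session
        b_ind = averaged_index_traffic()
        connections = max(1, round((B_IN_EFFECTIVE - b_ind) / C_B))    # cf. (69)

        # estimate the per-record transfer cost from local observations only
        cost_per_record = b_ind / max(cached_records, 1)
        # cf. (76): grow the cache until the index traffic roughly matches the query traffic
        target = min(averaged_query_traffic() / cost_per_record, RAM_RECORD_LIMIT)
        cached_records += int(0.25 * (target - cached_records))        # approach the target gradually

        print(f"cycle {cycle}:")
        adjust_connections(connections)
        request_index_records(cached_records)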

   When we are doing the network performance evaluation in this study, the procedure is different. In that case we know the average incoming and outgoing host bandwidths, so it is more convenient to use the equations (72) and (77) to calculate the recommended number of the cached host indices for every host. To do so, though, we have to calculate the average index transfer traffic first, and its value, in turn, depends on .

   So the simplest way to calculate is to use the iterative process similar to the one suggested for the individual host above. Starting with a random (for example, zero) value for all hosts, we calculate , find from (72), and then find the new value of for all hosts from (77).

   Then this process is repeated with the new values until it converges (the iteration steps no longer change the values). To improve the stability of the iterative process it is convenient to use:

. (80)

instead of (77). Here is a small number (0.1 works fine for our model network) used to lower the iterative process step size, thus improving its convergence - the direct application of (77) can lead to the persistent numerical oscillations. The choice of the value is, of course, not something that the network hosts are expected to do - it is used only here to stabilize the numerical solution of the network search performance equations.
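
   A compact sketch of this evaluation procedure is shown below. The symbol names, the per-connection bandwidth constant, the per-index transfer cost, and the host class shares are assumptions inferred from the surrounding text and tables (the shares are chosen to match the 7,200 bytes per second average outgoing bandwidth quoted in Section 5.7), so the sketch is only meant to roughly reproduce the numbers in Table 3:

    # Iteratively evaluate the decentralized mutual index caching network.
    # All constants below are assumptions made for this sketch.
    C_B = 1000.0     # incoming bandwidth per query delivery connection, bytes/sec (assumed)
    C_IND = 16.7     # index transfer traffic per cached host index, bytes/sec (assumed)
    N_MAX = 500      # RAM-based cap on the number of cached host indices
    GAMMA = 0.1      # damping factor playing the role of the small number in (80)

    hosts = [  # (share of hosts, outgoing bw, incoming bw) in bytes/sec - illustrative
        (0.8,  5_000,  5_000),
        (0.1, 12_000, 50_000),
        (0.1, 20_000, 50_000),
    ]

    n = [0.0 for _ in hosts]   # cached host indices per host class, start from zero

    for step in range(2000):
        b_ind = [C_IND * ni for ni in n]                  # incoming index transfer traffic
        conns = [(b_in - bi) / C_B                        # query delivery connections, cf. (69)
                 for (_, _, b_in), bi in zip(hosts, b_ind)]
        avg_endpoints = sum(s * c for (s, _, _), c in zip(hosts, conns))
        # only the network-average index traffic matters below, so the incoming
        # per-class values stand in for the outgoing ones
        avg_out_query = sum(s * (b_out - bi) for (s, b_out, _), bi in zip(hosts, b_ind))
        t = avg_out_query / avg_endpoints                 # traffic per connection endpoint
        q = [c * t for c in conns]                        # incoming search traffic per class

        # grow each cache until index traffic matches query traffic, up to the RAM cap
        target = [min(qi / C_IND, N_MAX) for qi in q]
        new_n = [ni + GAMMA * (ti - ni) for ni, ti in zip(n, target)]   # damped step, cf. (80)
        converged = max(abs(a - b) for a, b in zip(new_n, n)) < 1e-6
        n = new_n
        if converged:
            break

    for (s, b_out, b_in), ni, qi in zip(hosts, n, q):
        value = qi * (ni + 1)   # query bytes/sec times the number of host indices searched
        print(f"in={b_in:>6}: cached indices {ni:6.1f}, index traffic {C_IND * ni:7.0f}, "
              f"query traffic {qi:7.0f}, search value {value:11.0f}")

   With these assumed constants the iteration settles close to the rows of Table 3: about 85 cached host indices and roughly 1,400 bytes per second of query traffic for the modem hosts, and 500 cached indices with about 16,400 bytes per second for the broadband ones.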

   Taking the asymmetric-link network defined by (1), (49), Table 1, (65), and setting to 500, we arrive at the following results after this iterative process converges:


(bytes/sec)   (bytes/sec)   (bytes/sec)   Incoming search traffic (bytes/sec)   Host search value (bytes/sec)
5,000 5,000 500 85 0.393 1,411 1,411 121,000
12,000 50,000 500 500 8,333 16,380 8,200,000
20,000 50,000 500 500 8,333 16,380 8,200,000

Table 3. Effective search values for network with asymmetric links.

   Comparing this with Table 2, which lists the similar data for the symmetric-link network, we see that the incoming search traffic and the host search value here are lower for the modem hosts (first row) than in Table 2, but significantly higher for the asymmetric-link broadband hosts. For the modem hosts the sum of the incoming search traffic and the incoming index transfer traffic is just 2,822 bytes per second, which fills only 56 percent of their incoming bandwidth. This was exactly our intention - to divert some bandwidth towards the broadband hosts - and it was done without any central control whatsoever. In fact, the modem hosts did not even have any idea about the existence of the broadband hosts, the broadband hosts knew nothing about the modem ones, and none of them had any network-wide traffic statistics.

   The average network host search value for this network is:

bytes per second, (81)

and the search improvement coefficient is:

. (81)

   This corresponds to the search query reach of:

hosts, files. (82)

   This search coverage is noticeably better than the search coverage of the 100-redundant superpeer network (38) and (39), even when the superpeer roles are assigned in an optimal fashion - though, of course, it is significantly lower than the 'ideal' mutual index caching performance (44) that could be achieved if the network hosts had an infinite session lifetime. This performance is also 42 percent lower than the performance of the centrally controlled asymmetric-link mutual index caching network analyzed in Section 5.5 (Figure 17). However, considering the simplicity and the full decentralization of the host behaviour algorithms analyzed in this section, this performance reduction might be a fair trade.

   The full plot of the performance improvement coefficient for this mutual caching algorithm can be found on Figure 18 together with the best-performance plot for the 100-redundant superpeer clustering algorithm applied to the network with the same host bandwidth distribution (taken from Figure 11):

Figure 18. Decentralized mutually cached indices vs 100-redundant superpeer clustering.

5.7. Suboptimal decentralized control.

   In this section we'll simplify the decentralized network control algorithm analyzed in the previous section and look at the performance of this simplified algorithm. The goal here is to deliberately harm the algorithm in a way that is likely to happen in a real implementation and to see how resistant the algorithm is to such changes.

   This will also allow us to see what happens to the search performance when the network bandwidth is not fully utilized. The full host bandwidth utilization was an assumption used throughout this study, so the effects of incomplete bandwidth utilization analyzed here are relevant not only for the mutual index caching. A similar search performance degradation might be expected in all other scenarios (flat networks, superpeers, etc) when not enough attention is paid to the number of the opened query delivery connections and the network bandwidth becomes underutilized as a result.

   The single change we are going to introduce into the algorithm is that the number of the query delivery connections opened by the host is going to be proportional to the full incoming host bandwidth instead of being proportional to the part of this bandwidth left after the average index transfer traffic is subtracted from it (69). Basically, we are going to replace the equation (69) with:

. (83)

   This change makes it easier to estimate the recommended number of the opened query delivery connections, but results in the incomplete utilization of the outgoing network bandwidth. The goal of the equation (69) was to completely utilize the outgoing bandwidth on all hosts, so any deviation from it will cause the suboptimal bandwidth utilization.

   When (83) is used instead of (69), the equation (72) is transformed into:

. (84)

   Using equation (84) instead of (72) in the network search performance estimation procedure outlined in Section 5.6, we arrive at Table 4 and Figure 19 below instead of Table 3 and Figure 18:


(bytes/sec)   (bytes/sec)   (bytes/sec)   Incoming search traffic (bytes/sec)   Host search value (bytes/sec)
5,000 5,000 500 74 0.325 1,227 1,227 91,500
12,000 50,000 500 500 8,333 13,550 6,800,000
20,000 50,000 500 500 8,333 13,550 6,800,000

Table 4. Effective search values for network with asymmetric links and suboptimal bandwidth utilization.

Figure 19. Suboptimal decentralized mutually cached indices vs 100-redundant superpeer clustering.

   If we look at the average incoming traffic in Table 4, we can see that its value is:

bytes per second, (85)

which is 12 percent lower than the average outgoing host bandwidth of 7,200 bytes per second (16). This network bandwidth underutilization, as expected, causes a search performance degradation. The actual performance degradation is even larger than 12 percent, since with the value of 500 the index cache sizes on the broadband hosts are in both cases defined by this maximum value, and the incoming index transfer traffic for the broadband hosts stays the same. So the decrease in the part of the bandwidth that remains for the search queries is more pronounced than the overall bandwidth decrease, which results in a 17-percent search performance decrease when the outgoing network bandwidth is not fully utilized.

   The fact that on a large part of the analyzed range this search performance almost exactly matches the search performance of a 100-redundant superpeer network is, of course, just a coincidence. This plot of the redundant superpeer network performance assumes the ideal superpeer selection and the full bandwidth utilization. If the superpeers do not pay attention to the number of the opened connections and let the network bandwidth become underutilized, their performance will drop too. Besides, the performance of the mutual index caching network will increase if the average session time or the incoming bandwidth of the broadband hosts is increased, whereas the performance of the 100-redundant superpeer network will stay the same - it is already 'ideal' and cannot be increased without raising the network redundancy level. So as the percentage of the constantly connected broadband network hosts keeps growing, the performance of the mutual index caching will grow with it, while the performance of the 100-redundant superpeer network will grow much more slowly, being affected only by the average superpeer bandwidth.

   These results show the importance of the full network bandwidth utilization, which can be achieved in Gnutella case only if the hosts open the right number of the query delivery connections (69). This, of course, raises the question of agreeing on the proper value of , which defines how many connections should be opened by the host with a given bandwidth. The applications for the Gnutella network are developed by multiple vendors, and many vendors have different ideas about the proper connection establishment policies. As a result, the overall Gnutella network bandwidth is likely to be underutilized and the actual Gnutella network search performance will probably be suboptimal until all vendors agree on the same value of . At the very least, this agreement should be reached between the vendors that represent the majority of the network hosts.

   This value should be low enough to allow the modem hosts to create multiple connections, and high enough to allow the broadband hosts to open the number of connections required by (69) without exceeding the operating system limits. Incidentally, the current de facto value of about 1,000 bytes per second might be very close to the optimum. On one hand, it allows the modem hosts to have several (3-5) connections. On the other hand, even if the broadband hosts cannot open more than 100 TCP connections, and 30 of these are used by the index transfer network on Figure 14, the remaining 70 connections are still enough for the hosts with almost one-megabit effective bandwidth cap, especially after the index traffic is subtracted from the overall host bandwidth in (69).
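
   For instance, if (69) is read as dividing the incoming bandwidth left after the index transfers by this per-connection value, then with the Table 3 traffic figures a modem host would open roughly

(5,000 - 1,411) / 1,000, or about 3.6 connections,

and a broadband host roughly

(50,000 - 8,333) / 1,000, or about 42 connections,

both comfortably within the limits discussed above.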

6. Conclusion.

   Search performance is one of the most important characteristics of peer-to-peer networks. This study analyzes the performance of the peer-to-peer networks with a non-guaranteed search that represent the majority of the most popular peer-to-peer systems today. The study shows that for the superpeer-based architectures the best search performance can be achieved by using the massively redundant superpeer clusters, and that today's openly discussed peer-to-peer architectures just scratch the surface of the potential search performance of the peer-to-peer systems.

   After that, the study analyzes the search performance of the mutual index caching architecture and shows that this architecture can achieve the best theoretically possible search performance for the non-guaranteed search networks, since it can fully utilize the available resources of all network hosts, whereas the superpeer architectures utilize only the resources available on the most powerful hosts. For the practically interesting case of the ad hoc network with a finite host session time the study suggests the optimal volume of the mutually cached data on the host. It also proposes a simple, fully decentralized network bandwidth management algorithm for the hosts in this network, showing that with the network parameters typical for today's file-sharing systems this algorithm outperforms the massively redundant superpeer clusters, and that this performance gap will continue to widen as the average host session time grows with the continued broadband deployment.

   The study suggests that unlike the superpeer architectures, the mutual index caching approach effectively distributes the superpeer functionality over all network hosts, which makes it not only better-performing, but also easier to develop, deploy, manage and maintain. The absence of the explicitly assigned superpeer roles also eliminates the obvious performance bottlenecks and attack targets in such a fully decentralized peer-to-peer network.

6.1. Acknowledgements.

   The author would like to thank Bill Kallman from Timberline Venture Partners, without whose constant prodding this research would not even have been started, discussions with whom contributed heavily to this study, and whose comments on the first draft of this document were very helpful. The author also wants to thank all the good people from the peer-to-peer community whom he met online and in real life over the past several years. Without countless discussions with people too numerous to mention here, this research would not have been possible.

7. References.

[1] Gnutella Developer Forum.
http://groups.yahoo.com/group/the_gdf.
[2] Kazaa website.
http://www.kazaa.com.
[3] S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker. A Scalable Content-Addressable Network.
http://www.acm.org/sigs/sigcomm/sigcomm2001/p13-ratnasamy.pdf.
[4] A. Rowstron, P. Druschel. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems.
http://research.microsoft.com/~antr/PAST/pastry.pdf.
[5] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, H. Balakrishnan. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications.
http://www.pdos.lcs.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf.
[6] B. Zhao, J. Kubiatowicz, A. Joseph. Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and Routing.
http://www.cs.berkeley.edu/~ravenben/publications/CSD-01-1141.pdf.
[7] C. Huitema. Distributed Peer-to-peer Name Resolution.
http://www.huitema.net/talks/p2p-names.ppt.
[8] Napster website.
http://www.napster.com.
[9] B. Yang, H. Garcia-Molina. Designing a Super-Peer Network.
http://www-db.stanford.edu/~byang/pubs/superpeer.pdf.
[10] B. Yang, H. Garcia-Molina. Improving Search in Peer-to-Peer Networks.
http://www-db.stanford.edu/~byang/pubs/p2psearch.pdf.
[11] Q. Lv, P. Cao, E. Cohen, K. Li, S. Shenker. Search and Replication in Unstructured Peer-to-Peer Networks.
http://www.cs.princeton.edu/~qlv/download/searchp2p_full.pdf.
[12] S. Daswani, A. Fisk. Gnutella UDP Extension for Scalable Searches (GUESS) v0.1.
http://groups.yahoo.com/group/the_gdf/files/Proposals/GUESS/guess_01.html.
[13] giFT. What is the giFT project?
http://gift.sourceforge.net/docs/?document=whatis.html.
[14] T. Miconi. I Jumped in the GnutellaNet and what did I see? Lessons from a simple Gnutella network simulation.
http://thomas.miconi.free.fr/TSN3.pdf.
[15] S. Saroiu, P. K. Gummadi, S. D. Gribble. A Measurement Study of Peer-to-Peer File Sharing Systems.
http://www.cs.washington.edu/homes/tzoompy/publications/mmcn/2002/mmcn.html.
[16] GnuMap Project page.
http://home.attbi.com/~gregory.bray/.
[17] A. Crespo, H. Garcia-Molina. Routing Indices For Peer-to-Peer Systems.
http://www-db.stanford.edu/~crespo/publications/crespoa_ri.pdf.
[18] A. Tanenbaum, A. Woodhull. Operating Systems: Design and Implementation. Prentice-Hall, Inc., 1999.
[19] JXTA Platform Scalability Proposed Design.
http://platform.jxta.org/java/workinprogress/ScalabilityOverview.pdf.
[20] B. Yang, H. Garcia-Molina. Comparing Hybrid Peer-to-Peer Systems.
http://www-db.stanford.edu/~byang/pubs/hybridp2p_med.pdf.
[21] K. Truelove. Gnutella: To the Bandwidth Barrier and Beyond.
http://www.jnutella.org/docs/dss/dss001106.shtml.
[22] K. Truelove. Gnutella: Alive, Well, and Changing Fast.
http://www.openp2p.com/pub/a/p2p/2001/01/25/truelove0101.html.
[23] J. Ritter. Why Gnutella Can't Scale. No, Really.
http://www.darkridge.com/~jpr5/doc/gnutella.html.
[24] S. Osokine. The Flow Control Algorithm for the Distributed 'Broadcast-Route' Networks with Reliable Transport Links.
http://www.grouter.net/gnutella/flowcntl.htm.
[25] S. Osokine. The Implementation of the Flow Control Algorithm for the Distributed 'Broadcast-Route' Networks in the Finite Message Size Case.
http://www.grouter.net/gnutella/finitmsg.htm.
[26] B. Yang, H. Garcia-Molina. Comparing Hybrid Peer-to-Peer Systems. Technical report, Stanford University, February 2001.
http://dbpubs.stanford.edu/pub/2000-35.
[27] BearShare website.
http://www.bearshare.com.
[28] LimeWire website.
http://www.limewire.com.
[29] C. Rohrs. SACHRIFC: Simple Flow Control for Gnutella.
http://www.limewire.com/developer/sachrifc.html.

Appendix A. Superpeer query failure probability.

   After analyzing a sizable array of the experimental data from the live file-sharing peer-to-peer systems, the study [20] suggested the following model for the probability of a query returning R or more results from an array of n queried items (files):

(A-1)

where T(n, m) is the probability of returning exactly m results:

(A-2)

and g(i) and f(i) are the probability density functions for the i-th query popularity and 'selection power' (likelihood of a random file matching that query), which were approximated as:

(A-3)

and

(A-4)

   The observations of the OpenNap host with approximately 400 simultaneous users and n=69,000 files, and of the Napster host with approximately n=1,500,000-2,000,000 files (which suggests about 10,000 users) showed that the Q(n,R) calculated from the model fits the observed distributions of queries and results with a remarkable precision when , (for the OpenNap host) and , (for the Napster host).

   To determine the probability of the host (superpeer) failing to return any results for a query, we have to set m=0 in Eq. (A-2), which gives us:

(A-5)

   This calculation is easy to perform, and for the distribution parameters above, it gives about 56-percent probability of the query failure in case of the 400-user OpenNap host, and about 8-percent - in case of the 10,000-user Napster host.
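
   As an illustration of this kind of calculation, the sketch below computes the no-result probability under the assumption that a query is drawn from the popularity distribution g(i) and independently matches each of the n files with probability f(i). The power-law forms and parameters here are illustrative stand-ins rather than the fitted distributions from [20], so the output is not expected to match the figures above:

    # Probability of a query returning no results from n files, assuming a
    # query space of N_QUERIES queries with illustrative (made-up) distributions.
    N_QUERIES = 10_000

    def g(i, alpha=1.0):             # illustrative Zipf-like query popularity
        return i ** -alpha

    def f(i, beta=1.2, scale=1e-5):  # illustrative 'selection power' of the i-th query
        return min(1.0, scale * i ** -beta)

    norm = sum(g(i) for i in range(1, N_QUERIES + 1))

    def failure_probability(n):
        """T(n, 0): probability that a random query matches none of the n files."""
        return sum((g(i) / norm) * (1.0 - f(i)) ** n
                   for i in range(1, N_QUERIES + 1))

    for n in (69_000, 2_000_000, 200_000_000):
        print(f"n = {n:>11,}: P(no results) = {failure_probability(n):.3f}")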

   Of course, the study [20] never suggested that these distribution approximations are accurate at such low values of R and m, so the precision of these numbers might be low. But on the other hand, these are just examples needed to illustrate the significant probability of failing to find any answers to a query. A real file-sharing system will probably try to return a sizable array of results (maybe 50 to 100) for each search. This will make the Q(n,R) value much lower than suggested by these numbers - and much more precise, since the approximations in [20] involved matching the experimental curves in this range of R (up to 100). Such calculations (provided that the query and content distribution parameters are known) will give the numbers necessary to estimate the query reach increase when some selective query routing approach is applied.

   Some sample probability charts and calculations for different distributions and numbers of files can be found in [26]. However, one has to remember that predicting the probability distributions on the design phase is a very hard task. At the same time, these distributions can have a very large influence on the search success probability. So the exact measure of peer-to-peer system performance increase due to the selective querying optimization might be very hard to determine in advance.

   To give just one example, two very similar file-sharing systems above (OpenNap and Napster) have very different query and file distributions:

, (A-6)

for the OpenNap, and

, (A-7)

for Napster.

   If we consider a file-sharing network in which a search request can effectively reach one million nodes, and every node has 200 files (n=200,000,000 total), the probability of not finding any matches for a query in this array will be about 25 percent in case of the OpenNap distribution (A-6), and just 2-3 percent in case of the Napster distribution (A-7). The probabilities of the query returning at least a predefined number of results R will also be dramatically different - for R=100, this probability Q(n,R) is plotted for these two sample distributions on Figure A-1 below:

Figure A-1. Q(n,R) distribution for R=100.

   So if we try to optimize the search in this system using something like 'random walker' [11], the effect from such an optimization might be much more pronounced for the Napster query and content distribution (A-7) - in case of the OpenNap distribution (A-6) more than forty percent of all queries won't be able to return the full set of one hundred results. This means that their 'walkers' won't be stopped when the desired number of results is achieved, trying to reach more and more hosts and having to rely, as usual, on the normal flow control mechanisms (more on that in Appendix B) to be stopped. And since we cannot predict whether the real distribution on the live system will be closer to OpenNap or to Napster, the precise improvement numbers are really hard to predict in advance anyway.

   Even if we gather some statistics on the live system after its deployment, the query and file distributions might change with time, significantly changing the predicted system performance. Currently there seems to be no way to predict the character of these distributions, and the researchers have to resort to fuzzy post-factum explanations of the observed behaviour, similar to the one in [20]:

"...since Napster is widely known, it tends to have more casual users who access popular songs. OpenNap, on the other hand, is harder to find and use, so its users tend to be more involved in the music-sharing concept and community, more technically savvy, and may not always have tastes that conform to the mainstream. As a result, query frequencies and selectivities for OpenNap are more evenly distributed, accounting for the larger and larger r. In contrast, the popularity skew of songs in Napster is probably high and heavily influenced by the latest trends. Hence, and r are relatively small."

   Such an uncertainty in the optimization outcome makes it difficult to rely on the selective query routing approaches to achieve some predefined effect - this effect might be really achieved and even exceeded, but the contrary might be true, too.

   So from the practical design standpoint it might make sense to maximize the search system reach under the assumption of reach maximization for all queries (Section 2.3) and to consider any search performance improvement caused by the selective query routing to be a bonus, without relying on it at the system design phase.

Appendix B. 'Napster meltdown' and why Gnutella failed to die.

   It is now clear that the grim predictions of Gnutella network overload and collapse [23] failed to materialize. In the past two years the Gnutella network grew by several orders of magnitude and probably keeps growing. (The early crawlers have long lost the ability to crawl the whole network, so right now its growth can be assessed only by indirect means. Anecdotal evidence puts the number of simultaneously connected hosts somewhere into the low millions, but this number is hard to verify.)

   So why didn't Gnutella die?

   These predictions were based on a 'Napster meltdown' - a several-month long early Gnutella network crash caused by the influx of new users looking for another file-sharing system after the Napster shutdown verdict in July of 2000. The key reason for that meltdown was that this early Gnutella network had no built-in flow control in the message transfer mechanisms, and the only two mechanisms to stop the query propagation were the request TTL (number of 'Gnutella hops' to live - not to be confused with IP TTL) and the loop detection (dropping the request when it reached the host that had already received it).

   Rumor has it that the Gnutella protocol was designed with just 250 hosts in mind. For quite a while (from March to July, 2000) every request was able to reach every host. The requests were dropped when there were no more hosts to query, so the loop detection was the primary mechanism for the traffic control. This continued until the network size exceeded 5,000 hosts, which happened in a matter of hours due to the huge crowds of former Napster users arriving on the network.

   At this point the network size was still too low for the TTL limit to have any noticeable effect, but high enough for the traffic sent to every host to exceed the average client's modem incoming bandwidth. So after filling all the TCP pipe buffers, the sending TCP calls started to fail on the sender's end. It is difficult to reconstruct what exactly happened next, since many servent versions from different vendors were involved, but generally speaking, these early Gnutella servents could react in several different ways when the send() calls failed.

   Regardless of the choice made by any particular servent, the bottom line was the same: if before the July, 2000 meltdown the reported network size was 4,000-5,000 hosts, after the meltdown it was 100, 200, 400 - maybe 1,000 at a lucky moment.

   The low observed number of hosts (which was actually a number of hosts whose replies to a special 'who is on the net?' query could reach the requestor) prompted a hypothesis suggested in [21] and [22] that the Gnutella network became fragmented into multiple 'islands' when some nodes were shut down by the overload, leaving parts of the network disconnected.

   Such an event, however, seems highly unlikely. Even if we imagine that for some reason two parts of the network somehow became separated, the random establishment of new connections to replace the forcefully closed ones would tie these two parts of the network together at once. The probability that all new connections would somehow be established within the same part of the network, without crossing the 'breakdown boundary', seems to be so low that for all practical purposes it can be assumed to be zero. Connection 'flapping', augmented by the routing tables' expiration, seems to be a more realistic explanation for the observed behaviour.

   In any case, whatever the explanation, the situation was obviously unbearable. Search results were scarce and took minutes to arrive. It was pretty clear that unless some adaptive flow control mechanism was found that would allow every host to carry its share of the network load without overloading its physical link, the network would stay unusable. But it took the Gnutella community several months to figure out how exactly to achieve that.

   The first flow-controlled servents started to appear on the network around December, 2000, and by the end of January, 2001 most of the big vendors had the flow control implemented. It was called "rudimentary flow control" and mostly involved dropping queries when the link was getting overloaded.

   That simple measure helped to bring the Gnutella network back from the July-December 2000 'Napster meltdown' - and the rest is history.

   This whole subject received almost no attention outside the tight circle of Gnutella developers. Surprisingly few people know that there is flow control in Gnutella, and that it is the flow control, not the TTL, that stops the query propagation and avoids the physical link overload. Even some Gnutella developers were not aware of this for quite a while - vendors with a small network share could get away with forgetting about it, since the network-wide traffic was shaped by the code written by the big vendors (which in 2001 primarily meant BearShare [27] and LimeWire [28]). So it is not surprising that the network death predictions failed to materialize - they predicted death to another, earlier network, not to the one that had been successfully growing since December of 2000.

   Currently there are several more sophisticated flow control proposals ([24], [25], [29]), but for all practical purposes all flow control algorithms are doing the same thing: they are limiting the query transmission when the host links are getting saturated, doing this adaptively and in real time according to the host bandwidth and current load.

Appendix C. Index transfer traffic averaging.

   When the peer-to-peer network host tries to optimize its search performance using the decentralized mutual caching algorithm described in Section 5.6, it has to estimate the value of . This parameter determines the relationship between the number of other hosts' indices cached on a host and the average incoming bandwidth needed to load and to maintain these indices:

(C-1)

   In practice, though, the number of content items stored on every host might vary. Some hosts might store nothing at all, and some hosts might have huge content libraries. Also, when the indices are requested, it might be desirable to do so with lower granularity than the one provided by 'give me another full host index' request. So it is more practical to establish the relationship between the index transfer traffic and the number of other hosts' individual index item (file) records stored on the i-th host:

(C-2)

   But in any case, every individual host i should try to achieve the 1:1 relationship between its incoming index transfer traffic and incoming search query traffic (74):

(C-3)

   Note that this is not something that has to be done to improve the individual host usability from the user's standpoint. The host usability is not affected whether the condition (C-3) is satisfied or not. The goal here is to improve the individual host long-term search value for the network - the host is expected to satisfy this equation in the long run, not at any given moment.

   To explain this, let's look at the index transfer bandwidth of the host d from Figure 14 when it joins the peer-to-peer network for the first time. Its bandwidth vs time plot might look like the one on Figure C-1 below:

Figure C-1. Host index transfer traffic.

   When the session starts, the host d establishes the index transfer connections to the hosts k and j (Figure 14) and requests some index records from these hosts - which, if the number of the requested records is high enough, might require hosts k and j to transmit not only their own index records, but also the records from the other hosts (l, m, n, o, p...) that are currently cached on k and j. This results in an initial wide traffic burst on Figure C-1. After that, the index traffic goes to zero and is nonexistent until some host whose records are stored on d decides to leave the network.

   If, for example, host p leaves the network, the host k notices it and notifies host d that all index records belonging to p are now invalid. This, of course, requires all hosts to mark the cached index records with an ID of the host that 'owns' these records. In the simplest scenario it might be a strongly random long (128-160 bits) unique host ID. To address the privacy concerns, a different ID might be created by the host for every session. This strong uniqueness, however, is not exactly necessary - the ID should be just unique enough to allow the host d to determine which records to purge from its cache. Host k, for example, might mark all the records passed to d with a simple connection number (1,2,3...), and when the connection (say, connection #2) to host p is closed, it might tell d to invalidate all records received from k that were marked with #2. (Of course, the loss of the k-d connection should be treated by d as a signal to invalidate all the indices received through that connection - in our example these are k, n, o, and p.)
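
   A small sketch of the record-tagging and invalidation bookkeeping described above might look like this (the class and method names are illustrative):

    from collections import defaultdict

    class IndexCache:
        def __init__(self):
            # (neighbour, owner_tag) -> list of cached index records
            self.records = defaultdict(list)

        def add(self, neighbour, owner_tag, record):
            self.records[(neighbour, owner_tag)].append(record)

        def invalidate_owner(self, neighbour, owner_tag):
            """The neighbour reports that the host it labelled 'owner_tag' has left."""
            self.records.pop((neighbour, owner_tag), None)

        def invalidate_neighbour(self, neighbour):
            """The connection to 'neighbour' was lost: drop everything received through it."""
            for key in [k for k in self.records if k[0] == neighbour]:
                del self.records[key]

        def size(self):
            return sum(len(v) for v in self.records.values())

    # Usage mirroring the example in the text: host d caches records received from k,
    # then k reports that its connection #2 (host p) was closed, and later the
    # k-d connection itself goes down.
    cache = IndexCache()
    cache.add("k", 1, "song_a.mp3")      # k's own records
    cache.add("k", 2, "song_b.mp3")      # records k relayed from host p
    cache.invalidate_owner("k", 2)       # p left: purge only p's records
    cache.invalidate_neighbour("k")      # k-d link lost: purge everything from k
    print(cache.size())                  # -> 0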

   In any case, when this happens, host d has to keep the number of cached index records at its original level , so it requests more index records from its neighbours, and the transfer of these records creates the next traffic spike on Figure C-1. Then some other host leaves the network, which creates another spike, and this process is repeated until the end of the session on host d.

   During the index traffic spike the incoming search bandwidth of the host is obviously decreased, and can become lower than the index transfer bandwidth. This is expected - again, the goal is to satisfy the equation (C-3) in the long run. So these traffic bursts should be averaged over time, and only a sufficiently large averaging interval can allow us to select the right value for the number of index records transferred to a host. From Figure C-1 it is clear that the averaging interval should be at least as long as the session length (in order to include the initial wide traffic burst and multiple spikes). Ideally it should include several sessions - when the next session starts, the averaging process should continue from the data saved by the previous session.
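
   A minimal sketch of such cross-session averaging could look as follows (the state file name and its structure are assumptions made for the sketch):

    import json, os, time

    STATE_FILE = "index_traffic_avg.json"   # hypothetical location of the saved averaging state

    class IndexTrafficAverager:
        def __init__(self):
            state = {"total_bytes": 0.0, "total_seconds": 0.0}
            if os.path.exists(STATE_FILE):
                with open(STATE_FILE) as f:
                    state = json.load(f)    # continue from the previous session's totals
            self.total_bytes = state["total_bytes"]
            self.total_seconds = state["total_seconds"]
            self.last_sample = time.time()

        def record(self, bytes_received):
            """Call periodically with the index transfer bytes received since the last call."""
            now = time.time()
            self.total_bytes += bytes_received
            self.total_seconds += now - self.last_sample
            self.last_sample = now

        def average(self):
            """Long-term average incoming index transfer traffic, bytes/sec."""
            return self.total_bytes / self.total_seconds if self.total_seconds else 0.0

        def save(self):
            with open(STATE_FILE, "w") as f:
                json.dump({"total_bytes": self.total_bytes,
                           "total_seconds": self.total_seconds}, f)

    # usage: avg = IndexTrafficAverager(); ... avg.record(bytes_seen) ...; avg.save() on exit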

   This leaves us, however, with the question of what the host should do on its first launch. At this point there is no previous information available and the total host on-line time is much less than the average session length, so reliable averaging is not possible, but still has to be evaluated in order to come up with a recommended number of the cached index records.

   To answer this question, let's have a look at the index transfer traffic averaging plot presented for this scenario on Figure C-2:

Figure C-2. First session index traffic averaging.

   This plot presents the index transfer traffic averaged over time that passed since the session was started. At first, the wide burst of the initial index transfer traffic causes the average value to quickly grow, but after the initial index transfer is finished, the average traffic value starts to drop, reaching its real value by the end of the session.

   This is not so bad from the estimation standpoint - if we just use this 'instant average' value somewhere near the session start, it will give us some very high estimate for and lead us to choose the value of that will be too low. But as we spend more time averaging the index transfer traffic, the estimate for will keep growing. This will cause us to request more index records, and this process will continue in a stable fashion - we will never request so many records that they would overload the host incoming bandwidth and have to be dropped.

   The reason for this stability is that the averaged index traffic on Figure C-2 asymptotically approaches the final averaged value for the index transfer bandwidth from above, always giving us a low estimate for . So when the host has no information from the previous sessions, it can start with requesting some low number of index records (maybe a few hundred) and start the index traffic averaging as shown on Figure C-2, gradually requesting more and more records as the averaged index traffic keeps dropping and the estimate keeps growing. This approach, of course, will result in a lower-than-optimal host search value during the host's first session, but if we assume that the peer-to-peer application will be launched on the host machine many times, this one-time effect should not significantly lower the overall search performance of the peer-to-peer network.

*** End of the Document ***
