Workloads
This page contains the results of collecting workloads.
WORKLOADS
Wikipedia
We have 284GB of compressed HTTP requests, representing 20 billion requests, which correspond to 10% of Wikipedia's workload over a 107-day period. This dataset is derived from the "Wikipedia Workload Analysis for decentralized hosting" paper.
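As a rough sanity check on the scale of this trace, the figures quoted above can be turned into per-second request rates and an average compressed size per request. This is only a back-of-envelope sketch: it assumes 1 GB = 10^9 bytes and that the 10% sample is uniform over the whole period (neither is stated on this page), and the function and constant names are made up for illustration.

```python
# Back-of-envelope figures for the Wikipedia trace described above.
# Assumptions (not stated in the source): 1 GB = 10**9 bytes, and the
# 10% sample is uniform, so the full workload is 10x the trace.

TRACE_BYTES = 284 * 10**9       # compressed size of the trace
TRACE_REQUESTS = 20 * 10**9     # requests contained in the trace
SAMPLE_FRACTION = 0.10          # trace covers 10% of the real workload
TRACE_DAYS = 107
SECONDS_PER_DAY = 24 * 60 * 60


def trace_summary():
    """Return rough rate and size estimates derived from the quoted figures."""
    trace_seconds = TRACE_DAYS * SECONDS_PER_DAY
    sampled_rate = TRACE_REQUESTS / trace_seconds   # requests/s in the trace
    full_rate = sampled_rate / SAMPLE_FRACTION      # estimated real requests/s
    bytes_per_request = TRACE_BYTES / TRACE_REQUESTS
    return {
        "sampled_requests_per_second": sampled_rate,
        "estimated_full_requests_per_second": full_rate,
        "compressed_bytes_per_request": bytes_per_request,
    }


if __name__ == "__main__":
    for name, value in trace_summary().items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions the trace averages a little over two thousand requests per second, which puts the full workload on the order of twenty thousand requests per second.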
Epinions.com
Waiting for response
- Nokia (Yekesa): willing to give us data distributions and read/write workloads but not actual data
- Yahoo (Brian Cooper)
Others
These are of limited interest for one reason or another (e.g. read-only).
- Ensembl Genetic DB: we have a 4.7GB raw-text file containing (read-only) queries collected from an on-line public installation of the Ensembl Genetic DB.
- Adam Seering might eventually be able to get us PostgreSQL query logs for MIT ESP, but not from the periods when it's actively in use (during Splash in March and November)
- We can get complete access to FeedMe by joining the project, but the workload is very small
- CarTel data can be obtained from Eugene but it's read-only
Dead-ends
- Microsoft: we haven't actually pressed Phil Bernstein, but getting data seems unlikely.
- Foursquare: only the firehose
- Twitter: only sampled stream; have a fuller stream from Jeff Terrace (via Yahoo dev acct); OHR paper authors refusing to release now
- OkCupid
- ITA
- Wordpress.com
- Slashdot
