
All Graphs broken on Wikimedia wikis (due to security issue T336556)
Open, High · Public · BUG REPORT

Assigned To
None
Authored By
Iniquity
Apr 18 2023, 12:29 PM
Referenced Files
F41711233: Wishlist_feasibility_flow_chart.jpg
Jan 23 2024, 7:25 PM
F36955928: Screenshot 2023-04-19 at 00.58.31.png
Apr 18 2023, 11:59 PM
F36955428: image.png
Apr 18 2023, 12:29 PM

Description

On April 19, 2023 it was identified that the Graph extension, which uses the older Vega 1 & Vega 2 libraries, had a number of security vulnerabilities.

In the interest of the security of our users, the Graph extension was disabled on Wikimedia wikis. WMF teams are working quickly on a plan to respond to these vulnerabilities.

We recommend that any other third party users of the Graph extension should disable the use of that extension on their wikis.

A configuration change will suppress the exposed raw tags and graph JSON definition to avoid excess disruption to the end user experience when the extension is disabled. [2] This also provides a tracking category "Category:Pages with disabled graphs" showing the pages that used to contain graphs. Local administrators can localise the name of the category and its description by editing the [[MediaWiki:Graph-disabled-category]] and [[MediaWiki:Graph-disabled-category-desc]] interface messages on their local wiki.

On Wikimedia projects, graphs created via the extension will remain unavailable. This means that pages that were formerly displaying graphs will now display a small blank area. To help readers understand this situation, communities can now define a brief message that can be displayed to readers in place of each graph until this is resolved. That message can be defined on each wiki at [[MediaWiki:Graph-disabled]] by local administrators.

An example from the English Wikipedia:

Screenshot 2023-04-19 at 00.58.31.png (610×636 px, 69 KB)

ORIGINAL:
Steps to replicate the issue (include links if applicable):

What happens?:

image.png (453×1 px, 59 KB)

Or blank space.

What should have happened instead?:
Graphs should be shown

Other information (browser name/version, screenshots, etc.):
I know graphs were disabled because of a security issue, but an open issue is also needed so that people understand what's going on.

April 21 update part 1 - part 2.

April 28 update.

July 15 update.

August 11 update.


Event Timeline


This is how it would work at any moderately sized tech company.

Earlier in my career I was a Principal Consultant at a $1B software company. I didn't find the environment anything like this.

This sort of uncertainty-spiral for a prominent bug is not normal in healthy projects, but is a common enough failure mode elsewhere. It burns people out, a systemic risk for our work. Let's err on the side of certainty :) 10 pages down in a ticket about a single extension is perhaps not the place to discuss delegation, planning, and maintainership, but we could use such a discussion in an appropriate place (perhaps catalyzed by a dev who both staff and other community devs would read receptively).

Proposals specific to this issue:

  1. Assign this ticket to someone. This is the most visible ticket on Phab, linked directly from the "Graphs are unavailable due to technical issues" message on ten thousand pages, which has already conservatively affected 100M pageviews. High seems to undersell its priority.
  2. In addition, make someone (a person, not a role or group) directly responsible for the outcome of the short-term resolution of the critical regression: the broken graphs. Something like DRI may work. Update the third sentence of the ticket (which currently sets expectations rather high) to match reality.
  3. Divide and conquer the set of related issues:
    a) Replace current broken graphs
    b) Be healthily pessimistic: clarify we need non-Vega solutions, no matter what
    c) Invite new ideas and celebrate them: like Nux's piechart tool. Invite maintainers of tools we'd like to integrate.
    d) Develop a reusable approach to [discussing, evaluating] client-side and server-side rendering of static media. Use that language + approach here.
    e) Separate the discussion of interactive content (e.g. T290519 and its wish) to its own epic. This is important, but more feature dev than bug fixing.

Some active graph-creators have noted on-wiki they're working or would like to work on fixes, and could own pieces the current !maintainers don't have time for. But given the lack of clarity and incidental licking of the cookie tray, warm engagement and facilitated delegation may be called for.

As far as I can see, there are only a few options right now?

@TheDJ a slightly more helpful as-is option:

  • Leave ~as is, help people pre-render SVGs w/ hosted tools and post those to Commons, along with links to the tool + inputs used to generate those SVGs

In T334940#9480339, @TheDJ wrote:
Separate domain to serve iframes from, so that we can have interactive and cacheable content (pretty complex)

Complex? Yes, sure. But I'm also quite sure WMF employees would manage.
Expensive? No, not really. It should be a drop in the ocean of costs the WMF already has.

Basically:

  1. New proxy with https.
    1. Buy whatever domain (minor cost really).
    2. Land that domain on some server (some proxy probably).
    3. Setup a wildcard certificate (free-ish).
    4. Add that domain to CSP.
  2. Create a rendering pipeline.
    1. Probably extensions.
    2. One extension renders graphs e.g. as graph.php?title=Page&id=graphid (a simple html page with Vega data from wiki and Vega scripts).
    3. The other replaces graph tags with <iframe src="https://en.cookieless-wmfcloud.org/w/graph.php?title=Page&id=graphid"....
  3. Setup servers to work with new extensions.

Not sure about the 3rd point, but all this shouldn't be too hard. It would require work from various teams, but it doesn't seem like something that hasn't been done before. Note that the new proxy could just point to the old servers, and graph.php would simply only be allowed to load from a specific domain. So just one new server and a bit of configuration.
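To make step 2.3 a bit more concrete, here is a minimal sketch of how a graph tag could be turned into iframe markup. It is only an illustration under assumptions: `buildGraphIframe` and `escapeAttr` are hypothetical helpers, the domain is the example one from the comment above, and the `sandbox` and `loading` attributes are my own additions rather than part of the proposal.

```javascript
// Example domain taken from the proposal above; not a real deployment.
const GRAPH_HOST = 'https://en.cookieless-wmfcloud.org';

// Hypothetical helper: escape a value for use inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: build the <iframe> that would replace one graph tag.
function buildGraphIframe(title, graphId) {
  const url = new URL('/w/graph.php', GRAPH_HOST);
  // searchParams percent-encodes the values, so page titles with spaces
  // or quotes cannot break out of the query string.
  url.searchParams.set('title', title);
  url.searchParams.set('id', graphId);
  // sandbox="allow-scripts" (without allow-same-origin) is an assumption here:
  // it lets the graph scripts run while giving the frame an opaque origin.
  return `<iframe src="${escapeAttr(url.toString())}" sandbox="allow-scripts" loading="lazy"></iframe>`;
}

console.log(buildGraphIframe('Page', 'graphid'));
```

The parser-side extension would call something like this once per graph tag; the rendering endpoint itself (graph.php in the proposal) is not sketched here.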

Hi everyone, thanks very much to whoever is working on this and has worked on this. The issue has been here for 10 months now, on 18,000+ articles on the English Wikipedia at least. If it's not going to be resolved by end of Feb, perhaps we should delete disabled graphs from all 18,000 pages as there doesn't seem to be an end in sight? I'd need to discuss this on the project (English wiki) perhaps wouldn't I? And at the 'pump'? Does anyone know if it's been done before? Or perhaps there's a credible path to completion with an end month we can be confident about?

I think that is the worse approach: if we just delete the broken graphs, there's no need to solve this and we will lose it forever.

No, people can add them back when it's fixed. The alternative is that we have 18,000+ articles with broken graphs on them for months, or for years? There's no end in sight.

If we don't have 18,000+ articles at enwiki it won't be fixed. That's the problem here. The only pressure to fix this now is that we have the graphs. If we delete them, the problem is gone forever, and we will lose the opportunity to have something great. The same thing will happen that happened with the book creator: if we forget that it worked, there's no need to fix it.

It is perhaps utopian, but can't you guys just meet physically at a hackathon and build things up, with both WMF security and dataviz staff and other skilled people?
An encyclopedia without graphs, can it still be an encyclopedia?

@Ita140188 I'd like to strongly push back on your comment. It is meaningless to say hundreds of people work at WMF, implying that solutions should therefore be quick, if we don't analyse the magnitude of the work needed to keep the websites of the Wikimedia projects working.

This is a security issue on an upstream project that won't be fixed upstream. It is not a trivial matter to have this fixed in safe and functional way.

I do think we have several technical areas without proper ownership, and graphs was one of them, but without a significant increase in people available and responsible for technical stewardship it is unlikely to be solved. I invite you to ponder if your comment is more or less likely to invite volunteers and professionals alike to join that group.

I never said it should be quick, nor that it is a trivial matter. However, this is a core functionality that has been broken for a year. After a year we have zero solutions and still no end in sight. This ticket has not even been assigned to anyone: there is no one who leads this or has any responsibility for its success or failure. Since the WMF is a relatively large organization with plenty of funds, and also has at its disposal hundreds of skilled volunteers, it seems to me that this failure is due to either dysfunction or lack of interest in solving this, which are both frustrating for the people that worked on graphs over the years as well as the readers that don't have access to valuable content provided by graphs.

In T334940#9509039, @Ita140188 wrote:
I never said it should be quick, nor that it is a trivial matter. However, this is a core functionality that has been broken for a year. After a year we have zero solutions and still no end in sight. This ticket has not even been assigned to anyone: there is no one who leads this or has any responsibility for its success or failure. Since the WMF is a relatively large organization with plenty of funds, and also has at its disposal hundreds of skilled volunteers, it seems to me that this failure is due to either dysfunction or lack of interest in solving this, which are both frustrating for the people that worked on graphs over the years as well as the readers that don't have access to valuable content provided by graphs.

A comment from a "normal" Wikipedia user. I've been waiting for this fix for I guess a year, regularly checking. There's a Wikipedia page on Covid (statistics) and I wanted to contribute, but can't due to this issue. The bug has thus been causing a lot of valuable data NOT to be added to Wikipedia, and continues to cause this. Once in a while I see Wikipedia asking for money / donations. If this is a complex problem, then it needs money to solve. It's about core functionality (statistics). Wikipedia is an encyclopedia!!! What is an encyclopedia without statistics and graphs?! I would like to suggest running a fundraiser specifically for this so people can be paid to fix this. I'd happily donate. Wikipedia can't complain that the graphs issue is not being fixed if they don't even ask for donations to fix it.

@Nux, just to say, there is ongoing work in T222807.

Somehow they managed to implement the sandbox even without a separate domain. Although, if I understand correctly, this allows making /api.php calls (non-authenticated, ofc).

When I initially suggested an iframe sandbox, again, it did not require a completely new domain. It did not require caching. It was:

  • en.wikipedia.org: Extension to handle {{#tag:iframe| arbitrary user content with scripts }}
  • en.wikipedia.org: <iframe id="sandboxFrame" src="https://sandbox.en.wikipedia.org/sandbox" sandbox="allow-scripts allow-same-origin"></iframe>
  • en.wikipedia.org: iframe.contentWindow.postMessage("arbitrary user content with scripts", "https://sandbox.en.wikipedia.org")
  • sandbox.en.wikipedia.org: add_header Content-Security-Policy "default-src 'none'; script-src 'self' 'unsafe-inline';"
  • sandbox.en.wikipedia.org: window.addEventListener('message', (event) => { if (event.origin === 'https://en.wikipedia.org') setInnerHTML(document.body, event.data); ... })
  • sandbox.en.wikipedia.org: few proxied apis to read GraphQL, mwapi and raw content (tab, json, wikitext, js, css) from Wikimedia satellites.

This solution is OP. There is no need for cookieless domains, as an iframe sandbox has no access to cookies anyway. There is no restriction on content: it can finally be something mobile-friendly, for example (Vega was not). Compared to inline SVG it does not make life more difficult, as SVG again is not secure "as is": one needs to either create the same iframe sandbox or heavily sanitize inline SVGs.
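The two postMessage steps in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the code from T222807: `makeMessageHandler` and `sendToSandbox` are hypothetical names, the two origins are the ones used in the list, and the `render` callback stands in for the `setInnerHTML` helper mentioned there.

```javascript
const PARENT_ORIGIN = 'https://en.wikipedia.org';
const SANDBOX_ORIGIN = 'https://sandbox.en.wikipedia.org';

// Sandbox side: build a 'message' handler that only accepts string content
// coming from the trusted parent origin. Returns true when content was rendered.
function makeMessageHandler(trustedOrigin, render) {
  return function onMessage(event) {
    if (event.origin !== trustedOrigin) return false; // ignore other senders
    if (typeof event.data !== 'string') return false; // only plain markup is expected
    render(event.data);
    return true;
  };
}

// Parent side: post the user content to the frame, naming the sandbox origin
// as targetOrigin so the browser will not deliver it to any other origin.
function sendToSandbox(frame, content) {
  frame.contentWindow.postMessage(content, SANDBOX_ORIGIN);
}

// Browser-only wiring for the sandbox page; skipped when run outside a browser.
if (typeof window !== 'undefined' && typeof document !== 'undefined') {
  window.addEventListener('message', makeMessageHandler(PARENT_ORIGIN, (html) => {
    document.body.innerHTML = html; // placeholder for setInnerHTML(document.body, ...)
  }));
}
```

One caveat on the placeholder: assigning to `innerHTML` does not execute inserted script elements, so a real `setInnerHTML` helper would have to recreate them if user scripts are meant to run.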

Here should be some poster calling for revolution and demands to give the power to manage dynamic content to ordinary users.

While I agree with your point generally, I don't think what you precisely wrote is secure, since subdomains can set cookies for the parent domain. But it could be made secure relatively easily by just using a separate domain. [For some definition of secure... There are some complex questions here around what exactly the threat model is]

@Nux, just to say, there is ongoing work in T222807.

Somehow they managed to implement the sandbox even without a separate domain. Although, if I understand correctly, this allows to make /api.php calls (non-authenticated, ofc).
...

I don't think you can do it without a separate domain. I mean, it works fine in Firefox, but Chrome is problematic, mostly due to caching problems. This issue is primarily described in T169027 (an old sandboxing issue) and in T352227 (specifically addressing the caching problem). I assume the caching problem has not yet been resolved. But if I'm wrong, even better :)

@NightOwly is completely right! Even before the deactivation of the extension, stats were sorely missing in Wikipedia. Data visualisation is an essential functionality. Humans access complexity through knowledge spatialisation. This should be the main worldwide priority for WMF in 2024.

Should we mark this as stalled, after reading the disappointing message from @MMiller_WMF at Wikimedia-l, or should we still be confident that there will be a solution soon?

Stalled generally means there is a specific thing preventing progress, and progress will proceed once that thing is fixed. In essence it means waiting on something specific to happen.

If no progress is being made simply due to lack of interest or resources, then "open" is the correct status. If it's been decided not to do this task, then "declined" would be correct. Stalled would only be used if progress cannot continue until something else happens.

A task being "open" has no connection to how soon a solution will happen

@Charles_Matthews told me about the new project WikiFunctions at the recent London meetup. This is intended to host a variety of standard functions for use across other projects. Displaying data graphically seems like a good fit for this. If the WMF can't hack this then perhaps it can be offered to them.

I note that this current task is not assigned to anyone. I'm not sure if it has always been like that but this lack of ownership of the problem seems significant.

For those not on the mailing list, the message from MMiller on February 6 was:

Hi everyone – My name is Marshall Miller, I am a Senior Director of Product at the Wikimedia Foundation, and I work with many of the teams that are involved with the user experience of our websites and apps, such as the Editing, Web, Growth, and Mobile Apps teams (among others) [1]. I’m part of the leadership group that makes decisions about how the WMF teams approach things like graphs, interactive content, and video. Thank you all for having this in-depth and important discussion.

I know that issues with graphs [2] are what started this discussion, but I agree that it makes sense to think about this in terms of the broader category of “interactive content”, because other kinds of interactive content, such as maps or timelines, would share architecture with what is needed for graphs (video is a different and more complicated content type). I wrote a lot in this email, but here are a couple of the main points up front: to support graphs and other interactive content, we would need to take a step back and make a substantial investment in sustainable architecture to do it – so that it works well, safely, and is built to last. And because that’s a substantial investment, we need to weigh it against other important investments in order to decide whether and when to do it.

I know that it is very frustrating that the Graph extension has not been operational for many months – it means readers haven’t been seeing graphs in articles, and editors haven’t been able to use graphs to do things like monitor backlogs in WikiProjects. Over the months of trying to find a way to turn graphs back on, it has become clear that there isn’t a safe shortcut here and that the path forward will require a substantial investment – one that we have not yet started given the other priorities we’ve been working on. Every year we have to make difficult tradeoffs around what areas of our technical infrastructure we can and cannot take on. In the current fiscal year, the Product and Technology department has made experienced editors a priority [3], and many things that volunteers have asked for are either accomplished or in flight:

Improvements to PageTriage (complete) [4]
Watchlist in the iOS app (complete) [5]
Patrolling in the Android app (in progress) [6]
Dark mode (in progress) [7]
Improvements to the Commons Upload Wizard (in progress) [8]
…and other projects.

But I know this conversation isn’t as much about what editors need as what current and future readers need. Between talking about interactive content and talking about video, it sounds like we’re having the larger conversation of what we should be offering today’s and tomorrow’s readers to help them learn from encyclopedic content – whether we need to be offering interactivity, or video, or perhaps enabling other platforms/apps to use our content to make interactive or video materials there. This is a really important conversation, because even working together we probably will not be able to build all of it – we’ll have to make hard choices about where to invest. One place where this broader conversation is happening is called “Future Audiences”, which does experiments on how to reach newer generations who use the internet differently than previous generations – and thinking particularly about video. Future Audiences has regular calls with community members to shape the direction of those experiments, which in turn inform how the broader Foundation prioritizes. I hope many of you will get involved in those conversations – you can sign up here. [9]

Focusing back on graphs, since that’s what kicked this thread off, the several approaches we’ve attempted for quickly re-enabling the extension have ended up having security or performance problems. Therefore, we think that if we were to support graphs and other interactive content, we would need to plan substantial investment in sustainable architecture. This way, our approach would work securely and stably for the longer term. But that would take significant resources, and we’ll need to weigh it against many other important priorities, like tools for functionaries, improvements to the editing experience, automated ways to stop vandals, etc.

To be clear, if we do assign resources to the planning and building of an architecture for graphs (and other interactive content), it means that we are still at least several more months away from having a working Foundation-supported architecture. Therefore, I think we should also be having the additional conversation that many others have brought up about what volunteers can do in these intervening months to make graphs somewhat available to users. I know people are talking about that concretely on the Phabricator task, and I will join that conversation as well. For the bigger question, I would like to start with some more learning about which kinds of interactive content are important for our encyclopedia, and how our community members see the evolution of the reading experience on our projects. I’d like to have some small conversations with many of you so that we can get into the details and ideas, joined by some of my colleagues. I’ll start reaching out to see who is interested in talking – and please let me know directly if you’d like to talk.

Thank you for weighing in so far, and let’s keep talking and planning together.

Marshall

[1] https://meta.wikimedia.org/wiki/User:MMiller_(WMF)
[2] https://phabricator.wikimedia.org/T334940
[3] https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2023-2024#Our_approach_for_the_future
[4] https://en.wikipedia.org/wiki/Wikipedia:Page_Curation/2023_Moderator_Tools_project#October_20,_2023:_Final_update
[5] https://www.mediawiki.org/wiki/Wikimedia_Apps/Team/iOS/Watchlist#October_2023
[6] https://www.mediawiki.org/wiki/Wikimedia_Apps/Team/Android/Anti_Vandalism
[7] https://www.mediawiki.org/wiki/Reading/Web/Accessibility_for_reading
[8] https://commons.wikimedia.org/wiki/Commons:WMF_support_for_Commons/Upload_Wizard_Improvements
[9] https://meta.wikimedia.org/wiki/Future_Audiences#Sign_up_to_participate!

That comment is nothing more than willingness to let the perfect be the enemy of the good, and things not getting done as a result.

That's not how I read it. He brought up security and performance concerns, both of which have not even reached "good" let alone "perfect" as this task has gone on. (Besides, when talking about security, I'd hope for better than just "good enough".) I'm frustrated by the time lost in this task as well, and the fact that it looks like it'll be much more time before we see anything concrete, but it *is* important that the solution be secure and performant.

it *is* important that the solution be secure and performant.

@SlyCooperFan1 I'm not sure that applies to this bug report specifically.
(ofc no one suggested doing anything insecure, but addressing this bug report in the simplest way doesn't require novel security or performance work)

"All Graphs broken on Wikimedia wikis" is the current task, calling for a near-term solution 'quickly', the essential goal being:

What should have happened instead?: Graphs should be shown

This refers to existing graphs, already rendered many times apiece with no adverse effects, not necessarily arbitrary future graphs.


Other related issues, such as
"Develop a secure and performant solution for future non-interactive graphs" or
"Develop a secure and performant solution for future interactive content"
are separate tasks, deserving their own tickets, important to discuss as there is time and capacity, but less urgent. In particular both 'secure' and 'performant' can expand to fill available space, planning time, and budget.

Today I had to explain to a classroom full of students why they still have to make improvements to the demography section of some articles, since that is part of their assignment, despite not being able to visualize what their fellow students did last year.

Another week without a roadmap to solve this, and another opportunity lost to engage digital humanities students into improving articles with relevant graphs.

For those not on the mailing list, the message from MMiller on February 6 was:

...

... In the current fiscal year, the Product and Technology department has made experienced editors a priority [3], and many things that volunteers have asked for are either accomplished or in flight:

Improvements to PageTriage (complete) [4]
Watchlist in the iOS app (complete) [5]
Patrolling in the Android app (in progress) [6]
Dark mode (in progress) [7]
Improvements to the Commons Upload Wizard (in progress) [8]
…and other projects.

Are they really that bad in prioritizing?
All items mentioned above are improvements to what is already working, while we are talking about a feature that broke down. I'd have expected them to quickly re-prioritize current work long ago and add this problem to the list of things they invest in.

I agree with Ppperry and Michgrig. We don't need to do everything at once. Right now, it'd probably be enough to have scripting sanitized and Vega updated to the latest version, while work on a long-term solution is prioritized.

Right now, it'd probably enough to have scripting be sanitized

Not easy per T336595#8848425.

Hmm... thanks. Maybe use the proposal to restrict it to intadmins? The vulnerability is similar to the default JS stuff, and I don't see what TheDJ means with his concerns. If we've already decided to shut down the editing activity for JS stuff, why not use that against this vulnerability?

This solution is already declined in T336595: Restrict editing of Vega spec to a small set of users

Hmm. Without a given reason to decline, the rejection isn't very convincing to me. The only reasons against that I can see in that thread are an (unfounded?) distrust in intadmins being able to respond in a timely manner, and a pretty good concern about intadmins not necessarily knowing about graphs, which could be solved by making the small group template editors instead of intadmins.

“If you really want to do something, you'll find a way. If you don't, you'll find an excuse.”
Jim Rohn (entrepreneur, author, motivational speaker)

I assume the reason T336595 was declined was because the WMF (rightly) thought T222807 was better. But this odd quest for a nonexistent perfect solution has gone on long enough.

By this time, I think it is pretty evident that the WMF is not going to solve this, and has no plans to work on solutions. It is not only @MMiller_WMF's message on the mailing list; you can also see the Infrastructure goals for the next fiscal year here: https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2024-2025/Goals/Infrastructure. There's not a single mention of iframing, graphs, or any other path needed for this to be solved. They will try to make us forget that this existed (as happened with many other broken features), and hope that in a few years there's no need to ask for a solution.

T336595 feels like a band-aid solution and a bad one at that. It does nothing to address the underlying security problem or maintenance problem. I personally don't think we should do that except as a last resort.

I suspect WMF is more interested in replacing vega with a different tbd solution, which feels like a more proper fix.

I would assume that Marshall's aforementioned wikimedia-l response and comments of this nature strongly support this reading of current WMF strategy.

Band-aids are much better than nothing at this point. JS is a similar security risk and restricting it is what we did.

aliu: absolutely right. Thanks for continuing to engage on this, and your generally constructive approach :)

T336595 feels like a band-aid solution ... I personally don't think we should do that except as a last resort.

We are well past that point. Having failed to implement a last resort, while experiencing long-term degradation of reading experience for 200M pageviews, we are at the more philosophical point of wondering at the point of it all, and what sort of failure analysis could avoid such events in the future.

Given current gentle deployment velocity, I don't see a novel "tbd" solution being deployed within the next 18 months, by which point we may all be doing very different things with our time. Please let's implement anything at all that will let known, trusted editors who have Graphs in their workflows clean them up, wrap them up, prepare them for either new solutions or static replacement.

we are at the more philosophical point of wondering at the point of it all, and what sort of failure analysis could avoid such events in the future.

This is not going to get me any friends here, but the real problem was a communication problem, not an action problem. People were given false hope. The action plan was never clear. Nobody was willing to call a spade a spade publicly. I suspect a large source of the frustration is the community feeling they were led on, only to be dumped at the end, and quite frankly it's hard to blame them for feeling this way. Telling people no may be a bitter pill, but it is much less bitter than telling people yes (including yes by omission) when the answer is actually no.

Sometimes things fail and the cost (broadly defined, including but not limited to $$$) of fixing is not worth the price. That is life.

P.S. just to be clear, in my view, a lack of transition plan, even just to static images, is a reasonable thing to complain about.

we are at the more philosophical point of wondering at the point of it all, and what sort of failure analysis could avoid such events in the future.

This is not going to get me any friends here, but the real problem was a communication problem, not an action problem. People were given false hope. The action plan was never clear. Nobody was willing to call a spade a spade publicly. I suspect a large source of the frustration is the community feeling they were led on, only to be dumped at the end, and quite frankly it's hard to blame them for feeling this way. Telling people no may be a bitter pill, but it is much less bitter than telling people yes (including yes by omission) when the answer is actually no.

I will say that even during last year's Athens hackathon (which was just after the new deploy failed due to the second discovered security issue), several developers were expressing doubt that this would be recoverable and predicting that it would result in the death of the Graphs extension. Mostly because the developers in question did not see the plan and the needed budget/resources by the foundation to develop a sustainable solution. The foundation was betting on short term, minimal fixes and hope. But sometimes the easy solution just doesn't present itself. The foundation made a bet and lost, all the while 'Corporate America'-management-style pussyfooting around the issue.

I would advise the community to look for solutions for graphs outside of the foundation (maybe through a grant or something). I personally still think that running something on toolforge and exporting to SVG which can then be uploaded, is the most flexible, easiest to realize and most usable short term way to enable rich creation tools. It creates a hard separation between security contexts and avoids any sort of fragile dependency on MediaWiki and Wikimedia. The uploading can even be automated, OAuth login and commons uploading are commonly implemented things in tools already. You lose interactivity, but interactivity was already pretty rare in graphs.

I would advise the community to look for solutions for graphs outside of the foundation (maybe through a grant or something). I personally still think that running something on toolforge and exporting to SVG which can then be uploaded, is the most flexible, easiest to realize and most usable short term way to enable rich creation tools. It creates a hard separation between security contexts and avoids any sort of fragile dependency on MediaWiki and Wikimedia. The uploading can even be automated, OAuth login and commons uploading are commonly implemented things in tools already.

There are also downsides to it, as it would either clutter Commons with many versions of the same graph or people would need to ask for autopatrol specifically to edit graphs. An extreme example would be during the covid pandemic, when the graphs were updated weekly or more frequently. In order to reupload a photo uploaded by someone else, an autopatrol right is needed, which is likely not to be given 'just to edit graphs' if the person is not very active on Commons otherwise.

You lose interactivity, but interactivity was already pretty rare in graphs.

I believe the main purpose of bringing graphs back should be at least enabling us to use static graphs, as they accounted for (in my opinion, no research here) something like 99% of all graphs before the incident.

The desire for interactivity is for me one of the issues that contributed to the failure of Graphs (but not the only one ofc). I think that the voice calling 'to bring back interactive graphs' sounded much louder than its actual importance warrants. I've never done any research on how many graphs used interactive features, but I believe that after deciding not to use Vega, it should at least have been specified what the requirements for the solution are (what wikis really need, not just what was taken from them). If there are no requirements, nothing will ever happen.

I have been working on a non-vega version of OSM Location map, which is now at the 'close to complete' stage at https://en.wikipedia.org/wiki/Template:OSM_Location_map/sandbox . It jumps through some very ungainly hoops, as it uses the Maplink overlay, but only seems to work if an en:overlay template also adds an invisible square. That has allowed me to re-use the mercator calculations I had needed to get vega5 working, and add inline CSS graphics and text instructions on top of the map. (Betraying my ignorance, I had no idea CSS could be used like this). So far as I can tell, it appears to have a lower performance hit than Vega did.

There is a selection of examples at https://en.wikipedia.org/wiki/Template:OSM_Location_map/examples which also showcase some new features not possible with the old graph template. Any thoughts on the stability, performance, sustainability, portability and 'security safety' of this approach would be welcome. So far it only does 10 map-items. I am doing a few more compatibility/bug-find tests with existing map examples, and all being well will then ramp it up to the original 60 and go live in the next few days.
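
For readers unfamiliar with the mercator calculations mentioned above, here is a minimal Python sketch of the general idea. It is not the template's actual code, and the function names and the assumption that the frame's bounding box is known are mine. It converts a latitude/longitude into percentage offsets inside a map frame, which is the kind of value that could then be used for CSS `left`/`top` positioning.

```python
import math

def mercator_y(lat_deg: float) -> float:
    """Web Mercator y-coordinate for a latitude in degrees (grows northward)."""
    lat = math.radians(lat_deg)
    return math.log(math.tan(math.pi / 4 + lat / 2))

def css_offset(lat, lon, north, south, west, east):
    """Return (left %, top %) of a point inside a frame with the given bounds.

    Longitude maps linearly; latitude goes through the Mercator stretch,
    so a point halfway between two latitudes is not halfway down the frame.
    Assumes the frame does not cross the antimeridian.
    """
    left = (lon - west) / (east - west) * 100
    y_north, y_south = mercator_y(north), mercator_y(south)
    top = (y_north - mercator_y(lat)) / (y_north - y_south) * 100
    return left, top
```

A frame centred on the equator puts the point (0, 0) in the middle, while in a frame spanning 0° to 60° N the 30° parallel lands noticeably below the vertical midpoint, which is why the projection step cannot be skipped.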

There are also downsides to it, as it would either clutter Commons with many versions of the same graph or people would need to ask for autopatrol specifically to edit graphs. An extreme example would be during the covid pandemic, when the graphs were updated weekly or more frequently. In order to reupload a photo uploaded by someone else, an autopatrol right is needed, which is likely not to be given 'just to edit graphs' if the person is not very active on Commons otherwise.

Of course there would be downsides. That's the whole problem, that all of what has been discussed has downsides. But this at least would be within the control and capabilities of the community.

I have been working on a non-vega version of OSM Location map, which is now at 'close to complete' stage at https://en.wikipedia.org/wiki/Template:OSM_Location_map/sandbox

This is interesting, but what are the differences between your project and https://www.mediawiki.org/wiki/Help:Extension:Kartographer? Most of those maps can be done fairly well using Kartographer.

The biggest single display difference is text-labels either alongside a marker or simply naming a map feature. More control over symbols and other graphical elements also means a map can be 'added to' rather than simply given markers. Numbered dots are most similar, but with extra caption and display features, and can be used along with text labels. In some measure it has different use cases, and of course it builds on Kartographer through use of the maplink template. It also makes more use of standard wikitemplate syntax.

I just realised the sandbox template still showed the old documentation, with the interim solution, which you correctly noted is all done through kartographer. I have switched the maps to the sandbox version - a bit rough and ready but you will see the differences!

Just to confirm my understanding of the situation: graphs are still broken after a whole year, with no replacement whatsoever, and no plan for implementing one?

T336595 feels like a band-aid solution, and a bad one at that. It does nothing to address the underlying security problem or maintenance problem. I personally don't think we should do that except as a last resort.

But, to be clear, with graphs unavailable for a year already, it feels like there needs to be some last-resort solution. I don’t see why Graphs cannot be brought back with tight security requirements about editing them, and then a more comprehensive solution can be worked out by the engineers (maybe even dropping the current library etc., but, eh, not waiting two years for a new solution). Many wikis have used graphs extensively and therefore need them back, even though that might not apply to English-speaking audiences.

I personally still think that running something on toolforge and exporting to SVG which can then be uploaded, is the most flexible, easiest to realize and most usable short term way to enable rich creation tools. It creates a hard separation between security contexts and avoids any sort of fragile dependency on MediaWiki and Wikimedia. The uploading can even be automated, OAuth login and commons uploading are commonly implemented things in tools already. You lose interactivity, but interactivity was already pretty rare in graphs.

At that point, you might as well resurrect Graphoid in a more secure way. Which is, by the way, a solution: sandbox the Vega JS part on the server, allow users to generate graph SVG or PNG images, and that’s it. No interactivity, of course, but also no security threat to the end-users. Which is what the original removal of the Graph extension was about. Tbf, I don’t get why the perfect is such an enemy of the good here. Yes, maybe the Graph extension should be phased out in the long term, but we have been serving readers a big pile of nothing for the last year. Surely the WMF, with all of its resources, can develop a short-term solution that would not compromise security and then work on a long-term solution. @MMiller_WMF’s email seems weirdly avoidant of that.

I work in a software company (as a technical writer, but it doesn't matter). If any of the teams in our company had concentrated on dark mode instead of quickly fixing such a major bug in production, I'm sure that team would have been immediately kicked out of the company.

At that point, you might as well resurrect Graphoid in a more secure way.

Saying T211881#9425243 here again: The last (before archival) version of graphoid is prone to RCE due to CVE-2020-26296. For a POC, see https://github.com/vega/vega/issues/3018#issuecomment-748929438

There are still potential issues in the latest Vega 5, though I don't know the details.

Note that Graphoid is a service (i.e. a continuously running Node.js server), and by design it accesses external data from the internet (such as the result of a WDQS query). It is not a single binary like ImageMagick or LilyPond that can be run one-off, statelessly and without internet access (for those we have Shellbox).

I would advise the community to look for solutions for graphs outside of the foundation (maybe through a grant or something). I personally still think that running something on toolforge and exporting to SVG which can then be uploaded, is the most flexible, easiest to realize and most usable short term way to enable rich creation tools. It creates a hard separation between security contexts and avoids any sort of fragile dependency on MediaWiki and Wikimedia. The uploading can even be automated, OAuth login and commons uploading are commonly implemented things in tools already.

There are also downsides to it, as it would either clutter Commons with many versions of the same graph or people would need to ask for autopatrol specifically to edit graphs. An extreme example would be during the covid pandemic, when the graphs were updated weekly or more frequently. In order to reupload a photo uploaded by someone else, an autopatrol right is needed, which is likely not to be given 'just to edit graphs' if the person is not very active on Commons otherwise.

I feel that T66460 could be another promising solution for non-interactive graphs. When modules are able to generate SVGs on the fly, graphs can update dynamically based on changes in data and the laborious process of updating it in an external tool and reuploading the SVG can be avoided.

The ticket has an attached patch which looks like a good start, although it is 10 years old.

This will require T334953: Introduce an SVG Sanitizer.

Why? I could be wrong but my understanding is that that patch would generate SVGs which are treated as if they were uploaded SVG files. So they're not rendered client-side, and so it would only require T86874: Make SVG sanitization into a library at best.
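
As a rough illustration of the non-interactive approach discussed above (a module turning data into a static SVG), here is a minimal sketch. It is written in Python purely for illustration; the idea in T66460 concerns Lua modules, and this is neither that patch nor an SVG sanitizer. The function name, sizes and colours are all hypothetical. The one point it tries to make is that user-supplied labels are escaped before being placed in the markup.

```python
from xml.sax.saxutils import escape

def bar_chart_svg(data, width=300, height=120):
    """Render (label, value) pairs as a static SVG bar chart string.

    Values are assumed to be non-negative numbers. Labels are untrusted
    text and are XML-escaped before being embedded.
    """
    if not data:
        raise ValueError("data must not be empty")
    peak = max(value for _, value in data) or 1  # avoid division by zero
    slot = width / len(data)                     # horizontal space per bar
    plot_h = height - 20                         # leave room for labels
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{width}" height="{height}">']
    for i, (label, value) in enumerate(data):
        bar_h = plot_h * value / peak
        x = i * slot
        parts.append(f'<rect x="{x + 2:.1f}" y="{plot_h - bar_h:.1f}" '
                     f'width="{slot - 4:.1f}" height="{bar_h:.1f}" fill="#36c"/>')
        parts.append(f'<text x="{x + slot / 2:.1f}" y="{height - 5}" '
                     f'font-size="10" text-anchor="middle">'
                     f'{escape(str(label))}</text>')
    parts.append("</svg>")
    return "\n".join(parts)
```

Because the output is a plain string with no scripts or external references, it could in principle be handled like any other uploaded SVG; whether that is sufficient from a security standpoint is exactly the question being debated here.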

If the sandboxing approach is abandoned, https://www.mediawiki.org/wiki/Extension:Graph/Plans should be updated correspondingly to provide correct information about what is going on.

it has become clear that there isn’t a safe shortcut here and that the path forward will require a substantial investment – one that we have not yet started given the other priorities we’ve been working on.

Am I reading this right? A year after a major feature was broken they haven’t even started working on it?

To be fair, they did explore some options. I don't know why they would say configuring a cookieless domain is a "substantial investment". It should be fairly easy... But maybe the current infrastructure is so complicated that configuring a proxy with a new domain is somehow hard ¯\_(ツ)_/¯

Saying T211881#9425243 here again: The last (before archival) version of graphoid is prone to RCE due to CVE-2020-26296. For a POC, see https://github.com/vega/vega/issues/3018#issuecomment-748929438

I wasn’t strictly talking about making Graphoid available again; I was talking about providing a sandboxed generator of Graphoid-like images. Maybe I am missing something, but I can’t see why this would be impossible, given that the container version of Lilypond can be used in the same way.

I am wondering if we could use a rewrite of the graphs extension using something like P5JS or something similar (in other words moving to the frontend). This would also reduce the need for caching of files and whatnot used in the plugin.

I can confirm the usefulness of the functionality now re-enabled by the rewrite of Template:OSM Location without using the graph module, and that it complements the other mapping functionality on en:Wikipedia. This rewrite allowed about 5,287 pages on en:Wikipedia to show improved, user-friendly information at first sight of the page, compared to the other mapping options, which I have now used widely and whose limitations (especially at first sight of the page) I have been forced to understand.

Even though this apparently more efficient and presumably more secure rewrite would not have happened if the graph module had remained available, this editor is of the view that a low overhead and secure graph option that allows an ordinary editor to update changing data by text entry is core functionality moving on that the Wikimedia Foundation could usefully prioritise.

I am wondering if we could use a rewrite of the graphs extension using something like P5JS or something similar (in other words moving to the frontend). This would also reduce the need for caching of files and whatnot used in the plugin.

The Graph extension was (at the time it got disabled) frontend-only. And this was the issue: the graph code was written by one (unprivileged, i.e. not interface admin) user and executed in another user’s browser. While its input was JSON, apparently it was possible to write such JSON that runs arbitrary JavaScript code. JavaScript running in a user’s browser can do bad things (a less serious example being that it vandalizes pages in the victim’s name). In a JS-based frontend like P5JS being able to run arbitrary JavaScript code is not a vulnerability but the basic design, so it’s even worse than Vega.
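
To make the "data versus code" distinction above concrete, here is a minimal sketch of the opposite design: a chart definition that stays pure data because anything outside a small allowlist is rejected. The schema (`type`, `title`, `labels`, `values`) and the function name are hypothetical and far simpler than a Vega spec; this only illustrates the principle and is not a proposed fix for the extension.

```python
import json

# Hypothetical, deliberately tiny schema: no expressions, no URLs, no callbacks.
ALLOWED_KEYS = {"type", "title", "labels", "values"}
ALLOWED_TYPES = {"bar", "line", "pie"}

def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def load_chart_spec(raw: str) -> dict:
    """Parse untrusted JSON and accept it only if it matches the allowlist."""
    spec = json.loads(raw)  # invalid JSON raises a ValueError subclass
    if not isinstance(spec, dict):
        raise ValueError("spec must be a JSON object")
    unknown = set(spec) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unsupported keys: {sorted(unknown)}")
    if spec.get("type") not in ALLOWED_TYPES:
        raise ValueError("unsupported chart type")
    if not isinstance(spec.get("title", ""), str):
        raise ValueError("title must be a string")
    labels, values = spec.get("labels"), spec.get("values")
    if not (isinstance(labels, list) and all(isinstance(x, str) for x in labels)):
        raise ValueError("labels must be a list of strings")
    if not (isinstance(values, list) and all(_is_number(x) for x in values)):
        raise ValueError("values must be a list of numbers")
    if len(labels) != len(values):
        raise ValueError("labels and values must have the same length")
    return spec
```

The trade-off is expressiveness: the more a spec language can do (expressions, data loading, event handlers), the harder it is to guarantee that a definition written by one user cannot run code in another user's browser.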