Wikipedia:Village pump (idea lab)

From Wikipedia, the free encyclopedia
The idea lab section of the village pump is a place where new ideas or suggestions on general Wikipedia issues can be incubated, for later submission for consensus discussion at Village pump (proposals). Try to be creative and positive when commenting on ideas.
Before creating a new section, note:

Before commenting, note:

  • This page is not for consensus polling. Stalwart "Oppose" and "Support" comments generally have no place here. Instead, discuss ideas and suggest variations on them.
  • Wondering whether someone already had this idea? Search the archives below, and look through Wikipedia:Perennial proposals.

Discussions are automatically archived after remaining inactive for two weeks.

« Archives, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54

Interest in testing a tool for Breaking News? Seeking feedback.

My team at the foundation, WME, has developed a dashboard that tries to identify new articles related to global "newsworthy" events as they are being written about across Wikipedia language editions at any given moment. You can read more about it here. I'm seeking help to improve the feature.

Here is the direct link to the dashboard (desktop only).

I'd appreciate it if anyone who tries it out could surface any potentially missing templates from across language projects that would help us capture more results. Using the thumbs-up and thumbs-down buttons in the demo to confirm or deny whether entries are accurately identified as breaking news would help me, in the medium and long term, to build a better, more accurate tool.

Although Enterprise's focus is not on creating editing tools or gadgets, we hope this can be of use to the community, too.

Thanks! FNavas-WMF (talk) 16:19, 8 December 2023 (UTC)

Are the thumbs up/down supposed to indicate whether the article as a whole is about a current news event, or whether it was created in the wake of a current news event? Because e.g. you have on the tracker Mama Diabaté, who died two days ago, so that would be news and result in increased traffic and editing, but her notability would have been established over decades. On the other hand, 2023 Guyana Defence Force helicopter crash was created for the purpose of covering a specific important recent news event. Are both to be considered "hits" for the tracker? SamuelRiv (talk) 16:58, 8 December 2023 (UTC)
Also, "indications count" isn't documented, and I don't know what it means, and it seems odd (it being a count) that you can only filter for numbers equal to the count, as opposed to higher or lower. I also don't think the raw number of edits is a very useful metric for the user to filter potential news articles, since news is rather localized by interest and region. A page-views-to-editor ratio would seem more useful -- a niche new article or split may have a lot of edits from a dedicated editor and reviewers at first, but very few outside viewers will care to see it in the first hours. Any news event will blow it out of the water in viewer-to-editor ratio, even if news stories will have more anonymous editors. SamuelRiv (talk) 16:58, 8 December 2023 (UTC)
Thanks for this feedback @SamuelRiv -- the thumbs are to say whether this is news or not. There are a lot of false positives, so we're trying to filter out what is not news. I'd consider both of those examples news. What is news and what isn't is subjective, so it's really up to the individual.
We don't use any pageviews right now, so all this is based on editing behavior/presence. Good call on the "viewer-to-editor" ratio idea ... My only issue is that we could only calculate that 24 hours too late (given how pageviews work right now). FNavas-WMF (talk) 21:10, 8 December 2023 (UTC)
A 24h delay in the ratio is fine as long as you have some smoothing average on both views and edits -- it will be better than the metrics you currently have available. (I'm sure you can figure out better metrics once you get some data.)
News isn't really subjective in these clear cases -- your first verification would just be a Google n-gram call to see if there was a major spike in searches in the past week. If the API for that is free, that'd be the best metric I can think of. There are tons of simple algorithms to verify a spike or step discontinuity in rough data. SamuelRiv (talk) 21:20, 8 December 2023 (UTC)
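For what a spike check like the one described above might look like: a minimal sketch, assuming daily pageview counts are already in hand (e.g. fetched from the Wikimedia Pageviews REST API). The window size and spike factor are illustrative guesses, not tuned values.

```python
def detect_spike(daily_views, window=7, factor=5.0):
    """Flag a spike when the most recent day's views exceed `factor`
    times the mean of the preceding `window` days' views."""
    if len(daily_views) < window + 1:
        return False  # not enough history to establish a baseline
    baseline = sum(daily_views[-window - 1:-1]) / window
    return baseline > 0 and daily_views[-1] > factor * baseline
```

A step discontinuity could be confirmed the same way, by additionally checking that the elevated level persists for several days rather than a single one.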
@FNavas-WMF many breaking news items are related to articles that already exist - so being able to see articles that have high "within last hour" activity instead of only new articles may be useful. — xaosflux Talk 18:08, 8 December 2023 (UTC)
Yep, 100% agree @Xaosflux -- I'm working on getting us to the within-the-last-hour method you describe as we speak! FNavas-WMF (talk) 21:11, 8 December 2023 (UTC)
I would go further and say that anyone who feels compelled to write about a news event on Wikipedia should look for existing articles to update rather than create a new one. This is an encyclopedia, not a newspaper. Phil Bridger (talk) 21:27, 8 December 2023 (UTC)
@Phil Bridger totally agree. It seems to me that folks, at least on enWiki, do try to add to an existing article, which is why this tool as it works now is only very good for NEW, totally unforeseen events. Do you think pointing editors to existing articles that are part of the news is more valuable than pointing them to new articles? FNavas-WMF (talk) 16:25, 11 December 2023 (UTC)
If memory serves, Another Believer does a lot of work with breaking news and might be interested in this. WhatamIdoing (talk) 00:33, 12 December 2023 (UTC)
Thanks for the ping. This is on my radar and I was even able to chat with Francisco a bit at WikiConference North America recently. I've subscribed to this discussion and I'm curious to see what folks say about the tool. ---Another Believer (Talk) 00:39, 12 December 2023 (UTC)
Thanks! These comments have been very useful. I'm looking for more ways to cut down false positives and reduce the noise! The "cite news" template is extremely useful for catching breaking news; it seems to be used quite reliably in new news events.
@WhatamIdoing @Phil Bridger @Xaosflux do you all see any more templates I should be following? FNavas-WMF (talk) 20:01, 18 December 2023 (UTC)
{{cite web}} gets used a lot as well, especially by newer editors who are using one of the citation insertion tools. — xaosflux Talk 20:46, 18 December 2023 (UTC)
That template is probably less specific, though. WhatamIdoing (talk) 21:13, 18 December 2023 (UTC)
Indeed. But a user (new or old) who isn't aware of some of these templates and goes through the basic VE workflow of (a) type in something, (b) click the Cite button, (c) dump in your URL will end up inserting a cite web. — xaosflux Talk 21:46, 18 December 2023 (UTC)
@FNavas-WMF, an article being tagged with {{Current}} would be a direct indicator that we consider it a current event. But it is automatically removed by bot as soon as editing activity fades, which is often while a layperson might still consider something to be breaking news. Wikipedia:Current event templates#Current events has related templates/categorization. I'm curious how your tool uses/relates to this. An article being linked from Portal:Current events would be another strong indicator. {{u|Sdkb}}talk 00:12, 3 January 2024 (UTC)

Option to omit subordinate sections on edit

Case in point: [1] The editor meant to add the content at the end of the "Discussion (II)" section, but ended up adding it at the end of its subordinate section, "Split off into a new page". He didn't catch the error and it was fixed later by a different editor (me). He is an experienced editor, significantly above average in technical competence, and I see this happen too often.

(In this case, I ended up changing the level of "Split off into a new page" to that of "Discussion (II)" to prevent this from happening again, but that solution was sub-optimal. By all logic the "Split off into a new page" should be subordinate to the Discussion section.)

Even if one is aware of this pitfall, it can be really cumbersome to have to back up to find the section you want. Imagine if there are four or five subordinates, some of them really long.

There should be the option to edit a section without its subordinates. Equally beneficial on any page that has multi-level sections, including articles, not just talk pages. As for specifics, that's why I'm on this page.

One thing to consider is that an editor might not know the option exists, or it might not occur to them to use it. In such cases the option would do little good. I'm thinking of a pop-up box if the edited section has any subordinates: "Do you want to include the subordinate section(s)?" ―Mandruss  21:58, 10 December 2023 (UTC)

+1 for this sort of feature. It's been requested in various places for over a decade IIRC. It's not so much that I get caught adding content in the wrong place as that it's annoying to have to scroll to the correct place and through an excessively long preview of subsections I am not planning to change. DMacks (talk) 22:19, 10 December 2023 (UTC)
Okay, only half a decade. I knew it sounded familiar though... Wikipedia:Village pump (technical)/Archive 163#Edit section without subsections. DMacks (talk) 07:52, 12 December 2023 (UTC)
So the last comment in that thread was PrimeHunter, one of our most credible editors on technical questions, saying this is not only technically possible but "straightforward". There was no reply, suggesting concession by the naysayers. That was at VPT, and it seems to me the next step would've been this page. Not sure why that didn't happen. ―Mandruss  22:17, 12 December 2023 (UTC)
@PrimeHunter:... DMacks (talk) 20:16, 18 December 2023 (UTC)
I said "It seems straightforward". I'm not a MediaWiki developer and don't know how easy it would be in practice but it doesn't sound hard. I don't believe Izno's earlier comment there: I'm pretty sure "this is not technically feasible" is the answer due to the way that HTML sectioning works. That seems irrelevant. When you save a section edit, MediaWiki reparses the wikitext of the whole page in the same way as if you had edited the whole page. PrimeHunter (talk) 21:55, 18 December 2023 (UTC)
-1 to the popup confirmation, but +1 to being able to edit just the "lead" of a section sans any subsections. I'm sure people will jump in with some good examples, but I'm struggling to imagine when "edit smallest applicable subsection" and "edit entire page" are both worse options than "edit intermediate size chunk". Folly Mox (talk) 02:19, 11 December 2023 (UTC)
@Folly Mox: Your last sentence seems to suggest that it should never include subordinate sections, which would be another way of solving this problem; do I have that correct? If so, there are some cases where one would want to do that, such as re-ordering the subordinate sections or moving text between them. Such things could be accomplished in other ways, including editing the entire page, but significantly less easily and in a more error-prone way. ―Mandruss  20:33, 11 December 2023 (UTC)
Yeah, never including subsections except in the "edit full page" case was my idea for avoiding a popup confirmation, but those things you mention are fine arguments for retaining the ability to edit a section including all its subsections. Another one is when there is no "section lead", and the prose starts after the first subsection. Misclicking on the wrong pencil would send users to an empty editing interface, which we'd have to cancel out of annoyingly. So maybe my idea is bad? I definitely am not liking an additional modal thing to tap between the editing pencil and the editing interface, but I'm not sure of the way round it. Folly Mox (talk) 21:45, 11 December 2023 (UTC)
"Editing pencil": You must be using a different editor. I click [ edit ] next to the section heading.
Remember that the pop-up would only happen when there are subordinates, so the impact might be less than you imagine. The question would be asked only when needed. ―Mandruss  21:56, 11 December 2023 (UTC)
On mobile skin, you have to go all the way to the top toolbar on a page, click the three dots, and click "edit full page" to do that. On very large pages that may well be a bigger inconvenience than the issue described here. Mach61 (talk) 19:50, 11 December 2023 (UTC)
(Actually, there's no technical reason why this feature would have to be implemented the same on m.wiki AFAIK, so carry on) Mach61 (talk) 19:52, 11 December 2023 (UTC)
There are indeed two issues here. The major one is the back-end: we need MW API support for it. The other one is the interface to activate it, for which we could have all sorts of UI/UX design ideas, gadgets, etc. But none of the latter matters without the former. DMacks (talk) 02:12, 12 December 2023 (UTC)
That's above my pay grade. If this earned a consensus at VPR, what are the realistic odds it would happen? ―Mandruss  06:47, 12 December 2023 (UTC)
Any chance the gadget that allows the editing of lead sections might help? CMD (talk) 07:43, 12 December 2023 (UTC)
No, that is quite different. Each section is numbered sequentially, so the lead is section 0 already and is not a header-delimited section at all (so the other sections are not subsections of it, in the way a === is a subsection of ==). DMacks (talk) 07:52, 12 December 2023 (UTC)
All the gadget does is make a section=0 link like https://en.wikipedia.org/w/index.php?title=The_Example&action=edit&section=0&summary=/*%20top%20*/%20 to use a feature which already exists in MediaWiki. You could have made the same url manually. The proposal here would require a new MediaWiki feature. PrimeHunter (talk) 21:55, 18 December 2023 (UTC)
Brainstorming a gadget that would be a clickable link in the section to call action=edit but then intercept the actual spawning of the editor. It would snip off everything starting with the first line that begins with "==" into a hidden separate field, then reattach it when the user clicks 'publish'. DMacks (talk) 10:11, 2 January 2024 (UTC)
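The snip-and-reattach step described above is simple string handling. A rough sketch in Python (an actual gadget would do this in JavaScript; the function name is made up for illustration):

```python
def split_subsections(section_wikitext):
    """Split section wikitext into (editable lead, held-back subsections).

    Everything before the first subsequent line starting with '==' stays
    in the edit box; the rest is stashed and reattached on publish. The
    first line is skipped because it is the section's own heading."""
    lines = section_wikitext.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if i > 0 and line.startswith("=="):
            return "".join(lines[:i]), "".join(lines[i:])
    return section_wikitext, ""
```

On publish, the gadget would concatenate the edited lead with the stashed tail before saving, so the page wikitext round-trips unchanged apart from the user's edit.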

Brainstorming a COPYVIO-hunter bot

I'd like to propose the idea of a COPYVIO-hunter bot, but I'm not ready to make a specific Bot request yet, so I'd like to expose this idea here first and brainstorm it. Sometimes, copyright violations are discovered that have been present on Wikipedia for years. (The copyright-violating content at Barnabas#Alleged writings was added on 4 August 2014 and discovered 18 December 2023.) But for an alert Teahouse questioner two days ago, who knows when, if ever, it would have been discovered. That's worrisome.

We have some good tools out there, such as Earwig's detector, and my basic idea is to leverage that by building a bot around it, which would apply it to articles and either generate a report or apply the {{Copyvio}} template directly. A couple of additional bot tasks could streamline the human part of the investigation by finding the insertion point (Blame) and determining copy direction (IA search). There are input, performance, and scaling questions, human factors, and likely others I haven't thought of. As for input, ideally I'd like to see a hybrid or dual-channel input: a hopper with manual feed by editors (possibly semi-automated feed by other tools), and an automated input where the bot picks urls based on some heuristic.

For performance, I launched Earwig with all three boxes checked, and it took 62 seconds to return results for Charles de Gaulle (174,627b) and 16 seconds for (randomly chosen) Junes Barny (5,563b). I'm pretty sure there are a lot more articles closer in size to the latter than the former, so let's say Earwig takes 30 seconds per search on average; multiplying that by {{NUMBEROFARTICLES}} gives us 6.43 years to search all of Wikipedia with a dumb, single-threaded bot with no ability to prune its input stack. (Of course, Wikipedia would be bigger six years later, but that gives us an idea.) Given that the Barnabas violation went undiscovered for nine years, six years is not so bad, as I see it. But not all articles are equal, and probably some pruning method could decrease the size of the input stack, or at least prioritize it towards articles more likely to have undiscovered violations.
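For concreteness, the back-of-the-envelope math above, with the approximate English Wikipedia article count for December 2023 substituted for the {{NUMBEROFARTICLES}} magic word:

```python
ARTICLES = 6_760_000        # approximate enwiki article count, December 2023
SECONDS_PER_CHECK = 30      # rough Earwig average from the two timings above
SECONDS_PER_YEAR = 365 * 24 * 3600

years_single_threaded = ARTICLES * SECONDS_PER_CHECK / SECONDS_PER_YEAR
# works out to roughly 6.4 years for one dumb, single-threaded pass
```

Dividing by the number of parallel workers gives the roughly one-year figure for six threads mentioned in the next paragraph.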

As far as scaling, I have no idea of server availability at WMF, but presumably there are some bot instruction pages somewhere for bot writers which address how many threads are optimal, and other factors that could scale up the processing for better throughput; maybe someone knows something about that. If we had six threads going against one input stack, that would reduce it to one year; it would be great to run it annually against the entire encyclopedia.

For human factors, I'm thinking about the increased number of articles tagged with copy violations, and the additional load on admins that would inevitably result. There are currently 17 articles tagged with the {{Copyvio}} template. I wanted to provide some estimate of activity at Wikipedia:Copyright problems to gauge current throughput, but I'm not so familiar with the page and was unable to do so. Inevitably, a bot would increase the load on admins (for WP:REVDEL) and other volunteers, and it would be helpful to gather some data about what would happen. Not sure if it's possible to project that, but maybe a stripped-down version of the bot, just wrapping Earwig and spitting out numbers on a test run of a week or two, might give us some idea. I'm guessing that in operation it would generate a big backlog balloon initially, based on the first two decades of Wikipedia, but then its output would slow to some steady state; in any case, backlogs in other areas have been generated and attacked before with success.

Maybe a bot could somewhat reduce load per investigation by means of a handy output report that includes Earwig percent, maybe a brief excerpt of copied content, and so on. A couple of additional tasks could be defined to work off the output report: one task running Blame on the suspect articles to add date of insertion to the report, and another reading IA snapshots to determine direction of copy (i.e., is it a mirror, or a copyvio), resulting in a report with information that ought to make the human part of the investigation considerably faster and more efficient per occurrence, which should at least somewhat offset the increased overall number of investigations.

Would love to hear any feedback on the technical aspects of this, as well as the human factors, and whether something like this should even be attempted. Thanks, Mathglot (talk) 02:00, 21 December 2023 (UTC)

Maybe a fourth task could be a disposition-triage task, and would act on the report output of previous tasks based on configurable values; something like: "if copy-direction = copyvio then if Earwig-pct > 85 then remove content from article and mark/categorize as revdel-needed; else if Earwig-pct < 20 then remove Copyvio template and mark report as handled; else leave for human assessment; else mark as mirror and handled." Mathglot (talk) 02:29, 21 December 2023 (UTC)
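The if-then triage sketched above, rendered as a runnable function; the thresholds and disposition strings come straight from the comment and are illustrative configuration values, not tested ones:

```python
def triage(copy_direction, earwig_pct):
    """Automated disposition per the sketch above: only grey-area
    copyvio scores get routed to a human."""
    if copy_direction != "copyvio":
        return "mark as mirror and handled"
    if earwig_pct > 85:
        return "remove content, mark revdel-needed"
    if earwig_pct < 20:
        return "remove Copyvio template, mark handled"
    return "leave for human assessment"
```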
EranBot currently sends every new edit through CopyPatrol if I understand it correctly, which essentially runs the edits through Turnitin/iThenticate. One could reduce the bot load by making it only look at articles that were created prior to August 2016.
@MusikAnimal (WMF) and Mathglot: I understand that the WMF is currently working on a replacement/re-vamp of CopyPatrol (i.e. Plagiabot). Is there a way to integrate a sort of "historical article detection" into a similar interface while re-using some of the code from the new Plagiabot, or is this something that you think would be better kept separate? — Red-tailed hawk (nest) 02:42, 21 December 2023 (UTC)
That's terrific news, which means, if I understand correctly, that whatever the scope of the problem is, at least it's not getting worse (assuming perfect precision from Plagiabot). So we only have to deal with the pre-whatever-year issue, and slowly chip away at it. (I am subscribed; no ping needed.) Mathglot (talk) 02:56, 21 December 2023 (UTC)
@MusikAnimal (WMF) I remember putting this up on Phabricator somewhere (I think?), but would it be possible to provide a stable API to integrate CopyPatrol with various other editing/CVUA tools? Specifically, it would be great to be able to answer the question "What is the iThenticate score/URLs for a specific edit". Sohom (talk) 06:29, 21 December 2023 (UTC)
I've left MusikAnimal a comment on their WMF account talk page. It would be nice to hear from them on this. — Red-tailed hawk (nest) 17:45, 25 December 2023 (UTC)
I acknowledge it's Christmas, and many WMF staff are taking vacation/holiday, so it's fairly possible that we might not hear back for a week or so. — Red-tailed hawk (nest) 17:53, 25 December 2023 (UTC)
Thanks. I've added DNAU for 1 month, imagining that he may be on a nice, long winter vacation. Mathglot (talk) 21:24, 25 December 2023 (UTC)
An API for reviewing/unreviewing does exist, but it's undocumented right now. It also doesn't provide Access Control headers. I was working on an external-use API for CopyPatrol, but decided to hold off until the new version that uses Symfony was finished and deployed, since it won't be usable anyway until deployment has finished. Chlod (say hi!) 02:22, 26 December 2023 (UTC)
Thanks for your patience! I was "around" on my volunteer account, but haven't been checking this one until today (my first day back at work after the break).
It sounds like you all are asking for phab:T165951, which was declined last November. It can be re-opened if there's interest in it. However, it's worth noting CopyPatrol doesn't go through every edit, only those that meet certain criteria. I'll let @JJMC89 speak to that before I say something wrong ;)
As for an API, we can certainly add an endpoint to get the score for a given revision, if it exists in our database. That's simple to implement and won't require authentication. If you could file a bug, I can have that ready for when the new CopyPatrol goes live.
API endpoints that make changes to our db, such as reviewing/unreviewing, are another matter. Right now we authenticate with OAuth, so we'd need to somehow have clients go through that before they could use the endpoint. If @Chlod is interested in building this, I'll happily review it! :) Off the top of my head, I'm not sure how to go about implementing it. Alternatively, maybe we could provide all logged-in users an API key? That would avoid clients having to log in to CopyPatrol.
I don't think we want to permit requesting new scores for any arbitrary revision, at least not until our partnership with Turnitin is finalized. That should happen very soon, and then we'll know for sure if we can send out that many API requests. Some changes to JJMC89's bot would likely also need to be made. All in all, I'd say this feature request is not much more than a "maybe".
Also, in case no one's mentioned it yet, attempting to identify old copyvios is tricky because of the all-too-common WP:BACKWARDSCOPY issue. In some cases it may not be possible to ascertain which came first -- Wikipedia or the source -- so I'd be wary of attempting to automate this. MusikAnimal (WMF) (talk) 00:57, 3 January 2024 (UTC)
The new bot looks at edits made in the article and draft namespaces (0 and 118) to submit to turnitin and skips the following types of edits:
  • made by bots or users on the allow list
  • (revision) deleted before processing (rare unless catching up from a service outage)
  • rollbacks (MediaWiki native or Twinkle)
  • additions of < 500 characters after cleaning the wikitext.
Those that come back with more than a 50% match to a (non-allow listed) source are shown in CopyPatrol for human assessment.
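The skip rules above translate to a filter along these lines (the field names are invented for illustration; the real bot's data model will differ):

```python
def should_submit(edit):
    """Mirror of the skip list above: True only for mainspace/draft
    edits by non-allow-listed humans that add at least 500 cleaned
    characters, excluding deleted revisions and rollbacks."""
    return (
        edit["namespace"] in (0, 118)
        and not edit["is_bot"]
        and not edit["user_allowlisted"]
        and not edit["revision_deleted"]
        and not edit["is_rollback"]
        and edit["chars_added_cleaned"] >= 500
    )
```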
As a quick test, I added an endpoint to dump the data from the database for a specified revision.[2]
{
  "diff_id": 7275308,
  "lang": "en",
  "page_namespace": 0,
  "page_title": "Mahāyāna_Mahāparinirvāṇa_Sūtra",
  "project": "wikipedia",
  "rev_id": 1178398456,
  "rev_parent_id": 1178304407,
  "rev_timestamp": "Tue, 03 Oct 2023 12:16:34 GMT",
  "rev_user_text": "Javierfv1212",
  "sources": [
    {
      "description": "C. V. Jones. \"The Buddhist Self\", Walter de Gruyter GmbH, 2021",
      "percent": 50.3817,
      "source_id": 820817,
      "submission_id": "3084bde6-3b8b-488c-bf33-c8c27a73ae06",
      "url": "https://doi.org/10.1515/9780824886493"
    }
  ],
  "status": 0,
  "status_timestamp": "Tue, 03 Oct 2023 12:38:16 GMT",
  "status_user_text": null,
  "submission_id": "3084bde6-3b8b-488c-bf33-c8c27a73ae06"
}
Please file a task so we can workshop the best way to design the API.
— JJMC89(T·C) 00:40, 4 January 2024 (UTC)
Filed as phab:T354324. This could be done on either the frontend or the backend; but it doesn't look like the backend source is publicly-available (and API endpoints are a frontend task anyway, so it should probably live on the frontend). Chlod (say hi!) 10:03, 4 January 2024 (UTC)
I'd encourage making the repos public unless there is a reason for keeping them private. It will make things easier if someone goes inactive or if someone wants to submit a patch. –Novem Linguae (talk) 11:36, 4 January 2024 (UTC)
Hi, Mathglot! Great to hear more initiative on copyright cleanup tasks; they're always a big help. Someone brought up a related idea at WT:CCI a while back, and I responded with a few points that probably apply here too. I've got a cannula lodged in my hand right now, so I'll copy over what I said in that thread to avoid straining it. There wasn't a lot of back-and-forth on that thread anyway so it's probably easier if I just repost it here.

There was an idea previously floated around about having Turnitin or Earwig run on all revisions of past cases; I'd say this is probably the general idea when talking about automation for CCI cases. When it actually comes down to making it happen, though, it's a spider web of caveats and limitations that make it hard to get off the ground. Here's a more-organized explanation of my thoughts that I randomly collected in the past few months:

  • First is the issue of cost. There's around 508 thousand revisions left to check (as of May this year), but we only ever have a finite amount of Earwig search engine searches or Turnitin credits. Processing all of these automatically means we have to work with the WMF to get more credits for a one-time run-through, and we're not sure if we'll get decent results for a majority of those checks.
    • We could work around this by completely disabling search engine checks, as the thread you linked discussed, but this can either work for or against us based on the case. We could also work around this by only selecting a few cases which rely mostly on web sources or (for Turnitin) sources that we know would probably be indexed. This significantly cuts down on the amount of revisions to check. But then there's the next issue:
  • A lot of the older cases, especially the ones over three years old, start getting a lot of false positives. As article text remains on the wiki for long periods of time, SEO spam sites, academic documents, slideshows, and others start copying from Wikipedia. We filter out a lot of these already (like those in this list and a bunch of others), but we still hit them every once in a while and enough that it clogs up what reports we would otherwise get from Earwig/Turnitin.
    • A possible solution to this would be human intervention (which is more or less a given with something like this), where editors will double-check to see if a flagged revision actually is copied from somewhere, or if it's just a false positive. Human intervention will weed out false positives, but then it won't weed out the false negatives.
  • At the end of the day, copyvio checking is a really hard computer science problem that humanity is still in the middle of solving. False negatives (like when a revision flies under the radar because a source it copied from has died, or when the text has been paraphrased enough to make checkers think it's completely original text) will always be one of the biggest brick walls we face. False positives waste editor time, yes, but false negatives arguably take up more time, because we then need to re-check the case. It also wouldn't be a good look for us or the WMF if it turns out that we get a lot of false positives and negatives, since that could be perceived by the community as a waste of funds. Perhaps this is still something that could benefit from research and testing.
    — User:Chlod 13:02, 24 November 2023 (UTC)
This was for checking revisions on CCI pages, but the same applies to scanning every latest revision for all articles. It seems we've also been stretching Earwig to its limits recently: Earwig has been going down almost every day in the past two weeks (CommTech's UptimeRobot). Unfortunately, the Earwig logs are project members-only, so I can't snoop in to figure out the cause by myself. But usually, we chalk this up to Earwig running out of Google API tokens. Would appreciate comments or ideas for the problems above; anything to ensure copyvios don't fly under the radar. Chlod (say hi!) 02:15, 26 December 2023 (UTC)
Chlod, thanks much for this. A few questions or comments:
  • What's the 508,000 revisions? Is that just from CCI investigations?
  • In that same bullet, what cost are you talking about, processing time? And what did you mean by decent results, are you alluding to false +/- that you raised lower down?
    • As far as the workarounds, this sounds like roughly what I referred to as various pruning methods to shorten or reorder the input list.
  • Re false positives due to websites copying from Wikipedia: I don't see this as a major problem, and I addressed it in the 'direction of copy' comment involving IA checks. Maybe we'd have to negotiate with IA for a certain amount of search traffic per unit time, but as a fellow non-profit, and given the reasons for it, I can't imagine there wouldn't be some positive arrangement to come out of that. That would eliminate the need for human intervention in a proportion of cases; see the "if-then" pseudo-code at the end of my comment. The triage attempts to automate a lot of it, and steer only the grey-area cases toward human intervention. And it should also weed out most false negatives for the same reason, and I don't see the failure to have 0% false negatives as a problem. There is always a problem identifying edge cases, even when humans are involved; if an automated solution improves our accuracy and throughput over what it was before, then it's worthwhile. One hundred percent accuracy and coverage are a goal, but they will never be attained, and that shouldn't stop us from incremental progress; even if automated processes fail to identify some sites for human intervention, we'll catch 'em, hopefully, next iteration of the processing.
  • "Really hard computer science problem": again, imho, we don't need to "solve" it, we just need to do a bit better than we were doing heretofore. Paraphrase will fall, imho, to better shingling turbocharged with some AI to recognize synonyms and linguistic transformations, at some point in a future not nearly so distant as I would've guessed a year ago. We needn't let the perfect be the enemy of the good, and I think we can do a lot of good now.
  • Earwig woes: is anyone maintaining it?
Thanks, Mathglot (talk) 00:02, 27 December 2023 (UTC)Reply[reply]
  • Yep, the 508k revisions are those we have to check at CCI. That's from a dashboard by Firefly to see how much is left. It has its inaccuracies, but it's correct for most cases.
  • For the cost, it's actual monetary cost. From what I've heard (and what I assume from what I've heard), the WMF pays for the Google API and Turnitin credits, and that cost is pinned to how much we use Earwig and how many edits are checked by CopyPatrol, respectively. Attempting to request more credits for either needs discussion with the WMF, who then needs to discuss with Google/Turnitin. And yeah, "decent results" refers to whether or not Earwig comes up with a false positive/negative.
    • Definitely; there are a lot of one-or-two-sentence stubs that don't really need checking. These could, of course, be filtered out, possibly with a lot more criteria for skipping than just that.
  • I'm wary about using Internet Archive as a "source of truth" for dates. Though we do exactly that in CCI, it's probably not reliable enough to make broad judgements on whether a page is a copy or was copied from. If the pipeline goes Earwig → URL of likely match → Internet Archive, the data it would provide in a report could be a false positive if the page changed URLs at some point (as I've seen happen with Sparknotes), since Internet Archive may not recognize the switch, or if the page was never archived before (though this practically never happens for recently-added citations). Of course, it's best if this is tested empirically first.
    • This is a step in the right direction though. The downside of not using a system like this at all is that the direction checking will be manual, which then just pushes the investigation work back to the addressing user/administrator, and that could result in anywhere from zero (by luck) to a lot of false positives. But what has to be checked first is whether this will end up increasing processing time/workload for checking users.
  • Earwig's Copyvio Tool is actively maintained by The Earwig. The recent downtimes were briefly discussed in User talk:The Earwig § Copyvio tool is down; I only saw this now. Seems to have been from increased usage.
I agree; something is better than nothing. I'm mostly just worried about stretching the few editors working on copyvio even thinner by adding more work to do. We could balance this by encouraging more editors to help out at WP:CCP, but copyright cleanup really just has historically low participation rates. Chlod (say hi!) 05:14, 27 December 2023 (UTC)Reply[reply]
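To make the direction-of-copy triage being discussed concrete, here is a rough sketch of the date-comparison step. To be clear, everything here is an assumption for illustration: the function name, the `margin` threshold, and the choice to route a never-archived (or URL-changed, hence unfindable) page to a human are all hypothetical, not anything Earwig or CopyPatrol actually implements.

```python
from datetime import datetime, timedelta
from typing import Optional

def classify_direction(revision_date: datetime,
                       first_archive_date: Optional[datetime],
                       margin: timedelta = timedelta(days=30)) -> str:
    """Triage a suspected copyvio by comparing when the matched external
    page was first archived against when the text entered Wikipedia.

    Returns 'likely-copyvio', 'likely-mirror', or 'needs-human'.
    """
    if first_archive_date is None:
        # Never archived, or the page changed URLs and the archive
        # doesn't recognize the switch: no evidence either way.
        return "needs-human"
    if first_archive_date < revision_date - margin:
        # External text clearly predates the Wikipedia edit.
        return "likely-copyvio"
    if first_archive_date > revision_date + margin:
        # External page appeared well after the edit: probably a mirror.
        return "likely-mirror"
    # Dates too close to call; route to a human reviewer.
    return "needs-human"
```

Per the caveats above, the grey area (dates within the margin, or no archive at all) still lands on a human, so this only reduces, not replaces, reviewer workload.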
Hey Chlod, thanks for pinging me here.
  • With Google's API, there's a hard limit of 10,000 queries per day, which costs US$50. The copyvio detector will make up to 8 queries per page (each query corresponds to a sentence or so of text; the cap is chosen to strike a balance between performance and detection accuracy – longer articles would really benefit from more than 8 queries in many cases). So that works out to somewhere between 1,250 and 10,000 articles per day; let's say 2,000 on average. To be very clear, that's a limit built into Google's API terms. We can't get around it without a special agreement with Google, and everything I've heard from the WMF indicates we have no special agreement: we're paying the regular rate. Over ten years of running the copyvio detector, and despite multiple people asking, I've never managed to make the right connections with the right people at Google to get a special agreement (or the WMF hasn't, and IMO it's really them who should be doing that instead of me).
  • Just bashing the numbers out, checking 500,000 pages without a special agreement with Google would cost $12,500 and take at least 8 months (using the 5 queries/page average implied above).
  • The search engine is really the limiting factor here, hence my emphasizing it. Compute cost is much cheaper and we could use WMCloud to parallelize this more effectively if the daily limits weren't so severe.
  • Recent issues aren't related to using up all of our Google API credits; they're mostly due to my own poor software engineering decisions ten years ago. Sometimes it's due to unauthorized bot traffic that needs to be identified and blocked, but in this case I haven't noticed any. There's an ongoing project to improve performance, but no timeline for when it will be ready, unfortunately.
— The Earwig (talk) 14:53, 27 December 2023 (UTC)Reply[reply]
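For anyone who wants to reproduce the arithmetic in the thread above, here is a quick back-of-the-envelope sketch using the figures quoted there (a 10,000-query/day cap at US$50/day, and the ~5 queries/page average); the function is mine, not part of any actual tool:

```python
# Back-of-the-envelope cost/time estimate for bulk copyvio checks,
# using the figures quoted in this thread.
DAILY_QUERY_LIMIT = 10_000   # Google API hard cap per day
DAILY_COST_USD = 50          # cost of those 10,000 queries
QUERIES_PER_PAGE = 5         # rough average (the detector uses up to 8)

def estimate(pages: int) -> tuple[float, float]:
    """Return (cost in USD, duration in days) to check `pages` pages."""
    total_queries = pages * QUERIES_PER_PAGE
    cost = total_queries / DAILY_QUERY_LIMIT * DAILY_COST_USD
    days = total_queries / DAILY_QUERY_LIMIT
    return cost, days

cost, days = estimate(500_000)
print(f"${cost:,.0f} over {days:.0f} days (~{days / 30:.0f} months)")
# → $12,500 over 250 days (~8 months)
```

This matches the $12,500 / "at least 8 months" figure given above, and makes it easy to see how the estimate shifts if the per-page query average or the daily quota changes.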
Thanks for these detailed explanations. Just noting that I've started User:Novem Linguae/Essays/Copyvio detectors to try to document all these copyright tools and their nuances. Seems like every couple months this comes up and I've forgotten all the details since the last discussion, so maybe an essay will help me remember it :) –Novem Linguae (talk) 12:13, 31 December 2023 (UTC)Reply[reply]
@The Earwig: Anywhere I could possibly help with the copyvio detector's uptime? It's also affecting the NPP workflow at times, as the copyvio detector is part of checks to be done when patrolling. Chlod (say hi!) 13:56, 4 January 2024 (UTC)Reply[reply]
@Chlod: Thanks for offering to help! I've given you maintainer access to the tool, and you have permission to restart it when needed. This is the case if the request backlog gets full (a log message "uWSGI listen queue of socket" is printed to uwsgi.log over several minutes) but occasional slowness doesn't necessarily mean the queue is full and needs to be cleared. It's good for us to have maintainers across different timezones. But beyond the occasional restarts, addressing the underlying issue is complicated and not something I expect help with. As hinted above, a backend rewrite is in progress to improve performance. — The Earwig (talk) 16:41, 4 January 2024 (UTC)Reply[reply]
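For other maintainers, the check The Earwig describes (the "uWSGI listen queue of socket" message repeating over several minutes of uwsgi.log) could be scripted roughly like this. The helper name, the three-hit threshold, and the simulated log lines are all made up for illustration, not taken from the actual tool:

```python
# Hypothetical helper for the restart check described above: has the
# backlog message repeated in uwsgi.log over the last several minutes?
QUEUE_MSG = "uWSGI listen queue of socket"

def queue_looks_full(log_lines: list[str], threshold: int = 3) -> bool:
    """True if the backlog message appears at least `threshold` times."""
    hits = sum(QUEUE_MSG in line for line in log_lines)
    return hits >= threshold

# Simulated log excerpt for this sketch; in practice you'd read uwsgi.log
# and restrict to lines with recent timestamps.
sample = [
    '2024-01-04 16:01:02 uWSGI listen queue of socket ":8000" full',
    '2024-01-04 16:03:10 worker respawned',
    '2024-01-04 16:05:44 uWSGI listen queue of socket ":8000" full',
    '2024-01-04 16:07:19 uWSGI listen queue of socket ":8000" full',
]
print("restart needed" if queue_looks_full(sample) else "probably just slow")
# → restart needed
```

As noted above, a single occurrence during transient slowness should not trigger a restart, which is why some repetition threshold is needed.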
As I understand it, the issues with applying Earwig's copyvio thing to more pages (and the reason it always takes a million years to run) have nothing to do with computational power or programming skill on our part, but rather with Google search, which is a quite critical part of this software working, having deliberately decided to fuck us sideways on search queries.
Well, it's not clear: it could be that or it could be that nobody from Wikipedia or from the WMF has succeeded in figuring out how to ask them for a special dispensation.
At any rate, we have a rather low quota, and it would cost tens of thousands of dollars to make it higher, and we do not get any special dispensation although I guess they are perfectly fine to make millions of dollars from reusing our content in their own knowledge panels lol. jp×g🗯️ 11:25, 28 December 2023 (UTC)Reply[reply]
Maybe @NPerry (WMF): might give more insight as to why the Wikimedia Foundation has not been able to get resources for copyright detection with Google search? AFAIR, last year, they were involved with managing Wikimedia's partnership with Google. Sohom (talk) 11:54, 28 December 2023 (UTC)Reply[reply]
  • I'm not active in copyvio detection work, so take what I say as an outsider's perspective. Overall, copyvio detection on Wikipedia seems like an area that's struggling despite the heroic efforts of those working on it — multi-year backlogs at places like CCI are indicative of a system that's just not working. Bot assistance is our best hope of changing that dynamic on a systemic level, so I think it's a fruitful avenue to pursue. It'd be complex on a level greater even than ClueBotNG, but if successful it'd be similarly impactful.
    One thing to perhaps think about is the difference between old copyvios and newly added ones. My vague understanding is that a lot of the difficulty/pain comes from years-old insertions, which have since been built upon, necessitating removal of large chunks of an article. If it'd be simpler to build a bot that only checks/fixes new contributions, then perhaps that'd be a good place to start. If it could sufficiently stem the tide, perhaps it'd lead to a situation similar to what we have with non-notable articles/deficient FAs today, where there's a bunch of stuff in the past to clean up, but ultimately it's a finite backlog with few new entries being added, creating hope we'll someday get through it (cf. WP:SWEEP).
    Hope that's helpful, and good luck with this work! {{u|Sdkb}}talk 00:03, 3 January 2024 (UTC)Reply[reply]
  • (Possible overlap with part of above) - we have a copyright flagging system already (see log) - and allowing more bots to flag is fairly easy to do. Like many have said, building a reliable algorithm for doing the actual checking is a "hard" problem. One problem that came up during prior third party solutions like TURNITIN is that these companies wanted to reuse Wikipedia content without honoring the licensing requirements (e.g. we send them some text, they store it, then they re-serve it to other people without attribution). — xaosflux Talk 17:00, 4 January 2024 (UTC)Reply[reply]

Deletion of account is needed[edit]

There should be an account deletion system. Edits made by a deleted account should be left with the name of the account, without a link. 160.238.0.118 (talk) 19:34, 26 December 2023 (UTC)Reply[reply]

For legal reasons related to attribution of material, it is not possible to delete accounts. They can however be renamed in some circumstances: see Wikipedia:Courtesy vanishing. AndyTheGrump (talk) 19:45, 26 December 2023 (UTC)Reply[reply]
Given I can just search for all other edits made by that "name of the account", there is no difference whether or not they have a "link". Sounds like a distinction without a difference. What is your understanding of what an 'account' actually is? DMacks (talk) 10:03, 2 January 2024 (UTC)Reply[reply]
In the site's terms of service, you agree that the link is sufficient attribution. Remove the link, and the attribution is no longer within what you agreed to, and technically, you can sue the WMF. Snowmanonahoe (talk · contribs · typos) 14:27, 14 January 2024 (UTC)Reply[reply]

Allow soft deletion of unopposed nominations[edit]

Hi. I am wondering what people would think about repealing the "a page is ineligible for soft deletion if it has been dePROD'd/deleted in the past" rule. I am not the most active person at AfD, but I invite anyone to go to a random page in the (recent-ish) AfD archives and ctrl+f for the word "ineligible". Uncontroversial nominations (or nominations in which the nominator leaves nothing for further participants to add) get relisted all the time because someone objected to a PROD, or it was previously deleted.

I went through the closed discussions in December, and I found 36 discussions which were relisted as ineligible for soft deletion but were subsequently deleted (usually after a few delete per nom or delete NN !votes, and perhaps some additional relists): 1, 2, 3, 4, 5, 6, 7, 9, 10, 11[a], 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32[b], 33, 34, 35, and 36.

To be fair, I found four bluelinks that were saved because they were "ineligible for soft deletion": 1[c], 2, 3[d], 4. But I don't think that a redirect, a stub, a non-neutral REFBOMB mess, and No Pants Day[e] justify the volunteer hours spent rubber-stamping uncontroversial nominations. Therefore, my idea: let these things be soft-deleted. Even if they were controversial[f] at one point in time, they are not anymore. They would be eligible for WP:REFUNDs, and a single objection in the current AfD debate would still prevent soft deletion. I think it is time to get rid of this WP:CREEPy rule. HouseBlastertalk 05:51, 2 January 2024 (UTC)Reply[reply]

It generally takes a lot more effort to create content than to delete it, so I'd apply strict scrutiny to any proposal to relax the criteria for PRODding. {{u|Sdkb}}talk 22:35, 2 January 2024 (UTC)Reply[reply]
To be clear, this would not change eligibility for PROD. It would only change eligibility for WP:SOFTDELETEing articles listed at AfD for a week (with all the associated notifications).
On a different note, I would also consider that the status quo is those 36+ articles (accounting for batch nominations) are "hard"-deleted. If someone subsequently finds sources, you either have to make a very convincing case to the deleting admin or spend a week's worth of editor-time at DRV. This proposal would make them all eligible for a REFUND. HouseBlastertalk 23:28, 2 January 2024 (UTC)Reply[reply]
Ah, thank you for clarifying, and best of luck formulating the proposal! {{u|Sdkb}}talk 23:31, 2 January 2024 (UTC)Reply[reply]

Notes

  1. ^ with the relist comment Not eligible for soft-deletion (due to contested prod back in 2006 (!) ...
  2. ^ a batch nomination of seven was relisted because one had been dePROD'd
  3. ^ kudos to User:FormalDude for finding sources
  4. ^ closed as redirect after the closer found an appropriate target
  5. ^ Okay, No Pants Day is awesome. I would say it is the exception that proves the rule.
  6. ^ by "controversial", I mean someone at some point in time expressed the idea that the article should exist

Notability reform[edit]

I have a new guideline/policy draft at Wikipedia:Article inclusion criteria, and would love to have some feedback on it. Thanks in advance! Ca talk to me! 09:06, 2 January 2024 (UTC)Reply[reply]

What is the problem that this proposal is meant to fix? 331dot (talk) 09:23, 2 January 2024 (UTC)Reply[reply]
I answered it in a below response. Ca talk to me! 12:06, 2 January 2024 (UTC)Reply[reply]
Any proposal that says Downranking all SNGs to essays will not achieve consensus. Curbon7 (talk) 09:55, 2 January 2024 (UTC)Reply[reply]
Agreed - it has been removed. Ca talk to me! 11:27, 2 January 2024 (UTC)Reply[reply]
I respect the ambition, but realistically any massive change to WP:N would have to have been prompted by some unprecedented event or shift in community sentiment. I don't think people are actually dissatisfied with how notability as a whole works, even if some individual SNGs remain controversial. Mach61 (talk) 10:12, 2 January 2024 (UTC)Reply[reply]
My feedback is that it looks like you are throwing out all of WP:N and trying to start again from first principles. But why? Barnards.tar.gz (talk) 11:10, 2 January 2024 (UTC)Reply[reply]
My first problem is with length: all the notability guidelines (SNG and GNG) add up to a reading experience nearing a typical novella — WP:N alone contributes 4,000 words of reading material. However, a wrench is thrown into all those tens of thousands of words of guidance by WP:PAGEDECIDE. How our numerous SNGs interact with the GNG is poorly defined, and newcomers are just meant to figure it out themselves. Any attempt to formally define it will inevitably be met with a series of no-consensuses. I believe that hints that the way we are defining notability right now is fundamentally flawed. My goal with the proposal is, instead of trying to use importance as a criterion for inclusion (an insurmountably subjective and unfeasible task), just to use the pre-existing policies as guidance. Ca talk to me! 11:19, 2 January 2024 (UTC)Reply[reply]
Is "trying to use importance as criteria for inclusion" actually the current standard? WP:N goes to pains to distinguish notability from simple importance (except as reliable sources decide to cover it, which is the current N standard). DMacks (talk) 11:27, 2 January 2024 (UTC)Reply[reply]
That's what it says on the tin, but reading SNGs like WP:BIO and WP:NPROF clearly shows it's more of an importance criterion than anything. Even GNG proves to be a publicity indicator since it does not actually deal with article content. I don't know why we have all these guidelines when they could be replaced with "What is the best possible article that can be made?" Ca talk to me! 11:40, 2 January 2024 (UTC)Reply[reply]
If you feel that the criteria are not being properly applied, have you tried fixing that first before deciding that everything should be thrown out? 331dot (talk) 11:49, 2 January 2024 (UTC)Reply[reply]
I recognize I am in a minority position with this belief, but I believe notability as a system is fundamentally flawed for the reasons described above.
I made an attempt to standardize SNG and GNG in the past (Wikipedia:Village_pump_(idea_lab)/Archive_53#Rewriting_WP:N_to_reflect_community_consensus), but it was clear that any wording put forward would fail to gain enough consensus. Ca talk to me! 12:04, 2 January 2024 (UTC)Reply[reply]
So I'm just wondering what makes you think a broader proposal covering more ground will gain consensus when a narrower proposal didn't. 331dot (talk) 15:04, 3 January 2024 (UTC)Reply[reply]
I would be a fool to think such a radical change like this would gain consensus. I'm poking around with different proposals to gauge community sentiment with notability. Ca talk to me! 15:43, 3 January 2024 (UTC)Reply[reply]
The rationale behind notability is positively defined at WP:WHYN Mach61 (talk) 16:31, 2 January 2024 (UTC)Reply[reply]
I am not sure what point is being made here. WHYN only explains the reasoning behind GNG. Ca talk to me! 16:36, 2 January 2024 (UTC)Reply[reply]
@Ca, what do you make of the sentence at the end of WHYN that says "Because these requirements are based on major content policies, they apply to all articles, not solely articles justified under the general notability criteria"? It's been there since WHYN was created. WhatamIdoing (talk) 01:16, 10 January 2024 (UTC)Reply[reply]
So, practicality concerns aside, I want to engage with the philosophy of this, since that's really what's interesting and it's what you're looking for.
If I'm reading correctly, you see "notability", the term of art we have built up onsite, as fallacious: we claim "notability" to be something robust and objective, independent from "importance"—which is ultimately a subjective notion—but ultimately, "notability" boils down to just being "importance" in many cases anyway. I agree with this.
However, I'm not really convinced it's a problem that can be solved, and I think your attempt goes a ways to explain why: you've just moved the problem back a step, offloading the subjectivity at the heart of an encyclopedia onto other terms of art: how do we know when we can establish WP:NPOV and WP:V—when we feel like the framing is neutral enough; when we feel like the claim is verifiable enough? Surely these can't be solved by statistical analysis, or whatever—at least I don't think so.
Since we're also axiomizing "what Wikipedia is not"—when does an article stop being an indiscriminate collection of information, or a dictionary? Those are informed by our present policy, and now they have no practical criteria whatsoever.
The consensus mechanism we have to fill in the gaps left by the flexibility of WP:N is ultimately powered by subjectivity, but I think someone here may need to win a Berggruen Prize before we can really tackle that problem. Remsense 09:01, 3 January 2024 (UTC)Reply[reply]
  • I've always been first in line to say that SNGs are a mess and should generally be ignored in favor of GNG, just like discretionary sanctions should be ignored in favor of simply not being a jerk. But the community is attached to their pet SNGs and there's almost zero chance of doing away with them. We got NPORN removed, and even on such an immensely niche topic, it was like pulling teeth. GMGtalk 12:47, 2 January 2024 (UTC)Reply[reply]
  • If I'm understanding correctly, the problem you're trying to address is that there are some articles that meet the notability threshold but nevertheless should not have an article because of one of the reasons at WP:PAGEDECIDE. I agree that that's a problem. My explanation for it would be that, because 95% of AfDs deal with notability rather than PAGEDECIDE, many inexperienced editors use the heuristic "notable → keep" and ignore PAGEDECIDE.
    That said, I don't think a new policy or guideline is the solution. We already have too many of those, and the impulse to replace them with a simplified version isn't going to succeed. PAGEDECIDE already has guideline status as part of the WP:Notability guideline page, so I'd instead encourage you to suggest changes to make it clearer, more easily invoked, and the notability guideline page as a whole simplified.
    Something our policy/guideline pages badly need overall is for someone good at plain English writing to go through them with no agenda except shortening/simplifying/clarifying them. It won't be an enviable task, as everything on every PAG page was added for some reason or another, so there will be a lot of discussion/pushback about how to simplify without losing meaning. But it'd really be a valuable service. {{u|Sdkb}}talk 23:46, 2 January 2024 (UTC)Reply[reply]
    The main thing is, I do not understand why we have the whole vaguely defined concept of notability when PAGEDECIDE supposedly trumps everything. Ca talk to me! 02:15, 3 January 2024 (UTC)Reply[reply]
PAGEDECIDE doesn't trump WP:N. It says that there are times when, even though a topic can be shown to meet the GNG or an SNG and so may merit a stand-alone article, there are reasons not to have one (such as when the topic is better covered in concert with a larger topic or similar topics).
    Notability is a guideline and purposely vague because it is meant to encourage articles to start from a point that shows potential for growth so that the wiki-way can be used to expand. But as others have pointed out, this is not a one-size-fits-all approach, due to systematic bias from sources. Masem (t) 05:09, 3 January 2024 (UTC)Reply[reply]
  • If you were seriously proposing to dump GNG and its emphasis on hype and publicity and one-size-fits-all rules over importance, and try to push more subject-specific importance-based guidelines, I might be on board. This goes in exactly the wrong direction. We cannot possibly include articles on all topics about which reliable but local or routine sources have provided enough information to write start-class articles, which is what GNG pretends to do (but in practice doesn't). Instead, we need to use notability to filter out the truly unimportant topics. But because GNG does that based on publicity, it is inaccurate and easy to game. Cutting out all the nuance and making it be one size fits all can only worsen those problems, without solving any actual problem with current practice. —David Eppstein (talk) 04:43, 3 January 2024 (UTC)Reply[reply]
  • I'll echo the above comment: this seems to be a step in the opposite direction from what I could conceivably support. Moreover, it doesn't actually reduce the "subjective" aspect, just pushes it off to a different place. Who decides whether a biography is "negative", or whether all the sources are "marginally reliable", or what counts as "undue weight", or when an article is "unwieldy", or when related topics are "better appreciated" as separate pages, or when a topic is "controversial" instead of "mundane"? If the goal is to reduce the number of different policy/guideline pages, I say we go all out and synthesize WP:V, WP:NPOV, WP:NOR, WP:NOT, and WP:BLP into a single Wikipedia Rulebook. They're only separate pages due to historical accidents; if one were starting a wiki-based encyclopedia project now, with the benefit of Wikipedia's accumulated experience, one could cover the whole ethos in a single document instead of multiple pages that all talk about each other. XOR'easter (talk) 05:35, 3 January 2024 (UTC)Reply[reply]
    A merger of just two of those policies, V and NOR, failed to get support. Since then, we've had nearly seventeen years of inertia for those policies... Mach61 (talk) 05:41, 3 January 2024 (UTC)Reply[reply]
  • Tangential note: A few people have mentioned that it's hard to understand notability guidelines due to their length and detail. A couple weeks ago, I began drafting User:Wracking/Notability with the goal of creating bullet-point summaries of each SNG, mainly for my own reference. If this is something anyone wants to collaborate on, please reach out. Wracking talk! 05:47, 3 January 2024 (UTC)Reply[reply]
  • I tend to agree with David and XOR'easter. If I got to rewrite Wikipedia's inclusion guidelines from scratch, I'd go for specific guidelines on specific subjects, based on the consensus of editors knowledgeable about those subjects, and drop the futile quest for a Grand Unified Theory of Notability. The idea that we can use a single standard to classify literally all of human knowledge into the boxes "notable" or "not notable" would sound like complete madness to the average non-Wikipedian. And then try telling them that we think we can do so with just five bulletpoints... – Joe (talk) 09:29, 3 January 2024 (UTC)Reply[reply]
    Yeah, what he said, and also what they said. jp×g🗯️ 11:06, 3 January 2024 (UTC)Reply[reply]
    @Joe, in fairness, the idea that we're going to have an encyclopedia that anyone can write without even needing so much as a user account or email, that also sounds like complete madness. So not sure that "sounds like complete madness" is a strong argument against anything on Wikipedia :-) Levivich (talk) 20:57, 7 January 2024 (UTC)Reply[reply]
  • I actually think that WP:GNG is a good Grand Unified Theory of Notability, since it ties back into core policies with the theory "So, can this topic ever make a core policy-compliant Wikipedia article?". Most alternatives tend to lean into "Someone finds this important" which is a lot more subjective and tends to invite both mass stubs and snobbery. Jo-Jo Eumerus (talk) 13:37, 3 January 2024 (UTC)Reply[reply]
    I know this comment was made last week, but it seems as good a place as any for me to add my piece of the elephant, which is that in my view the GNG works terribly as a Grand Unified Theory because it applies a pseudo-objective threshold to sets of sources and topics that are in themselves quite heterogeneous. But an inflexibly-applied, source-based standard will always lead to the overrepresentation of privileged groups in biographies (like white Europeans) because they are overrepresented in reliable sources. And topics like sub-national geographies of human settlement, or national elections, will be treated inconsistently not only because of actual differences in the "reality" of each geographical or social context but even in cases that are objectively similar, simply because sources are easier to find in some instances than in others. Some periodic elections in a specified polity will meet GNG and others in the same polity will not. Some sub-national territories of a Westphalian state will meet GNG and others at the same level of the same state will not. This does not serve an encyclopaedia.
    In my view, one goal of an encyclopaedia is to treat equivalent topics in equivalent ways, and while reliable sourcing is always required, the myth that there is a specifiable level of sourcing that would work for all topics is actively detrimental to the project of building an encyclopaedia of knowledge by and for human beings. Newimpartial (talk) 02:50, 10 January 2024 (UTC)Reply[reply]
    Problem is, "white Europeans are overrepresented" in practice means "not-'white Europeans' are underrepresented". It's a subtle distinction but it does matter: If a topic is underrepresented, you can run into serious difficulties in accurately depicting it. I don't write biographies but I've seen the same problem of a non-Western topic having much fewer sources to use in my area of editing (volcanoes, natural phenomena). Sometimes you get lucky as I did with African humid period and there is enough coverage, but other times you have Nuevo Mundo volcano with much fewer sources than the comparable Crater Flat. Jo-Jo Eumerus (talk) 08:28, 10 January 2024 (UTC)Reply[reply]
    Obviously Wikipedia cannot publish ahead of the reliable sources. But Wikipedia can publish closer to the extent of RS coverage in areas of underrepresentation while being more restrained in publishing in areas of overrepresentation. Grand Unified Theory approaches to GNG operate in the direction opposite to encyclopaedic coverage in this respect, IMO. And this is a great example also of what an encyclopaedia for humans, by humans, can do better than neural networks can. I think the project should do that. Newimpartial (talk) 11:57, 10 January 2024 (UTC)Reply[reply]
  • Counterproposal: rename "Notability". If we're really going to rework the inclusion criteria for an encyclopaedia article here, let's do away with the confusing term "notability" and call it what it is. Right now it's something close to alreadypublishedaboutness (catchy, I know), but if we're going to redo the inclusion criteria, we could either rename the larger body of policy after what consensus agrees on the fundamental criterion is, or just call it inclusion criteria. Folly Mox (talk) 14:16, 3 January 2024 (UTC)Reply[reply]
    That is indeed the title of Ca's original idea here. – Joe (talk) 15:10, 3 January 2024 (UTC)Reply[reply]
    Support this idea much more than dismantling GNG. 🌺 Cremastra (talk) 00:59, 7 January 2024 (UTC)Reply[reply]
    While I'm not convinced this is necessary per se, I'm just going to vomit a few potential terms: the crime of "notability" is arguably the "-ability". However, the term should have as little lexical overlap with "verifiability" as possible.
    How about substantiation, attestation, recognition, corroboration, representation, precedence?
    No, I don't think any of these work: I think "notability" might be the closest, best English word to use for this concept, so that the greatest number of people understand its usage as easily as possible.
    I still think it's likely we just have to live with the subjectivity at the heart of "encyclopedias" as a concept. It's not like anyone else has figured this out! Remsense 01:33, 7 January 2024 (UTC)Reply[reply]
    "Inclusion criteria" is the way to go. One fundamental flaw in "notability" is it suggests a property of the subject that we as editors are discovering (something is notable or not notable and it's up to us to figure out which). In fact, "notability" isn't a property inherent in any subject, it's a decision editors make (we don't discover or learn if a subject is notable, we decide whether subjects are notable or not). "Inclusion criteria" has the advantage of being clear that it's a set of rules made up by editors for the purpose of deciding what topics should be covered--and not some inherent property, or something having to do with the inherent value of topics. Levivich (talk) 21:01, 7 January 2024 (UTC)Reply[reply]
    Oh and the inclusion criteria should be "enough reliable independent secondary sourcing to write an accurate and complete tertiary encyclopedia article" which is what GNG already tries to get at. Levivich (talk) 21:04, 7 January 2024 (UTC)Reply[reply]
    Agree with this. This is why some SNGs, and de facto notability, are bad, because sometimes there isn't enough information out there to write an encyclopedic article. 🌺 Cremastra (talk) 21:14, 7 January 2024 (UTC)Reply[reply]
    WP:NOPAGE literally exists, so this comment makes no sense. Curbon7 (talk) 08:48, 8 January 2024 (UTC)Reply[reply]
    "Standards of inclusion" is the term I suggested once upon a time, in one of the longest discussions on renaming the notability page. One of the reasons was indeed that it emphasizes that it's a Wikipedia standard, not an inherent characteristic or externally defined property. These days I usually use the more clunkier "standards for having an article", due to Wikipedia:What Wikipedia is not, as it is another standard of inclusion based on scope that is typically evaluated independently of the guidance on the notability page. isaacl (talk) 22:34, 7 January 2024 (UTC)Reply[reply]
    I've long said that we should rename it to something that refers to our criteria or requirements for having a Wikipedia:Separate, stand-alone article. Creating redirects for your favorite titles could be a way to identify possible future titles. WhatamIdoing (talk) 01:33, 10 January 2024 (UTC)Reply[reply]
I have to agree with Remsense on most points. While some kind of notability reform is needed, this is not the best way to go about it. Currently, apart from some less stringent SNGs, all articles have to meet GNG. This keeps out one-reference sub-stubs that are better suited to Wiktionary or Wikidata. If we remove GNG, then our only rationale for deleting unhelpful articles that wouldn't be notable under GNG is likely a combination of WP:NOTDIC, WP:NOTDB, WP:INDISCRIMINATE, and WP:5P1.
The problem is that all of those are mostly or entirely subjective. Who decides what qualifies as an "indiscriminate collection of information", or what "encyclopedic" really means, when you get right down to it? We already have these disputes (case in point: the Barbenheimer RfC), but if they became commonplace at Articles for Deletion, matters would get worse.
We shouldn't rely on subjective measures. We already do, to a degree (how much coverage is "significant" coverage? How reliable is this source, really?) but implementing such a proposal, and accordingly dismantling the GNG, would intensify existing disputes.
The GNG is not great. But it works, and it's quantifiable, at least more so than allusions to WP:NOT. You need multiple sources (three is recommended), and they have to be reliable and secondary.
I think we should have more specific SNGs that are objective and easily quantifiable. Notability isn't a yes-no question, but if we have more subject-specific notability guidelines, then we can be more accurate. Vaguer standards aren't helpful, because vagueness invariably leads to disputes. 🌺 Cremastra (talk) 18:26, 7 January 2024 (UTC)Reply[reply]
I asked (again) about defining SIGCOV last year. The answers can be divided this way:
  • [the minority view] We have found and cited at least one source that uses words like 'significant' or 'important' to describe the subject. For these few editors, two short sentences are good enough, so long as one of the sentences says something like "He's the most significant Ruritanian player during the last decade". They're wrong, of course, but it suggests that @Ca is on the right track with the idea of editors wanting to screen out subjects they believe are unimportant.
  • [the common view among experienced editors] We have found sources that contain enough facts to write an article, as measured, e.g., by word counts or by information contained. However, editors are afraid of saying how much is enough, because as soon as they say "a subject is notable if it gets a total of 2,908 words spread across four news articles in a newspaper on the top-20 list in the List of newspapers by circulation, counting only one article from each year and at least one of the articles has to contain two photos, plus evidence that the subject was mentioned in a still-extant source on social media by a person who won an Olympic medal", then some horrible spammer is going to destroy Wikipedia by getting their stupid little unworthy subject covered in depth by all of these sources, and then where will we be? Plus, if I say that we require 2,500 words in sources, then that might keep out the stuff that I want to write about. If we leave it vague, I can say that my subject only needs Wikipedia:One hundred words in sources, but your subject needs thousands, and unless you're watching my every word, you'll never notice what a hypocrite I am.
We did not come to an overall conclusion, but I think that
  • we basically agreed that sources containing information that is useless in practice ("I heard a funny joke this morning...") plus content that is potentially useful in an article ("Paul Politician's birthday is on the 32nd of Octember") should only have the potentially useful parts counted towards any sort of word/fact count, and
  • we inched towards the idea that a couple hundred words of potentially useful content is SIGCOV (i.e., not just a couple of sentences, but also not requiring whole books, or even whole chapters).
WhatamIdoing (talk) 01:54, 10 January 2024 (UTC)Reply[reply]
then some horrible spammer is going to destroy Wikipedia by getting their stupid little unworthy subject covered in depth by all of these sources, and then where will we be? Plus, if I say that we require 2,500 words in sources, then that might keep out the stuff that I want to write about. If we leave it vague, I can say that my subject only needs Wikipedia:One hundred words in sources, but your subject needs thousands, and unless you're watching my every word, you'll never notice what a hypocrite I am. Incredible. This describes me exactly. 🌺 Cremastra (talk) 02:15, 10 January 2024 (UTC)Reply[reply]
This is what I'm getting at when I gesture to the subjectivity at the heart of an encyclopedia. The encyclopedia genre is synthetic; no one has ever figured out what "should" be in one. It would be very difficult to stomach an overturning of consensus that has existed the entire length of the site, that the presence of some classes of article is a serious net negative to the site, even if it means being arbitrary and potentially keeping some good articles from being created. I believe this very strongly.
I don't want to edit on a site that looks like Urban Dictionary, Reddit, or the oldest editions of the Britannica in spots: it feels bad, incurious, lacking a united understanding of what we're doing here. It's cultural and subjective, but I'm okay with that. Remsense 02:33, 10 January 2024 (UTC)Reply[reply]
@Cremastra, I think it describes all of us, when we feel like the stakes are high enough. WhatamIdoing (talk) 02:36, 10 January 2024 (UTC)Reply[reply]

WP:Notability is a big vague confusing mess but it mostly works. IMO the way that it really works is that it combines 3 attributes:

  1. Sourcing criteria, which are ostensibly the only criteria. These are also used as a measuring stick for #2
  2. Real world importance/notability
  3. Degree of encyclopedicness, i.e. degree of compliance with WP:NOT, above the floor of outright rejection under WP:NOT

If we ever want to tidy up wp:notability, we're going to need to acknowledge this as a starting point. North8000 (talk) 14:29, 3 January 2024 (UTC)Reply[reply]

I agree with this. There is a balance between coverage (how often and how in-depth a subject is featured in independent sources) and the importance of the subject. There is, or ought to be, more flexibility in the sourcing needed for individuals who are at the top of their profession (whether in academics, sport, politics, business) compared with individuals who are active locally, in minor or secondary leagues, or non-executive positions. This is why the SNGs are useful - to help make determinations of real world importance. - Enos733 (talk) 18:23, 3 January 2024 (UTC)Reply[reply]
For example, if two published high school newspapers ran lengthy, in-depth coverage of guitar player John Smith, that fully satisfies GNG but the system might not let that one pass. If the same two writeups were in Rolling Stone magazine, the system would certainly pass him. So the prominence of the sources (combined with the space they dedicated) matters for assessing #2, and #2 matters.
Another example: A town of 1,000 people with no sources other than a couple which (merely) reliably establish its existence. The system is going to let that one be an article. Some will say it's because "GNG sources are likely to exist" but in reality it's because it's an ultra-encyclopedic topic. Because it passes wp:not by a mile, and is also mentioned in 5P. North8000 (talk) 21:53, 3 January 2024 (UTC)Reply[reply]
Agreed. When we let the sports SNGs die, we inadvertently opened the door to a lot of minor league baseball players, because minor league baseball necessarily receives a bunch of local/routine coverage which looks like or could be GNG even if the player never comes close to making the major leagues, which was functionally necessary to enter a print baseball encyclopaedia. I'm not generally a fan of SNGs, but the ones that exclude rather than include can be very helpful. SportingFlyer T·C 07:01, 4 January 2024 (UTC)Reply[reply]
I knew at the time that that fix was only going to be 1/2 of a fix. In the "grand wp:notability unification" that I have in my head, it
  • Acknowledges that real world notability/importance is a factor and the coverage is a measuring stick for that as well
  • Calibrates for the ratio of coverage to real-world notability in that field. Since in sports, coverage is an end product in itself and so a weaker indicator, coverage in this area is less meaningful, and it adjusts for that
  • Calibrates for degree of encyclopedicness. A typical sports article is a bit lower here than a typical encyclopedia article, and it adjusts for that
The net result would be that the standard would be a bit tougher for sports than it currently is. North8000 (talk) 18:14, 4 January 2024 (UTC)Reply[reply]

Adding searching to the nearby page[edit]

Hello, Not sure if this is the correct place to put this, but is it possible to add coordinate or location searching to the nearby page, to allow for location permissions to not have to be granted? Thanks, Geardona (talk) 12:49, 4 January 2024 (UTC)Reply[reply]

Geardona, this can be done manually with the search keywords neartitle and nearcoord. See :mw:Help:CirrusSearch § Geo Search for documentation. Folly Mox (talk) 12:58, 4 January 2024 (UTC)Reply[reply]
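For readers who haven't used those keywords before, example queries look roughly like the following (syntax recalled from the linked CirrusSearch help page, so the exact radius format and defaults are worth double-checking there):

```
nearcoord:5km,51.5074,-0.1278    pages with coordinates within 5 km of the given lat/lon
neartitle:5km,London             pages within 5 km of the coordinates on the "London" article
```

Both keywords can be typed directly into the regular search box, so no location permission is needed.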
Geardona you might want to see Wikipedia:Village_pump_(technical)/Archive_113#Passing_a_location_to_Special:Nearby? Sungodtemple (talkcontribs) 13:01, 4 January 2024 (UTC)Reply[reply]
Ok, I did not realise that was a feature; maybe add a search bar to the page itself for more user-friendliness? Geardona (talk) 13:03, 4 January 2024 (UTC)Reply[reply]

Auto-confirmed[edit]

Hi. I’ve realized that it’s insanely easy to get auto-confirmed status… and I thought I had to use articles for creation forever. Would it be a good idea to make it more difficult? Say 50 edits, like on es.wp, or more time editing; one month, maybe? Encyclopédisme (talk) 14:32, 4 January 2024 (UTC)Reply[reply]

@Encyclopédisme
What do you mean that it's easy to get auto-confirmed status? I've been writing for years now and I still have not had my username confirmed.
Боки Write to me! 14:38, 4 January 2024 (UTC)Reply[reply]
I mean being able to, say, move pages, create pages, etc. You need 10 edits and a 4 day old account. That is auto-confirmed. Encyclopédisme (talk) 14:39, 4 January 2024 (UTC)Reply[reply]
@Encyclopédisme Sorry, I mixed it up with auto-patrolled :) My bad!
Боки Write to me! 17:29, 4 January 2024 (UTC)Reply[reply]
Despite the name, the autopatrolled flag is only handed out manually. Some accounts are marked as autopatrolled fairly quickly; others can be active for many years and create thousands of pages without it. Certes (talk) 20:30, 4 January 2024 (UTC)Reply[reply]
@Certes I am one of those in the second group :) The funniest part is that I am an interface administrator of Serbian Wikipedia and wrote over 800 articles there, yet somehow English Wikipedia needs me to show more value.
Боки Write to me! 22:25, 4 January 2024 (UTC)Reply[reply]
Please don't take it personally. I've created thousands of pages over the last 16 years and am not autopatrolled. The flag is simply a convenience for patrollers, and doesn't allow the account to do anything it couldn't do anyway. Certes (talk) 22:44, 4 January 2024 (UTC)Reply[reply]
@Боки: You have been autoconfirmed since 03:20, 14 July 2020 (UTC). --Redrose64 🦌 (talk) 22:17, 4 January 2024 (UTC)Reply[reply]
@Redrose64 Yeah, I just realized that I have misread the auto confirmed vs auto patrolled :)
Боки Write to me! 22:26, 4 January 2024 (UTC)Reply[reply]
The section refers to autoconfirmed status, which is handed out automatically on the account's tenth edit or four days after registering (whichever is later). That link should show a box top left saying "Your account is autoconfirmed" if you are logged in to an account that is not very new. Certes (talk) 14:46, 4 January 2024 (UTC)Reply[reply]
Any thoughts? Would it be possible? Encyclopédisme (talk) 15:43, 4 January 2024 (UTC)Reply[reply]
I personally would keep the WP:AFC route, until an AFC reviewer recommends that the article author publish articles directly. Having multiple eyes is an asset, not a detriment. I wish sometimes as a niche publisher that more people would review my articles. I say that as someone who is WP:AUTOPATROLLED. But making space for newer article contributors is in the interest of the wider encyclopedia. ~ 🦝 Shushugah (he/him • talk) 17:50, 4 January 2024 (UTC)Reply[reply]
I have already created 2 articles. One of them was reviewed, the other was already edited by other editors. The problem is indeed that niche subjects are widely overlooked and, due to the small audience, articles often state outright false info based on old sources, or on works of popularisation which don't correspond exactly with the academic consensus (specifically, I created articles about the Inca, an area widely affected by this). Encyclopédisme (talk) 17:56, 4 January 2024 (UTC)Reply[reply]
If you're writing about the Inca, I'd look around for editors like User:Smallchief or User:Greenman, since they have both contributed significantly to Inca Empire in the past. WhatamIdoing (talk) 02:04, 10 January 2024 (UTC)Reply[reply]

Mass patrolling[edit]

Hi everyone,

I was just curious if there was any discussion earlier, as I was not able to find it in the archives. If not, is it possible to have mass patrolling done? This could be helpful when dealing with multiple edits, where a user has made minor changes such as adding a specific number or other minor details. Instead of going into each and every single one of the edits, is there any way that mass patrol can be implemented, allowing us to check and approve certain unpatrolled edits more efficiently?

Thanks!

Боки Write to me! 14:37, 4 January 2024 (UTC)Reply[reply]

@Боки: We don't have edit patrolling enabled on English Wikipedia. Only new pages are patrolled, not individual edits. 🌺 Cremastra (talk) 01:01, 7 January 2024 (UTC)Reply[reply]
@Cremastra If I may ask, why not? How do you manage the information that gets posted on the Wikipedia pages then? People can just post anything and everything. There has to be a way that this gets managed. Боки Write to me! 15:02, 8 January 2024 (UTC)Reply[reply]
@Боки: That's a good question, but I don't really know the answer. Many users informally patrol RecentChanges to watch for vandalism, myself included. We check our watchlists, and keep an eye on worrisome editors. Things seem to generally tick along fine. 🌺 Cremastra (talk) 20:46, 8 January 2024 (UTC)Reply[reply]
@Cremastra What about the fact if someone makes a mistake or puts some incorrect information? How do users here correct it? They redo it or do they just revert the edit? Боки Write to me! 20:59, 8 January 2024 (UTC)Reply[reply]
Yeah, someone would generally fix the problem or just revert the edit. There has been discussion of enabling edit reviewing lately, but I believe the idea was shot down. I think, in practice, edits are generally reviewed at some time or another, there's just not a special person clicking a "review" button. The process is unofficial and informal. It seems to (mostly) work. 🌺 Cremastra (talk) 21:01, 8 January 2024 (UTC)Reply[reply]
@Cremastra The reason why I am asking is because at Serbian Wikipedia (with a lot less edits, mind you) we have a bunch of reviewers (including myself) who review edits of non-auto-patrolled users, which brings me to the next point: how does a person here on English Wikipedia become auto-patrolled? Боки Write to me! 21:04, 8 January 2024 (UTC)Reply[reply]
The auto-patrolled right (where your articles are patrolled automatically) is granted by an admin through a formal request process. See WP:PERM/AP. Cheers, 🌺 Cremastra (talk) 22:12, 8 January 2024 (UTC)Reply[reply]
@Cremastra I will definitely work towards that considering I am an interface admin of Serbian Wiki. My only concern is: with this volume of edits, does it not "ruin" the reputation of an article if someone can easily add something to it without anyone noticing for a while? Боки Write to me! 00:26, 9 January 2024 (UTC)Reply[reply]
does it not "ruin" the reputation of article if someone can easily add something to the article without anyone noticing it for a while Someone easily adding something to an article is how Wikipedia works. 🌺 Cremastra (talk) 00:31, 9 January 2024 (UTC)Reply[reply]
@Cremastra I know, but in this case I am referring to a person adding, for example, "... and this woman has been involved with my dad" (literally) as part of the article. If this does not get patrolled or checked, then it stays in the article, and someone will read it and wonder what is going on. Боки Write to me! 08:45, 9 January 2024 (UTC)Reply[reply]
Well, to a degree that would get checked. I patrol recentchanges fairly regularly, and if I saw that edit, I would revert it. 🌺 Cremastra (talk) 13:11, 9 January 2024 (UTC)Reply[reply]
@Боки, I think you might be unclear on the purpose of auto-patrolled. Most new articles are reviewed by a team of editors, the new pages patrol. When an editor has created a lot of acceptable articles, they can be assigned "auto-patrolled" so the reviewing editors have more time to concentrate on other articles. It makes no difference in editing abilities or rights for the editor who has auto-patrolled. Schazjmd (talk) 00:32, 9 January 2024 (UTC)Reply[reply]
@Schazjmd on Serbian Wikipedia, auto-patrolled means we, as patrollers, do not have to check your edits (whether it's new page or just a simple edit on something) any more and you have gained trust that you will not make meaningless edits and that you know what you are doing on Wikipedia. That's what my definition of auto-patrolled is and that's what I am referring to. Боки Write to me! 08:47, 9 January 2024 (UTC)Reply[reply]
The scale of editors on enwp, as well as automated anti-vandalism tools, leaves mostly good-faith but non-constructive edits. And it generally works on enwp. Incorrect source verification is probably the hardest challenge we have ~ 🦝 Shushugah (he/him • talk) 09:02, 9 January 2024 (UTC)Reply[reply]
@Боки: With a description like that, I think that you are referring to pending changes review. That is not the same as autopatrolled, which concerns only the creation of new articles. --Redrose64 🌹 (talk) 15:22, 9 January 2024 (UTC)Reply[reply]

A Wikipedia journal[edit]

There's a lot of research about Wikipedia, but it tends to be from an 'outsider' perspective: computer scientists and computational social scientists who are interested in Wikipedia because it's a huge, open dataset; critiques of our content, or lack thereof, in specific fields; or, increasingly, experiments in replacing some or all of our work with algorithms. All very interesting and valuable, but what I'd really like to read is more studies of Wikipedia on its own terms. Things like the histories of specific policies, analyses of how processes work, biographies of prominent editors. Research like that does exist (e.g. the WMF's Wikipedia @ 20 edited volume springs to mind), but it's scattered around and hard to find.

If the Wikipedia community was a conventional collective organisation, a scholarly society or a trade union or something, it'd probably already have its own little periodical for that kind of thing. Something like The Signpost, but with bibliographic references, peer review, etc. Written and read primarily by people who are involved in, or at least have a deep knowledge of, the community. It could be hosted on-wiki like the Signpost or, perhaps better for discoverability, somewhere else, as long as it has that rooting in the community. Would anybody else be interested in something like that? – Joe (talk) 08:56, 5 January 2024 (UTC)Reply[reply]

Well, I once wrote something that might fit there. XOR'easter (talk) 18:07, 6 January 2024 (UTC)Reply[reply]
Well as you know I don't really agree with what you wrote there. But certainly that could be one role of a journal like this: counter-critiques to academic critiques of Wikipedia are unfortunately not going to be taken as seriously when they're published on Wikipedia itself. – Joe (talk) 13:59, 8 January 2024 (UTC)Reply[reply]
A nice idea! A major issue is the unpaid aspect of it. On the other hand, if an academic is being paid, they can push for open-access, open-data etc.. which is what a lot of meta:Wikiresearch is. I also think about wikinews:Main and the success/challenges it faces ~ 🦝 Shushugah (he/him • talk) 01:11, 7 January 2024 (UTC)Reply[reply]
@Shushugah: Thanks. I'm not sure I follow though, who is(n't) being paid? – Joe (talk) 13:53, 8 January 2024 (UTC)Reply[reply]
Not quite what you're describing, but there are the WikiJournals. The idea there is more about getting wiki contributions to "count for something" by sending them through peer review and formatting them as journal pieces. There was the Wiki Studies Journal which involved several Wikipedians, but it doesn't appear to still be going. Heather Ford kicked off Wikihistories fairly recently -- not sure where that's headed.
Back to your thought, though, it would certainly be interesting. I'd be curious how much enthusiasm there is. I've seen a lot of valuable research projects undertaken by volunteers that would benefit from being cleaned up and formally "published". It may also be useful to provide a forum to publish literature reviews or to critique existing research. — Rhododendrites talk \\ 14:26, 7 January 2024 (UTC)Reply[reply]
My thinking exactly. This is the kind of thing people do already, and especially for users that are also in academia, or plan to be, it would be nice to be able to collect formal citations and credit for it.
Level of interest is the key. If the Wiki Studies Journal was similar and failed, then it'd be good to know what went wrong. Otherwise, I was thinking of trying to put together an initial issue of invited contributions. If we couldn't find enough contributors, then we have our answer. – Joe (talk) 13:57, 8 January 2024 (UTC)Reply[reply]
Don't think that's a bad idea. A similar organization, the Organization for Transformative Works (which operates the fandom web archive Archive of Our Own), runs its own peer-reviewed academic journal like this. ~ F4U (talkthey/it) 19:16, 7 January 2024 (UTC)Reply[reply]
HaeB knows a lot about research on Wikipedia. I'm not sure that it's true that a lot of it is done by outsiders. WhatamIdoing (talk) 02:07, 10 January 2024 (UTC)Reply[reply]

Bibliography articles[edit]

We have a number of articles titled 'Bibliography of X'/'X bibliography'. Sometimes these are lists of works by a subject, eg Virginia Woolf bibliography. Sometimes they are lists of works about a subject, eg Bibliography of Andrew Jackson. Sometimes they're both, eg Harold Pinter bibliography. Is "both" a desired approach? For example, if I wanted to split out some of the massive bibliography at Virginia Woolf, would I add it to the existing Virginia Woolf bibliography or would I create a new article? And if the latter, what would that be called to distinguish it from the existing article? Nikkimaria (talk) 21:06, 7 January 2024 (UTC)Reply[reply]

That massive bibliography at the Virginia Woolf article isn't just a bibliography, it is part of the references. The article uses shortened footnotes, so each of those sources is the target of a hyperlink from the short footnotes in the references section. So they can't be moved to another article. Since the term "Bibliography" is ambiguous I would rather articles used the terms Citations / References for the two sections rather than References / Bibliography.
This doesn't answer your question, however. StarryGrandma (talk) 10:19, 8 January 2024 (UTC)Reply[reply]
Many of the works listed at Virginia Woolf#Bibliography are in fact not referred to by any of the shortened footnotes: more than eighty of them, at a quick count. A script like User:Trappist the monk/HarvErrors marks these.
To answer Nikkimaria's question, the only comparative example I can immediately find is Winston Churchill, which has Bibliography of Winston Churchill for works about Churchill, and Winston Churchill as writer for works by him. Caeciliusinhorto (talk) 20:55, 8 January 2024 (UTC)Reply[reply]
Yep, wouldn't be looking at removing any of the sources actually cited, just some of the ones that aren't. Thanks for the example, that's helpful - anyone have thoughts on what the best titling approach would be for these different types of bibliographies? Nikkimaria (talk) 00:04, 9 January 2024 (UTC)Reply[reply]
Post on an article's Talk page when you remove/move anything from an article's improperly-intermingled Bibliography + Works Cited + References section -- that is, so that anything removed was not from the article's history from a state prior to material being cited in-line, such that any residual material failing in-line verification may be more easily cross-referenced to the article's history. (To those to whom this paragraph seems like nonsensical word salad: you know it when you see it -- see major history articles.) SamuelRiv (talk) 00:30, 16 January 2024 (UTC)Reply[reply]

I think spoken versions of articles have some of the most potential for improvement of any area of the site. Of course, the existing paradigm has an obvious central issue: recordings become out of date almost immediately, which dissuades both potential narrators and listeners. I've thought a bit about this, and I have a preliminary idea for a format that could at least exist alongside the existing spoken articles: abridged spoken sections. Especially on good or featured articles, it seems like sections could be excerpted, possibly adapted to be better read aloud (adapted to a "podcast" form, if you like), and then those could be recorded. Because they are their own text—which would also exist as a readable transcript, of course—they wouldn't immediately go out of date, while reflecting both the work put into the accompanying article and the needs of listeners. Remsense 04:47, 8 January 2024 (UTC)Reply[reply]

Why wouldn't these excerpts also be prone to going out of date?
I'd be interested to know how many people prefer listening to an out of date version of an article, versus having a screen reader read the up-to-date version.
I also wonder if effort could be focused on marking up difficult passages to assist screen readers in some way. Barnards.tar.gz (talk) 08:58, 8 January 2024 (UTC)Reply[reply]
Because they would have their own transcript that may be edited to have particular suitability for being read aloud—they would only meaningfully become out of date if the substance of the part of the article that was abridged changes, not just minor changes in wording or sentence reshuffling.
I think screen readers are the other major reason articles aren't read anymore, but I think—albeit as someone who uses screen readers but does not require them to read—that they're just not as nice a lot of the time? Sure, people can set screen readers to a blistering pace they're still comfortable with, but they still produce errors and best-fit algorithmic awkwardness. There's plenty to explore in a "podcast" presentation to achieve what screen readers cannot. Perhaps the format can diverge even further—during a discussion I was having a few days ago, the possibility of writing for/recording a dialogue format came up, and I think that has potential. Remsense 20:57, 8 January 2024 (UTC)Reply[reply]
I think the best way to solve the outdating issue would be to create a clickable tool or function, built into Wikipedia, that would use something like AI or computer-generated speech to read the text of any article exactly as it currently stands. Helper201 (talk) 21:50, 8 January 2024 (UTC)Reply[reply]
As already mentioned, many people already use screen readers that are highly customizable by each individual user: we are discussing a potential form of spoken article that would also be less redundant in the age of screen readers. Remsense 21:53, 8 January 2024 (UTC)Reply[reply]
Screen readers have existed longer than Wikipedia has. They've probably become a bit more mainstream though, with VoiceOver and Google TalkBack being pre-installed on smartphones. As a screen reader user, I'm very text-oriented so I almost never use Spoken Wikipedia and would almost never use spoken excerpts either. I don't think many proficient screen reader users would. Graham87 (talk) 06:08, 9 January 2024 (UTC)Reply[reply]
Thank you very much for your insight. This may seem like an off-topic question, but what about podcasts? Are they perceived as too slow or inferior to (hypothetically) equivalent passages from books using a screen reader as well? If not, what advantages do they have? Are there any advantages for you personally to have something narrated by a person as opposed to a screen reader, or are the disadvantages simply too great? Remsense 06:15, 9 January 2024 (UTC)Reply[reply]
@Remsense: Sorry, just got this. Lots of blind people like them; several use audio players to speed them up. I don't listen to them often though but I'm probably an outlier that way. If both a podcast and a transcript were available, I personally would only listen to the podcast if I wanted to find out what a person's voice was like ... or if the production had sound effects, etc. that couldn't be conveyed through a transcript. As I said I'm a bit more extreme than most blind people this way though. Graham87 (talk) 12:21, 11 January 2024 (UTC)Reply[reply]
I just remembered Wikipedia:Village pump (proposals)/Archive 186#Spoken narrations of the blurbs at Today's featured article (TFA), which has some similarities to this section. Graham87 (talk) 12:27, 11 January 2024 (UTC)Reply[reply]
Pinging @Isaidnoway:, who also uses a screen reader and expressed the polar opposite view from mine in that discussion. Graham87 (talk) 12:30, 11 January 2024 (UTC)Reply[reply]

Third RfC on Vector 2022[edit]

Æo (talk) 20:56, 9 January 2024 (UTC)Reply[reply]

Oh god, why? It's not even the third RfC, it's the eleventh, and when you guys floated this at VPI last you got a resounding please don't. – Joe (talk) 08:49, 10 January 2024 (UTC)Reply[reply]
It's a survey with actual voting. A place to dump all your complaints and wants. Maybe something with more focus will come out of it but I wouldn't want to be the one who had to summarise it. Doug Weller talk 16:59, 10 January 2024 (UTC)Reply[reply]

Workshop: draftifying[edit]

Firstly, Jimbo Wales agrees with me. Well, not with me directly. But with the gist of this argument, and the argument behind unreferenced-PROD. He wrote in 2006:

I really want to encourage a much stronger culture which says: it is better to have no information, than to have information like this, with no sources. Any editor who removes such things, and refuses to allow it back without an actual and appropriate source, should be the recipient of a barnstar.

[1]

Anyways...

As a New Page Patroller, I frequently draftify unsourced articles.

Not infrequently, the creator of the article moves the draft back to mainspace, or re-creates it with the same content. The topic is frequently fringe, difficult to verify, but not necessarily PRODable or AfD'able.

What to do? There's an unsourced "article" in mainspace. It should be in draftspace, where it should be improved by the creator. It is unfit for mainspace. As one of my favourite essays points out, unsourced content is essentially digital graffiti and should be removed. The WP:BURDEN is on the creator to add references to their claims.

It isn't 2005 anymore. We shouldn't have new unsourced articles being created. They do get created, but are usually PRODed or draftified by NPPers.

Per WP:DRAFTIFY, we aren't allowed to re-draftify an article. Because of this clause, draftifying is essentially useless. All the creator has to do is move it back.

An analogy (or possibly a parable):

Someone dumps a pile of garbage on the sidewalk. There might be some re-usable or recyclable items in there, but it's hard to tell. Per municipal policy, a street cleaner takes it to the waste-dumper's house. It's their garbage.
Instead of throwing it out normally, or sorting out the re-usable or recyclable stuff, the waste-dumper takes their garbage back out of their house and puts it right back onto the street. The street cleaner finds it again. Municipal policy states that the cleaner should either sort through it themself or ignore it. Once they have finished sorting, they should keep the recyclable items and take the rest to a waste-management facility, where they will have to write a report detailing why they think the garbage should be destroyed. The waste management facility is called AFD.
This is clearly nonsense. Why should the street cleaner have to sort through someone else's garbage?

I would like to propose disallowing draftified articles from being moved back to mainspace if the problem for which the "article" was draftified has not been fixed. Let the street cleaner take the garbage back to the waste-dumper's house. 🌺 Cremastra (talk) 15:34, 13 January 2024 (UTC)Reply[reply]

Notified: WT:NPP, WT:Draft, WT:AFD.
🌺 Cremastra (talk) 15:42, 13 January 2024 (UTC)Reply[reply]

References

  1. ^ Wales, Jimmy (2006-07-19). "insist on sources". WikiEN-l. Retrieved 2007-01-31.
  • A reminder for those who don't notice the brightly-coloured editnotice: this is the idea lab, so no !votes. 🌺 Cremastra (talk) 15:35, 13 January 2024 (UTC)Reply[reply]
  • I think we discussed this before on Discord, and the main highlighted point was: 'Users have the right to object to draftification and can move it back to the article space.' But it's good to see you here, searching for some additional ideas. – DreamRimmer (talk) 16:17, 13 January 2024 (UTC)Reply[reply]
  • When I check my Draftify log, most articles have been improved & returned to mainspace. The major challenge is the PROD/AfD process. In May 2023, I set up my PROD/AfD subpage to track articles. I am totally OK with reverts along with a reasonable explanation. The problem is un-explained reverts, and "junk/incomplete" articles remaining in mainspace. And I understand the goal is article improvements. Thanks for this discussion. Regards, JoeNMLC (talk) 16:44, 13 January 2024 (UTC)Reply[reply]
    Looking at one's draftify log is a good idea. Looking at my own, from August 2023, when I only draftified 8 articles:
    • One was redirected, after some confusing duplicated drafts/AfC-dodging and this related AfD.
    • Five remained in draftspace; two of those drafts have since been deleted because they were created by a sockpuppet.
    • One has been moved back to mainspace after being improved in draftspace, and looks pretty good.
    • One was re-created, unreferenced, in mainspace. It was unsuccessfully PRODed by a different user in October 2023; it has no references but is a list of sorts.
    🌺 Cremastra (talk) 16:54, 13 January 2024 (UTC)Reply[reply]
  • I hate that linked essay, and have found that in most cases, unreferenced prose is relatively easily verifiable in published sources that the original editor neglected to include.
    Having said that, I do think the current wording of WP:DRAFTOBJECT is overly strict. I don't think the same reviewer / patroller should draftify the same page more than once, even in the absence of improvements, but if multiple reviewers / patrollers think an article should be returned to draftspace for improvement, that no longer strikes me as "unilateral", and it is in fact the draft creator's moves to mainspace that are "unilateral", and the required next process should be AfC rather than AfD.
    The AfD problem is real, but the garbage analogy is inapt. Unreferenced articles are less "this is garbage" and more "someone didn't fill out the paperwork". (Also, unless you're very nosy, it's usually pretty difficult to determine whose garbage you've happened across littered in the public space, and no municipality I'm aware of requires street cleaners to sort waste streams on pickup, even if it is best practice. Typically, this duty falls on the people who work the recycle and hazmat streams at the transfer station or other facilities, with the acknowledgement that the landfill stream will often contain material that properly ought to be processed differently.) Folly Mox (talk) 18:01, 13 January 2024 (UTC)Reply[reply]
  • To 99% of people, having their article moved to draftspace is going to discourage them from ever improving it. ★Trekker (talk) 22:16, 13 January 2024 (UTC)Reply[reply]
    Why so? Having it moved to draftspace is a chance for them to fix it without other editors swarming over it with cleanup tags and PROD templates and brightly-coloured banners. 🌺 Cremastra (talk) 22:42, 13 January 2024 (UTC)Reply[reply]
    I believe @WhatamIdoing has the specific numbers, but draftified articles have a dismal return-to-mainspace rate. Mach61 (talk) 01:26, 14 January 2024 (UTC)Reply[reply]
    Draftspace is where articles go to die, and we've known that for years. Steven Walling knows the original research on this best, and if you wanted to get more recent numbers, @Cryptic or someone else at Wikipedia:Request a query could probably tell you what percentage of pages in the Draft: namespace got deleted last year (e.g., created in January 2023 and deleted from the Draft: namespace since then).
    You can also estimate it from the logs. You can find the number of page moves into vs out of the Draft: space in Special:Log/move and the number of articles created and deleted in Special:RecentChanges. The numbers for the last couple of days look like roughly 120 articles created each day, 150 articles moved into the draftspace each day, 150 articles moved out of the draftspace each day, and 150 articles deleted each day. We put 270 articles in, and we deleted 150 of them. That's a 55% deletion rate. Ideally, you'd look at these numbers over the space of at least 7 full days, as there are definitely weekly trends in activity, and things like a holiday weekend, an important football game, a change in the activity level for even one key editor, etc., can throw the numbers off quite a bit. WhatamIdoing (talk) 22:48, 14 January 2024 (UTC)Reply[reply]
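    The arithmetic behind that figure, as a quick back-of-envelope sketch (the inputs are the rough daily counts quoted above, not live query results):

    ```python
    # Rough daily flows for the Draft: namespace, as estimated above.
    created_in_draft = 120   # pages created directly in draftspace per day
    moved_in = 150           # pages moved into draftspace per day
    deleted = 150            # draftspace pages deleted per day

    inflow = created_in_draft + moved_in   # 270 pages entering draftspace
    deletion_rate = deleted / inflow       # ~0.556, the "55%" quoted above
    ```

    As noted, averaging over a full week or more would smooth out the day-to-day noise in these counts.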
    But is all this necessarily bad? I believe in quality over quantity. 🌺 Cremastra (talk) 22:51, 14 January 2024 (UTC)Reply[reply]
  • Issues I believe this proposal would have to first resolve in order to have any chance of gaining consensus: (1) There will probably be a dispute about whether the alleged problem with the article/draft actually existed in the first place. (2) There will probably be a dispute about whether the alleged problem with the article/draft was sufficiently serious to justify draftification. (3) There will probably be a dispute about whether the alleged problem with the article/draft has actually been fixed. In all three cases, the draftifier may well be completely on the wrong side of consensus. The fact that the draftifier believes or claims that a page is "garbage" does not mean that the page actually is garbage. To take the example given by the proposer, I have, over the course of many years, seen many articles tagged as "unreferenced" despite the fact that those articles very obviously did have references (presumably because the tagger just did not like the references in the article). I cannot imagine the community supporting the unilateral draftification, with no right of appeal, of articles where there is a real dispute about the appropriateness of the draftification. James500 (talk) 02:18, 14 January 2024 (UTC)Reply[reply]
  • I don't think this can pass. Judging from previous village pump discussions, about half of Wikipedians don't like draftspace, seeing it as a backdoor to deletion. The de facto options in situations where a very poor article (such as one with no sources) is WP:DRAFTOBJECTed are WP:AFD, or WP:TNT. Hope this helps. –Novem Linguae (talk) 03:58, 14 January 2024 (UTC)Reply[reply]
  • Unsourced articles are an easy target, but fundamentally WP:DRAFTOBJECT is not about article content, it's about WP:CONSENSUS. If you think an article doesn't meet WP:V and therefore shouldn't be in main space, but another editor disagrees in good faith (i.e. by reverting your bold move to draft space), then you have to stop and talk about it. There's really no way around that. You can't just insist that you're right and the other editor has to satisfy you, because you're the New Page Reviewer. That's not "the encyclopedia that anyone can edit". Besides, I've seen NPPers wrongly identify articles as unsourced plenty of times, whether because they missed references that looked like something else, because a new editor struggled to format their sources, or because they just didn't read the article properly. Folly Mox makes a good point about multiple reviewers being involved above, but still, if multiple editors are involved in a dispute about where a page should be, we'd expect them to discuss it (e.g. at AfD), not get into a move war. – Joe (talk) 07:14, 14 January 2024 (UTC)Reply[reply]
  • Wasn't the "information like this" from the Jimbo quote something about two tech founders throwing pies at each other to settle a dispute? That probably doesn't apply to most of the articles we're talking about, which don't tend to involve unlikely stories about BLPs (the quote is from years before the creation of WP:BLP). A few thoughts:
    • The underlying assumption is that the article creator WP:OWNs the article. This is supposed to be a collaborative project, so why should we treat an unsourced article as "your garbage"? I disagree that unsourced content is always "garbage" or "graffiti", but why don't we think of it as "our" problem? New content is a gift to us and to the world. Some gifts are bigger or smaller, and some are better or worse, but the absence of a little blue clicky number doesn't make it garbage. (My own idea of garbage is misinformation and disinformation.)
    • The belief that an unsourced article is "unfit for mainspace" is not supported by any policy or guideline. It is the personal preference of a fraction of editors, but it's not one of our rules. If we want to build a system based on this preference, then that preference needs to be turned into an actual rule first.
    • I wonder how big this problem actually is. I checked the pages created during the last three days in the mainspace and draftspace, using the visual editor (because there's a tag that makes it easy to check for the addition of a new ref, but it's not available for the 2010 wikitext editor [yet?]). 40% of them were redirects, at least 45% had at least one ref tag added in the first version of the page, and the remaining small fraction either had a ref added later (example, example, example), or not at all (example article, example dab page), or it actually had refs but they weren't autodetected (example, example, example, and pinging User:ESanders (WMF) to see whether that's a bug in mw:EditCheck). This is overall not feeling like a serious problem. Most pages that are supposed to have refs (e.g., they're not dab pages) are already getting refs. In fact, having looked at this, I don't think I would draftify a new article if this were the only serious problem.
  • WhatamIdoing (talk) 23:41, 14 January 2024 (UTC)Reply[reply]
    Unsourced articles are definitely unfit for mainspace in this day and age, especially if they don't fall into the evergreen WP:NSPECIES, WP:NPOL and WP:NGEO spectrum. I personally prefer the AFD route to the draftification route; however, it still stands that unless improved, a completely unsourced article is no better than misinformation and disinformation. Sohom (talk) 15:42, 15 January 2024 (UTC)Reply[reply]
    @Sohom Datta, there is no policy or guideline that says all articles must cite at least one reliable source. Wikipedia:Notability explicitly says the opposite: what makes a subject notable is whether sources exist in the real world, not whether sources have been typed into the Wikipedia article. It is true that some individuals personally believe that an article without a source is unfit for mainspace, but that's a personal belief and is not supported by policy.
    BTW, the research on the draftspace indicates that if you want unsourced articles to get sources, you need to leave them in the mainspace. If your goal is to get them deleted with a minimum of fuss and bother, then you should put them in the draftspace. WhatamIdoing (talk) 17:27, 15 January 2024 (UTC)Reply[reply]
    I do agree that articles need to have at least one reliable source cited though. I think what you meant here is that this should not be acted retroactively. CactiStaccingCrane (talk) 17:29, 15 January 2024 (UTC)Reply[reply]
    My point is even smaller than that: Editors should not misrepresent the state of the actual rules by claiming that unsourced articles can't be, or shouldn't be, in the mainspace solely because they are unsourced. The correct (i.e., accurate and honest) process is:
    1. Rules explicitly do not require a source to be cited in a mainspace article.
    2. Get rules changed to require at least one source to be cited.
    3. Tell editors (especially newbies) that their new article is unacceptable because it does not comply with the new rule.
    The process that some editors are currently using is:
    1. Rules explicitly do not require a source to be cited in a mainspace article.
    2. Tell editors (especially newbies) that their new article is unacceptable because it does not meet my personal criteria, while pretending that my personal criteria are the actual rules.
    Whether the new rule is retroactive or not is not really a concern of mine. I am confident that it would eventually become retroactive even if it doesn't start that way. (That's exactly what happened with the rules for WP:BLPPROD: it started off as solely forward-looking, and became retroactive later.) What concerns me is editors claiming that the rules are X when the rules are actually not-X. Either change your claims or change the rules, but don't misrepresent the rules. WhatamIdoing (talk) 17:41, 15 January 2024 (UTC)Reply[reply]
    @WhatamIdoing I think you missed my point about AFD. The fastest way to get sources added to an article, in my experience (counterintuitively and unfortunately), is an articles for deletion/discussion, not letting it languish in mainspace (and definitely not draftspace). An AFD puts it on the radar of multiple wikiprojects, which are much more likely to provide reliable sourcing than I will ever be able to provide.
    If even after 2/3 weeks of advertising, nobody (including the article creator) thinks the article is worth saving, that could/should indicate that the article is probably not notable at that current moment.
    Also, I agree that there currently exists no policy that prevents an editor from not including any sources in an article (theoretically). But at a much more practical level, it is not really fair to expect an editor with limited prior understanding of the subject matter to accurately evaluate an article's notability if they have absolutely zero starting points for their search for sources. Sohom (talk) 17:57, 15 January 2024 (UTC)Reply[reply]
    And yet we say that Wikipedia:Deletion is not cleanup, because it's uncollegial and anti-collaborative for an editor to demand that others drop everything they're doing because an article must be sourced this week, or it will be deleted. As you say, editors with limited prior understanding of the subject matter have difficulty accurately evaluating notability for those subjects – so they shouldn't be sending them to AFD in the first place. AFD is for articles that you genuinely believe to be non-notable, not articles you'd like someone else to improve right away.
    Permitting editors to use AFD to demand clean up of subjects they're unfamiliar with is also a source of systemic bias. We've had altogether too many cases of editors sending Asia- and Africa-related subjects off to AFD out of ignorance, thinking that WP:NEVERHEARDOFIT is a good enough excuse and that if other editors want to keep it, then they will cheerfully drop everything they're doing to provide sources. If nobody intervenes, we lose the articles. This is not okay. WhatamIdoing (talk) 18:07, 15 January 2024 (UTC)Reply[reply]
    I personally don't think it is uncollaborative to go "Hey, this article has no sources, and I could not find any based on a few Google searches; what should we do about this?" (which tends to be most AFDs these days). For all you know, it could be a radioactive piece of hallucinating ChatGPT junk (or other promotional garbage) which needs to be nuked from orbit ASAP, or it could be documenting an important niche topic that few people have heard about which needs to be preserved. AFD is a lot more collaborative than going "well, that's somebody else's problem I guess" and walking away. Sohom (talk) 18:28, 15 January 2024 (UTC)Reply[reply]
    Do you mean that the creating-edit isn't tagged with "adds reference"? That does look suspicious. Am I still allowed to ask you to file a Phab task 🙂 ESanders (WMF) (talk) 18:31, 15 January 2024 (UTC)Reply[reply]
    If y'all end up creating a phab, could you subscribe @soda (me) as well :) Sohom (talk) 18:34, 15 January 2024 (UTC)Reply[reply]
  • Oppose - The remedy here is AFD, not permanent banishment to AFC (followed by eventual deletion without discussion through G13). If you can't see a consensus to delete, please don't harass article creators. ~Kvng (talk) 13:15, 16 January 2024 (UTC)Reply[reply]
    Sigh. You didn't read the banner, or the edit notice, or my reminder, did you. 🌺 Cremastra (talk) 13:24, 16 January 2024 (UTC)Reply[reply]
    @Cremastra, No I clearly did not. Sorry. I'm not sure how to be constructive about this proposal. Does that mean I should say nothing? ~Kvng (talk) 02:14, 17 January 2024 (UTC)Reply[reply]
  • "The topic is frequently fringe, difficult to verify, but not necessarily PRODable or AfD'able." If it would be appropriate to boldly draftify an article, then it would appropriate (when contested without fixing the issue) to nominate it at AfD for draftification. As with a contested blank-to-redirect, for which the appropriate discussion venue is AfD per this request for comment, you don't need to request deletion in order to nominate an article at AfD. And if it is not nominated for the purpose of deletion, then a full WP:BEFORE inquiry about whether the subject is notable and so forth isn't applicable.
I'd like to see the standard draftification messages more explicitly say that if an editor disagrees with the reasons for draftification, they can respond to the reasons for the move and ask (insist) that the article be restored to mainspace until there is a discussion to either delete it or make it a draft. SilverLocust 💬 19:56, 16 January 2024 (UTC)Reply[reply]

Excise the "crime" topic from the "law" topic; make "law, government and administration" an established grouping of topics[edit]

As an editor largely focused on law and legal philosophy articles, I find it hard to locate articles on subjects relevant to me and my expertise on Wikipedia-space pages that are sorted by topic (like WP:FAC, WP:GAC, or WP:RfC).

The undifferentiated inclusion of the very popular "crime" and "law enforcement" categories (here I don't mean categories in the technical sense) in the overall fairly unpopular "law" category means that most articles sorted under "law" turn out to be "Murder of so-and-so" or "Jimminy Bob Killman (murderer)"-type articles which require no particular expertise in any country's law or any field of legal scholarship. In terms of the skills required to contribute, they are perhaps closer to biography or history articles, or they might be in a category of their own.

Since law is a fairly underappreciated topic on Wikipedia (I find), it may be appropriate in some contexts to group it together with related topics. I would suggest that "government" be split off from "politics and government" where it exists, and merged with "law" and a newly minted category, "administration", forming the "law, government and administration" category. ("Government" would thenceforth focus more on formal/institutional political science and social philosophy, whereas "politics" would retain people, parties, ideologies, and what have you.)

I'd be interested in alternative solutions, though, since two objections to the idea do spring readily to my mind: For one, public law is arguably the only government-/administration-adjacent field of law, and private law should be appended to the "economics" topic or something. For another, grouping law with government might create the impression that the law is definitely an institution of the state (the government), or even synonymous with it. This is not undisputed; especially some jurists in the common law tradition (found e.g. in England and the U.S.) maintain that it is fundamentally a set of societal traditions, influenced by, but not derived from, the state.

What I'll be looking for when I put the finished proposal out is a consensus that the new categorization/grouping practices be recommended for internal and public-facing categorization purposes.

Again, I'd love to hear perspectives and suggestions. If you could tell me which user communities I should notify about this post or the finished proposal so that consensus is representative and the relevant people are aware of this re-systematization, I'd also be much obliged.

§§ LegFun §§ talk §§ 22:20, 13 January 2024 (UTC)Reply[reply]

I was going to recommend -deepcat:"Crime", but there are too many subcategories and the search fails. Folly Mox (talk) 02:44, 14 January 2024 (UTC)Reply[reply]
Have you tried searching incategory:"WikiProject Law articles" -incategory:"WikiProject Crime and Criminal Biography articles" ? Make sure you are searching in the Talk: namespace or maybe just go through Category:WikiProject Law articles 115.188.140.167 (talk) 10:16, 14 January 2024 (UTC)Reply[reply]
Are you looking for articles to edit, or specifically for FAs and GAs? SuggestBot can look at your contributions and suggest a list of articles that you might be interested in. WhatamIdoing (talk) 23:44, 14 January 2024 (UTC)Reply[reply]
Thanks for those recommendations! SuggestBot and the specific phrasing of the search terms provided, in particular, sound useful for my purposes. I was hoping to help bring already high-quality law articles into the spotlight by reviewing/supporting them in the FA/GA processes, which is why I was looking there specifically. I also want to get a general idea of the structure and weak points of jurisprudential Wikipedia.
That said, I'm not sure if looking for articles listed by WikiProject Law will help in every case. Wherever articles of all topics are listed for specific encyclopedic purposes, the poor categorization of law will make things hard, which is why I wanted to achieve a general consensus before I charge ahead with shifting around categories. Additionally, I'm not too well-versed in Wikipedia's more technical side, so any list-producing bots that automatically compile articles into categories would need to be adjusted by more adept minds.
Incidentally, it occurs to me that perhaps I ought to get WikiProject Law on board first . . .
§§ LegFun §§ talk §§ 08:04, 15 January 2024 (UTC)Reply[reply]
I agree that making friends over at WPLAW is a good idea. Also, take a look at Wikipedia:Version 1.0 Editorial Team/Law articles by quality statistics. The top- and high-priority articles are probably the ones you want (or, the ones you want should probably be considered top- and high-priority; the categorization is not always perfect). Wikipedia:WikiProject Law/Popular pages might also be useful to you. WhatamIdoing (talk) 17:32, 15 January 2024 (UTC)Reply[reply]

New tags on recent changes.[edit]

Hello, would it be possible to add the tags "previously undone edit", "possible edit war" and "likely repeated vandalism"? All would be activated in slightly different ways: the first would be activated when the same user undoes the reversion of their original edit; the second would be activated when a page is rapidly switched back and forth between two versions; the third would be an escalation of the first, a high-ORES-score edit being consistently reverted and re-added.
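For illustration only, here is a toy sketch of how the first two conditions might be detected, modelling a page's history as a list of content hashes. This is not AbuseFilter syntax, and the "likely repeated vandalism" tag would additionally need per-edit ORES scores, which this model omits:

```python
def tags_for(history):
    """history: list of revision content hashes, oldest first.
    Returns the proposed tags that would apply to the newest edit."""
    tags = []
    # "previously undone edit": the newest revision restores content
    # that was already reverted once (same hash two revisions back).
    if len(history) >= 3 and history[-1] == history[-3] != history[-2]:
        tags.append("previously undone edit")
    # "possible edit war": recent revisions alternate between exactly
    # two distinct versions.
    recent = history[-6:]
    if len(recent) >= 4 and len(set(recent)) == 2 and all(
            recent[i] != recent[i + 1] for i in range(len(recent) - 1)):
        tags.append("possible edit war")
    return tags
```

For example, a history of versions A, B, A would get "previously undone edit", while A, B, A, B, A would get both tags.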

 Thanks Geardona (talk to me?) 02:37, 14 January 2024 (UTC)Reply[reply]

Definitely maybe. The first might be possible in Special:AbuseFilter (about which you can inquire at Wikipedia:Village pump (technical)). I think that detecting the others might (a) have to be done in MediaWiki code, like the Reverted tag, and (b) not be available instantly, especially if you want it to work when there have been unrelated intervening edits. WhatamIdoing (talk) 23:47, 14 January 2024 (UTC)Reply[reply]
Probably useful to file a phab for this, and maybe something that the ModTools team would be interested in looking into :) Sohom (talk) 15:17, 15 January 2024 (UTC)Reply[reply]
How would I file a phab request? Geardona (talk to me?) 19:29, 16 January 2024 (UTC)Reply[reply]
WP:BUGS. --Redrose64 🌹 (talk) 19:32, 16 January 2024 (UTC)Reply[reply]
 Thanks Geardona (talk to me?) 19:34, 16 January 2024 (UTC)Reply[reply]

Also WP:Crime labels. Both new. -- GreenC 06:33, 15 January 2024 (UTC)Reply[reply]

Can Wikipedia Provide An AI Tool To Evaluate News and Information on the Internet[edit]

(refactored from Help desk)

The integration of a neutral and curated AI platform with search engines has the potential to revolutionize information access and combat misinformation. By providing context, flagging biases, and promoting critical thinking, these platforms can empower users to navigate the complex information landscape of the internet. Perfedge (talk) 06:06, 15 January 2024 (UTC)Reply[reply]

Wikipedia is an online encyclopaedia, not a search engine, so that would be out of scope for the project. And while the Wikimedia Foundation might possibly have the funding to finance such a project, if it were feasible, they don't have much of a reputation for creating complex software with any degree of rapidity. And frankly, the evidence that current AI is even capable of fulfilling the objectives you describe would appear to be lacking. 'Critical thinking' will, for now, have to be facilitated by actually doing the thinking ourselves. AndyTheGrump (talk) 06:17, 15 January 2024 (UTC)Reply[reply]
Oddly enough, that statement sounds like something a chatbot would spit out. Or a "corporate mission statement." But I have to object to using "curated", "revolutionize", "combat", "promoting", "empower" and "landscape" in just two sentences.
Also, no. Per AndyTheGrump, we will have to continue to use brains. 🌺 Cremastra (talk) 22:40, 16 January 2024 (UTC)Reply[reply]

Ignoring the lead comment (which, no offense to the user, sounds like machine-generated businessbabble, hold the synergy), I'm intrigued by the header question. I don't think ChatGPT in any upcoming commercial iteration will be particularly useful as a dynamic tool for evaluating sources, not least because it doesn't get updated fast enough. However, at its most basic, I'd like to see if it can give some intelligent analysis (at either the individual article level or of an outlet, given a sample of articles) that's better than some of the crud at Media Bias/Fact Check. (Have MBFC's writeups gotten better in the last couple years? Their methodology for comparing outlets actually seems, in theory, not bad.)

To start, I tried asking ChatGPT (3.5) in a couple of prompts to evaluate the bias and reliability of an article from an inner-metro local paper, the Dorchester Reporter, figuring that the author is unknown and the paper would not be much evaluated in any secondary media GPT may have seen. I'm currently having trouble getting the output to say anything of substance (it almost seems to be telling me to do my own homework!) (and yes, I'm including the full article plaintext in the prompt). If anyone else is interested, see if you can get anything substantial -- I have prepared a regional cross-section of local outlet stories on the same topic, and also a spread of sample stories from a new outlet, to see what kind of reports it will give. SamuelRiv (talk) 01:54, 17 January 2024 (UTC)Reply[reply]

Export to PDF, Epub, Odt and LaTeX[edit]

Hi, I would like to include mediawiki2latex in Wikipedia. It exports wiki articles to PDF, Epub, Odt and LaTeX. Yours, Dirk Hünniger (talk) 11:43, 15 January 2024 (UTC)Reply[reply]

Improving Wikipedia's Editing Process[edit]

I'm an expert in an emerging technology called "General Collective Intelligence" or GCI platforms, which aim to significantly increase the collective intelligence of groups, and hence to significantly increase impact on any collective outcome in general that a group might target, per dollar of resources spent. This has the potential for radically disruptive positive impact in a vast range of areas from addressing poverty, to sustainable economic development, to addressing the environmental degradation that has been associated with climate change, but I'd like to begin with improving Wikipedia's editing process. In summary, a general collective intelligence identifies different problem solving strategies and the domains in which each strategy is optimal in order to switch to the optimal strategy for each problem. Some typical issues that plague Wikipedia (like edit warring) are just symptoms of a clash between problem-solving strategies rather than the collective intelligence to cooperate in using the optimal strategy. A short summary is linked below. I look forward to your guidance.

https://en.wikipedia.org/wiki/User:CognitiveMMA/sandbox CognitiveMMA (talk) 14:39, 15 January 2024 (UTC)Reply[reply]

Please provide citations to appropriate third-party sources supporting your claims concerning the abilities of 'General Collective Intelligence'. AndyTheGrump (talk) 14:44, 15 January 2024 (UTC)Reply[reply]

To achieve collective intelligence one needs a lot of permutations, and most Wikipedia articles tend to reach a "good enough" state, then don't deviate much beyond that due to the work involved in rewriting existing text. It's actually harder to rewrite an existing article than to create a new one. In rewriting, you have to consider and incorporate what is already there, plus add new content around it. Very time consuming. Most users don't undertake that unless they are really dedicated, or the existing text requires WP:TNT. Most articles reach a point, then don't evolve much (ignoring all the maintenance and trivial edits). A better example of GCI is generative AI: because it speeds up the evolution process to (literally) light speed, one can quickly see collective intelligence generating new things of value. -- GreenC 16:24, 15 January 2024 (UTC)Reply[reply]

Uh, what? CactiStaccingCrane (talk) 17:31, 15 January 2024 (UTC)Reply[reply]
Sounds like doi:10.1007/978-981-19-2840-6_18. It's probably not a notable subject, and I doubt that it would work. That is, it would only work if nearly all the participants were truly open to any outcome, which is basically never the case when there's a dispute. It would have us be hyper-rational when people are being emotional. It could probably turn into a decent consulting career specializing in things like creating corporate mission statements and such, though. WhatamIdoing (talk) 17:52, 15 January 2024 (UTC)Reply[reply]
Indeed, uh what? Or to put it another way, 'citation needed'. I know it's fashionable these days to claim that AI can solve all of the world's problems before tea-time, but we actually need concrete verifiable evidence to justify using it on Wikipedia. AndyTheGrump (talk) 17:56, 15 January 2024 (UTC)Reply[reply]

Understandablifying advanced math articles -- GPT and the Holy Grail[edit]

Forever ago (2007) I created a few theoretical comp sci and neurosci articles -- stressing the content is mostly math. One of them, which I (mis?)remember but cannot seem to find now, attracted a comment on the Talk page to the effect that it was completely incomprehensible and the worst article they had ever read on WP. (I had thought it was Flashsort, which I wrote at WP:REQuest and which attracted some mockery, but I might be confusing stuff.) I had a good laugh at this -- if you check the Talk pages of WP math articles, such comments aren't exactly unusual -- and I always wondered, in my time educating and writing, whether there wasn't some way to break through the seemingly dense incomprehensibility barrier that's almost universal in rigorous technical text.

Fast-forward to today, with people figuring out use cases for ChatGPT (3.5). I typed in the following: Rewrite the wikipedia article on Flashsort into simplified English

ChatGPT 3.5: Flashsort into simplified English

Flashsort is a way to arrange numbers or data in order, from the smallest to the biggest. It's a special method that does this really quickly.

When we want to sort things, like a list of numbers, there are different methods to do it. Some are faster than others. Flashsort is one of the faster ones.

Here's how it works:

  • Dividing the Range: First, we look at the numbers we want to sort and find the smallest and biggest numbers. This helps us figure out how big our sorting range is.
  • Creating Buckets: Imagine we have a bunch of boxes, and each box can hold a certain range of numbers. We create these boxes, also called "buckets," to fit the range of numbers in the list.
  • Distributing Numbers: Now, we go through our list of numbers and quickly put them into the right buckets. We do this by calculating where each number should go based on its value.
  • Sorting Inside Buckets: After we've put all the numbers in their respective buckets, we sort the numbers inside each bucket. We can use any sorting method we like, but inside each bucket, the numbers are already close to each other, so we don't have to do a lot of work.
  • Joining Buckets: Finally, we take the sorted numbers from each bucket and put them together. When we do this, the numbers are all sorted from smallest to biggest.

Advantages of Flashsort:

  • It's fast, especially when dealing with a large number of items.
  • It's efficient because it doesn't waste time comparing every number to every other number.
  • It works well when the numbers are distributed unevenly, meaning some numbers are way bigger or smaller than others.

Limitations of Flashsort:

  • It might not be the best choice for sorting very small lists.
  • It needs some extra memory space to create the buckets.

In simple terms, Flashsort is a clever way to quickly sort numbers by dividing them into buckets, sorting those buckets, and then putting everything together. It's like sorting toys into different boxes based on their size and then arranging the toys inside each box neatly. This helps us find what we need faster.
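(For the curious, the five steps described above can be sketched in a few lines of Python. This is an illustrative sketch only -- real flashsort permutes elements in place rather than building explicit bucket lists, and the bucket-count heuristic here is just one common choice:)

```python
def flashsort_sketch(data, num_buckets=None):
    """Bucket-based sort following the five steps above.

    Illustrative only: real flashsort permutes elements in place
    instead of building explicit per-bucket lists.
    """
    if len(data) <= 1:
        return list(data)
    lo, hi = min(data), max(data)              # step 1: find the range
    if lo == hi:
        return list(data)                      # all values equal: already sorted
    m = num_buckets or max(1, len(data) // 2)  # heuristic bucket count
    buckets = [[] for _ in range(m)]           # step 2: create the buckets
    for x in data:                             # step 3: distribute in O(n)
        buckets[int((m - 1) * (x - lo) / (hi - lo))].append(x)
    result = []
    for b in buckets:                          # steps 4-5: sort each bucket, join
        result.extend(sorted(b))
    return result
```

Because classification is a single arithmetic formula per element, the distribution pass is linear; the overall speed then depends entirely on how evenly the buckets fill.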

The output is mostly accurate. More to the point, while it doesn't sound completely precise in a mathy sorta way, when I went over it a few times I couldn't think of anything wrong with it, nor anything disqualifyingly imprecise about the text. It reminded me of a really well-written teen/lay science book, except on such an esoteric topic that nobody would ever put it in a teen or pop book.

All I can think is that this is a game changer, especially with GPT-4. The lede of almost every major, but densely written, technical article can be run through this and then curated, since the lede does not have to be 100% precise on any topic -- history, agriculture, sports, whatever -- that's for the body. Even more significantly, this could be an engine to draft minor articles for review for Simple English WP, which is still anemic by comparison.

Again, to emphasize, (as if anyone anywhere serious disagrees or hasn't considered this:) all generated material is a draft that must be reviewed in its entirety for content; and (lesser consensus on this:) any lengthy prose retained verbatim for WP or other publication should include attribution to content generation tools used. SamuelRiv (talk) 01:59, 16 January 2024 (UTC)Reply[reply]

I think the "mostly" in "mostly accurate" is a problem. ChatGPT and these things have a known reputation for saying untrue things. In practice, I think this suggestion will fail because it's a lot of effort to check each claim, and people will often forget to do so. Jo-Jo Eumerus (talk) 08:48, 16 January 2024 (UTC)Reply[reply]
And let's not forget that the sheer existence of English Wikipedia and Simple English Wikipedia is one of the reasons this model even knows how to do this. —TheDJ (talkcontribs) 09:34, 16 January 2024 (UTC)Reply[reply]
You really don't see "anything wrong" with "It works well when the numbers are distributed unevenly"? I'm also going to take issue with why it says "Sorting Inside Buckets" is fast, the very existence of the "Joining Buckets" step, and (relatedly) that it doesn't say anything besides "quickly" about how this differs from a histogram sort, the explanation of which forms the bulk of the article. —Cryptic 10:09, 16 January 2024 (UTC)Reply[reply]
I know the idea lab is for positive feedback, but this example is not great. There are basic errors: not only does flashsort not work well when the numbers are distributed unevenly (because some buckets remain big), but "uneven" doesn't mean some numbers are way bigger or smaller than others (1, 2, 10 is uneven; 1, 1001, 2001 isn't). Worse, it sounds credible: having sucked the reader in with a few truisms, it hallucinates plausibly. Even some true facts are misleading: it doesn't waste time comparing every number to every other number, but even bubble sort isn't that dim. Less importantly, the tone is unencyclopedic, but that covers up the fact that even the correct statements are vague and imprecise (fast rather than O(n), etc.). One day we may be able to curate good AI-written articles, but we seem to be a long way from that yet. Certes (talk) 14:16, 16 January 2024 (UTC)Reply[reply]
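(Certes's point about unevenly distributed input can be made concrete with a quick count of how equal-width buckets fill up -- `bucket_counts` is a hypothetical helper written for this illustration, using the same classification formula a flashsort would:)

```python
def bucket_counts(data, m):
    """Count how many values land in each of m equal-width buckets."""
    lo, hi = min(data), max(data)
    counts = [0] * m
    for x in data:
        counts[int((m - 1) * (x - lo) / (hi - lo))] += 1
    return counts

# Evenly spread values fill the buckets roughly uniformly:
print(bucket_counts(list(range(100)), 10))
# -> [11, 11, 11, 11, 11, 11, 11, 11, 11, 1]

# One extreme outlier stretches the range so nearly everything
# crowds into a single bucket, and the "sorting inside buckets"
# step stops being cheap:
print(bucket_counts(list(range(99)) + [10_000], 10))
# -> [99, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```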
Some things to clarify:
First, I did all this several months ago and only have notes of the prompt and output. I should have reread instead of going just from memory. Yes, the "Advantages" and "Limitations" are awful, so I shouldn't have flatly said there wasn't "anything wrong".
Second, we can show it with a better article example (one I actually cared to remember anything about), and also on GPT-4, but what impressed me is the ability to put technical text into a readable format while still accurately describing the steps of the algorithm. I'd still say on a quick re-read there's nothing disqualifyingly wrong in what is affirmatively stated (beyond the two sections noted), though there is significant omission (it's a very short output for a long technical article) -- my point is that it can help write good ledes, where you will always have some imprecision and omission. I'm not sure where specifically you see hallucinations per se -- it doesn't go off into outer space in the sections noted.
Third, you can fine-tune tone with prompts, and you can fine-tune how much mathematical language you want in there too (or just add it yourself in the curation process). For the prompt I used, copied literally, I'd say it output something expected.
Fourth, who said anything about "AI-written articles"? I'm talking about a tool to aid in writing -- the kind of thing the whole rest of the ChatGPT-using population is using it for. SamuelRiv (talk) 18:06, 16 January 2024 (UTC)Reply[reply]
I agree that a simple prompt like the one you have provided can easily be improved to get better results. Pinging @JPxG who has a lot of experience in this area. — Qwerfjkltalk 19:37, 16 January 2024 (UTC)Reply[reply]