
Help talk:Searching/Archive 1

From Wikipedia, the free encyclopedia

Some questions

  • Is it possible to do a boolean "Turing AND homosexuality" search?
Not easily. This works, however: "(?s)(Turing.*homosexuality)|(homosexuality.*Turing)". The search will take forever.
  • Can this article be split into "basic" and "advanced" searching advice? Bringing up regular expressions so early on might be a culture shock for non-Unix weenies :)
  • Though the new search is faster, the quality of its results is extremely low compared to the old slow search. For example, I just (Oct 18, 2001) searched for "Jim Henson" and got pages and pages and pages of hits, but none of what I was looking for showed up in the top 30. The slow search, on the other hand, produces the right results. Has anyone tested or fine-tuned this new search engine on Wikipedia? It is quite unusable as is.
    • See Wikipedia commentary/Search Engine. The new search indexes the articles periodically, so new pages don't show up for a bit. The old search simply uncompresses the entire database and greps it, so it will find changes as soon as they happen... if it ever finishes searching. Neither system is perfect, so hopefully there will be improvements. --Stephen Gilbert 19:05, 18 October 2001 (UTC)
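The regex workaround for a boolean AND can be tried offline. A minimal Python sketch, assuming the article text is available as a plain string (the function names are hypothetical). It shows the either-order pattern quoted above next to an equivalent lookahead form that avoids listing every ordering; both still read each article in full, which is why running them over the whole database is so slow.

```python
import re

# Pattern quoted above: either word order; (?s) lets '.' cross line breaks.
EITHER_ORDER = re.compile(r"(?s)(Turing.*homosexuality)|(homosexuality.*Turing)")

# Equivalent lookahead form: each (?=...) independently checks for one term,
# so adding a third term does not require listing every ordering.
LOOKAHEAD = re.compile(r"(?s)^(?=.*Turing)(?=.*homosexuality)")


def matches_all(text: str) -> bool:
    """True if both terms occur somewhere in text (either-order regex)."""
    return EITHER_ORDER.search(text) is not None


def matches_all_lookahead(text: str) -> bool:
    """True if both terms occur somewhere in text (lookahead regex)."""
    return LOOKAHEAD.search(text) is not None
```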

Opera 8 Doesn't Work

I tried using the instructions for adding a Wikipedia Quick Shortcut to Opera 8, and it doesn't work at all. If I try typing in "w test" into my address bar, Opera tries to go to wtest.com.

Multilingual Searching

For anyone who speaks more than one language, it would help if the search engine returned hits from other languages (not full-text search, I think) when no article is found. 192.33.101.239 14:23, 20 Jul 2004 (UTC)

Google searching of the English Wikipedia

The search page's Google search form should be fixed to search en.wikipedia.org instead of www.wikipedia.org since most English pages are now indexed by Google at the new domain.

"self" not found

Why can't I find the word "self"? How is that badly formed?

See Wikipedia:Common words; searching for these is not possible, sorry. If you type www.wikipedia.org/wiki/ followed by your search word into your browser, it will work. Try http://www.wikipedia.org/wiki/Self -- Fantasy 16:54, 13 Aug 2003 (UTC)
PS: Or use Google in combination with Wikipedia.
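Building the URL by hand follows a simple pattern. A small Python sketch, assuming the usual Wikipedia title conventions (spaces become underscores, the first letter is capitalised) and the www.wikipedia.org/wiki/ base given above; the helper name is hypothetical.

```python
from urllib.parse import quote


def article_url(word: str, base: str = "http://www.wikipedia.org/wiki/") -> str:
    """Turn a search word or phrase into a direct article URL (hypothetical helper)."""
    title = word.strip().replace(" ", "_")
    # Article titles start with a capital letter, so "self" and "Self" name one page.
    title = title[:1].upper() + title[1:]
    # Percent-encode anything unusual, but keep ':' so namespaced titles stay readable.
    return base + quote(title, safe=":/")
```

For example, `article_url("self")` gives the Self link quoted above.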

Ignoring words

While length is not a useful way to decide what goes in the index, as the postings above have made clear, the most common words should certainly be excluded. But why not just ignore such words rather than forcing the user to delete them and try again? Google does this, simply informing the user on the results page that "XXX is a very common word and was ignored." -- Michael Shulman



To do this, I'd have to include the entire MySQL stoplist in the Wikipedia software itself. I'm not sure that's worth the effort. LDC
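For a sense of what the Google-style behaviour would involve, here is a minimal Python sketch. The stoplist below is an assumption: only a few words this page reports as unsearchable, not the real (much longer) MySQL list, and the function names are hypothetical.

```python
# Illustrative subset only: words reported on this page as unsearchable.
STOPWORDS = {"the", "have", "will", "down", "four"}


def strip_stopwords(query: str) -> tuple[str, list[str]]:
    """Split a query into the part that can be searched and the ignored words."""
    kept, ignored = [], []
    for word in query.split():
        if word.lower() in STOPWORDS:
            ignored.append(word)
        else:
            kept.append(word)
    return " ".join(kept), ignored


def notices(ignored: list[str]) -> list[str]:
    """Messages to show on the results page instead of rejecting the query."""
    return [f'"{w}" is a very common word and was ignored.' for w in ignored]
```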


I come from the German Wikipedia, and that language has, like many others, diacritical letters, which are often expressed by some other means, e.g. 'ss' instead of the sharp s 'ß', or 'ae' instead of 'ä'. The situation is similar in Spanish (e.g., á é í ó ú). Names are often written in the spelling of the original language (Perón of Argentina), or simply as Peron.


I think it was a mistake to change Artcyclopedia, because before, I was able to find, for example, DAUBIGNY in all the museums of the world. Please don't change painters online; it is the best I have ever seen on the internet. http://www.guillermograndal.com


Thus it would be great to have 'equivalent characters', which would let a user say that 'o' and 'ó' are to be treated identically in this search.

Have there been any thoughts in this direction? -- Schewek

Yes, that's planned for the new improved search engine. I can't guarantee when this'll be ready, though. --Brion VIBBER
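One common way to get such "equivalent characters" is to fold both the indexed text and the query to a base form before comparing. A minimal Python sketch of that technique (not the planned search engine's actual method; the function name is hypothetical):

```python
import unicodedata


def fold(text: str) -> str:
    """Fold case and strip diacritics so that 'Perón' and 'Peron' compare equal."""
    # casefold() lower-cases and also maps the German sharp s 'ß' to 'ss'.
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    # NFKD splits 'ó' into 'o' plus a combining accent; drop the combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Note that this folds 'ä' to 'a', not 'ae'; treating the German 'ae' spelling as equivalent would need an extra rule.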

Searching for Down fails. But not all four-letter searches fail. I know there's an entry labeled Down, why does the search fail? -- Zoe


I've mentioned the arbitrary length to Lee on the mailing list; he agreed that it was a problem (consider these searches: malcolm x, george w bush, pi). Anyway, he promised to look into it. Koyaanis Qatsi 16:54 Jul 22, 2002 (PDT)

But why can I find blue, fire, cats, etc.? I can't find will, don't even know if that's an entry. It isn't a standard that all four-letter searches fail. What's the criterion? -- Zoe

You can find "blue", "fire", and "cats" just fine. And the new software produces useful results for "cat", "ct", and "pi". MySQL still won't index single letters, so "Malcolm X" does find "Malcolm X", but only because it finds every Malcolm. "Will" is a problem because it is in MySQL's stoplist, i.e., the common words like "the" and "have" that are not indexed. That's the problem with "down" as well (though that makes a little less sense; I'm not sure what MySQL's criteria were for their stoplist). LDC


Oh, so you've fixed that already. I missed the announcement, sorry. Thanks for all your work, BTW. Koyaanis Qatsi
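LDC's explanation of why "Malcolm X" finds every Malcolm can be seen in a toy tokenizer. A sketch, assuming a minimum indexed word length of two characters (consistent with "ct" and "pi" working but single letters not); the real MySQL setting may differ, and the names are hypothetical.

```python
import re

MIN_WORD_LEN = 2  # assumption: single letters are dropped, two-letter words kept


def indexed_terms(text: str) -> list[str]:
    """Words that would survive into the full-text index (toy model)."""
    words = re.findall(r"\w+", text.lower())
    return [w for w in words if len(w) >= MIN_WORD_LEN]
```

Because the query passes through the same filter, `indexed_terms("Malcolm X")` leaves only `"malcolm"` to search for.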

Will

Wrote a page on will. Also wrote a page on poverty, but I'm worried about the fact that Wikipedia is not a dictionary. Mswake 12:31 Jul 26, 2002 (PDT)


Funny to find discussion on will (which, btw, you still can't find by searching) because I can't access my article on free will by searching. What gives?

Phrases

Is there a way to search for phrases? Bob Jonkman


I don't really know about these things, so I won't do it myself, but should the link at the bottom of this page to Google reference wikipedia.org rather than wikipedia.com now? --Camembert

Nope - .org isn't completely indexed by Google yet. Compare "United States" site:www.wikipedia.org with "United States" site:www.wikipedia.com. Searching the .org address doesn't even find our United States article. --mav

Mozilla

Can anyone tell me exactly what to do to get mozilla to search wikipedia by default? --the semi-computer literate KQ

Add the Wikipedia (EN) search plugin, available at Mycroft site, then go to Edit->Preferences->Internet Search and make it your default plugin by selecting it from the dropdown list. :) --Unforgettableid | talk to me 05:43, 15 December 2005 (UTC)

British spelling

In order to make searching work reasonably, we have to be aware of American / British spelling differences. For example, if you search for "electronic colour code", you fail to find the article electronic color code, which was presumably originally written by a USAite. As the text is written, there is no convenient way to slip the word "colour" into the body of the text so that it gets found in a search.

I've tried adding text in HTML comments <!-- electronic colour code -->, which seems to work as a comment if it is on a line by itself, but not if it is embedded mid-paragraph. Search doesn't find them. Is there a way of adding "keywords" for searching to an article? Is there a way (like misspelling) of automatically making a search for either color or colour actually search for "(color or colour)"? -- SGBailey 22:26 Dec 26, 2002 (UTC)

Just make a redirect (as I just did) and mention the alternate spelling on the first line. Then searches will work. --mav
What you actually appear to have done is to create a new article electronic colour code which is a redirect to the US spelling and then have linked to the redirect from Talk:electronic color code to prevent it being an orphan. -- Fine. Which FAQ should this tit-bit of information go in? (I'm happy to put it there.) -- SGBailey 22:57 Dec 26, 2002 (UTC)
See Wikipedia:Contributing FAQ (do a find on "American"). This FAQ does need help. Your other questions will have to be answered by a developer. --mav

What do we do for "significant" search keywords which are not in the article name? As an *example*, if there were an article 'Famous actors', we might have the text "theater" in the article but want "theatre" to also work in searches. -- SGBailey 22:57 Dec 26, 2002 (UTC)
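The automatic "(color or colour)" rewrite SGBailey asks about is a form of query expansion. A minimal Python sketch, assuming a hand-maintained variant table (the table and function names are hypothetical, and a real table would need many more entries). It joins terms with an explicit "and" so the parenthesised groups are not left to implicit handling.

```python
# Hypothetical spelling-variant table; each entry lists all accepted forms.
VARIANTS = {
    "color": ["color", "colour"],
    "colour": ["color", "colour"],
    "theater": ["theater", "theatre"],
    "theatre": ["theater", "theatre"],
}


def expand_query(query: str) -> str:
    """Rewrite each known variant as an OR group; join all terms with AND."""
    parts = []
    for word in query.split():
        forms = VARIANTS.get(word.lower())
        if forms:
            parts.append("(" + " or ".join(forms) + ")")
        else:
            parts.append(word)
    return " and ".join(parts)
```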


Four color theorem

Another one on searches: try searching for the four colour theorem. The following are rejected by SQL:

  • "four color ( theory or theorem )"
  • "( theory or theorem ) four color"
  • "( theory or theorem ) and four and color"

yet the following works:

  • "( theory or theorem ) four and color"

Why? -- SGBailey 22:37 Dec 26, 2002 (UTC)

SQL requires each boolean word (and, or, (), not) to be separated by words being searched (non-boolean words). The format is: boolean, your words, boolean, your words, boolean, and so on.
None of your first three failed searches met this criterion:
  • "four a boolean here color ( theory or theorem )"
  • "( theory or theorem ) four a boolean here color"
  • "( theory or theorem ) another word here and four and color"

Can the search engine developers confirm this? And for SGBailey, Try Wikipedia:Searching -- User:kt2

Short answer: the boolean magic in our search engine is very fragile; one of these days we're going to throw it out and replace it (possibly by upgrading to MySQL 4.0, which has built-in boolean magic in its fulltext search). Until then, boolean searching is more of an art than a science.
In this particular case, the "four" is causing trouble, as it's in MySQL's "stopword" list: it's one of a number of common words that it assumes won't bring useful search results, so they aren't indexed. Our fragile search does separate matches on each word and then ands/ors them together; searching for a stopword thus gives _no_ results for that word's match, and in the common 'and' case gives a non-intuitive total result (ie, nothing!). So, we silently strip stopwords from your query before parsing it: thus "four color ( theory or theorem )" becomes "color (theory or theorem)". Note that 'and's are implicitly added most of the time, but parentheses muck up the works: search explicitly for "color and (theory or theorem)" and you'll get your man. --Brion 08:21 Dec 27, 2002 (UTC)
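Brion's point about a stopword emptying an AND query can be seen with plain set operations. A toy sketch, assuming a hypothetical inverted index (word to set of article IDs, with IDs invented for illustration); it is not the actual search code.

```python
# Toy inverted index: word -> IDs of articles containing it.
# "four" is absent because stopwords are never indexed.
INDEX = {
    "color": {1, 2, 3},
    "theorem": {1, 4},
    "theory": {2, 5},
}


def postings(word: str) -> set[int]:
    """Articles matching one word; an unindexed word matches nothing."""
    return INDEX.get(word, set())


# "four color (theory or theorem)" evaluated naively: ANDing with the empty
# posting set for "four" wipes out the whole result.
naive = postings("four") & postings("color") & (postings("theory") | postings("theorem"))

# The same query after the stopword is stripped.
stripped = postings("color") & (postings("theory") | postings("theorem"))
```

Here `naive` is empty while `stripped` holds articles 1 and 2.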

Hi!
I'm puzzled by the search box at the bottom of each page. There are two words (to the right of the box), SEARCH and GO, but they seem to do the same thing. I've never seen two words to choose from on any other web site. Please explain! Arpingstone 10:32 Jan 28, 2003 (UTC)

If you type the exact title of an article in the search box and hit the Go button, you will be taken directly to the article, whereas if you hit the Search button you will be shown a list of pages containing the text. Mintguy

Problem

There seems to be something wrong with the search engine. Not a major functional problem - it's finding things all right - just with the way it's displaying the results. It used to be that each item on the search results list would have the article title followed by an extract from the article with the search term/s highlighted; what I'm getting now is the article title followed by 50 characters from each of the first five lines of the article, which isn't often very helpful. Does anybody know what's going on? -- Paul A, 4 Feb 2003 8:30 UTC

Side effect of a quick performance hack I added. I'll try to fix it. --Brion 08:34 Feb 4, 2003 (UTC)

It also seems to be displaying things in a different order. In fact, I can't work out the logic of the order it's displaying things at all. For example, I just searched for Lou Harrison, and the first fourteen article text results had "Lou" and "Harrison" in them, but not the whole phrase "Lou Harrison". As a result, a lot of irrelevant stuff is given prominence. Sorry if this has been brought up somewhere else, I've not spotted it. --Camembert

Actually, now I check again, many (maybe even all) of those first 14 results don't have "Lou" in them at all, just "Harrison". --Camembert 20:21 Feb 7, 2003 (UTC)
Searching for phrases is unfortunately not possible, and additionally Boolean "and" search has temporarily been disabled. Now there is no point in putting "Lou Harrison" in the search box; it works as "or". Choose "Lou" or "Harrison", whichever you think occurs less, to minimize the number of undesired results. - Patrick 13:10 Feb 9, 2003 (UTC)
Ah, I didn't realise that "AND"s were no longer being quietly inserted between search terms and that Boolean searches were in any case disabled. However, this meta page says "Results with all terms will be returned preferably, but partial results should show up as well, further down in the list", but this isn't what is happening: results are just getting mixed up in any old order, so that results with all terms might be at the bottom of the list. But anyway, if full functionality will be restored in time, that's fine. --Camembert
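The ordering the meta page promises (all terms first, partial matches further down) amounts to sorting by how many query terms each page contains. A minimal Python sketch, assuming page text is available in memory keyed by title (the names and sample pages are hypothetical); a real engine would combine this with its relevance scores.

```python
def rank(pages: dict[str, str], terms: list[str]) -> list[str]:
    """Order titles so pages containing more of the query terms come first."""
    lowered = [t.lower() for t in terms]

    def matched(title: str) -> int:
        words = set(pages[title].lower().split())
        return sum(t in words for t in lowered)

    scored = [(title, matched(title)) for title in pages]
    # sorted() is stable, so pages with equal counts keep their original order;
    # pages matching no terms at all are dropped.
    return [title for title, n in sorted(scored, key=lambda p: -p[1]) if n > 0]
```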

Acapedia

rm acapedia, because the cached version redirects to the current wikipedia version for some links - MyRedDice

What do you mean? Google caches a 2nd copy of wikipedia articles through acapedia, as far as I have seen, and a better one: it does not have the problem that Google text interferes with the top of the article text. - Patrick 22:30 Mar 18, 2003 (UTC)

I went through a few links from such a google search, and I clicked on the "Cached" link, and I found myself redirected to the live, non-cached, wikipedia page. For example, click on the "Cached" link here.

It seems you have searched the whole web for the term "acapedia", but you have to search acapedia (take my link) for some other term, for example "rijngouwelijn". Then you get http://www.google.com/custom?hl=en&lr=&ie=ISO-8859-1&cof=&domains=acapedia.org&q=rijngouwelijn&btnG=Google+Search&sitesearch=acapedia.org

The second cached result is

http://216.239.39.100/custom?q=cache:PZohTDY8i6QC:acapedia.org/aca/Light_rail+rijngouwelijn&hl=en&ie=UTF-8

which is better than

http://216.239.39.100/custom?q=cache:kkgOR6oDp00C:www.wikipedia.org/wiki/Streetcar+rijngouwelijn&hl=en&ie=UTF-8

Congratulations

Congratulations on making the wikipedia totally unusable in all practical respects. If you can't search the database then what's the point in having one? I'm not going to bother writing article when nobody can find them for a month. Goodbye. KJ 01:12 Apr 4, 2003 (UTC)

The search function has been disabled during peak hours because of performance problems: the idea is that if it's left fully functioning, the database grinds to a halt and cannot be used at all. Having no up-to-date search function for much of the day is indeed annoying, but having no working database whatsoever would be even worse.
This is very much a temporary thing, and the problem is being worked on - apart from ongoing efforts to improve the performance of the site in general, there's supposed to be a new server being installed which, if I understand things correctly, is going to make the search function available once more. Come back when it's up and working again, won't you? It'd be a shame to lose you. --Camembert
By the way - if you know the name of the article you want to go to, you can type it in the search box and hit "Go" - that function is still working. --Camembert

Wikipedia namespace

It would be good if the Wikipedia namespace were included in the title search while Wikipedia's internal full-text search facility is temporarily disabled.

Even when Wikipedia's internal full-text search facility is on, a title search option would be useful (it is faster if there are many hits in the full text).

Patrick 12:04 Apr 16, 2003 (UTC)

Search page too complicated

I think this page is overly complicated, and the struck-through paragraphs make it even more confusing.

A simple rewrite could be based on examples, possibly similar to the How to edit a page page. I will try to simplify it. -- Rotem Dan 21:22 Apr 16, 2003 (UTC)


Curious - shouldn't we search using bomis.com, since they're kind enough to provide us hosting?

Search broken

Moved from Wikipedia:Village pump

Is it just me, or is search completely broken at the moment (ie. returning no article title matches for keywords that I know should have matches)? --rbrwr

It's not just you, that's for sure. -- John Owens
I went ahead and disabled the title search yesterday so I could actually get at pages with less than a fifteen minute wait. Selfish of me, I know. --Brion
Well, it doesn't seem to be working very well anyway, as a fix for that. ;) -- John Owens 19:26 Apr 22, 2003 (UTC)


Why not tie the search directly into Google, instead of letting users think nothing has been found for their search? Cgs
Agreed; search directly into google, or at least have some sort of notification on the results page that title search has been disabled rather than just saying "no results found"... it took me almost two days of "gee, why isn't there an article about that??" before I realized what was going on. :) kwertii
Ugh! Yes, either fix the title search or label it as broken, please! Logotu 20:41 Apr 24, 2003 (UTC)
I figured out that the title search was broken purely because it was telling me that there were no article title matches for articles that I knew perfectly well were already there. I'm just glad Tim Starling told me about a couple of workarounds that I could use instead, like the "Go" button and the Google search, though I still prefer the article title search. I can do without the article text search, but I did miss being able to search titles! Susurrus 08:58 Apr 29, 2003 (UTC+10)
I always use google search now, even when wikipedia search is active. It's much faster, and it provides better control over AND vs OR, and similar things. I'll sacrifice up-to-the-minute search info for those benefits... Martin 23:36 Apr 28, 2003 (UTC)

Go button

Why is it that when I click the "go" button, it almost invariably takes me to the Talk page instead of to the article page? -- Zoe

I'm not able to reproduce this with a few random search words. Can you give some examples of words which do produce this effect? Is it consistent with the same word? --Brion 02:43 May 5, 2003 (UTC)
If the Go button finds no direct match with the exact title you entered (in some capitalization variants), it does a namespace-independent nearest title match. If the first title match is in the talk namespace, this is what it shows. That basically depends on the storage order in the database, which is more or less random. A smarter behavior might be to do another title match search for an article in another namespace until it finds one, and only display the talk namespace if there is none, but that would also mean more tries -- more queries -- less performance. Probably not significantly less, though. Eloquence 03:01 May 5, 2003 (UTC)
Ahh, that would be a problem particularly for pages which had been deleted and recreated, or renamed. Would tossing an "ORDER BY cur_namespace" in the query help? This would preferentially return the pages from non-talk namespaces in nearly all circumstances. That might mess up the search results order, though. Perhaps simply upping the LIMIT so the single query returns several results, which we pick through ourselves? --Brion 03:26 May 5, 2003 (UTC)
Yes, the manual-picking-through seems to be the only reliable way to do it -- we don't want to lose MySQL's relevance sorting. --Eloquence 03:58 May 5, 2003 (UTC)
I never even noticed the Go button before. Kingturtle 03:03 May 5, 2003 (UTC)
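The "manual picking through" Brion and Eloquence settle on can be sketched in a few lines: fetch several relevance-ordered rows with a raised LIMIT, then take the first one outside a talk namespace. A hypothetical Python sketch; it relies on the MediaWiki convention that talk namespaces have odd numbers, and the function names are invented.

```python
def is_talk(namespace: int) -> bool:
    """Talk namespaces have odd IDs (1 = Talk, 3 = User talk, ...)."""
    return namespace > 0 and namespace % 2 == 1


def pick_go_target(rows: list[tuple[int, str]]):
    """rows: (namespace, title) pairs in MySQL's relevance order."""
    for namespace, title in rows:
        if not is_talk(namespace):
            return namespace, title
    # Only talk pages matched: fall back to the best-ranked one.
    return rows[0] if rows else None
```

This keeps MySQL's relevance sorting intact while still preferring non-talk pages.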

Stopwords in titles??

The list of words not used in searches seems reasonable for words IN articles. Is there any way that the search of article TITLES could use ALL words? -- 217.24.129.50

When we upgrade MySQL to version 4 (which has much better fulltext search capabilities, including exact phrase searching), we'll try to reduce or remove the stopword list. This'll have to wait a bit, as the last couple of revisions have had bugs which specifically affect types of queries that we use. --Brion 17:34 Feb 13, 2003 (UTC)

searching for worcester

I'd like an easy way to search for articles containing the text "worcester" that do not link to Worcester, England, Worcester, Worcestershire, etc, so that I can link them properly (if relevant, of course). Is there any way to do this? Martin 18:45 Feb 22, 2003 (UTC)

What I do in such a case is change my preferences for "Lines to show per hit:" (under "Search result settings:") from the default 5 (I think it is) to something like 150. This means it will return every hit in the first 150 lines of the article, instead of the first 5 lines. Unfortunately this preference has a rather unintuitive name. You'd think that if there were 5 hits in an article, then it would display all of them in the default setting, but rather it only returns the ones in the first 5 lines. It is then easy to read whether the text "worcester" is part of a link or not. I don't know whether this taxes the server too much, but maybe someone in the know would like to comment. --snoyes 22:33 Feb 22, 2003 (UTC)
snoyes, you are a star. Thanks! Martin
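If you have the wikitext of candidate articles (for example from a database dump), the unlinked mentions Martin is after can also be found directly. A Python sketch using a common regex alternation trick; it assumes links always use the [[...]] syntax, and the function name is hypothetical. It matches whole words only, so "Worcestershire" is not counted as "Worcester".

```python
import re


def unlinked_mentions(wikitext: str, word: str) -> list[int]:
    """Start offsets of `word` (whole word, any case) outside [[...]] links."""
    # The first branch consumes entire links so their contents are skipped;
    # only the second branch has a capture group, set for plain-text hits.
    pattern = re.compile(
        r"\[\[.*?\]\]|\b(" + re.escape(word) + r")\b",
        re.IGNORECASE | re.DOTALL,
    )
    return [m.start(1) for m in pattern.finditer(wikitext) if m.group(1)]
```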


Typo

There is a typo on the search stub page that is in place while searching is disabled: "perfromance". I assume that text isn't accessible to ordinary Wikipedians, but if it is, just let me know how and I'll fix it. -- Jketola 21:05, 1 Aug 2003 (UTC)

The search is up, Watson?

Full-text search is back up, with no apparent slowing of the server - what happened? Did a bug just get fixed? -Smack 05:52, 8 Aug 2003 (UTC)

See announcement on the mailing list. Search is temporarily running off a copy of the search index table on the other server. It's a static copy so it will slowly become more and more out of date. --Brion 06:25, 8 Aug 2003 (UTC)
Will it ever be updated? -Smack 06:32, 8 Aug 2003 (UTC)
To reiterate: "temporarily". --Brion 06:38, 8 Aug 2003 (UTC)
Ah, yes. I missed that. But then what? -Smack 06:47, 8 Aug 2003 (UTC)
Quote from above-linked message: "I'd prefer to be running these sorts of things on a third machine, capable of being a full live backup database server, but we don't yet have one." --Brion 07:05, 8 Aug 2003 (UTC)
Thank you for bringing full-text search back, but there are some strange problems. If you type "west nile virus" or "history of germany" and click on "Search", the desired result (the article with exactly this name) is not displayed first. The article West Nile virus is the 49th hit displayed. This may be a weak example, since you can reach the article by clicking GO, but shouldn't the search function list articles containing all the words first? Sorry if this has been discussed before. -- Cordyph 10:11, 8 Aug 2003 (UTC)
See section "limiting results" in Wikipedia:Searching. --Brion 10:16, 8 Aug 2003 (UTC)
Thank you. Sorry for asking the same questions again and again, but it is almost impossible to observe all articles in the Wikipedia namespace. -- Cordyph 10:19, 8 Aug 2003 (UTC)

search is not case sensitive

Can't you make searches case-insensitive by default? Phys

have a look at Wikipedia:Searching#Search_is_case-insensitive, Fantasy 20:46, 13 Aug 2003 (UTC)
Search is not case-insensitive when there are capitalized words in the article title. For example, searching for 'nuts in may' returns no results even though the article with title 'Nuts in May' does exist. Surely it would be desirable to return this article as a match under these circumstances. Koyna
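Koyna's case can be handled by comparing titles case-insensitively. A minimal Python sketch, assuming the list of titles fits in memory; this is one possible approach, not how the Go button's capitalisation stages actually work, and the helper names are hypothetical.

```python
def build_title_map(titles: list[str]) -> dict[str, str]:
    """Key each title by its casefolded form; the first title seen wins a collision."""
    title_map: dict[str, str] = {}
    for title in titles:
        title_map.setdefault(title.casefold(), title)
    return title_map


def lookup(query: str, title_map: dict[str, str]):
    """Return the stored title matching query in any capitalisation, or None."""
    return title_map.get(query.strip().casefold())
```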

Can't search abbreviations

Searching for abbreviations, such as "MP3" or "USA" doesn't work. Why is this?

Short words don't work; see Wikipedia:Searching#Avoid_short_and_common_words, sorry.
Instead, try just typing the URL directly:
Hope this helps, Fantasy 08:30, 17 Aug 2003 (UTC)

Thanks, I'll remember that, though I still think this search engine desperately needs revising.

Search is disabled?

Moved from Wikipedia:Village pump on Thursday, September 25th, 2003.

So when is it coming back? Wikipedia is virtually unusable without some kind of search capability. Even a link to Google would be nice, like the last time. RickK 19:47, 20 Sep 2003 (UTC)

It'd be useful. But until they link Google, try adding site:wikipedia.org to your Google search. --Menchi 19:53, 20 Sep 2003 (UTC)
Google search form is back. --Brion 20:18, 20 Sep 2003 (UTC)

Server overloads?

Moved from Wikipedia:Village pump on Thursday, October 9th, 2003.

Why do I get so many 'server overloads' when I try searching for any article? It would seem to me that the problem may be lack of bandwidth. If that is the problem then why is it not being dealt with?

The 'go' function will take you to a page if it finds an exact match, but text search is disabled entirely pending server upgrades. --Brion 11:24, 1 Oct 2003 (UTC)
"If that is the problem then why is it not being dealt with?" People can't pull new servers out of thin air. If you want to contribute, go to [1]. CGS 14:14, 1 Oct 2003 (UTC).

Where do searches go?

This is probably a simple data error, but I don't yet know how to fix it myself: When I enter "ct scan" in Wikipedia's mini-search bar I end up on "Ultrasound scan" (a related but different subject). There is a much more relevant page available, computed_axial_tomography. Searching for "CT scan" takes me there. Is it possible for a mere site-visitor to change where a search will take me? --195.22.85.154 14:43, 9 Dec 2003 (UTC)

Weird: when I enter "CT scan" (CT in caps) I get redirected to Computed axial tomography from CT scan; in lower case I get the same result as you do, even though there is no ct scan. Strange. Anyway, you can also press "search" instead of "go" to do a proper search for the words you entered. --snoyes 15:32, 9 Dec 2003 (UTC)
Heh, it happens because of #redirect. Ultrasound scan, which redirects to Medical ultrasonography, contains both "CT" (as the ending of #REDIRECT) and "scan". So, that's what it finds :) Maybe stuff like #redirect should be excluded from searches? Zocky 15:40, 9 Dec 2003 (UTC)
Yes, it is possible. Just create a wikipedia:redirect. Martin 23:16, 16 Dec 2003 (UTC)
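Zocky's diagnosis is the classic difference between substring and whole-word matching. A Python sketch, assuming the nearest-title match behaves like a plain substring test (not confirmed here; the helper names are hypothetical), showing why "ct" hits the redirect text and how a word-boundary match would avoid it.

```python
import re


def substring_match(term: str, text: str) -> bool:
    """Case-insensitive substring test: 'ct' is found inside 'REDIRECT'."""
    return term.lower() in text.lower()


def word_match(term: str, text: str) -> bool:
    """Case-insensitive whole-word test using regex word boundaries."""
    return re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE) is not None


page_title = "Ultrasound scan"
page_text = "#REDIRECT [[Medical ultrasonography]]"
```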


Search Log

Back before the Wiki Search was taken down, there used to be a Search log, where we could see what things people were searching for. Is that still available? RickK 06:32, 23 Dec 2003 (UTC)

That was taken out in mid-2002. It wasn't really directly usable for clicking to create articles as most of the entries were misspellings and/or not exact titles (lowercase, missing articles, with extra terms for 'search engine' style). Further, I don't think most people expect that typing something into a search engine will record their query publicly for posterity. There are privacy issues. --Brion 08:50, 24 Dec 2003 (UTC)
That's too bad. I would think a log ranking search phrases by the number of times each was queried would be helpful in creating useful redirects, renaming pages, or indeed creating new articles. Even the misspelling information seems potentially useful. I don't see why it has to be a privacy concern if no user information is associated with a term. Callistan 20:49, 13 July 2005 (UTC)
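Callistan's proposal, a ranking of query strings with no user data attached, can be sketched with a counter. A minimal Python sketch assuming a list of raw query strings as input (the function name is hypothetical); it normalises case and spacing so trivial variants are counted together.

```python
from collections import Counter


def top_queries(queries: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Most frequent normalised queries; only the query text is ever counted."""
    counts = Counter(" ".join(q.lower().split()) for q in queries)
    return counts.most_common(n)
```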

Search disabled permanently?

Is the text search of Wikipedia permanently disabled? Every single time I go to use it, it says:

"Sorry! Full text search has been disabled temporarily, for performance reasons. In the meantime, you can use the Google or Yahoo! searches below. Note that their copies of Wikipedia content may be out of date."

Or am I doing something wrong? LUDRAMAN | T 17:31, 29 Mar 2004 (UTC)

It was disabled prior to the arrival of the new (fast) servers, but has not been re-enabled. — Jor (Talk) 17:37, 29 Mar 2004 (UTC)


More specifically, (as I understand it) the search was intended to use the one server which wasn't replaced, and is still awaiting return from repairs. - IMSoP 01:19, 30 Mar 2004 (UTC)
Rumour has it the old server is repaired and will be installed on Saturday. I don't think this guarantees the search will be back on though. Angela. 21:03, Mar 31, 2004 (UTC)
Update: that server still has problems. See the Geoffrin woes thread on Wikitech-l. Angela. 18:11, Apr 6, 2004 (UTC)

Case-Sensitive Questions

As I understand it, search is supposed to be case-insensitive, as referenced in Wikipedia:Searching#Search_is_case-insensitive -- but when I search for "dj leslie", the DJ leslie entry doesn't come up. Am I missing something, or should I make a bug report? -- Twiin 15:22, 05 May, 2004 (UTC)

I am finding the same problem: University of York is not found by a search for 'University Of York' or 'university of york', and some articles have redirects to deal with this, such as Morse code and 'Morse Code'. --John Bracegirdle 21:12, 16 May 2004 (UTC)
Search is case-insensitive, but searching is currently disabled. The Go button treats cases differently to the search option; it goes through six stages trying various capitalisation options before taking you to a page. See Wikipedia:Go button for full details. Angela. 07:31, May 17, 2004 (UTC)

Full-text search working

Full-text search appears to be working again, thanks to the new hardware. Very nice. Perhaps an announcement on Wikipedia:Announcements should be in order? - Plutor 19:27, 22 Jun 2004 (UTC)

I've updated the project page accordingly. --Diberri | Talk 04:17, Jun 24, 2004 (UTC)


I miss the google search box, can we have that back as well as the wikipedia searching? Spare a thought for those who can't spell well Dmn 17:30, 25 Jun 2004 (UTC)

Seconded. I too prefer the google search (which is sorted by relevance) to the page text search (but I still prefer Wikipedia's article title search). Having both Wikipedia's builtin search (title and page text) and the google search box would be the best of both worlds. cesarb 20:58, 26 Jun 2004 (UTC)
Where is the google search??? It was so much better... Sam [Spade] 21:46, 27 Jun 2004 (UTC)
Grr, I hate this search thing. Bring back google. See what happens when I typed Hilary Clinton instead of Hillary [2]. This isn't fair on those who can't spell well  :-(( 62.49.5.21 23:17, 27 Jun 2004 (UTC)
Like me! :D Sam [Spade] 23:19, 27 Jun 2004 (UTC)
Bring back the Google box. It made researching copyvios so much easier. - Tεxτurε 19:49, 7 Jul 2004 (UTC)
Here is your Google box: Now stop whining :) Adam Bishop 00:33, 8 Jul 2004 (UTC)

Perhaps you misunderstand? I like how I could so quickly find articles w similar words, yet w different spellings, etc.. The old way of searching was just so handy, and the new way... not :*( Sam [Spade] 00:46, 8 Jul 2004 (UTC)

I, for one, am extremely happy to see our own search engine back. Although it does have some faults, and Google is still very useful for searching outside sources, our own handy engine is much more useful for searching out misspellings, or finding out what needs linking. I commend the developers for bringing it back. Eclecticology 04:11, 8 Jul 2004 (UTC)
Thanks heaps to those responsible for getting Wikipedia's full text search facilities back online; I've recently been comparing Britannica Online to Wikipedia, and (apart from reliability of servers), text search was the only real big technical advantage they have. Having said that, I would support having a Google search option linked to from our native search; sometimes Google searches have their advantages. — Matt 17:51, 8 Jul 2004 (UTC)

Exactly, I didn't mean to say it's bad to have the full text thing, I just don't prefer it in exclusion of the old google option. Google is more handy for finding similar spellings of a given search. Sam [Spade] 04:53, 9 Jul 2004 (UTC)

This is driving me crazy; please give us a little Google search box as well. I spent 10 minutes looking for Laser. Dmn 20:35, 3 Aug 2004 (UTC)

Hear, all ye good people, hear what this brilliant and eloquent speaker has to say! Sam [Spade] 20:39, 3 Aug 2004 (UTC)

Wikipedia Lookups from IE Address Bar

I've discovered a cool way to directly go to Wikipedia articles from the IE address bar in Windows XP. First go here and download TweakUI.exe on the right-hand side:

Microsoft PowerToys for Windows XP

Install it and then run it. Open the Internet Explorer node on the left side, then click Search. Click the Create button, and enter these in the fields:

 Prefix: wp
 URL: http://en.wikipedia.org/wiki/Special:search?search=%s&go=Go

For the politically-minded of us, you can create a similar shortcut for going to Wikipedia namespace articles, like this:

 Prefix: wpw
 URL: http://en.wikipedia.org/wiki/Special:search?search=Wikipedia: %s&go=Go

Then you can type, for example, "wp Wikipedia" in the address bar to visit the article on Wikipedia, or "wpw Village pump", for this page.

Deco 05:12, 11 Jul 2004 (UTC)

That's cool! --Yacht (talk) 05:50, Jul 11, 2004 (UTC)


...and from Mozilla

How to do the same thing in Mozilla, Firefox, etc.:

  • Choose "Manage Bookmarks" from the Bookmarks menu.
  • Press "New Bookmark".
  • Fill in the Location field with one of the URLs given above (exactly the same format).
  • Fill in the Keyword field with the prefix you want, e.g. wp.
  • Press OK, and close the Bookmarks Manager.

That's it; you don't need to download anything. Marnanel 16:51, 13 Jul 2004 (UTC)
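Both the TweakUI prefix and the Mozilla keyword bookmark work the same way: the words typed after the prefix replace %s in the URL template. A minimal Python sketch of that substitution, assuming the words are percent-encoded with urllib's quote() (real browsers may differ in details such as "+" versus "%20" for spaces); the function name is illustrative:

```python
from urllib.parse import quote


def expand_keyword_url(template: str, query: str) -> str:
    """Substitute the typed words for %s in a keyword-search URL template.

    Assumption: the words are percent-encoded with urllib's quote();
    a given browser's exact encoding may differ.
    """
    return template.replace("%s", quote(query))


WP = "http://en.wikipedia.org/wiki/Special:search?search=%s&go=Go"

print(expand_keyword_url(WP, "Village pump"))
# http://en.wikipedia.org/wiki/Special:search?search=Village%20pump&go=Go
```

Encoding matters: an unencoded "&" in a search such as AT&T would otherwise end the search parameter early.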

Or, at least with Firefox, you can go to the link that says "Add engines" in the pull-down menu on the search bar. Just search for Wikipedia and add it to the bar.  – Jrdioko (Talk) 01:12, 14 Jul 2004 (UTC)

The Go, the Search, and the Ugly

Today is July 26, 2004 and I wanted to search the Wikipedia: namespace for articles on identity and anonymity. So I typed "anonymity" in the search box (I use the monobook skin) and clicked "Search" (not "Go"). This gave me a "Search results" page with dozens of irrelevant hits, most of which are not in the Wikipedia: namespace. But this was ok. At the bottom of the page was the form that I was looking for. I unchecked the Main namespace and all other namespaces and only checked the Wikipedia: namespace. In this form, there was no "Go" button, only a "Search" button. So I clicked it. And I immediately landed on the Main:Anonymity page, as if I had clicked a "Go" button. It turns out that the first HTML form had two <input type=submit> buttons. One with value=Go name=go and the other with value=Search name=fulltext. But the HTML form at the bottom of the Search results page had only one <input type=submit> button, featuring value=Search name=searchx. I think this "searchx" should be "fulltext" and that there should be a "go" button next to it. -- LA2 26 Jul 2004
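The difference LA2 describes comes down to which name=value pair reaches the server. A browser submitting a GET form sends the text field plus only the clicked submit button's pair, so the server can tell Go from Search only by the button's name. The sketch below assumes a GET form and a hypothetical server rule (full-text search only when fulltext is present); it is not MediaWiki's actual code, but it reproduces why a button named searchx behaves like Go:

```python
from urllib.parse import parse_qs, urlencode


def form_query(search: str, button_name: str, button_value: str) -> str:
    """Query string a browser sends for a GET form: the text field plus
    only the name=value pair of the submit button that was clicked."""
    return urlencode({"search": search, button_name: button_value})


def dispatch(query: str) -> str:
    """Hypothetical server rule matching the behaviour described above:
    full-text results only when 'fulltext' is present, otherwise 'Go'."""
    params = parse_qs(query)
    return "fulltext" if "fulltext" in params else "go"


for name, value in [("go", "Go"), ("fulltext", "Search"), ("searchx", "Search")]:
    q = form_query("anonymity", name, value)
    print(q, "->", dispatch(q))
# search=anonymity&go=Go -> go
# search=anonymity&fulltext=Search -> fulltext
# search=anonymity&searchx=Search -> go
```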

Does anyone know why the default search (i.e. punching something into the text field and hitting "Search") searches Template Talk? --Ben Brockert 21:55, Jul 26, 2004 (UTC)

Hello?

Wikipedia_talk:Searching#Google_search seems to have a consensus of "let's bring back the Google search". Sam [Spade] 23:32, 8 Aug 2004 (UTC)

I tried adding the Google form to MediaWiki:Powersearchtext but it didn't work correctly. If you want the form added, can you suggest how and where it should be added please. The old Google form will only work when searching is disabled. Angela. 22:18, Aug 11, 2004 (UTC)
Please bring it back. I'm dying here :-( Dmn / Դմն 23:25, 12 Aug 2004 (UTC)
Yay, google is back. Long live google search. Dmn / Դմն 19:15, 18 Aug 2004 (UTC)

People are always quick to moan about what they've lost, but not so quick to cheer for what they've got. Is there any way of still using the Wikipedia search? It was always great for searching the wiki coding on each page. I now miss that.
SimonMayer 13:59, 26 Aug 2004 (UTC)

The full-text search is still available (at least it was yesterday). I recall that it was disabled during peak times, however, but I can't find the reference. --Diberri | Talk 15:18, Aug 26, 2004 (UTC)

That is ok then. I was just concerned that we'd lost something important, but if it's on at late night, I'll cope.
SimonMayer 16:24, 26 Aug 2004 (UTC)

I am currently, together with User:Sj, establishing the page Wikipedia:Tools. The goal is to give an overview of tools for browsing and editing Wikipedia. I would suggest integrating some parts of this page into Wikipedia:Tools and linking to that page here. In any case, it is important to coordinate the contents of these two pages, to avoid overlap and confusion. -- 217.82.181.205 23:57, 24 Aug 2004 (UTC) 23:34, 24 Aug 2004 (UTC) (that is de:Benutzer:Duesentrieb)

Search availability randomness

Why is it that the search function seems to be randomly disabled and enabled every day? One moment it works, then later the same day it just offers the Google/Yahoo search. Is this an automatic load-dependent regulation? Gzornenplatz 15:35, Aug 26, 2004 (UTC)

"Search Wikipedia from a sidebar tab" removed - base64 is not readable enough to be safe

I removed this section because telling people to run a block of unreadable base64 code is not safe or wise. I don't have Firefox or Opera, so I can't test it. When I un-base64'd and un-percent-quoted it, it didn't seem to have any problems, but still, since data: scheme URLs don't need to be in base64, it is better for readability for them not to be. Please find some safer way to write this; it is a good thing to have.

===Search Wikipedia from a sidebar tab===
Works with: Mozilla Firefox, Opera 7.

Wikipedia can also be searched via a sidebar tab of its own. To install the tab, copy the text below into the web address bar and press Enter, then click "Add Sidebar":

 data:text/html;base64,PGEgcmVsPXNpZGViYXIgdGl0bGU9IlNlYXJjaCBXaWtpcGVkaWE
 iIGhyZWY9ImRhdGE6dGV4dC9odG1sLCUzQ2Jhc2UlMjBocmVmJTNEJTIyaHR0cCUzQSUyRiUy
 RmVuLndpa2lwZWRpYS5vcmclMjIlM0UlM0NsaW5rJTIwcmVsJTNEc3R5bGVzaGVldCUyMGhyZ
 WYlM0QlMjJzdHlsZSUyRm1vbm9ib29rJTJGbWFpbi5jc3MlMjIlM0UlM0NoMSUzRVdpa2lwZW
 RpYSUzQyUyRmgxJTNFJTNDZm9ybSUyMGFjdGlvbiUzRCUyMndpa2klMkZTcGVjaWFsJTNBU2V
 hcmNoJTIyJTIwdGFyZ2V0JTNEJTIyX2NvbnRlbnQlMjIlM0UlM0NoNCUzRVNlYXJjaCUzQyUy
 Rmg0JTNFJTNDaW5wdXQlMjBuYW1lJTNEc2VhcmNoJTNFJTNDJTJGZm9ybSUzRSI%2BQWRkIFN
 pZGViYXI8L2E%2B

JesseW 02:07, 19 Oct 2004 (UTC)

I encoded it as base-64 so that there was no issue with line wrapping, and it's not much more readable using quoted printable encoding. It's probably best if I move it to my own website and simply link to it. --Carey Evans
Ah! That makes a lot of sense. I have some bookmarklets I list on my User page, that are pretty unusable by direct copying due to this problem. Moving it to your own site would work, since that way people can at least know they only have to trust you, not any random wikipedia vandal. Thanks for taking the criticism well, and thanks for writing the sidebar tab. JesseW 10:25, 28 Oct 2004 (UTC)
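JesseW's check (un-base64ing and un-percent-quoting the URL) can be reproduced with the Python standard library before trusting any pasted data: URL. A minimal sketch; the function name is illustrative, and it assumes the payload is UTF-8 text:

```python
import base64
from urllib.parse import unquote


def decode_data_url(url: str) -> str:
    """Return the payload of a data: URL as readable text (assumes UTF-8)."""
    url = "".join(url.split())  # drop whitespace left by line wrapping
    header, sep, payload = url.partition(",")
    if not header.startswith("data:") or not sep:
        raise ValueError("not a data: URL")
    if header.endswith(";base64"):
        # The pasted URL may itself be percent-encoded (e.g. %2B for '+'),
        # so undo that before base64-decoding.
        return base64.b64decode(unquote(payload)).decode("utf-8")
    # Non-base64 data: URLs are plain percent-encoded text.
    return unquote(payload)


print(decode_data_url("data:text/html;base64,PGEgcmVsPXNpZGViYXIg"))
# <a rel=sidebar
```

The removed payload begins with an <a rel=sidebar title="Search Wikipedia" href="data:text/html,%3C..."> link; that inner data: URL is not base64-encoded, so passing it through the same function shows the sidebar page's HTML.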

Recommendation

Clicking "Search" without entering any text should take you to an advanced search screen (with the ability to limit by namespaces, etc.; the same screen you get when you type something in to search) rather than an oblique database error message. -Fastfission 05:39, 8 Jan 2005 (UTC)

I have a proposal for a policy improvement; people who are looking for music-related information to make an article about should use CDNOW to find out about music albums and their notability. There is an article about a famous album series that is full of redlinks and hasn't been improved for so many months; click Jock Jams and improve it in any way you can. --SuperDude 04:49, 21 May 2005 (UTC)

Font size in firefox

The search snippets are surrounded by 'small' tags, which are tiny and almost unreadable in the firefox browser. Are other people getting this? --Quiddity 09:04, 16 Jun 2005 (UTC)

How long before article can be found using Search?

I've been developing a page for a few weeks now, but neither the wiki search nor Google can find it, using terms one might use to find such a page, or even searching for the specific name. If neither search engine can find the page, someone looking for information there would only find it if they happened upon a link on another page.

Is this because the page has only been started recently? How long before the searches will find the page? It doesn't make sense to me that there would be a time delay, but I cannot figure out any other explanation. Can someone explain? Thanks, Laszlo Panaflex 23:30, August 13, 2005 (UTC)

Redirects

How often is it that you search for something but the good answers drown in redirected pages? My idea: figure out some way (at least give the option) to eliminate all redirected pages from searches. HereToHelp 23:10, 17 September 2005 (UTC)

Good idea. -- Ec5618 17:43, 24 October 2005 (UTC)
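HereToHelp's proposal amounts to dropping redirect pages from the hit list. MediaWiki redirects are pages whose wikitext begins with #REDIRECT [[Target]], so a filter could look like the sketch below. It assumes each hit carries its raw wikitext (the real results page may not expose it) and tolerates leading whitespace and any letter case; the sample titles are hypothetical:

```python
import re

# A redirect page's wikitext starts with "#REDIRECT [[Target]]";
# here the keyword is matched case-insensitively, after optional whitespace.
REDIRECT_RE = re.compile(r"\s*#redirect\s*\[\[", re.IGNORECASE)


def drop_redirects(hits):
    """Keep only (title, wikitext) hits that are not redirect pages."""
    return [(title, text) for title, text in hits if not REDIRECT_RE.match(text)]


hits = [
    ("Colour", "#REDIRECT [[Color]]"),
    ("Color", "'''Color''' is the visual perceptual property..."),
    ("Colours", "#redirect[[Color]]"),
]
print(drop_redirects(hits))
```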

While this page explains searching Wikipedia in full, it might be helpful if some simplified version were available, for new users in particular. If no-one objects I'd like to create a stub, at least. -- Ec5618 17:43, 24 October 2005 (UTC)

ignore diacritics

I am an admin at the Lingála wiki and a contributor at the German and Alemannic wikis. There is a problem with the search engine. When I (as a user) cannot type the diacritics, or when the use of diacritics is not known to all users (not so much in German, French or English, but in a lot of languages that are not taught in schools, e.g. Kikongo, Lingala, Ciluba, Kiswahili, ...), I cannot find an article. Example from the German wiki: if I search for lingala, I am taken to the article Lingala. When I use the Lingala spelling of Lingala, lingála, there is one 7% result (list of languages of the world). Example from the Lingála Wikipedia: if you are Congolese and you don't know how to type ɔ and ɔ́, you cannot find the article about your country in your language, Kɔ́ngɔ, even though some older dictionaries use the spelling Kongó. Of course, it is possible to create 4 or 5 redirects with different spellings for each article. In German there is a redirect from Fluß to Fluss, but one from Strasse to Straße. There is obviously no rule (in Germany and Austria: Fluß, Straße; in Switzerland and Liechtenstein: Fluss, Strasse). If the wiki search engine could learn that letters with and without diacritics are (more or less) the same, and that ɔ, ss and ɛ are similar to o, ß and e, it would be great and very helpful.

  • o search also ö ó ô ǒ ɔ ɔ́ ɔ̂ ɔ̌
  • e search also ë é è ê ɛ ɛ́ ɛ̂ ɛ̌
  • a search also ä á â ǎ
  • u search also ü ú û ǔ
  • i search also ï í î ǐ
  • ss search also ß
  • ß search also ss

--Etienne 14:46, 20 December 2005 (UTC)
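The folding Etienne asks for can be sketched in Python with Unicode normalization: NFD splits an accented letter into its base letter plus combining marks, which are then dropped, and str.casefold() maps ß to "ss". ɔ and ɛ have no decomposition, so they need an explicit table; that table and the function names are assumptions for illustration, not how MediaWiki's search works:

```python
import unicodedata

# Letters with no Unicode decomposition that the list above treats as
# plain o and e (assumed mapping; extend per language as needed).
EXTRA_FOLDS = str.maketrans({"ɔ": "o", "ɛ": "e"})


def fold(text: str) -> str:
    """Diacritic- and case-insensitive search key."""
    text = text.casefold()              # lowercases; also maps ß -> "ss"
    text = text.translate(EXTRA_FOLDS)  # ɔ -> o, ɛ -> e
    decomposed = unicodedata.normalize("NFD", text)  # é -> e + U+0301
    # Drop combining marks (accents, umlauts, carons), keep base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(query: str, title: str) -> bool:
    """True if the folded query occurs in the folded title."""
    return fold(query) in fold(title)


print(fold("Kɔ́ngɔ"), fold("Kongó"), fold("Straße"))
# kongo kongo strasse
```

Applying the same fold() to both the index and the query makes Fluß/Fluss and Straße/Strasse match in either direction, which covers the last two rules in the list without per-article redirects.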

poor searches

The search engine is very poor. A search for "Swallowed in the Sea", a song by Coldplay, is nowhere in the first 10 results. 59.93.129.176 16:50, 6 January 2006 (UTC)

There is no article by that name. You may be looking for X&Y or Twisted Logic Tour, both of which mention the song. -- Ec5618 17:05, 6 January 2006 (UTC)

If you want, you can add this:

  • Wikimedia-Search: searches the real-time index of every project, with a suggest function

Articles I've created can not be found via search engine !?!

I've created a couple of articles such as Aleksandr Zinovyev, Mikhail Meltyukhov, Leonid Stolovich and Wilhelm Külz. But to my great astonishment, I recently discovered that none of them can be found by searching (if I just type the name of the article and press GO, then, of course it works, but if I press SEARCH, nothing is found). What's the matter??? Constanz - Talk 10:46, 29 January 2006 (UTC)

The Wikipedia search utility hasn't updated yet, which is why the full text search won't work. A simple 'go' search however locates a file matching the exact search query, and returns the right result. The project page intro explains it all. -- Ec5618 13:17, 29 January 2006 (UTC)



This is not the place to ask questions. Please see Wikipedia:Look it up if the article is confusing.

infertility

If all tests are normal, why is pregnancy not occurring?

estado moderno

Articles for deletion?

It appears that the discussions of articles nominated for deletion are saved somewhere. How can I search for past article-for-deletion discussions? Kestenbaum 21:57, 17 February 2006 (UTC)


Search box on this page?

The new version of the front page will have a link called "Searching" to here, Wikipedia:Search. Because it now seems unlikely that a search box will be placed prominently on the new Main Page, a certain percentage of people, trying to search but not seeing the box on the left, will click on "Searching" and get to this page. Because of that, I think it makes sense for the top of this page to include a large and prominent search box, above the "Wikipedia contains articles..." paragraph. Any thoughts? zafiroblue05 | Talk 02:13, 26 February 2006 (UTC)

That sounds like a good idea to me. It could definitely be helpful for those poor lost and confused individuals. --Paulie Peña 02:09, 1 March 2006 (UTC)

Is the term "bookmarklets" used correctly?

I always thought that bookmarklets were JavaScript code in a bookmark (the Bookmarklet Wikipedia article says as much) and that Mozilla and other Gecko browsers call their keyword searches "Quicksearches." Therefore, shouldn't we change the headline "Search Wikipedia using a bookmarklet" to "Search Wikipedia using a Quicksearch" and mention "bookmarklets" under the "Javascript in Bookmarks" headline? Does anyone agree or disagree? --Paulie Peña 02:09, 1 March 2006 (UTC)

Further discussion..

Please see Main Page/Development for more discussion on this page's current development. --Quiddity 01:00, 8 March 2006 (UTC)

Search result page option?

Newbie question, sorry if it has been answered somewhere ... By default, it seems the searching returns the page with exact name (if exists) that matches the search text. Is there any preference setting to always return the list of all pages with names containing the search text instead? Thanks. --Elo0000 23:11, 22 March 2006 (UTC)

  • The "Go" button under the search box (or hitting Enter) will take you to the exact match, if it exists. However, the "Search" button will give you the list of pages with the keyword you entered. I'm not sure of any preferences that would make hitting Enter give you the search results list, but I hope the search button will do. --Aude (talk | contribs) 23:28, 22 March 2006 (UTC)
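Aude's description of the two buttons can be sketched as a small function. The title normalisation (trimming, underscores to spaces, capitalising the first letter) and the substring fallback are assumptions for illustration, not MediaWiki's actual search code:

```python
def go(query, titles):
    """Hypothetical 'Go' button: jump to an exact title match if one exists,
    otherwise fall back to a list of titles containing the words."""
    # Assumed normalisation: trim, underscores -> spaces, capital first letter.
    title = query.strip().replace("_", " ")
    title = title[:1].upper() + title[1:]
    if title in titles:
        return ("article", title)
    # Fallback, standing in for the full-text "Search" results list.
    needle = query.strip().lower()
    return ("results", sorted(t for t in titles if needle in t.lower()))


titles = {"Anonymity", "Anonymous remailer", "Identity"}
print(go("anonymity", titles))  # ('article', 'Anonymity')
print(go("anon", titles))       # ('results', ['Anonymity', 'Anonymous remailer'])
```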


How can Wikipedia be so awesome but have such a sucky search?

Comments?

Why doesn't Wikipedia integrate the Google search into the Wikipedia search function? Honestly, the search function is far and away the facet of Wikipedia that I have the most trouble with... Google offers the "Did you mean to search..." function as well as the ability to find phrases, etc., etc., etc. Bottom line is that it's much better though... Anyone have any thoughts/ideas/answers? Jarfingle 09:54, 9 April 2006 (UTC)

It is at least an informal policy of the Wikimedia Foundation (which provides the money to run the servers) to use open source software. Google search is not open source. I agree an exception could/should be thought about in this case. -- Rick Block (talk) 15:58, 9 April 2006 (UTC)
I agree, the Wikipedia search is horrendous, and Wikipedia would be infinitely more useful and beneficial to society if the search yielded the correct results... the case-sensitive thing is absolutely ridiculous... I think that if the community voted on a measure that would implement the Google search it would pass with nearly 100% of the vote... Hoopydink 13:03, 14 May 2006 (UTC)


Gotcha. Is there a portal for requesting exceptions somewhere? I'd like to do what I can to change this... Jarfingle 03:35, 10 April 2006 (UTC)

Contact information for the Foundation is at http://wikimediafoundation.org/wiki/Contact_us. -- Rick Block (talk) 04:17, 10 April 2006 (UTC)

Not seeing any "List redirects" tickbox

The project page says that users can "Check or uncheck the tickbox 'List redirects' ... at the bottom of a search results page" but I'm not seeing any such box. Has this feature been removed? —Chris Chittleborough 01:12, 15 April 2006 (UTC)

I've never seen such a feature, although it's a great idea. I'll remove it from the page. Melchoir 22:16, 16 April 2006 (UTC)
Hmm... looks like the comment dates from September 2002. I certainly wasn't around then... maybe there was such a feature. Melchoir 22:21, 16 April 2006 (UTC)
There's a request for this feature in mediaZilla — see item #3174. (I've experimented with adding -"#redirect" to search strings, but it doesn't seem to help.) Oh, well. CWC(talk) 00:52, 22 April 2006 (UTC)

What's up with the search engine link? It says that you are going to a page that is not involved with Wikipedia, yet when you click on it, it takes you to another page on Wikipedia. Please fix. I'll attempt to fix it........ - Bagel7

definition of "self"

Person within and attached to a particular body

You might be interested to read this [3].

"Additional CAP partners include The New York Public Library, one of the most renowned libraries in the country; Project Gutenberg, the Web's oldest producer of free electronic books; University of Michigan's OAIster project, which provides hard to find academic collections; UCLA's Cuneiform Digital Library Initiative (CDLI) with content documenting Babylonian history back to 3500 B.C.; Wikipedia, a free, multilingual online encyclopedia with articles in more than 50 languages; and the National Science Digital Library (NSDL), the National Science Foundation's online library, with more than 250 collections that improve the way Americans learn about science, technology, engineering, and mathematics. The OYEZ, CDLI and NSDL projects are all federally funded in part or in whole by the National Science Foundation."

Dori | Talk 17:36, Mar 2, 2004 (UTC)

I am surprised this has generated so few responses. Wikipedia is under Yahoo!'s Content Acquisition Program... does this mean they will index us a lot... or something else? Pete/Pcb21 (talk) 23:18, 2 Mar 2004 (UTC)
Jimbo has commented on the mailing list:
"Yes, I was in negotiations with Yahoo about this last week.
We have a contract in which we supply them with an XML feed (which I will have Jason construct) and they stick us in their index. They make no promises as to the placement of our urls, of course, as that's entirely up to their editorial department. But of course we have absolutely maximum quality content, so it is thought by all that we will rank very high in their index.
The area that this will benefit us most is when news breaks on some topic about which there is little information on the net -- an area in which we excel anyway.
I tried to get their PR person to feature us more prominently in the press release, but alas, she didn't listen to me.
--Jimbo"
Pete/Pcb21 (talk) 23:27, 2 Mar 2004 (UTC)
This sounds incredible! It also sounds like we will need to be even more vigilant on RC, especially concerning articles being highlighted by Yahoo! news. :-) Jwrosenzweig 23:32, 2 Mar 2004 (UTC)

Google search not finding main entries

Back in October, I wrote new entries for Carl Spaatz and Lyman Lemnitzer. Today I decided to do a search and see if there were any mentions of their names that were not linked back to the main entries. I did, in fact, find two such mentions. But I also found that a google search under "Spaatz" or "Lemnitzer" failed to provide a hit on either of the main entries for these men. Obviously both names were mentioned several times in the relevant entry. Other entries with links to these entries were listed (such as List of people associated with World War II). Google even had the links from my user page which post-dated the creation of these entries. So why doesn't google pick up on them? MK 15:34 (EST) 30 November 2003

One month isn't that long for Google to find something, particularly if the pages that link to it have a low page rank. Angela 23:37, 30 Nov 2003 (UTC)

Saving a page, but it doesn't appear in Google

What am I doing wrong? I have started, written and saved a page. I then log out, clear the computer of cookies, and do a google search for the page I have written. Google finds it, but always opens it in the edit mode, rather than as a completed document. Is this normal or am I doing something wrong? Ragussa 13:25, 10 May 2004 (UTC)

You're not doing anything wrong. Google just doesn't update its links that frequently. Give it a little while, and it'll show up on Google just fine. theresa knott 13:38, 10 May 2004 (UTC)
The other part of this is that Google seems to index the edit pages for non-existent articles. Don't know why, there's a meta tag that says not to. -- Cyrius|✎ 13:47, May 10, 2004 (UTC)
Does anyone have any stats on how long it takes, on average, for a new page to get indexed, and what variables affect the time taken? The Village Pump was indexed on the 8th May (two days ago as I write), but is a very frequently updated page. Clements Markham I started on April 7 and is now the #1 Google hit for the name. Ranulph Fiennes, on the other hand, I started on April 13, and it appears not to have been indexed yet. Pete/Pcb21 (talk) 13:51, 10 May 2004 (UTC)
I believe "more worthy" pages are indexed more frequently, where "worthy" is a combination of frequently-changing, well internally connected, and well-linked-to from outside the site. Of these the last probably carries the greatest weight. The Fiennes article's version on Nationmaster has been indexed: [4] -- Finlay McWalter | Talk 14:58, 10 May 2004 (UTC)
According to Google's FAQ page, depending on when the page was submitted and the timing of its web crawls, it may take 6 to 8 weeks for a new page to be added to Google's index. GUllman 21:37, 10 May 2004 (UTC)

Wikipedia Talk namespace and Google

moved from the Reference desk by IMSoP 17:26, 6 May 2004 (UTC)

Why is it that Wikipedia Talk pages are invisible to Google, even to Google searches within the Wikipedia domain?

Please give specific examples. I've seen many WP pages returned by both a general google search, as well as ones limited to WP. Google, unlike some other search engines, does suppress certain inherently transient pages such as VfD. Niteowlneils 06:10, 5 May 2004 (UTC)

So far as I know, they aren't. I've seen them come up in search results. RickK 03:42, 5 May 2004 (UTC)

Out of interest, how is the suppression of VfD done? Is it an X-archive:No type thingy (I know I've probably got that slightly wrong, but a techie will know what I mean) or some other method? An article link will do. --bodnotbod 15:26, May 6, 2004 (UTC)
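Cyrius mentions a meta tag that tells crawlers not to index edit pages; the standard form is the robots meta tag, e.g. <meta name="robots" content="noindex,follow">. Whether VfD pages were suppressed this way or through robots.txt isn't stated in this thread. A minimal sketch of how a crawler might honour the tag, using Python's standard html.parser (the class and function names are illustrative):

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attrs = {name: (value or "") for name, value in attrs}
        if attrs.get("name", "").lower() == "robots":
            for directive in attrs.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def may_index(html: str) -> bool:
    """True unless the page asks crawlers not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" not in parser.directives


print(may_index('<head><meta name="robots" content="noindex,follow"></head>'))  # False
```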

Specific example: The Talk page for the article egg white definitely contains the words albumin, albumen, Eiweiß, and Wikipedia, but when I searched in Google for "albumin albumen eiweiß wikipedia" only one page turned up, and it was most certainly not the page I was attempting to access via Google.

Let me explain how google interacts with wikipedia. First off, most of those words weren't in that talk page until they were added today. Google only crawls wikipedia occasionally (theoretically weekly, but that's very variable). And google's search function can't report results for pages which its crawl function hasn't visited. It certainly hasn't had time to see those particular ones. Moreover, google doesn't crawl all the pages in the entire wikipedia at any one time (in fact, it doesn't necessarily crawl all the pages ever). Google's algorithm for figuring out what to crawl, and how frequently, while related to pagerank, is secret, variable, and frequently site-specific. So we don't know what they chose to crawl, and what they chose not to. If you simply search for "talk" you'll see that google has crawled and indexed lots of talk pages. Another thing: talk pages are very unlikely to be linked to from a source outside the wikipedia - while folks will link readily to an article (from their website, their blog, or whatever). Google generally values links from outside much more highly than internal ones, and this may well explain why lots of talk pages are deemed "unimportant" by google, and thus not crawled at all. You wouldn't be the first person to express frustration at wikipedia using the fallback of the google search engine, rather than mediawiki's own search function - but we're (perpetually) hardware-poor, and the built-in search function isn't something we can afford to enable. -- Finlay McWalter | Talk 22:11, 5 May 2004 (UTC)
Folks, should I add a page (like Wikipedia:Google issues, linked to from Wikipedia:Searching) which explains stuff like this (as this isn't the first time I've answered this kind of question)? -- Finlay McWalter | Talk 22:17, 5 May 2004 (UTC)
Sounds good to me, but it's you that has to put the work in ;op --bodnotbod 00:10, May 7, 2004 (UTC)
Oh, the last crawl appears to be May 3rd. Wikipedia:Searching says the crawl is monthly, not weekly, but that might be out of date. -- Finlay McWalter | Talk 22:25, 5 May 2004 (UTC)
Is there any place that has a discussion of what it would take to have our own search engine with an index that gets updated perhaps once a day? How much hardware would it take to enable the built-in search function? nroose Talk 18:42, 30 May 2004 (UTC)

Google/Wikipedia search engine problems

I have a question, or rather a comment, about the current Google/Wikipedia search engine. Namely, it seems to produce very inconsistent, incomplete, or paradoxical responses to inquiries. A few examples:

  • Oftentimes, I will read an article, and then do a search on the title of that article (EXACTLY as it is in the title, verbatim, down to caps even) and it fails to be found by Google to be on wikipedia. I find this very strange. Sometimes, nothing is found; other times, other articles, that only very indirectly link to the article, are produced. For instance, if one does a search on "Modular group" or even "Modular group Gamma", one doesn't get a link to the article entitled "Modular group Gamma", instead one gets a link to a user page, where "Modular group Gamma" is listed among several hundred pages created and/or edited by the user. What is going on here?
  • Another example: If you do a search on "Gauss" (or even "Carl Gauss"), then you won't get the article on Gauss the person for at least a couple pages (if that, I gave up after a while), you get lots of articles with "Gauss" as a keyword, or linking to "Gauss", but not to the article on Gauss himself. This seems very strange.
  • Many times, when looking for a specific article (to see if it's there), I do a search and get absolutely nothing. But then I say, "well, Google has failed me in the past, let me try directly", and I type in the actual URL of what should be the article page, and up it comes!! There it is!

This is what I find most disconcerting about the search engine. Someone will look something up, not get any results, and just assume that it is not present in the wikipedia. They won't know the little tricks about following other search results, going to more "meta-" pages (e.g. in math, going to major mathematical pages and looking around), or typing in URLs directly. This doesn't give a bad impression to newcomers, but it certainly fails to take advantage of everything that IS here. And it's a major inconvenience to people who use the wiki.

I would like to know if I am the only user that this happens to. I only bring it up in the village pump because it has been a common, persistent, recurring problem for me ever since I started (or ever since the Google/wikipedia page came up). It's not just an isolated incident with a few searches. Revolver 15 Nov 2003

I think this is because google is confused about www.wikipedia.org, en.wikipedia.org, and en2.wikipedia.org. I suspect it will settle down some in the following weeks. Also, google will often (some say always) be inconsistent on results for a website that (like this) changes often - different search servers at google (all of which _appear_ to be www.google.com) are looking at subtly different sets of crawl-data. So sometimes (especially during the "googledance", when they progressively update these database-copies) two identical queries will produce different results. And as to the "right link being way down the search", that's a function of google's (secret, and utterly arcane) pagerank algorithm - there's not much we can do about that, as manipulating google's rankings (for good or ill) is notoriously difficult. -- Finlay McWalter 20:51, 15 Nov 2003 (UTC)
I also had similar experiences. I have a theory for this. According to my theory, Google (or whatever engine) might consider a page for searching only some time after the article is written, probably because it allows time (which seems to be around a month or so) for vetting by enough people. But links to user pages and individual words in articles come into the realm of search because in most cases the user page and the linking words exist long before the article is created. However, having said that, one of the pages, which had been on the third page of Google results for some time, suddenly disappeared for me totally, even after including the word wikipedia. I can't figure this one out. KRS 05:06, 16 Nov 2003 (UTC)
I dunno about that. I've seen articles I've written pop up in the index within a day or two, but they have been ones that were listed (I think) for a day or so on the main page. -- Viajero 13:39, 16 Nov 2003 (UTC)
Google only shows an article when it rescans that portion of the site. An article with few or no links may take a lot of time to be found and indexed on Google. An article listed on the main page will be found and indexed very quickly. To help, it's useful to link to articles from their parent subjects and to link to related articles, so Google and other search engines can follow the web of links between related items. Jamesday
Google supposedly has automatic functions to remove pages that "spam the engine", by creating synthetic cross-links or by posting many identical pages on different domains. Because of the license model, there are many fairly identical copies of WP pages on non-WP sites. And of course WP has heavy internal cross-links. Could this be causing a 'false positive' in the Google spam-killer? Anjouli 07:28, 18 Nov 2003 (UTC)
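Jamesday's point that crawlers discover pages by following links can be illustrated with a breadth-first traversal over a link graph. This is a generic sketch, not Google's crawler: the page names and graph are hypothetical, and the depth returned is only a rough proxy for how soon a page would be reached:

```python
from collections import deque


def crawl_depths(links, seeds):
    """Breadth-first discovery: map each reachable page to its link distance
    from the nearest seed page. Pages with no path from a seed are absent."""
    depth = {page: 0 for page in seeds}
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


# Hypothetical link graph: page -> pages it links to.
links = {
    "Main Page": ["Featured article", "Parent topic"],
    "Parent topic": ["Related article"],
    "Related article": ["New article"],
    "Orphan article": [],
}
depth = crawl_depths(links, ["Main Page"])
print(depth.get("Featured article"), depth.get("New article"), depth.get("Orphan article"))
# 1 3 None
```

A page linked from the Main Page is one hop from the seed and is reached almost immediately, while an orphan with no incoming links is never reached at all, which matches the experiences described above.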

Google Indexing Update

It seems to me it's been a while since Google updated its Wikipedia index. Is that our fault (i.e., did we accidentally tell its robots to go away in one of our files), or is it their fault, or is it my psychotic delusion? -- Someone else 11:22, 16 Nov 2003 (UTC)

  • Having a look at the stats for this month [5] (second to last table) the googlebot has made over 90,000 visits and accounts for 1.25% of all hits. Going by percentage and comparing with previous months this is about normal. However going from www. to en. seems to have affected the rankings of wikipedia pages in google. Also having both en. and en2. addresses doesn't help. But it will hopefully settle down in a month or two. -- Popsracer 11:54, 16 Nov 2003 (UTC)

Google Appliance?

Maybe this is obvious and has been discussed before, but have we considered using an appropriate Google Search Appliance [6]. This is actual hardware that would need to be purchased that would sit in the racks of our servers and could be setup to index the entire Wikipedia every day. I don't know how expensive this solution is or whether "we" can afford it, but it looks like an ideal solution to the problem. Any comments? -- FrankH 17:24, 30 May 2004 (UTC)

Atomz Search?

Has anyone explored the Atomz Search application? (see also their FAQ). I use the free Express version for my personal site, and it works very well, is extraordinarily customizable, easy to integrate into a site, and you control when content is indexed. I've seen it on many other pro sites as well. It's a pay service for sites with more than 500 pages (heh), and of course there's no pricing on their website -- it's ye olde "contact our sales staff" routine. It may be too pricey for a site as large and index-intensive as this one, but it should at least be worth exploring..... especially since the search application is hosted on their servers, not ours. And who knows, they may be willing to negotiate a deal with a site as prominent as ours is becoming. Perhaps Jimbo or someone else with an idea of how much we would be willing to spend to have a reliable internal search mechanism could contact them...? --Catherine - talk 19:31, 17 May 2004 (UTC)

Google results: Mirrors vs Wikipedia

I'm sure this must have cropped up before, but I can't find it; can anyone point me to a relevant discussion? Anyway, I did a search on Google today for Lucifer cipher, and in the top 10 results were no less than 7 mirrored copies of the Lucifer (cipher) page, but not the Wikipedia article itself, which surfaces at position 70. This seems to happen a lot for various articles, and is somewhat annoying (especially since the mirrored pages are out of date and advert-laden). Anything Wikipedia can do? Feel free to point me to the previous discussions... — Matt 13:42, 18 May 2004 (UTC)

Many people are PO'ed about this, and I have no idea how it could be fixed (short of someone at google taking some action). If those pages were respecting the GFDL to the letter, they would link to the exact Wikipedia article, which should raise the pagerank of the Wikipedia article, and eventually bring it to the top, but this does not seem to be happening. Dori | Talk 14:06, May 18, 2004 (UTC)
[From one of the PO'd peeps]. This is a relatively recent problem (roughly since thefreedictionary.com came along) but must be costing us traffic, and is a long-term threat to the continued growth of the GFDL corpus. What is odd is that, despite at least some of the mirrors linking to WP and thus making it probably the most linked-to version of the page, WP comes so low. Are the other sites so good at search engine optimization/Google-breaking? Has WP somehow fallen foul of a negative points score by being seen as a link farm? Pete/Pcb21 (talk) 15:42, 18 May 2004 (UTC)
They have as many links as we do (as a mirror), but it's just that they were up when wikipedia wasn't and google spidered them first. Dori | Talk 16:22, May 18, 2004 (UTC)
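Dori's point that mirror links back to the original should raise its PageRank can be illustrated with a toy PageRank computation. This is a minimal sketch, not Google's actual ranking (which weighs many more signals): the four-page link graph is hypothetical, and the damping factor of 0.85 is the value suggested in the original PageRank paper. Each mirror links back to the original article "A", and that inbound credit lifts A above the mirrors.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Toy PageRank by power iteration over a dict of page -> outlinks."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical graph: hub page "H" links to two mirrors, each mirror
# credits the original article "A", and A links back to the hub.
links = {
    "H": ["M1", "M2"],
    "M1": ["A"],
    "M2": ["A"],
    "A": ["H"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph A ends up above both mirrors; in practice, as Dori notes, crawl timing and other factors can outweigh this effect.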


How does Google index Wikipedia?

Wikipedia would seem to be part of the "deep Web" and hence inaccessible to Google. That is, there isn't any static page that links to all the other pages (or a static tree of such links). So how does Google's spider find articles? Does it watch special:newpages, or does it have a Wikipedia-specific search procedure (perhaps based on special:allpages), or what? The speed with which new Wikipedia articles get indexed is astonishing.... Dpbsmith 16:11, 9 Jun 2004 (UTC)

New articles will be found because other pages have links to them. Usually when someone creates a new page, they link to that page from a pre-existing article, which is already in Google. - DropDeadGorgias (talk) 16:18, Jun 9, 2004 (UTC)
Unless someone with access to the apache logs undertakes a detailed study, we really don't know how google spiders wikipedia. I understand that google maintains customised crawler preferences for the top websites (tuning things like search depth, frequency, and which things to ignore) but I've no evidence that they've done this for wikipedia. I agree with DropDeadGorgias' suggestion - ephemeral things like special:newpages and special:recent_changes change too quickly to be of much use to the crawler (which visits most sites no more frequently than weekly). -- Finlay McWalter | Talk 16:34, 9 Jun 2004 (UTC)
I have realized that lonely pages do not get indexed. So Google is another reason to make special:lonelypages shorter.
Uhm, first of all we need to make special:lonelypages work. Or are there no more orphans?
You might use User:Topbanana/Reports/Nothing_links_to_this_article as the temporary alternative. And the other items on User:Topbanana/Reports give a lot of work for those who like cleanup work - spelling mistakes, missing interwiki, most wanted articles etc. andy 22:25, 11 Jun 2004 (UTC)
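DropDeadGorgias's explanation, and the observation that lonely (orphaned) pages don't get indexed, both follow from how link-following discovery works. A minimal sketch, assuming a crawler that starts from known seed pages and does a breadth-first walk over a hypothetical link graph: a page that nothing links to is never reached, even though it links out to other pages.

```python
from collections import deque

def discover(links, seeds):
    """Return every page reachable from the seed pages by following links."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

# Hypothetical link graph: "Orphan" exists and links out,
# but no other page links to it.
links = {
    "Main Page": ["Turing", "Cryptography"],
    "Turing": ["Enigma"],
    "Cryptography": ["Lucifer (cipher)", "Enigma"],
    "Orphan": ["Turing"],
}
found = discover(links, ["Main Page"])
print(sorted(found))
print("Orphan" in found)  # False: no inbound link, so the crawl never reaches it
```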


Countermeasures: Page rank

I wasn't sure about something, and wanted to bring it up here. Basically, mirror versions are appearing much higher in google than we are. The explanation people give for this is that they're somehow manipulating the pagerank system. My question is not "how" (I'm not technical enough to really grasp), but rather "Could we do this too?". [[User:Meelar|Meelar (talk)]] 13:44, 2004 Jul 21 (UTC)

One key way they seem to do this is by including a series of phrases such as (for an article called Stuff): "What is Stuff? Information about Stuff. Stuff definition..." — Chameleon My page/My talk 15:07, 21 Jul 2004 (UTC)

How could we do this without putting it in the article text? [[User:Meelar|Meelar (talk)]] 19:42, 2004 Jul 21 (UTC)

They often include it in the page <title>, and presumably also in the meta tags. It could also be incorporated in small text at the bottom of the article. — Chameleon My page/My talk 20:24, 21 Jul 2004 (UTC)
By the way, the book "Google Hacks" includes some basic SEO tips in the final chapters. (Just thought I'd mention it.) Lucky Wizard 02:09, 30 Jul 2004 (UTC)
I read somewhere on Wikipedia that at least once someone wrote to google about a mirror having a higher rank than the real wiki, and the people at google fixed it. Not sure if this is possible for the entire wikipedia, though. -- Chris 73 | Talk 22:16, 21 Jul 2004 (UTC)
Good idea, how about a few GOOGLEBOMBS too?
I propose we collectively draft an official letter to Google on the matter, as well as working at our end to boost our ranking to the level it deserves to be at. — Chameleon My page/My talk 10:44, 22 Jul 2004 (UTC)
Such a thing is not unheard of; for special things, Google will give preference to certain sites over others, and these things are built in. See: UPC search, definition search. So it's not like Google would just dismiss it out of hand, and in fact, I think they might enjoy more integration. But how about WE get that integration, instead of one of the cheap ripoffs? I support this and think it should be done quickly. --Golbez 05:16, 23 Jul 2004 (UTC)
Google has a feature that allows you to only search websites that have to do with certain subjects, like Mac, linux, U.S. Government, etc. I think that we should ask them to do a similar thing for Wikipedia. But is telling the difference between Wikipedia and a mirror really that hard? Search results that are from Wikipedia look like: ARTICLE NAME HERE - From Wikipedia, the free encyclopedia. [[User:Mike Storm|Mike Storm (Talk)]] 02:09, 23 Jul 2004 (UTC)
The real problem here is that potential users of Wikipedia, and thus possible long-term contributors to the knowledge base, are regularly diverted away from the real source of the information, to which they might otherwise one day contribute. However, as a follow-up to my suggestion that we might think about a PD search engine to slaughter Google, it looks like the cathedral once again is out to do for the bazaar: [7]. Conceptually it's nice, but the thing seems to be down or broken a lot at the moment. Still, I think this shows that the cathedral has had enough of the antics of the bazaar and has decided to act already. Sjc 10:57, 23 Jul 2004 (UTC)

Wikipedia:Database download gives the technical reasons we are almost assured a low Google ranking: because we're database-bound, crawlers are restricted to one access per second. Our mirrors are typically flat HTML, so can be crawled much faster.
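To get a feel for what a one-access-per-second limit means for a large site, the arithmetic is simply pages × delay. The page count below is hypothetical and chosen only for illustration, as is the faster rate standing in for a flat-HTML mirror; real crawlers schedule their fetches in more complicated ways.

```python
SECONDS_PER_DAY = 86_400

def crawl_days(page_count: int, delay_seconds: float) -> float:
    """Days needed to fetch page_count pages, one request every delay_seconds."""
    return page_count * delay_seconds / SECONDS_PER_DAY

pages = 1_000_000  # hypothetical page count
print(f"{crawl_days(pages, 1.0):.1f} days at 1 request/second")     # 11.6 days
print(f"{crawl_days(pages, 0.1):.1f} days at 10 requests/second")   # 1.2 days
```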

I suggest that there's not much point worrying about our Google ranking until we are confident we have the server power (enough Squid frontends, I would guess) to handle the traffic. Remember that the deal with Yahoo doubled our load in a week - David Gerard 10:40, 23 Jul 2004 (UTC)

I just found my article on effeminacy on thefreedictionary.com. I don't see where they referenced Wikipedia or me. I wish I could get credit for all my hard work. WHEELER 23:58, 27 Jul 2004 (UTC)

Wikipedia is referenced at the bottom, below the stuff like "free dictionary browser". Lucky Wizard 02:09, 30 Jul 2004 (UTC)

I would encourage everyone to submit the articles they care most about to DMOZ, the basis directory for Google and other search engines. This may eventually ameliorate some of the problems related to searchability. -- Stevietheman 17:39, 29 Jul 2004 (UTC)

Finding non-WP sources of information

Is there a trick to googling information that is not Wikipedia-derived? I try searching "foo -wikipedia -gnu" but it still finds thefreedictionary, etc. even though thefreedictionary has the word wikipedia in it. I look online for information about something and I have to wade through 2 pages of non-obvious Wikipedia clones before I find something written by someone else.  :-) - Omegatron 17:06, Aug 10, 2004 (UTC)

Lack of Wiki hits in Google

Is anyone bothered by the fact that the 'pedia no longer appears anywhere near the top of searches in Google? When I put in Syagrius or magister militum I get any number of sites containing copies of the wiki text, but not this site itself - in these two cases I gave up looking. The info on these sites is presumably copied at some moment in time and therefore "frozen", and is therefore less likely to be accurate. Please forgive me if this is a subject that has been raised before, but I couldn't find any mention of it. Djnjwd 23:02, 22 Aug 2004 (UTC)

It has been mentioned before, but what can we do? We can't force google to put us top. Theresa Knott 00:08, 23 Aug 2004 (UTC)
Yes, you can. Go to google and put in a word. Look at the top-right of the screen; that word is hotlinked. It takes you to selected definitions from certain sites. Would it be too much to ask Google that they give us the same consideration? Maybe I will, but it'd be nice if someone official did it. --Golbez 03:19, 23 Aug 2004 (UTC)
We can send Google quality complaints. Basically, these other sites optimize for Google and we don't; that's why they kill us in the Google rank →Raul654 00:15, Aug 23, 2004 (UTC)
Is it true that a possible reason for the low Google rating is low reliability? The other dictionary sites are more consistently available than Wikipedia. I think uptime has been pretty good for a while now, but the site still rates relatively poorly... David Remahl 00:22, 23 Aug 2004 (UTC)
Similarly, latency might figure in, if relevance is considered equal. I think our only real hope is to enforce our license so that people can get from any mirrored page to the "live" page. We may consider modifying our license slightly to ensure that the link is prominent (many are at the bottom of long articles in a tiny font). Derrick Coetzee 00:51, 23 Aug 2004 (UTC)
I hope something like this is done. Google ranking wouldn't matter all that much if when people had read a mirrored article once they knew where it came from originally, and that the mirror was inferior, and came here in future. — Trilobite (Talk) 00:56, 23 Aug 2004 (UTC)
I'm wondering if we could legally sue the people who own these domains (do a detailed WhoIs to find who it's registered to). I'm not sure if Wikipedia could get a team of lawyers, but is it actually possible? Ilγαηερ (Tαlκ, cοηtrιbs) 02:17, 23 Aug 2004 (UTC)
Well, you certainly don't need to start with suing. There is a standard license enforcement sequence, starting with polite requests, moving to more sternly worded ones, and then threatening legal action. I know it has worked with a number of sites. I don't know where on Wikimedia, but this is also being discussed actively somewhere else, on Meta I'm sure. - Taxman 02:45, Aug 23, 2004 (UTC)
Wikipedia:Mirrors and forks has a non-compliance process, including a Wikipedia:Standard GFDL violation letter. -- Chris 73 Talk 02:58, Aug 23, 2004 (UTC)
But even the compliant sites shouldn't be above wiki. Suing aside, this is sort of a problem. Ilγαηερ (Tαlκ, cοηtrιbs) 04:26, 23 Aug 2004 (UTC)
Why not? From Google's perspective thefreedictionary.com is a better site than Wikipedia. It has virtually the same content, faster response time, better use of tooltips/metatags/etc... Yeah, Wikipedia is becoming well-known and has a gazillion links to its homepage, but links to specific pages aren't that common, so the clones don't lose out from this perspective either. AFAIK Google doesn't have a weighting for being the "original". Pcb21| Pete 09:15, 23 Aug 2004 (UTC)
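Checking whether a mirror actually carries the link back to the "live" article, as Derrick suggests enforcing, is easy to automate. This is a minimal sketch using Python's standard html.parser; the function and class names are hypothetical, and it only checks for a link to the exact en.wikipedia.org/wiki/ path, not whether that link is prominent.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlsplit

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def links_to_article(html, title, host="en.wikipedia.org"):
    """True if the page links to http(s)://<host>/wiki/<title>."""
    wanted = "/wiki/" + title.replace(" ", "_")  # /wiki/ paths use underscores
    collector = LinkCollector()
    collector.feed(html)
    for href in collector.hrefs:
        parts = urlsplit(href)
        # Decode %XX escapes so /wiki/Lucifer_%28cipher%29 also matches.
        if parts.netloc == host and unquote(parts.path) == wanted:
            return True
    return False

mirror_html = ('<p>Article text</p>'
               '<a href="http://en.wikipedia.org/wiki/Lucifer_(cipher)">Wikipedia</a>')
print(links_to_article(mirror_html, "Lucifer (cipher)"))  # True
```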

There's discussion on this ongoing at Wikipedia:Send in the clones. [[User:Meelar|Meelar (talk)]] 06:38, 2004 Aug 23 (UTC)

There is also a valuable discussion about problems with Google on Wikipedia:External search engines. Some of those discussions began last year, and it looks as though each intake of new editors asks the same questions - and gets the same answers! May I suggest these pages plus this current Parish Pump discussion be consolidated somehow (by an administrator?) and placed on the Community Portal page with a heading like 'Wikipedia and search engine difficulties'. That way we have somewhere to keep an eye on it. It might also be noted in whatever welcome material we send new editors, to draw their attention to it.

There are also the regular pages Search engines and Google which so far as I can see do not touch on this problem. Apwoolrich 13:20, 25 Aug 2004 (UTC)

I tend to believe that there's also a software (Google compatibility) issue involved. I did a Google search for the first paragraph of our Cohortative mood article - it's a vanity thing, and Firefox makes such searches very simple - and got three results: two from thefreedictionary.com (which does link to the Wikipedia article, although the article has since been moved) and one from wikiverse.org (which doesn't). This means that not only does Google not rank the Wikipedia article highly, it is entirely unaware of its existence (the same can be confirmed using a Google cache query). Worse still, particularly when Wikipedia's search is disabled, Google is naturally also unaware of the article when doing a Wikipedia-specific search. The article is also not particularly new; presumably, Google scans the Web every 30 days, and the article is seven months old. One reason for this (and for other issues) is possibly Wikipedia's Crawl-delay value set in robots.txt. While it is not particularly high (in fact, it is minimal), Wikipedia is pretty big, which might discourage even the usually reliable Google. -- Itai 14:31, 25 Aug 2004 (UTC)
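The Crawl-delay directive Itai mentions is a line in robots.txt that a polite crawler reads before fetching pages. A minimal sketch using Python's standard urllib.robotparser (crawl_delay() needs Python 3.6 or later); the robots.txt content here is hypothetical, not a copy of Wikipedia's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to wait 1 second between requests.
robots_txt = """\
User-agent: *
Crawl-delay: 1
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The "*" entry applies to any user agent, including Googlebot.
print(parser.crawl_delay("Googlebot"))  # 1
print(parser.can_fetch("Googlebot", "http://en.wikipedia.org/wiki/Cohortative_mood"))  # True
```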

Improper Google Define linking to Wikipedia

Google Define (that is, typing "define:{term}" at a Google prompt) brings up "definitions" (as Google sees them) for {term}. It frequently uses WP for these: if there is a WP entry for {term}, I've never *not* (forgive the double negative) seen Google Define pull it up. The problem comes when one wants to click through to the entry on WP. If it is a multi-word entry, the link is invariably malformed. Instead of using underscores ("_") for spaces in the WP link, Google uses plus signs ("+"), so WP fails to find the entry.

How to correct this?

1) Contact Google about this & get them to fix it.

2) Work around it when WP sees that an entry containing "+" is 404 & the link referrer is Google.
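Option 2 could look roughly like the following. This is a hypothetical sketch, not MediaWiki code: the function name and parameters are made up for illustration. Checking that the requested title was not found first (as the proposal specifies) matters because some real titles contain "+", C++ for example.

```python
from typing import Optional
from urllib.parse import quote, unquote, urlsplit

def fix_google_define_link(path: str, referrer: str, article_found: bool) -> Optional[str]:
    """Hypothetical workaround: turn /wiki/Foo+Bar into /wiki/Foo_Bar.

    Only applies when the requested title was not found (404) and the
    referrer is a Google host. Returns the corrected path, or None.
    """
    prefix = "/wiki/"
    if article_found or not path.startswith(prefix):
        return None
    host = urlsplit(referrer).hostname or ""
    if "google." not in host:  # crude host check, e.g. www.google.com
        return None
    title = unquote(path[len(prefix):])
    if "+" not in title:
        return None
    # /wiki/ paths use underscores where the title has spaces.
    return prefix + quote(title.replace("+", "_"))

print(fix_google_define_link("/wiki/Magister+militum",
                             "http://www.google.com/search?q=define:magister+militum",
                             article_found=False))  # /wiki/Magister_militum
```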


Attempts at correction:

1) I have been contacting Google for about 6 (six!) months (as of this entry), & they have not contacted me back. I have sent them about a dozen requests, but the links are still wrong.

2) This entry is my first step toward the second solution. I have no idea where else to begin, & I am hoping for feedback to point me in the right direction.

TSamuel (talk) 20:44, 2 February 2008 (UTC)

Have you tried clicking them? The links seem to work fine for me - just because you see a + doesn't mean it's not getting translated to a space at some point - it works for me --Random832 (contribs) 16:02, 16 April 2008 (UTC)
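Random832's point comes down to where the "+" appears in the URL. In form-encoded query strings (application/x-www-form-urlencoded), "+" stands for a space, while in a URL path it is a literal plus sign unless the server chooses to translate it. A minimal illustration with Python's standard urllib.parse; the title is just an example.

```python
from urllib.parse import parse_qs, unquote, unquote_plus

# In a query string, "+" is the form-encoding of a space...
print(parse_qs("title=Magister+militum"))  # {'title': ['Magister militum']}
print(unquote_plus("Magister+militum"))     # Magister militum

# ...but plain percent-decoding (as for a URL path) leaves it as a plus sign.
print(unquote("Magister+militum"))          # Magister+militum
```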

badware?

http://www.google.com/search?sitesearch=en.wikipedia.org&ns0=1&q=test&fulltext=Advanced+search lists http://www.google.com/interstitial?url=http://en.wikipedia.org/wiki/v:de:Kurs:Software-Test uh? --78.34.4.125 (talk) 14:52, 31 January 2009 (UTC)