Hawaiʻi's Technology Community

Japanese & Chinese in URLs

Do any of you have an opinion on the use of Japanese and Chinese in URLs? Note, I'm not talking about Japanese domain names, but use of Japanese in the path and query portions of the URL. We are working on a system that has a wiki component. Ideally with a wiki your page titles map fairly directly to the URL (replace spaces with + and leave the rest unchanged) in an easily readable form. I could simply use RFC 2396 URL encoding for Japanese and Chinese page names, but this leads to long, ugly and unreadable URLs. Confluence uses generated page numbers for non-ASCII page names while MediaWiki uses the unaltered (kana/kanji/hanzi) Japanese and Chinese characters in its URLs. Most popular Japanese sites still seem to be using romaji page names, perhaps because older browsers switch the input mode to romaji when you enter the URL field. I noticed this no longer happens with IE7.
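To make the two options concrete, here is a minimal Python sketch of the "readable" mapping (spaces to +, everything else unchanged) next to percent-encoding of the UTF-8 bytes. The function names readable_path and encoded_path are made up for illustration and are not from any wiki engine.

```python
from urllib.parse import quote_plus


def readable_path(title: str) -> str:
    """Wiki-style mapping: replace spaces with '+', leave the rest unchanged."""
    return title.replace(" ", "+")


def encoded_path(title: str) -> str:
    """Percent-encode the UTF-8 bytes of the title; spaces become '+'."""
    return quote_plus(title, encoding="utf-8")


if __name__ == "__main__":
    title = "ビリーズブートキャンプ"
    readable = readable_path(title)
    encoded = encoded_path(title)
    print(readable)
    print(encoded)
    # Each kana here takes three UTF-8 bytes, and each byte becomes
    # a three-character %XX escape, so the encoded form is far longer.
    print(len(readable), len(encoded))
```

The length comparison at the end is the "long, ugly and unreadable" problem in numbers.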

What are you guys doing with your Japanese and Chinese web app URLs?

Replies to This Discussion

Permalink Reply by Scott Murphy on July 25, 2008 at 4:43pm

A year ago, I would have said go with unaltered Japanese characters for their SEO benefits, but now I think it might be a matter of preference.

I experimented with Japanese URL paths about a year ago after suspecting that having unaltered Japanese characters in the URL helped with Yahoo ranking. Large players like Amazon Japan started including Japanese characters in their URLs for search engine benefits around the same time (here is the Japanese article). At the time, I found that it generally helped with rankings.

As an example, if you searched for 'ビリーズブートキャンプ' on Yahoo last year, almost all of the top 10 results were either Japanese domains or had URL paths with unaltered Japanese characters. Do the same search today and Wikipedia is the only domain in the results with Japanese characters in its URLs.

Things changed when Yahoo Japan made an algo change (I think it was at the end of last year). They probably made the change because there were too many MFA (made for AdSense) and spammy sites taking advantage of having Japanese characters in the URL path. (*Note: this is all based on my experience and is speculation.)

So in terms of an SEO advantage, in my opinion, having Japanese characters in the URL was effective. Today, it probably won't make that big of a difference.

That being said, I still use Japanese characters in my URLs at times. I recently developed a site using MediaWiki and found that I really couldn't avoid it because it passes the title variable in the URL.

However, for stuff that I build from scratch, most of the time I try to keep my URLs short, clean and readable, so I end up using numbers. So www.something.com/category/12 instead of www.something.com/category/%E3%83%93%E3%83%87%E3%82%AA. I think this is much easier on the eyes and less to work with if someone wants to link to my page.
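For anyone who wants to see the two URL styles side by side, here is a rough Python sketch. The CATEGORIES table and the function names are hypothetical, invented only for this illustration; the category title is the one that the percent-encoded example above decodes to.

```python
from urllib.parse import quote

# Hypothetical lookup table: numeric ID -> category title.
# "\u30d3\u30c7\u30aa" is the katakana word ビデオ ("video").
CATEGORIES = {12: "\u30d3\u30c7\u30aa"}


def numeric_url(category_id: int) -> str:
    """Short URL that carries only the numeric ID."""
    return f"www.something.com/category/{category_id}"


def encoded_url(category_id: int) -> str:
    """URL that carries the percent-encoded UTF-8 title instead."""
    return f"www.something.com/category/{quote(CATEGORIES[category_id])}"


if __name__ == "__main__":
    print(numeric_url(12))  # www.something.com/category/12
    print(encoded_url(12))  # www.something.com/category/%E3%83%93%E3%83%87%E3%82%AA
```

The numeric form is shorter, but it needs a lookup on the server side and tells the reader nothing about the page.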

Not sure if that helps at all but those are just my ideas on this subject.

Permalink Reply by Daniel Leuck on July 25, 2008 at 5:17pm

Hey Scott. Thank you for your detailed and well-informed response. I've been asking Japanese and Chinese speakers here at MIC, and the consensus seems to be in line with your assertion that it is now a matter of preference. Given that human-readable URLs are more in the spirit of wikis, I think I am going to use the Japanese and Chinese characters without URL encoding. The fact that MediaWiki is taking this approach gives me confidence it is a reasonable choice.

Have a great weekend!


Permalink Reply by Brooke Fujita on July 27, 2008 at 3:56am

Just to add a bit more to this thread, I took a look at the link from Scott, and then spent a few hours googling to see what the current consensus is here in Japan on Japanese chars in the URL beyond just the domain names.

Keeping in mind that the algorithms being applied are all black box (if any of us knew for sure, then this thread wouldn't even be happening), a lot of people in Japan still feel that having Japanese chars in the URL, as Amazon.co.jp does, has some positive effect on ranking. I don't do much in the way of SEO, as most of my architecting and design is centered on users getting a task done rather than attracting eyeballs, but the blogs and writings I've seen suggest that Scott's view on SEO benefits is still widely held here in Japan.

But hey! I was surprised to see that Firefox 3 actually renders any UTF-8 escaped bits in the URL as regular chars in the browser's address and status bar. Kind of jarring in a way, but nice. Wonder if IE7 (I still don't want to upgrade) can do that?
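Roughly the same transformation can be sketched in Python: decode the percent-escapes for display only when they form valid UTF-8, and otherwise leave the URL untouched. This is just an illustration of the idea, not how Firefox actually implements it, and display_form is a made-up name.

```python
from urllib.parse import unquote


def display_form(url: str) -> str:
    """Return the URL with UTF-8 percent-escapes shown as regular characters.

    If the escaped bytes are not valid UTF-8, return the URL unchanged.
    """
    try:
        return unquote(url, encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        return url


if __name__ == "__main__":
    print(display_form("www.something.com/category/%E3%83%93%E3%83%87%E3%82%AA"))
    # A truncated escape sequence is not valid UTF-8, so it stays as-is.
    print(display_form("www.something.com/category/%E3%83"))
```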

Permalink Reply by Daniel Leuck on July 27, 2008 at 6:59pm

After a few days of research we have made our final call. We are going to use the native character sets rather than numbers, escape codes or a romanizing conversion scheme.

Brooke: I was surprised to see that Firefox 3 actually renders any UTF-8 escaped bits in the URL as regular chars in the browser's address and status bar. Kind of jarring in a way, but nice.

We discovered this during our research. I really like FF3. Almost 20 years after the creation of the World Wide Web, the clients are finally starting to play well with CJK.

Brooke: Wonder if IE7 (I still don't want to upgrade) can do that?

I recommend you upgrade. IE7 is much more standards compliant and it doesn't change your IME mode when you enter the address bar. When I am working on CSS and JavaScript code (which is unfortunately every day thanks to our new ooi venture), I have very little trouble getting things to work the same in IE7 and Firefox 2/3. I cringe every time I have to test in IE6.
