Amarok2中的MySQL-真相(渣翻譯)

KDE 是一個強大的圖形桌面環境,各項關於 KDE 使用上的問題或討論歡迎在此提出。

版主: AceLan, Franklin

Amarok2中的MySQL-真相(渣翻譯)

文章訪客 » 週五 8月 27, 2010 10:26 pm

http://csslayer.tk/wordpress/kde/bad-tr ... e-reality/

原文連結: http://amarok.kde.org/blog/archives/812 ... ality.html

08年的老文,看有人聊到 KDE 裡面的 MySQL 用的很多時搜索了一下,結果發現了這篇文章。讀完之後,突然想翻譯一下了。看到最後感覺感觸頗深啊,真的,開發過一些小玩意,面對著或多或少的真正用戶,心有戚戚焉,雖然這麼講也許有點妄自尊大,可是還是忍不住,想要做出更好的東西給大家……翻譯完了之後略修改了一下,結果發現有很多看起來很囧的錯誤……有不少地方是意譯,歡迎指出錯誤。每段都附上原文對照。

最近關於 Amarok 將 MySQL 作為唯一 SQL 後端這件事上有許多討論。相當的一部分是對於不確定的恐懼——有些僅僅是人們對於改變的抗拒,有些 僅僅是不理解這個決定。有些討論(特別是 Adriaan 的博文)很有意思,並且有著深刻見解,但是卻忽略了為什麼做出了這些改變。本文嘗試去解釋為什麼做出了這個決定,這個決定對於一個最終用戶意味著什麼,為什麼你應該喝口茶歇一歇。

There has been a lot of chatter lately regarding Amarok’s switch to MySQL as its only SQL backend. A decent amount is FUD — either by people simply pushing back against change, or by people that simply don’t understand the decision. Some of it (particularly Adriaan’s blog post) has been insightful and interesting, but miss the mark in terms of why this change was made. This post attempts to explain why this decision was made, what it really means for you the end-user, and why you should have a cup of tea and relax.

我想首先指出的是,我說過 MySQL 將會成為 Amarok 的唯一 SQL 後端。Amarok 2 的收藏系統很強大。看看有多少各種各樣的音樂源吧,來自Shoutcast的、Jamendo的、Magnatune 的、Ampache 的、MP3Tunes 的,還有來自本地的,比如來自 iPod和本地的檔案系統,它們在 Amarok 2 當中都被同等對待了。Amarok1的收藏僅僅是個收藏,並且被它自己所聲明的功能限制了(當然,他能提 供它自己的特定功能)。雖然我不這麼想,但是同時還有一個基於 Nepomuk 的收藏選項沒有被啟用。(注:這個i don’t think看的我一頭霧水)所以振作一點,這個變化只是影響 Amarok 的內部 SQL 收藏,但不包括其他來源(儘管如果那些來源樂意,他們也可以把資訊存儲在 SQL 資料庫中)。

I want to point out first that I said that MySQL is going to be Amarok’s only SQL backend. A2′s collection system is very powerful. Just take a look at how varied music sources from Shoutcast, Jamendo, Magnatune, Ampache, MP3Tunes, as well as local sources like iPods and your local file system, are treated as equals in A2. A collection is a collection, and is limited only by what capabilities it advertises it can support (and of course, it can supply its own custom capabilities). It’s not currently enabled, I don’t think, but there’s a Nepomuk-based collection option too. So take heart — this change only affects Amarok’s internal SQL collection, and not other sources (although those sources can store information in the SQL database if they wish to cache information).

由於我提到了 Nepomuk,我們也來談談另一個普遍的問題/需求/抱怨:KDE 已經有了很好的 Strigi-Nepomuk 在工作,為什麼我們不用它來搜索音樂並且存儲資訊?這 有很多原因。第一個原因是 Strigi 和 Nepomuk 是可選的,並非必要的。(更新:Strigi 是必要的,但Soprano不是,所以Nepomuk 從整體上來說還是可選的。)我們不能指望用戶安裝了這些,而且就算他們安裝了,我們不能指望使用者正確地配置了他們(注意由於我們的計畫是跨平臺,這使得這 件事更不大可能了)。第二個原因是速度:Amarok 的自訂收藏掃描的速度非常快,並且利用 taglib 收集特定的資訊。和 Strigi 比較起來,Strigi 的速度就太慢了(它需要計算所有檔的雜湊值,這意味著它將會讀取整個檔),並且取得較少的資訊。(更新:照Strigi開發者的說法,而不是在kde-apps.org, 維基百科,甚至在作者自己主頁上的描述,Strigi 預設並不計算檔的雜湊值。這使得對於 Strigi 來說,如果配置得當的話,和Amarok的內置掃描相比能夠一樣快。儘管我不知道 它是否將所有需要的資訊都提取出來。但如果它配置成需要計算所有檔的SHA1雜湊值,那麼它肯定會非常慢。)在本地硬碟上,這並不是一個大問題,但是當 你將基於網路的存儲納入考慮範圍內的時候,這就是個大問題了,而且這是一個非常普通的場景。同時儘管我不記得細節了,但我也聽說,和一般的 SQL 資料庫相 比,對Nepomuk的查詢是很慢的。同時記得當基於 Nepomuk 的收藏完成的時候,來源於基於 Nepomuk 的收藏將會把它們的中繼資料修改存回 Nepomuk 當中。所以 SQL 收藏並不是Nepomuk的替代——它們是完全獨立的。(更新:我忘記提到基於Nepomuk的收藏已經存在了。它已經由 GSoCer 在暑假開發完成。我不確定它的狀態是否可以趕上2.0的發佈,但是我們這些 Amaroker 都很喜歡 Strigi/Nepomuk,並且對於 打開應用程式之後所有音樂都正確可用,並且不需要任何預先配置這個點子感到興奮。但是 SQL 收藏也確實佔有一席之地。正如我說的:它們都是很好的技術。)

Since I mentioned Nepomuk, it’s time to discuss another common question/demand/complaint: KDE has this nice Strigi-Nepomuk thing going on…why aren’t we using it for scanning music and storing information? There are a couple main reasons. The first is that Strigi and Nepomuk are optional, not required. (Update: Strigi is required, but Soprano isn’t, so Nepomuk as a whole is still optional.) We can’t rely on the user installing them, and even if they are installed, we can’t rely on the user to configure them properly (remember that we’re going cross-platform, making it even less likely). The second reason is speed: Amarok’s custom collection scanner is extremely fast and pulls out specific pieces of information with TagLib. Strigi is, by comparison, very slow (it calculates hashes of all files, which means it needs to read the entire file) and pulls out less information. (Update: According to the Strigi developer, and despite what is said on kde-apps.org, Wikipedia, and even the author’s own home page, it does not calculate hashes by default. So it’s possible that Strigi, if properly configured, could be as fast as Amarok’s internal scanner, although whether it would pull out all necessary information, I don’t know. If it’s configured to calculate SHA1 hashes of all files, then it will indeed be far slower.) On a local hard drive, it may not be a big issue, but it sure is a huge issue when you throw networked storage into the picture, which is a very common scenario. I’ve also heard, though don’t remember specifics, that querying and such through Nepomuk is rather slow, compared to a normal SQL database. Regardless, though, remember that when the Nepomuk-based collection is finished, tracks sourced through a Nepomuk-based collection will have their metadata changes saved back to Nepomuk. So, it’s not that the SQL collection is in place of Nepomuk — they are entirely independent. (Update: I forgot to mention that a Nepomuk collection already exists. It was developed by a GSoCer over the summer. I’m not sure what its status is as far as making the 2.0 release, but we Amarokers both like Strigi/Nepomuk and are excited about the idea of opening up the app and having all your music available right then and there with no pre-configuration. But there is a place for the SQL collection too. As I said: they are complimentary technologies.)

我們已經跑題了,現在回到正題上。

With those topics out of the way, on to the meat.

首先,理解兩個關鍵的事實是很重要的。第一,我們不是搞資料庫的。是的,我們可以把資料存進資料庫,並且或多或少的提供一個工作的模式(注:Schema,因為平日和別人討論都用這個詞的英文(注之注:我是搞資料庫的)……我一直糾結於要不要直接用Schema,後來還是用了模式這個詞),但是我們當中沒有一個是專家/天才/絕地武士/等等。這導致了第二個事實:維護三個資料庫快把我們整瘋了。 每次一個很小的模式變動都需要對三類資料庫進行編碼。對於模式的修改對於一個資料庫可能就是些瑣碎的小事,對於另一個就可能非常困難(甚至不可能)。人們 會回報那些我們無法重複的 Bug,最後僅僅發現因為我們不能理解這些資料庫是如何具體工作的(或者在另一些情況中,沒有任何一個活躍的開發者使用那個類型 的資料庫)。還有許多類似的事情。所以在 Amarok 2 開發之初(在 Amarok 1 開發時的幻想中),我們就知道我們只想要一個資料庫。

First, it is important to understand an important pair of facts. Number one: we are not database guys. Sure, we can store data in them, and more or less come up with a working schema, but none of us are gurus/wizards/jedis/etc. This leads in to number two: maintaining three databases was driving us crazy. Every time a minor schema change was needed, it had to be coded up for all three types of databases. Modifying a schema could be trivial for one database type, and super difficult (or impossible) for another. People would report bugs that we couldn’t reproduce, only to find out that it was because we didn’t quite understand how one database or another behaved (or in some cases, none of the active devs were using that type). And so on. So from the beginning of A2 development (and in our fantasies during A1 development) we knew we wanted just one database.

(我們確實看了那些抽象層,例如 QtSQ L等等。我不想過多的評論他們,因為我並沒有對他們進行評估,但是總的說來如果不進行一些特定的 SQL 編程,他們並不靈活到足以應付我們所有需求(特別是在一些修改模式的任務上),這導致我們沒有採用他們。如果你確實想知道更多資訊或者堅持認為他們足夠應付任務,問問eean,我想他對他們做了評估。)

(We did actually look at abstraction layers like QtSQL and others. I’m not going to comment on them much, as I didn’t do the evaluation, but in general they were found to not be flexible enough to handle all of our needs without doing some custom SQL coding (especially in the cases of things like schema changes), which kind of defeats the point. If you want to know more/want to insist that they are, try asking eean, as I think he did the evaluations.)

現在我們必須選擇一個資料庫。第一眼看來,SQLite 是個不錯的選擇。使用事務的話,他的速度很快。十分穩定(那些抱怨奇怪的 MySQ L的 bug 的人應該和markey談談,他是1.4的SQLite後端維護者,可以證實SQLite有他的一席之地)。但是還是有一些問題把它踢出局了。第一個問題 是性能。儘管對只有一個小收藏的人來說它工作的很好,有著大量收藏的的人在換到MySQL或者PostgreSQL之後彙報了他們在進行複雜或者很多查詢 的時候獲得了很大的速度提升,例如把很多項目加入播放清單的時候,掃描檔的時候,以及對收藏進行搜索/過濾的時候。由於我們想要調和那些有著大收藏和小收藏的人的需求,並且數位音樂收藏的規模並沒有縮小的趨勢,有著大收藏的人的數量增長速度是很重要的。許多我們的開發者,在切換到 mysqle(我們這麼叫它,儘管這不是官方名稱)之後,在每日的 Amarok 2 使用中,注意到了巨大的速度提升,因此嵌入式伺服器和一般的伺服器相比也能帶來速度提升。這是對 SQLite 的第一拳。

Now we had to choose the type. At first, SQLite seemed like a good choice. Using transactions, it’s decently fast. It’s pretty stable (those that complain about odd MySQL bugs should talk to markey, as he, being the SQLite maintainer in 1.4, can attest that SQLite’s had its fair share). However, there were a few problems that in the end knocked it out of the running. The first problem is performance. Although for people with small collections it performs fairly well, people with large collections that switched to the MySQL or PostgreSQL backends in A1 would report enormous speed gains when operations performing complex or many queries were performed, such as adding many entries to the playlist, scanning files, or filtering/searching in the collection. Since we want to accommodate users with large collections just as well as those with smaller collections, and since digital music collections aren’t getting smaller, the speed increase for our users with large collections was quite important. Many of our developers, after the switch to mysqle (as we call it, though that’s not the official name), have noticed huge speed increases in their day-to-day use of A2, so that speed increase is carrying through to the embedded server as well as the normal server. That was the first knock against SQLite.

另一個使我們不採用 SQLite 的原因是完全不同的。許多用戶(包括我自己在內)有許多電腦,但是卻只有一個 Amarok 資料庫。假定所有電腦都可 以通過一個掛載地點訪問音樂(其他東西也配置正確),這使得你可以只掃描一次,但能在其他所有地方播放,無論在那裡播放音樂都可以對同一個評分進行更新, 還有其他很多事。就算你並不在多台電腦間共用資料庫,許多使用者由於速度,安全性,和備份的原因,也想要把資料庫存儲在一個特定的伺服器上。如果你認為這並 不是一個通常的用例,那麼你就錯了。MySQL 和PostgreSQL 對於這樣的工作量工作的很好。對於 SQLite 就行不通了,因為它是為了一個不同的 目的而設計的。因此SQLite被兩擊直拳重擊,KO。

The other blow for SQLite came for a totally different reason. Many users (myself included) have multiple computers sharing a single Amarok database. Assuming all the computers have access to the music at the same mount point (and a few other things are configured right), this allows you to scan once, play everywhere, update the same ratings no matter where you play it, and more. Even if your aren’t sharing the database among multiple computers, many users want their database stored on a particular server for speed, security, or backup reasons. If you think either of these isn’t a common use-case, you’d be quite wrong. MySQL and PostrgreSQL were quite happy with this workload. It’s a total no-go for SQLite, simply because it’s designed for a different purpose. So SQLite had two big knocks against it. K.O.

但是,正如我們不能指望用戶正確的設置了 Strigi/Nepomuk 那樣,我們不能指望他們設置好了 MySQL 和 PostgreSQL 的資料庫的表。因此我們需要資料庫能夠嵌入式的運作,這樣它就能夠在不做任何其他設置的情況下工作了。MySQL,和libmysqld,在4.1系列當中有著初步 的支持,在5.0當中它就工作的很好了,並且在5.1中就被完全支持(據我所知)。PostgreSQL,相比較而已,沒有這種功能。(但他有一個他們自 己的有趣的嵌入式 SQL 的概念。更新:顯然這現在成為了 SQL 標準的一部分。還是很酷。還是和我們認為的我們剛才談到的嵌入式伺服器不是一個東西。)

However, just as we can’t rely on the user to set up Strigi/Nepomuk correctly, we can’t rely on them to get their tables set up in MySQL or PostgreSQL. So we needed the database to be embeddable, so that it could just work for the user without any setup necessary on their part. MySQL, with libmysqld, had the seeds of this in the 4.1 series, it works decently in 5.0, and it’s becoming fully supported (AFAIK) in 5.1. PostgreSQL, on the other hand, does not have any such thing. (They have an interesting and cool concept of their own of embedded SQL though. Update: apparently that is part of the SQL standard. Still pretty cool. Still totally different from what we mean when we are talking about an embedded server.)

這使我們只剩下一種選擇——正如你所猜想的那樣 —— MySQL。它也許不是某些人最喜歡的資料庫(儘管它是大多數人的),而且我並不瞭解究竟在運行嵌入式模式的時候多少真正的額 外開銷,但它確實完全符合要求。它同時可以按嵌入式在本地運行和或者用獨立模式在其他的機器上運行(是的,這還沒有在 Amarok 2 當中被支持,但這會被 支持的)。對於大收藏來說,它的速度很快,也很健壯。它被開發組的人員充分瞭解。最重要的,它可以作為唯一的後端解決我們所有的需求。

So this leaves us with — as you guessed — MySQL. It may not be any particular person’s favorite database (although it is for plenty), and I don’t know how much overhead it really has in embedded form, but it fit the bill. It’s both embeddable and can run standalone on the local or a separate machine (yes, this is not supported yet in A2, but it will be). It is fast and robust for large collections. It is well understood by the development team. And most of all, it is a single-backend solution that fills all of our needs.

如果你還是對於我們決定感到不愉快,我得說聲抱歉。我們嘗試去滿足大多數人,但是沒法滿足所有人。但是是我們在開發並支持它,所以我們才基於我們開發者的需求,和來自成千上萬在過去幾年和我們溝通聯繫的用戶共同的現實用例做出了這個決定。請記得就算大部分在 Dot 上的留言,或者在這篇文章後面的評論(例如很多突然出現的回應)是來自於對這個決定不滿的人,這個決定事實上還是適合大多數人的,和我們其他選擇相比,它是對於大多數我們的用戶來說更好的決定。

If you’re still unhappy about our decision, I’m sorry. We try to please most and can’t please everyone. But we’re the ones that develop and support this thing, and so we made a decision based both upon our needs as developers and the real-world use-cases from the collective feedback of thousands of users that have contacted us over the last few years. Please remember that even if most of the comments on the Dot, or to this post, (i.e. much of the sudden visible feedback) are from people that are unhappy with our decision, it is a decision that will actually suit the vast, vast majority of our users better than the other options we currently have.

我們這個項目眾所周知對用戶很友好——我們傾聽他們,儘量實現他們想要的特性,儘量回應與支持。這是使我們走到今天的一個因素。所以拜託,親愛的讀 者們——請相信我們。這並不是一個簡單的決定——我們討論過,爭論過,扔過東西,和解了,我們在和解後進行了一兩次狂歡——但最終我們全體都認為這是一條 正確的路,並且我們認為,終究,這將使得 Amarok 更好。希望你們也這麼認為。

We’re a project that is known for being good to our users — we listen to them, we try to implement features they want, try to be responsive with support. It’s one of the things that got us where we are today. So please, dear readers — put some faith in us. This has not been an easy decision — we’ve discussed, we’ve argued, we’ve thrown things, we’ve made up, we’ve had an after-the-make-up orgy or two — but in the end it’s what we collectively felt was the right way to go, and we feel that, in the long run, it will make Amarok even mores awesomer. Hopefully you’ll feel that way too.
訪客
 

回到 KDE 一般討論

誰在線上

正在瀏覽這個版面的使用者:沒有註冊會員 和 1 位訪客