Closed
Bug 608880
Opened 14 years ago
Closed 8 months ago
Reading style.left and style.top is slow compared to Chrome (Peacekeeper balls tests)
Categories
(Core :: DOM: CSS Object Model, defect, P5)
Tracking
()
RESOLVED
FIXED
People
(Reporter: developer, Unassigned)
References
(Blocks 1 open bug)
Details
Attachments
(6 files, 2 obsolete files)
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12
Build Identifier: Mozilla/5.0 (Windows NT 5.1; rv:2.0b8pre) Gecko/20101101 Firefox/4.0b8pre
I've created a small test case that creates 10 divs with style:
position: absolute
width: 32
height: 32
left: random
top: random
If I run it in Chrome7 and FF4, I get:
FF4: 4713ms
Chrome7: 894ms
Where lower is better.
This is what causes FF4 to be slower than Chrome in test 5 of peacekeeper.
Reproducible: Always
Steps to Reproduce:
1. Load the attached test case in FF4 and Chrome7
2. Notice that Chrome is much faster
3.
Actual Results:
Chrome is faster
Expected Results:
FF4 should be as fast or faster than Chrome
Also, this test case uses divs and doesn't try to animate them. In the real peacekeeper test, they are images (same styles) and their positions change.
Blocks: peacekeeper
![]() |
||
Comment 3•14 years ago
|
||
Hmm. Are you sure the initial test is gated on the reads?
In any case, profiling the attached testcase now.
Status: UNCONFIRMED → NEW
Ever confirmed: true
![]() |
||
Comment 4•14 years ago
|
||
OK, long story short:
2% jit-generated JS code
2% in js::GetPropertyWithNativeGetter
8% the .style getter (self time and JS-wrapping)
3% nsDOMCSSAttributeDeclaration::GetCSSDeclaration
46% nsString::AppendFloat (41% under Modified_cnvtf, of which 25% is calling
PR_dtoa and the rest is memory traffic; some AppendASCIItoUTF16).
25% Replace/ReplaceASCII called from nsCSSValue::ToString
and some minor stuff.
![]() |
||
Comment 5•14 years ago
|
||
Er, called from nsCSSValue::AppendToString.
Yeah, it was a little hard to imagine that it was the reads that caused the slowdown. But compare:
http://www.peacekeeper.therichins.net/test4b.html
http://www.peacekeeper.therichins.net/test4c.html
The only difference between the two files is test4b.html has:
var x1 = ball1.style.left.replace("px", "");
var y1 = ball1.style.top.replace("px", "");
var x2 = ball2.style.left.replace("px", "");
var y2 = ball2.style.top.replace("px", "");
while test4c.html has:
var x1 = this.positions[i].x;
var y1 = this.positions[i].y;
var x2 = this.positions[j].x;
var y2 = this.positions[j].y;
The functionality is the same otherwise. On my computer, my numbers are:
test4b.html
FF4: 27.504 fps
Chrome: 32.567 fps
test4c.html
FF4: 46.161 fps
Chrome: 46.032 fps
![]() |
||
Comment 7•14 years ago
|
||
Indeed. Thank you!
Mind attaching those here too, just so we don't depend on that server staying up? It'd be much appreciated.
If you're interested, by the way, the bugs this bug depends on are tracking the various issues that the profile shows; I didn't want to just cc you on them because I'm not sure whether you do or don't want the bugspam. If you generally do, let me know and I'll cc you from now on.
This file has the position, width, and height style lines commented out. It ran 732ms on my machine.
This is the same testcase but it just has the position style code commented out. I got 748ms.
Attachment #487797 -
Attachment description: Testcase with no position, width, height → Testcase with no position, top, left
Reporter | ||
Comment 10•14 years ago
|
||
I added some modified test cases to show what happens when some CSS styles are removed. The position: absolute adds a little time on, but it is the top/left that affect it the most.
Reporter | ||
Comment 11•14 years ago
|
||
Reporter | ||
Comment 12•14 years ago
|
||
![]() |
||
Comment 13•14 years ago
|
||
Yeah, on the "no position" test (which is really a no top/left test) we're 2x _faster_ than chrome. OK, good. So there's hope, if we fix the dependent bugs. ;)
Reporter | ||
Comment 14•14 years ago
|
||
I've added the html files, but they do require some other files (such as the image).
Also, is there a way to do profiling on Windows XP?
Attachment #487799 -
Attachment description: Testcase with no position → Testcase with no top/left
![]() |
||
Comment 15•14 years ago
|
||
> I've added the html files, but they do require some other file
For future reference, the best way to upload such things is to upload the image first, then change the HTML to point to the image attachment, then upload the HTML. It takes some more work, but given how much work you're clearly putting into these peacekeeper reductions already... :) (And I _really_ appreciate the reductions, by the way!)
For WinXP... Xperf doesn't do a great job, sadly. Maybe try https://developer.mozilla.org/Profiling_with_AMD_CodeAnalyst ?
Reporter | ||
Comment 16•14 years ago
|
||
image used in test4b.html and test4c.html
Reporter | ||
Comment 17•14 years ago
|
||
Changed to use attachments in bugzilla
Attachment #487800 -
Attachment is obsolete: true
Reporter | ||
Comment 18•14 years ago
|
||
Changed to use attachments in bugzilla
Attachment #487801 -
Attachment is obsolete: true
![]() |
||
Comment 19•13 years ago
|
||
Having done some benchmarking on the original test case today, the current status looks like (Xeon X5550 @ 2.67 GHz, 64-bit Linux, GCC 4.6.1)
Firefox: ~1980 ms
Chromium: ~400 ms
A perf report doesn't reveal anything especially interesting:
8.66% firefox libpthread-2.15.so [.] __pthread_mutex_unlock_usercnt
8.41% firefox libpthread-2.15.so [.] pthread_mutex_lock
7.95% firefox libnspr4.so [.] dtoa
3.68% firefox perf-10014.map [.] 0x7f597ca7e608
3.50% firefox firefox [.] realloc
3.04% firefox firefox [.] arena_dalloc
2.89% firefox libxul.so [.] nsAString_internal::ReplaceASCII(unsigned int, unsigned int, char const*, unsigned int)
2.54% firefox libxul.so [.] castNativeFromWrapper(JSContext*, JSObject*, unsigned int, nsISupports**, JS::Value*, XPCLazyCallContext*, unsigned int*)
2.12% firefox libxul.so [.] nsIDOMElementCSSInlineStyle_GetStyle(JSContext*, JSObject*, long, JS::Value*)
2.03% firefox libxul.so [.] _ZL14Modified_cnvtfPciid.constprop.21
2.00% firefox libxul.so [.] nsCSSValue::AppendToString(nsCSSProperty, nsAString_internal&) const
2.00% firefox firefox [.] arena_malloc
1.69% firefox libc-2.15.so [.] __memcpy_ssse3_back
I assume the locking is some artifact of PR_dtoa and friends; it goes away in a profile with bug 608915 addressed.
![]() |
||
Comment 20•13 years ago
|
||
> A perf report doesn't reveal anything especially interesting:
Nathan, how did you measure the original testcase? It does a bunch of setup, then runs the actual timed part of the test....
If I profile the timed part on a current Mac build, I get 48% under string AppendFloat() and 15% under ReplaceASCII. Similar to comment 4.
Now that time is scattered over lots of different functions, which is why a non-hierarchical profile doesn't tell you much.
![]() |
||
Comment 21•13 years ago
|
||
(In reply to Boris Zbarsky (:bz) from comment #20)
> If I profile the timed part on a current Mac build, I get 48% under string
> AppendFloat() and 15% under ReplaceASCII. Similar to comment 4.
This is probably what was screwing me up; I wasn't timing just that bit, but a lot of other stuff too (e.g. browser startup).
![]() |
||
Comment 22•13 years ago
|
||
With the patch in bug 608915 the time under AppendFloat drops to 30% and the time under ReplaceASCII goes up to 26%.
That ReplaceASCII call is from us appending "px" to the value, which triggers an extra realloc. I wonder whether it would make sense to preallocate a few extra chars there....
Note that as far as I can tell Chrome has a hashtable mapping their equivalent of CSSValue to serialized strings, so they only do the serialization once here and then they just hit the hashtable. I suppose we could do that too, if we really want to be fast here.
![]() |
||
Comment 23•13 years ago
|
||
For what it's worth, I can get about a 20% win by just doing a big enough SetCapacity call before the AppendFloat, as expected...
David, thoughts on comment 22?
![]() |
||
Comment 24•13 years ago
|
||
You might be able to squeeze a little extra out by making DoAppendFloat end like so:
int length = builder.position();
AppendASCII(builder.Finalize(), length);
Out of interest, how do you profile just the timed part?
![]() |
||
Comment 25•13 years ago
|
||
1) Get a box with OS X 10.6 on it
2) Make sure Shark is installed
3) Compile an opt build with --enable-shark
4) Add a startProfiling() call in the testcase before the start new Date call and
stopProfiling after the stop new Date call.
5) Put shark in "Programmatic (Remote)" mode.
6) Set the sampling interval to 20us.
7) Load the testcase.
Comment 26•13 years ago
|
||
You can manually click Start/Stop in Shark as well with Firefox already running and setup when you click start. xperf on Windows can do the same.
![]() |
||
Comment 27•11 years ago
|
||
Nightly (608915 fixed) / Chrome 30
Testcase - 1762ms /825ms
Testcase with no position, top, left - 310ms / 650ms
Testcase with no top/left - 330ms / 660ms
test4b.html - Peacekeeper original test - 30.0 fps / 35.6 fps
test4c.html - Peacekeeper test with modifications - 44.1 fps / 48.8 fps
![]() |
||
Comment 28•11 years ago
|
||
Note that WebKit just stores the strings for all their CSS stuff, instead of serializing on get, so they can do the get fast. On the other hand, they end up using a lot more memory for it....
![]() |
||
Comment 29•11 years ago
|
||
What's actually more interesting is whether the original test is still gated on this.
![]() |
||
Comment 30•11 years ago
|
||
For what it's worth, on the "test4b.html - Peacekeeper original test" I see us slightly faster than Chrome on Mac....
Comment 31•11 years ago
|
||
I'm getting results between 650ms and 800ms in Nightly 29.0a1 (2014-01-17) and about 500ms in Chrome 32 for the original testcase from comment 1.
Comment 32•7 years ago
|
||
https://bugzilla.mozilla.org/show_bug.cgi?id=1472046
Move all DOM bugs that haven’t been updated in more than 3 years and has no one currently assigned to P5.
If you have questions, please contact :mdaly.
Priority: -- → P5
Updated•2 years ago
|
Severity: normal → S3
Comment 33•8 months ago
|
||
Nightly Vs Chrome :
testcase : 240ms / 920ms
Testcase with no position, top, left : 60ms / 170ms
Testcase with no top/left : 60ms / 170ms
test4b.html - Peacekeeper original test : 44FPS / 39FPS
test4c.html - Peacekeeper test with modifications : 47FPS / 47FPS
We are 3x faster in the microbenchmarks, and faster on the actual peacekeeper benchmark.
Calling this fixed.
Status: NEW → RESOLVED
Closed: 8 months ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•