Open Bug 608880 Opened 14 years ago Updated 2 years ago

Reading style.left and style.top is slow compared to Chrome (Peacekeeper balls tests)

Categories

(Core :: DOM: CSS Object Model, defect, P5)

x86
Windows XP
defect

Tracking

()

People

(Reporter: developer, Unassigned)

References

(Blocks 2 open bugs)

Details

Attachments

(6 files, 2 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12
Build Identifier: Mozilla/5.0 (Windows NT 5.1; rv:2.0b8pre) Gecko/20101101 Firefox/4.0b8pre

I've created a small test case that creates 10 divs with style:

position: absolute
width: 32
height: 32
left: random
top: random

If I run it in Chrome7 and FF4, I get:

FF4: 4713ms
Chrome7: 894ms

Where lower is better.

This is what causes FF4 to be slower than Chrome in test 5 of peacekeeper.

Reproducible: Always

Steps to Reproduce:
1. Load the attached test case in FF4 and Chrome7
2. Notice that Chrome is much faster
3.
Actual Results:  
Chrome is faster

Expected Results:  
FF4 should be as fast or faster than Chrome
Attached file Testcase
Also, this test case uses divs and doesn't try to animate them.  In the real peacekeeper test, they are images (same styles) and their positions change.
Blocks: peacekeeper
Hmm.  Are you sure the initial test is gated on the reads?

In any case, profiling the attached testcase now.
Status: UNCONFIRMED → NEW
Ever confirmed: true
OK, long story short:

  2% jit-generated JS code
  2% in js::GetPropertyWithNativeGetter
  8% the .style getter (self time and JS-wrapping)
  3% nsDOMCSSAttributeDeclaration::GetCSSDeclaration
 46% nsString::AppendFloat (41% under Modified_cnvtf, of which 25% is calling
     PR_dtoa and the rest is memory traffic; some AppendASCIItoUTF16).
 25% Replace/ReplaceASCII called from nsCSSValue::ToString
  
and some minor stuff.
Er, called from nsCSSValue::AppendToString.
Depends on: 608914
Depends on: 608915
Yeah, it was a little hard to imagine that it was the reads that caused the slowdown.  But compare:

http://www.peacekeeper.therichins.net/test4b.html
http://www.peacekeeper.therichins.net/test4c.html

The only difference between the two files is test4b.html has:

var x1 = ball1.style.left.replace("px", "");
var y1 = ball1.style.top.replace("px", "");
var x2 = ball2.style.left.replace("px", "");
var y2 = ball2.style.top.replace("px", "");

while test4c.html has:

var x1 = this.positions[i].x;
var y1 = this.positions[i].y;
var x2 = this.positions[j].x;
var y2 = this.positions[j].y;

The functionality is the same otherwise.  On my computer, my numbers are:

test4b.html
FF4: 27.504 fps
Chrome: 32.567 fps

test4c.html
FF4: 46.161 fps
Chrome: 46.032 fps
Indeed.  Thank you!

Mind attaching those here too, just so we don't depend on that server staying up?  It'd be much appreciated.

If you're interested, by the way, the bugs this bug depends on are tracking the various issues that the profile shows; I didn't want to just cc you on them because I'm not sure whether you do or don't want the bugspam.  If you generally do, let me know and I'll cc you from now on.
This file has the position, width, and height style lines commented out.  It ran 732ms on my machine.
This is the same testcase but it just has the position style code commented out.  I got 748ms.
Attachment #487797 - Attachment description: Testcase with no position, width, height → Testcase with no position, top, left
I added some modified test cases to show what happens when some CSS styles are removed.  The position: absolute adds a little time on, but it is the top/left that affect it the most.
Yeah, on the "no position" test (which is really a no top/left test) we're 2x _faster_ than chrome.  OK, good.  So there's hope, if we fix the dependent bugs.  ;)
I've added the html files, but they do require some other files (such as the image).

Also, is there a way to do profiling on Windows XP?
Attachment #487799 - Attachment description: Testcase with no position → Testcase with no top/left
> I've added the html files, but they do require some other file

For future reference, the best way to upload such things is to upload the image first, then change the HTML to point to the image attachment, then upload the HTML.  It takes some more work, but given how much work you're clearly putting into these peacekeeper reductions already... :)  (And I _really_ appreciate the reductions, by the way!)

For WinXP... Xperf doesn't do a great job, sadly.  Maybe try https://developer.mozilla.org/Profiling_with_AMD_CodeAnalyst ?
Attached image ball.png
image used in test4b.html and test4c.html
Changed to use attachments in bugzilla
Attachment #487800 - Attachment is obsolete: true
Changed to use attachments in bugzilla
Attachment #487801 - Attachment is obsolete: true
Having done some benchmarking on the original test case today, the current status looks like (Xeon X5550 @ 2.67 GHz, 64-bit Linux, GCC 4.6.1)

Firefox: ~1980 ms
Chromium: ~400 ms

A perf report doesn't reveal anything especially interesting:

  8.66%  firefox  libpthread-2.15.so           [.] __pthread_mutex_unlock_usercnt
  8.41%  firefox  libpthread-2.15.so           [.] pthread_mutex_lock
  7.95%  firefox  libnspr4.so                  [.] dtoa
  3.68%  firefox  perf-10014.map               [.] 0x7f597ca7e608
  3.50%  firefox  firefox                      [.] realloc
  3.04%  firefox  firefox                      [.] arena_dalloc
  2.89%  firefox  libxul.so                    [.] nsAString_internal::ReplaceASCII(unsigned int, unsigned int, char const*, unsigned int)
  2.54%  firefox  libxul.so                    [.] castNativeFromWrapper(JSContext*, JSObject*, unsigned int, nsISupports**, JS::Value*, XPCLazyCallContext*, unsigned int*)
  2.12%  firefox  libxul.so                    [.] nsIDOMElementCSSInlineStyle_GetStyle(JSContext*, JSObject*, long, JS::Value*)
  2.03%  firefox  libxul.so                    [.] _ZL14Modified_cnvtfPciid.constprop.21
  2.00%  firefox  libxul.so                    [.] nsCSSValue::AppendToString(nsCSSProperty, nsAString_internal&) const
  2.00%  firefox  firefox                      [.] arena_malloc
  1.69%  firefox  libc-2.15.so                 [.] __memcpy_ssse3_back

I assume the locking is some artifact of PR_dtoa and friends; it goes away in a profile with bug 608915 addressed.
> A perf report doesn't reveal anything especially interesting:

Nathan, how did you measure the original testcase?  It does a bunch of setup, then runs the actual timed part of the test....

If I profile the timed part on a current Mac build, I get 48% under string AppendFloat() and 15% under ReplaceASCII.  Similar to comment 4.

Now that time is scattered over lots of different functions, which is why a non-hierarchical profile doesn't tell you much.
(In reply to Boris Zbarsky (:bz) from comment #20)
> If I profile the timed part on a current Mac build, I get 48% under string
> AppendFloat() and 15% under ReplaceASCII.  Similar to comment 4.

This is probably what was screwing me up; I wasn't timing just that bit, but a lot of other stuff too (e.g. browser startup).
With the patch in bug 608915 the time under AppendFloat drops to 30% and the time under ReplaceASCII goes up to 26%.

That ReplaceASCII call is from us appending "px" to the value, which triggers an extra realloc.  I wonder whether it would make sense to preallocate a few extra chars there....

Note that as far as I can tell Chrome has a hashtable mapping their equivalent of CSSValue to serialized strings, so they only do the serialization once here and then they just hit the hashtable.  I suppose we could do that too, if we really want to be fast here.
For what it's worth, I can get about a 20% win by just doing a big enough SetCapacity call before the AppendFloat, as expected...

David, thoughts on comment 22?
You might be able to squeeze a little extra out by making DoAppendFloat end like so:

int length = builder.position();
AppendASCII(builder.Finalize(), length);

Out of interest, how do you profile just the timed part?
1)  Get a box with OS X 10.6 on it
2)  Make sure Shark is installed
3)  Compile an opt build with --enable-shark
4)  Add a startProfiling() call in the testcase before the start new Date call and
    stopProfiling after the stop new Date call.
5)  Put shark in "Programmatic (Remote)" mode.
6)  Set the sampling interval to 20us.
7)  Load the testcase.
You can manually click Start/Stop in Shark as well with Firefox already running and setup when you click start. xperf on Windows can do the same.
Depends on: 920209
Blocks: 920659
Depends on: 934544
Nightly (608915 fixed) / Chrome 30
Testcase - 1762ms /825ms
Testcase with no position, top, left - 310ms / 650ms
Testcase with no top/left - 330ms / 660ms
test4b.html - Peacekeeper original test - 30.0 fps / 35.6 fps
test4c.html - Peacekeeper test with modifications - 44.1 fps / 48.8 fps
Note that WebKit just stores the strings for all their CSS stuff, instead of serializing on get, so they can do the get fast.  On the other hand, they end up using a lot more memory for it....
What's actually more interesting is whether the original test is still gated on this.
For what it's worth, on the "test4b.html - Peacekeeper original test" I see us slightly faster than Chrome on Mac....
I'm getting results between 650ms and 800ms in Nightly 29.0a1 (2014-01-17) and about 500ms in Chrome 32 for the original testcase from comment 1.
https://bugzilla.mozilla.org/show_bug.cgi?id=1472046

Move all DOM bugs that haven’t been updated in more than 3 years and has no one currently assigned to P5.

If you have questions, please contact :mdaly.
Priority: -- → P5
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: