Got to spend some quality time with my 16-core test machine today:
FJ hitting all 16 cores, memory utilization ~8GB = instant gate recalcs :)
While at it, we made some performance improvements:
This is the first time I'm showing a timing from FJ v10 (Java). This version is currently in alpha, but we hope to have a public beta soonish (spring?).
Please note that this drastic change in performance in 7.6.4 and V10 will only be observed on computers with:
- more than 8 CPUs. I saw no change with 2/4 CPU models.
- the "distributed engines" setting in performance preferences turned ON
- a storage device fast enough to saturate all your CPUs with data
The main reason for this improvement is a change in the architecture of FJ's distributed engines: they used to run in series, and now they run in parallel. Also, the improved timings show FJ running in 64-bit mode. I am not yet sure if 7.6.4 will ship in 64-bit compatible mode, but V10 definitely will. 64-bitness is NOT required for this improvement to work.
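To make the series-versus-parallel difference concrete, here is a rough sketch in Python. It is not FJ's code. The names `compute_gates`, `run_serial` and `run_parallel` are made up for illustration, and the workload (282 files, 24 gates each, 16 workers) just mirrors the test described in this post and its comments.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_gates(file_id, n_gates=24):
    """Hypothetical stand-in for one engine's work on one file."""
    # Placeholder result only; real gating would read FCS events here.
    return [(file_id, gate) for gate in range(n_gates)]


def run_serial(file_ids):
    """Series layout: files are handled one after another."""
    return [compute_gates(f) for f in file_ids]


def run_parallel(file_ids, workers=16):
    """Parallel layout: files are handed to a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with run_serial().
        return list(pool.map(compute_gates, file_ids))


files = range(282)
serial = run_serial(files)
parallel = run_parallel(files)
```

Both layouts produce the same results; only the scheduling differs. Note that standard CPython threads share one interpreter lock, so this shows the structure rather than the speedup. For CPU-bound work you would want a process pool, or a thread pool on a runtime like the JVM.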
Maciej
Could you describe the test conditions that caused V10 to execute in 90 seconds or so? Were you processing a single file with 10 or 20 million events?
Posted by: Bob Zigon | January 25, 2011 at 10:11 AM
Hi Bob,
This test involved 282 small FCS files (~1GB total) being dumped into a template which calculated 24 gates on each file.
The data resided on a RAMdisk in all 4 tests, and the only changes between executions were in our threading code.
Posted by: maciej simm | February 01, 2011 at 10:53 AM
I should mention: the engine used by Chimera has been in development for a while, so the small improvement between 7.6.4 and V10 reflects some of the optimization we've been putting in, separate from the new threading work.
Posted by: maciej simm | February 01, 2011 at 11:15 AM